datapine is a browser-based business intelligence platform. When paired with CData Connect AI, datapine gives you access to your Azure Data Lake Storage data directly from your visualizations and dashboards. This article describes connecting to Azure Data Lake Storage in CData Connect AI and building a simple Azure Data Lake Storage-connected visualization in datapine.
CData Connect AI provides a pure SQL Server interface for Azure Data Lake Storage, allowing you to query data from Azure Data Lake Storage without replicating the data to a natively supported database. Using optimized data processing out of the box, CData Connect AI pushes all supported SQL operations (filters, JOINs, etc.) directly to Azure Data Lake Storage, leveraging server-side processing to return the requested Azure Data Lake Storage data quickly.
Configure Azure Data Lake Storage Connectivity for datapine
Connectivity to Azure Data Lake Storage from datapine is made possible through CData Connect AI. To work with Azure Data Lake Storage data from datapine, we start by creating and configuring an Azure Data Lake Storage connection.
- Log into Connect AI, click Sources, and then click Add Connection
π Adding a Connection
- Select "Azure Data Lake Storage" from the Add Connection panel
π Selecting a data source
- Enter the necessary authentication properties to connect to Azure Data Lake Storage.
Authenticating to a Gen 1 DataLakeStore Account
Gen 1 uses OAuth 2.0 in Entra ID (formerly Azure AD) for authentication.
For this, an Active Directory web application is required. You can create one as follows:
- Sign in to your Azure Account through the
- Select "Entra ID" (formerly Azure AD).
- Select "App registrations".
- Select "New application registration".
- Provide a name and URL for the application. Select Web app for the type of application you want to create.
- Select "Required permissions" and change the required permissions for this app. At a minimum, "Azure Data Lake" and "Windows Azure Service Management API" are required.
- Select "Key" and generate a new key. Add a description, a duration, and take note of the generated key. You won't be able to see it again.
To authenticate against a Gen 1 DataLakeStore account, the following properties are required:
- Schema: Set this to ADLSGen1.
- Account: Set this to the name of the account.
- OAuthClientId: Set this to the application Id of the app you created.
- OAuthClientSecret: Set this to the key generated for the app you created.
- TenantId: Set this to the tenant Id. See the property for more information on how to acquire this.
- Directory: Set this to the path which will be used to store the replicated file. If not specified, the root directory will be used.
Authenticating to a Gen 2 DataLakeStore Account
To authenticate against a Gen 2 DataLakeStore account, the following properties are required:
- Schema: Set this to ADLSGen2.
- Account: Set this to the name of the account.
- FileSystem: Set this to the file system which will be used for this account.
- AccessKey: Set this to the access key which will be used to authenticate the calls to the API. See the property for more information on how to acquire this.
- Directory: Set this to the path which will be used to store the replicated file. If not specified, the root directory will be used.
π Configuring a connection (Salesforce is shown)
- Click Save & Test
- Navigate to the Permissions tab in the Add Azure Data Lake Storage Connection page and update the User-based permissions.
π Updating permissions
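Before entering the values in the connection form, it can help to confirm that nothing required is missing. The following is a minimal sketch, assuming the required properties are exactly those in the Gen 1 and Gen 2 lists above; the helper name `missing_properties` and the sample values are hypothetical and are not part of Connect AI.

```python
# Required connection properties per schema, taken from the lists above.
# Directory is optional: the root directory is used when it is omitted.
REQUIRED_PROPERTIES = {
    "ADLSGen1": ["Account", "OAuthClientId", "OAuthClientSecret", "TenantId"],
    "ADLSGen2": ["Account", "FileSystem", "AccessKey"],
}


def missing_properties(props):
    """Return the required property names that are absent or empty."""
    schema = props.get("Schema")
    if schema not in REQUIRED_PROPERTIES:
        raise ValueError("Schema must be ADLSGen1 or ADLSGen2")
    return [name for name in REQUIRED_PROPERTIES[schema] if not props.get(name)]


# Placeholder values only; substitute your own account details.
gen2 = {
    "Schema": "ADLSGen2",
    "Account": "myaccount",
    "FileSystem": "myfilesystem",
    "AccessKey": "",  # left empty here to show the check
}
print(missing_properties(gen2))  # ['AccessKey']
```

An empty list means every required property for the chosen schema has a value.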
Add a Personal Access Token
When connecting to Connect AI through the REST API, the OData API, or the Virtual SQL Server, a Personal Access Token (PAT) is used to authenticate the connection to Connect AI. It is best practice to create a separate PAT for each service to maintain granularity of access.
- Click the Gear icon at the top right of the Connect AI app to open the settings page.
- On the Settings page, go to the Access Tokens section and click Create PAT.
- Give the PAT a name and click Create.
π Creating a new PAT
- The personal access token is only visible at creation, so be sure to copy it and store it securely for future use.
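If you later use the PAT from your own scripts, one common way to keep it out of source files is to read it from an environment variable. This is a minimal sketch of that approach; the variable name `CONNECT_AI_PAT` and both helper functions are assumptions made for illustration, not something Connect AI defines.

```python
import os


def load_pat(var_name="CONNECT_AI_PAT"):
    """Read the PAT from an environment variable instead of hard-coding it."""
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable to your PAT")
    return token


def mask(token, visible=4):
    """Return a masked form of the token that is safe to print in logs."""
    if len(token) <= visible:
        return "*" * len(token)
    return "*" * (len(token) - visible) + token[-visible:]
```

Calling `load_pat()` raises an error when the variable is unset, so a missing token is caught before any connection attempt.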
With the connection configured and a PAT generated, you are ready to connect to Azure Data Lake Storage data from datapine.
Connecting to Azure Data Lake Storage from datapine
Once you configure your connection to Azure Data Lake Storage in Connect AI, you are ready to connect to Azure Data Lake Storage from datapine.
- Log into datapine
- Click Connect to navigate to the "Connect" page
- Select MS SQL Server as the data source
- In the Integration step, fill in the connection properties and click "Save and Proceed"
  - Set the Internal Name
  - Set Database Name to the name of the connection we just configured (e.g. ADLS1)
  - Set Host / IP to "tds.cdata.com"
  - Set Username to your Connect AI username (an email address)
  - Set Password to the corresponding PAT
  - Set Database Port to "14333"
π Configuring the connection to CData Connect AI
- In the Data Schema step, select the tables and fields to visualize and click "Save and Proceed"
π Selecting tables and fields to visualize (Salesforce is shown)
- In the References step, define any relationships between your selected tables and click "Save and Proceed"
π Defining foreign key relationships
- In the Data Transfer step, click "Go to Analyzer"
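The same values entered in the Integration step can be used to check the connection from any SQL Server client outside datapine. The sketch below only assembles an ODBC-style connection string from them. The driver name, the function names, and the sample username and token are assumptions; opening a session would additionally require an installed SQL Server ODBC driver and a client library such as the third-party pyodbc package.

```python
def odbc_value(value):
    """Wrap a value in braces so characters such as ';' are taken literally."""
    return "{" + value.replace("}", "}}") + "}"


def build_connection_string(database, username, pat,
                            host="tds.cdata.com", port=14333,
                            driver="ODBC Driver 18 for SQL Server"):
    """Assemble a SQL Server ODBC connection string from the datapine settings."""
    parts = [
        "DRIVER=" + odbc_value(driver),
        f"SERVER={host},{port}",           # SQL Server ODBC uses "host,port"
        "DATABASE=" + odbc_value(database),  # the Connect AI connection name
        "UID=" + odbc_value(username),       # your Connect AI username
        "PWD=" + odbc_value(pat),            # the PAT acts as the password
    ]
    return ";".join(parts)


# Placeholder username and token; substitute your own values.
conn_str = build_connection_string("ADLS1", "user@example.com", "my-pat")
print(conn_str)
# With pyodbc installed, pyodbc.connect(conn_str) would open the session.
```

If a client built this way can connect, the same host, port, database name, and credentials should work in the datapine form.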
Visualize Azure Data Lake Storage Data in datapine
After connecting to CData Connect AI, you are ready to visualize your Azure Data Lake Storage data in datapine. Simply select the dimensions and measures you wish to visualize!
π Visualizing data in datapine (Salesforce is shown)
Having connected to Azure Data Lake Storage from datapine, you are now able to visualize and analyze real-time Azure Data Lake Storage data no matter where you are. To get live data access to hundreds of SaaS, Big Data, and NoSQL sources directly from datapine, try CData Connect AI today!