Databricks is a leading AI cloud-native platform that unifies data engineering, machine learning, and analytics at scale.
Its powerful data lakehouse architecture combines the performance of data warehouses with the flexibility of data lakes.
Integrating Databricks with CData Connect AI
gives organizations live, real-time access to Okta data without the need for complex ETL pipelines or
data duplication—streamlining operations and reducing time-to-insights.
In this article, we'll walk through how to configure a secure, live connection from Databricks to Okta
using CData Connect AI. Once configured, you'll be able to access Okta data directly from Databricks notebooks
using standard SQL—enabling unified, real-time analytics across your data ecosystem.
Overview
Here is an overview of the simple steps:
- Step 1: Connect and Configure. In CData Connect AI, create a connection to your Okta source, configure user permissions, and generate a Personal Access Token (PAT).
- Step 2: Query from Databricks. Install the CData JDBC driver in Databricks, configure your notebook with the connection details, and run SQL queries to access live Okta data.
Prerequisites
Before you begin, make sure you have the following:
- An active Okta account.
- A CData Connect AI account. You can log in or sign up for a free trial here.
- A Databricks account. Sign up or log in here.
Step 1: Connect and Configure an Okta Connection in CData Connect AI
1.1 Add a Connection to Okta
CData Connect AI uses a straightforward, point-and-click interface to connect to available data sources.
- Log in to Connect AI, click Sources on the left, and then click Add Connection in the top-right.
- Select "Okta" from the Add Connection panel.
- Enter the necessary authentication properties to connect to Okta.
To connect to Okta, set the Domain connection string property to your Okta domain.
You will use OAuth to authenticate with Okta, so you need to create a custom OAuth application.
Creating a Custom OAuth Application
From your Okta account:
- Sign in to your Okta developer edition organization with your administrator account.
- In the Admin Console, go to Applications > Applications.
- Click Create App Integration.
- For the Sign-in method, select OIDC - OpenID Connect.
- For Application type, choose Web Application.
- Enter a name for your custom application.
- Set the Grant Type to Authorization Code. If you want the token to be automatically refreshed, also check Refresh Token.
- Set the callback URL:
- For desktop applications and headless machines, use http://localhost:33333 or another port number of your choice. The URI you set here becomes the property.
- For web applications, set the callback URL to a trusted redirect URL. This URL is the web location the user returns to with the token that verifies that your application has been granted access.
- In the Assignments section, either select Limit access to selected groups and add a group, or skip group assignment for now.
- Save the OAuth application.
- The application's Client Id and Client Secret are displayed on the application's General tab. Record these for future use. You will use the Client Id to set the OAuthClientId and the Client Secret to set the OAuthClientSecret.
- Check the Assignments tab to confirm that all users who must access the application are assigned to the application.
- On the Okta API Scopes tab, select the scopes you wish to grant to the OAuth application. These scopes determine the data that the app has permission to read, so the scope for a particular view must be granted before the driver can query that view. To confirm the scopes required for each view, see the view-specific pages under Data Model > Views in the Help documentation.
- Click Save & Test in the top-right.
- Navigate to the Permissions tab on the Okta Connection page and update the user-based permissions based on your preferences.
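To make the OAuth application settings from step 1.1 more concrete, the sketch below shows where the Client Id, callback URL, and scopes typically appear in an OAuth 2.0 authorization-code request. It is illustrative only: Connect AI is expected to handle this exchange once the connection properties are entered. The `/oauth2/v1/authorize` path, the example domain, the client ID value, and the `okta.users.read` scope name are assumptions chosen for illustration, not values taken from this article.

```python
from urllib.parse import urlencode, urlparse, parse_qs
import secrets


def build_authorize_url(domain, client_id, callback_url, scopes, state=None):
    """Build an OAuth 2.0 authorization-code request URL (illustrative)."""
    params = {
        "client_id": client_id,
        "response_type": "code",          # Authorization Code grant
        "scope": " ".join(scopes),        # scopes granted on the API Scopes tab
        "redirect_uri": callback_url,     # must match the registered callback URL
        "state": state or secrets.token_urlsafe(16),
    }
    # Assumed endpoint path for the Okta org authorization server.
    return f"https://{domain}/oauth2/v1/authorize?{urlencode(params)}"


example_url = build_authorize_url(
    domain="example.okta.com",            # hypothetical Okta domain
    client_id="0oa_example_client_id",    # hypothetical Client Id
    callback_url="http://localhost:33333",
    scopes=["okta.users.read"],           # assumed scope name
    state="demo-state",
)
print(example_url)
```

If the callback URL in the request differs from the one registered on the application, the authorization server will normally reject the request, which is why the two values need to match exactly.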
1.2 Generate a Personal Access Token (PAT)
When connecting to Connect AI through the REST API, the OData API, or the Virtual SQL Server,
a Personal Access Token (PAT) is used to authenticate the connection to Connect AI. The PAT functions as an
alternative to your login credentials for secure, token-based authentication. It is a best practice to
create a separate PAT for each service to maintain granularity of access.
- Click the Gear icon at the top right of the Connect AI app to open the settings page.
- On the Settings page, go to the Access Tokens section and click Create PAT.
- Give the PAT a name and click Create.
- Note: The personal access token is only visible at creation, so be sure to copy it and store it securely for future use.
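Because the PAT is shown only once and acts as a credential, it is usually better kept out of notebook source. The following is a minimal sketch of one way to do that, reading the token from an environment variable and failing fast when it is absent. The variable name `CDATA_CONNECT_PAT` and the helper `load_pat` are hypothetical names introduced here for illustration.

```python
import os

PAT_ENV_VAR = "CDATA_CONNECT_PAT"  # hypothetical variable name, not defined by CData


def load_pat(env=os.environ):
    """Return the PAT from the given mapping, failing fast if it is missing."""
    pat = env.get(PAT_ENV_VAR, "").strip()
    if not pat:
        raise RuntimeError(f"Set {PAT_ENV_VAR} before connecting to Connect AI.")
    return pat
```

In a Databricks workspace, a secret scope read with `dbutils.secrets.get(scope=..., key=...)` is another option that serves the same purpose.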
Step 2: Connect and Query Okta Data in Databricks
Follow these steps to establish a connection from Databricks to Okta.
You'll install the CData JDBC Driver for Connect AI, add the JAR file to your cluster, configure your notebooks,
and run SQL queries to access live Okta data.
2.1 Install the CData JDBC Driver for Connect AI
- In CData Connect AI, click the Integrations page on the left. Search for JDBC or Databricks, click Download, and select the installer for your operating system.
- Once downloaded, run the installer and follow the instructions:
  - For Windows: Run the setup file and follow the installation wizard.
  - For Mac/Linux: Unpack the archive and move the folder to /opt or /Applications. Make sure you have execute permissions.
- After installation, locate the driver JAR file (cdata.jdbc.connect.jar) in the installation directory.
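If the JAR is hard to find after installation, a short local script can search for it. This is a sketch only: the helper `find_driver_jar` is a hypothetical name, and the directory you pass in (for example `/opt`) depends on where you placed the driver. The file name is the one used in the upload step of section 2.2.

```python
from pathlib import Path

JAR_NAME = "cdata.jdbc.connect.jar"  # file name used in the upload step


def find_driver_jar(install_root):
    """Return the paths of the driver JAR found anywhere under install_root."""
    root = Path(install_root)
    if not root.is_dir():
        return []
    # rglob walks all subdirectories, so the exact folder layout does not matter.
    return sorted(root.rglob(JAR_NAME))
```

For example, `find_driver_jar("/opt")` returns a list of matching paths, or an empty list if the driver is not under that directory.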
2.2 Install the JAR File on Databricks
- Log in to Databricks. In the navigation pane, click Compute on the left. Start or create a compute cluster.
- Click on the running cluster, go to the Libraries tab, and click Install New at the top right.
- In the Install Library dialog, select DBFS, and drag and drop the cdata.jdbc.connect.jar file. Click Install.
2.3 Query Okta Data in a Databricks Notebook
Notebook Script 1 — Define JDBC Connection:
- Paste the following script into the notebook cell:
driver = "cdata.jdbc.connect.ConnectDriver"
url = "jdbc:connect:AuthScheme=Basic;User=your_username;Password=your_pat;URL=https://cloud.cdata.com/api/;DefaultCatalog=Your_Connection_Name;"
- Replace:
- your_username - With your CData Connect AI username
- your_pat - With your CData Connect AI Personal Access Token (PAT)
- Your_Connection_Name - With the name of your Connect AI data source, from the Sources page
- Run the script.
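As an alternative to editing the long URL string by hand, the connection string from Script 1 can be assembled from its parts. The sketch below is one possible approach; `build_connect_jdbc_url` is a hypothetical helper, and rejecting values that contain a semicolon is a conservative assumption made here, since the article does not describe how such values would be escaped.

```python
def build_connect_jdbc_url(username, pat, connection_name,
                           base_url="https://cloud.cdata.com/api/"):
    """Assemble the Connect AI JDBC URL in the same form as Script 1."""
    props = {
        "AuthScheme": "Basic",
        "User": username,
        "Password": pat,
        "URL": base_url,
        "DefaultCatalog": connection_name,
    }
    for key, value in props.items():
        if not value:
            raise ValueError(f"{key} must not be empty")
        if ";" in value:
            # Assumption: a raw semicolon would split the property list.
            raise ValueError(f"{key} must not contain ';'")
    return "jdbc:connect:" + "".join(f"{k}={v};" for k, v in props.items())


driver = "cdata.jdbc.connect.ConnectDriver"
url = build_connect_jdbc_url("your_username", "your_pat", "Your_Connection_Name")
```

With the placeholder values shown, `url` is identical to the string in Script 1, so later cells can use `driver` and `url` unchanged.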
Notebook Script 2 — Load DataFrame from Okta data:
- Add a new cell for this second script. From the menu on the right side of your notebook, click Add cell below.
- Paste the following script into the new cell:
remote_table = spark.read.format("jdbc") \
    .option("driver", "cdata.jdbc.connect.ConnectDriver") \
    .option("url", "jdbc:connect:AuthScheme=Basic;User=your_username;Password=your_pat;URL=https://cloud.cdata.com/api/;DefaultCatalog=Your_Connection_Name;") \
    .option("dbtable", "YOUR_SCHEMA.YOUR_TABLE") \
    .load()
- Replace:
- your_username - With your CData Connect AI username
- your_pat - With your CData Connect AI Personal Access Token (PAT)
- Your_Connection_Name - With the name of your Connect AI data source, from the Sources page
- YOUR_SCHEMA.YOUR_TABLE - With your schema and table, for example, Okta.Users
- Run the script.
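Script 2 repeats the driver class and URL that Script 1 already defined. If you load more than one table, a small wrapper that reuses those values may be more convenient. The sketch below assumes the `url` variable from Script 1 and the notebook's built-in `spark` session; `jdbc_reader_options` and `load_table` are hypothetical helper names.

```python
def jdbc_reader_options(url, table, driver="cdata.jdbc.connect.ConnectDriver"):
    """Collect the Spark JDBC options used in Script 2 in one dict."""
    return {"driver": driver, "url": url, "dbtable": table}


def load_table(spark, url, table):
    """Load one Connect AI table as a Spark DataFrame."""
    # DataFrameReader.options(**kwargs) sets several options in one call.
    return spark.read.format("jdbc").options(**jdbc_reader_options(url, table)).load()
```

In a notebook cell this could be called as `remote_table = load_table(spark, url, "Okta.Users")`, which should be equivalent to Script 2 with the same placeholders filled in.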
Notebook Script 3 — Preview Columns:
- Similarly, add a new cell for this third script.
- Paste the following script into the new cell:
display(remote_table.select("ColumnName1", "ColumnName2"))
- Replace ColumnName1 and ColumnName2 with actual column names from your Okta table (for example, Id and ProfileFirstName).
- Run the script.
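If you are unsure which column names the table exposes, `remote_table.printSchema()` or `remote_table.columns` lists them. The sketch below wraps the selection from Script 3 with a check that reports unknown names up front; `select_existing` is a hypothetical helper, and its comparison is case-sensitive, which may be stricter than Spark's own column resolution.

```python
def select_existing(df, wanted):
    """Select the wanted columns, reporting any that the DataFrame lacks."""
    available = set(df.columns)
    missing = [c for c in wanted if c not in available]
    if missing:
        raise KeyError(
            f"Columns not found: {missing}; available: {sorted(available)}"
        )
    return df.select(*wanted)
```

In a notebook, `display(select_existing(remote_table, ["Id", "ProfileFirstName"]))` would then behave like Script 3 when both columns exist.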
You can now explore, join, and analyze live Okta data directly within Databricks
notebooks—without needing to know the complexities of the back-end API and without replicating Okta data.
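To query the loaded data with SQL rather than DataFrame methods, one common pattern is to register the DataFrame as a temporary view and pass a statement to `spark.sql`. The sketch below assumes the `remote_table` DataFrame from Script 2 and the notebook's `spark` session; the helper name `run_sql_on` and the view name `okta_users` are hypothetical.

```python
def run_sql_on(spark, df, view_name, sql):
    """Register df as a temporary view and run a Spark SQL query against it."""
    df.createOrReplaceTempView(view_name)
    return spark.sql(sql)
```

For example, `display(run_sql_on(spark, remote_table, "okta_users", "SELECT Id, ProfileFirstName FROM okta_users LIMIT 10"))` previews two columns, assuming the table exposes them. The view lives only for the current Spark session, so nothing is copied out of Okta permanently.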
Try CData Connect AI Free for 14 Days
Ready to simplify real-time access to Okta data?
Start your free 14-day trial of CData Connect AI today
and experience seamless, live connectivity from Databricks to Okta.
Low code, zero infrastructure, zero replication — just seamless, secure access to your
most critical data and insights.