Databricks is a leading AI cloud-native platform that unifies data engineering, machine learning, and analytics at scale. Its powerful data lakehouse architecture combines the performance of data warehouses with the flexibility of data lakes. Integrating Databricks with CData Connect AI gives organizations live, real-time access to Amazon Athena data without the need for complex ETL pipelines or data duplication—streamlining operations and reducing time-to-insights.
In this article, we'll walk through how to configure a secure, live connection from Databricks to Amazon Athena using CData Connect AI. Once configured, you'll be able to access Amazon Athena data directly from Databricks notebooks using standard SQL—enabling unified, real-time analytics across your data ecosystem.
CData provides the easiest way to access and integrate live data from Amazon Athena. Customers use CData connectivity to:
Users frequently integrate Athena with analytics tools like Tableau, Power BI, and Excel for in-depth analytics from their preferred tools.
To learn more about unique Amazon Athena use cases with CData, check out our blog post: https://www.cdata.com/blog/amazon-athena-use-cases.
Here is an overview of the simple steps:
Before you begin, make sure you have the following:
CData Connect AI uses a straightforward, point-and-click interface to connect to available data sources.
To authorize Amazon Athena requests, provide the credentials for an administrator account or for an IAM user with custom permissions: set the access key property to the access key Id, and set the secret key property to the secret access key.
Note: Though you can connect as the AWS account administrator, it is recommended to use IAM user credentials to access AWS services.
To obtain the credentials for an IAM user, follow the steps below:
To obtain the credentials for your AWS root account, follow the steps below:
If you are using the CData Data Provider for Amazon Athena 2018 from an EC2 instance and have an IAM role assigned to the instance, you can use the IAM role to authenticate. To do so, set the EC2 roles property to true and leave the access key and secret key properties empty. The CData Data Provider for Amazon Athena 2018 will automatically obtain your IAM role credentials and authenticate with them.
In many situations it may be preferable to use an IAM role for authentication instead of the direct security credentials of an AWS root user. An AWS role may be used instead by specifying the ARN of the role. This will cause the CData Data Provider for Amazon Athena 2018 to attempt to retrieve credentials for the specified role. If you are connecting to AWS from outside (rather than from an already-authenticated environment such as an EC2 instance), you must additionally specify the access key and secret key of an IAM user that will assume the role. Roles may not be used when specifying the access key and secret key of an AWS root user.
For users and roles that require multi-factor authentication, specify the MFA serial number and MFA token connection properties. This will cause the CData Data Provider for Amazon Athena 2018 to submit the MFA credentials in a request to retrieve temporary authentication credentials. Note that the duration of the temporary credentials may be controlled via the temporary token duration property (default 3600 seconds).
In addition to the access key and secret key properties, specify the database, the S3 staging directory, and the region. Set the region to the region where your Amazon Athena data is hosted. Set the S3 staging directory to a folder in S3 where you would like to store the results of queries.
If a database is not set in the connection, the data provider connects to the default database set in Amazon Athena.
When connecting to Connect AI through the REST API, the OData API, or the Virtual SQL Server, a Personal Access Token (PAT) is used to authenticate the connection to Connect AI. The PAT functions as an alternative to your login credentials for secure, token-based authentication. It is a best practice to create a separate PAT for each service to maintain granularity of access.
Follow these steps to establish a connection from Databricks to Amazon Athena. You'll install the CData JDBC Driver for Connect AI, add the JAR file to your cluster, configure your notebooks, and run SQL queries to access live Amazon Athena data.
C:\Program Files\CData\CData JDBC Driver for Connect AI\lib\cdata.jdbc.connect.jar
/Applications/CData/CData JDBC Driver for Connect AI/lib/cdata.jdbc.connect.jar
driver = "cdata.jdbc.connect.ConnectDriver"
url = "jdbc:connect:AuthScheme=Basic;User=your_username;Password=your_pat;URL=https://cloud.cdata.com/api/;DefaultCatalog=Your_Connection_Name;"
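The connection string above embeds the PAT directly in the notebook. As a minimal sketch of one alternative, the helper below assembles the same URL from separate values so the token can come from the environment instead. The function name `build_connect_url` and the environment variable names (`CONNECT_AI_USER`, `CONNECT_AI_PAT`, `CONNECT_AI_CATALOG`) are assumptions made for illustration and are not part of the CData driver. The property names and the base URL are copied from the connection string shown in this article.

```python
import os


def build_connect_url(user: str, pat: str, catalog: str,
                      base_url: str = "https://cloud.cdata.com/api/") -> str:
    """Assemble a Connect AI JDBC URL (hypothetical helper, not a CData API)."""
    # Reject empty values early so a missing setting fails here,
    # not later inside the JDBC driver.
    for name, value in (("user", user), ("pat", pat), ("catalog", catalog)):
        if not value:
            raise ValueError(f"{name} must not be empty")
    # Property names and order mirror the connection string in this article.
    props = {
        "AuthScheme": "Basic",
        "User": user,
        "Password": pat,
        "URL": base_url,
        "DefaultCatalog": catalog,
    }
    return "jdbc:connect:" + "".join(f"{key}={value};" for key, value in props.items())


driver = "cdata.jdbc.connect.ConnectDriver"
# Environment variable names are assumptions; the fallbacks are the article's placeholders.
url = build_connect_url(
    user=os.environ.get("CONNECT_AI_USER", "your_username"),
    pat=os.environ.get("CONNECT_AI_PAT", "your_pat"),
    catalog=os.environ.get("CONNECT_AI_CATALOG", "Your_Connection_Name"),
)
```

In a Databricks workspace, a secret scope read with `dbutils.secrets.get` may be a better source for the PAT than environment variables. The helper works the same way with either source.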
remote_table = spark.read.format("jdbc") \
.option("driver", "cdata.jdbc.connect.ConnectDriver") \
.option("url", "jdbc:connect:AuthScheme=Basic;User=your_username;Password=your_pat;URL=https://cloud.cdata.com/api/;DefaultCatalog=Your_Connection_Name;") \
.option("dbtable", "YOUR_SCHEMA.YOUR_TABLE") \
.load()
display(remote_table.select("ColumnName1", "ColumnName2"))
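The read above loads a whole table through `dbtable`. Spark's JDBC source also accepts a `query` option, which cannot be combined with `dbtable`. The sketch below builds the reader options for either case. The helper name `jdbc_reader_options` is hypothetical, and the example assumes the Connect AI driver accepts the subquery form Spark generates for `query`, which this article does not confirm.

```python
def jdbc_reader_options(url, table=None, query=None,
                        driver="cdata.jdbc.connect.ConnectDriver"):
    """Build options for spark.read.format("jdbc") (hypothetical helper)."""
    # Spark rejects a read that sets both dbtable and query, so require exactly one.
    if (table is None) == (query is None):
        raise ValueError("specify exactly one of table or query")
    options = {"driver": driver, "url": url}
    if table is not None:
        options["dbtable"] = table
    else:
        options["query"] = query
    return options


# Placeholder URL, schema, table, and column names, as elsewhere in this article.
example_url = ("jdbc:connect:AuthScheme=Basic;User=your_username;Password=your_pat;"
               "URL=https://cloud.cdata.com/api/;DefaultCatalog=Your_Connection_Name;")
opts = jdbc_reader_options(
    example_url,
    query="SELECT ColumnName1, ColumnName2 FROM YOUR_SCHEMA.YOUR_TABLE",
)

# In a Databricks notebook, where `spark` and `display` are predefined:
#   df = spark.read.format("jdbc").options(**opts).load()
#   df.createOrReplaceTempView("athena_rows")
#   display(spark.sql("SELECT ColumnName1, COUNT(*) AS n FROM athena_rows GROUP BY ColumnName1"))
```

Registering the DataFrame as a temporary view, as in the commented lines, lets later notebook cells query the Athena data with Spark SQL and join it with other tables.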
You can now explore, join, and analyze live Amazon Athena data directly within Databricks notebooks—without needing to know the complexities of the back-end API and without replicating Amazon Athena data.
Ready to simplify real-time access to Amazon Athena data? Start your free 14-day trial of CData Connect AI today and experience seamless, live connectivity from Databricks to Amazon Athena.
Low code, zero infrastructure, zero replication — just seamless, secure access to your most critical data and insights.