Databricks is a leading AI cloud-native platform that unifies data engineering, machine learning, and analytics at scale. Its powerful data lakehouse architecture combines the performance of data warehouses with the flexibility of data lakes. Integrating Databricks with CData Connect AI gives organizations live, real-time access to MongoDB data without the need for complex ETL pipelines or data duplication—streamlining operations and reducing time-to-insights.
In this article, we'll walk through how to configure a secure, live connection from Databricks to MongoDB using CData Connect AI. Once configured, you'll be able to access MongoDB data directly from Databricks notebooks using standard SQL—enabling unified, real-time analytics across your data ecosystem.
Accessing and integrating live data from MongoDB has never been easier with CData. Customers rely on CData connectivity to:
MongoDB's flexibility means that it can be used as a transactional, operational, or analytical database. That means CData customers use our solutions to integrate their business data with MongoDB or integrate their MongoDB data with their data warehouse (or both). Customers also leverage our live connectivity options to analyze and report on MongoDB directly from their preferred tools, like Power BI and Tableau.
For more details on MongoDB use cases and how CData enhances your MongoDB experience, check out our blog post: The Top 10 Real-World MongoDB Use Cases You Should Know in 2024.
Here is an overview of the simple steps:
Before you begin, make sure you have the following:
CData Connect AI uses a straightforward, point-and-click interface to connect to available data sources.
Set the Server, Database, User, and Password connection properties to connect to MongoDB. To access MongoDB collections as tables you can use automatic schema discovery or write your own schema definitions. Schemas are defined in .rsd files, which have a simple format. You can also execute free-form queries that are not tied to the schema.
When connecting to Connect AI through the REST API, the OData API, or the Virtual SQL Server, a Personal Access Token (PAT) is used to authenticate the connection to Connect AI. The PAT functions as an alternative to your login credentials for secure, token-based authentication. It is a best practice to create a separate PAT for each service to maintain granularity of access.
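One way to keep a PAT out of notebook source is to read it from the environment at run time. The sketch below assumes two environment variable names, `CONNECT_AI_USER` and `CONNECT_AI_PAT`, which are hypothetical and not defined by CData; on Databricks, a secret scope would be a reasonable substitute for the environment lookup.

```python
import os

# Hypothetical variable names chosen for this example; they are not CData settings.
USER_VAR = "CONNECT_AI_USER"
PAT_VAR = "CONNECT_AI_PAT"


def load_connect_credentials(env=os.environ):
    """Return (user, pat) read from the given mapping, failing loudly if either is unset."""
    user = env.get(USER_VAR)
    pat = env.get(PAT_VAR)
    missing = [name for name, value in ((USER_VAR, user), (PAT_VAR, pat)) if not value]
    if missing:
        raise RuntimeError("Missing credential variable(s): " + ", ".join(missing))
    return user, pat
```

Passing a plain dictionary as `env` makes the function easy to exercise without touching real credentials, and using a separate PAT per service (as recommended above) only requires a different variable or secret name for each.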
Follow these steps to establish a connection from Databricks to MongoDB. You'll install the CData JDBC Driver for Connect AI, add the JAR file to your cluster, configure your notebooks, and run SQL queries to access live MongoDB data.
Windows: C:\Program Files\CData\CData JDBC Driver for Connect AI\lib\cdata.jdbc.connect.jar
macOS: /Applications/CData/CData JDBC Driver for Connect AI/lib/cdata.jdbc.connect.jar
driver = "cdata.jdbc.connect.ConnectDriver"
url = "jdbc:connect:AuthScheme=Basic;User=your_username;Password=your_pat;URL=https://cloud.cdata.com/api/;DefaultCatalog=Your_Connection_Name;"
remote_table = spark.read.format("jdbc") \
    .option("driver", "cdata.jdbc.connect.ConnectDriver") \
    .option("url", "jdbc:connect:AuthScheme=Basic;User=your_username;Password=your_pat;URL=https://cloud.cdata.com/api/;DefaultCatalog=Your_Connection_Name;") \
    .option("dbtable", "YOUR_SCHEMA.YOUR_TABLE") \
    .load()
display(remote_table.select("ColumnName1", "ColumnName2"))
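To query the loaded data with SQL rather than DataFrame calls, the DataFrame can be registered as a temporary view. The following is a minimal sketch, not CData's documented procedure: the helper names `build_connect_url`, `jdbc_options`, and `register_view` are hypothetical, the view name is a placeholder, and the URL builder assumes that none of the values contain a semicolon.

```python
def build_connect_url(user, pat, catalog, base_url="https://cloud.cdata.com/api/"):
    """Assemble the JDBC URL from its parts, in the same property order as the snippet above."""
    props = {
        "AuthScheme": "Basic",
        "User": user,
        "Password": pat,
        "URL": base_url,
        "DefaultCatalog": catalog,
    }
    # Assumption: values contain no ';', since that character separates properties.
    return "jdbc:connect:" + "".join(f"{key}={value};" for key, value in props.items())


def jdbc_options(user, pat, catalog, dbtable):
    """Collect the Spark JDBC reader options into one dictionary."""
    return {
        "driver": "cdata.jdbc.connect.ConnectDriver",
        "url": build_connect_url(user, pat, catalog),
        "dbtable": dbtable,
    }


def register_view(spark, options, view_name):
    """Load the table over JDBC and expose it to SQL under view_name."""
    df = spark.read.format("jdbc").options(**options).load()
    df.createOrReplaceTempView(view_name)
    return df


# Usage in a Databricks notebook, where `spark` and `display` are predefined:
#   opts = jdbc_options("your_username", "your_pat", "Your_Connection_Name", "YOUR_SCHEMA.YOUR_TABLE")
#   register_view(spark, opts, "mongodb_data")
#   display(spark.sql("SELECT ColumnName1, ColumnName2 FROM mongodb_data LIMIT 10"))
```

Building the URL in one place keeps the credentials and connection name out of the individual read calls, so several tables can be loaded with the same settings by changing only `dbtable`.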
You can now explore, join, and analyze live MongoDB data directly within Databricks notebooks—without needing to know the complexities of the back-end API and without replicating MongoDB data.
Ready to simplify real-time access to MongoDB data? Start your free 14-day trial of CData Connect AI today and experience seamless, live connectivity from Databricks to MongoDB.
Low code, zero infrastructure, zero replication — just seamless, secure access to your most critical data and insights.
Learn more about CData Connect AI or sign up for free trial access:
Free Trial