Integrate with Live Databricks Data in MuleSoft (via CData Connect AI)

Dibyendu Datta
Lead Technology Evangelist
Use CData Connect AI to connect to Databricks from the MuleSoft Anypoint Platform to integrate live Databricks data into custom reports and dashboards.

The MuleSoft Anypoint Platform enables the building, deployment, and management of APIs and integrations, facilitating seamless connectivity across applications and systems. When combined with CData Connect AI, it provides access to Databricks data for visualizations, dashboards, and more. This article explains how to use CData Connect AI to create a live connection to Databricks and how to connect and access live Databricks data from the MuleSoft Anypoint Platform.

Prerequisites

Before configuring and using MuleSoft with CData Connect AI, you must first connect a data source to your CData Connect AI account. For more information, see the Connections section.

Additionally, you need to generate a Personal Access Token (PAT) on the Settings page. Be sure to copy it down, as it serves as your password during authentication.

About Databricks Data Integration

Accessing and integrating live data from Databricks has never been easier with CData. Customers rely on CData connectivity to:

  • Access all versions of Databricks, from Runtime Versions 9.1 through 13.X to both the Pro and Classic Databricks SQL versions.
  • Leave Databricks in their preferred environment thanks to compatibility with any hosting solution.
  • Authenticate securely in a variety of ways, including personal access token, Azure Service Principal, and Azure AD.
  • Upload data to Databricks using Databricks File System, Azure Blob Storage, and AWS S3 Storage.

While many customers use CData's solutions to migrate data from different systems into their Databricks data lakehouse, several use our live connectivity solutions to federate connectivity between their databases and Databricks. These customers use SQL Server Linked Servers or PolyBase to get live access to Databricks from within their existing RDBMS.

Read more about common Databricks use cases and how CData's solutions help solve data problems in our blog post, "What is Databricks Used For? 6 Use Cases."



Configure Databricks Connectivity for MuleSoft

Connectivity to Databricks from MuleSoft is made possible through CData Connect AI. To work with Databricks data from MuleSoft, we start by creating and configuring a Databricks connection.

  1. Log into Connect AI, click Sources, and then click Add Connection.
  2. Select "Databricks" from the Add Connection panel.
  3. Enter the necessary authentication properties to connect to Databricks.

    To connect to a Databricks cluster, set the properties as described below.

    Note: The needed values can be found in your Databricks instance by navigating to Clusters, selecting the desired cluster, and selecting the JDBC/ODBC tab under Advanced Options.

    • Server: Set to the Server Hostname of your Databricks cluster.
    • HTTPPath: Set to the HTTP Path of your Databricks cluster.
    • Token: Set to your personal access token (this value can be obtained by navigating to the User Settings page of your Databricks instance and selecting the Access Tokens tab).
  4. Click Save & Test.
  5. Navigate to the Permissions tab in the Add Databricks Connection page and update the User-based permissions.

Add a Personal Access Token

When connecting to Connect AI through the REST API, the OData API, or the Virtual SQL Server, a Personal Access Token (PAT) is used to authenticate the connection to Connect AI. It is best practice to create a separate PAT for each service to maintain granularity of access.

  1. Click the Gear icon at the top right of the Connect AI app to open the Settings page.
  2. On the Settings page, go to the Access Tokens section and click Create PAT.
  3. Give the PAT a name and click Create.
  4. The personal access token is only visible at creation, so be sure to copy it and store it securely for future use.

With the connection configured and a PAT generated, you are ready to connect to Databricks data from MuleSoft.
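Since the PAT acts as a password, it is worth keeping it out of source code. The following is a minimal sketch of how a username and PAT combine into a standard HTTP Basic Authorization header value. It assumes the Connect AI REST and OData endpoints accept Basic authentication in this form, which this article does not confirm. The class name PatBasicAuth and the environment variable names CONNECT_AI_USER and CONNECT_AI_PAT are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class PatBasicAuth {
    // Builds a standard HTTP Basic Authorization header value:
    // "Basic " followed by base64("user:pat").
    static String basicAuthHeader(String user, String pat) {
        String credentials = user + ":" + pat;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Hypothetical variable names; read the secret from the environment
        // instead of hard-coding it.
        String user = System.getenv("CONNECT_AI_USER");
        String pat = System.getenv("CONNECT_AI_PAT");
        if (user == null || pat == null) {
            System.out.println("Set CONNECT_AI_USER and CONNECT_AI_PAT first.");
            return;
        }
        // Print only the length so the secret never appears in logs.
        System.out.println("Authorization header built, length "
                + basicAuthHeader(user, pat).length());
    }
}
```

Creating one PAT per service, as recommended above, means each consumer can be given its own variable and revoked independently.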

Connecting to CData Connect AI

Follow these steps to establish a connection from MuleSoft to CData Connect AI through the JDBC driver:

  1. Download and install the CData Connect AI JDBC driver.
    • Open the Integrations page of CData Connect AI.
    • Search for and select JDBC.
    • Download and run the setup file.
    • When the installation is complete, the JAR file can be found in the installation directory (inside the lib folder).
  2. Log into MuleSoft Anypoint Studio or launch the desktop application.
  3. Create a new MuleSoft project and give it a name.
    The new project appears in a project folder.
  4. In the Mule Palette located on the right, drag an HTTP Listener to the Message Flow area.
  5. Click on the HTTP Listener to configure it.
  6. Click the + sign on the right of Connector configuration. The HTTP Listener config dialog appears.
  7. Configure the HTTP Listener, providing a Port on which to query your data, and click OK.
  8. Provide a path on which to perform the actions. The HTTP Listener is now configured.
  9. In the Mule Palette on the right, type database in the search bar.
  10. Drag the database operation you want to perform to the Message Flow area. For this example, we choose Select.
  11. Select Generic Connection from the Connection dropdown in the Database Config dialog.
  12. Click the Configure button to configure the JDBC driver. Select Use local file from the drop-down list.
  13. Locate the CData Connect AI JAR file from the JDBC driver installation and click OK.
  14. Provide the following information:
    • URL: the URL for the connection, for example:
       jdbc:connect:Authscheme=Basic;user=username;password=PAT
      Note: the password is the PAT created in the Prerequisites section.
    • Driver class name: Enter the Driver class name as:
       cdata.jdbc.connect.ConnectDriver
  15. Click Test Connection.
  16. If the connection is successful, provide the SQL Query Text in the editor. You can see the table metadata on the right side in the Output tab.
  17. In the Mule Palette, drag Transform Message to the Message Flow area.
  18. Click Transform Message to configure it. Change the Output as shown in the figure. [Figure: Configure Transform Message]
  19. Save your project and run it. In the console, MuleSoft starts initializing the dependencies.
  20. Once you see the message, "Message source 'listener' on flow your_project_name successfully started", you can start querying your data at the endpoint you provided.
  21. Send a request to that endpoint, for example from the Postman application, to check the Databricks data.
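If Test Connection fails in Anypoint Studio, it can help to try the URL and driver class from step 14 in a standalone program first. The following is a minimal sketch rather than an official CData sample. It assumes the CData Connect AI JDBC JAR from step 1 is on the classpath. The class name ConnectAiJdbcCheck and the environment variables CONNECT_AI_USER, CONNECT_AI_PAT, and CONNECT_AI_QUERY are hypothetical. The query is supplied by you, since table names depend on your own connection.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ConnectAiJdbcCheck {
    // Driver class name as given in step 14.
    static final String DRIVER_CLASS = "cdata.jdbc.connect.ConnectDriver";

    // Assembles the URL in the same shape as the example in step 14.
    static String buildUrl(String user, String pat) {
        return "jdbc:connect:Authscheme=Basic;user=" + user + ";password=" + pat;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical variable names; nothing is hard-coded here.
        String user = System.getenv("CONNECT_AI_USER");
        String pat = System.getenv("CONNECT_AI_PAT");
        String query = System.getenv("CONNECT_AI_QUERY");
        if (user == null || pat == null || query == null) {
            System.out.println(
                "Set CONNECT_AI_USER, CONNECT_AI_PAT and CONNECT_AI_QUERY first.");
            return;
        }

        // Fails with ClassNotFoundException if the JAR is not on the classpath.
        Class.forName(DRIVER_CLASS);

        try (Connection conn = DriverManager.getConnection(buildUrl(user, pat));
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(query)) {
            int columns = rs.getMetaData().getColumnCount();
            // Print at most five rows, one line per row.
            for (int row = 0; row < 5 && rs.next(); row++) {
                StringBuilder line = new StringBuilder();
                for (int i = 1; i <= columns; i++) {
                    if (i > 1) {
                        line.append(" | ");
                    }
                    line.append(rs.getString(i));
                }
                System.out.println(line);
            }
        }
    }
}
```

If this program returns rows, the same URL and driver class name should work in the Generic Connection dialog, and any remaining problem is more likely in the Mule configuration than in the credentials.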

SQL Access to Databricks Data from Cloud Applications

You now have a direct connection to live Databricks data from the MuleSoft Anypoint Platform. You can create more connections to ensure seamless data flow, automate business processes, and manage APIs, all without replicating Databricks data.
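The endpoint exposed by the HTTP Listener can also be called from code rather than from Postman. The sketch below assumes the Mule application is running locally and that you pass the port and path you configured in the HTTP Listener steps as arguments. The class name MuleEndpointCheck and the example values 8081 and /databricks are hypothetical. It uses the HTTP client included in Java 11 and later.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class MuleEndpointCheck {
    // Builds a GET request for the listener at http://host:port/path.
    static HttpRequest buildRequest(String host, int port, String path) {
        URI uri = URI.create("http://" + host + ":" + port + path);
        return HttpRequest.newBuilder(uri).GET().build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("Usage: java MuleEndpointCheck <port> <path>");
            return;
        }
        int port = Integer.parseInt(args[0]);
        String path = args[1];

        // Send the request and print the status code and the response body.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest("localhost", port, path),
                      HttpResponse.BodyHandlers.ofString());
        System.out.println("Status: " + response.statusCode());
        System.out.println(response.body());
    }
}
```

For example, `java MuleEndpointCheck 8081 /databricks` would query a listener configured on port 8081 with the path /databricks, if those are the values you chose.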

To get real-time data access to hundreds of SaaS, Big Data, and NoSQL sources (including Databricks) directly from your cloud applications, explore CData Connect AI.