
URL: https://www.cdata.com/kb/tech/databricks-jdbc-snaplogic.rst

Integrate Databricks with External Services using SnapLogic

Jerod Johnson
Director, Technology Evangelism
Use CData JDBC drivers in SnapLogic to integrate Databricks with External Services.

SnapLogic is an integration platform-as-a-service (iPaaS) that allows users to create data integration flows with no code. When paired with the CData JDBC Drivers, users get access to live data from more than 250 SaaS, Big Data, and NoSQL sources, including Databricks, in their SnapLogic workflows.

With built-in optimized data processing, the CData JDBC Driver offers unmatched performance for interacting with live Databricks data. When platforms issue complex SQL queries to Databricks, the driver pushes supported SQL operations, like filters and aggregations, directly to Databricks and utilizes the embedded SQL engine to process unsupported operations client-side (often SQL functions and JOIN operations). Its built-in dynamic metadata querying lets you work with Databricks data using native data types.

About Databricks Data Integration

Accessing and integrating live data from Databricks has never been easier with CData. Customers rely on CData connectivity to:

  • Access all versions of Databricks, from Runtime Versions 9.1 - 13.X to both the Pro and Classic Databricks SQL versions.
  • Leave Databricks in their preferred environment thanks to compatibility with any hosting solution.
  • Securely authenticate in a variety of ways, including personal access token, Azure Service Principal, and Azure AD.
  • Upload data to Databricks using Databricks File System, Azure Blob Storage, and AWS S3 Storage.

While many customers use CData's solutions to migrate data from different systems into their Databricks data lakehouse, several customers use our live connectivity solutions to federate connectivity between their databases and Databricks. These customers use SQL Server Linked Servers or PolyBase to get live access to Databricks from within their existing RDBMS.

Read more about common Databricks use cases and how CData's solutions help solve data problems in our blog: What is Databricks Used For? 6 Use Cases.




Connect to Databricks in SnapLogic

To connect to Databricks data in SnapLogic, download and install the CData Databricks JDBC Driver. Follow the installation dialog. When the installation is complete, the JAR file can be found in the installation directory (C:/Program Files/CData/CData JDBC Driver for Databricks/lib by default).
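
Before uploading the JAR, it can be useful to confirm that it exposes the driver class that SnapLogic will be told to load. The following is a minimal sketch, assuming the JAR has been added to the Java classpath when the program is run; the DriverCheck class and its isLoadable method are hypothetical helpers written for this illustration and are not part of the CData driver.

```java
// Hypothetical helper for illustration; not part of the CData driver or SnapLogic.
final class DriverCheck {

    // Returns true if the named class can be loaded from the current classpath.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class is missing, or it was found but failed to link or initialize.
            return false;
        }
    }

    public static void main(String[] args) {
        // Driver class name as given in this article's connection settings.
        String driver = "cdata.jdbc.databricks.DatabricksDriver";
        System.out.println(driver + " loadable: " + isLoadable(driver));
    }
}
```

If the check reports false, the JAR is most likely missing from the classpath used to run the program.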

Upload the Databricks JDBC Driver

After installation, upload the JDBC JAR file to a location in SnapLogic (for example, projects/Jerod Johnson) from the Manager tab.


Configure the Connection

Once the JDBC Driver is uploaded, we can create the connection to Databricks.

  1. Navigate to the Designer tab
  2. Expand "JDBC" from Snaps and drag a "Generic JDBC - Select" snap onto the designer
  3. Click Add Account (or select an existing one) and click "Continue"
  4. In the next form, configure the JDBC connection properties:
    • Under JDBC JARs, add the JAR file we previously uploaded
    • Set JDBC Driver Class to cdata.jdbc.databricks.DatabricksDriver
    • Set JDBC URL to a JDBC connection string for the Databricks JDBC Driver, for example:

      jdbc:databricks:Server=127.0.0.1;Port=443;TransportMode=HTTP;HTTPPath=MyHTTPPath;UseSSL=True;User=MyUser;Password=MyPassword;RTK=XXXXXX;

      NOTE: RTK is a trial or full key. Contact our Support team for more information.

      Built-In Connection String Designer

      For assistance in constructing the JDBC URL, use the connection string designer built into the Databricks JDBC Driver. Either double-click the JAR file or execute the JAR file from the command line.

      java -jar cdata.jdbc.databricks.jar

      Fill in the connection properties and copy the connection string to the clipboard.

      To connect to a Databricks cluster, set the properties as described below.

      Note: The needed values can be found in your Databricks instance by navigating to Clusters, selecting the desired cluster, and selecting the JDBC/ODBC tab under Advanced Options.

      • Server: Set to the Server Hostname of your Databricks cluster.
      • HTTPPath: Set to the HTTP Path of your Databricks cluster.
      • Token: Set to your personal access token (this value can be obtained by navigating to the User Settings page of your Databricks instance and selecting the Access Tokens tab).
  5. After entering the connection properties, click "Validate" and "Apply"
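
The JDBC URL from step 4 is a prefix followed by semicolon-terminated Key=Value pairs, so it can also be assembled programmatically. The sketch below is one way to do that, assuming the Server, HTTPPath, and Token properties described above; the DatabricksJdbcUrl class is a hypothetical helper, the placeholder values are not real credentials, and no escaping is attempted for values that themselves contain a semicolon.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the CData driver or SnapLogic.
final class DatabricksJdbcUrl {

    // Joins the properties into "jdbc:databricks:Key=Value;Key=Value;" form,
    // preserving the iteration order of the supplied map.
    static String build(Map<String, String> props) {
        StringBuilder url = new StringBuilder("jdbc:databricks:");
        for (Map.Entry<String, String> entry : props.entrySet()) {
            url.append(entry.getKey()).append('=').append(entry.getValue()).append(';');
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // Placeholder values; substitute the ones from your cluster's JDBC/ODBC tab.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("Server", "MyServerHostname");
        props.put("HTTPPath", "MyHTTPPath");
        props.put("Token", "MyPersonalAccessToken");
        System.out.println(build(props));
    }
}
```

The resulting string can be pasted into the JDBC URL field. Add the RTK property in the same way if your license requires it.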

Read Databricks Data

In the form that opens after validating and applying the connection, configure your query.

  • Set Schema name to "Databricks"
  • Set Table name to a table for Databricks using the schema name, for example: "Databricks"."Customers" (use the drop-down to see the full list of available tables)
  • Add Output fields for each item you wish to work with from the table

Save the Generic JDBC - Select snap.
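
To reproduce a similar read outside SnapLogic, the schema, table, and output fields configured above correspond roughly to a plain SELECT statement. The sketch below is an approximation under stated assumptions: SnapLogic may generate different SQL internally, the SelectSketch class is a hypothetical helper, the column names Id and Name are made up for illustration, and double-quoted identifiers are assumed because that is how the table name is written in this article.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not SnapLogic's actual query generator.
final class SelectSketch {

    // Wraps an identifier in double quotes, doubling any embedded quote.
    static String quote(String identifier) {
        return "\"" + identifier.replace("\"", "\"\"") + "\"";
    }

    // Builds SELECT <fields> FROM "schema"."table"; an empty field list selects all columns.
    static String buildSelect(String schema, String table, List<String> fields) {
        String columns = fields.isEmpty()
                ? "*"
                : fields.stream().map(SelectSketch::quote).collect(Collectors.joining(", "));
        return "SELECT " + columns + " FROM " + quote(schema) + "." + quote(table);
    }

    // Runs the query on an open connection and prints the first column of up to maxRows rows.
    static void preview(Connection conn, String sql, int maxRows) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.setMaxRows(maxRows);
            try (ResultSet rs = stmt.executeQuery(sql)) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }

    public static void main(String[] args) {
        // "Id" and "Name" are assumed column names, not taken from a real table.
        System.out.println(buildSelect("Databricks", "Customers", List.of("Id", "Name")));
    }
}
```

The preview method expects a connection opened with java.sql.DriverManager and the JDBC URL configured earlier, with the driver JAR on the classpath.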

With the connection and query configured, click the end of the snap to preview the data.

Once you confirm the results are what you expect, you can add additional snaps to funnel your Databricks data to another endpoint.


Piping Databricks Data to External Services

For this article, we will load data into a Google Spreadsheet. You can use any of the supported snaps, or even use a Generic JDBC snap with another CData JDBC Driver, to move data into an external service.

  1. Start by dropping a "Worksheet Writer" snap onto the end of the "Generic JDBC - Select" snap.
  2. Add an account to connect to Google Sheets
  3. Configure the Worksheet Writer snap to write your Databricks data to a Google Spreadsheet

You can now execute the fully configured pipeline to extract data from Databricks and push it into a Google Spreadsheet.


Piping External Data to Databricks

You can also use the JDBC Driver for Databricks in SnapLogic to write data to Databricks. Start by adding a Generic JDBC - Insert or Generic JDBC - Update snap to the designer.

  1. Select the existing "Account" (connection) or create a new one
  2. Configure the query:
    • Set Schema name to "Databricks"
    • Set Table name to a table for Databricks using the schema name, for example: "Databricks"."Customers" (use the drop-down to see the full list of available tables)
  3. Save the Generic JDBC - Insert/Update snap

At this point, you have configured a snap to write data to Databricks, inserting new records or updating existing ones.
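
For comparison, an insert of the kind this snap performs can be expressed in plain JDBC as a parameterized statement. This is a minimal sketch, not a description of what SnapLogic does internally: the InsertSketch class is a hypothetical helper, the column names Name and City are assumed for illustration, and identifier quoting follows the double-quoted style used for table names in this article.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not SnapLogic's actual insert logic.
final class InsertSketch {

    // Wraps an identifier in double quotes, doubling any embedded quote.
    static String quote(String identifier) {
        return "\"" + identifier.replace("\"", "\"\"") + "\"";
    }

    // Builds INSERT INTO "schema"."table" ("c1", "c2") VALUES (?, ?) with one placeholder per column.
    static String buildInsert(String schema, String table, List<String> columns) {
        String cols = columns.stream().map(InsertSketch::quote).collect(Collectors.joining(", "));
        String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + quote(schema) + "." + quote(table)
                + " (" + cols + ") VALUES (" + placeholders + ")";
    }

    // Binds the values in order and executes the insert; returns the affected row count.
    static int insertRow(Connection conn, String sql, List<Object> values) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (int i = 0; i < values.size(); i++) {
                ps.setObject(i + 1, values.get(i)); // JDBC parameters are 1-indexed
            }
            return ps.executeUpdate();
        }
    }

    public static void main(String[] args) {
        // "Name" and "City" are assumed column names, not taken from a real table.
        System.out.println(buildInsert("Databricks", "Customers", List.of("Name", "City")));
    }
}
```

Binding values through placeholders rather than concatenating them into the SQL text keeps incoming data from being interpreted as SQL.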

More Information & Free Trial

Using the CData JDBC Driver for Databricks, you can create a pipeline in SnapLogic for integrating Databricks data with external services. For more information about connecting to Databricks, see our CData JDBC Driver for Databricks page. Download a free, 30-day trial of the CData JDBC Driver for Databricks and get started today.