Access Live Databricks Data in TIBCO Data Virtualization

Jerod Johnson
Director, Technology Evangelism

Use the Databricks TIBCO DV Adapter to create a Databricks data source in TIBCO Data Virtualization Studio and gain access to live Databricks data from your TDV Server.

TIBCO Data Virtualization (TDV) is an enterprise data virtualization solution that orchestrates access to multiple and varied data sources. When paired with the Databricks TIBCO DV Adapter, you get federated access to live Databricks data directly within TIBCO Data Virtualization. This article explains how to deploy the adapter and create a new data source based on Databricks.

With built-in optimized data processing, the CData TIBCO DV Adapter offers unmatched performance for interacting with live Databricks data. When you issue complex SQL queries to Databricks, the adapter pushes supported SQL operations, like filters and aggregations, directly to Databricks. Its built-in dynamic metadata querying allows you to work with and analyze Databricks data using native data types.

About Databricks Data Integration

Accessing and integrating live data from Databricks has never been easier with CData. Customers rely on CData connectivity to:

Access all versions of Databricks, from Runtime versions 9.1 through 13.x to both the Pro and Classic Databricks SQL versions.
Leave Databricks in their preferred environment thanks to compatibility with any hosting solution.
Securely authenticate in a variety of ways, including personal access token, Azure Service Principal, and Azure AD.
Upload data to Databricks using Databricks File System, Azure Blob Storage, and AWS S3 Storage.

While many customers use CData's solutions to migrate data from different systems into their Databricks data lakehouse, several customers use our live connectivity solutions to federate connectivity between their databases and Databricks. These customers use SQL Server Linked Servers or PolyBase to get live access to Databricks from within their existing RDBMS.

Read more about common Databricks use cases and how CData's solutions help solve data problems in our blog: What is Databricks Used For? 6 Use Cases.

Getting Started

Deploy the Databricks TIBCO DV Adapter

In a console, navigate to the bin folder in the TDV Server installation directory. If there is a current version of the adapter installed, you will need to undeploy it.
```
.\server_util.bat -server localhost -user admin -password ******** -undeploy -version 1 -name Databricks
```
Extract the CData TIBCO DV Adapter to a local folder and deploy the JAR file (tdv.databricks.jar) to the server from the extract location.
```
.\server_util.bat -server localhost -user admin -password ******** -deploy -package /PATH/TO/tdv.databricks.jar
```

You may need to restart the server to ensure the new JAR file is loaded properly. To do so, run the composite.bat script located at: C:\Program Files\TIBCO\TDV Server <version>\bin. Note that you must reauthenticate in TDV Studio after restarting the server.

Sample Restart Call
```
.\composite.bat monitor restart
```
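
The undeploy, deploy, and restart calls above can be chained in a small script when the adapter is updated regularly. The following is a minimal Python sketch, assuming it runs on the TDV Server host. The function names `build_redeploy_commands` and `run_commands` are illustrative and not part of TDV or the adapter, and only the flags shown in the commands above are used.

```python
import os
import subprocess


def build_redeploy_commands(bin_dir, jar_path, password,
                            server="localhost", user="admin",
                            adapter_name="Databricks", version="1"):
    """Return the undeploy, deploy, and restart calls as argument lists.

    bin_dir is the bin folder of the TDV Server installation and
    jar_path points at the extracted tdv.databricks.jar.
    """
    server_util = os.path.join(bin_dir, "server_util.bat")
    composite = os.path.join(bin_dir, "composite.bat")
    login = ["-server", server, "-user", user, "-password", password]
    return [
        # 1. Remove the currently deployed adapter version, if any.
        [server_util, *login, "-undeploy", "-version", version,
         "-name", adapter_name],
        # 2. Deploy the new adapter JAR.
        [server_util, *login, "-deploy", "-package", str(jar_path)],
        # 3. Restart the server so the new JAR is loaded.
        [composite, "monitor", "restart"],
    ]


def run_commands(commands, bin_dir):
    """Run the commands in order from the bin folder."""
    for index, cmd in enumerate(commands):
        # Assumption: a failing undeploy (index 0) only means that no
        # earlier version was installed, so it does not stop the run.
        subprocess.run(cmd, cwd=bin_dir, check=(index != 0))


# Example (not executed here; the paths and password are placeholders):
# cmds = build_redeploy_commands(r"C:\Program Files\TIBCO\TDV Server <version>\bin",
#                                "/PATH/TO/tdv.databricks.jar", "********")
# run_commands(cmds, r"C:\Program Files\TIBCO\TDV Server <version>\bin")
```

Building the argument lists separately from running them makes it possible to print and review the exact calls before anything touches the server.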

Once you deploy the adapter, you can create a new data source in TDV Studio for Databricks.

Create a Databricks Data Source in TDV Studio

With the Databricks TIBCO DV Adapter, you can easily create a data source for Databricks and introspect the data source to add resources to TDV.

Create the Data Source

Right-click on the folder you wish to add the data source to and select New -> New Data Source
Scroll until you find the adapter (e.g. Databricks) and click Next
Name the data source (e.g. CData Databricks Source)
Fill in the required connection properties

To connect to a Databricks cluster, set the properties as described below.

Note: You can find the required values in your Databricks instance by navigating to Clusters, selecting the desired cluster, and opening the JDBC/ODBC tab under Advanced Options.

Server: Set to the Server Hostname of your Databricks cluster.
HTTPPath: Set to the HTTP Path of your Databricks cluster.
Token: Set to your personal access token (this value can be obtained by navigating to the User Settings page of your Databricks instance and selecting the Access Tokens tab).
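
Values copied from the Databricks UI often carry stray whitespace or a full workspace URL where only the hostname is expected. The following is a minimal Python sketch of a pre-check to run before pasting the values into TDV Studio. The helper names `normalize_connection_properties` and `masked` are hypothetical and not part of the adapter; the three property names are the ones listed above.

```python
from urllib.parse import urlparse

REQUIRED_PROPERTIES = ("Server", "HTTPPath", "Token")


def normalize_connection_properties(props):
    """Return a cleaned copy of the three required properties.

    Raises ValueError when a required property is missing or empty.
    """
    cleaned = {key: str(props.get(key) or "").strip()
               for key in REQUIRED_PROPERTIES}

    # Server expects the bare Server Hostname. If a full URL was pasted,
    # keep only its host part.
    server = cleaned["Server"]
    if "://" in server:
        server = urlparse(server).hostname or ""
    cleaned["Server"] = server.rstrip("/")

    missing = [key for key, value in cleaned.items() if not value]
    if missing:
        raise ValueError("Missing connection properties: " + ", ".join(missing))
    return cleaned


def masked(props):
    """Return a copy that is safe to print: only the last four
    characters of the Token stay visible."""
    out = dict(props)
    token = out.get("Token", "")
    out["Token"] = "*" * max(len(token) - 4, 0) + token[-4:]
    return out
```

Printing `masked(...)` instead of the raw dictionary keeps the personal access token out of logs and terminal history.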

Figure: Filling in Connection Information (Salesforce is shown.)

Click Create & Close.

Introspect the Data Source

Once the data source is created, you can introspect the data source by right-clicking and selecting Open. In the dashboard, click Add/Remove Resources and select the Tables, Views, and Stored Procedures to include as part of the data source. Click Next and Finish to add the selected Databricks tables, views, and stored procedures as resources.

Figure: Introspecting the Data Source (Salesforce is shown.)

After creating and introspecting the data source, you are ready to work with Databricks data in TIBCO Data Virtualization just like you would any other relational data source. You can create views, query using SQL, publish the data source, and more.
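
As an illustration of querying the introspected data with SQL, the sketch below runs a filter and an aggregation, the kind of operations the adapter can push down to Databricks. It is a hedged example, not the adapter's API: the `Orders` table and its `Country` and `Amount` columns are hypothetical, and an in-memory SQLite database stands in for a client connection to a published TDV resource so that the code runs anywhere. With a real DB-API connection (for example, one opened through an ODBC DSN for TDV, which is an assumption here), only the connection object would change.

```python
import sqlite3

# Filter (WHERE) and aggregation (GROUP BY) in one statement.
# Table and column names are hypothetical.
QUERY = """
SELECT Country, COUNT(*) AS OrderCount, SUM(Amount) AS Total
FROM Orders
WHERE Amount >= ?
GROUP BY Country
ORDER BY Total DESC
"""


def totals_by_country(conn, min_amount):
    """Run the aggregate query on any DB-API connection that uses
    '?' parameter markers and return the rows as tuples."""
    cur = conn.cursor()
    cur.execute(QUERY, (min_amount,))
    rows = [tuple(row) for row in cur.fetchall()]
    cur.close()
    return rows


def demo_connection():
    """Stand-in for a TDV connection: an in-memory SQLite database
    holding a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Orders (Country TEXT, Amount REAL)")
    conn.executemany(
        "INSERT INTO Orders VALUES (?, ?)",
        [("US", 120.0), ("US", 30.0), ("DE", 80.0), ("DE", 75.0), ("FR", 10.0)],
    )
    return conn
```

Calling `totals_by_country(demo_connection(), 50)` on the sample rows returns `[("DE", 2, 155.0), ("US", 1, 120.0)]`.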

URL: https://www.cdata.com/kb/tech/databricks-tdv-setup.rst