Query Spark Data in DataGrip

Jerod Johnson
Director, Technology Evangelism

Create a Data Source for Spark in DataGrip and use SQL to query live Spark data.

DataGrip is a database IDE that allows SQL developers to query, create, and manage databases. When paired with the CData JDBC Driver for Apache Spark, DataGrip can work with live Spark data. This article shows how to establish a connection to Spark data in DataGrip and query it from a Query Console.

Create a New Driver Definition for Spark

The steps below describe how to create a new Data Source in DataGrip for Spark.

In DataGrip, click File > New > Project and name the project.
In the Database Explorer, click the plus icon (+) and select Driver.
In the Driver tab:
- Set Name to a user-friendly name (e.g. "CData Spark Driver")
- Set Driver Files to the appropriate JAR file. To add the file, click the plus (+), select "Add Files," navigate to the "lib" folder in the driver's installation directory, and select the JAR file (e.g. cdata.jdbc.sparksql.jar).
- Set Class to cdata.jdbc.sparksql.SparkSQLDriver
Click "Apply" then "OK" to save the driver.

Configure a Connection to Spark

Once the driver is saved, click the plus (+), then "Data Source," then "CData Spark Driver" to create a new Spark Data Source.
In the new window, configure the connection to Spark with a JDBC URL.

Built-in Connection String Designer

For assistance in constructing the JDBC URL, use the connection string designer built into the Spark JDBC Driver. Either double-click the JAR file or run it from the command line.
```
java -jar cdata.jdbc.sparksql.jar
```
Fill in the connection properties and copy the connection string to the clipboard.

Set the Server, Database, User, and Password connection properties to connect to SparkSQL.
Set URL to the connection string, e.g.,
```
jdbc:sparksql:Server=127.0.0.1;
```
Click "Apply" and "OK" to save the connection string.

At this point, you will see the data source in the Database Explorer.
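
The JDBC URL used above is the prefix jdbc:sparksql: followed by property=value pairs, each terminated by a semicolon. As a rough illustration of that shape, the sketch below assembles a URL from a map of properties. The class name SparkJdbcUrl and its build method are hypothetical helpers written for this example, not part of the CData driver, and the sketch does not escape values that themselves contain semicolons.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SparkJdbcUrl {
    // Prefix taken from the example URL in this article.
    static final String PREFIX = "jdbc:sparksql:";

    // Appends each property as name=value followed by ';',
    // matching the shape of the example URL. Values are not escaped.
    static String build(Map<String, String> props) {
        StringBuilder url = new StringBuilder(PREFIX);
        for (Map.Entry<String, String> e : props.entrySet()) {
            url.append(e.getKey()).append('=').append(e.getValue()).append(';');
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the properties in insertion order.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("Server", "127.0.0.1");
        System.out.println(build(props)); // prints jdbc:sparksql:Server=127.0.0.1;
    }
}
```

Adding further entries such as Database, User, and Password to the map extends the URL in the same way. The connection string designer remains the safer route for values that need quoting.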

Execute SQL Queries Against Spark

To browse through the Spark entities (available as tables) accessible through the JDBC Driver, expand the Data Source.

To execute queries, right-click any table and select "New" > "Query Console."

In the Console, write the SQL query you wish to execute. For example:

SELECT City, Balance FROM Customers
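
The same query can also be run from plain Java through the standard java.sql API, which is a quick way to confirm that a JDBC URL works outside DataGrip. The sketch below is only an illustration: it assumes the driver JAR (cdata.jdbc.sparksql.jar) is on the classpath, reuses the example URL from this article, and assumes a Customers table with City and Balance columns exists on the server. The class name SparkQueryDemo is hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class SparkQueryDemo {
    // Example URL and query copied from this article; adjust for your server.
    static final String URL = "jdbc:sparksql:Server=127.0.0.1;";
    static final String QUERY = "SELECT City, Balance FROM Customers";

    public static void main(String[] args) {
        // try-with-resources closes the result set, statement, and connection.
        try (Connection conn = DriverManager.getConnection(URL);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(QUERY)) {
            while (rs.next()) {
                System.out.println(rs.getString("City") + "\t" + rs.getString("Balance"));
            }
        } catch (SQLException e) {
            // Reached when no matching driver is on the classpath,
            // the server is unreachable, or the query fails.
            System.err.println("Query failed: " + e.getMessage());
        }
    }
}
```

If the driver JAR is missing from the classpath, DriverManager reports that no suitable driver was found and the program prints that message instead of any rows.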

Download a free, 30-day trial of the CData JDBC Driver for Apache Spark and start working with your live Spark data in DataGrip. Reach out to our Support Team if you have any questions.


URL: https://www.cdata.com/kb/tech/spark-jdbc-datagrip.rst