The Apache Solr platform is a popular, blazing-fast, open source enterprise search solution built on Apache Lucene.
Apache Solr is equipped with the Data Import Handler (DIH), which can import data from databases and from XML, CSV, and JSON files. When paired with the CData JDBC Driver for Databricks, you can easily import Databricks data into Apache Solr. In this article, we show step by step how to use the CData JDBC Driver in the Apache Solr Data Import Handler to import Databricks data for use in enterprise search.
Accessing and integrating live data from Databricks has never been easier with CData.
While many customers use CData's solutions to migrate data from different systems into their Databricks data lakehouse, several customers use our live connectivity solutions to federate connectivity between their databases and Databricks. These customers use SQL Server Linked Servers or PolyBase to get live access to Databricks from within their existing RDBMS.
Read more about common Databricks use cases and how CData's solutions help solve data problems in our blog: What is Databricks Used For? 6 Use Cases.
> solr create -c CDataCore
For this article, Solr is running as a standalone instance in the local environment, and you can access the core at this URL: http://localhost:8983/solr/#/CDataCore/core-overview
Define a schema in Solr for the Databricks data, including a unique key (DatabricksUniqueKey) for the entity.
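As a rough sketch of what such a schema could look like, the fragment below declares one field for each column used in the import query in this article, plus a unique key. The field types (`string`, `pdate`), the `indexed`/`stored` settings, and the use of `Id` as the unique key are assumptions for illustration only; choose types that match your actual Databricks columns and use whichever column uniquely identifies a row.

```xml
<!-- Sketch only: types and the unique key below are assumptions, not taken from the article. -->
<!-- These elements belong inside the <schema> element of the core's schema file. -->
<field name="Id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="DatabricksColumn1" type="string" indexed="true" stored="true" />
<field name="DatabricksColumn2" type="string" indexed="true" stored="true" />
<field name="DatabricksColumn3" type="string" indexed="true" stored="true" />
<field name="DatabricksColumn4" type="string" indexed="true" stored="true" />
<field name="DatabricksColumn5" type="string" indexed="true" stored="true" />
<field name="DatabricksColumn6" type="string" indexed="true" stored="true" />
<field name="DatabricksColumn7" type="string" indexed="true" stored="true" />
<!-- A date field lets the delta query compare against the last index time. -->
<field name="LastModifiedDate" type="pdate" indexed="true" stored="true" />
<!-- A schema has a single uniqueKey; replace the existing one rather than adding a second. -->
<uniqueKey>Id</uniqueKey>
```

The field names need to match the `name` attributes used in the Data Import Handler entity so that imported columns land in the intended Solr fields.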
Now we are ready to use Databricks data in Solr.
In this section, we walk through configuring the Data Import Handler.
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">solr-data-config.xml</str>
  </lst>
</requestHandler>
<dataConfig>
  <!-- JDBC connection to Databricks through the CData driver -->
  <dataSource driver="cdata.jdbc.databricks.DatabricksDriver" url="jdbc:databricks:Server=127.0.0.1;Port=443;TransportMode=HTTP;HTTPPath=MyHTTPPath;UseSSL=True;User=MyUser;Password=MyPassword;">
  </dataSource>
  <document>
    <!-- query runs on a full import; deltaQuery finds the Ids changed since the last index time; deltaImportQuery fetches each changed row -->
    <entity name="Customers"
            query="SELECT Id,DatabricksColumn1,DatabricksColumn2,DatabricksColumn3,DatabricksColumn4,DatabricksColumn5,DatabricksColumn6,DatabricksColumn7,LastModifiedDate FROM Customers"
            deltaQuery="SELECT Id FROM Customers where LastModifiedDate >= '${dataimporter.last_index_time}'"
            deltaImportQuery="SELECT Id,DatabricksColumn1,DatabricksColumn2,DatabricksColumn3,DatabricksColumn4,DatabricksColumn5,DatabricksColumn6,DatabricksColumn7,LastModifiedDate FROM Customers where Id=${dataimporter.delta.Id}">
      <field column="Id" name="Id" ></field>
      <field column="DatabricksColumn1" name="DatabricksColumn1" ></field>
      <field column="DatabricksColumn2" name="DatabricksColumn2" ></field>
      <field column="DatabricksColumn3" name="DatabricksColumn3" ></field>
      <field column="DatabricksColumn4" name="DatabricksColumn4" ></field>
      <field column="DatabricksColumn5" name="DatabricksColumn5" ></field>
      <field column="DatabricksColumn6" name="DatabricksColumn6" ></field>
      <field column="DatabricksColumn7" name="DatabricksColumn7" ></field>
      <field column="LastModifiedDate" name="LastModifiedDate" ></field>
    </entity>
  </document>
</dataConfig>
> solr stop -all
> solr start
Using the CData JDBC Driver for Databricks, you can create an automated import of Databricks data into Apache Solr. Download a free, 30-day trial of any of the hundreds of CData JDBC Drivers and get started today.