CData Sync for Azure Data Lake Storage is a stand-alone application that provides solutions for a variety of replication scenarios such as replicating sandbox and production instances into your database. Both Sync for Windows and Sync for Java include a command-line interface (CLI) that makes it easy to manage multiple Azure Data Lake Storage connections. In this article we show how to use the CLI to replicate multiple Azure Data Lake Storage accounts.
You can save connection and email notification settings in an XML configuration file. To replicate multiple Azure Data Lake Storage accounts, use one configuration file per account. Below are example configurations for Sync for Windows and Sync for Java that replicate Azure Data Lake Storage to SQLite:
Windows:
<?xml version="1.0" encoding="UTF-8" ?>
<CDataSync>
  <DatabaseType>SQLite</DatabaseType>
  <DatabaseProvider>System.Data.SQLite</DatabaseProvider>
  <ConnectionString>Schema=ADLSGen2;Account=myAccount;FileSystem=myFileSystem;AccessKey=myAccessKey;InitiateOAuth=GETANDREFRESH;</ConnectionString>
  <ReplicateAll>False</ReplicateAll>
  <NotificationUserName></NotificationUserName>
  <DatabaseConnectionString>Data Source=C:\my.db</DatabaseConnectionString>
  <TaskSchedulerStartTime>09:51</TaskSchedulerStartTime>
  <TaskSchedulerInterval>Never</TaskSchedulerInterval>
</CDataSync>
Java:
<?xml version="1.0" encoding="UTF-8" ?>
<CDataSync>
  <DatabaseType>SQLite</DatabaseType>
  <DatabaseProvider>org.sqlite.JDBC</DatabaseProvider>
  <ConnectionString>Schema=ADLSGen2;Account=myAccount;FileSystem=myFileSystem;AccessKey=myAccessKey;InitiateOAuth=GETANDREFRESH;</ConnectionString>
  <ReplicateAll>False</ReplicateAll>
  <NotificationUserName></NotificationUserName>
  <DatabaseConnectionString>Data Source=C:\my.db</DatabaseConnectionString>
</CDataSync>
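Because each account needs its own configuration file, the files can be generated instead of copied by hand. The following is a minimal Python sketch that assumes the Windows sample above is a suitable template: `write_sync_config` is a hypothetical helper, and its element names and fixed values are copied from that sample, not from a complete description of the Sync configuration format.

```python
import xml.etree.ElementTree as ET


def write_sync_config(path, connection_string, db_connection_string):
    """Write one CData Sync configuration file (hypothetical helper).

    Tags and fixed values mirror the Windows sample; only the Azure Data Lake
    Storage connection string and the destination database connection string
    vary per account.
    """
    root = ET.Element("CDataSync")
    fields = [
        ("DatabaseType", "SQLite"),
        ("DatabaseProvider", "System.Data.SQLite"),
        ("ConnectionString", connection_string),
        ("ReplicateAll", "False"),
        ("NotificationUserName", ""),
        ("DatabaseConnectionString", db_connection_string),
        ("TaskSchedulerStartTime", "09:51"),
        ("TaskSchedulerInterval", "Never"),
    ]
    for tag, text in fields:
        ET.SubElement(root, tag).text = text
    # Emits an XML declaration with UTF-8 encoding, like the sample file.
    ET.ElementTree(root).write(path, encoding="UTF-8", xml_declaration=True)
```

Calling it once per account with a different connection string and file name (for example MyProductionADLSConfig.xml and a hypothetical MySandboxADLSConfig.xml), while keeping `Data Source=C:\my.db`, gives one configuration per account that all write to the same SQLite database.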
Gen 1 uses OAuth 2.0 in Microsoft Entra ID (formerly Azure AD) for authentication, so it requires an Entra ID (Active Directory) web application. You can create one as follows:
To authenticate against a Gen 1 DataLakeStore account, the following properties are required:
To authenticate against a Gen 2 DataLakeStore account, the following properties are required:
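The sample connection string above sets Schema=ADLSGen2 together with the Account, FileSystem, and AccessKey properties. A minimal Python sketch that assembles those four properties for each account is shown here. `build_gen2_connection_string` is a hypothetical helper; it does not attempt any quoting, so it rejects values that contain a semicolon, and it omits the InitiateOAuth option from the sample, which you can append if your setup needs it.

```python
def build_gen2_connection_string(account, file_system, access_key):
    """Assemble a Gen 2 connection string (hypothetical helper).

    Property names are taken from the sample configuration.
    """
    props = {
        "Schema": "ADLSGen2",
        "Account": account,
        "FileSystem": file_system,
        "AccessKey": access_key,
    }
    for key, value in props.items():
        # Key=Value; pairs are not quoted here, so a ';' would split a value.
        # '=' is allowed because Azure access keys often end in '=='.
        if ";" in value:
            raise ValueError(f"{key} value must not contain ';'")
    return "".join(f"{key}={value};" for key, value in props.items())
```

Keeping the per-account values in one place and generating the string avoids copy-and-paste mistakes when several configuration files differ only in the account details.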
Sync enables you to control replication with standard SQL. The REPLICATE statement is a high-level command that caches and maintains a table in your database. You can define any SELECT query supported by the Azure Data Lake Storage API. The statement below caches and incrementally updates a table of Azure Data Lake Storage data:
REPLICATE Resources;
You can specify a file containing the replication queries you want to use to update a particular database. Separate replication statements with semicolons. The following options are useful if you are replicating multiple Azure Data Lake Storage accounts into the same database:
You can use a different table prefix in the REPLICATE SELECT statement:
REPLICATE PROD_Resources SELECT * FROM Resources;
Alternatively, you can use a different schema:
REPLICATE PROD.Resources SELECT * FROM Resources;
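When several accounts share one database, each account's replication file can be generated from a list of table names. A minimal Python sketch follows, assuming the two statement forms shown above. `replication_script` is a hypothetical helper, and `Resources` is the only table name taken from this article.

```python
def replication_script(tables, prefix="", schema=""):
    """Return REPLICATE statements for one account (hypothetical helper).

    Pass prefix (e.g. "PROD_") for the table-prefix option, or schema
    (e.g. "PROD") for the schema option. Each statement ends with ';'.
    """
    statements = []
    for table in tables:
        target = f"{prefix}{table}"
        if schema:
            target = f"{schema}.{target}"
        statements.append(f"REPLICATE {target} SELECT * FROM {table};")
    return "\n".join(statements) + "\n"
```

Writing `replication_script(["Resources"], prefix="PROD_")` to MyProductionADLSSync.sql yields `REPLICATE PROD_Resources SELECT * FROM Resources;`, so production tables cannot collide with tables replicated from another account.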
After you have configured the connection strings and replication queries, you can run Sync with the following command-line options:
Windows:
ADLSSync.exe -g MyProductionADLSConfig.xml -f MyProductionADLSSync.sql
Java:
java -Xbootclasspath/p:c:\sqlitejdbc.jar -jar ADLSSync.jar -g MyProductionADLSConfig.xml -f MyProductionADLSSync.sql
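To replicate several accounts in one pass, run the same command once per configuration and query file pair. A minimal Python sketch follows, assuming the executable and JAR names from the commands above resolve from the current directory. `sync_command` and `run_all` are hypothetical helpers; only the `-g` and `-f` options come from this article.

```python
import subprocess


def sync_command(config_file, sql_file, java=False):
    """Return the argv for one Sync run, mirroring the commands above (hypothetical helper)."""
    if java:
        # Copied from the Java example. Note that -Xbootclasspath/p is
        # accepted only by Java 8 and earlier JVMs.
        return ["java", r"-Xbootclasspath/p:c:\sqlitejdbc.jar", "-jar", "ADLSSync.jar",
                "-g", config_file, "-f", sql_file]
    return ["ADLSSync.exe", "-g", config_file, "-f", sql_file]


def run_all(jobs, java=False):
    """Run Sync once per (config, sql) pair; check=True stops at the first non-zero exit."""
    for config_file, sql_file in jobs:
        subprocess.run(sync_command(config_file, sql_file, java), check=True)
```

For example, `run_all([("MyProductionADLSConfig.xml", "MyProductionADLSSync.sql"), ("MySandboxADLSConfig.xml", "MySandboxADLSSync.sql")])` replicates production and then a placeholder sandbox account. Passing an argument list instead of a shell string avoids quoting problems with Windows paths.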
Learn more or sign up for a free trial:
CData Sync