URL: https://www.cdata.com/kb/tech/azuredatalake-cloud-google-adk.rst

How to Connect to Live Azure Data Lake Storage Data from Google ADK Agents (via CData Connect AI)

๐Ÿ‘ Jerod Johnson
Jerod Johnson
Director, Technology Evangelism
Use the CData Connect AI Remote MCP Server to let Google ADK agents securely read and act on your Azure Data Lake Storage data.

Google ADK (Agent Development Kit) is a model-agnostic framework for building AI agents that interact with various data sources and services. Combined with CData Connect AI Remote MCP, Google ADK lets you build agents that interact with your Azure Data Lake Storage data in real time through natural language queries. This article outlines the process of connecting to Azure Data Lake Storage using Connect AI Remote MCP and configuring a Google ADK agent to interact with your Azure Data Lake Storage data through ADK Web.

CData Connect AI offers a dedicated cloud-to-cloud interface for connecting to Azure Data Lake Storage data. The CData Connect AI Remote MCP Server enables secure communication between Google ADK agents and Azure Data Lake Storage, so your agents can read from and take actions on your data without replicating it to a natively supported database. Connect AI pushes all supported SQL operations, including filters and JOINs, directly to Azure Data Lake Storage, using server-side processing to return the requested data quickly.

In this article, we show how to configure a Google ADK agent to conversationally explore (or Vibe Query) your data using natural language. With Connect AI you can build agents with access to live Azure Data Lake Storage data, plus hundreds of other sources.

Step 1: Configure Azure Data Lake Storage Connectivity for Google ADK

Connectivity to Azure Data Lake Storage from Google ADK agents is made possible through CData Connect AI Remote MCP. To interact with Azure Data Lake Storage data from your ADK agent, we start by creating and configuring an Azure Data Lake Storage connection in CData Connect AI.

  1. Log in to Connect AI, click Sources, and then click Add Connection.
  2. Select "Azure Data Lake Storage" from the Add Connection panel.
  3. Enter the necessary authentication properties to connect to Azure Data Lake Storage.

    Authenticating to a Gen 1 DataLakeStore Account

    Gen 1 uses OAuth 2.0 in Entra ID (formerly Azure AD) for authentication.

    For this, an Active Directory web application is required. You can create one as follows:

    1. Sign in to your Azure account.
    2. Select "Entra ID" (formerly Azure AD).
    3. Select "App registrations".
    4. Select "New application registration".
    5. Provide a name and URL for the application. Select Web app for the type of application you want to create.
    6. Select "Required permissions" and change the required permissions for this app. At a minimum, "Azure Data Lake" and "Windows Azure Service Management API" are required.
    7. Select "Key" and generate a new key. Add a description and a duration, and take note of the generated key. You won't be able to see it again.

    To authenticate against a Gen 1 DataLakeStore account, the following properties are required:

    • Schema: Set this to ADLSGen1.
    • Account: Set this to the name of the account.
    • OAuthClientId: Set this to the application Id of the app you created.
    • OAuthClientSecret: Set this to the key generated for the app you created.
    • TenantId: Set this to the tenant Id. See the TenantId property for more information on how to acquire this.
    • Directory: Set this to the path which will be used to store the replicated file. If not specified, the root directory will be used.

    Authenticating to a Gen 2 DataLakeStore Account

    To authenticate against a Gen 2 DataLakeStore account, the following properties are required:

    • Schema: Set this to ADLSGen2.
    • Account: Set this to the name of the account.
    • FileSystem: Set this to the file system which will be used for this account.
    • AccessKey: Set this to the access key which will be used to authenticate the calls to the API. See the AccessKey property for more information on how to acquire this.
    • Directory: Set this to the path which will be used to store the replicated file. If not specified, the root directory will be used.
  4. Click Save & Test.
  5. Navigate to the Permissions tab on the Add Azure Data Lake Storage Connection page and update the User-based permissions.
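
The required properties for the two account generations listed in the connection step above can be summarized as a small checklist. The sketch below is a hypothetical helper (the names REQUIRED_PROPERTIES and missing_properties are illustrative and not part of any CData API); it only checks that the values you plan to enter in the connection form cover the required fields. Directory is left out because the root directory is used when it is not specified.

```python
# Hypothetical checklist helper; not part of any CData API.
REQUIRED_PROPERTIES = {
    "ADLSGen1": ["Account", "OAuthClientId", "OAuthClientSecret", "TenantId"],
    "ADLSGen2": ["Account", "FileSystem", "AccessKey"],
}


def missing_properties(schema, values):
    """Return the required property names that are absent or empty in values."""
    if schema not in REQUIRED_PROPERTIES:
        raise ValueError(f"Unknown Schema value: {schema!r}")
    return [name for name in REQUIRED_PROPERTIES[schema] if not values.get(name)]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    gen2_values = {"Account": "myaccount", "FileSystem": "myfilesystem"}
    print(missing_properties("ADLSGen2", gen2_values))  # prints ['AccessKey']
```

An empty list means every required field for the chosen Schema value has been supplied.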

Add a Personal Access Token

A Personal Access Token (PAT) is used to authenticate the connection to Connect AI from your Google ADK agent. It is best practice to create a separate PAT for each service to maintain granularity of access.

  1. Click the Gear icon at the top right of the Connect AI app to open the settings page.
  2. On the Settings page, go to the Access Tokens section and click Create PAT.
  3. Give the PAT a name and click Create.
  4. The personal access token is only visible at creation, so be sure to copy it and store it securely for future use.

With the connection configured and a PAT generated, we are ready to connect to Azure Data Lake Storage data from your Google ADK agent.

Step 2: Configure Your Google ADK Agent for CData Connect AI

Follow these steps to configure your Google ADK agent to connect to CData Connect AI. You can use our pre-built agent as a starting point, available at https://github.com/CDataSoftware/adk-mcp-client, or follow the instructions below to create your own.

  1. Ensure you have the Google ADK Python SDK installed. If not, install it using pip:
    pip install google-adk python-dotenv
  2. Create or update your agent's configuration file (typically agent.py) to include the CData Connect AI MCP connection. You'll need to configure the MCP toolset with your Connect AI credentials.
  3. Set up your environment variables or configuration for the MCP server connection. Create a .env file in your project root with the following variables:
    MCP_SERVER_URL=https://mcp.cloud.cdata.com/mcp
    MCP_USERNAME=YOUR_EMAIL
    MCP_PASSWORD=YOUR_PAT
     
    Replace YOUR_EMAIL with your Connect AI email address and YOUR_PAT with the Personal Access Token created in Step 1.
  4. Configure your agent.py file to use the CData Connect AI MCP Server. Here's an example configuration:
    import os
    import base64
    from google.adk.agents import LlmAgent
    from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset
    from google.adk.tools.mcp_tool.mcp_session_manager import StreamableHTTPConnectionParams
    from dotenv import load_dotenv

    # Load environment variables
    load_dotenv()

    # Get configuration from environment
    MCP_SERVER_URL = os.getenv('MCP_SERVER_URL', 'https://mcp.cloud.cdata.com/mcp')
    MCP_USERNAME = os.getenv('MCP_USERNAME', '')
    MCP_PASSWORD = os.getenv('MCP_PASSWORD', '')

    # Create auth header for MCP server (HTTP Basic: email as username, PAT as password)
    auth_header = {}
    if MCP_USERNAME and MCP_PASSWORD:
        credentials = f"{MCP_USERNAME}:{MCP_PASSWORD}"
        auth_header = {"Authorization": f"Basic {base64.b64encode(credentials.encode()).decode()}"}

    # Define your agent with CData MCP tools
    root_agent = LlmAgent(
        model='gemini-2.0-flash-exp',  # You can use any supported model
        name='data_query_assistant',
        instruction="""You are a data query assistant with access to Azure Data Lake Storage data through CData Connect AI.

        You can help users explore and query their Azure Data Lake Storage data in real-time.
        Use the available MCP tools to:
        - List available databases and schemas
        - Explore table structures
        - Execute SQL queries
        - Provide insights about the data

        Always explain what you're doing and format results clearly.""",

        tools=[
            MCPToolset(
                connection_params=StreamableHTTPConnectionParams(
                    url=MCP_SERVER_URL,
                    headers=auth_header
                )
            )
        ],
    )
     
  5. Run your agent with ADK Web. From your project directory, execute:
    adk web --port 5000 .

    Note: If you installed ADK with pip install --user, the adk command may not be in your PATH. You can either:

    • Use the full path: ~/Library/Python/3.x/bin/adk (on macOS)
    • Add to PATH: export PATH="$HOME/Library/Python/3.x/bin:$PATH"
    • Use a virtual environment where the PATH is automatically configured
  6. Open the ADK Web interface in your browser (typically http://localhost:5000).
  7. Select your agent from the dropdown menu (it will be named based on the name parameter in your agent configuration).
  8. Start interacting with your Azure Data Lake Storage data through natural language queries. Your agent now has access to your Azure Data Lake Storage data through the CData Connect AI MCP Server.
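
Before launching ADK Web, it can help to confirm that the credentials from the .env file are actually set, because the example agent.py falls back to an empty auth header when the username or password is missing. The following is a minimal sketch using only the standard library. The name check_mcp_env is a hypothetical helper, the https check is an assumption based on the URL shown in this article, and the function takes the environment as a mapping so it can be tried without real credentials.

```python
import os

# Variables the example agent.py needs in order to build its auth header.
REQUIRED_VARS = ("MCP_USERNAME", "MCP_PASSWORD")


def check_mcp_env(env):
    """Return a list of problems found in the given environment mapping."""
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    # MCP_SERVER_URL is optional because agent.py supplies a default value.
    url = env.get("MCP_SERVER_URL", "")
    if url and not url.startswith("https://"):
        problems.append("MCP_SERVER_URL should use https://")
    if env.get("MCP_PASSWORD") == "YOUR_PAT":
        problems.append("MCP_PASSWORD still contains the YOUR_PAT placeholder")
    return problems


if __name__ == "__main__":
    # os.environ only reflects the .env file once it has been loaded,
    # for example by calling load_dotenv() first.
    for problem in check_mcp_env(os.environ):
        print("Warning:", problem)
```

Running this in the same shell you use for adk web prints one warning per problem and nothing when the configuration looks complete.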

Step 3: Build Intelligent Agents with Live Azure Data Lake Storage Data Access

With your Google ADK agent configured and connected to CData Connect AI, you can now build sophisticated agents that interact with your Azure Data Lake Storage data using natural language. The MCP integration provides your agents with powerful data access capabilities.

Available MCP Tools for Your Agent

Your Google ADK agent has access to the following CData Connect AI MCP tools:

  • queryData: Execute SQL queries against connected data sources and retrieve results
  • getCatalogs: Retrieve a list of available connections from CData Connect AI
  • getSchemas: Retrieve database schemas for a specific catalog
  • getTables: Retrieve database tables for a specific catalog and schema
  • getColumns: Retrieve column metadata for a specific table
  • getProcedures: Retrieve stored procedures for a specific catalog and schema
  • getProcedureParameters: Retrieve parameter metadata for stored procedures
  • executeProcedure: Execute stored procedures with parameters
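
MCPToolset discovers and invokes these tools for you, so none of the following is required. As a rough illustration of what travels over the wire, the sketch below builds the JSON-RPC 2.0 messages that the Model Context Protocol defines for listing tools (tools/list) and calling one (tools/call). The helper names are hypothetical, and the "query" argument name for queryData is an assumption; the tools/list response from the server is the authoritative source for each tool's input schema. A real session also performs an initialize handshake before these requests.

```python
import json
from itertools import count

_ids = count(1)  # simple generator of JSON-RPC request ids


def tools_list_request():
    """Build an MCP tools/list request, which asks the server to describe its tools."""
    return {"jsonrpc": "2.0", "id": next(_ids), "method": "tools/list"}


def tools_call_request(tool_name, arguments):
    """Build an MCP tools/call request for a single tool invocation."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


if __name__ == "__main__":
    print(json.dumps(tools_list_request(), indent=2))
    # "query" is an assumed argument name, used here for illustration only.
    print(json.dumps(tools_call_request("queryData", {"query": "SELECT 1"}), indent=2))
```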

Example Use Cases

Here are some examples of what your Google ADK agents can do with live Azure Data Lake Storage data access:

  • Data Analysis Agent: Build an agent that analyzes trends, patterns, and anomalies in your Azure Data Lake Storage data
  • Report Generation Agent: Create agents that generate custom reports based on natural language requests
  • Data Quality Agent: Develop agents that monitor and validate data quality in real-time
  • Business Intelligence Agent: Build agents that answer complex business questions by querying multiple data sources
  • Automated Workflow Agent: Create agents that trigger actions based on data conditions in Azure Data Lake Storage

Testing Your Agent

Once deployed to ADK Web, you can interact with your agent through natural language queries. For example:

  • "Show me all customers from the last 30 days"
  • "What are the top performing products this quarter?"
  • "Analyze sales trends and identify anomalies"
  • "Generate a summary report of active projects"
  • "Find all records that match specific criteria"

Your Google ADK agent translates these natural language requests into SQL queries and executes them against your Azure Data Lake Storage data through the CData Connect AI MCP Server. Users get real-time insights without writing complex SQL or understanding the underlying data structure.

Get CData Connect AI

To get live data access to hundreds of SaaS, Big Data, and NoSQL sources directly from your Google ADK agents and cloud applications, try CData Connect AI today!