URL: https://dzone.com/articles/grounding-gemini-google-search-data-sources

Grounding Gemini With Google Search and Other Data Sources



Take advantage of Google Gemini's 1M-token context window to send your data as context. You can also combine this approach with the Grounding with Google Search feature.

By Mar. 20, 25 · Tutorial

When your generative AI application only needs a few data sources (e.g., PDFs or JSON), building a retrieval-augmented generation (RAG) pipeline might not be worth the time and effort. Instead, you can send the data directly to the model as context.

In this article, I'll show how to pull data from three sources and pass it to Google Gemini as context. I'll also show how to ground the results with Google Search. This lets end users combine real-time information from Google Search with internal data sources.
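Whether you can skip RAG mostly depends on whether your data fits in the model's context window. The sketch below is a rough check you can run before sending anything. It assumes about 4 characters per token, a common rule of thumb for English text, and a 1,000,000-token window. The helper names and sample data are hypothetical. For an exact count, the google-genai SDK provides `client.models.count_tokens`.

```python
import math

# Assumption: roughly 4 characters per token for English text.
# This is only an estimate; real tokenization varies with the content.
CHARS_PER_TOKEN = 4.0


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Approximate the number of tokens in `text` from its length."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(sources: dict, context_window: int = 1_000_000,
                    reserve_for_output: int = 8_192) -> bool:
    """Return True if all source texts plausibly fit, leaving room for the reply."""
    total = sum(estimate_tokens(text) for text in sources.values())
    return total + reserve_for_output <= context_window


# Hypothetical sample data standing in for the three sources
sources = {
    "events": "- Sample Concert at Sample Arena on 2025-03-22\n" * 50,
    "weather": '{"location": "Philadelphia, PA", "daily_forecasts": []}',
    "crashes": '[{"crash_count": 10}]',
}
print(fits_in_context(sources))  # True: a few thousand characters is far below the limit
```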

Application Overview

I'll only cover the code needed to get the data and call Gemini rather than building the entire application. Please note that this code is for demonstration purposes only. If you plan to use it, follow best practices such as storing API keys in a key management service and adding proper error handling.

This application answers questions about events happening in Philadelphia. (I'm only using Philadelphia as an example because I found some good public data.) Three data sources provide context to Gemini: a Looker report with a few columns about car crashes in Philadelphia in 2023, Ticketmaster events for the following week, and the weather forecast for the following week.

Parts of the code below were generated using Gemini 1.5 Pro and Anthropic's Claude 3.5 Sonnet.

Data Sources

The three functions that call the data APIs live in a file called api_handlers.py. app.py imports these functions and sends the data to Gemini. Let's look at each source in more detail.

Application files


Looker

Looker is Google's enterprise business intelligence (BI) platform. It is API-first: almost anything you can do in the UI can also be done with the Looker SDK. In this example, I run a Looker report and parse the results as JSON. Here's a screenshot of the report in Looker.

Looker report


Here's the code to get data from the report using the Looker SDK.

Python
def get_crash_data():
    """Run a saved Look and return its rows as a list of dicts."""
    import json

    import looker_sdk

    look_id = "Enter Look ID"  # the ID from the Look's URL

    try:
        # looker.ini holds the Looker base URL and API credentials
        sdk = looker_sdk.init40("looker.ini")

        # result_format="json" returns the rows as a JSON string
        response = sdk.run_look(look_id=look_id, result_format="json")
        print("looker done")
        return json.loads(response)

    except Exception as e:
        print(f"Error getting Looker data: {e}")
        return []


This code imports looker_sdk, which lets you work with Looker reports, dashboards, and semantic models through the API. looker.ini is a configuration file that stores the Looker instance URL, client ID, and client secret.
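A looker.ini file is an INI file. Its [Looker] section holds settings such as base_url, client_id, and client_secret. The sketch below uses only Python's standard configparser module to check that those settings are present before you call `looker_sdk.init40`. The sample values are placeholders. The section and key names reflect my understanding of the SDK's defaults, so verify them against the Looker SDK documentation for your version.

```python
import configparser

# Placeholder contents for illustration; substitute your own instance and credentials.
SAMPLE_INI = """
[Looker]
base_url=https://example.cloud.looker.com
client_id=YOUR_CLIENT_ID
client_secret=YOUR_CLIENT_SECRET
verify_ssl=True
"""

REQUIRED_KEYS = ("base_url", "client_id", "client_secret")


def missing_looker_settings(ini_text: str, section: str = "Looker") -> list:
    """Return the required settings that are missing or empty in the INI text."""
    # interpolation=None so a "%" inside a secret is read literally
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section(section):
        return list(REQUIRED_KEYS)
    return [key for key in REQUIRED_KEYS
            if not parser.get(section, key, fallback="").strip()]


print(missing_looker_settings(SAMPLE_INI))  # []
```

To check your real file, pass in its contents, for example `missing_looker_settings(Path("looker.ini").read_text())` with `pathlib.Path`.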

Looker's documentation explains how to get API credentials. The look_id comes from the Look's URL (a Look in Looker is a report with a single visualization). The run_look call then runs the report and returns the results as a JSON string. json.loads parses that string into a Python list, which the function returns.
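Since the look_id comes from the Look's URL, a small helper can extract it so you don't have to copy it by hand. This sketch assumes Look URLs follow the pattern `https://<your-instance>/looks/<id>`. The function name and example host are hypothetical.

```python
from urllib.parse import urlparse


def look_id_from_url(look_url: str) -> str:
    """Return the ID that follows /looks/ in a Look URL."""
    # Drop empty segments so trailing slashes don't matter.
    segments = [s for s in urlparse(look_url).path.split("/") if s]
    if "looks" in segments:
        position = segments.index("looks")
        if position + 1 < len(segments):
            return segments[position + 1]
    raise ValueError(f"No Look ID found in {look_url!r}")


print(look_id_from_url("https://example.looker.com/looks/42"))  # 42
```

The returned string can be passed straight to `sdk.run_look(look_id=...)`.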

Ticketmaster

Here's the API call to get events coming from Ticketmaster.

Python
def get_philly_events():
    """Return Ticketmaster events in Philadelphia for the next 7 days."""
    from datetime import datetime, timedelta, timezone

    import requests

    base_url = "https://app.ticketmaster.com/discovery/v2/events"

    # The trailing "Z" in the timestamps means UTC, so build them from UTC time
    start_date = datetime.now(timezone.utc)
    end_date = start_date + timedelta(days=7)

    params = {
        "apikey": "enter",
        "city": "Philadelphia",
        "stateCode": "PA",
        "startDateTime": start_date.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "endDateTime": end_date.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "size": 50,
        "sort": "date,asc",
    }

    try:
        response = requests.get(base_url, params=params, timeout=10)
        if response.status_code != 200:
            return []

        data = response.json()
        events = []

        # Events are under _embedded.events; each event's venues are under _embedded.venues
        for event in data.get("_embedded", {}).get("events", []):
            venue = event["_embedded"]["venues"][0]
            event_info = {
                "name": event["name"],
                "date": event["dates"]["start"].get("dateTime", "TBA"),
                "venue": venue["name"],
                "street": venue.get("address", {}).get("line1", ""),
            }
            events.append(event_info)

        return events

    except Exception as e:
        print(f"Error getting events data: {e}")
        return []


I'm using the Ticketmaster Discovery API to get the name, date, venue, and street address of events over the next 7 days. Since this is an HTTP GET request, the requests library handles the call. If the request succeeds, the parsed JSON response is stored in the data variable. The code then loops through the events, builds a dictionary called event_info for each one, and appends it to the events list.
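The loop in get_philly_events indexes `event["_embedded"]["venues"][0]` directly. If any single event lacks venue data, that raises an exception, and the except branch then returns an empty list for the whole request. The sketch below is a more defensive variant that separates parsing from the HTTP call, so you can test it against a sample response. The function name is hypothetical, and the sample dictionary is hand-written to mimic the fields the article reads; it is not real API output.

```python
def parse_events(data: dict) -> list:
    """Flatten a Discovery API event-search response into name/date/venue/street dicts."""
    events = []
    for event in data.get("_embedded", {}).get("events", []):
        # Fall back to an empty venue if the event has no venue data
        venues = event.get("_embedded", {}).get("venues") or [{}]
        venue = venues[0]
        events.append({
            "name": event.get("name", ""),
            "date": event.get("dates", {}).get("start", {}).get("dateTime", "TBA"),
            "venue": venue.get("name", ""),
            "street": venue.get("address", {}).get("line1", ""),
        })
    return events


# Hand-written sample shaped like the fields read above (not real API output)
sample = {
    "_embedded": {
        "events": [
            {
                "name": "Sample Concert",
                "dates": {"start": {"dateTime": "2025-03-22T00:00:00Z"}},
                "_embedded": {"venues": [{"name": "Sample Arena",
                                          "address": {"line1": "123 Example St"}}]},
            },
            {"name": "Date TBA Show", "dates": {"start": {}}},
        ]
    }
}
for event in parse_events(sample):
    print(event)
```

Inside get_philly_events, the loop could then be replaced with `return parse_events(response.json())`.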

Weather

The final data source is the weather forecast, which comes from the National Weather Service API run by NOAA. It's also free to use.

Python
def get_philly_weather_forecast():
    """Return a 7-day Philadelphia forecast from the NOAA API as a JSON string."""
    import json
    from datetime import datetime

    import requests

    lat = "39.9526"
    lon = "-75.1652"
    url = f"https://api.weather.gov/points/{lat},{lon}"
    # The API requires a User-Agent header on every request
    headers = {'User-Agent': 'weatherapp/1.0'}

    try:
        # Step 1: look up the forecast URL for these coordinates
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()

        grid_data = response.json()
        forecast_url = grid_data['properties']['forecast']

        # Step 2: get the forecast data (with the same User-Agent header)
        forecast_response = requests.get(forecast_url, headers=headers, timeout=10)
        forecast_response.raise_for_status()
        forecast_data = forecast_response.json()

        weather_data = {
            "location": "Philadelphia, PA",
            "forecast_generated": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
            "data_source": "NOAA Weather API",
            "daily_forecasts": []
        }

        # Take 14 periods (about 7 days × 2 periods per day)
        periods = forecast_data['properties']['periods'][:14]

        # Group periods into days
        current_date = None
        daily_data = None

        for period in periods:
            period_date = period['startTime'][:10]  # Get just the date part of the period
            is_daytime = period['isDaytime']

            # If we're starting a new day
            if period_date != current_date:
                # Save the previous day's data if it exists
                if daily_data is not None:
                    weather_data["daily_forecasts"].append(daily_data)

                # Start a new daily record
                current_date = period_date
                daily_data = {
                    "date": period_date,
                    "forecast": {
                        "day": None,
                        "night": None,
                        "high_temperature": None,
                        "low_temperature": None,
                        "conditions": None,
                        "detailed_forecast": None
                    }
                }

            # Build the record for this period
            period_data = {
                "temperature": {
                    "value": period['temperature'],
                    "unit": period['temperatureUnit']
                },
                "conditions": period['shortForecast'],
                "wind": {
                    "speed": period['windSpeed'],
                    "direction": period['windDirection']
                },
                "detailed_forecast": period['detailedForecast']
            }

            # Update the daily data based on whether it's day or night
            if is_daytime:
                daily_data["forecast"]["day"] = period_data
                daily_data["forecast"]["high_temperature"] = period_data["temperature"]
                daily_data["forecast"]["conditions"] = period_data["conditions"]
                daily_data["forecast"]["detailed_forecast"] = period_data["detailed_forecast"]
            else:
                daily_data["forecast"]["night"] = period_data
                daily_data["forecast"]["low_temperature"] = period_data["temperature"]

        # Append the last day's data
        if daily_data is not None:
            weather_data["daily_forecasts"].append(daily_data)

        # Keep only 7 days of forecast
        weather_data["daily_forecasts"] = weather_data["daily_forecasts"][:7]

        return json.dumps(weather_data, indent=2)

    except Exception as e:
        print(f"Error with NOAA API: {e}")
        return json.dumps({
            "error": str(e),
            "location": "Philadelphia, PA",
            "forecast_generated": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
            "daily_forecasts": []
        }, indent=2)


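The API doesn't require a key, but it does require a User-Agent header that identifies your application. The first request sends Philadelphia's latitude and longitude to the /points endpoint, which returns metadata for that location, including the URL of its forecast. The second request fetches that forecast and stores the JSON in forecast_data.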
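You can also write the two-step lookup with only the standard library, which shows the flow in isolation. This sketch uses `urllib.request` instead of requests. The helper names are hypothetical, and the User-Agent string is the same placeholder the article uses. `fetch_forecast` makes live network calls, so the example only exercises the pure helpers.

```python
import json
import urllib.request

USER_AGENT = "weatherapp/1.0"  # placeholder; identify your own application


def points_url(lat: float, lon: float) -> str:
    """Build the /points URL for a coordinate pair (4 decimal places)."""
    return f"https://api.weather.gov/points/{lat:.4f},{lon:.4f}"


def forecast_url_from_points(points_json: dict) -> str:
    """Extract the forecast URL from a /points response."""
    return points_json["properties"]["forecast"]


def get_json(url: str) -> dict:
    """GET a URL with the User-Agent header and decode the JSON body."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def fetch_forecast(lat: float, lon: float) -> dict:
    """Step 1: resolve the location's forecast URL. Step 2: fetch that forecast."""
    points = get_json(points_url(lat, lon))
    return get_json(forecast_url_from_points(points))


# Offline demo of the pure helpers; the /points response is a trimmed, hypothetical example.
sample_points = {"properties": {"forecast": "https://api.weather.gov/gridpoints/XXX/1,1/forecast"}}
print(points_url(39.9526, -75.1652))  # https://api.weather.gov/points/39.9526,-75.1652
print(forecast_url_from_points(sample_points))
```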

The forecast is split into two periods per day: day and night. The code takes the first 14 periods (7 days × 2 periods), groups them by date, and keeps at most 7 days. For each period, it records the temperature, conditions, wind speed and direction, and detailed forecast. It uses the daytime and nighttime temperatures as each day's high and low.
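The day/night grouping is the least obvious part of get_philly_weather_forecast, so here it is reduced to a small pure function. Periods are grouped by the date in their startTime. As a result, a forecast that begins with a night period (for example, one fetched in the evening) produces a first day with only a low temperature. In that case, 14 periods span 8 dates, which is why the result is trimmed to 7 days. The function name and the simplified record layout are hypothetical. The input field names match those the article reads, and the sample periods are made up.

```python
def group_periods_by_day(periods: list, max_days: int = 7) -> list:
    """Group 12-hour forecast periods into per-date high/low/conditions records."""
    days = {}  # dicts keep insertion order, so dates stay chronological
    for period in periods:
        date = period["startTime"][:10]  # "YYYY-MM-DD" prefix of the ISO timestamp
        record = days.setdefault(date, {"date": date, "high": None,
                                        "low": None, "conditions": None})
        if period["isDaytime"]:
            record["high"] = period["temperature"]
            record["conditions"] = period["shortForecast"]
        else:
            record["low"] = period["temperature"]
    return list(days.values())[:max_days]


# Made-up periods for a forecast fetched in the evening (starts with a night period)
sample_periods = [
    {"startTime": "2025-03-20T18:00:00-04:00", "isDaytime": False,
     "temperature": 41, "shortForecast": "Clear"},
    {"startTime": "2025-03-21T06:00:00-04:00", "isDaytime": True,
     "temperature": 60, "shortForecast": "Sunny"},
    {"startTime": "2025-03-21T18:00:00-04:00", "isDaytime": False,
     "temperature": 45, "shortForecast": "Cloudy"},
]
for day in group_periods_by_day(sample_periods):
    print(day)
# {'date': '2025-03-20', 'high': None, 'low': 41, 'conditions': None}
# {'date': '2025-03-21', 'high': 60, 'low': 45, 'conditions': 'Sunny'}
```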

Bringing It All Together

Now that we have the code to get our data, we need to call those functions and send the results to Gemini as the initial context. You can get a Gemini API key from Google AI Studio. The code below formats the data into a prompt and adds it to Gemini's chat history.

Python
from flask import Flask, render_template, request, jsonify
import os
from google import genai
from google.genai import types
from api_handlers import get_philly_events, get_crash_data, get_philly_weather_forecast
from dotenv import load_dotenv

# Load environment variables from .env
load_dotenv()

app = Flask(__name__)

# Initialize Gemini client
# GEMINI_API_KEY is an assumed variable name; match whatever your .env defines
client = genai.Client(
    api_key=os.getenv("GEMINI_API_KEY"),
)

# Global chat history
chat_history = []

def initialize_context():
    """Fetch the source data and add it to the chat history as initial context."""
    try:
        # Get API data
        events = get_philly_events()
        looker_data = get_crash_data()
        weather_data = get_philly_weather_forecast()

        # Format events data
        events_formatted = "\n".join([
            f"- {event['name']} at {event['venue']} {event['street']} on {event['date']}"
            for event in events
        ])

        # Create system context
        system_context = f"""You are a helpful AI assistant focused on Philadelphia.
You have access to the following data that was loaded when you started:

Current Philadelphia Events (Next 7 Days):
{events_formatted}

Weather Forecast for Philadelphia:
{weather_data}

Crash Analysis Data:
{looker_data}

Instructions:
1. Use this event, weather, and crash data when answering relevant questions
2. For questions about events, reference the specific events listed above
3. For questions about crash data, use the analysis provided
4. For other questions about Philadelphia, you can provide general knowledge
5. Always maintain a natural, conversational tone
6. Use Google Search when needed for current information not in the provided data

Remember: Your events, weather, and crash data is from system initialization and represents that point in time."""

        # Add context to chat history as the first user turn
        chat_history.append(types.Content(
            role="user",
            parts=[types.Part.from_text(text=system_context)]
        ))

        print("Context initialized successfully")
        return True

    except Exception as e:
        print(f"Error initializing context: {e}")
        return False
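The context is stored as the first user turn in chat_history. Every later request resends the whole list, so the history grows with each exchange. If a long conversation ever approaches the context window, one option is to keep the context turn plus only the most recent exchanges. The sketch below uses plain dictionaries in the same role/parts shape as `types.Content`. The function names and the max_turns policy are hypothetical, not part of the article's code.

```python
def make_turn(role: str, text: str) -> dict:
    """Build one turn in the role/parts shape used for Gemini contents."""
    # Gemini conversations use the roles "user" and "model" (not "assistant").
    if role not in ("user", "model"):
        raise ValueError(f"Unsupported role: {role!r}")
    return {"role": role, "parts": [{"text": text}]}


def trim_history(history: list, max_turns: int) -> list:
    """Keep the initial context turn plus at most `max_turns` recent turns."""
    if max_turns <= 0:
        return history[:1]
    if len(history) <= max_turns + 1:
        return list(history)
    recent = history[1:][-max_turns:]
    # Start the retained window on a user turn so exchanges stay paired.
    if recent and recent[0]["role"] == "model":
        recent = recent[1:]
    return [history[0]] + recent


history = [make_turn("user", "Initial Philadelphia context...")]
for i in range(3):
    history.append(make_turn("user", f"question {i}"))
    history.append(make_turn("model", f"answer {i}"))

trimmed = trim_history(history, max_turns=2)
print([turn["parts"][0]["text"] for turn in trimmed])
# ['Initial Philadelphia context...', 'question 2', 'answer 2']
```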


The final step is to get the message from the user and call the Gemini 2.0 Flash model. Notice that the generation config also takes a tools parameter, tools=[types.Tool(google_search=types.GoogleSearch())]. This parameter enables Grounding with Google Search: if the answer isn't in one of the provided data sources, Gemini can run a Google Search and ground its response in the results.

This is useful for information your data sources don't cover, such as events that aren't listed on Ticketmaster. I also used Gemini to help write a better prompt for the initial context.

Python
from flask import Flask, render_template, request, jsonify
import os
import sys
from google import genai
from google.genai import types
from api_handlers import get_philly_events, get_crash_data, get_philly_weather_forecast
from dotenv import load_dotenv

# Load environment variables from .env
load_dotenv()

app = Flask(__name__)

# Initialize Gemini client
# GEMINI_API_KEY is an assumed variable name; match whatever your .env defines
client = genai.Client(
    api_key=os.getenv("GEMINI_API_KEY"),
)

# Global chat history
chat_history = []

def initialize_context():
    """Initialize context with events, weather, and Looker data"""
    try:
        # Get initial data
        events = get_philly_events()
        looker_data = get_crash_data()
        weather_data = get_philly_weather_forecast()

        # Format events data to present better
        events_formatted = "\n".join([
            f"- {event['name']} at {event['venue']} {event['street']} on {event['date']}"
            for event in events
        ])

        # Create system context
        system_context = f"""You are a helpful AI assistant focused on Philadelphia.
You have access to the following data that was loaded when you started:

Philadelphia Events for the next 7 Days:
{events_formatted}

Weather forecast for Philadelphia:
{weather_data}

Crash Analysis Data:
{looker_data}

Instructions:
1. Use this events, weather, and crash data when answering relevant questions
2. For questions about events, reference the specific events listed above
3. For questions about crash data, use the analysis provided
4. For questions about weather, use the data provided
5. For other questions about Philadelphia, you can provide general knowledge
6. Use Google Search when needed for current information not in the provided data

Remember: Your events, weather, and crash data is from system initialization and represents that point in time."""

        # Add context to chat history as the first user turn
        chat_history.append(types.Content(
            role="user",
            parts=[types.Part.from_text(text=system_context)]
        ))

        print("Context initialized successfully")
        return True

    except Exception as e:
        print(f"Error initializing context: {e}")
        return False

@app.route('/')
def home():
    return render_template('index.html')

@app.route('/chat', methods=['POST'])
def chat():
    try:
        # Tolerate a missing or non-JSON body instead of raising
        payload = request.get_json(silent=True) or {}
        user_message = payload.get('message', '')
        if not user_message:
            return jsonify({'error': 'Message required'}), 400

        # Add user message to history
        chat_history.append(types.Content(
            role="user",
            parts=[types.Part.from_text(text=user_message)]
        ))

        # Configure generation settings, including Grounding with Google Search
        generate_content_config = types.GenerateContentConfig(
            temperature=0.9,
            top_p=0.95,
            top_k=40,
            max_output_tokens=8192,
            tools=[types.Tool(google_search=types.GoogleSearch())],
        )

        # Generate response using full chat history
        response = client.models.generate_content(
            model="gemini-2.0-flash",
            contents=chat_history,
            config=generate_content_config,
        )
        reply = response.text or ""

        # Add the reply to history; the Gemini API uses the roles "user" and "model"
        chat_history.append(types.Content(
            role="model",
            parts=[types.Part.from_text(text=reply)]
        ))

        return jsonify({'response': reply})

    except Exception as e:
        print(f"Error in chat endpoint: {e}")
        return jsonify({'error': str(e)}), 500

if __name__ == '__main__':
    # Initialize context before starting
    print("Initializing context...")
    if initialize_context():
        app.run(debug=True)
    else:
        print("Failed to initialize context")
        sys.exit(1)
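When Gemini uses Google Search, the response includes grounding metadata that lists the web pages it relied on. You may want to show these sources alongside the answer. The sketch below collects them. It assumes the attribute names used by the google-genai response types (candidates, grounding_metadata, grounding_chunks, and web.uri / web.title), which you should verify against your SDK version. The function name is hypothetical. Because the function reads attributes with getattr, the example exercises it with SimpleNamespace stand-ins instead of a live API call.

```python
from types import SimpleNamespace


def grounding_sources(response) -> list:
    """Collect title/URI pairs for web sources in a grounded response."""
    sources = []
    for candidate in getattr(response, "candidates", None) or []:
        metadata = getattr(candidate, "grounding_metadata", None)
        if metadata is None:
            continue  # no grounding metadata on this candidate
        for chunk in getattr(metadata, "grounding_chunks", None) or []:
            web = getattr(chunk, "web", None)
            if web is not None:
                sources.append({"title": getattr(web, "title", None),
                                "uri": getattr(web, "uri", None)})
    return sources


# Stand-in objects shaped like a grounded response (hypothetical values)
fake_response = SimpleNamespace(candidates=[
    SimpleNamespace(grounding_metadata=SimpleNamespace(grounding_chunks=[
        SimpleNamespace(web=SimpleNamespace(title="example.com",
                                            uri="https://example.com/article")),
    ])),
])
print(grounding_sources(fake_response))
# [{'title': 'example.com', 'uri': 'https://example.com/article'}]
```

In the chat endpoint, you could return the sources next to the text, for example `jsonify({'response': response.text, 'sources': grounding_sources(response)})`.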


Final Words

RAG isn't the only way to give Gemini context. Sending the data directly in the initial context is one alternative, and you can combine it with grounding through Google Search.
