URL: https://thenewstack.io/how-to-build-an-ai-agent-that-uses-rag-to-increase-accuracy/

How To Build an AI Agent That Uses RAG To Increase Accuracy

This tutorial shows how to build an agent that uses a retriever to extract context from unstructured data while invoking an API to get more data.
Jul 29th, 2024 7:59am by Janakiram MSV

The combination of retrieval-augmented generation (RAG) and function calling can greatly improve the capabilities of LLM-based applications. RAG agents built on function calling combine the benefits of both approaches: they rely on external knowledge bases for accurate data retrieval and execute specific functions for efficient task completion.

Function calling within the RAG framework enables more structured retrieval processes. For example, a function can be predefined to extract specific information based on user queries, which the RAG system will retrieve from a comprehensive knowledge base. This method ensures that the responses are both relevant and precisely tailored to the application’s requirements.

In this tutorial, we will build an agent that’s designed to help the product manager of an ecommerce company analyze sales and the product portfolio. It uses a retriever to extract context from unstructured data stored in PDFs, while invoking an API to get sales information.

The agent has access to a set of tools and also to a vector database. The initial prompt and the registered tools are sent to the LLM. If the LLM response includes a subset of tools, the agent executes them and collects the context. If the LLM doesn’t recommend executing any of the tools, the agent then performs a semantic search in the vector database and retrieves the context. Irrespective of where the context is gathered from, it is added to the original prompt and sent to the LLM.

To simplify the configuration, I created a Docker Compose file to run the MySQL database and the Flask API layer. The PDFs are indexed separately and ingested into ChromaDB. It’s assumed that you have access to the OpenAI API.

Start by cloning the Git repository and follow the steps below to configure the agent on your machine.

git clone https://github.com/janakiramm/rag-agent.git

Step 1: Launch the DB and the API Server

Switch to the api directory and run the Docker Compose file to launch the database and the corresponding API server.

docker compose up -d --build

The API server exposes four API endpoints:

get_top_selling_products
get_top_categories
get_sales_trends
get_revenue_by_category

You can invoke these endpoints from curl.

curl "http://localhost:5000/api/sales/top-products?start_date=2023-04-01&end_date=2023-06-30"
curl "http://localhost:5000/api/sales/top-categories?start_date=2023-04-01&end_date=2023-06-30"
curl "http://localhost:5000/api/sales/trends?start_date=2023-04-01&end_date=2023-06-30"
curl "http://localhost:5000/api/sales/revenue-by-category?start_date=2023-04-01&end_date=2023-06-30"
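
The same requests can also be issued from Python. The sketch below is only an illustration of a thin client and is not the code in the repo's tools.py: build_sales_url and fetch_sales are hypothetical helper names, the base URL is copied from the curl examples, and the responses are assumed to be JSON.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL copied from the curl examples; adjust it if the API runs elsewhere.
BASE_URL = "http://localhost:5000/api/sales"


def build_sales_url(endpoint: str, start_date: str, end_date: str) -> str:
    """Build the URL for one sales endpoint, e.g. "top-products"."""
    query = urlencode({"start_date": start_date, "end_date": end_date})
    return f"{BASE_URL}/{endpoint}?{query}"


def fetch_sales(endpoint: str, start_date: str, end_date: str):
    """Call the endpoint and decode the body (assumes a JSON response)."""
    url = build_sales_url(endpoint, start_date, end_date)
    with urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage, once the Docker Compose stack is running:
#   fetch_sales("top-products", "2023-04-01", "2023-06-30")
print(build_sales_url("top-products", "2023-04-01", "2023-06-30"))
```

Keeping URL construction separate from the network call makes the date-range handling easy to check without a running server.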

Step 2: Index PDFs and Store Vectors in ChromaDB

Under the data directory, you will find a PDF that contains a description of a few products from the electronics category. Our task is to index it and store the embedding vectors in Chroma.

For this, launch the Index-Datasheet Jupyter Notebook and run all the cells.

This loads the PDF, performs chunking, generates the embeddings and finally stores the vectors in ChromaDB.
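
To make the chunk-and-store part of that pipeline concrete, here is a minimal sketch. It assumes fixed-size character chunks with a small overlap, which may differ from the splitter the notebook actually uses, and it leaves out PDF loading. chunk_text and index_chunks are hypothetical names; the only Chroma call used is collection.add with documents and ids.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character chunks that overlap slightly."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def index_chunks(collection, chunks: List[str], prefix: str = "chunk") -> List[str]:
    """Store chunks in a Chroma-style collection.

    When only documents are passed, the collection's own embedding
    function computes the vectors.
    """
    ids = [f"{prefix}-{i}" for i in range(len(chunks))]
    collection.add(documents=chunks, ids=ids)
    return ids


# Tiny demonstration of the overlap: each chunk repeats the last
# character of the previous one.
print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk.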

The last cell of this Notebook performs a simple semantic search to validate the indexing process.

We now have two sources that can supply context: the API and the vector database.

Step 3: Run the RAG Agent

The agent code is available in the RAG-Agent Jupyter Notebook. Launch it and run all the cells to see it in action.

This Notebook contains the logic to decide between executing the tools and performing a semantic search.

I wrapped the REST API calls in tools.py, which is available in the root directory of the repo, and we import those functions into the agent.

from tools import (
 get_top_selling_products,
 get_top_categories,
 get_sales_trends,
 get_revenue_by_category
)

Since we decided to persist the Chroma collection from the indexing process performed in the previous step, we will simply load it.

import chromadb
from chromadb.utils import embedding_functions

chroma_client = chromadb.PersistentClient(path="./data")
embedding_function = embedding_functions.SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2")
collection = chroma_client.get_or_create_collection(name="products", embedding_function=embedding_function)

We pass the available tools along with the prompt to the LLM, which then recommends the right functions to invoke. Below is a partial code snippet from the map_tools function.

….
    messages = [{"role": "user", "content": prompt}]
    response = llm.chat.completions.create(
        model=model,
        messages=messages,
        tools=tools,
        tool_choice="auto"
    )

    # Ensure response has valid tool_calls
    response_message = response.choices[0].message
    tool_calls = getattr(response_message, 'tool_calls', None)

    functions = []
    if tool_calls:
        for tool in tool_calls:
            function_name = tool.function.name
            arguments = json.loads(tool.function.arguments)
            functions.append({
                "function_name": function_name,
                "arguments": arguments
            })

    return functions
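
The snippet does not show the tools list that is passed to the LLM. As a rough sketch, it could be declared in the Chat Completions function-calling format as shown here. The descriptions are invented for illustration, and the start_date and end_date parameters are an assumption based on the query strings in the curl examples; sales_tool is a hypothetical helper.

```python
import json


def sales_tool(name: str, description: str) -> dict:
    """Describe one date-range sales function in the `tools` format."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": {
                    "start_date": {"type": "string", "description": "Start date, YYYY-MM-DD"},
                    "end_date": {"type": "string", "description": "End date, YYYY-MM-DD"},
                },
                "required": ["start_date", "end_date"],
            },
        },
    }


# Descriptions are illustrative assumptions, not taken from the repo.
tools = [
    sales_tool("get_top_selling_products", "Top-selling products in a date range"),
    sales_tool("get_top_categories", "Top product categories in a date range"),
    sales_tool("get_sales_trends", "Sales trends over a date range"),
    sales_tool("get_revenue_by_category", "Revenue per category in a date range"),
]

print(json.dumps(tools[0], indent=2))
```

The model returns arguments as a JSON string that follows this schema, which is why map_tools decodes tool.function.arguments with json.loads.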

Similarly, we have a retriever responsible for extracting the context from the vector database.

def retriever(query):
    vector = embedding_function([query])
    results = collection.query(
        query_embeddings=vector,
        n_results=5,
        include=["documents"]
    )
    res = " \n".join(str(item) for item in results['documents'][0])
    return res
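
The [0] index is there because a Chroma query returns one list of documents per query, and the retriever sends a single query. The sketch below illustrates this with a hand-written dictionary standing in for the query result; the ids and document strings are made up, and join_documents is a hypothetical name.

```python
# Hand-written stand-in for the dict returned by collection.query();
# the ids and document strings are invented for illustration.
results = {
    "ids": [["chunk-0", "chunk-1"]],
    "documents": [["First matching chunk.", "Second matching chunk."]],
}


def join_documents(results: dict) -> str:
    """Join the documents returned for the first (and only) query."""
    return " \n".join(str(item) for item in results["documents"][0])


context = join_documents(results)
print(context)
```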

We have a simple helper function to send the gathered context and the original prompt to the LLM.

def generate_response(prompt, context):
    input_text = (
        "Based on the below context, respond with an accurate answer. If you don't find the answer within the context, say I do not know. Don't repeat the question\n\n"
        f"{context}\n\n"
        f"{prompt}"
    )
    response = llm.chat.completions.create(
        model=model,
        messages=[
            {"role": "user", "content": input_text},
        ],
        max_tokens=150,
        temperature=0
    )

    return response.choices[0].message.content.strip()

The job of the agent is to first check whether the LLM recommends any tools and then execute them to generate the context. If not, it relies on the vector database to generate the context.

def agent(prompt):
    tools = map_tools(prompt)

    if tools:
        tool_output = execute_tools(tools)
        context = json.dumps(tool_output)
    else:
        context = retriever(prompt)

    response = generate_response(prompt, context)
    return response
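
The execute_tools function is not shown here. One plausible shape, offered only as a sketch, is a dispatcher that looks up each recommended function by name and calls it with the decoded arguments. Unlike the one-argument call above, this version takes an explicit registry so that it is self-contained; the stub function and its return value are invented for illustration.

```python
from typing import Any, Callable, Dict, List


def execute_tools(functions: List[dict],
                  registry: Dict[str, Callable[..., Any]]) -> List[dict]:
    """Run each recommended function and collect its output."""
    outputs = []
    for call in functions:
        name = call["function_name"]
        func = registry.get(name)
        if func is None:
            # The model named a function that was never registered;
            # record the problem and continue with the remaining calls.
            outputs.append({"function_name": name, "error": "unknown function"})
            continue
        outputs.append({"function_name": name, "output": func(**call["arguments"])})
    return outputs


# Stub standing in for the real API wrapper in tools.py (made-up data).
def get_top_categories(start_date: str, end_date: str) -> list:
    return [{"category": "Electronics", "start": start_date, "end": end_date}]


registry = {"get_top_categories": get_top_categories}
calls = [{
    "function_name": "get_top_categories",
    "arguments": {"start_date": "2023-04-01", "end_date": "2023-06-30"},
}]
print(execute_tools(calls, registry))
```

The input matches the list of function_name and arguments dictionaries that map_tools returns, and the output is plain data, so it can be passed to json.dumps as the agent does.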

In the screenshot below, the first response comes from the tools/API and the second from the vector database.

[Screenshot: two agent responses, the first from the tools/API and the second from the vector database]

Extending the RAG Agent to Use Federated Language Models

In this scenario, we relied on OpenAI’s GPT-4o for mapping the function calls and generating the final response based on the context. By relying on the idea of federated models, we can entirely avoid sending the context to the cloud-based LLM and use a local LLM deployed at the edge to respond to queries.

In my next post (the final part of this series), we will see how to combine the idea of the RAG agent with federated language models. Stay tuned.

Janakiram MSV (Jani) is a practicing architect, research analyst, and advisor to Silicon Valley startups. He focuses on the convergence of modern infrastructure powered by cloud-native technology and machine intelligence driven by generative AI. Before becoming an entrepreneur, he spent...
TNS owner Insight Partners is an investor in: Docker, OpenAI.