Building an Agentic RAG System Using LlamaIndex

Last Updated : 1 May, 2026

In this article, we build an Agentic RAG system with LlamaIndex: an autonomous agent that retrieves relevant information from a set of documents and uses it to generate grounded responses. The system combines the retrieval capabilities of LlamaIndex with the reasoning and decision-making capabilities of agents. Two terms are used throughout:

Agentic RAG: A system where the agent can autonomously decide how to retrieve and generate answers using multiple sources/tools.
LlamaIndex: A Python framework for building vector-based knowledge indices, allowing LLMs to retrieve relevant information from documents.

Working of Agentic RAG System

Let's see how the system works:

1. User Query: Everything starts with a user question. The query flows into the central component, i.e., the Agent.

2. The Agent: The Agent acts as the "brain" of the system. Its job is to analyze the query and decide which specialized tool should handle different parts of the request. Instead of a fixed path, the agent makes dynamic decisions based on what the query needs.

3. Decision-Making and Tools: Depending on the query, the agent can choose between several tools:

DocumentRetriever Tool: Finds and fetches relevant documents for context.
Calculator Tool: Handles mathematical or computational questions.
Wikipedia Tool: Searches for factual knowledge directly from Wikipedia.

The agent can also call a tool multiple times or combine several tools, depending on the complexity of the task.

4. LlamaIndex Query Engine: Some tools, such as the DocumentRetriever or Calculator tool, feed their results into the LlamaIndex Query Engine (a specialized search and synthesis engine). LlamaIndex processes and combines the information from those tools to create a detailed answer.

5. Final Output: Once the agent is satisfied with the results, it sends the answer back to the user.

Note: Instead of following a fixed pipeline, the agent makes context-aware decisions about which tools or data sources to use and when to use them. This reasoning-and-planning behavior is what makes the system agentic.
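The flow described above (query, agent decision, tool call, answer) can be sketched in a few lines of plain Python. This is only an illustration of the control flow: in the real system the LLM chooses the tool, whereas here a hypothetical rule-based `route` function stands in for that decision, and the three tool functions are placeholders that return canned strings.

```python
import re
from typing import Callable, Dict


# Hypothetical stand-in tools: each takes the query text and returns a string.
def document_retriever(query: str) -> str:
    return f"[docs] passages related to: {query}"


def calculator(query: str) -> str:
    return f"[calc] would evaluate: {query}"


def wikipedia_search(query: str) -> str:
    return f"[wiki] summary for: {query}"


TOOLS: Dict[str, Callable[[str], str]] = {
    "DocumentRetriever": document_retriever,
    "Calculator": calculator,
    "Wikipedia": wikipedia_search,
}


def route(query: str) -> str:
    """Pick a tool name with simple rules (an LLM makes this choice in the real agent)."""
    text = query.strip().lower()
    # Only digits, whitespace and arithmetic symbols: treat it as a math question.
    if re.fullmatch(r"[\d\s+\-*/().]+", text):
        return "Calculator"
    # General-knowledge style questions go to Wikipedia.
    if text.startswith(("who is", "who was", "where is")):
        return "Wikipedia"
    # Everything else is answered from the uploaded documents.
    return "DocumentRetriever"


def answer(query: str) -> str:
    """Run one agent step: decide on a tool, call it, and return its observation."""
    tool_name = route(query)
    observation = TOOLS[tool_name](query)
    return f"{tool_name}: {observation}"
```

Calling `answer("12 * (3 + 4)")` dispatches to the Calculator stand-in, while a question about the uploaded notes falls through to DocumentRetriever.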

Step-by-Step Implementation

Let's build our Agentic RAG system using LlamaIndex:

Step 1: Install Dependencies

We will install the required packages and libraries for our system:

llama-index: For document retrieval and embeddings.
langchain: For agent and tool management.
langchain_community: Required for ChatOpenAI in LangChain 0.3.x.
openai: For LLM API access.
wikipedia: Optional tool for agent to search Wikipedia.
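After installing, it can help to confirm that each package is importable before running the rest of the tutorial. The sketch below uses only the standard library; the mapping from package name to import name (for example `llama-index` to `llama_index`) is an assumption based on how these packages are usually imported, and `missing_packages` is a hypothetical helper name.

```python
from importlib.util import find_spec

# Assumed mapping: package name from the list above -> top-level import name.
REQUIRED_MODULES = {
    "llama-index": "llama_index",
    "langchain": "langchain",
    "langchain_community": "langchain_community",
    "openai": "openai",
    "wikipedia": "wikipedia",
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the package names whose import module cannot be found."""
    return [pkg for pkg, module in modules.items() if find_spec(module) is None]
```

An empty list from `missing_packages()` means every dependency was found in the current environment.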

Step 2: Upload Documents and OpenAI API Key

We will upload some documents and files that our model can use. The files used here can be downloaded from here.

Creates a docs/ folder to store our knowledge documents.
Users can upload .txt files.
Example content can include notes, articles or any text relevant to queries.

To learn how to obtain an OpenAI API key, refer to: How to find and Use API Key of OpenAI.
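Outside a notebook with an upload widget, the same setup can be done with the standard library. The following is a minimal sketch, assuming the documents live in a local `docs/` folder and the key is supplied through the `OPENAI_API_KEY` environment variable; `prepare_docs` and `api_key_is_set` are hypothetical helper names.

```python
import os
from pathlib import Path


def prepare_docs(folder="docs", samples=None):
    """Create the docs folder, optionally write sample .txt files, and list what is there."""
    docs_dir = Path(folder)
    docs_dir.mkdir(parents=True, exist_ok=True)
    for name, text in (samples or {}).items():
        (docs_dir / name).write_text(text, encoding="utf-8")
    # Only .txt files are listed, matching the upload step described above.
    return sorted(p.name for p in docs_dir.glob("*.txt"))


def api_key_is_set(env=os.environ):
    """Check that a non-empty OPENAI_API_KEY is available."""
    return bool(env.get("OPENAI_API_KEY", "").strip())
```

For example, `prepare_docs("docs", {"notes.txt": "LlamaIndex builds vector indices."})` creates the folder and one sample file, and `api_key_is_set()` can be checked before any LLM call is made.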

Step 3: Import Libraries

We will import the required libraries for the system:

SimpleDirectoryReader: Load documents.
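GPTVectorStoreIndex: Create vector-based index for retrieval (the primary name in current LlamaIndex releases is VectorStoreIndex).
LLMPredictor & ServiceContext: Wrap LLM for LlamaIndex (legacy classes; LlamaIndex 0.10 and later configure the LLM through Settings instead).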
ChatOpenAI: OpenAI GPT model for text generation.
Tool, initialize_agent, AgentType: Build agentic reasoning system.
ConversationBufferMemory: Maintain past conversation context for agent.
wikipedia: Tool for retrieving general knowledge.

Step 4: Build the LlamaIndex Retrieval System

We will build the LlamaIndex retrieval system, in which:

SimpleDirectoryReader: Reads all uploaded documents.
LLMPredictor: Wraps GPT-3.5-turbo to work with LlamaIndex.
GPTVectorStoreIndex: Converts documents into embeddings stored in a vector store.
query_engine: Retrieves the 3 most relevant document chunks for a query and synthesizes an answer from them.
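To make the "top 3" idea concrete, here is a toy retrieval sketch in plain Python. It is not LlamaIndex code: word-count vectors and cosine similarity stand in for the learned embeddings and vector store, and `tokenize`, `cosine` and `top_k` are hypothetical names. In LlamaIndex itself the cutoff is typically set with `similarity_top_k` on `index.as_query_engine(...)`; that the original code does so is an assumption.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query, documents, k=3):
    """Return up to k document names ranked by similarity to the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), name) for name, text in documents.items()]
    # Highest score first; ties are broken alphabetically by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # Documents with no overlap at all are dropped.
    return [name for score, name in scored[:k] if score > 0]
```

A real embedding model captures meaning rather than exact word overlap, but the ranking-and-cutoff step works the same way.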

Step 5: Define Tools for Agent

We will define the tools that are available for the agent to use:

DocumentRetriever: Uses LlamaIndex to fetch relevant docs.
Calculator: Handles numeric queries.
Wikipedia: Fetches general knowledge not in uploaded docs.
Tools: Each tool is callable by the agent automatically.
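As an illustration of what a tool looks like, the sketch below implements a Calculator that evaluates arithmetic without calling `eval` on user input. It is an assumption about how such a tool could be written, not the article's original code; `safe_calculate` and the `TOOLS` registry are hypothetical names. With LangChain, a function like this would usually be wrapped as `Tool(name=..., func=..., description=...)` so the agent can read its description.

```python
import ast
import operator

# Only these arithmetic operators are allowed.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def safe_calculate(expression: str) -> str:
    """Evaluate +, -, *, / and parentheses; return the result or an error message as text."""

    def _eval(node):
        # Plain numbers (bool is excluded because it is a subclass of int).
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)) \
                and not isinstance(node.value, bool):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        # Names, function calls, attribute access and so on are rejected.
        raise ValueError("unsupported expression")

    try:
        return str(_eval(ast.parse(expression, mode="eval").body))
    except (ValueError, SyntaxError, ZeroDivisionError) as exc:
        return f"Calculator error: {exc}"


# Hypothetical registry: tool name -> (function, description shown to the agent).
TOOLS = {
    "Calculator": (safe_calculate, "Evaluates arithmetic expressions such as 2 + 3 * 4."),
}
```

Returning errors as strings, rather than raising, lets the agent read the failure as an observation and try a different approach.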

Step 6: Initialize Agent with Memory

We will initialize the agent with ConversationBufferMemory, which gives it short-term, in-session memory:

ConversationBufferMemory: Keeps track of past queries and responses.
AgentType.ZERO_SHOT_REACT_DESCRIPTION: Agent follows the ReAct (reason, then act) pattern and picks a tool from the tool descriptions alone, without worked examples in the prompt.
verbose=True: Shows reasoning steps in the output.
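Conceptually, a conversation buffer is just a list of past turns that gets prepended to each new prompt. The sketch below shows that idea in plain Python; `BufferMemory` and `build_prompt` are hypothetical names, and the "Human"/"AI" labels are an assumption modeled on LangChain's default prefixes rather than its actual implementation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class BufferMemory:
    """Keeps every (user, agent) turn of the current session in order."""

    turns: List[Tuple[str, str]] = field(default_factory=list)

    def save(self, user_input: str, agent_output: str) -> None:
        self.turns.append((user_input, agent_output))

    def as_prompt(self) -> str:
        """Render the stored turns as text that can be prepended to a prompt."""
        return "\n".join(f"Human: {u}\nAI: {a}" for u, a in self.turns)


def build_prompt(memory: BufferMemory, query: str) -> str:
    """Combine the stored history with the new query."""
    history = memory.as_prompt()
    prefix = f"{history}\n" if history else ""
    return f"{prefix}Human: {query}\nAI:"
```

Because the whole history is replayed on every call, this kind of memory works for short sessions but grows without bound in long ones.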

Step 7: Run the System

We now run the system:

Users can interact with the agent.
Agent decides which tool to use, retrieves relevant info and generates answers.
Supports document queries, math calculations and Wikipedia search.
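The interactive loop can be sketched independently of any particular agent. In the sketch below, `run_session` is a hypothetical helper that accepts any callable mapping a query string to an answer string, and the "exit"/"quit" stop words are an assumption.

```python
from typing import Callable, Iterable, Iterator

# Assumed stop words that end the session.
EXIT_WORDS = {"exit", "quit"}


def run_session(agent: Callable[[str], str], queries: Iterable[str]) -> Iterator[str]:
    """Feed queries to the agent one at a time and yield each answer."""
    for query in queries:
        text = query.strip()
        if text.lower() in EXIT_WORDS:
            return  # The user asked to stop.
        if not text:
            continue  # Skip empty input instead of calling the agent.
        yield agent(text)
```

For interactive use, the queries can come from the keyboard, for example `for reply in run_session(my_agent, iter(lambda: input("You: "), None)): print(reply)`, where `my_agent` is whatever callable wraps the agent built in Step 6.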

Complete source code can be downloaded from here.

Advantages

Autonomous Reasoning: Agent decides which tool to use for each query.
Accurate Responses: LlamaIndex retrieves relevant documents before generating answers.
Multi-Tool Support: Can handle document retrieval, calculations and Wikipedia queries.
Context-Aware: Conversation memory allows follow-up questions.
Scalable & Modular: Tools and knowledge sources can be added or updated easily.
User-Friendly: Generates natural language answers interactively.

URL: https://www.geeksforgeeks.org/artificial-intelligence/building-agentic-rag-system-using-llamaindex/