Implementing Branched RAG

Last Updated : 10 Feb, 2026

Branched Retrieval‑Augmented Generation (Branched RAG) is a type of RAG system in which multiple retrieval paths operate in parallel to handle complex queries. Each branch retrieves and processes information independently, and the branch outputs are then combined to improve answer accuracy and reasoning depth.

Enables parallel retrieval from multiple sources or contexts
Improves response quality for complex or multi‑part queries
Enhances flexibility and scalability in RAG‑based systems

[Figure: Branched RAG]

Implementation

Step 1: Install Required Libraries

Install the following libraries to set up the environment for implementing Branched RAG using LangGraph:

langchain: Core framework for building applications with large language models.
langgraph: Manages multi-step and branched RAG workflows using graph-based execution.
langchain-google-genai: Integrates Google's Generative AI models with LangChain.
faiss-cpu: High-performance similarity search library for vector embeddings.
sentence-transformers: Generates dense vector embeddings for semantic search and retrieval tasks.

Run the command below to install or upgrade all required packages:
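The command itself is not shown in this copy of the article. As a minimal sketch, assuming the libraries listed above are published on PyPI under the hyphenated names shown, the equivalent `pip install --upgrade` call can be built from Python. The helper names `build_install_command` and `install` are hypothetical.

```python
import subprocess
import sys

# PyPI names assumed from the library list above.
PACKAGES = [
    "langchain",
    "langgraph",
    "langchain-google-genai",
    "faiss-cpu",
    "sentence-transformers",
]

def build_install_command(packages=PACKAGES):
    """Return the pip command that installs or upgrades the given packages."""
    return [sys.executable, "-m", "pip", "install", "--upgrade", *packages]

def install(packages=PACKAGES):
    """Run pip inside the current interpreter's environment."""
    subprocess.check_call(build_install_command(packages))

# Only print the command here; call install() to actually run it.
print(" ".join(build_install_command()))
```

Calling `install()` performs the installation. In a shell or notebook the same thing is usually typed directly as `pip install --upgrade` followed by the package names.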

Step 2: Import Required Libraries

We start by importing all the building blocks required for documents, embeddings, vector search, LLMs and graph orchestration.

Document: Standard container for a piece of text and its metadata
RecursiveCharacterTextSplitter: Breaks text into manageable chunks
HuggingFaceEmbeddings: Converts text into numerical vectors
FAISS: Fast vector similarity search
ChatGoogleGenerativeAI: LLM for reasoning and answer generation
LangGraph: Controls the multi-step RAG flow using nodes and edges
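The import statements are not reproduced here. The sketch below records where these names are likely to live in recent LangChain and LangGraph releases and checks whether each top-level package can be found. The module paths are assumptions: older releases expose some names from different modules, and `langchain_huggingface` and `langchain_community` ship as separate packages (`langchain-huggingface`, `langchain-community`) that are not in the install list above.

```python
import importlib.util

# Assumed import locations; adjust to the versions you have installed.
EXPECTED_IMPORTS = {
    "Document": "langchain_core.documents",
    "RecursiveCharacterTextSplitter": "langchain_text_splitters",
    "HuggingFaceEmbeddings": "langchain_huggingface",
    "FAISS": "langchain_community.vectorstores",
    "ChatGoogleGenerativeAI": "langchain_google_genai",
    "StateGraph": "langgraph.graph",
}

def missing_packages(expected=EXPECTED_IMPORTS):
    """Return the sorted top-level packages that cannot be found."""
    top_level = {module.split(".")[0] for module in expected.values()}
    return sorted(p for p in top_level if importlib.util.find_spec(p) is None)

print("Missing packages:", missing_packages() or "none")
```

Each entry corresponds to an import of the form `from <module> import <name>`, for example `from langgraph.graph import StateGraph`.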

Step 3: Load Dummy Documents

In this article we use small dummy documents to simulate a real knowledge base.

Each text snippet is wrapped inside a Document object.
Document provides a standardized interface: text content plus optional metadata such as source, tags and timestamps.
Using dummy documents enables faster iteration and easier debugging.
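A minimal sketch of this step follows. The three snippets and the `dummy-N` source labels are hypothetical placeholders, since the article's own texts are not shown. If LangChain is not installed, a small stand-in class with the same two fields (`page_content`, `metadata`) is used so the example still runs.

```python
from dataclasses import dataclass, field

try:
    from langchain_core.documents import Document
except ImportError:
    # Stand-in with the same two fields, used only when LangChain is absent.
    @dataclass
    class Document:
        page_content: str
        metadata: dict = field(default_factory=dict)

# Hypothetical snippets standing in for a real knowledge base.
texts = [
    "Branched RAG splits a question into several focused sub-queries.",
    "FAISS performs fast similarity search over dense vectors.",
    "LangGraph runs workflows as graphs of nodes and edges.",
]

# Wrap every snippet in a Document and attach a source label as metadata.
docs = [
    Document(page_content=text, metadata={"source": f"dummy-{i}"})
    for i, text in enumerate(texts)
]

for doc in docs:
    print(doc.metadata["source"], "->", doc.page_content)
```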

Step 4: Split Documents into Chunks

Large text blocks dilute retrieval accuracy, so we split the documents into focused, overlapping chunks. Smaller chunks help the vector search engine retrieve precise, relevant context instead of broad, noisy passages.

RecursiveCharacterTextSplitter: Splits text on natural separators (paragraphs, lines, words) first, so chunks tend to respect semantic boundaries
chunk_size=100: Limits each chunk to 100 characters for fine-grained retrieval
chunk_overlap=20: Repeats up to 20 characters between adjacent chunks to maintain continuity and prevent context loss
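To make the effect of the two parameters concrete, here is a simplified fixed-window splitter. It is not the algorithm used by RecursiveCharacterTextSplitter, which prefers to break on separators. It only illustrates how `chunk_size` and `chunk_overlap` interact. The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text, chunk_size=100, chunk_overlap=20):
    """Cut text into windows of chunk_size characters, each sharing
    chunk_overlap characters with the previous window."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks

# 250 characters of repeating alphabet as sample input.
sample = "".join(chr(ord("a") + i % 26) for i in range(250))
chunks = split_with_overlap(sample)
print([len(c) for c in chunks])
```

With a 250-character input, each window advances by 80 characters, so the chunk lengths are 100, 100 and 90, and the last 20 characters of one chunk reappear at the start of the next.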

Step 5: Generate Embeddings and Build Vector Store

To enable semantic search, we convert the text chunks into numerical vector representations. These vectors let the system compare meaning rather than just keywords, which makes retrieval more accurate and context-aware.

HuggingFaceEmbeddings: Uses a Sentence Transformers model to encode text into dense vectors.
FAISS Vector Store: Stores embeddings in memory for rapid similarity search
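The idea behind vector search can be sketched without any library. The example below uses hand-made three-dimensional vectors in place of real embeddings and ranks stored texts by cosine similarity. Both the vectors and the choice of cosine similarity are assumptions made for illustration: a real embedding model produces far larger vectors, and the metric a FAISS index uses depends on how it is configured.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Toy "embeddings": similar topics get vectors pointing in similar directions.
index = {
    "How vector search works": [0.9, 0.1, 0.0],
    "Nearest-neighbour lookup": [0.8, 0.2, 0.1],
    "Baking sourdough bread": [0.0, 0.1, 0.9],
}

def search(query_vector, k=2):
    """Return the k stored texts whose vectors are most similar to the query."""
    ranked = sorted(
        index,
        key=lambda text: cosine_similarity(index[text], query_vector),
        reverse=True,
    )
    return ranked[:k]

print(search([1.0, 0.0, 0.0]))
```

The query vector points along the first axis, so the two search-related texts rank above the unrelated one.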

Output:

[Screenshot: Creating Embeddings]

Step 6: Create a Retriever

This step creates a retriever that fetches data from the vector store for a given query.

Retriever acts as a clean query layer over the vector store
k=2: Retrieves the top 2 most relevant chunks for each query
The same retriever can now be reused across multiple query branches
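A short sketch of this step, assuming `vector_store` is a LangChain vector store such as the FAISS store built in Step 5. `as_retriever` and `invoke` are LangChain calls. The helper names `make_retriever` and `retrieve_texts` are hypothetical.

```python
def make_retriever(vector_store, k=2):
    """Wrap a LangChain vector store in a retriever that returns the top k chunks."""
    return vector_store.as_retriever(search_kwargs={"k": k})

def retrieve_texts(retriever, query):
    """Run one query and return the text of each retrieved chunk."""
    return [doc.page_content for doc in retriever.invoke(query)]

# Usage, assuming `vector_store` already exists:
#   retriever = make_retriever(vector_store, k=2)
#   print(retrieve_texts(retriever, "What is Branched RAG?"))
```

Because the retriever holds no per-query state, the same object can serve every branch.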

Step 7: Initialize the LLM

The Large Language Model (LLM) is the decision engine of the Branched RAG system: it generates the query branches and writes the final answer. Here we use Google Gemini as the LLM.

google_api_key: Passes the API key directly to the LLM client.
model: Here we use the gemini-2.5-flash model.
temperature=0: Makes the output as deterministic and repeatable as possible by favoring the highest-probability next token.

To learn how to get a Gemini API key, refer to: Google Gemini API Key
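The sketch below gathers the three settings described above into one place. Reading the key from an environment variable named `GOOGLE_API_KEY` is an assumption made here to avoid hard-coding secrets, and `gemini_settings` is a hypothetical helper name.

```python
import os

def gemini_settings(env=None):
    """Collect the keyword arguments for the Gemini chat model."""
    env = os.environ if env is None else env
    api_key = env.get("GOOGLE_API_KEY")  # assumed variable name
    if not api_key:
        raise RuntimeError("Set GOOGLE_API_KEY before creating the LLM")
    return {
        "model": "gemini-2.5-flash",
        "temperature": 0,
        "google_api_key": api_key,
    }

# With langchain-google-genai installed, the client would then be created as:
#   from langchain_google_genai import ChatGoogleGenerativeAI
#   llm = ChatGoogleGenerativeAI(**gemini_settings())
```

Failing early when the key is missing tends to give a clearer error than a rejected API call later in the graph.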

Step 8: Define the Graph State

LangGraph passes a shared state through all nodes in the graph. This state acts as a single source of truth: each node reads the fields it needs and returns updates, which LangGraph merges back into the state. Each field in the state represents a key stage in the RAG lifecycle.

query: The original user question and the entry point of the graph.
branches: Sub-queries generated from the original query, enabling parallel retrieval paths.
retrieved_docs: Raw content returned by the retriever across all branches. It acts as the evidence pool.
context: Merged and refined knowledge created from retrieved documents and passed to the LLM.
answer: Final grounded response generated by the LLM and the output of the graph.
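One way to express these fields is a `TypedDict`. The field types below are assumptions inferred from the descriptions above. The `apply_update` helper is purely illustrative: it mimics how a node's partial update ends up in the state, which LangGraph handles itself.

```python
from typing import List, TypedDict

class GraphState(TypedDict):
    query: str                 # original user question
    branches: List[str]        # sub-queries derived from the question
    retrieved_docs: List[str]  # chunk texts gathered across all branches
    context: str               # merged text handed to the LLM
    answer: str                # final response

def apply_update(state, update):
    """Merge a node's partial update into a copy of the state (illustration only)."""
    return {**state, **update}

state = {"query": "What is Branched RAG?"}
state = apply_update(state, {"branches": ["definition", "benefits"]})
print(state)
```

Each node only needs to return the keys it changes, and the remaining fields carry over untouched.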

Step 9: Implement Branched RAG Execution Nodes

1. Query Branching Node: Uses the LLM to decompose the user's intent into multiple focused sub-queries. Each branch captures a different aspect of the question, improving coverage compared to a single broad query.

2. Multi-Branch Retrieval Node: Performs a vector search independently for each query branch, retrieves the top k relevant chunks and aggregates all results into a unified evidence pool. This multi-path retrieval is the core differentiator of Branched RAG.

3. Context Merge Node: Combines and cleans all retrieved content into a single structured context, so the LLM receives clear and relevant information for reasoning.

4. Answer Generation Node: Generates the final response by grounding the LLM in the merged context. This design aims for coherent, evidence-based answers and helps reduce hallucinations.
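The four nodes can be sketched as plain functions that take the state and return a partial update. This is one possible shape, not the article's original code. The prompts, the `max_branches` limit and the `make_nodes` factory are hypothetical. The sketch assumes the LLM and retriever expose LangChain's `invoke` method and that the LLM reply carries its text in `.content`.

```python
def make_nodes(llm, retriever, max_branches=3):
    """Build the four node functions around an LLM and a retriever."""

    def branch(state):
        prompt = (
            f"Split the question into at most {max_branches} focused "
            f"sub-queries, one per line.\nQuestion: {state['query']}"
        )
        lines = llm.invoke(prompt).content.splitlines()
        # Drop bullet characters and blank lines from the model's reply.
        branches = [ln.strip("-• \t") for ln in lines if ln.strip("-• \t")]
        # Fall back to the original query if the model returned nothing usable.
        return {"branches": branches[:max_branches] or [state["query"]]}

    def retrieve(state):
        docs = []
        for sub_query in state["branches"]:  # one vector search per branch
            docs.extend(d.page_content for d in retriever.invoke(sub_query))
        return {"retrieved_docs": docs}

    def merge(state):
        # Remove exact duplicates while keeping first-seen order.
        unique = list(dict.fromkeys(state["retrieved_docs"]))
        return {"context": "\n".join(unique)}

    def answer(state):
        prompt = (
            "Answer the question using only the context below.\n"
            f"Context:\n{state['context']}\n\nQuestion: {state['query']}"
        )
        return {"answer": llm.invoke(prompt).content}

    return {"branch": branch, "retrieve": retrieve, "merge": merge, "answer": answer}
```

Note that this retrieve node searches the branches one after another inside a single node. Running them concurrently would need separate graph branches or explicit concurrency, which this sketch leaves out.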

Step 10: Build and Execute the LangGraph Workflow

This step brings all the defined nodes together into a single executable workflow using LangGraph's graph-based model. Nodes specify what operations are performed, and edges control when and in what order they run, so the user query flows step by step until the final answer is generated.
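A minimal sketch of the wiring, assuming the langgraph package is installed and that `nodes` is a dict mapping the four node names to functions such as those described in Step 9. The helper names `linear_edges` and `build_workflow` are hypothetical. The LangGraph import sits inside the function so the edge list can be inspected without the package.

```python
PIPELINE = ["branch", "retrieve", "merge", "answer"]

def linear_edges(order):
    """Return (source, target) pairs that chain the nodes from start to end."""
    # "__start__" and "__end__" are the values of LangGraph's START and END.
    path = ["__start__", *order, "__end__"]
    return list(zip(path, path[1:]))

def build_workflow(state_schema, nodes):
    """Assemble and compile a linear StateGraph from a dict of node functions."""
    from langgraph.graph import StateGraph  # requires the langgraph package
    graph = StateGraph(state_schema)
    for name in PIPELINE:
        # Some LangGraph releases have rejected node names that equal a state
        # key (here "answer"); rename the node if that error appears.
        graph.add_node(name, nodes[name])
    for source, target in linear_edges(PIPELINE):
        graph.add_edge(source, target)
    return graph.compile()

for source, target in linear_edges(PIPELINE):
    print(f"{source} -> {target}")
```

The compiled object returned by `build_workflow` is what gets invoked with the initial state in the next step.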

Output:

[Figure: Graph formed]

__start__: The entry point of the graph where the user query enters the system. The initial state, holding only the query, is created to begin execution.
branch: The LLM analyzes the query and splits it into multiple sub-queries, each targeting a different intent. This creates logical branches for parallel exploration.
retrieve: Each branch performs its own vector search. Relevant document chunks are fetched independently and then collected into a shared result set.
merge: All retrieved content is combined and refined. Redundant information is unified to form a single, clean context.
answer: The LLM uses the merged context along with the original query to generate a grounded final response, which helps reduce hallucinations.
__end__: The final answer is returned and the graph execution completes.

Step 11: Running the Graph
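Running the workflow amounts to invoking the compiled graph with an initial state that contains only the query. The sketch below assumes `app` is the compiled graph from Step 10. `run_query` is a hypothetical helper name and the sample question is a placeholder.

```python
def run_query(app, question):
    """Send one question through the compiled graph and return the final state."""
    return app.invoke({"query": question})

# Usage, assuming `app` is the compiled graph:
#   result = run_query(app, "What is Branched RAG and why use it?")
#   print("Branches:", result["branches"])
#   print("Answer:", result["answer"])
```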

Output:

[Screenshot: Output]

Branched RAG first splits the query into multiple focused sub-queries.
Each sub-query retrieves relevant information independently, covering different aspects of the topic.
These results are then merged into a single context, which the LLM uses to generate the final answer.
Because the answer is built from multiple retrieval paths, it tends to be more complete and accurate, and less prone to hallucinations.


URL: https://www.geeksforgeeks.org/data-science/implementing-branched-rag/