
URL: https://www.geeksforgeeks.org/artificial-intelligence/rag-system-with-langchain-and-langgraph/


RAG System with LangChain and LangGraph

Last Updated : 15 Apr, 2026

In this article we will build a Retrieval-Augmented Generation (RAG) system that improves AI answers by combining a large language model with document search. The system reads documents, breaks them into smaller parts and turns them into searchable vectors. When a user asks a question, it retrieves the relevant context from the documents and uses it to produce accurate, context-aware answers. We will use:

  • LangChain loads and splits documents into chunks, creates vector embeddings to represent the text and interfaces with language models to generate answers.
  • LangGraph controls the order of the retrieval and generation steps, manages state and data flow across the system and enables modular, maintainable AI workflows.
[Figure: RAG System]

The workflow proceeds as follows:

  • Documents are loaded and read using LangChain.
  • Documents are split into smaller text chunks.
  • Each chunk is converted into a vector embedding for fast searching.
  • When a user asks a question, it is sent to the LangGraph workflow.
  • LangGraph orchestrates the process, using stored vectors and the user query.
  • The system retrieves relevant chunks using LangChain’s search.
  • The language model generates an answer from the question and the retrieved chunks, with LangChain calling the model and LangGraph sequencing the step.
  • The final answer is presented back to the user.

Step-by-Step Implementation

Let's build a RAG system with the help of LangChain and LangGraph:

Step 1: Install Dependencies

We will install the required packages: langchain, langgraph, langchain-openai, langchain-text-splitters, langchain-community, networkx and matplotlib.
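The packages can be installed with pip, for example `pip install langchain langgraph langchain-openai langchain-text-splitters langchain-community networkx matplotlib`. As a minimal sketch, the check below reports which of them are still missing from the current environment. The `REQUIRED` mapping and the `missing_packages` helper are illustrative names of our own, not part of any library.

```python
import importlib.util

# Map each pip package name to the module name used in import statements.
REQUIRED = {
    "langchain": "langchain",
    "langgraph": "langgraph",
    "langchain-openai": "langchain_openai",
    "langchain-text-splitters": "langchain_text_splitters",
    "langchain-community": "langchain_community",
    "networkx": "networkx",
    "matplotlib": "matplotlib",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    print("Install with: pip install " + " ".join(missing))
else:
    print("All dependencies are available.")
```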

Step 2: Setup API Keys

We configure the environment variable for the OpenAI API key. This is required to authenticate and access OpenAI models.

  • os.environ["OPENAI_API_KEY"]: Sets the API key as an environment variable so that LangChain/OpenAI libraries can automatically pick it up when calling the model.
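A minimal sketch of this step, assuming we prefer to prompt for the key rather than hard-code it in the script. The helper name `ensure_openai_key` and its `prompt_fn` parameter are our own and exist only for illustration.

```python
import os
from getpass import getpass


def ensure_openai_key(prompt_fn=getpass):
    """Set OPENAI_API_KEY from a hidden prompt unless it is already set."""
    if not os.environ.get("OPENAI_API_KEY"):
        os.environ["OPENAI_API_KEY"] = prompt_fn("Enter your OpenAI API key: ")
    return os.environ["OPENAI_API_KEY"]
```

Calling `ensure_openai_key()` once, before any model or embedding object is created, is enough. An existing value in the environment is left untouched.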


Step 3: Define the Application State

We define a TypedDict called State to represent the flow of data across our RAG pipeline.

  • question: The user’s query.
  • context: A list of retrieved Document objects relevant to the query.
  • answer: The final generated response from the language model.
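The state described above could be declared roughly as follows. The `try`/`except` fallback is our own addition so that the snippet still runs where LangChain is not installed; in a real project the plain import is enough.

```python
from typing import Any, List, TypedDict

try:
    from langchain_core.documents import Document
except ImportError:
    # Fallback for illustration only: lets the sketch run without LangChain.
    Document = Any


class State(TypedDict):
    question: str            # the user's query
    context: List[Document]  # chunks retrieved for the query
    answer: str              # final response from the language model
```

Each node in the graph receives the current state and returns only the keys it wants to update, for example `{"context": docs}`.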

Step 4: Load and Split Documents

The knowledge base file used in this example can be downloaded from here.

We will unzip the archive, load the documents and split them into smaller chunks.

  • RecursiveCharacterTextSplitter: Breaks down long documents into chunks (chunk_size=1000) with overlap (chunk_overlap=200) to maintain context.
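The sketch below shows one way to do this, assuming the downloaded archive contains plain-text (`.txt`) files; for other formats a different loader would be needed. The function names and the paths in the usage note are hypothetical. The LangChain imports are deferred into `load_and_split` so that the unzip helper works on its own.

```python
import zipfile
from pathlib import Path


def unzip_knowledge_base(zip_path, extract_dir):
    """Extract the archive and return the paths of the extracted files."""
    extract_dir = Path(extract_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(extract_dir)
    return sorted(p for p in extract_dir.rglob("*") if p.is_file())


def load_and_split(extract_dir, chunk_size=1000, chunk_overlap=200):
    """Load the .txt files (an assumption) and split them into chunks."""
    from langchain_community.document_loaders import DirectoryLoader, TextLoader
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    loader = DirectoryLoader(
        str(extract_dir), glob="**/*.txt", loader_cls=TextLoader
    )
    docs = loader.load()
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size, chunk_overlap=chunk_overlap
    )
    return splitter.split_documents(docs)
```

Typical usage would be `unzip_knowledge_base("knowledge_base.zip", "kb")` followed by `all_splits = load_and_split("kb")`. The 200-character overlap means neighbouring chunks share some text, so a sentence cut at a chunk boundary is still seen in full by at least one chunk.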

Step 5: Create Embeddings and Vector Store

We convert the document chunks into embeddings and store them in a vector database for similarity search.

  • OpenAIEmbeddings: Generates embeddings using OpenAI’s embedding model text-embedding-3-large.
  • InMemoryVectorStore: A lightweight in-memory store for embeddings.
  • add_documents: Stores the vector representations of all document chunks.
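A minimal sketch of this step. The `build_vector_store` helper and its optional `embeddings` argument are our own; accepting an embedding object makes it possible to swap in a fake model for offline experiments, while the default follows the article and needs `OPENAI_API_KEY` to be set.

```python
def build_vector_store(chunks, embeddings=None):
    """Embed the chunks and index them in an in-memory vector store."""
    from langchain_core.vectorstores import InMemoryVectorStore

    if embeddings is None:
        # Default from the article; calls the OpenAI API.
        from langchain_openai import OpenAIEmbeddings

        embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

    vector_store = InMemoryVectorStore(embeddings)
    # add_documents embeds every chunk and returns the IDs it assigned.
    document_ids = vector_store.add_documents(documents=chunks)
    return vector_store, document_ids
```

Printing the first few entries of `document_ids` is a quick way to confirm that the chunks were indexed. Because the store lives in memory, it has to be rebuilt each time the program starts.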

Output:

[Figure: List of document chunk IDs from the model]

Step 6: Define Custom Prompt and Initialize LLM Model

We define a custom prompt template that guides the LLM to answer user questions clearly and concisely from the retrieved context. We then initialize the OpenAI GPT-4.1 chat model with a temperature of 0.3, which keeps answers focused while still allowing a little variation in wording.
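As a sketch, the prompt and model could be set up as below. The wording of `PROMPT_TEMPLATE` is our own placeholder, since the exact prompt text is not given here, and `build_prompt_and_llm` is a hypothetical helper name.

```python
# Placeholder wording: an assumption, not the article's exact prompt.
PROMPT_TEMPLATE = """You are an assistant for question-answering tasks.
Use the retrieved context below to answer the question clearly and concisely.
If the context does not contain the answer, say that you don't know.

Context:
{context}

Question: {question}

Answer:"""


def build_prompt_and_llm(model="gpt-4.1", temperature=0.3):
    """Create the chat prompt and the OpenAI chat model."""
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_openai import ChatOpenAI

    prompt = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
    llm = ChatOpenAI(model=model, temperature=temperature)
    return prompt, llm
```

The two placeholders, `{context}` and `{question}`, are the keys the generation step must supply when it invokes the prompt.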

Step 7: Define Workflow Functions

Define individual LangGraph node functions for each pipeline step:

  • retrieve(state): Performs a similarity search on the vector store and returns the 5 document chunks most relevant to the question.
  • generate(state): Formats the prompt with the question and the retrieved context, invokes the LLM and returns the generated answer.
  • classify(state): A placeholder that identifies "advanced" questions but currently passes the question through unchanged.
  • refine(state): Appends a refinement note to the generated answer for clarity.
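The four functions might look roughly like this. The `make_nodes` factory is our own device for passing in the vector store, prompt and model explicitly; the keyword test in `classify` and the note text in `refine` are assumptions, since the description above does not specify them.

```python
def make_nodes(vector_store, prompt, llm, k=5):
    """Build the node functions around the given store, prompt and model."""

    def retrieve(state):
        # Top-k chunks most similar to the question.
        docs = vector_store.similarity_search(state["question"], k=k)
        return {"context": docs}

    def generate(state):
        # Join the chunk texts, fill the prompt and call the model.
        docs_content = "\n\n".join(doc.page_content for doc in state["context"])
        messages = prompt.invoke(
            {"question": state["question"], "context": docs_content}
        )
        response = llm.invoke(messages)
        return {"answer": response.content}

    def classify(state):
        # Placeholder: the flag is computed but not acted on yet.
        _is_advanced = "advanced" in state["question"].lower()
        return {"question": state["question"]}

    def refine(state):
        # Hypothetical note text.
        return {"answer": state["answer"] + "\n\n(Refined for clarity.)"}

    return retrieve, generate, classify, refine
```

Each function returns a partial state update, which LangGraph merges into the shared state before the next node runs.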

Step 8: Build the LangGraph Workflow

We define the pipeline as a graph using LangGraph.

  • StateGraph(State): Defines a graph where nodes pass along State.
  • add_sequence([retrieve, generate]): Runs retrieval first, then generation.
  • add_edge(START, "retrieve"): Connects the start of the graph to the first node.
  • compile(): Finalizes the graph for execution.
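Put together, the calls listed above amount to the following sketch. `build_graph` is a hypothetical wrapper; it expects the State TypedDict from Step 3 and node functions named `retrieve` and `generate`, because `add_sequence` derives node names from the function names.

```python
def build_graph(state_schema, retrieve, generate):
    """Wire retrieve -> generate into a compiled LangGraph workflow."""
    from langgraph.graph import START, StateGraph

    builder = StateGraph(state_schema)
    # Adds both nodes and an edge from the first to the second.
    builder.add_sequence([retrieve, generate])
    # Entry point: execution begins at the "retrieve" node.
    builder.add_edge(START, "retrieve")
    return builder.compile()
```

The `classify` and `refine` functions are not part of this sequence. They could be added to the list later if the pipeline needs those extra steps.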

Step 9: Visualize the LangGraph Workflow

Using NetworkX and Matplotlib, we will visualize our workflow:

  • G = nx.DiGraph(): Creates an empty directed graph where edges have direction, modeling workflow steps.
  • nx.draw_networkx_nodes(...): Draws graph nodes with specified colors, sizes and borders.
  • nx.draw_networkx_edges(...): Draws arrows between nodes with custom style and curvature for clarity.
  • plt.title(), plt.tight_layout(), plt.axis('off'): Sets title, adjusts layout and hides axes for a clean plot.
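A minimal sketch of such a plot. The edge list, colors, sizes, output file name and the `draw_workflow` helper are illustrative choices of our own rather than the exact values behind the figure.

```python
# Edges mirror the pipeline: START -> retrieve -> generate -> END.
WORKFLOW_EDGES = [
    ("START", "retrieve"),
    ("retrieve", "generate"),
    ("generate", "END"),
]


def draw_workflow(edges=WORKFLOW_EDGES, path="workflow.png"):
    """Draw the workflow as a left-to-right directed graph and save it."""
    import matplotlib.pyplot as plt
    import networkx as nx

    G = nx.DiGraph()
    G.add_edges_from(edges)
    # Place the nodes on one horizontal line, in insertion order.
    pos = {node: (i, 0) for i, node in enumerate(G.nodes)}

    plt.figure(figsize=(8, 2.5))
    nx.draw_networkx_nodes(
        G, pos, node_color="lightblue", node_size=3000, edgecolors="black"
    )
    nx.draw_networkx_edges(
        G, pos, arrows=True, arrowstyle="-|>", arrowsize=20,
        connectionstyle="arc3,rad=0.1", node_size=3000,
    )
    nx.draw_networkx_labels(G, pos, font_size=10)
    plt.title("LangGraph Workflow")
    plt.axis("off")
    plt.tight_layout()
    plt.savefig(path)
    plt.close()
    return G
```

Passing `node_size` to `draw_networkx_edges` as well keeps the arrowheads from being hidden underneath the large node circles.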

Output:

[Figure: LangGraph Workflow]

Visualizing the workflow with NetworkX and Matplotlib makes the structure of the application easier to communicate and understand.

Step 10: Run the System

We take user input, pass it through the graph and display the answer.

  • graph.invoke(): Executes the graph pipeline (retrieve → generate) and returns the final state.
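A small sketch of this step, with `ask` as a hypothetical helper name. It assumes the compiled graph returns a state dictionary containing an "answer" key, as defined in Step 3.

```python
def ask(graph, question):
    """Run one question through the compiled graph and return the answer."""
    result = graph.invoke({"question": question})
    return result["answer"]
```

In an interactive session this could be used as `print(ask(graph, input("Ask a question: ")))`. The returned state also holds the retrieved chunks under "context", which is useful when checking what the answer was based on.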

Output:

[Figure: Model Output]

Advantages

This system offers the following advantages:

  • Grounded Responses: Unlike vanilla LLMs that may hallucinate, RAG grounds the model’s answers in actual documents, improving factual accuracy.
  • Domain Adaptability: We can easily load custom datasets (PDFs, web pages, internal notes) and make the system specialized for finance, healthcare, legal, research, etc.
  • Up-to-date Knowledge: Answers draw on the document store rather than only on the model's training data, so adding or updating documents lets the system work around the LLM's fixed training cutoff.
  • Efficient Context Management: By using document chunking and vector search, the model only processes the most relevant text instead of the entire dataset, reducing token costs and speeding up inference.