Building a Language Model Application with LangChain: A Beginner's Guide
Introduction to LLMs
With generative AI sweeping the world, learning to build applications with large language models (LLMs) has become a valuable skill. LLMs are machine learning models, trained on large amounts of text, that can interpret and generate human-like text.
Language models have become central to many applications due to their versatility. They can summarize lengthy documents, translate languages, provide detailed answers to complex questions, and even assist in coding tasks. By leveraging these capabilities, developers can create more intuitive and responsive applications that enhance user experiences.
Now that we understand the significance of language models in modern applications, let’s start learning what LangChain is and how it can simplify building with LLMs.
What is LangChain?
Given the many applications and the increasing reliance on AI technologies, understanding how to build and implement LLM-based applications is becoming an essential skill for developers. A great tool for this is LangChain. LangChain simplifies the process of building applications with LLMs. It provides an easy-to-use interface for integrating various data sources, APIs, and pre-trained language models, allowing developers to create sophisticated AI-driven applications with minimal effort.
There are many advantages to using LangChain. Here are two:
- It provides pre-built modules and templates that reduce the complexity of implementing advanced features. This means we can focus more on the creative and strategic aspects of our projects rather than getting bogged down by technical details.
- It lets us quickly prototype and iterate on our ideas, which reduces development time.
In this tutorial, we will practice using LangChain to build an application that summarizes PDFs.
Build a PDF Summarizer with LangChain
To understand how LangChain is used in developing LLM-based applications, let’s build a Gen-AI-powered PDF summary application. First, we begin by setting up our environment.
Set up the Development Environment
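To build this application, make sure you have Python installed on your system. If Python is not installed, install it first (see Installing Python 3). We will also need the Streamlit, LangChain, and pypdf packages. The backend code later in this tutorial additionally imports from langchain_community and relies on FAISS, a HuggingFace sentence-transformers model, and the OpenAI client, so we install those packages as well. We can install everything by executing the following command:
    pip install streamlit langchain langchain-community pypdf faiss-cpu sentence-transformers openai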
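Before moving on, it can help to confirm that the packages are actually importable in the active Python environment. The snippet below is a minimal sketch that uses only the standard library; the helper name `missing_packages` is hypothetical, and the list of modules checked is simply the three packages named above.

```python
import importlib.util


def missing_packages(module_names):
    """Return the names from module_names that cannot be found for import."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Top-level module names of the packages this tutorial installs
    required = ["streamlit", "langchain", "pypdf"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

If anything is reported as missing, rerunning the pip command in the same environment that runs the script is usually the first thing to try.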
Now we have everything ready to start building.
Build a basic Frontend
The first step in building this application is to build the front end. For this, we will use Streamlit to quickly build a simple interface for our PDF Summarizer.
    import os

    import streamlit as st

    # Configure the page and render the static parts of the interface
    st.set_page_config(page_title="PDF Summarizer")
    st.title("PDF Summarizer")
    st.write("Summarize your PDF files using the power of LLMs")
    st.divider()

    # File upload widget (PDF only) and the button that triggers summarization
    pdf = st.file_uploader("Upload your PDF", type="pdf")
    submit = st.button("Generate Summary")
This gives the following output:
[Image: Frontend for PDF Summarizer built with Streamlit]
Now that we have the interface for our application, let’s add functionality to it with LangChain and OpenAI’s GPT-3.5 model.
Backend using LangChain
We will begin by importing all the modules and functions necessary for our project. These include:
- CharacterTextSplitter, HuggingFaceEmbeddings, and FAISS, which are used to split the text of the uploaded PDF into chunks and create its knowledge base using embeddings.
- load_qa_chain, openai, ChatOpenAI, and get_openai_callback, along with PdfReader from pypdf, which are used to integrate the GPT-3.5 model with the knowledge base generated from the uploaded file and return its summary.
To import them, the following code is used:
    # Note: these import paths follow LangChain 0.x releases and may differ in newer versions
    from langchain.text_splitter import CharacterTextSplitter
    from langchain.embeddings import HuggingFaceEmbeddings
    from langchain_community.vectorstores import FAISS
    from langchain.chains.question_answering import load_qa_chain
    from langchain.llms import openai
    from langchain_community.chat_models import ChatOpenAI
    from langchain.callbacks import get_openai_callback
    from pypdf import PdfReader
After importing the required modules, we move on to building the backend functionalities. First, we build the process_text() function that splits the input text into smaller chunks using the CharacterTextSplitter(), ensuring each chunk is around 1000 characters. It then converts these chunks into embeddings using a pre-trained model from HuggingFace (sentence-transformers/all-MiniLM-L6-v2). Finally, it builds a searchable FAISS knowledge base from these embeddings and returns it.
    def process_text(text):
        # Split the text into overlapping chunks of roughly 1000 characters
        text_splitter = CharacterTextSplitter(
            separator="\n",
            chunk_size=1000,
            chunk_overlap=200,
            length_function=len,
        )
        chunks = text_splitter.split_text(text)

        # Embed each chunk with a pre-trained sentence-transformers model
        embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')

        # Build a searchable FAISS index from the chunk embeddings
        knowledgeBase = FAISS.from_texts(chunks, embeddings)
        return knowledgeBase
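To build some intuition for what chunk_size and chunk_overlap mean, here is a simplified sketch of overlapping chunking in plain Python. It is not how CharacterTextSplitter works internally (that class splits on the separator first and then merges the pieces); the function name `sliding_window_chunks` and its fixed-window approach are assumptions made purely for illustration.

```python
def sliding_window_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Cut text into fixed-size windows where consecutive windows share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    sample = "".join(str(i % 10) for i in range(2500))
    pieces = sliding_window_chunks(sample)
    print([len(piece) for piece in pieces])
    # The end of one chunk repeats at the start of the next one
    print(pieces[0][-200:] == pieces[1][:200])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help the later similarity search.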
Next, we build the summarizer() function. This function takes the content of the uploaded PDF file and extracts its text. It processes this text into chunks and embeddings using the process_text() function to create a knowledge base. The function then formulates a query to summarize the PDF content, searches for relevant text chunks using similarity search, and uses an OpenAI language model (gpt-3.5-turbo-16k) to generate a concise summary, which is then returned.
    def summarizer(pdf):
        response = ""
        pdf_reader = PdfReader(pdf)
        text = ""

        # Extract text from each page of the PDF
        for page in pdf_reader.pages:
            text += page.extract_text() or ""

        knowledgeBase = process_text(text)
        query = "Summarize the content of the uploaded PDF file in approximately 5-8 sentences."

        if query:
            # Retrieve the chunks most similar to the query
            docs = knowledgeBase.similarity_search(query)

            OpenAIModel = "gpt-3.5-turbo-16k"
            llm = ChatOpenAI(model=OpenAIModel, temperature=0.1)

            # Load the question and answer chain
            chain = load_qa_chain(llm, chain_type='stuff')

            # Run the chain through the ChatGPT model and report token usage and cost
            with get_openai_callback() as cost:
                response = chain.run(input_documents=docs, question=query)
                print(cost)

        return response
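The two ideas at the heart of summarizer(), retrieving the most relevant chunks and then "stuffing" them into a single prompt, can be sketched without any libraries. The code below is an illustration under stated assumptions, not LangChain's or FAISS's implementation: it ranks by cosine similarity over toy vectors (the real FAISS index may use a different metric, such as Euclidean distance), and the function names and prompt layout are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, chunk_vecs, chunks, k=4):
    """Return the k chunks whose vectors are most similar to the query vector."""
    scored = sorted(
        zip(chunks, chunk_vecs),
        key=lambda pair: cosine_similarity(query_vec, pair[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in scored[:k]]


def stuff_prompt(chunks, question):
    """Concatenate all retrieved chunks into one context block followed by the question."""
    context = "\n\n".join(chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings"; a real model produces hundreds of dimensions
    chunks = ["cats", "dogs", "cars"]
    vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    best = top_k_chunks([1.0, 0.0], vectors, chunks, k=2)
    print(stuff_prompt(best, "Summarize the content."))
```

Because the "stuff" approach places every retrieved chunk in one prompt, the total must fit within the model's context window, which is one plausible reason for choosing a 16k-context model here.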
Finally, we will need to set our OpenAI API key to use the ChatGPT 3.5 model. When the Generate Summary button is clicked, the summarizer() function is called and then the summary is displayed using the code below:
    os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_KEY"

    # Call the `summarizer()` function when the `Generate Summary` button is clicked
    # and a PDF has actually been uploaded
    if submit and pdf is not None:
        response = summarizer(pdf)

        # Display the returned summary
        st.subheader("PDF Summary")
        st.write(response)
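Hardcoding the key in the script is convenient for a quick local experiment, but it is easy to commit the key to version control by accident. One common alternative is to set OPENAI_API_KEY in the shell before launching the app and have the script only check that it is present. The helper below is a minimal sketch of that check; the name `require_api_key` and its parameters are hypothetical.

```python
import os


def require_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key stored under `name`, or raise a clear error if it is missing."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Export it in your shell before starting the app."
        )
    return key


if __name__ == "__main__":
    try:
        require_api_key()
        print("API key found.")
    except RuntimeError as error:
        print(error)
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.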
Run and Deploy the Application
To run this Streamlit-based application, save the code in a file named app.py and run the following command:
    streamlit run app.py
Note: While running a Streamlit application with document input, you may run into the following error:
    AxiosError: Request failed with status code 403
In this case, running the application with the command below solves the issue. Note that this turns off Streamlit's XSRF protection, so it is best kept to local development:
    streamlit run app.py --server.enableXsrfProtection false
Furthermore, the application can be deployed using the Streamlit Community Cloud for free. This option is available on the localhost page, in the top right corner.
[Image: Deploy option highlighted in the top right corner of the localhost page]
On clicking Deploy, a dialog box opens as shown below. Click Deploy Now under Streamlit Community Cloud and select a Streamlit domain to host the application.
[Image: Free Streamlit Community Cloud Deployment]
By following these steps, we have used LangChain to build an LLM-based PDF Summarizer application, run it locally, and deployed it.
Conclusion
We learned the basics of LangChain by developing a PDF Summarizer. We did this by following these steps:
- Set up our development environment, ensuring we had all the necessary tools and dependencies.
- Implemented basic functionality, creating a simple yet powerful frontend with Streamlit.
- Developed a backend powered by LangChain to handle PDF text extraction and summarization.
- Discussed deploying the application using Streamlit.
By following these steps, you’ve seen how LangChain can streamline the development of applications that harness the capabilities of language models. The PDF summarizer is just the beginning. LangChain’s flexibility and power allow for the creation of various AI-driven applications, from chatbots and virtual assistants to content generation tools and beyond.
If you find this article helpful, do check out Codecademy’s Collection of AI Articles for similar articles.
'The Codecademy Team, composed of experienced educators and tech experts, is dedicated to making tech skills accessible to all. We empower learners worldwide with expert-reviewed content that develops and enhances the technical skills needed to advance and succeed in their careers.'
