URL: https://thenewstack.io/building-gpt-applications-on-open-source-langchain-part-2/

Building GPT Applications on Open Source LangChain, Part 2 - The New Stack



Building GPT Applications on Open Source LangChain, Part 2

We’ll use the fast-rising LLM application framework for a practical example of how to use a GPT to help answer a question from a PDF document.
Jun 16th, 2023 8:15am by Akmal Chaudhri
SingleStore sponsored this post. Insight Partners is an investor in SingleStore and TNS.
This is the second of two articles.

In the previous article, we discussed three considerations for developers when building GPT applications with an open source stack, such as LangChain. Let’s now use LangChain for a practical example where we want to store and analyze PDF documents. We’ll obtain a PDF document, divide it into smaller parts, save the document text and its vector representations (embeddings*) in a database system and then query it. We’ll also use a GPT to help answer a question.

*In a GPT, an embedding is simply a numerical representation of a word or phrase. Vectors represent the semantic meaning of words and phrases in a way that a machine-learning model can understand.
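
To make the idea of an embedding more concrete, here is a minimal sketch that compares toy vectors with cosine similarity. The three-dimensional vectors and the words attached to them are invented for illustration; real embeddings come from a model and have far more dimensions.

```python
import math

def dot(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Hypothetical 3-dimensional "embeddings", made up for this example only.
toy_embeddings = {
    "database": [0.9, 0.1, 0.0],
    "datastore": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
}

sim_related = cosine_similarity(toy_embeddings["database"], toy_embeddings["datastore"])
sim_unrelated = cosine_similarity(toy_embeddings["database"], toy_embeddings["banana"])

print(f"database vs datastore: {sim_related:.3f}")
print(f"database vs banana:    {sim_unrelated:.3f}")
```

Vectors that point in a similar direction score close to 1, while unrelated ones score close to 0. That is the property a similarity search over embeddings relies on.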

Create a SingleStoreDB Cloud Account

First, sign up for a free SingleStoreDB Cloud account. Once logged in, select CLOUD > Create new workspace group from the left-hand navigation pane. Next, choose Create Workspace and just work through the wizard. Here are the recommended settings for this example:

Create Workspace Group

Workspace Group Name: LangChain Demo Group
Cloud Provider: AWS
Region: US East 1 (N. Virginia)

Click Next.

Create Workspace

Workspace Name: langchain-demo
Size: S-00

Click Create Workspace. Once the workspace is created and available, from the left-hand navigation pane, select DEVELOP > SQL Editor to create a new database, as follows:
CREATE DATABASE IF NOT EXISTS pdf_db;

Create a Notebook

From the left-hand navigation pane, select DEVELOP > Notebooks. In the top right of the web page, select New Notebook > New Notebook, as shown in Figure 1.
We’ll call the notebook langchain_demo. Select a Blank notebook template from the available options. We’ll also select the Connection and Database using the drop-down menus above the notebook, as shown in Figure 2.

Figure 2. Connection and Database

Fill out the Notebook

First, we’ll install some libraries:
!pip install langchain --quiet
!pip install openai --quiet
!pip install pdf2image --quiet
!pip install tabulate --quiet
!pip install tiktoken --quiet
!pip install unstructured --quiet

Next, we’ll read in a PDF document. This is an article by Neal Leavitt titled “Whatever Happened to Object-Oriented Databases?” OODBs were an emerging technology during the late 1980s and early 1990s. We’ll add `leavcom.com` to the firewall by selecting the Edit Firewall option in the top right. Once the address has been added to the firewall, we’ll read the PDF file:
from langchain.document_loaders import OnlinePDFLoader
loader = OnlinePDFLoader("http://leavcom.com/pdf/DBpdf.pdf")
data = loader.load()

LangChain’s OnlinePDFLoader makes it easier to read a PDF file from a URL. Next, we’ll get some data on the document:
from langchain.text_splitter import RecursiveCharacterTextSplitter

print(f"You have {len(data)} document(s) in your data")
print(f"There are {len(data[0].page_content)} characters in your document")

The output should be:
You have 1 document(s) in your data
There are 13040 characters in your document

We’ll now split the document into pages (chunks) of up to 2,000 characters each, giving us seven pages:
text_splitter = RecursiveCharacterTextSplitter(chunk_size = 2000, chunk_overlap = 0)
texts = text_splitter.split_documents(data)

print(f"You have {len(texts)} pages")
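
As a rough check on the page count, here is a simplified sketch of fixed-size splitting. It is not how `RecursiveCharacterTextSplitter` works internally: that splitter prefers to break at separators such as paragraph breaks and spaces, so its pages can be shorter than the limit. The helper `split_fixed` and the stand-in text are hypothetical.

```python
import math

def split_fixed(text, chunk_size):
    """Cut text into consecutive slices of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

# Stand-in text with the same length as the document reported above.
sample_text = "x" * 13040
chunks = split_fixed(sample_text, 2000)

print(len(chunks))              # number of chunks from naive slicing
print(len(chunks[-1]))          # length of the final, shorter chunk
print(math.ceil(13040 / 2000))  # fewest chunks possible at this size
```

With 13,040 characters and a 2,000-character limit, at least seven chunks are needed, which is consistent with the seven pages reported for this document.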

Next, we’ll create a table to store the text and embeddings. We can do this directly using the `%%sql` magic command:
%%sql

USE pdf_db;
DROP TABLE IF EXISTS pdf_docs;
CREATE TABLE IF NOT EXISTS pdf_docs (
 id INT PRIMARY KEY,
 text TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
 embedding BLOB
);

To use Python code to connect to our database, we can use the built-in `connection_url`, as follows:
from sqlalchemy import create_engine

# The Engine.execute() calls below need SQLAlchemy 1.x (removed in 2.0).
db_connection = create_engine(connection_url)

We’ll set our OpenAI API Key:
import openai
openai.api_key = "OpenAI API Key"

and use LangChain’s `OpenAIEmbeddings`:
from langchain.embeddings import OpenAIEmbeddings
embedder = OpenAIEmbeddings(openai_api_key = openai.api_key)

Now we are ready to obtain the vector embeddings and store them in the database system:
db_connection.execute("TRUNCATE TABLE pdf_docs")

for i, document in enumerate(texts):
 text_content = document.page_content

 embedding = embedder.embed_documents([text_content])[0]

 stmt = """
 INSERT INTO pdf_docs (
 id,
 text,
 embedding
 )
 VALUES (
 %s,
 %s,
 JSON_ARRAY_PACK_F32(%s)
 )
 """

 db_connection.execute(stmt, (i+1, text_content, str(embedding)))
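
The `JSON_ARRAY_PACK_F32` call in the INSERT statement above turns the JSON array text into a compact binary value for the BLOB column. The sketch below imitates that idea in plain Python, assuming the packed form is a sequence of little-endian 32-bit floats. The helpers `pack_f32` and `unpack_f32` are hypothetical and are not part of SingleStoreDB or LangChain.

```python
import json
import struct

def pack_f32(values):
    """Pack a list of numbers as consecutive little-endian 32-bit floats."""
    return struct.pack(f"<{len(values)}f", *values)

def unpack_f32(blob):
    """Inverse of pack_f32: recover the floats from the packed bytes."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))

# Toy vector whose values are exactly representable as 32-bit floats.
embedding = [0.25, -0.5, 1.0]

# str() of a list of floats is valid JSON array text, which is what the
# loop above passes as the third parameter.
as_json_text = str(embedding)
blob = pack_f32(json.loads(as_json_text))

print(as_json_text)
print(len(blob))         # 4 bytes per element
print(unpack_f32(blob))
```

Under this assumption, each element costs four bytes, so the stored value is much smaller than the equivalent JSON text for a long vector.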

We truncate the table to ensure that we start with an empty table. Then we iterate through the pages of text, obtain the embeddings from OpenAI, and store the text and embeddings in the database table. We can now ask a question, as follows:
query_text = "Will object-oriented databases be commercially successful?"

query_embedding = embedder.embed_query(query_text)

stmt = """
 SELECT
 text,
 DOT_PRODUCT_F32(JSON_ARRAY_PACK_F32(%s), embedding) AS score
 FROM pdf_docs
 ORDER BY score DESC
 LIMIT 1
"""

results = db_connection.execute(stmt, (str(query_embedding),))

for row in results:
 print(row[0])
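
The SQL query above scores every stored row against the question and keeps the best one. The same ranking logic can be sketched in plain Python with toy data. The rows, vectors and the helper `top_match` are made up for illustration and stand in for the `pdf_docs` table.

```python
def dot(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def top_match(query_vector, rows):
    """Return (text, score) for the row with the highest dot product.

    rows is an iterable of (text, vector) pairs, mirroring the text and
    embedding columns; this plays the role of ORDER BY score DESC LIMIT 1.
    """
    scored = ((text, dot(query_vector, vector)) for text, vector in rows)
    return max(scored, key=lambda pair: pair[1])

# Hypothetical rows with invented 3-dimensional vectors.
rows = [
    ("chunk about relational databases", [0.1, 0.9, 0.0]),
    ("chunk about object databases", [0.8, 0.2, 0.1]),
    ("chunk about the author", [0.0, 0.1, 0.9]),
]
query_vector = [0.9, 0.1, 0.0]

best_text, best_score = top_match(query_vector, rows)
print(best_text, round(best_score, 2))
```

In the notebook, this work is done inside the database, so only the winning row travels back to the Python client.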

Here we convert the question into a vector embedding, compute its `DOT_PRODUCT` with each stored embedding and return only the highest-scoring row. Finally, we can use a GPT to provide an answer, based on the earlier question:
prompt = f"The user asked: {query_text}. The most similar text from the document is: {row[0]}"

response = openai.ChatCompletion.create(
 model="gpt-3.5-turbo",
 messages=[
 {"role": "system", "content": "You are a helpful assistant."},
 {"role": "user", "content": prompt}
 ]
)

print(response['choices'][0]['message']['content'])

Here is some example output:

Based on the information provided in the document, it seems that object-oriented databases are not expected to be commercially successful in the near future. While they are gaining some popularity in niche markets such as CAD and telecommunications, relational databases continue to dominate the market and are expected to do so for the foreseeable future. IDC predicts that the growth rate for relational databases will be significantly higher than that of OO databases through 2004. However, OO databases still have their place in certain niche markets.
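
The prompt in the code above is built from `row[0]`, the loop variable left over from printing the query results. A small helper that takes the question and the retrieved text as explicit arguments is one way to make that dependency visible. This is only a sketch; `build_messages` is a hypothetical name, and the wording of the prompt is copied from the example above.

```python
def build_messages(question, context, system_prompt="You are a helpful assistant."):
    """Build the chat messages list from a question and the retrieved text."""
    prompt = (
        f"The user asked: {question}. "
        f"The most similar text from the document is: {context}"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt},
    ]

# Placeholder context; in the notebook this would be the text column of
# the highest-scoring row.
messages = build_messages(
    "Will object-oriented databases be commercially successful?",
    "placeholder for the retrieved chunk",
)

for message in messages:
    print(message["role"], "->", message["content"])
```

The resulting list has the same shape as the `messages` argument passed to the chat completion call above, so it could be supplied there directly.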

Summary

In this example, we saw the benefits of LangChain in the application development process. We also saw how easily we can convert documents from one format to another, store the content in a database system, generate vector embeddings and ask questions about the data stored in the database system. We also have the full power of SQL available if we are interested in performing additional query operations on the data.

I will host a workshop on June 22 and will go through building a ChatGPT application using LangChain. I hope you can join. Sign up here.
Akmal Chaudhri helps build global developer communities and raise awareness of technology through presentations and technical writing. He has held roles as a developer, consultant, product strategist, evangelist, technical writer and technical trainer with several Blue Chip companies and big...
TNS owner Insight Partners is an investor in: Pragma, SingleStore, OpenAI.