
URL: https://thenewstack.io/build-an-advanced-rag-application-using-myscaledb-and-llamaindex/

Build an Advanced RAG Application Using MyScaleDB and LlamaIndex - The New Stack


2024-05-20 06:45:08
AI / Large Language Models

Build an Advanced RAG Application Using MyScaleDB and LlamaIndex

To enhance performance when using RAG with LLMs, we use advanced techniques such as reranking, preprocessing and filtered queries.
May 20th, 2024 6:45am by Usama Jamil
Featured image from maphke on Shutterstock.
MyScale sponsored this post.

Large language models (LLMs) have brought immense value with their ability to understand and generate human-like text. However, these models also come with notable challenges.

They are trained on vast datasets, and the cost and time this demands make it nearly impossible to retrain them regularly. As a result, they often lack the latest data, which can lead to inaccurate answers when they are queried about unfamiliar topics. This phenomenon is known as “hallucination,” and it can degrade the performance of applications and raise concerns about their reliability and authenticity.

To overcome hallucination, several techniques are employed, with retrieval-augmented generation (RAG) being the most widely used due to its efficiency and performance.

I’ll show how to design a complete advanced RAG system that can be used in production environments.

What Is Retrieval-Augmented Generation

RAG keeps LLM responses current by dynamically retrieving relevant external data during the model’s response generation phase. This allows the LLM to access the most recent information without frequent retraining, and it makes the model’s responses more accurate and contextually appropriate.


The process begins with a user query, which is transformed into embeddings via an embedding model to capture its semantic essence. These embeddings then undergo a similarity search against vectors in a knowledge base or vector database to identify the most relevant information. The top “K” results from this search are passed to the LLM as additional context.

By processing both the original query and this supplementary data, the LLM is equipped to generate more accurate and contextually relevant responses. This not only mitigates the issue of hallucinations but also ensures the model’s outputs remain up to date and reliable without frequent retraining.
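The retrieval flow described above can be sketched in plain Python. This is a toy illustration, not the article's implementation: the three-dimensional vectors stand in for real embeddings, and the names `retrieve_top_k` and `build_prompt` are hypothetical helpers introduced only for this example.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve_top_k(query_vec, knowledge_base, k=2):
    """Return the texts of the k entries most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in knowledge_base]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

def build_prompt(question, context_chunks):
    """Combine the retrieved chunks and the user question into one LLM prompt."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"

# Toy knowledge base: (text, embedding). Real embeddings come from an embedding model.
KNOWLEDGE_BASE = [
    ("Lightweight running shoe with foam midsole", [0.9, 0.1, 0.0]),
    ("Basketball jersey, sleeveless", [0.1, 0.9, 0.0]),
    ("Trail running shoe with rugged outsole", [0.8, 0.0, 0.2]),
]

query_vec = [1.0, 0.0, 0.0]  # stand-in for the embedded user query
top_chunks = retrieve_top_k(query_vec, KNOWLEDGE_BASE, k=2)
prompt = build_prompt("I want a few running shoes", top_chunks)
print(prompt)
```

In a real system the similarity search runs inside the vector database and the prompt is sent to the LLM; the sketch only shows how the query, the top “K” results and the final prompt relate to each other.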

What Is LlamaIndex

LlamaIndex, previously known as GPT Index, acts like glue that helps you connect LLMs and knowledge bases. It provides built-in methods to fetch data from different sources and use it in your RAG applications. These sources include a variety of file formats, such as PDFs and PowerPoint files, as well as applications like Notion and Slack and even databases like Postgres and MyScaleDB.

LlamaIndex provides important tools that help in collecting, organizing, retrieving and integrating data with various application frameworks. It makes your data easier to access and use, allowing you to build powerful, customized LLM applications and workflows.


Some of the main components of LlamaIndex include:

  1. Data connectors: These allow LlamaIndex to access a variety of data sources. Whether connecting to a local file system, a cloud-based storage service or a database, these connectors facilitate the retrieval of necessary information.
  2. Index: The Index in LlamaIndex is a crucial component that organizes data in a way that makes it quickly accessible. It categorizes the information from all connected sources into a structured format that is easy to search through. This helps speed up the retrieval process and ensures that the most relevant information is available for the LLM to use when needed.
  3. Query engine: This component is designed to efficiently search through the connected data sources. It processes your queries, finds relevant information and retrieves it so that the LLM can use it for generating responses.

Each component of LlamaIndex plays a key role in enhancing the capabilities of RAG applications by ensuring that they can access and use a wide range of data efficiently.

An Overview of MyScaleDB

MyScaleDB is an open source SQL vector database specially designed and optimized to manage large volumes of data for AI applications. It’s built on top of ClickHouse, a SQL database, combining the capacity for vector similarity search with full SQL support.

Unlike specialized vector databases, MyScaleDB seamlessly integrates vector search algorithms with structured databases, allowing both vectors and structured data to be managed together in the same database. This integration offers advantages like simplified communication, flexible metadata filtering, support for SQL and vector joint queries, and compatibility with established tools typically used with versatile general-purpose databases.

Integrating MyScaleDB into RAG applications enables more complex data interactions, which directly influences the quality of the generated content.
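To make the idea of a joint SQL and vector query concrete, here is a minimal sketch that builds one as a string. It assumes MyScaleDB's `distance()` SQL function and a vector index on the `vector` column. The table name `default.nike_catalog`, the column names and the helper `build_filtered_vector_query` are all hypothetical, and the `%(category)s` placeholder assumes client-side parameter binding in `clickhouse-connect`.

```python
def build_filtered_vector_query(table, query_vector, category, top_k=5):
    """Build a SQL string that combines a metadata filter with a vector search.

    Returns the SQL text and a parameter dict for the structured filter.
    Table and column names are illustrative, not taken from the article.
    """
    vector_literal = "[" + ", ".join(f"{x:.6f}" for x in query_vector) + "]"
    sql = (
        f"SELECT id, text, distance(vector, {vector_literal}) AS dist "
        f"FROM {table} "
        "WHERE category = %(category)s "
        f"ORDER BY dist ASC LIMIT {int(top_k)}"
    )
    return sql, {"category": category}

sql, params = build_filtered_vector_query(
    "default.nike_catalog", [0.1, 0.2], "Running", top_k=3
)
print(sql)
# With a connected client this could be run as: client.query(sql, parameters=params)
```

The point of the sketch is that the `WHERE` clause and the vector distance live in the same statement, so structured filtering and similarity search happen in one round trip.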

RAG With LlamaIndex and MyScaleDB: a Step-by-Step Guide


To build the RAG application, first we need to create an account on MyScaleDB that will be used as a knowledge base. MyScaleDB offers every new user free storage for up to 5 million vectors, so no initial payment is required.


Once you have created your account, go to the homepage and click on “+ New Cluster” in the top right corner. This opens a dialogue box for the new cluster.

Enter the name of the cluster and click “Next.” It will take a few seconds to initialize your cluster and after that, you can access it.

To access the cluster, go back to your MyScaleDB profile, hover over the three vertically aligned dots below the “Actions” text and click “Connection Details.”


Clicking “Connection Details” opens a box with the details you need to connect to the cluster. Create a Python notebook file in your directory and we will start building our RAG app.

Setting Up the Environment

To install the dependencies, open your terminal and enter the command:

pip install -U llama-index clickhouse-connect llama-index-postprocessor-jinaai-rerank llama-index-vector-stores-myscale

This command will install all the required dependencies. Here we use Jina Reranker, whose algorithm significantly improves the search results, with a more than 8% increase in hit rate and a 33% increase in mean reciprocal rank.
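For readers unfamiliar with the two metrics, the following sketch shows how hit rate and mean reciprocal rank (MRR) are typically computed. The data is made up for illustration and does not reproduce the figures quoted above; the function names are hypothetical.

```python
def hit_rate(ranked_results, relevant_ids):
    """Fraction of queries whose relevant document appears anywhere in the results."""
    hits = sum(1 for ranked, rel in zip(ranked_results, relevant_ids) if rel in ranked)
    return hits / len(relevant_ids)

def mean_reciprocal_rank(ranked_results, relevant_ids):
    """Average of 1/rank of the relevant document (0 when it is missing)."""
    total = 0.0
    for ranked, rel in zip(ranked_results, relevant_ids):
        if rel in ranked:
            total += 1.0 / (ranked.index(rel) + 1)
    return total / len(relevant_ids)

# Three toy queries: retrieved document IDs in ranked order, plus the one relevant ID.
ranked_results = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
relevant_ids = ["a", "e", "z"]

print(hit_rate(ranked_results, relevant_ids))              # 2 of 3 queries hit
print(mean_reciprocal_rank(ranked_results, relevant_ids))  # (1 + 1/2 + 0) / 3
```

A reranker leaves the set of retrieved candidates unchanged and reorders them, so its effect shows up most directly in MRR: moving the relevant document from rank 2 to rank 1 doubles that query's contribution.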

Establishing a Connection With the Knowledge Base

First, you need to establish a connection with the MyScale vector DB. For this, copy the details from the “Connection Details” page and pass them to the `clickhouse-connect` client. This establishes a connection with your knowledge base and returns a client object.
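A minimal sketch of the connection step, assuming the `clickhouse_connect.get_client` call from the `clickhouse-connect` package installed earlier. The environment variable names (`MYSCALE_HOST` and so on), the placeholder defaults and the default port of 443 are assumptions made for this example; use the values shown on your own “Connection Details” page.

```python
import os

def connection_settings(env=None):
    """Collect connection details, falling back to obvious placeholders."""
    env = os.environ if env is None else env
    return {
        "host": env.get("MYSCALE_HOST", "your_host_here"),
        "port": int(env.get("MYSCALE_PORT", "443")),  # assumed default port
        "username": env.get("MYSCALE_USERNAME", "your_username_here"),
        "password": env.get("MYSCALE_PASSWORD", "your_password_here"),
    }

def connect(settings=None):
    """Open a client connection to the cluster."""
    # Imported lazily so the settings helper works without the package installed.
    import clickhouse_connect

    return clickhouse_connect.get_client(**(settings or connection_settings()))

# Usage (requires real credentials):
# client = connect()
```

Reading the credentials from the environment keeps them out of the notebook, which matters if the notebook is shared or committed to version control.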

Downloading and Loading Data

Here, we will use a Nike product catalog dataset. We first download the PDF and save it locally, then load it using a LlamaIndex reader.
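The download-and-load step might look like the sketch below. `CATALOG_URL` is a placeholder, not the real catalog address, and `download_pdf` and `load_documents` are hypothetical helper names. The loader assumes `SimpleDirectoryReader` from `llama_index.core`, which for PDFs typically yields one document per page.

```python
import urllib.request
from pathlib import Path

CATALOG_URL = "https://example.com/nike-catalog.pdf"  # placeholder, replace with the real URL

def _http_get(url):
    """Fetch the raw bytes behind a URL using only the standard library."""
    with urllib.request.urlopen(url) as response:
        return response.read()

def download_pdf(url, dest, fetch=None):
    """Download a file and save it locally; `fetch` can be swapped out for testing."""
    fetch = fetch or _http_get
    path = Path(dest)
    path.write_bytes(fetch(url))
    return path

def load_documents(pdf_path):
    """Load the saved PDF into LlamaIndex documents."""
    # Imported lazily so the download helper works without llama-index installed.
    from llama_index.core import SimpleDirectoryReader

    return SimpleDirectoryReader(input_files=[str(pdf_path)]).load_data()

# Usage (requires network access and llama-index):
# pdf_path = download_pdf(CATALOG_URL, "Nike_Catalog.pdf")
# documents = load_documents(pdf_path)
```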

Categorizing the Data

Next, we define a function that assigns each document to a category. We will use these categories when we write filtered queries against the knowledge base. By categorizing documents, targeted searches can be performed, significantly improving the efficiency and relevance of retrieval in the RAG system.
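One simple way to categorize documents is keyword matching, sketched below. The function name `analyze_and_assign_category` and the keyword list are assumptions for illustration; only the “Running” category is confirmed by the filtered query later in this article.

```python
# Category name -> keyword that signals it. Checked in order; first match wins.
CATEGORY_KEYWORDS = {
    "Football": "football",
    "Basketball": "basketball",
    "Running": "running",
}

def analyze_and_assign_category(text):
    """Return the first category whose keyword appears in the text."""
    lowered = text.lower()
    for category, keyword in CATEGORY_KEYWORDS.items():
        if keyword in lowered:
            return category
    return "Uncategorized"

print(analyze_and_assign_category("Lightweight RUNNING shoes for daily training"))
print(analyze_and_assign_category("Crew socks, pack of three"))
```

Keyword matching is crude: a page that mentions several sports gets only the first matching category. For a larger catalog, a classifier or an LLM-based tagger would likely give cleaner metadata.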

Create an Index

Here we will load the data into a vector store provided by `MyScaleVectorStore`. The category metadata is attached to each document first, and the documents are then added to the vector store. Creating an index facilitates quick and efficient search operations. By indexing the data, the system can perform fast vector-based searches, which are essential for retrieving relevant documents based on similarity measures in RAG applications.

Note: When creating the index, LlamaIndex uses embedding models from OpenAI by default. To enable this, you must set your OpenAI key in the `OPENAI_API_KEY` environment variable.
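The indexing step can be sketched as follows, assuming `MyScaleVectorStore(myscale_client=...)`, `StorageContext.from_defaults` and `VectorStoreIndex.from_documents` from the packages installed earlier. The helper names `attach_categories` and `build_index` are hypothetical, and `categorize` stands for whatever categorization function you defined in the previous step.

```python
import os

def attach_categories(documents, categorize):
    """Store a 'Category' entry in each document's metadata before indexing."""
    for document in documents:
        document.metadata["Category"] = categorize(document.text)
    return documents

def build_index(documents, client):
    """Embed the documents and store them in MyScaleDB through LlamaIndex."""
    if "OPENAI_API_KEY" not in os.environ:
        raise RuntimeError("Set OPENAI_API_KEY before building the index.")
    # Imported lazily so the metadata helper works without llama-index installed.
    from llama_index.core import StorageContext, VectorStoreIndex
    from llama_index.vector_stores.myscale import MyScaleVectorStore

    vector_store = MyScaleVectorStore(myscale_client=client)
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    return VectorStoreIndex.from_documents(documents, storage_context=storage_context)

# Usage (requires a connected client, loaded documents and an OpenAI key):
# documents = attach_categories(documents, analyze_and_assign_category)
# index = build_index(documents, client)
```

Adding the key to the existing metadata dictionary, rather than replacing the dictionary, keeps whatever the reader already recorded, such as the file name and page label.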

Simple Query

To execute a simple query, we need to convert our existing index into a query engine. The query engine is a specialized tool that can handle and interpret search queries.

Using the query engine, we run the query “I want a few running shoes.” The engine processes this query, then searches through the indexed documents to find the matches that best satisfy it.
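A minimal sketch of this step, assuming the standard LlamaIndex calls `index.as_query_engine()` and `query_engine.query()`. The helper name `top_source_text` is hypothetical; it returns the text of the highest-ranked retrieved chunk instead of the LLM's synthesized answer.

```python
def top_source_text(index, question):
    """Run a query and return the text of the best-matching source node, if any."""
    query_engine = index.as_query_engine()
    response = query_engine.query(question)
    if not response.source_nodes:
        return None
    return response.source_nodes[0].text

# Usage (requires the index built in the previous step):
# print(top_source_text(index, "I want a few running shoes"))
```

Inspecting `source_nodes` is a quick way to check what was actually retrieved, which is useful when the final answer looks wrong and you need to tell a retrieval problem from a generation problem.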

Filtered Query

Here, the query engine is configured with metadata filters using the `MetadataFilters` and `ExactMatchFilter` classes. The `ExactMatchFilter` is applied to the “Category” metadata field so that only documents explicitly categorized as “Running” are included. This ensures that the query engine only considers documents related to running, which can lead to more relevant and focused results. The `similarity_top_k=2` setting limits retrieval to the two most similar documents, and `vector_store_query_mode="hybrid"` requests a combination of vector and traditional search methods.
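The configuration described above might be written as in the sketch below. It assumes `ExactMatchFilter` and `MetadataFilters` can be imported from `llama_index.core.vector_stores`; the helper names `category_filters` and `filtered_query_engine` are hypothetical.

```python
def category_filters(category="Running"):
    """Build a metadata filter that keeps only documents in one category."""
    # Imported lazily so the engine helper works without llama-index installed.
    from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

    return MetadataFilters(filters=[ExactMatchFilter(key="Category", value=category)])

def filtered_query_engine(index, filters):
    """Create a query engine restricted by metadata filters."""
    return index.as_query_engine(
        filters=filters,
        similarity_top_k=2,                # keep the two most similar documents
        vector_store_query_mode="hybrid",  # combine vector and text search
    )

# Usage (requires the index built earlier):
# query_engine = filtered_query_engine(index, category_filters("Running"))
# response = query_engine.query("I want a few running shoes")
# print(response.source_nodes[0].text)
```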

This output should closely match the user’s query, showing how effectively metadata filters can improve the precision of search results.

So far, we have implemented RAG in its simplest form, which may not yield the best performance. To improve performance and give users more precise answers, we will now add a reranker that further filters the retrieved documents.

Adding a Reranker to Enhance Document Retrieval

We integrate a reranking mechanism using Jina AI to refine the documents retrieved by the initial query.
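A sketch of the reranking step, assuming the `JinaRerank` class from the `llama-index-postprocessor-jinaai-rerank` package installed earlier and the `node_postprocessors` argument of `as_query_engine`. The environment variable name `JINA_API_KEY`, the helper names and the choice of 10 initial candidates are assumptions for this example.

```python
import os

def make_jina_reranker(top_n=2):
    """Create a Jina reranker that keeps the top_n documents after rescoring."""
    # Imported lazily so the engine helper works without the package installed.
    from llama_index.postprocessor.jinaai_rerank import JinaRerank

    return JinaRerank(api_key=os.environ["JINA_API_KEY"], top_n=top_n)

def reranked_query_engine(index, reranker, candidates=10):
    """Retrieve a wider candidate set, then let the reranker narrow it down."""
    return index.as_query_engine(
        similarity_top_k=candidates,
        node_postprocessors=[reranker],
    )

# Usage (requires the index built earlier and a Jina API key):
# query_engine = reranked_query_engine(index, make_jina_reranker(top_n=2))
# response = query_engine.query("I want a few nike shoes")
# print(response)
```

Retrieving more candidates than you finally keep gives the reranker something to work with: if the vector search returned only two documents, reranking could change their order but never surface a better third one.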

Note: You can get a Jina Reranker API key from the Jina AI website. Click on “API” and scroll down the newly opened page; the API key appears right below the Reranker API section.

Conclusion

RAG significantly helps LLMs stay updated and ensures their responses are accurate and relevant. However, simple RAG systems are rarely used in production applications because of their limited performance. To enhance performance, we use advanced techniques such as reranking, preprocessing and filtered queries.

The choice of vector database is another factor that affects the performance of RAG systems.

It’s crucial to select a vector database tailored to the needs of your application. MyScaleDB, being a SQL vector database, is a good choice for developers thanks to its familiar SQL interface, in addition to being affordable, fast and optimized for production-level applications.

If you have any suggestions, please reach out to us through Twitter or Discord.

MyScale is an open source SQL vector database that allows developers to effectively manage massive volumes of both structured and vector data for developing robust AI applications. It enables every developer to build production-grade GenAI applications with powerful and familiar SQL.
Usama Jamil, a developer advocate at MyScale, brings with him a wealth of experience and a profound interest in data science. With a passion for exploring new trends in the AI/ML domain, Usama strives to make complex concepts accessible to...
TNS owner Insight Partners is an investor in: ClickHouse, OpenAI.