
URL: https://thenewstack.io/reduce-ai-hallucinations-with-retrieval-augmented-generation/

Reduce AI Hallucinations with Retrieval Augmented Generation - The New Stack


This newly devised technique shows promise in increasing the knowledge of LLMs by enabling prompts to be augmented with proprietary data.
Jul 18th, 2023 6:00am by Ryan Michael
DataStax sponsored this post.

In the rapidly evolving world of AI, large language models (LLMs) have come a long way, boasting impressive knowledge of the world around us. Yet LLMs, as capable as they are, often struggle to recognize the boundaries of their own knowledge, a shortfall that can lead them to “hallucinate” to fill in the gaps. A relatively new technique, known as retrieval augmented generation (RAG), shows promise in efficiently extending what these LLMs know and reducing the impact of hallucination by enabling prompts to be augmented with proprietary data.

Navigating the Knowledge Gap in LLMs

LLMs are AI models trained on vast amounts of text to comprehend and generate human-like language. They’re the AI behind your digital assistant, autocorrect function and even some of your emails. Their knowledge of the world is often immense, but it is limited to what was in their training data. Just like humans, LLMs can reach the limits of their knowledge but, instead of stopping, they tend to make educated guesses, or “hallucinate,” to complete the task. This can produce results that contain inaccurate or misleading information.

In an ideal world, the answer would be to provide the model with relevant proprietary information at the exact moment it’s needed: when the query is made. But determining what information is “relevant” isn’t always straightforward and requires an understanding of what the LLM has been asked to accomplish. This is where RAG comes into play.


The Power of Embedding Models and Vector Similarity Search

Embedding models, in the world of AI, act like translators. Through a process known as “document encoding,” they transform a text document into a long list of numbers. This list represents the embedding model’s internal “understanding” of the document’s meaning. The list of numbers is known as a vector: a numeric representation of the attributes of a piece of data. Each data point is represented as a vector with many numerical values which, taken together, capture the features or attributes of the data; in learned embeddings, an individual value rarely maps to a single human-readable feature.
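To make document encoding concrete, here is a minimal sketch in Python. A real embedding model is a trained neural network; the hypothetical `toy_embed` function below instead hashes words into a fixed number of buckets (the so-called hashing trick), so it captures word overlap rather than meaning. What it does illustrate is the shape of the output: every text, whatever its length, becomes a vector with the same number of dimensions.

```python
import hashlib
import math

DIMENSIONS = 16  # real embedding models typically produce hundreds or thousands


def toy_embed(text: str, dims: int = DIMENSIONS) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashes each word into one
    of `dims` buckets, counts occurrences, then scales the vector to unit length."""
    vector = [0.0] * dims
    for word in text.lower().split():
        word = word.strip(".,!?;:\"'")
        if not word:
            continue
        # Deterministic hash (Python's built-in hash() is randomized per process).
        bucket = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16) % dims
        vector[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


print(len(toy_embed("Vector search finds documents with similar meaning.")))  # 16
print(len(toy_embed("Hi")))  # 16
```

Swapping in a real embedding model would change the numbers, not the interface: text goes in, a fixed-length list of floats comes out.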

While a list of numbers might seem meaningless to the average person, these numbers serve as coordinates in a high-dimensional space. In the same way that latitude and longitude describe a location in physical space, this list of numbers describes the original text’s location in semantic space, the space of all possible meanings.

Treating these numbers as coordinates enables us to measure the similarity in meaning between two documents. This measurement is taken as a distance between their respective points in the semantic space. A smaller distance would indicate a greater similarity in meaning, while a larger distance suggests a disparity in content. Consequently, information relevant to a query can be discovered by searching for documents “close to” the query in semantic space. This is the magic of vector similarity search.
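The “close to” comparison can be sketched in a few lines. This example assumes cosine distance (one minus cosine similarity), a common choice alongside Euclidean distance; the three-dimensional vectors are made-up values standing in for real embeddings, chosen so the two dog-training documents sit close together.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """1.0 means the vectors point the same way (similar meaning); near 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Smaller distance = closer together in semantic space."""
    return 1.0 - cosine_similarity(a, b)


# Made-up 3-D "embeddings", for illustration only.
docs = {
    "How do I train my puppy?": [0.9, 0.1, 0.0],
    "Tips for house-training a dog": [0.8, 0.2, 0.1],
    "Quarterly revenue report": [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]  # pretend embedding of "puppy training advice"

# Sort documents from closest to farthest from the query.
ranked = sorted(docs, key=lambda title: cosine_distance(query, docs[title]))
print(ranked[-1])  # Quarterly revenue report
```

Ranking by distance and keeping the top few results is, in essence, what a vector similarity search returns.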


The Idea Behind Retrieval Augmented Generation

RAG is a generative AI architecture that applies semantic similarity to automatically discover information relevant to a query.

In a RAG system, your documents are stored in a vector database (DB). Each document is indexed based on a semantic vector produced by an embedding model so that finding documents close to a given query vector can be done quickly. This essentially means that each document is assigned a numerical representation (the vector), which indicates its meaning.


When a query comes in, the same embedding model is used to produce a semantic vector for the query.


The system then retrieves similar documents from the DB using vector search, looking for documents whose vectors are close to the query vector.
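The indexing, query-encoding and retrieval steps described above can be put together in a small in-memory sketch. `InMemoryVectorStore` and `toy_embed` are hypothetical names: the embedding is a hashed bag of words standing in for a real model, and the store is a plain Python list rather than an actual vector database. The key detail it shows is that documents and queries pass through the same embedding function.

```python
import hashlib
import math


def toy_embed(text: str, dims: int = 256) -> list[float]:
    """Hypothetical stand-in for a real embedding model: a normalized,
    hashed bag of words (captures word overlap, not true meaning)."""
    vec = [0.0] * dims
    for word in text.lower().split():
        word = word.strip(".,!?;:")
        if word:
            bucket = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16) % dims
            vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are unit length (or all zeros), so the dot product equals cosine similarity.
    return sum(x * y for x, y in zip(a, b))


class InMemoryVectorStore:
    """Hypothetical minimal vector DB that keeps (vector, document) pairs in a list."""

    def __init__(self, embed):
        self.embed = embed
        self.rows: list[tuple[list[float], str]] = []

    def add(self, doc: str) -> None:
        # Indexing: each document is embedded once, when it is stored.
        self.rows.append((self.embed(doc), doc))

    def search(self, query: str, k: int = 2) -> list[str]:
        # The SAME embedding function encodes the query...
        q = self.embed(query)
        # ...and stored documents are ranked by similarity to the query vector.
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [doc for _, doc in ranked[:k]]


store = InMemoryVectorStore(toy_embed)
store.add("Our refund policy allows returns within 30 days of purchase.")
store.add("The office is closed on public holidays.")
store.add("Support tickets are answered within one business day.")
print(store.search("refund policy for returns", k=1))
```

Note that `search` scores the query against every stored row; that brute-force scan is fine for a handful of documents but is exactly the cost a vector index is designed to avoid at scale.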


Once the relevant documents have been retrieved, the query and these documents are combined into a prompt that the model uses to generate its response. This way, the model doesn’t have to rely solely on its internal knowledge; it can draw on whatever data you provide, at the moment it’s needed. By incorporating proprietary data stored in a database that offers vector search as a feature, the model is better equipped to provide accurate and contextually appropriate responses.
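The augmentation step itself is usually plain string assembly: the retrieved documents are placed into the prompt alongside the question before it is sent to the model. A minimal sketch follows; the function name `build_augmented_prompt` and the template wording are hypothetical, and the final LLM call is left as a placeholder comment because it depends on the model provider.

```python
def build_augmented_prompt(question: str, retrieved_docs: list[str]) -> str:
    """Combine retrieved documents with the user's question into one prompt.
    The template wording is an illustrative assumption, not a standard."""
    # Number each document so the model (and a reader) can tell them apart.
    context = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(retrieved_docs, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_augmented_prompt(
    "How long do customers have to return an item?",
    ["Our refund policy allows returns within 30 days of purchase."],
)
print(prompt)
# The prompt would then be sent to the LLM, e.g. response = llm.generate(prompt),
# where `llm.generate` is a placeholder for whichever model client you use.
```

Asking the model to answer only from the supplied context is one common way to steer it toward the retrieved data rather than its own guesses.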

There are a handful of so-called “vector databases” available, including DataStax Astra DB, for which vector search is now generally available. The main advantage of a database with built-in vector search is speed. Without a vector index, a similarity query has to compare the query vector against every stored vector. In contrast, integrated vector search builds a specialized index and uses search algorithms, typically approximate nearest-neighbor (ANN) algorithms, that vastly speed up the process, making it possible to search massive amounts of data in a fraction of the time a full scan would take.
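To see why an index helps, compare a brute-force scan, which scores the query against every stored vector, with a simplified inverted-file (IVF) style index that groups vectors into clusters and scans only the cluster nearest the query. IVF is one well-known family of ANN techniques, used here as an assumption for illustration; it is not a description of how Astra DB implements vector search. `ToyIVFIndex` is a hypothetical name, and the data is random.

```python
import math
import random


def dist(a: list[float], b: list[float]) -> float:
    return math.dist(a, b)  # Euclidean distance


def brute_force_nearest(query: list[float], vectors: list[list[float]]) -> int:
    """Exact search: compares the query with every stored vector."""
    return min(range(len(vectors)), key=lambda i: dist(query, vectors[i]))


class ToyIVFIndex:
    """Simplified IVF index: each vector is assigned to its nearest centroid,
    and a query scans only the bucket of the centroid closest to it."""

    def __init__(self, vectors: list[list[float]], n_clusters: int = 8, seed: int = 0):
        rng = random.Random(seed)
        self.vectors = vectors
        # Random stored vectors serve as centroids (real systems train them, e.g. with k-means).
        self.centroids = rng.sample(vectors, n_clusters)
        self.buckets: dict[int, list[int]] = {c: [] for c in range(n_clusters)}
        for i, v in enumerate(vectors):
            self.buckets[self._nearest_centroid(v)].append(i)

    def _nearest_centroid(self, v: list[float]) -> int:
        return min(range(len(self.centroids)), key=lambda c: dist(v, self.centroids[c]))

    def search(self, query: list[float]) -> tuple[int, int]:
        """Approximate search: returns (best index found, number of vectors compared)."""
        candidates = self.buckets[self._nearest_centroid(query)]
        best = min(candidates, key=lambda i: dist(query, self.vectors[i]))
        return best, len(candidates)


rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(2000)]
index = ToyIVFIndex(data, n_clusters=16)
query = [rng.random() for _ in range(8)]

exact = brute_force_nearest(query, data)
approx, scanned = index.search(query)
print(f"brute force compared {len(data)} vectors; IVF compared {scanned}")
print("same result" if exact == approx else "approximate result differs")
```

Because only one cluster is scanned, the approximate search can occasionally miss the true nearest neighbor; ANN indexes deliberately trade a little accuracy for a large reduction in comparisons.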


The query encoder and the response generator can also be fine-tuned for optimized performance. Fine-tuning is a process in which a pretrained model’s parameters are slightly adjusted, through additional training, to better adapt it to the specific task at hand.

RAG Versus Fine-Tuning

Fine-tuning offers many benefits for optimizing LLMs, but it also has limitations. For one, it doesn’t allow for dynamic integration of new or proprietary data. The model’s knowledge remains static after training, which can lead it to hallucinate when asked about data outside of its training set. RAG, on the other hand, dynamically retrieves and incorporates up-to-date and proprietary data from an external database, mitigating the hallucination issue and providing more contextually accurate responses. RAG gives you query-time control over exactly what information is provided to the model, allowing prompts to be tailored to specific users at the exact time a query is made.

RAG is also more computationally efficient and flexible than fine-tuning. With fine-tuning, every dataset update requires another training run, a time-consuming and resource-intensive task. RAG, by contrast, only requires embedding and indexing the new or changed documents, which makes information management easier and more efficient. RAG’s modular approach also allows the retrieval mechanism to be fine-tuned separately, permitting adaptation to different tasks or domains without altering the base language model.
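The update path can be illustrated with a small sketch: adding, replacing or deleting knowledge touches only the stored vectors, while the model’s weights are never modified. The `DocumentIndex` class and the hand-written three-dimensional vectors are hypothetical; in a real system the vectors would come from the embedding model.

```python
import math


class DocumentIndex:
    """Hypothetical keyed vector index. Knowledge is changed by upserting or
    deleting rows; no model weights are touched."""

    def __init__(self) -> None:
        self.rows: dict[str, tuple[list[float], str]] = {}

    def upsert(self, doc_id: str, vector: list[float], text: str) -> None:
        # Insert a new document, or overwrite a stale one with the same id.
        self.rows[doc_id] = (vector, text)

    def delete(self, doc_id: str) -> None:
        self.rows.pop(doc_id, None)

    def nearest(self, query: list[float]) -> str:
        # The row with the smallest Euclidean distance to the query is most relevant.
        _, text = min(self.rows.values(), key=lambda row: math.dist(query, row[0]))
        return text


index = DocumentIndex()
index.upsert("pricing", [0.9, 0.1, 0.0], "The Pro plan costs $20 per month.")
index.upsert("support", [0.0, 0.2, 0.9], "Support is available 24/7.")

query = [0.88, 0.12, 0.0]  # made-up embedding of "How much is the Pro plan?"
print(index.nearest(query))  # The Pro plan costs $20 per month.

# The price changes: overwrite one row with the new text and its (here, made-up) vector.
index.upsert("pricing", [0.9, 0.1, 0.0], "The Pro plan costs $25 per month.")
print(index.nearest(query))  # The Pro plan costs $25 per month.
```

The next query picks up the new price immediately, with no training job in between.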

RAG improves the accuracy and usefulness of large language models, making it a compelling alternative, or complement, to fine-tuning. In practice, enterprises tend to use RAG more often than fine-tuning.

Changing the Role of LLMs with RAG

Integrating RAG into LLMs doesn’t just improve the accuracy of their responses; it also lets the models make the most of their strengths. The process enables LLMs to focus on what they excel at: intelligently generating content from a prompt. The model is no longer the sole source of information because RAG provides it with relevant proprietary knowledge when required, and the corpus of knowledge accessible to the model can be expanded and updated without expensive model-training jobs.

In essence, RAG acts as a bridge, connecting the LLM to a reservoir of knowledge that goes beyond its internal capabilities. As a result, it can substantially reduce the LLM’s tendency to “hallucinate” and provide users with more accurate and efficient responses.

DataStax today announced the general availability of vector search capability in Astra DB.

Ryan Michael leads strategy and engineering for DataStax’s Emerging AI Tech team, which is responsible for developing solutions at the intersection of AI, search, and real-time data. Ryan joined DataStax with the acquisition of Kaskada, an open source stream-processing project focused...