Retrieval-augmented generation (RAG) is a widely used technique that augments large language models (LLMs) and GenAI apps by providing contextual information from external sources.
This method can significantly reduce LLM hallucinations. For example, if you ask a GenAI app to write an article about sharks, a RAG approach helps ensure that the AI doesn’t make up a new type of shark or create new “facts” about known species. RAG also lets users draw on domain-specific or private data for content generation without adding that data to the model’s training set.
How does RAG work? Everything starts with a query. There are three key component steps in the RAG process: retrieval, augmentation and generation.
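The three steps can be sketched in a few lines of plain Python. This is a toy illustration only: the bag-of-words `embed` function stands in for a real embedding model, an in-memory list stands in for the vector database, and `generate` is a hypothetical stub where a real system would call an LLM.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Retrieval: rank documents by similarity to the query, keep the top k."""
    query_vector = embed(query)
    ranked = sorted(
        documents, key=lambda doc: cosine(query_vector, embed(doc)), reverse=True
    )
    return ranked[:k]


def augment(query: str, context: list[str]) -> str:
    """Augmentation: put the retrieved text in front of the question."""
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {query}"


def generate(prompt: str) -> str:
    """Generation: hypothetical stub; a real system sends the prompt to an LLM."""
    return f"[LLM answer based on a prompt of {len(prompt)} characters]"


documents = [
    "Whale sharks are filter feeders and the largest living fish.",
    "Vector databases store embeddings for similarity search.",
]
query = "What do whale sharks eat?"
context = retrieve(query, documents)
prompt = augment(query, context)
answer = generate(prompt)
```

Note that the same `embed` function is applied to both the query and the documents, which is the property a real RAG system also has to preserve.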
In a basic RAG system, the embedding model, the vector database and the LLM are the three most crucial building blocks. When you build a RAG framework, you need to decide early on what technologies best suit your application.
You can use any embedding model relevant to your application data to create vector embeddings, but each model generates vectors in its own way, so vectors from different models are not comparable. This means you need to use the same model to generate the vector embeddings for both queries and datasets.
Which vector database you choose depends on the size of your data, the purpose of your application, the data requirements you need to meet and many other factors. For example, if you have a large dataset and want to build a RAG app for production, it is important to choose a vector database that can handle the scale.
Choosing the right LLM can be challenging as well. Fortunately, AWS Bedrock, a cloud service, offers a variety of pretrained models, including embedding models and LLMs, so you can select the ones that best fit your application. You can use one Bedrock model to generate vector embeddings and another as the LLM component of your RAG framework.
This example shows you how to integrate LangChain, Zilliz Cloud (the managed version of Milvus) and AWS Bedrock. Let’s take a guided tour through the example.
There are four main steps to this integration:
To install the required packages, run the following script.
Once you’ve installed everything, configure the requisite environment variables so that your code can connect to both Zilliz Cloud and Bedrock. On the AWS side, you’ll need the AWS region name, key ID and access key. On the Zilliz side, you’ll need the cloud Uniform Resource Identifier (URI) and API key.
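One way to read these settings is sketched below. The variable names are assumptions made for this sketch (the text above only lists which values are needed), so substitute whatever names your environment actually defines. The helper fails early with a clear message if anything is missing.

```python
import os
from typing import Mapping

# Assumed variable names for this sketch; adjust them to your own setup.
REQUIRED_VARS = (
    "AWS_REGION_NAME",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "ZILLIZ_CLOUD_URI",
    "ZILLIZ_CLOUD_API_KEY",
)


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the five required settings, or raise if any is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_settings()` with no arguments reads from the process environment; passing a dictionary instead makes the function easy to exercise in tests.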
The AWS SDK for Python (boto3) lets you create, configure and manage AWS services. Next, you’ll create a boto3 client to connect to the AWS Bedrock Runtime service.
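A minimal sketch of that client creation follows. The helper names are hypothetical, and the credentials are passed in as plain arguments so the sketch makes no assumption about where they come from. `boto3` is imported inside the function so the module can be loaded even where the SDK is not installed.

```python
def bedrock_client_kwargs(
    region_name: str, access_key_id: str, secret_access_key: str
) -> dict:
    """Keyword arguments for a boto3 client bound to the Bedrock Runtime service."""
    return {
        "service_name": "bedrock-runtime",
        "region_name": region_name,
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
    }


def make_bedrock_client(region_name: str, access_key_id: str, secret_access_key: str):
    """Create the Bedrock Runtime client (requires the boto3 package)."""
    import boto3  # imported lazily; only needed when a client is actually built

    return boto3.client(
        **bedrock_client_kwargs(region_name, access_key_id, secret_access_key)
    )
```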
Use a ChatBedrock instance to gain access to all the Bedrock models. In this example, we’ll link it to `anthropic.claude-3-sonnet-20240229-v1:0`.
You can select any of the other Bedrock models. ChatBedrock provides the interface for generating text responses with model-specific settings, such as a low temperature parameter to reduce response variability.
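The ChatBedrock setup might look like the sketch below, assuming the `langchain_aws` package and an existing Bedrock Runtime client. The temperature of 0.1 is an illustrative value chosen for this sketch, not one stated in the text, and `make_llm` is a hypothetical helper name.

```python
MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"

# Assumed value: a low temperature keeps responses more deterministic.
MODEL_KWARGS = {"temperature": 0.1}


def make_llm(client, region_name: str):
    """Wrap a Bedrock Runtime client in a LangChain chat model."""
    from langchain_aws import ChatBedrock  # imported lazily; needs langchain-aws

    return ChatBedrock(
        client=client,
        model_id=MODEL_ID,
        region_name=region_name,
        model_kwargs=MODEL_KWARGS,
    )
```

Swapping in a different Bedrock model should only require changing `MODEL_ID`, although the supported `model_kwargs` vary from model to model.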
Now that everything is connected, we need to get some data from external sources. In this example, we’re pulling data from a specific web source: a blog post about AI agents.
We’ll use a WebBaseLoader instance to grab that data and pass it a BeautifulSoup SoupStrainer so that only the relevant parts of the web page are parsed. We’re only targeting the following classes: “post-content,” “post-title” and “post-header.”
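A sketch of that loading step is shown below, assuming the `langchain_community` and `bs4` packages. The page address is left as a parameter because it is not given here, and `load_post` is a hypothetical helper name.

```python
TARGET_CLASSES = ("post-content", "post-title", "post-header")


def load_post(url: str):
    """Load one web page, parsing only elements with the target CSS classes."""
    import bs4  # imported lazily; needs beautifulsoup4
    from langchain_community.document_loaders import WebBaseLoader

    loader = WebBaseLoader(
        web_paths=(url,),
        # parse_only tells BeautifulSoup to skip everything outside the strainer.
        bs_kwargs={"parse_only": bs4.SoupStrainer(class_=TARGET_CLASSES)},
    )
    return loader.load()
```

Filtering at parse time keeps navigation menus, footers and other page chrome out of the documents that later get embedded.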
Once that data is loaded, we use a RecursiveCharacterTextSplitter instance to split it into smaller pieces, making it easier to work with and load into other components.
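To show what splitting with overlap means, here is a simplified stand-in written in plain Python. It is not RecursiveCharacterTextSplitter itself: the real splitter also tries to break on separators such as paragraph and line breaks, while this sketch cuts at fixed character positions. The default sizes are assumptions chosen for illustration.

```python
def split_with_overlap(
    text: str, chunk_size: int = 2000, chunk_overlap: int = 200
) -> list[str]:
    """Cut text into fixed-size chunks whose edges overlap by chunk_overlap."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
# chunks == ["abcd", "defg", "ghij"]
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval quality.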
Now we want to use the data we loaded to generate new content. We also want the output to be accurate and to keep AI hallucination to a minimum, so the prompt instructs the AI to use statistical information and hard data whenever possible to support its claims. It includes this instruction: “The response should be specific and use statistics or numbers when possible.”
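A prompt along these lines could be expressed as a plain Python template. Only the final instruction comes from the text above; the rest of the wording, the tag layout and the `build_prompt` helper are assumptions made for this sketch.

```python
# Only the last line is quoted from the article; the rest is illustrative wording.
PROMPT_TEMPLATE = """Use the following context to answer the question.
If the context does not contain the answer, say that you don't know.

<context>
{context}
</context>

<question>
{question}
</question>

The response should be specific and use statistics or numbers when possible."""


def build_prompt(context: str, question: str) -> str:
    """Fill the template with retrieved context and the user's question."""
    return PROMPT_TEMPLATE.format(context=context, question=question)
```

In a LangChain pipeline the same string can be wrapped in a `PromptTemplate` with `context` and `question` as input variables, so that it can be composed with a retriever and a model.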
Next, we initialize a Zilliz vector store containing the embeddings of the chunked documents. Having the documents as vectors is what makes it possible for RAG to do a semantic search to find and retrieve documents quickly and efficiently. The output should provide accurate, insightful, relevant and fact-based answers.
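Putting the pieces together might look like the sketch below. It assumes the `langchain_aws`, `langchain_community` and `langchain_core` packages, and it takes the chunked documents, the Bedrock client, the chat model and a LangChain prompt object as arguments rather than creating them. The helper names and the `auto_id` and `drop_old` settings are choices made for this sketch, not requirements.

```python
def format_docs(docs) -> str:
    """Join the text of retrieved documents into one context string."""
    return "\n\n".join(doc.page_content for doc in docs)


def build_rag_chain(docs, client, llm, prompt, region_name, zilliz_uri, zilliz_api_key):
    """Embed the chunks into Zilliz Cloud and return a runnable RAG chain."""
    # Imported lazily so this module loads without the LangChain packages.
    from langchain_aws import BedrockEmbeddings
    from langchain_community.vectorstores import Zilliz
    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.runnables import RunnablePassthrough

    embeddings = BedrockEmbeddings(client=client, region_name=region_name)
    vectorstore = Zilliz.from_documents(
        documents=docs,
        embedding=embeddings,
        connection_args={"uri": zilliz_uri, "token": zilliz_api_key, "secure": True},
        auto_id=True,   # let the store assign primary keys
        drop_old=True,  # assumption: start from an empty collection on each run
    )
    retriever = vectorstore.as_retriever()

    # Retrieval -> augmentation (prompt) -> generation (llm) -> plain string.
    return (
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )
```

The returned chain would then be called with a question, for example `chain.invoke("What is self-reflection of an AI agent?")`, where the question text is just an illustration.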
To recap, here are the steps for the RAG chain:
For the full code of this example, please refer to this notebook.
A RAG framework can enhance a lot of different use cases. The following list includes brief use-case descriptions. As you can see, these use cases span a variety of industries and verticals. Depending on your goals, you can find or build niche LLMs for these and other use cases.
RAG frameworks can provide detailed and accurate answers to user questions by retrieving relevant information from a large database and generating a coherent response.
Automated customer support systems can use RAG to find relevant information in support documents, manuals or FAQs and generate helpful responses to customer inquiries.
RAG frameworks can help create content by retrieving relevant information from various sources and generating articles, reports or summaries.
In recommendation systems, RAG can enhance the generation of personalized recommendations by retrieving and synthesizing information based on user preferences and past behavior.
Educational platforms can use RAG to generate personalized study materials, answer student questions and provide explanations based on a vast pool of educational resources.
RAG frameworks can benefit legal and medical professionals by allowing them to retrieve and synthesize information from case laws, medical literature and patient records to assist in decision-making and provide advice.
RAG can be used to create dynamic and interactive storytelling experiences in games, where the system generates plot twists and dialogues based on retrieved story elements and user interactions.
Researchers can use RAG to gather and summarize relevant research papers, patents or technical documents, helping them stay updated with the latest developments and find connections between different pieces of information.
Virtual assistants can use RAG to provide more accurate and contextually relevant responses by retrieving information from a knowledge base and generating appropriate replies.
Businesses can use RAG to analyze market trends, competitor strategies and customer feedback by retrieving relevant data and generating insightful reports and action plans.
Developers can use RAG frameworks to generate code snippets, documentation or explanations by retrieving relevant programming information from code repositories and technical documentation.
A RAG framework provides developers with a way to leverage large datasets, whether structured or unstructured, to build applications that are accurate and reliable. Pairing Zilliz Cloud with AWS Bedrock in a RAG framework gives you quick access to powerful tools. The prebuilt models in AWS Bedrock give you many options for building a wide range of GenAI applications. This getting-started tutorial is just the tip of the iceberg. To learn more about Zilliz Cloud, visit zilliz.com.