Retrieval-augmented generation (RAG) is quickly becoming a necessary element of generative AI applications. RAG endows pretrained AI models with superpowers of specialization, making them precise and accurate for vertical or task-specific applications. However, RAG also introduces new requirements around traffic, security and performance into your GenAI stack. With RAG comes new complexity and challenges that enterprises need to tackle with more sophisticated AI infrastructure.
RAG works by enhancing AI inferencing with relevant information from external data stores that are not part of the foundation model's training corpus. This method provides the AI model with domain-specific knowledge without having to retrain the general model. In general, RAG models produce responses that are richer in context, more accurate and factually consistent. RAG can even be used to improve the performance of open-domain AI applications. RAG also makes AI inferencing more efficient by reducing the need for in-model data storage. This has several beneficial spillover effects.
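To make the retrieve-then-generate flow concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a production design: the word-count cosine similarity stands in for a real embedding model and vector store, and the names `retrieve`, `build_prompt` and `DOCS` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Return up to k documents that share vocabulary with the query.

    Assumption: word overlap is a toy stand-in for embedding similarity.
    """
    query_vec = Counter(tokenize(query))
    scored = [(cosine(query_vec, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(query, documents, k=2):
    """Prepend retrieved context to the question before calling a model."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical external knowledge that was never in the training corpus.
DOCS = [
    "The refund window for annual plans is 30 days.",
    "Support is available Monday through Friday.",
    "Annual plans renew automatically each year.",
]

if __name__ == "__main__":
    print(build_prompt("What is the refund window?", DOCS))
```

The model itself never changes here. Only the prompt does, which is why domain knowledge can be updated by editing the data store rather than retraining.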
RAG models can be smaller and more efficient because they do not need to encode all possible knowledge within their parameters. Instead, they can dynamically fetch information as needed. This can lead to reduced memory requirements and lower computational costs, as the model doesn’t need to store and process a vast amount of information internally.
While the benefits of RAG are clear, it effectively adds a new layer of queries, routing and traffic management, which brings extra complexity and new security challenges.
One of the primary challenges with RAG is the increased complexity in managing traffic. RAG architectures rely on retrieving relevant documents or pieces of information in real time. This can lead to a significant surge in data traffic, which can cause bottlenecks if not managed properly. It also means that application performance depends not only on the latency and responsiveness the end user experiences, but also on the quality of the information retrieved. If retrieval is slow, the GenAI application may still respond, but with lower-quality outputs.
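One way to handle the latency tradeoff described above is to give retrieval an explicit time budget and fall back to an ungrounded answer when the budget is exceeded. The sketch below uses only the Python standard library. The function name `retrieve_with_budget`, the pool size and the budget value are hypothetical choices made for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# A shared pool, so one slow lookup does not block the caller's thread.
_POOL = ThreadPoolExecutor(max_workers=4)


def retrieve_with_budget(retriever, query, budget_s):
    """Run retriever(query) but wait at most budget_s seconds.

    Returns (documents, degraded). When degraded is True, the caller
    should expect a lower-quality answer and may want to flag or log it.
    """
    future = _POOL.submit(retriever, query)
    try:
        return future.result(timeout=budget_s), False
    except FutureTimeout:
        # Best effort only: a lookup that is already running keeps going
        # in the background, and its result is simply discarded.
        future.cancel()
        return [], True


if __name__ == "__main__":
    def slow_retriever(query):
        time.sleep(0.5)  # simulates an overloaded data store
        return ["late document"]

    docs, degraded = retrieve_with_budget(slow_retriever, "q", budget_s=0.1)
    print(docs, degraded)
```

Tracking how often the `degraded` flag is set gives a direct signal of how much traffic pressure on the retrieval layer is costing in answer quality.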
Security is another major concern when integrating RAG into GenAI applications. Retrieval often requires accessing proprietary databases or knowledge bases, increasing the potential attack surface. Ensuring the integrity and security of these data sources is critical to prevent data breaches or unauthorized access. RAG can also introduce new compliance issues if the data being accessed falls under regulations such as those governing the finance or health care industries. Often the RAG layer is the logical place for this data, but that also means the RAG database must comply with all applicable regulations and standards (HIPAA, Gramm-Leach-Bliley, SOC 2, etc.).
Teams should adopt robust authentication and authorization mechanisms to secure their RAG infrastructure and data retrieval process. This also means adopting robust API security for any service — internal or external — accessing a RAG stack. Employing encryption for RAG data in transit and at rest can safeguard sensitive information. Because RAG is where much of the sensitive data lies, it is also a good place for stricter authentication policies and zero trust deployments.
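As an illustration of authorization applied at the retrieval step, the sketch below filters candidate documents by the caller's roles before anything reaches the model's prompt. The `Document` class, the role names and the deny-by-default rule are assumptions for this example, not a description of any particular product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Document:
    """A retrievable chunk tagged with the roles allowed to read it."""
    text: str
    allowed_roles: frozenset = field(default_factory=frozenset)


def authorized(documents, user_roles):
    """Keep only documents the caller may read.

    Deny by default: a document with no allowed roles is never returned.
    """
    roles = set(user_roles)
    return [doc for doc in documents if doc.allowed_roles & roles]


# Hypothetical knowledge base entries.
KNOWLEDGE_BASE = [
    Document("Public pricing sheet", frozenset({"employee", "contractor"})),
    Document("Patient intake notes", frozenset({"clinician"})),
    Document("Untagged legacy export"),
]

if __name__ == "__main__":
    for doc in authorized(KNOWLEDGE_BASE, ["contractor"]):
        print(doc.text)
```

Filtering before prompt construction matters because once sensitive text is in the prompt, the model may repeat it in its answer.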
The effectiveness of a RAG system heavily depends on the quality of the data it retrieves. Poor quality or irrelevant data can lead to inaccurate or nonsensical outputs from the generative model. With real-time applications, data recency is also critical. If the RAG system is pulling from third-party data sources, then the GenAI application is subject to supply chain data quality risks. For enterprise apps or apps in sensitive areas like medicine or law, tolerance for bad responses due to poor data quality is close to zero.
To reduce these risks, teams should invest in maintaining high-quality, up-to-date data sources and build automated data pipelines with redundant quality checks. They should also continuously monitor user behavior and feedback for signs of data quality problems, and evaluate the system's output on an ongoing basis to find areas that need improvement.
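A quality check of the kind described above could look like the following sketch, which rejects empty, stale or low-relevance records and records the reason for each rejection so it can be monitored. The record fields (`text`, `updated_at`, `score`) and the thresholds are hypothetical and would need tuning for a real pipeline.

```python
from datetime import datetime, timedelta, timezone


def quality_gate(records, now, max_age=timedelta(days=30), min_score=0.5):
    """Split records into accepted ones and (record, reasons) rejections.

    Assumes each record is a dict with 'text', 'updated_at' (an aware
    datetime) and 'score' (retrieval relevance between 0 and 1).
    """
    accepted, rejected = [], []
    for record in records:
        reasons = []
        if not record.get("text", "").strip():
            reasons.append("empty")
        if now - record["updated_at"] > max_age:
            reasons.append("stale")
        if record.get("score", 0.0) < min_score:
            reasons.append("low_score")
        if reasons:
            rejected.append((record, reasons))
        else:
            accepted.append(record)
    return accepted, rejected


if __name__ == "__main__":
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    records = [
        {"text": "Current dosage guidance", "updated_at": now - timedelta(days=2), "score": 0.9},
        {"text": "Old dosage guidance", "updated_at": now - timedelta(days=400), "score": 0.9},
    ]
    accepted, rejected = quality_gate(records, now)
    print(len(accepted), [reasons for _, reasons in rejected])
```

Counting rejections by reason over time is one simple way to notice when an upstream or third-party source starts to degrade.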
If you are delivering GenAI applications, you likely have RAG in your present or in your future. The benefits are tremendous, but successful RAG rollouts require planning and thought. RAG enhances the specialization and accuracy of generative AI applications, and it also brings a set of complex challenges. Effective traffic management, stringent security measures, performance optimization, data quality assurance and careful handling of integration complexity are essential to successfully implementing RAG in GenAI stacks. For application delivery teams wrestling with GenAI challenges, RAG is a powerful way to make almost everything in their AI apps run better, given the right preparation and mindset.