URL: https://thenewstack.io/building-an-extensible-genai-copilot-what-we-learned/

Building an Extensible GenAI Copilot: What We Learned - The New Stack



Building an Extensible GenAI Copilot: What We Learned

Working through the complexities of developing an internal copilot helped us push the boundaries of what we believed possible with GenAI.
Sep 18th, 2024 9:00am by Rajat Tiwari
Featured image by Vizag Explore on Unsplash.
Rafay sponsored this post.

Our generative AI (GenAI) journey began with a single use case: How could we make it easier for our customers to navigate the vast landscape of documentation and features within our platform?

It’s crucial that our users can quickly access and understand the product information they need to use our enterprise platform. So, our product team tasked our development team with developing a solution to simplify this process and enhance the overall user experience.

A few months ago, we launched Rafay Copilot, a GenAI-driven bot designed to do just that. This was no small feat. We faced many obstacles when building Copilot — from overcoming the steep learning curve of the AI landscape to ensuring the accuracy of responses and seamless integration with our existing systems. These obstacles forced us to rethink and refine our approach, pushing the boundaries of what we believed was possible with GenAI.

However, these challenges provided us with invaluable insights. As we worked through the complexities of creating Rafay Copilot, we began to see the broader potential of GenAI. The problems we solved and the breakthroughs we made led us to realize that what we had developed could be expanded far beyond its original scope and into other areas of our platform.

Defining the Copilot Architecture

GenAI applies to many use cases in our product, so a major goal in defining the Rafay Copilot architecture was to make it easy to extend. The architecture needed to be flexible and scalable enough to support more advanced use cases, such as agents or other types of copilots.

[Figure: GenAI chatbot architecture powered by RAG]

We used the LangChain framework to chain the requests before sending them to the large language models (LLMs). Qdrant serves as our vector database, efficiently storing and retrieving text embeddings. To monitor the system’s behavior and performance, we rely on Langfuse, which helps us track costs and debug AI services effectively.

Retrieval-augmented generation (RAG) allows the chatbot to access and retrieve specific private data, such as company documentation or proprietary knowledge, to provide more accurate and contextually relevant responses.
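To make the retrieval step concrete, here is a minimal, self-contained sketch of the RAG flow: embed the question, rank stored documents by similarity, and place the best matches in the prompt. It is not the Rafay implementation. The in-memory `DOCS` dictionary stands in for a vector database such as Qdrant, the bag-of-words `embed` function stands in for a real embedding model, and every name and document in it is hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical documents; an in-memory stand-in for a vector database.
DOCS = {
    "clusters.md": "Create a cluster from the console and attach a blueprint.",
    "rbac.md": "Assign roles to users and groups to control project access.",
    "backup.md": "Schedule backup policies and restore a cluster from a snapshot.",
}


def embed(text: str) -> Counter:
    """Toy embedding: term counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the names of the k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(DOCS, key=lambda name: cosine(q, embed(DOCS[name])), reverse=True)
    return ranked[:k]


def build_prompt(question: str, k: int = 2) -> str:
    """Assemble the prompt that would be sent to the LLM, with retrieved context."""
    context = "\n".join(f"[{name}] {DOCS[name]}" for name in retrieve(question, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

Calling `build_prompt("How do I restore a cluster from a backup?")` yields a prompt whose context section leads with the backup document, which is the grounding the LLM then answers from.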

LangGraph, while not currently implemented, could potentially be used for creating more complex, multistep AI workflows. LangSmith, another tool in the LangChain ecosystem, offers debugging and testing capabilities for LLM applications, which we may explore in future iterations.

API Gateway and AI Service Layer

All incoming requests to Rafay Copilot go through a centralized API gateway, which we call rafay-hub. This gateway’s primary responsibility is to handle authentication and standardize API requests for our upstream services, ensuring that only authorized users can access the system. Once authenticated and standardized, the request is forwarded to the AI service, which is a proxy for all our agents.
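The gateway's two duties, authentication and request standardization, can be sketched in a few lines. This is an illustration under stated assumptions, not rafay-hub's actual code: the bearer-token lookup table, the `CopilotRequest` field names and the default agent name `docs-chat` are all hypothetical, and a production gateway would validate tokens against an identity provider.

```python
from dataclasses import dataclass

# Hypothetical token table; a real gateway would call an identity provider.
VALID_TOKENS = {"token-alice": "alice", "token-bob": "bob"}


@dataclass(frozen=True)
class CopilotRequest:
    """Normalized request shape for upstream services (field names are assumptions)."""
    user: str
    agent: str
    message: str


class Unauthorized(Exception):
    """Raised when the caller cannot be authenticated."""


def handle(headers: dict[str, str], body: dict) -> CopilotRequest:
    """Authenticate the caller, then return a standardized request."""
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    user = VALID_TOKENS.get(token) if scheme == "Bearer" else None
    if user is None:
        raise Unauthorized("missing or invalid bearer token")

    message = str(body.get("message", "")).strip()
    if not message:
        raise ValueError("message must not be empty")

    # Fill in defaults so every upstream service sees the same fields.
    return CopilotRequest(user=user, agent=str(body.get("agent", "docs-chat")), message=message)
```

Because upstream services only ever receive a `CopilotRequest`, they do not need to repeat authentication or input cleanup themselves.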

The AI service’s primary role is to decide which agent service should handle the request. We’ve designed this so that it’s easy for us to extend the system in the future. For example, we have an agent service specifically for interacting with OpenAI’s API for our GenAI chatbot. However, the modular design of this layer allows us to add new agents as needed, whether they involve other generative AI models or entirely different types of AI services.
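One common way to get this kind of extensibility is a registry that maps agent names to handlers, so adding an agent means registering one more function rather than editing the routing logic. The sketch below assumes that pattern. The source does not describe how the AI service selects an agent, and the agent name and handler here are placeholders.

```python
from typing import Callable

AgentHandler = Callable[[str], str]

# Registry of agent name -> handler; the AI service only consults this mapping.
_AGENTS: dict[str, AgentHandler] = {}


def register(name: str) -> Callable[[AgentHandler], AgentHandler]:
    """Decorator that adds a handler to the registry under the given name."""
    def decorator(fn: AgentHandler) -> AgentHandler:
        _AGENTS[name] = fn
        return fn
    return decorator


@register("docs-chat")
def docs_chat(message: str) -> str:
    # Placeholder body: a real agent service would call an LLM here.
    return f"docs-chat received: {message}"


def dispatch(agent: str, message: str) -> str:
    """Forward the message to the named agent, or fail clearly if it is unknown."""
    try:
        handler = _AGENTS[agent]
    except KeyError:
        raise LookupError(f"no agent registered for {agent!r}") from None
    return handler(message)
```

With this shape, supporting a second model or a different type of AI service is a new `@register(...)` function, and `dispatch` stays unchanged.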

Agent Services and LLM Integration

Once the AI service determines the appropriate agent, the request is sent to the corresponding agent service. This agent interacts directly with the underlying LLMs. The prompts that guide these interactions are stored in a Kubernetes (K8s) config map, which allows us to adjust them quickly based on system performance or user feedback. This configuration-driven approach enables rapid iteration and fine-tuning, ensuring our chatbot delivers the best possible responses for users.
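As a rough illustration of the configuration-driven approach, the sketch below loads prompt templates from a directory. It assumes the config map is mounted into the pod as a volume, in which case Kubernetes exposes each key as a file in the mount directory. The source does not say how the prompts are consumed, and the `$question` placeholder syntax is an assumption made for this example.

```python
from pathlib import Path
from string import Template


def load_prompts(directory: str) -> dict[str, Template]:
    """Read each file in the mount directory as a prompt template keyed by file name."""
    prompts: dict[str, Template] = {}
    for path in Path(directory).iterdir():
        # Volume mounts contain hidden bookkeeping entries such as "..data"; skip them.
        if path.name.startswith(".") or not path.is_file():
            continue
        prompts[path.name] = Template(path.read_text(encoding="utf-8"))
    return prompts


def render(prompts: dict[str, Template], name: str, **values: str) -> str:
    """Fill a named template; raises KeyError if the prompt or a placeholder is missing."""
    return prompts[name].substitute(**values)
```

Keeping prompts outside the container image means a wording change is a config update rather than a rebuild, which is what makes quick iteration on prompts practical.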

Vector Database and Observability

To provide accurate and up-to-date information from our documentation, we’ve implemented a Kubernetes cron job that regularly pulls data from our GitHub repository, where all the docs are stored in Markdown format. This job processes the data and stores it in Qdrant, our vector database, making it easy for the chatbot to retrieve relevant information quickly.
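The processing step of such a job might look like the following sketch, which splits Markdown at headings and builds records ready for an embedding model and a vector store. The chunking strategy, payload fields and ID scheme are assumptions for illustration. The source does not describe how the docs are chunked, and no Qdrant client calls are shown.

```python
import hashlib
import re
import uuid


def chunk_markdown(text: str) -> list[tuple[str, str]]:
    """Split a Markdown document into (heading, body) chunks at each heading line."""
    chunks: list[tuple[str, str]] = []
    heading, lines = "", []
    for line in text.splitlines():
        match = re.match(r"^#{1,6}\s+(.*)$", line)
        if match:
            # Close the previous chunk, if it has any content, before starting a new one.
            if "".join(lines).strip():
                chunks.append((heading, "\n".join(lines).strip()))
            heading, lines = match.group(1).strip(), []
        else:
            lines.append(line)
    if "".join(lines).strip():
        chunks.append((heading, "\n".join(lines).strip()))
    return chunks


def to_points(path: str, text: str) -> list[dict]:
    """Build one record per chunk; the embedding vector would be added before upload."""
    points = []
    for index, (heading, body) in enumerate(chunk_markdown(text)):
        # Deterministic ID, so re-running the job overwrites a chunk instead of duplicating it.
        point_id = str(uuid.uuid5(uuid.NAMESPACE_URL, f"{path}#{index}"))
        points.append({
            "id": point_id,
            "payload": {
                "path": path,
                "heading": heading,
                "text": body,
                # Content hash lets a later run detect unchanged chunks.
                "sha256": hashlib.sha256(body.encode("utf-8")).hexdigest(),
            },
        })
    return points
```

One limitation of this simple splitter: a line starting with `#` inside a fenced code block is also treated as a heading, so a production job would need to track fences.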

For observability, we’ve integrated Langfuse into the agent services as a callback. This integration allows us to monitor the system in real time, providing insights into costs, responses and customer feedback when interacting with Copilot.
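The callback idea can be shown without any vendor SDK: wrap each LLM call, then hand token counts and latency to a recorder. The sketch below is a generic stand-in and does not use Langfuse's API. The `UsageRecorder` class, the per-1,000-token prices and the `(text, prompt_tokens, completion_tokens)` return shape of the `llm` callable are all hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Trace:
    user: str
    prompt_tokens: int
    completion_tokens: int
    latency_s: float
    cost_usd: float


@dataclass
class UsageRecorder:
    """Hypothetical callback sink; prices are per 1,000 tokens and purely illustrative."""
    prompt_price: float = 0.001
    completion_price: float = 0.002
    traces: list[Trace] = field(default_factory=list)

    def on_llm_end(self, user: str, prompt_tokens: int, completion_tokens: int, latency_s: float) -> None:
        """Record one completed LLM call along with its estimated cost."""
        cost = (prompt_tokens * self.prompt_price + completion_tokens * self.completion_price) / 1000
        self.traces.append(Trace(user, prompt_tokens, completion_tokens, latency_s, cost))

    def total_cost(self) -> float:
        return sum(t.cost_usd for t in self.traces)


def call_with_tracing(
    llm: Callable[[str], tuple[str, int, int]],
    prompt: str,
    user: str,
    recorder: UsageRecorder,
) -> str:
    """Invoke the LLM and notify the recorder, keeping tracing out of agent logic."""
    start = time.perf_counter()
    text, prompt_tokens, completion_tokens = llm(prompt)
    recorder.on_llm_end(user, prompt_tokens, completion_tokens, time.perf_counter() - start)
    return text
```

Because the recorder is passed in as a callback, the agent code stays the same whether traces go to an in-memory list, as here, or to an external observability platform.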

In summary, the Rafay Copilot architecture is thoughtfully designed to meet our users’ evolving needs. It focuses on flexibility, scalability and ease of future expansion.

Overcoming Challenges

The architecture diagram above may make the process look straightforward, but we faced multiple hurdles while building Rafay Copilot. I’ve outlined the most significant ones below to give you a clearer picture.

  • Steep learning curve: For any organization beginning its AI journey, understanding the AI landscape presents a steep learning curve.
  • Evaluating LLM options: With the rise of GenAI, an overwhelming number of LLMs are available, both from cloud providers and the open source community. Deciding which LLM is best for your business is challenging, and making the wrong choice can significantly impact your outcomes.
  • Governance:
    • Cost management: Since GenAI apps charge based on token usage, it’s crucial to have governance in place to monitor and control costs. Admins need the ability to assign token limits to users or teams on a daily, weekly or monthly basis.
    • Guardrails: Security is another key aspect. When interacting with LLMs, especially cloud-based ones, there’s always a risk of users inadvertently leaking sensitive data.
    • Secrets management: Managing API keys in an enterprise setting can be complex, especially when multiple users are involved. A gateway is necessary to handle this securely.
    • Access management: In organizations that use multiple models, proper access management is essential so that admins can control who has access to which models and to what datasets.
  • Prompt evaluation: Similar to LLM evaluation, prompt evaluation is critical to the success of your AI app. Even small changes in wording can lead to entirely different responses, making this a crucial but challenging task.
  • Observability: Like any production-ready application, AI apps require strong observability. Integrating tools like LangSmith and Langfuse into GenAI apps takes time and effort, not to mention the ongoing maintenance required by the site reliability engineering (SRE) team.
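Two of the governance items above, token limits and guardrails, lend themselves to a short sketch. The code below is an assumption-laden illustration, not a description of any Rafay feature: the `TokenBudget` class, the user names and limits, and the two redaction patterns are all hypothetical, and real guardrails would need far broader detection than two regular expressions.

```python
import re
from collections import defaultdict
from datetime import date
from typing import Optional


class TokenBudget:
    """Per-user daily token limits (users and limits are illustrative assumptions)."""

    def __init__(self, daily_limits: dict[str, int], default_limit: int = 0):
        self.daily_limits = daily_limits
        self.default_limit = default_limit
        self._used: dict[tuple[str, date], int] = defaultdict(int)

    def try_consume(self, user: str, tokens: int, today: Optional[date] = None) -> bool:
        """Reserve tokens for a request; return False if it would exceed today's limit."""
        key = (user, today or date.today())
        limit = self.daily_limits.get(user, self.default_limit)
        if self._used[key] + tokens > limit:
            return False
        self._used[key] += tokens
        return True


# Illustrative patterns only: email addresses and AWS-style access key IDs.
_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[EMAIL]"),
    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "[AWS_ACCESS_KEY_ID]"),
]


def redact(text: str) -> str:
    """Replace matches of known sensitive patterns before the text leaves the gateway."""
    for pattern, label in _PATTERNS:
        text = pattern.sub(label, text)
    return text
```

A gateway could call `try_consume` before forwarding a request and `redact` on the outgoing prompt, which keeps both controls in one place instead of inside each application.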

Conclusion

Building GenAI applications may sound easy, with a plethora of development tools and frameworks currently available. This may be the case for the early experimentation phase.

However, building enterprise-grade GenAI applications and deploying them in production has many challenges, including finding the right LLM, managing costs, devising data access controls, deploying prompt guardrails and maintaining observability. We used the insights we learned from building Rafay Copilot to develop GenAI Playgrounds, an integrated development environment (IDE) for developers to prototype and build GenAI applications rapidly. You can try it for free on Rafay’s website.

Enterprise platform teams must look carefully at all the security, data and cost-related considerations and adopt an enterprise-level strategy to address them before embracing GenAI widely in the organization. We recommend that these teams either build an in-house solution or use a commercial tool to address these challenges and accelerate GenAI application development and deployment.

Rafay’s Cloud Automation Platform provides a solution for platform teams that wish to build automated self-service cloud infrastructure workflows, guardrails included. With golden paths in place, platform teams can enable anyone who depends on rapid access to cloud infrastructure to move faster, safely.
Rajat Tiwari is a Senior Software Engineer at Rafay Systems, with a strong interest in the GenAI/ML space and cloud infrastructure. Prior to joining Rafay Systems, he was a Software Engineer at Citrix Systems, where he contributed to the development...
TNS owner Insight Partners is an investor in: OpenAI.