VOOZH about

URL: https://thenewstack.io/how-to-put-guardrails-around-containerized-llms-on-kubernetes/

⇱ How to Put Guardrails Around Containerized LLMs on Kubernetes - The New Stack


TNS
SUBSCRIBE
Join our community of software engineering leaders and aspirational developers. Always stay in-the-know by getting the most important news and exclusive content delivered fresh to your inbox to learn more about at-scale software development.
REQUIRED
It seems that you've previously unsubscribed from our newsletter in the past. Click the button below to open the re-subscribe form in a new tab. When you're done, simply close that tab and continue with this form to complete your subscription.
The New Stack does not sell your information or share it with unaffiliated third parties. By continuing, you agree to our Terms of Use and Privacy Policy.
Welcome and thank you for joining The New Stack community!
Please answer a few simple questions to help us deliver the news and resources you are interested in.
REQUIRED
REQUIRED
REQUIRED
REQUIRED
REQUIRED
Great to meet you!
Tell us a bit about your job so we can cover the topics you find most relevant.
REQUIRED
REQUIRED
REQUIRED
REQUIRED
REQUIRED
Welcome!

We’re so glad you’re here. You can expect all the best TNS content to arrive Monday through Friday to keep you on top of the news and at the top of your game.

What’s next?

Check your inbox for a confirmation email where you can adjust your preferences and even join additional groups.

Follow TNS on your favorite social media networks.

Become a TNS follower on LinkedIn.

Check out the latest featured and trending stories while you wait for your first TNS newsletter.

PREV
1 of 2
NEXT
VOXPOP
As a JavaScript developer, what non-React tools do you use most often?
Angular
0%
Astro
0%
Svelte
0%
Vue.js
0%
Other
0%
I only use React
0%
I don't use JavaScript
0%
Thanks for your opinion! Subscribe below to get the final results, published exclusively in our TNS Update newsletter:
NEW! Try Stackie AI
From clobbered drafts to real-time sync
Apr 14th 2026 10:00am, by David Moore
TypeScript 6.0 RC arrives as a bridge to a faster future
Mar 14th 2026 9:00am, by Darryl K. Taft
Mastra empowers web devs to build AI agents in TypeScript
Jan 28th 2026 11:00am, by Loraine Lawson
2025-03-25 07:00:23
How to Put Guardrails Around Containerized LLMs on Kubernetes
kubecon-cloudnativecon-eu-2025,sponsor-cncf,sponsored-post-contributed,
Kubecon Cloudnativecon EU 2025 / Kubernetes / Large Language Models / Security

How to Put Guardrails Around Containerized LLMs on Kubernetes

Combining application-level guardrails with container-level security measures creates a robust, multilayered defense for LLMs.
Mar 25th, 2025 7:00am by Sanjay Basu and Victor Agreda
👁 Featued image for: How to Put Guardrails Around Containerized LLMs on Kubernetes
Featured image by Álvaro Ibáñez on Unsplash.
CNCF sponsored this post.

As large language models (LLMs) grow increasingly integral to enterprise applications, it becomes paramount to deploy them securely. Common threats, such as prompt injections, can lead to unintended behaviors, data breaches or unauthorized access to internal systems. Traditional application-level security measures, while valuable, are often insufficient to protect LLM endpoints.

Containerization can help address these challenges. By wrapping LLMs and their supporting components in containers, organizations can enforce strict security boundaries at the infrastructure level. This multilayered approach, combined with robust guardrail mechanisms (e.g., those from NVIDIA NeMo Guardrails), helps prevent suspicious or malicious prompts from reaching the core model logic.

This article outlines the design and considerations needed to implement an enterprise-grade secure LLM deployment. As an illustration, we are using the Kubernetes platform on Oracle Cloud Infrastructure (OCI) with OCI Kubernetes Engine (OKE), which is highly conformant with the Cloud Native Computing Foundation’s (CNCF) open source Kubernetes. This implementation should also work with open source Kubernetes. Our example features:

  • Container-based guardrails to counter prompt injection attacks.
  • Multilayered network, resource and access policies in OKE.
  • Integration with Kubeflow for continuous training, validation and deployment (machine learning operations or MLOps).

End-to-End Workflow Diagram

This diagram shows the architecture of the solution we’ll present in this article.

👁 E2E workflow of an enterprise-grade secure LLM deployment

Understanding Prompt Injection Vulnerabilities

Prompt injection is a type of attack specific to LLMs where an adversary crafts inputs (prompts) that manipulate the model’s behavior. For example, an attacker might craft text that bypasses content filters or reveals system instructions that should remain confidential. This exploitation can lead to:

  • Unauthorized data access: Gaining insights into hidden prompts, confidential user data or system APIs.
  • Unexpected LLM behavior: Producing harmful, biased or disallowed outputs.

Prompt injection attacks pose unique challenges, making them particularly dangerous in enterprise environments. Unlike traditional injection attacks such as SQL injection or cross-site scripting (XSS) that can be detected through signature matching, prompt injections involve subtle text manipulations that easily evade standard detection methods.

The dynamic, context-dependent nature of LLMs further complicates this issue, as malicious actors can exploit the model’s own reasoning capabilities to circumvent protective measures. Moreover, in complex enterprise systems, a successful prompt injection attack that reveals sensitive data can serve as a stepping stone for attackers to mount broader network-based threats.

Guardrails and Container-Based Security

What we need are safety filters, and there are options for creating such filters capable of scanning and sanitizing prompts. Let’s look at one such solution:

NVIDIA Guardrails

NVIDIA Guardrails is an open source framework for integrating safety filters that can scan and sanitize prompts before they reach the LLM inference engine. Key features include:

  • Text filtering: Identifying malicious or suspicious patterns in prompts.
  • Context enforcement: Restricting the operational context (e.g., ensuring the LLM only discusses certain topics).
  • Adaptive learning: Continuously improving rule sets and response strategies as new threats emerge.

When paired with Kubernetes’ container orchestration, NVIDIA Guardrails can run as a pre-inference container that enforces security policies on incoming requests.

Multilayered Container Security Controls

The integration of guardrail logic within containers provides multiple layers of security benefits that enhance the overall protection of LLM deployments. These containerized security controls work in concert with the guardrails to create a comprehensive defense strategy. The benefits include:

  • Network isolation: Kubernetes network policies limit traffic between pods. The LLM container communicates only with the guardrail container and authorized services.
  • Resource constraints: CPU and memory limits in Kubernetes help prevent any single container from monopolizing cluster resources or triggering a denial-of-service scenario.
  • Runtime security policies: Tools like Seccomp, AppArmor or SELinux reduce the attack surface by limiting system calls that containers can execute.

Container-Level Access Controls

Organizations deploying LLM endpoints in containers can implement several crucial access controls to enhance security:

  • Least privilege access: Granular role-based access control (RBAC) limits who can deploy, modify or access logs from specific containers.
  • Secrets management: This method securely stores API keys, encryption keys or credentials in services like OCI Vault or in Kubernetes secrets.

Deployment Model

On a high level, the deployment model implements a structured approach to processing user requests, incorporating multiple specialized containers that each serve specific security and operational functions. This architecture helps ensure that every request undergoes appropriate validation and processing before reaching the LLM and returning to the user. Some of the components include:

  • User prompt: The user’s request enters the system via a frontend application or an API gateway.
  • Guardrail container: The request is forwarded to a specialized container (e.g., NVIDIA Guardrails) that inspects the prompt for malicious or disallowed content.
  • LLM inference container: If the prompt passes the guardrail checks, it is forwarded to the container hosting the LLM. All inference operations and stateful data remain contained here.
  • Output processing container: Optionally, another container can recheck the LLM’s response or sanitize it before returning it to the user.
  • Continuous monitoring: Logs and metrics feed into a centralized monitoring stack, alerting operators if suspicious activity occurs.
👁 OKE environment

OKE Environment

Incorporating Kubeflow for MLOps

Kubeflow serves as an essential open source MLOps platform that runs natively on Kubernetes, providing crucial capabilities for managing LLM deployments. It enables comprehensive experiment tracking, allowing teams to monitor and compare various fine-tuning experiments for LLM models.

Through pipeline automation features, Kubeflow streamlines the workflow from data ingestion through model training, validation and deployment. Leveraging Kubernetes-native scaling capabilities, it efficiently handles large training data sets and supports multiple concurrent experiments, making it ideal for enterprise-scale LLM operations.

Workflow Integration

Kubeflow’s workflow integration capabilities enhance the security and reliability of LLM deployments by automating critical processes and ensuring consistent application of security controls. The platform supports several key security-focused workflows:

  • Training and validation: Use Kubeflow pipelines to schedule and automate data preparation, LLM fine-tuning and validation steps.
  • Guardrail rule updates: As you discover new potential prompt injection patterns during training or from production logs, you can update the guardrail rules or filters. This update can be automatically applied to the guardrail container via Kubernetes rolling updates.
  • Deployment: Kubeflow triggers container builds and deployments in Kubernetes when a model or guardrail rule set is validated, ensuring continuous delivery with minimal downtime.

Operational Best Practices

  • Continuous monitoring and logging: Collect logs from both guardrail and LLM containers. Tools like Prometheus and Grafana track response times, errors and usage patterns.
  • Audit logging: For compliance purposes, maintain logs of who accessed the LLM, prompts entered and if any were flagged by the guardrail container.
  • Regular security assessments: Periodically run penetration tests focusing on prompt injections and attempts to bypass guardrail logic.
  • Multicluster or hybrid deployments: For disaster recovery or specialized use cases, consider deploying across multiple clusters or hybrid setups (on-premises + cloud).

Conclusion

Prompt injections represent a serious and evolving threat to LLM deployments. By combining application-level guardrails (e.g., NVIDIA Guardrails) with container-level security measures, organizations can implement a robust, multilayered defense. This approach helps to prevent malicious or manipulative inputs from compromising the LLM’s functionality or the broader infrastructure.

Using Kubeflow for MLOps adds further resilience and agility, enabling continuous improvement of both the LLM model and its associated guardrail rules. This containerized, highly orchestrated architecture provides the necessary scalability, security and manageability for enterprise-grade LLM deployments.

To learn more about Kubernetes and the cloud native ecosystem, join us at KubeCon + CloudNativeCon Europe in London on April 1-4.

The Cloud Native Computing Foundation (CNCF) hosts critical components of the global technology infrastructure including Kubernetes, OpenTelemetry, and Argo. CNCF is the neutral home for cloud native collaboration, bringing together the industry’s top developers, end users, and vendors.
Learn More
The latest from CNCF
TRENDING STORIES
Sanjay Basu PhD, is Senior Director - Gen AI/GPU Cloud Engineering at Oracle. He focuses on the advanced services like Generative AI, Machine-Learning, GPU Engineering, Blockchain, Microservices, Industrial IoT, 5G core along with Cloud Security and Compliance. He has double...
Read more from Sanjay Basu
Victor is a content strategist at Oracle. He is a writer and father of two based in Knoxville, TN.
Read more from Victor Agreda
CNCF sponsored this post.
SHARE THIS STORY
TRENDING STORIES
Oracle is also a sponsor of The New Stack.
SHARE THIS STORY
TRENDING STORIES
TNS DAILY NEWSLETTER Receive a free roundup of the most recent TNS articles in your inbox each day.
The New Stack does not sell your information or share it with unaffiliated third parties. By continuing, you agree to our Terms of Use and Privacy Policy.