URL: https://thenewstack.io/cloud-native-and-open-source-help-scale-agentic-ai-workflows/

Cloud Native and Open Source Help Scale Agentic AI Workflows - The New Stack



Cloud Native and Open Source Help Scale Agentic AI Workflows

Small language models (SLMs) paired with Kubernetes and Function as a Service (FaaS) have emerged as alternatives to LLMs for agentic AI use cases.
Jun 12th, 2025 1:00pm by Sanjay Basu
Oracle sponsored this post.
Enterprise automation increasingly relies on AI-driven agent workflows, which are typically built on large language models (LLMs). While LLMs can address many general-purpose use cases, deploying and orchestrating them can add significant complexity and operational cost. To tackle enterprise-specific use cases, organizations have begun to see the benefit of smaller models. As a result, small language models (SLMs) paired with cloud native platforms such as Kubernetes and Function as a Service (FaaS) have emerged as an alternative for agentic AI use cases.

Let’s explore how to use cloud native paradigms to deploy and scale SLM-based agentic workflows, specifically how Kubernetes, Knative and serverless platforms can help dynamically manage inference workloads, optimize resource utilization and accelerate innovation in agent-driven AI applications.

Why Small Language Models?

While LLMs have gained popularity for their impressive capabilities, their high computational requirements and significant infrastructure overhead often limit their practical deployment at scale. SLMs, which have fewer parameters and leaner computational demands, can offer substantial advantages in scenarios where responsiveness, scalability and cost-efficiency are critical. An example of an SLM is Microsoft’s Phi-3-mini. Its relatively small number of parameters (3.8 billion) translates to a smaller memory footprint and faster processing times. Other examples of SLMs include Mistral 7B, Llama 3.2 and Google Gemma 2B, which are well suited for running on smaller GPUs and CPUs. These models are designed for efficiency and can be deployed in a variety of settings, including edge devices like laptops. For many agentic workflows, such as real-time customer interactions, DevOps automation, anomaly detection and data enrichment, SLMs tend to deliver sufficient accuracy at significantly lower latency. Their smaller footprint makes them good candidates for cloud native architectures that emphasize agility and cost-effectiveness.
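To make the link between parameter count and memory footprint concrete, here is a rough back-of-the-envelope sketch. It assumes the footprint is dominated by the model weights and ignores activations, the KV cache and runtime overhead, so real usage will be higher. The precision table and the function name are illustrative, not taken from any particular library.

```python
# Approximate bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_footprint_gb(num_params: float, precision: str) -> float:
    """Estimate the memory needed for the weights alone, in gigabytes (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    phi3_mini_params = 3.8e9  # parameter count cited above for Phi-3-mini
    for precision in BYTES_PER_PARAM:
        size = weight_footprint_gb(phi3_mini_params, precision)
        print(f"{precision}: ~{size:.1f} GB of weights")
```

Under these assumptions, a 3.8-billion-parameter model needs about 7.6 GB for 16-bit weights and about 1.9 GB at 4-bit precision, which is why models of this size can fit on a single small GPU or a laptop.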

Cloud Native Architectures: Kubernetes and FaaS

The Cloud Native Computing Foundation (CNCF) ecosystem provides robust tools that enable efficient and scalable AI deployments. At the core is Kubernetes, a container orchestration platform renowned for automating application deployment, scaling and management. Kubernetes facilitates containerized deployments, enabling efficient resource allocation and seamless scalability. A rich ecosystem of CNCF projects complements Kubernetes, including Knative. This FaaS platform provides developers and MLOps teams with critical building blocks for deploying serverless workloads on Kubernetes, enabling automatic scaling based on demand and helping reduce operational overhead by dynamically managing container life cycles. Using these technologies together can help organizations deploy SLM-based agents rapidly, scale seamlessly under varying workloads and maintain cost efficiency.

Practical Implementation

The implementation below uses OCI Kubernetes Engine (OKE) on Oracle Cloud Infrastructure (OCI). OKE provides a fully managed Kubernetes environment, simplifying the setup and operation of production-grade Kubernetes clusters. It conforms to the CNCF’s open source Kubernetes, so the example below should also work on open source Kubernetes itself. In addition, integrating Knative into OKE creates a robust serverless infrastructure for SLM deployment.

Architectural Blueprint

An effective cloud native architecture that uses OCI, Kubernetes and FaaS for SLM deployment consists of the following key components:
  • OCI Kubernetes Engine (OKE): Manages Kubernetes clusters, automating orchestration, security and scaling.
  • Knative Serving: Provides serverless capabilities, automatically scaling SLM containers up and down based on inference requests.
  • OCI object storage: Stores model artifacts and configuration files, facilitating easy deployment and updates.
  • Prometheus and Grafana: Collect and visualize performance metrics, resource utilization and scaling behavior.
  • Istio service mesh: Offers advanced traffic management, security and observability.

Step-by-Step Deployment Guide

  1. Prepare Your Kubernetes Cluster
Provision a Kubernetes cluster using OCI’s managed Kubernetes service. This simplifies cluster management, leaving you free to focus on deployment specifics:
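# The CLI also requires --compartment-id and --vcn-id; the two variables are placeholders for your own OCIDs
oci ce cluster create --name my-oke-cluster --kubernetes-version v1.29.0 --compartment-id "$COMPARTMENT_OCID" --vcn-id "$VCN_OCID"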
  2. Install Knative Serving
Deploy Knative Serving using YAML manifests, enabling serverless functionality:
kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.14.3/serving-crds.yaml
kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.14.3/serving-core.yaml

  3. Containerize the Small Language Model
Use Docker, or another tool that builds Open Container Initiative-compliant images, to package your SLM with a lightweight serving framework such as FastAPI or Flask:
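FROM python:3.11-slim
WORKDIR /app
# Install dependencies first so this layer is cached across code changes (requirements.txt must include uvicorn)
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
# Copy the model artifacts and the serving code; app.py is assumed to define the ASGI object "app"
COPY ./model ./model
COPY app.py ./
EXPOSE 8080
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8080"]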
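The CMD line starts uvicorn with app:app, which implies an app.py module that exposes an ASGI application named app. The article does not show that file, so the following is only a minimal sketch of what it might contain. It uses a bare ASGI callable and the standard library so that it is self-contained; in practice you would more likely use FastAPI or Flask as suggested above. The generate function, the /generate and /healthz paths and the JSON field names are all hypothetical placeholders.

```python
import asyncio
import json


def generate(prompt: str) -> str:
    """Hypothetical stand-in for SLM inference; replace with a real model call."""
    return f"echo: {prompt}"


async def app(scope, receive, send):
    """Bare ASGI application that uvicorn can serve as app:app."""
    if scope["type"] != "http":
        return  # this sketch ignores lifespan and websocket events

    # Collect the request body, which may arrive in several chunks.
    body = b""
    while True:
        message = await receive()
        body += message.get("body", b"")
        if not message.get("more_body", False):
            break

    if scope["method"] == "POST" and scope["path"] == "/generate":
        try:
            prompt = json.loads(body or b"{}").get("prompt", "")
            status, payload = 200, {"completion": generate(prompt)}
        except (ValueError, AttributeError):
            # Invalid JSON, or valid JSON that is not an object.
            status, payload = 400, {"error": "expected a JSON object"}
    elif scope["path"] == "/healthz":
        status, payload = 200, {"status": "ok"}
    else:
        status, payload = 404, {"error": "not found"}

    data = json.dumps(payload).encode()
    await send({
        "type": "http.response.start",
        "status": status,
        "headers": [
            (b"content-type", b"application/json"),
            (b"content-length", str(len(data)).encode()),
        ],
    })
    await send({"type": "http.response.body", "body": data})
```

Loading the model once at import time, rather than per request, keeps request latency low but lengthens container startup, which matters when a serverless platform starts new containers on demand.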

  4. Deploy the Serverless SLM via Knative
Create a Knative Service YAML manifest:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: slm-agent
spec:
  template:
    spec:
      containers:
        - image: oci-container-registry/my-slm-agent:v1.0
          resources:
            requests:
              cpu: 500m
              memory: 512Mi

Apply the manifest to deploy your model as a Knative service:
kubectl apply -f slm-agent.yaml
Knative automatically scales your SLM agent based on incoming requests, spinning up and tearing down containers as needed, optimizing resource use and cost.
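The scaling behavior described above can be approximated with simple arithmetic. The sketch below is a simplified model of concurrency-based autoscaling, not Knative's actual implementation: it assumes the autoscaler aims for a fixed number of in-flight requests per container and clamps the result between a minimum and a maximum. The function and parameter names are hypothetical, loosely modeled on the per-revision concurrency target and scale bounds that Knative exposes.

```python
import math


def desired_replicas(in_flight_requests: float, target_per_pod: float,
                     min_scale: int = 0, max_scale: int = 10) -> int:
    """Return how many containers a concurrency-based autoscaler would want.

    in_flight_requests: average number of concurrent requests observed.
    target_per_pod: concurrent requests each container should handle.
    min_scale / max_scale: bounds on the replica count (0 allows scale to zero).
    """
    if target_per_pod <= 0:
        raise ValueError("target_per_pod must be positive")
    wanted = math.ceil(in_flight_requests / target_per_pod)
    return max(min_scale, min(max_scale, wanted))


if __name__ == "__main__":
    for load in (0, 3, 25, 400):
        print(load, "in-flight requests ->",
              desired_replicas(load, target_per_pod=10), "replicas")
```

With a target of 10 concurrent requests per container, this model yields zero replicas when idle, one replica for 3 in-flight requests, three for 25, and the cap of ten for 400. Setting min_scale to 1 trades some idle cost for avoiding cold starts, which can be significant when a container has to load model weights.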
  5. Monitoring and Optimization
Deploy Prometheus and Grafana through Helm charts to monitor SLM agent performance, latency and resource utilization:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack
Configure the Istio service mesh for detailed traffic management and security.

Addressing Industry-Specific Use Cases

Real-Time Customer Support

Deploying SLM agents for real-time chat support can help enhance customer interaction efficiency by significantly reducing response latency. Cloud native agents can dynamically scale to meet fluctuating demand, reducing delays during peak usage periods. Organizations benefit from operational cost reductions, as the serverless infrastructure eliminates the need for always-on provisioning, seamlessly scaling resources to match demand precisely.

DevOps Automation

Integrating SLM agents into CI/CD pipelines with Kubernetes and Knative enables highly effective automated troubleshooting and proactive anomaly detection. Agents can swiftly interpret build logs and test outputs, monitor alerts, diagnose issues and suggest immediate fixes. This helps improve operational efficiency, reduce downtime and streamline DevOps processes by helping to rapidly identify and resolve pipeline bottlenecks.
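As an illustration of the log-interpretation step, the sketch below shows one way an agent might reduce a long build log to a short excerpt before sending it to an SLM, since smaller models usually have limited context windows. Everything here is hypothetical: the error pattern, the function names and the prompt wording are assumptions for illustration, and the call to the model itself is deliberately left out.

```python
import re

# Hypothetical heuristic for lines that usually signal a failure.
ERROR_PATTERN = re.compile(r"\b(error|failed|exception|traceback)\b", re.IGNORECASE)


def extract_failures(log_text: str, context: int = 1, limit: int = 20) -> list[str]:
    """Return failing lines plus `context` neighboring lines, capped at `limit` lines."""
    lines = log_text.splitlines()
    keep: set[int] = set()
    for i, line in enumerate(lines):
        if ERROR_PATTERN.search(line):
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            keep.update(range(start, end))
    return [lines[i] for i in sorted(keep)][:limit]


def build_prompt(log_text: str) -> str:
    """Build a compact prompt for the model from the failing parts of a log."""
    excerpt = "\n".join(extract_failures(log_text))
    return (
        "You are a CI assistant. Explain the likely cause of this build "
        "failure and suggest one fix.\n\nLog excerpt:\n" + excerpt
    )


if __name__ == "__main__":
    sample = "step 1 ok\ncompiling\nERROR: missing module foo\ncleanup\ndone"
    print(build_prompt(sample))
```

Filtering before inference keeps each request small, which suits the low-latency, scale-on-demand deployment described earlier.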

Financial Services

Financial institutions can deploy lightweight SLM agents to analyze real-time market data more quickly, enabling rapid, informed decision-making without the heavy computational overhead typical of larger models. These agile, scalable deployments can handle substantial volumes of concurrent queries more efficiently, providing traders and financial analysts with immediate insights, trend forecasts and risk assessments, which are crucial for informed trading strategies and regulatory compliance.

Conclusion

Organizations are striving to understand how the new paradigm of agentic AI can improve operational efficiency. By integrating SLMs with Kubernetes and FaaS, enterprises can use scalable, efficient and responsive agent-based solutions to help address their use cases. Cloud native solutions like Oracle’s OKE, complemented by CNCF tools such as Knative, Prometheus and Istio, can help streamline operations, reduce overhead and enable organizations to deliver innovative AI-driven solutions swiftly and economically. Embracing this cloud native approach positions businesses to thrive in increasingly agile and competitive environments. Experiment with Oracle’s cloud native services using the Oracle Cloud Free Tier, or quickly build new generative AI solutions with the AI Solutions Hub.
Sanjay Basu, PhD, is Senior Director - Gen AI/GPU Cloud Engineering at Oracle. He focuses on advanced services such as generative AI, machine learning, GPU engineering, blockchain, microservices, industrial IoT and 5G core, along with cloud security and compliance. He has double...
TNS owner Insight Partners is an investor in: Real, Docker.