AI agents are evolving from simple, prompt-based assistants into complex, multiagent systems capable of reasoning, memory retention and collaboration. However, most development teams still face a bottleneck: deployment. Creating a powerful agent in a notebook is one thing; running it reliably in production with scalability, resilience and automation is another.
This is where Kubernetes and Terraform shine. Kubernetes (K8s) provides scalable orchestration for containerized workloads, while Terraform allows you to define and provision your infrastructure using code. Together, they form the foundation for cloud native AI systems that can scale intelligently as workloads grow.
Let’s build and deploy an agentic AI workflow using a Python-based large language model (LLM) agent, containerize it with Docker and deploy it to a Kubernetes cluster provisioned via Terraform. Whether you’re a developer, architect or technical leader, this will show you how to move from prototype to production with confidence.
Architecture Overview
Here’s the high-level design of the system:
- Agentic workflow: Introduce a LangChain-powered Python AI agent that responds intelligently to data queries.
- Docker containerization: Package the agent’s environment for portability.
- Terraform infrastructure: Provision cloud resources (VMs, networking and Kubernetes cluster).
- Kubernetes deployment: Run the agent workflow as a microservice with autoscaling.
- Load balancing and monitoring: Enable external access and observability.
Step 1: Create the Agentic AI Workflow
Begin by creating a Python-based AI agent using LangChain and OpenAI APIs.
Python Script: `agent_app.py`
import os
from flask import Flask, jsonify, request
from langchain.agents import initialize_agent, Tool
from langchain.memory import ConversationBufferMemory
from langchain_openai import ChatOpenAI
# Load and validate API key
openai_api_key = os.environ.get("OPENAI_API_KEY")
if not openai_api_key:
    raise ValueError("OPENAI_API_KEY must be set before running this script.")
# Initialize model
llm = ChatOpenAI(
    model="gpt-4",
    temperature=0,
    openai_api_key=openai_api_key
)
# Memory for context retention; the chat-conversational agent expects the
# history as message objects, hence return_messages=True
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
# Simple data retrieval tool
def fetch_data(query: str) -> str:
    # Simulated data retrieval
    return f"Data retrieved for query: {query}"
tools = [
    Tool(
        name="DataFetcher",
        func=fetch_data,
        description="Fetches business data for analysis."
    )
]
# Initialize agent
agent = initialize_agent(
    tools,
    llm,
    agent="chat-conversational-react-description",
    memory=memory
)
# REST API for interaction
app = Flask(__name__)
@app.route("/ask", methods=["POST"])
def ask():
    payload = request.get_json(silent=True)
    user_input = payload.get("query") if isinstance(payload, dict) else None
    if not user_input:
        # Reject requests that carry no usable "query" field
        return jsonify({"error": "Request body must include a 'query' field."}), 400
    response = agent.run(user_input)
    return jsonify({"response": response})
if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
Explanation:
- The LangChain agent handles multistep reasoning using GPT-4.
- Memory stores conversation context for adaptive responses.
- A Flask API exposes the agent’s logic to external users and systems.
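Because the Flask API accepts input from external users and systems, it can help to validate the request body before it reaches the agent. The following is a minimal sketch using only the standard library; the helper name `extract_query` and the `MAX_QUERY_CHARS` limit are illustrative assumptions, not part of the original script.

```python
from typing import Any, Optional, Tuple

MAX_QUERY_CHARS = 2000  # assumed limit; tune it for your model and use case


def extract_query(payload: Any) -> Tuple[Optional[str], Optional[str]]:
    """Return (query, error); exactly one of the two is None."""
    if not isinstance(payload, dict):
        return None, "Request body must be a JSON object."
    query = payload.get("query")
    if not isinstance(query, str) or not query.strip():
        return None, "Field 'query' must be a non-empty string."
    query = query.strip()
    if len(query) > MAX_QUERY_CHARS:
        return None, f"Field 'query' exceeds {MAX_QUERY_CHARS} characters."
    return query, None
```

Inside the `/ask` handler, one option is to call `extract_query(request.get_json(silent=True))` and return an HTTP 400 response with the error message whenever the query comes back as `None`.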
Step 2: Containerize With Docker
Next, package this app into a portable container image.
Dockerfile
# Base image
FROM python:3.10-slim
# Set working directory
WORKDIR /app
# Copy files
COPY . .
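# Install dependencies (pin versions for reproducible builds; initialize_agent
# is a deprecated LangChain API, so newer releases may not include it)
RUN pip install --no-cache-dir flask langchain-openai langchain openai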
# Expose the Flask port
EXPOSE 8080
# Command to run the app
CMD ["python", "agent_app.py"]
Build and Test the Image
docker build -t agentic-ai-app:latest .
docker run -e OPENAI_API_KEY="$OPENAI_API_KEY" -p 8080:8080 agentic-ai-app:latest
Explanation:
- Docker encapsulates all dependencies, making the agent easily deployable in any environment: local, cloud or on premises.
Step 3: Define Infrastructure With Terraform
Define the cloud infrastructure, including a managed Kubernetes cluster, with Terraform. This example uses AWS. (Note: You can adapt it for Google Cloud Platform [GCP] or Azure.)
Terraform Configuration: `main.tf`
provider "aws" {
  region = "us-east-1"
}
# Create a VPC
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true
  tags = {
    Name = "agentic-ai-vpc"
  }
}
# Public Subnet 1 (EKS needs subnets in at least two availability zones)
resource "aws_subnet" "subnet1" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.1.0/24"
  availability_zone       = "us-east-1a"
  map_public_ip_on_launch = true
  tags = {
    Name = "agentic-ai-subnet-1"
  }
}
# Public Subnet 2
resource "aws_subnet" "subnet2" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.2.0/24"
  availability_zone       = "us-east-1b"
  map_public_ip_on_launch = true
  tags = {
    Name = "agentic-ai-subnet-2"
  }
}
# EKS Cluster
# The `subnets` and `manage_aws_auth` inputs match the 17.x releases of this
# module; later releases rename them, so the version pin is an assumption.
# Check the module documentation for the provider setup `manage_aws_auth` needs.
module "eks" {
  source          = "terraform-aws-modules/eks/aws"
  version         = "~> 17.0"
  cluster_name    = "agentic-ai-cluster"
  cluster_version = "1.29"
  vpc_id          = aws_vpc.main.id
  subnets = [
    aws_subnet.subnet1.id,
    aws_subnet.subnet2.id
  ]
  manage_aws_auth = true
  tags = {
    Environment = "dev"
    Project     = "agentic-ai"
  }
}
output "cluster_endpoint" {
  value = module.eks.cluster_endpoint
}
Initialize and Apply Terraform
terraform init
terraform apply -auto-approve
Explanation:
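- Terraform provisions your AWS virtual private cloud (VPC) and deploys an Amazon Elastic Kubernetes Service (EKS) cluster. The `output` block prints the cluster’s API endpoint so you can connect to it. This minimal configuration does not define an internet gateway, route tables or worker node groups, so add those before scheduling workloads.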
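If later automation needs the cluster endpoint, it can read it from Terraform rather than copying it by hand. The sketch below assumes the `terraform` CLI is on the PATH and relies on the JSON shape of `terraform output -json`, where each output is an object with a `value` field; the function names are hypothetical.

```python
import json
import subprocess
from typing import Any, Dict


def parse_terraform_outputs(raw_json: str) -> Dict[str, Any]:
    """Flatten the JSON printed by `terraform output -json` into {name: value}."""
    outputs = json.loads(raw_json)
    return {name: entry["value"] for name, entry in outputs.items()}


def read_terraform_outputs(workdir: str = ".") -> Dict[str, Any]:
    """Run `terraform output -json` in workdir and return the flattened outputs."""
    result = subprocess.run(
        ["terraform", "output", "-json"],
        cwd=workdir,
        check=True,           # raise if terraform exits with an error
        capture_output=True,
        text=True,
    )
    return parse_terraform_outputs(result.stdout)
```

After `terraform apply` completes, `read_terraform_outputs()["cluster_endpoint"]` would return the value of the `cluster_endpoint` output defined in `main.tf`.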
Step 4: Deploy the Agent to Kubernetes
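Once your cluster is ready, configure kubectl to talk to it (for example, `aws eks update-kubeconfig --region us-east-1 --name agentic-ai-cluster`), create the Secret that the deployment reads the API key from (for example, `kubectl create secret generic openai-secret --from-literal=api_key="$OPENAI_API_KEY"`) and then deploy the agent. The image must also be pushed to a registry the cluster can pull from, such as Amazon ECR.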
Kubernetes Deployment File: `deployment.yaml`
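apiVersion: apps/v1
kind: Deployment
metadata:
  name: agentic-ai
spec:
  replicas: 2
  selector:
    matchLabels:
      app: agentic-ai
  template:
    metadata:
      labels:
        app: agentic-ai
    spec:
      containers:
        - name: agentic-ai
          # Replace with the image's full registry path (for example, an Amazon ECR URI);
          # the cluster cannot pull an image that exists only on your machine
          image: agentic-ai-app:latest
          ports:
            - containerPort: 8080
          # CPU request so CPU-based autoscaling can compute utilization
          # (the value is an illustrative assumption)
          resources:
            requests:
              cpu: 250m
          env:
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: openai-secret
                  key: api_key
---
apiVersion: v1
kind: Service
metadata:
  name: agentic-ai-service
spec:
  type: LoadBalancer
  selector:
    app: agentic-ai
  ports:
    - port: 80
      targetPort: 8080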
Deploy to Cluster
kubectl apply -f deployment.yaml
Explanation:
- The deployment ensures high availability with replicas, while the LoadBalancer service exposes your agentic workflow to the internet.
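One consequence of running several replicas is that conversation memory held inside a single Python process is neither shared between pods nor separated per user. A rough sketch of per-session history with bounded size follows; the `SessionMemory` class, its limits and the idea of a client-supplied session ID are assumptions for illustration, and this in-process version would still need an external store (for example, Redis) behind it to be shared across pods.

```python
from collections import OrderedDict, deque
from typing import Deque, List, Tuple

Turn = Tuple[str, str]  # (user_message, agent_reply)


class SessionMemory:
    """Bounded per-session chat history with least-recently-used eviction."""

    def __init__(self, max_sessions: int = 1000, max_turns: int = 20) -> None:
        self._sessions: "OrderedDict[str, Deque[Turn]]" = OrderedDict()
        self._max_sessions = max_sessions
        self._max_turns = max_turns

    def append(self, session_id: str, user_message: str, agent_reply: str) -> None:
        # deque(maxlen=...) silently drops the oldest turn once the limit is hit
        history = self._sessions.setdefault(session_id, deque(maxlen=self._max_turns))
        history.append((user_message, agent_reply))
        self._sessions.move_to_end(session_id)  # mark as most recently used
        while len(self._sessions) > self._max_sessions:
            self._sessions.popitem(last=False)  # evict the least recently used session

    def history(self, session_id: str) -> List[Turn]:
        """Return the stored turns for a session, oldest first."""
        return list(self._sessions.get(session_id, ()))
```

The history returned for a session could be passed to the agent as context on each request, in place of a single global memory object.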
To test:
curl -X POST http://<load-balancer-endpoint>/ask -H "Content-Type: application/json" -d '{"query": "Analyse quarterly revenue trends"}'
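The same request can be sent from Python, which is convenient for smoke tests in a pipeline. This is a minimal sketch using only the standard library; the function names are hypothetical, and `base_url` stands for whatever address your load balancer exposes.

```python
import json
import urllib.request


def build_ask_request(base_url: str, query: str) -> urllib.request.Request:
    """Build the POST request for the /ask endpoint without sending it."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_agent(base_url: str, query: str, timeout: float = 60.0) -> str:
    """Send the query and return the "response" field of the JSON reply."""
    request = build_ask_request(base_url, query)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))["response"]
```

The generous default timeout is a guess that allows for slow LLM calls; adjust it to the latency you observe.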
Step 5: Add Monitoring and Autoscaling
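To make the deployment production-grade, add monitoring and horizontal scaling. CPU-based autoscaling requires the Kubernetes Metrics Server to be running in the cluster and a CPU request to be set on the container.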
Enable Autoscaling
kubectl autoscale deployment agentic-ai --cpu-percent=70 --min=2 --max=5
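To build intuition for what this command does, the sketch below reproduces the scaling rule described in the Kubernetes documentation: the desired replica count is the current count multiplied by the ratio of observed to target utilization, rounded up, with no change when the ratio is within a small tolerance. The defaults mirror the command above (70% target, 2 to 5 replicas); the function name is hypothetical and the real controller applies further rules (such as stabilization windows) that are omitted here.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_cpu_percent: float,
    target_cpu_percent: float = 70.0,
    min_replicas: int = 2,
    max_replicas: int = 5,
    tolerance: float = 0.1,
) -> int:
    """Approximate the Horizontal Pod Autoscaler's replica calculation."""
    ratio = current_cpu_percent / target_cpu_percent
    if abs(ratio - 1.0) <= tolerance:
        wanted = current_replicas  # close enough to target: no scaling
    else:
        wanted = math.ceil(current_replicas * ratio)
    # Keep the result inside the configured bounds
    return max(min_replicas, min(max_replicas, wanted))
```

For example, two pods averaging 140% of their CPU request would be scaled to four, while any result above five is capped at the `--max` value.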
Monitor Logs
kubectl logs -f deployment/agentic-ai
Tip:
- For advanced monitoring, integrate Prometheus and Grafana, or use managed AWS CloudWatch dashboards.
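Whichever monitoring backend you choose, logs tend to be easier to search when each line is structured. The following is a minimal sketch, using only the standard library, that writes one JSON object per log line to stdout, where `kubectl logs` and log collectors can read it. The logger name `agentic_ai` and the field names are assumptions.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def configure_logging(level: int = logging.INFO) -> logging.Logger:
    """Return a logger that writes JSON lines to stdout."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("agentic_ai")
    logger.handlers = [handler]  # replace any handlers from earlier calls
    logger.setLevel(level)
    logger.propagate = False     # avoid duplicate lines via the root logger
    return logger
```

Calling `configure_logging()` once at startup and then logging each request's outcome would give dashboards a consistent set of fields to filter on.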
Step 6: Continuous Learning Pipeline (Optional Enhancement)
Incorporate continual learning by enabling the agent to store and reuse knowledge from past interactions. For example, you could integrate with Pinecone or LlamaIndex to store embeddings of previous user queries and responses.
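# Import path for llama-index 0.10 and later; older releases use `from llama_index import ...`
from llama_index.core import Document, VectorStoreIndex
# Start from an empty index (by default, embeddings are computed with the OpenAI API)
index = VectorStoreIndex.from_documents([])
# Persist new learning
def learn_from_interaction(question, response):
    doc = Document(text=f"Q: {question}\nA: {response}")
    index.insert(doc)
    index.storage_context.persist(persist_dir="./vector_memory")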
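To see the store-and-recall idea without any external service, here is a deliberately simplified sketch that swaps embeddings for word-count vectors and cosine similarity. It is a stand-in technique for illustration only; the `InteractionMemory` class and its methods are hypothetical, and a real deployment would use a vector store such as the ones named above.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def _vectorize(text: str) -> Counter:
    """Turn text into a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two count vectors (0.0 if either is empty)."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InteractionMemory:
    """Store past question/response pairs and recall the most similar ones."""

    def __init__(self) -> None:
        self._entries: List[Tuple[str, str, Counter]] = []

    def learn(self, question: str, response: str) -> None:
        self._entries.append((question, response, _vectorize(question)))

    def recall(self, question: str, top_k: int = 1) -> List[Tuple[str, str]]:
        query_vec = _vectorize(question)
        ranked = sorted(
            self._entries,
            key=lambda entry: _cosine(query_vec, entry[2]),
            reverse=True,
        )
        return [(q, r) for q, r, _ in ranked[:top_k]]
```

The recalled pairs could be prepended to the agent's prompt so that earlier answers inform new ones; word overlap is a crude proxy for meaning, which is the gap that embedding-based stores are meant to close.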
Business and Technical Takeaways
For Developers
- This setup allows modular and scalable AI workflows.
- Agents can run in multiple containers, handling large-scale user interactions.
- Infrastructure changes are version-controlled via Terraform for traceability.
For Tech Leaders and CEOs
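- Deploying AI agents on Kubernetes supports high availability through replicas and automatic restarts, and helps control cost by scaling capacity with demand. Security still depends on how secrets, network access and images are managed.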
- Infrastructure as Code (IaC) with Terraform provides reproducibility and governance.
- The system can scale seamlessly — an agent that starts small can serve thousands of requests in production.
Shipping Complex, Multiagent Systems
AI innovation doesn’t end at the model level — it’s realized through deployment and scalability. By combining Terraform and Kubernetes, you can transform your intelligent agents into production-ready, cloud native systems that grow and adapt alongside your business needs.
This full-stack approach bridges the gap between AI research and reliable software engineering. It empowers organizations to move beyond proof-of-concept experiments and confidently integrate AI into their infrastructure.
Whether you’re deploying a customer support assistant, financial analysis agent or R&D copilot, the combination of agentic AI, Kubernetes and Terraform gives you a scalable blueprint for the future of intelligent automation.