URL: https://dzone.com/articles/how-to-build-and-deploy-an-ai-agent-on-kubernetes

How to Build and Deploy an AI Agent on Kubernetes With AWS Bedrock, FastAPI and Helm

Learn how to build, containerize, and deploy a lightweight, cloud-native AI agent on Amazon EKS using FastAPI, AWS Bedrock, Docker, and Helm.

The capabilities offered by AI are no longer limited to large, centralized platforms. Engineering teams are increasingly adopting lightweight, specialized AI agents that can be deployed, scaled, and managed just like microservices in a cloud-native environment, whether for summarizing large documents, translation, classification, or other analytical tasks. In this tutorial, you will build an AI agent that exposes REST APIs for summarization and translation using AWS Bedrock and FastAPI, containerize it with Docker, and deploy it to Amazon EKS with Helm.

This provides a reusable process for integrating AI into operations: one agent, one task, clear boundaries, and full Kubernetes-native visibility and control.

Why AI Agents Fit the Microservices Model

Organizations implementing “platform thinking” are seeking AI components that function like other services in their architecture:

  • Independently deployable
  • Scalable according to demand
  • CI/CD handled by standard pipelines
  • Observable and secure
  • Easy to integrate via REST

Packaging AI capabilities as microservices produces cloud-agnostic AI building blocks rather than monolithic AI platforms.

This article assumes your Amazon EKS cluster and Amazon ECR repository are already provisioned, so the focus remains on application architecture and deployment patterns rather than infrastructure setup.

Real-World Use Cases

| Scenario | Outcome |
| --- | --- |
| Customer Support | Summarize long customer tickets |
| Engineering Operations | Translate incident reports |
| Risk and Compliance | Condense audit or regulatory documents |
| Product and Marketing | Translate release notes across regions |


Step 1: Project Setup

A clean directory layout keeps application logic, containerization, and deployment assets separate:

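```text
ai-agent/
├── app/
│   ├── main.py
│   ├── providers.py
│   ├── models.py
│   ├── config.py
│   └── requirements.txt
├── Dockerfile
└── charts/
    ├── Chart.yaml
    ├── values.yaml
    └── templates/
        ├── deployment.yaml
        └── service.yaml
```

`app/requirements.txt` lists the Python dependencies the Dockerfile installs (`fastapi`, `uvicorn`, `boto3`, and `pydantic-settings`), and `charts/` holds a standard Helm chart: `Chart.yaml`, `values.yaml`, and the Kubernetes templates.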
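If you prefer to script the layout, the minimal sketch below creates the directories and empty placeholder files with `pathlib`. The `scaffold` helper and its `LAYOUT` list are illustrative assumptions, not part of the original project; the list mirrors the directory tree shown for this step, including the `requirements.txt` the Dockerfile copies and the `Chart.yaml` that Helm requires.

```python
from pathlib import Path

# Hypothetical file list mirroring the project layout for this tutorial.
LAYOUT = [
    "app/main.py",
    "app/providers.py",
    "app/models.py",
    "app/config.py",
    "app/requirements.txt",
    "Dockerfile",
    "charts/Chart.yaml",
    "charts/values.yaml",
    "charts/templates/deployment.yaml",
    "charts/templates/service.yaml",
]


def scaffold(root: str) -> list[Path]:
    """Create each file (and its parent directories) under root if missing.

    Returns only the files that were newly created, so re-running is safe.
    """
    created = []
    for rel in LAYOUT:
        path = Path(root) / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():
            path.touch()
            created.append(path)
    return created
```

Calling `scaffold("ai-agent")` from the parent directory creates the tree; existing files are left untouched.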

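Step 2: Build the FastAPI AI Agent

Configuration:

```python
# app/config.py
# Pydantic v2 moved BaseSettings into the separate pydantic-settings package;
# on Pydantic v1, use `from pydantic import BaseSettings` instead.
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    # Each field can be overridden by an environment variable of the same
    # name (matched case-insensitively), e.g. AWS_REGION from the Helm chart.
    aws_region: str = "us-east-1"
    model_summarize: str = "anthropic.claude-v2"
    model_translate: str = "amazon.titan-text-lite-v1"


settings = Settings()
```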
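Because Bedrock model families use different request schemas, it can help to fail fast at startup when a configured model ID is one the provider does not know how to call. The sketch below is a hypothetical check; the `KNOWN_FAMILIES` prefix table and the family labels are assumptions covering only the two model IDs used in this article.

```python
# Hypothetical prefix table: maps a Bedrock model-ID prefix to the request
# schema family the provider code knows how to build.
KNOWN_FAMILIES = {
    "anthropic.claude-v2": "anthropic-text-completions",
    "amazon.titan-text": "titan-text",
}


def model_family(model_id: str) -> str:
    """Return the schema family for model_id, or raise if it is unsupported."""
    for prefix, family in KNOWN_FAMILIES.items():
        if model_id.startswith(prefix):
            return family
    raise ValueError(f"No request schema registered for model {model_id!r}")


def validate_models(*model_ids: str) -> dict[str, str]:
    """Check every configured model ID once, e.g. when the app starts."""
    return {model_id: model_family(model_id) for model_id in model_ids}
```

Calling `validate_models(settings.model_summarize, settings.model_translate)` at startup turns a misconfigured model ID into an immediate error instead of a failed request later.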


Request Models:

```python
# app/models.py
from pydantic import BaseModel


class SummarizeRequest(BaseModel):
    text: str


class TranslateRequest(BaseModel):
    text: str
    target_language: str
```


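Bedrock Provider:

Claude v2 (text completions) and Amazon Titan Text expect different request and response JSON on Bedrock, so the provider builds and parses the body per model family:

```python
# app/providers.py
import json
import logging

import boto3

from app.config import settings

logger = logging.getLogger("ai-agent")
logger.setLevel(logging.INFO)

bedrock_client = boto3.client("bedrock-runtime", region_name=settings.aws_region)


def _build_body(model_id: str, prompt: str) -> dict:
    """Build the request body in the schema the model family expects."""
    if model_id.startswith("anthropic.claude-v2"):
        # Claude v2 text completions require Human/Assistant turn markers.
        return {
            "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
            "max_tokens_to_sample": 200,
        }
    if model_id.startswith("amazon.titan-text"):
        return {"inputText": prompt, "textGenerationConfig": {"maxTokenCount": 200}}
    raise ValueError(f"Unsupported model: {model_id}")


def _extract_text(model_id: str, output: dict) -> str:
    """Pull the generated text out of the family-specific response."""
    if model_id.startswith("anthropic.claude-v2"):
        return output.get("completion", "")
    return output["results"][0]["outputText"]


def call_bedrock(model_id: str, prompt: str) -> str:
    try:
        response = bedrock_client.invoke_model(
            modelId=model_id,
            body=json.dumps(_build_body(model_id, prompt)),
            contentType="application/json",
            accept="application/json",
        )
        # The response body is a stream; read it once and decode the JSON.
        output = json.loads(response["body"].read())
        return _extract_text(model_id, output).strip()
    except Exception:
        # Log the full traceback but return a generic message to the caller.
        logger.exception("Bedrock error")
        return "Unable to process request."
```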
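To exercise request and response handling without AWS credentials, one option is a small test double that mimics the part of the boto3 client used here: `invoke_model` returns a dict whose `body` is a readable stream. The `FakeBedrockClient` class and `invoke_json` helper below are hypothetical and not part of boto3; `io.BytesIO` stands in for botocore's streaming body.

```python
import io
import json


class FakeBedrockClient:
    """Hypothetical stand-in for a bedrock-runtime client in unit tests."""

    def __init__(self, reply: dict):
        self.reply = reply
        self.calls = []  # records each request for later assertions

    def invoke_model(self, modelId, body, contentType="application/json",
                     accept="application/json"):
        self.calls.append({"modelId": modelId, "body": json.loads(body)})
        # Like botocore's StreamingBody, BytesIO exposes .read().
        return {"body": io.BytesIO(json.dumps(self.reply).encode("utf-8"))}


def invoke_json(client, model_id: str, payload: dict) -> dict:
    """Send a JSON payload and decode the JSON response body."""
    response = client.invoke_model(
        modelId=model_id,
        body=json.dumps(payload),
        contentType="application/json",
        accept="application/json",
    )
    return json.loads(response["body"].read())
```

In a test suite, the fake can replace the module-level client, for example with `unittest.mock.patch("app.providers.bedrock_client", fake)`, so `call_bedrock` runs without network access.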


FastAPI Application:

```python
# app/main.py
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

from app.config import settings
from app.models import SummarizeRequest, TranslateRequest
from app.providers import call_bedrock

app = FastAPI(title="AI Summarizer and Translator")

# Wide-open CORS is convenient for a demo; restrict origins in production.
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)


@app.get("/healthz")
async def health():
    return {"status": "ok"}


# boto3 calls block, so these handlers are plain `def` functions:
# FastAPI runs them in a worker thread pool instead of on the event loop.
@app.post("/summarize")
def summarize(req: SummarizeRequest):
    prompt = f"Summarize this text in two concise sentences:\n{req.text}"
    return {"summary": call_bedrock(settings.model_summarize, prompt)}


@app.post("/translate")
def translate(req: TranslateRequest):
    prompt = f"Translate this into {req.target_language}:\n{req.text}"
    return {"translation": call_bedrock(settings.model_translate, prompt)}
```
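Why blocking SDK calls should not run directly inside `async def` handlers can be shown with plain `asyncio`. In the sketch below, `time.sleep` stands in for a synchronous boto3 call: awaited inline, the calls run one after another because each blocks the event loop, while `asyncio.to_thread` moves them to worker threads so they overlap. This illustrates the principle only and is not FastAPI's internal code; FastAPI gets a similar effect by running plain `def` endpoints in a thread pool.

```python
import asyncio
import time


def blocking_call(seconds: float) -> str:
    # Stand-in for a synchronous SDK call such as invoke_model.
    time.sleep(seconds)
    return "done"


async def run_blocking_inline(n: int, seconds: float) -> float:
    """Call the blocking function inside coroutines: the calls run serially."""
    async def handler():
        return blocking_call(seconds)

    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(n)))
    return time.perf_counter() - start


async def run_in_threads(n: int, seconds: float) -> float:
    """Offload each blocking call to a worker thread: the calls overlap."""
    start = time.perf_counter()
    await asyncio.gather(*(asyncio.to_thread(blocking_call, seconds) for _ in range(n)))
    return time.perf_counter() - start
```

With `asyncio.run(run_blocking_inline(3, 0.2))` the elapsed time is roughly three times the sleep, whereas `asyncio.run(run_in_threads(3, 0.2))` finishes in about one sleep interval.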


Step 3: Containerize the Application

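Dockerfile:

The builder stage installs dependencies into a virtual environment, which is then copied into a slim runtime image that runs as a non-root user:

```dockerfile
FROM python:3.11-slim AS builder
WORKDIR /app
# Install dependencies into a virtual environment that can be copied as a unit.
RUN python -m venv /opt/venv
ENV PATH=/opt/venv/bin:$PATH
COPY app/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

FROM python:3.11-slim
ENV PYTHONUNBUFFERED=1
# /opt/venv is readable by the non-root user, unlike /root/.local.
ENV PATH=/opt/venv/bin:$PATH
WORKDIR /app

COPY --from=builder /opt/venv /opt/venv
COPY app ./app

RUN adduser --disabled-password --gecos '' appuser
USER appuser

EXPOSE 8000

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "2"]
```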


Step 4: Push the Image to Amazon ECR

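Build the image, authenticate Docker to ECR, then tag and push:

```bash
# Build from the repository root, where the Dockerfile lives
docker build -t ai-agent:latest .

aws ecr get-login-password --region us-east-1 \
  | docker login --username AWS --password-stdin <ECR_URL>

docker tag ai-agent:latest <ECR_URL>/ai-agent:latest
docker push <ECR_URL>/ai-agent:latest
```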
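For commercial AWS regions, a private ECR registry host follows the pattern `<account-id>.dkr.ecr.<region>.amazonaws.com`. The hypothetical helpers below assemble the full image reference from its parts; tagging with an immutable identifier such as a Git commit SHA (an assumption here, since the article uses `latest`) makes it clear which build a deployment is running.

```python
import re


def ecr_registry(account_id: str, region: str) -> str:
    """Return the private ECR registry host for an AWS account and region."""
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("AWS account IDs are 12 digits")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def ecr_image_ref(account_id: str, region: str, repository: str, tag: str) -> str:
    """Build '<registry>/<repository>:<tag>' for docker tag/push and Helm values."""
    return f"{ecr_registry(account_id, region)}/{repository}:{tag}"
```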


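Step 5: Create Kubernetes Secrets

```bash
kubectl create secret generic bedrock-secret \
  --from-literal=AWS_ACCESS_KEY_ID=XXX \
  --from-literal=AWS_SECRET_ACCESS_KEY=YYY
```

The Deployment injects these keys as environment variables, which boto3 picks up through its default credential chain. Static keys keep the example simple; on EKS, IAM Roles for Service Accounts (IRSA) or EKS Pod Identity let pods obtain temporary credentials without storing long-lived keys in a Secret.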
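`kubectl create secret generic` stores each `--from-literal` value base64-encoded under the Secret's `data` field. Base64 is an encoding, not encryption, which is one reason to restrict who can read Secrets. The hypothetical `secret_manifest` helper below builds an equivalent `v1` Secret with the standard library to make that encoding visible.

```python
import base64


def secret_manifest(name: str, literals: dict[str, str],
                    namespace: str = "default") -> dict:
    """Build an Opaque v1 Secret, base64-encoding each value as kubectl does."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {
            key: base64.b64encode(value.encode("utf-8")).decode("ascii")
            for key, value in literals.items()
        },
    }
```

Serialized with `json.dumps`, the manifest could be piped to `kubectl apply -f -`, since kubectl accepts JSON as well as YAML; real credentials, however, should not pass through scripts or shell history.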


Step 6: Helm Configurations

values.yaml

```yaml
replicaCount: 2

image:
  repository: <ECR_URL>/ai-agent
  tag: latest
  pullPolicy: Always

service:
  type: LoadBalancer
  port: 80

env:
  AWS_REGION: us-east-1

secretRef: bedrock-secret

resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi
```
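The `resources` values use Kubernetes quantity notation: the `m` suffix means millicores (so `100m` is 0.1 CPU) and `Mi` is a binary mebibyte (2^20 bytes). The hypothetical helpers below convert only the subset of suffixes used in this chart; the full Kubernetes quantity grammar accepts more forms.

```python
# Binary suffixes for memory quantities (a subset of what Kubernetes accepts).
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity such as '100m' or '1.5' to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '256Mi' to bytes."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain bytes
```

Because CPU utilization targets are measured against requests, the 70% target used for autoscaling later corresponds to roughly 70m of average usage per pod with a `100m` request.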


Deployment Template:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-agent
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: ai-agent
  template:
    metadata:
      labels:
        app: ai-agent
    spec:
      containers:
        - name: ai-agent
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          ports:
            - containerPort: 8000
          env:
            - name: AWS_REGION
              value: {{ .Values.env.AWS_REGION | quote }}
          envFrom:
            - secretRef:
                name: {{ .Values.secretRef }}
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8000
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8000
          resources:
{{ toYaml .Values.resources | indent 12 }}
```


Step 7: Deploy to Kubernetes

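```bash
helm install ai-agent ./charts -f values.yaml
kubectl get svc
```

This assumes the chart also includes a Service template that renders `.Values.service` (type `LoadBalancer`, port 80) and forwards traffic to container port 8000. On AWS, the `EXTERNAL-IP` column of `kubectl get svc` typically shows the load balancer's DNS name.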


Step 8: Test the API

```bash
curl -X POST http://<EXTERNAL-IP>/summarize \
  -H "Content-Type: application/json" \
  -d '{"text":"Customer reported API latency during peak hours."}'
```
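The same calls can be made from Python with only the standard library. The sketch below builds JSON POST requests for either endpoint with `urllib.request`; `BASE_URL` is a placeholder for the load balancer address from Step 7, and the `build_request` and `post_json` helpers are illustrative, not part of the service.

```python
import json
import urllib.request

BASE_URL = "http://<EXTERNAL-IP>"  # placeholder: replace with the service address


def build_request(base_url: str, path: str, payload: dict) -> urllib.request.Request:
    """Create a JSON POST request for one of the agent's endpoints."""
    return urllib.request.Request(
        url=f"{base_url}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(base_url: str, path: str, payload: dict, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_request(base_url, path, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())
```

For example, `post_json(BASE_URL, "/translate", {"text": "Incident resolved.", "target_language": "French"})` returns a dict with a `translation` key once `BASE_URL` points at the deployed service.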


Step 9: Autoscaling and Monitoring

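```bash
kubectl autoscale deployment ai-agent \
  --min=2 --max=6 --cpu-percent=70
kubectl get hpa ai-agent
```

CPU-based autoscaling depends on the CPU requests in `values.yaml` and on the Kubernetes Metrics Server, which EKS does not deploy by default; if `kubectl top pods` reports that metrics are unavailable, install Metrics Server first. `kubectl get hpa` then shows current versus target utilization and the replica count.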
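The HorizontalPodAutoscaler derives its target from the ratio of observed to target utilization, desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), skips changes when that ratio is within a tolerance of 1 (10% by default), and clamps the result to the `--min`/`--max` bounds. The sketch below reproduces that core rule for a single metric; `desired_replicas` is a hypothetical helper, and the real controller's stabilization windows and other details are omitted.

```python
import math


def desired_replicas(current: int, current_util: float, target_util: float,
                     min_replicas: int, max_replicas: int,
                     tolerance: float = 0.1) -> int:
    """Core HPA scaling rule for one utilization metric (simplified)."""
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= tolerance:
        desired = current  # close enough to target: no change
    else:
        desired = math.ceil(current * ratio)
    # Respect the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

With the settings above, two pods averaging 140% of their CPU request against a 70% target would scale to four, while a spike to 210% with four pods would be capped at six.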


Step 10: CI/CD Automation (GitHub Actions or Harness CD)

Once the container image and Helm chart are set up, automation can be implemented through a standard CI/CD pipeline. The process involves building the container image, storing it in Amazon ECR, and deploying/upgrading a Helm release to EKS.

  • GitHub Actions: Ideal for repository-based CI/CD with simple deployment pipelines.
  • Harness CD: Suitable for environments requiring approval gates, RBAC, traceability, and multi-team orchestration.

Regardless of the tool, the deployment lifecycle remains consistent: container versioning, Kubernetes releases via Helm, and rollouts with standard health checks.
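Whichever tool runs the pipeline, the deploy step usually ends in a `helm upgrade --install` call that pins the image tag built earlier in the same run. The hypothetical helper below only assembles that command, assuming the Git commit SHA is used as the tag; a pipeline job would then execute it, for example with `subprocess.run(cmd, check=True)`.

```python
def helm_deploy_command(release: str, chart: str, image_tag: str,
                        namespace: str = "default", timeout: str = "5m") -> list[str]:
    """Build an idempotent Helm deploy command for a CI/CD job.

    `upgrade --install` creates the release on the first run and upgrades it
    afterwards; `--wait` blocks until the new pods report ready.
    """
    return [
        "helm", "upgrade", "--install", release, chart,
        "--namespace", namespace,
        "--set", f"image.tag={image_tag}",
        "--wait", "--timeout", timeout,
    ]
```

Passing the tag with `--set` keeps `values.yaml` unchanged between builds, while each release still records exactly which image it deployed.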

Closing Thoughts

Kubernetes offers a solid platform for running AI agents as modular, independently deployable services, while AWS Bedrock provides access to large language models without the operational overhead of hosting them yourself. Combined with FastAPI, Docker, and Helm, this gives a straightforward way to make AI capabilities available through standard APIs.

Separating the application logic from the deployment configuration makes the approach easier to reuse and scale, and keeps operations consistent across services. As more businesses adopt multiple clouds, these qualities become essential for keeping deployment processes under control.

In the next installments of this series, the same agent design will be deployed on Azure AKS with Azure OpenAI and on Google Cloud GKE with Vertex AI. This is the strength of Kubernetes: it provides a consistent layer for machine learning workloads on any cloud platform.

Opinions expressed by DZone contributors are their own.
