DeepSeek on Kubernetes: AI-Powered Reasoning at Scale

Deploy DeepSeek-R1 on Kubernetes using Ollama for inference and Open WebUI for seamless interaction. Works on local clusters such as KIND as well as in the cloud.

Rajesh Gheware

Mar. 14, 25 · Analysis

As artificial intelligence continues to evolve, deploying AI-powered applications efficiently and at scale has become critical. Kubernetes, the de facto standard for container orchestration, is well suited to running containerized AI workloads because it handles scaling, recovery from failures, and day-to-day operations.

In this article, we explore DeepSeek on Kubernetes, a deployment that serves DeepSeek-R1, a powerful reasoning model, through Ollama and pairs it with Open WebUI as a chat interface.

Why Kubernetes for DeepSeek?

DeepSeek is an advanced reasoning model, and serving it benefits from the containerization and orchestration that Kubernetes provides. Compared with alternatives such as Docker Swarm and Apache Mesos, Kubernetes offers a more mature ecosystem and a broader set of features for demanding AI workloads, including GPU device plugins and autoscalers. Here's why Kubernetes suits DeepSeek:

Scalability

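Kubernetes simplifies scaling AI workloads with the Horizontal Pod Autoscaler (HPA) and the Cluster Autoscaler. When DeepSeek faces a sudden surge in inference requests, the HPA can add pods based on observed metrics such as CPU utilization, and the Cluster Autoscaler can add nodes when new pods cannot be scheduled, so capacity grows without manual intervention. Both have to be configured first: the HPA needs a metrics source such as metrics-server, and each new replica must load the model before it can serve requests.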
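
As a sketch of what this could look like for DeepSeek, the HPA below targets the ollama Deployment defined later in this article and scales it between one and three replicas on CPU utilization. The HPA name, replica bounds, and 70% target are assumptions. CPU utilization is only a rough proxy for inference load, and a Utilization target works only if the ollama container declares a CPU request, which the manifest in this article does not set yet.

```yaml
# Sketch only: assumes metrics-server is installed and the ollama container
# declares resources.requests.cpu. Name and thresholds are hypothetical.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ollama-hpa
spec:
  scaleTargetRef:          # the Deployment whose replica count the HPA manages
    apiVersion: apps/v1
    kind: Deployment
    name: ollama
  minReplicas: 1
  maxReplicas: 3
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # percent of the requested CPU, averaged across pods
```

With the emptyDir volume used in the article's manifest, every new replica starts with an empty model directory, so scale-out is only useful once the model is stored persistently or pre-pulled.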

Resilience

Kubernetes improves resilience through automated pod rescheduling and self-healing. If a DeepSeek container crashes or fails its liveness probe, the kubelet restarts it; if its node fails, the Deployment controller recreates the pod on a healthy node once the node is marked unreachable (after a default grace period of about five minutes). This minimizes downtime, although with a single replica there is still a short outage while the replacement starts and reloads the model.
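
Self-healing works best when Kubernetes can tell whether Ollama is actually answering. The fragment below is a sketch of health checks that would go under the ollama container in the Deployment; it assumes the Ollama server responds to a plain GET on / at port 11434 once it is up, and the timing values are illustrative.

```yaml
# Fragment for the ollama container spec (same level as image and ports).
readinessProbe:            # pod receives Service traffic only while this succeeds
  httpGet:
    path: /
    port: 11434
  initialDelaySeconds: 5
  periodSeconds: 10
livenessProbe:             # container is restarted after repeated failures
  httpGet:
    path: /
    port: 11434
  initialDelaySeconds: 15
  periodSeconds: 20
  failureThreshold: 3
```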

Service Discovery

Kubernetes provides built-in DNS-based service discovery. DeepSeek's inference service can reach supporting microservices, such as preprocessing modules or logging services, by their Service names instead of hard-coded IP addresses, which keeps configuration simple and portable. In this deployment, for instance, Open WebUI connects to Ollama simply at http://ollama-service:11434.
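
As a sketch, the env fragment below shows the short Service name, which resolves from pods in the same namespace, next to the fully qualified form needed from another namespace. The default namespace and the cluster.local cluster domain are assumptions; use whatever your cluster is configured with.

```yaml
# Fragment for the openweb-ui container's env list.
env:
# Short name: works when Open WebUI runs in the same namespace as ollama-service.
- name: OLLAMA_BASE_URL
  value: "http://ollama-service:11434"
# Fully qualified alternative for cross-namespace access
# (<service>.<namespace>.svc.<cluster-domain>):
# - name: OLLAMA_BASE_URL
#   value: "http://ollama-service.default.svc.cluster.local:11434"
```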

Persistent Storage

Kubernetes PersistentVolumeClaims (PVCs) provide durable storage for AI models, training datasets, and checkpoints, so the data survives pod restarts, rescheduling, and updates. For DeepSeek, keeping Ollama's model directory on a PVC means a restarted or updated pod does not have to download the model again. Note that the Ollama Deployment below uses an emptyDir volume, which is deleted together with its pod.
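
A claim for Ollama's model directory could look like the sketch below. The name ollama-models-pvc and the 10Gi size are assumptions. With ReadWriteOnce, the volume can be mounted read-write by only one node at a time, so running Ollama replicas on several nodes would need a ReadWriteMany storage class or one volume per replica.

```yaml
# Hypothetical claim for Ollama's model store; size and access mode are assumptions.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-models-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
# In the ollama Deployment, the emptyDir volume would then be replaced with:
#   volumes:
#   - name: ollama-storage
#     persistentVolumeClaim:
#       claimName: ollama-models-pvc
```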

Load Balancing

Kubernetes Services provide built-in load balancing, spreading connections across all ready replicas behind the Service. Once DeepSeek runs more than one Ollama replica, inference requests are distributed among the instances, which improves resource utilization and keeps response latency steadier under load.

Docker Swarm is simpler to operate, but Kubernetes offers the broader feature set that a model like DeepSeek benefits from: autoscaling, self-healing, persistent storage, and service networking, all of which support scalability, robustness, and operational ease.

Deploying DeepSeek on Kubernetes

1. Kubernetes Cluster Setup

In our setup, we have a three-node Kubernetes cluster with the following nodes:

Plain Text

$ kubectl get nodes
NAME                     STATUS   ROLES           AGE    VERSION
deepseek-control-plane   Ready    control-plane   6d5h   v1.32.0
deepseek-worker          Ready    <none>          6d5h   v1.32.0
deepseek-worker2         Ready    <none>          6d5h   v1.32.0

DeepSeek-R1 still runs on nodes without GPUs, although responses are slower; the small 1.5B distilled variant used in this article is practical on CPU. GPU acceleration is recommended for optimal performance, especially for larger model variants and complex reasoning tasks.
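
On GPU nodes, the ollama container has to request a GPU explicitly. The fragment below, meant for the container spec in the Ollama Deployment, is a sketch assuming the NVIDIA device plugin (or the NVIDIA GPU Operator) is installed so that nodes advertise the nvidia.com/gpu resource; the CPU and memory figures are illustrative. On CPU-only clusters, drop the nvidia.com/gpu line, otherwise the pod stays Pending.

```yaml
# Fragment for the ollama container spec. Values are illustrative assumptions.
resources:
  requests:
    cpu: "2"
    memory: 4Gi
  limits:
    memory: 8Gi
    # Extended resources such as GPUs are set in limits; the request defaults to the same value.
    nvidia.com/gpu: 1
```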

Kubernetes clusters can be set up locally using tools like:

KIND (Kubernetes IN Docker)
Minikube
MicroK8s
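
The node names shown earlier follow KIND's naming pattern for a cluster called deepseek with one control-plane node and two workers. A configuration that would produce such a cluster might look like the sketch below; the file name is an assumption, and the Kubernetes version depends on the KIND release (or on a node image chosen with the --image flag).

```yaml
# kind-config.yaml (hypothetical file name)
# Create the cluster with: kind create cluster --config kind-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: deepseek          # nodes become deepseek-control-plane, deepseek-worker, deepseek-worker2
nodes:
- role: control-plane
- role: worker
- role: worker
```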

On a cloud provider, the web interface can be exposed securely through an Ingress object with TLS and authentication in front of it.

2. Deploying DeepSeek-R1 With Ollama

DeepSeek-R1 runs inside the cluster on Ollama, which handles model inference and serves an HTTP API on port 11434. Below is the Kubernetes manifest for the Ollama Deployment:

YAML

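apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  labels:
    app: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
      - name: ollama
        image: ollama/ollama:latest
        ports:
        - containerPort: 11434          # Ollama's HTTP API
        volumeMounts:
        - mountPath: /root/.ollama      # where Ollama stores downloaded models
          name: ollama-storage
        env:
        # Note: OLLAMA_MODEL, OLLAMA_NO_THINKING and OLLAMA_SYSTEM_PROMPT are not among
        # Ollama's documented server variables and may be ignored; the model itself is
        # pulled on first use (for example by "ollama run" in step 5).
        - name: OLLAMA_MODEL
          value: deepseek-r1:1.5b
        - name: OLLAMA_KEEP_ALIVE
          value: "-1"                   # keep loaded models in memory indefinitely
        - name: OLLAMA_NO_THINKING
          value: "true"
        - name: OLLAMA_SYSTEM_PROMPT
          value: "You are DeepSeek-R1, a reasoning model. Provide direct answers without detailed reasoning steps or <think> tags."
      volumes:
      - name: ollama-storage
        emptyDir: {}                    # deleted with the pod: models are re-downloaded after a restart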

3. Exposing Ollama as a Service

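To let other services reach Ollama, we define a Service. The NodePort type also exposes Ollama's API on a port of every node, which is handy for testing; because the API has no built-in authentication, the default ClusterIP type is the safer choice when only in-cluster clients such as Open WebUI need it.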

YAML

apiVersion: v1
kind: Service
metadata:
  name: ollama-service
spec:
  selector:
    app: ollama
  ports:
  - protocol: TCP
    port: 11434
    targetPort: 11434
  type: NodePort
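
Because Ollama does not document an environment variable that downloads a model at startup, one option is a one-off Job that asks the running server to pull it. This sketch uses the Ollama CLI purely as a client: when OLLAMA_HOST is set, ollama pull sends the request to that server instead of a local one. The Job name is hypothetical, and with the emptyDir volume the model has to be pulled again whenever the Ollama pod is replaced.

```yaml
# Hypothetical one-off Job that makes the Ollama server download deepseek-r1:1.5b.
apiVersion: batch/v1
kind: Job
metadata:
  name: ollama-pull-deepseek
spec:
  backoffLimit: 3
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: pull
        image: ollama/ollama:latest
        command: ["/bin/ollama"]        # override the image's default "serve" behavior
        args: ["pull", "deepseek-r1:1.5b"]
        env:
        - name: OLLAMA_HOST             # point the CLI at the in-cluster Ollama Service
          value: "http://ollama-service:11434"
```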

4. Deploying Open WebUI

For an interactive experience, we integrate Open WebUI, which connects to Ollama and provides a user-friendly interface. The deployment is as follows:

YAML

apiVersion: apps/v1
kind: Deployment
metadata:
  name: openweb-ui
  labels:
    app: openweb-ui
spec:
  replicas: 1
  selector:
    matchLabels:
      app: openweb-ui
  template:
    metadata:
      labels:
        app: openweb-ui
    spec:
      containers:
      - name: openweb-ui
        image: ghcr.io/open-webui/open-webui:main
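        env:
        - name: WEBUI_NAME              # title shown in the browser UI
          value: "DeepSeek India - Hardware Software Gheware"
        - name: OLLAMA_BASE_URL         # in-cluster DNS name of the Ollama Service
          value: "http://ollama-service:11434"
        # Open WebUI's documentation describes DEFAULT_MODELS for choosing the default
        # model; check which variable your Open WebUI version honors.
        - name: OLLAMA_DEFAULT_MODEL
          value: "deepseek-r1:1.5b"
        ports:
        - containerPort: 8080           # Open WebUI listens here
        volumeMounts:
        - name: openweb-data
          mountPath: /app/backend/data  # Open WebUI's database and uploaded files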
      volumes:
      - name: openweb-data
        persistentVolumeClaim:
          claimName: openweb-ui-pvc
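
The Deployment above mounts a PersistentVolumeClaim named openweb-ui-pvc, which has to exist before the pod can start. A minimal claim could look like the sketch below; the 2Gi size and ReadWriteOnce access mode are assumptions, and omitting storageClassName selects the cluster's default StorageClass (KIND ships one).

```yaml
# Sketch of the claim referenced by the openweb-ui Deployment; size is an assumption.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: openweb-ui-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
```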

5. Running Inference on DeepSeek-R1

To test the deployment, we can execute a command within the Ollama container:

Shell

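# Open a shell inside the Ollama pod
kubectl exec -it deploy/ollama -- bash
# Then, inside the pod, start an interactive chat with the model
ollama run deepseek-r1:1.5b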

The ollama run command downloads the model if it is not present yet and then starts an interactive session in which you type prompts directly; enter /bye to leave it.
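
For a non-interactive check, Ollama's REST API can be called from inside the cluster. This sketch is a one-off Job using the public curlimages/curl image to send a single non-streaming request to the /api/generate endpoint through ollama-service; the Job name and prompt are illustrative. The JSON reply, with the generated text in its response field, appears in the Job's logs.

```yaml
# Hypothetical smoke test; read the result with: kubectl logs job/deepseek-smoke-test
apiVersion: batch/v1
kind: Job
metadata:
  name: deepseek-smoke-test
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: curl
        image: curlimages/curl:latest
        command: ["curl"]
        args:
        - "-sS"
        - "--fail"                      # exit non-zero on HTTP errors so the Job reports failure
        - "http://ollama-service:11434/api/generate"
        - "-d"
        - '{"model": "deepseek-r1:1.5b", "prompt": "Why is the sky blue?", "stream": false}'
```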

Accessing Open WebUI

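After deployment, Open WebUI can be reached by creating an Ingress that routes the host name to Open WebUI (through a Service in front of its pods, which the manifests above do not define yet). The sign-in page is then available at: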

Plain Text

http://deepseek.gheware.com/auth

This interface allows users to interact with DeepSeek-R1 through a chat-based environment.
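
A minimal sketch of the Ingress side, assuming the ingress-nginx controller (ingressClassName nginx), a TLS certificate already stored in a Secret called deepseek-tls (for example, issued by cert-manager), and a hypothetical ClusterIP Service named openweb-ui-service, included here because the Ingress needs a Service to route to. Authentication is left to Open WebUI's own sign-in page at /auth.

```yaml
# Hypothetical ClusterIP Service in front of the Open WebUI pods.
apiVersion: v1
kind: Service
metadata:
  name: openweb-ui-service
spec:
  selector:
    app: openweb-ui
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080                    # Open WebUI's container port
---
# Ingress terminating TLS for deepseek.gheware.com; assumes ingress-nginx and the deepseek-tls Secret.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: openweb-ui
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - deepseek.gheware.com
    secretName: deepseek-tls
  rules:
  - host: deepseek.gheware.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: openweb-ui-service
            port:
              number: 80
```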

Conclusion

By deploying DeepSeek on Kubernetes, we get a solid foundation for a scalable, resilient AI reasoning system. Kubernetes orchestrates DeepSeek-R1 on Ollama, while Open WebUI provides the user-facing chat. To make the setup production-ready, it can be extended with persistent model storage, GPU acceleration, autoscaling, and monitoring with Prometheus and Grafana.

For AI practitioners, Kubernetes offers an excellent foundation for deploying and managing reasoning models like DeepSeek-R1.

Published at DZone with permission of Rajesh Gheware. See the original article here.

Opinions expressed by DZone contributors are their own.

URL: https://dzone.com/articles/deepseek-on-kubernetes-ai-powered-reasoning-at-sca