URL: https://thenewstack.io/how-to-run-databases-in-kubernetes/

How To Run Databases in Kubernetes

Many people successfully run their databases in Kubernetes, and the number of such deployments is growing daily.
Jul 23rd, 2024 10:00am by Kolawole Olowoporoku
Featured image by Gerd Altmann from Pixabay.
Whether databases should run in Kubernetes has long been a hot topic in the tech community. The prevailing argument is that Kubernetes is for “building stateless applications,” which suggests that databases are best consumed as managed services from cloud providers. However, there are practical design patterns for running databases in Kubernetes successfully.

On most cloud providers, volumes are constrained to a single Availability Zone (AZ), which means the databases that use them are also constrained to that AZ by design. Most production clusters, however, are likely regional or multi-AZ, especially when they host stateless applications. It is therefore important to use node selectors to ensure that database pods are scheduled in the AZ where their volumes can be mounted. Example:
nodeSelector:
  topology.kubernetes.io/zone: europe-west6-b
This configuration specifies that the database pod should run in the ‘europe-west6-b’ AZ.

Plan Resource Usage

Since our databases are constrained to one AZ, we must carefully plan our node-to-AZ design to avoid scheduling errors and unavailability issues. One effective strategy is to run separate node groups or node pools specifically for database workloads. This ensures that sufficient resources are always available in the required AZ. Example:
  • Create a dedicated node pool for database workloads.
  • Use taints and tolerations to ensure only database pods are scheduled on these nodes.
# Node spec: taint the nodes dedicated to databases
spec:
  taints:
  - key: "dedicated"
    value: "database"
    effect: "NoSchedule"

# Toleration in the DB pod spec
spec:
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "database"
    effect: "NoSchedule"
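
The pieces above can be combined in a single workload definition. The following is a minimal sketch, not a production manifest: the StatefulSet name, the `dedicated: database` node label, the image tag, the Secret name, and the storage size are all assumptions introduced for illustration. Note that a taint only keeps other pods off the dedicated nodes; it does not pull database pods onto them, which is why the sketch also selects the nodes by label.

```yaml
# Sketch: a database StatefulSet pinned to one AZ and to the tainted node pool.
# All names, labels, and sizes are hypothetical placeholders.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example-db
spec:
  serviceName: example-db
  replicas: 1
  selector:
    matchLabels:
      app: example-db
  template:
    metadata:
      labels:
        app: example-db
    spec:
      nodeSelector:
        # Keep the pod in the AZ where its volume lives.
        topology.kubernetes.io/zone: europe-west6-b
        # Assumed label on the dedicated node pool; attracts the pod to it.
        dedicated: database
      tolerations:
        # Allows the pod onto nodes carrying the dedicated=database taint.
        - key: "dedicated"
          operator: "Equal"
          value: "database"
          effect: "NoSchedule"
      containers:
        - name: db
          image: postgres:16
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: example-db-credentials   # hypothetical Secret
                  key: password
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 20Gi
```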

High Availability

Managed database services often provide built-in high availability and failover capabilities. To achieve similar resilience in Kubernetes, meticulous planning for recovery and availability strategies is essential. Here are two approaches:

Using Kubernetes Operators:

Kubernetes operators such as the Zalando Postgres Operator offer advanced features like read replicas and automatic failover, similar to managed database services. These operators can significantly simplify the setup and management of high availability for your databases. The Zalando Postgres Operator lets you specify the number of read replicas and manages failovers automatically. It also provides a UI where you can configure these settings, which makes managing database high availability in Kubernetes more approachable. Operators exist for other databases as well, some of which are maintained by their respective communities.
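
As an illustration, a cluster definition for the Zalando Postgres Operator might look like the sketch below. It is modeled on the operator's minimal example manifest and assumes the operator and its `postgresql` custom resource definition are already installed; the cluster name, team ID, users, volume size, and PostgreSQL version are placeholders, and the supported versions depend on the operator release in use.

```yaml
# Sketch: a two-instance PostgreSQL cluster managed by the Zalando Postgres Operator.
# All names and sizes are placeholders.
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-minimal-cluster   # the name is expected to start with the teamId
spec:
  teamId: "acid"
  numberOfInstances: 2         # one primary plus one replica
  volume:
    size: 10Gi
  users:
    app_owner:                 # hypothetical database role
      - superuser
      - createdb
  databases:
    app_db: app_owner          # database name: owning role
  postgresql:
    version: "16"
```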

Self-Service Approach:

For those who prefer a more hands-on approach, particularly for NoSQL databases, here’s a step-by-step method:
  • Mount Data Volumes on Both Pods: Ensure that the data volume is accessible by both the primary and secondary pods.
  • Pod Affinity: Use pod affinity rules to ensure that the primary and secondary pods are placed together, respecting volume constraints.
  • Init Container: At startup, use an init container in the secondary pod to copy all data from the primary pod.
  • Volume Mount Constraint: Set the volume mount on the secondary pod to read-only to prevent data corruption.
  • Use a CronJob for Restarts: Create a simple CronJob that deletes the secondary pod every six hours, so that it is recreated and the init container runs again and copies fresh data.

Example Configurations

The example below shows how to set up a Neo4j read replica using pod affinity, an init container for data copying, and a read-only volume mount to ensure data integrity.
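# Pod spec fragments for the secondary (read replica) pod.

# Place the secondary on the same node as the primary so it can mount the primary's volume.
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
            - key: app
              operator: In
              values:
                - primary-db
        topologyKey: "kubernetes.io/hostname"

# Copy the primary's data into the secondary's own volume before the database starts.
initContainers:
  - name: copy-data
    image: busybox
    command: ["sh", "-c", "cp -r /data/* /backup/"]
    volumeMounts:
      - name: data-volume
        mountPath: /data
      - name: backup-volume
        mountPath: /backup

# data-volume is the primary's claim; backup-volume is the secondary's claim.
volumes:
  - name: data-volume
    persistentVolumeClaim:
      claimName: primary-db-pvc
  - name: backup-volume
    persistentVolumeClaim:
      claimName: secondary-db-pvc

# The secondary serves the copied data from a read-only mount.
containers:
  - name: secondary-db
    image: neo4j:latest
    volumeMounts:
      - name: backup-volume
        mountPath: /data
        readOnly: true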
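
The fragments above do not cover the periodic restart from the last step of the list. A rough sketch of that CronJob follows. It assumes the secondary pod carries the label `app=secondary-db` and is managed by a controller (for example a Deployment or StatefulSet) that recreates it after deletion. The ServiceAccount, Role, and RoleBinding names, the `default` namespace, and the container image are hypothetical; any image that ships `kubectl` would do.

```yaml
# Sketch: delete the secondary pod every six hours so its init container re-copies the data.
# All names, the namespace, the pod label, and the image are assumptions.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: replica-restarter
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: replica-restarter
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: replica-restarter
subjects:
  - kind: ServiceAccount
    name: replica-restarter
    namespace: default         # adjust to the namespace where the database runs
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: replica-restarter
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: restart-secondary-db
spec:
  schedule: "0 */6 * * *"      # minute 0 of every sixth hour
  concurrencyPolicy: Forbid    # never run two restarts at once
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: replica-restarter
          restartPolicy: OnFailure
          containers:
            - name: delete-pod
              image: example.com/tools/kubectl:latest   # placeholder image that provides kubectl
              command: ["kubectl", "delete", "pod", "-l", "app=secondary-db"]
```

Between restarts the replica serves data that can be up to six hours old, so this pattern suits read workloads that tolerate stale data.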

Backups and Restore

Many service providers offer ways to schedule recurring snapshots of disk-based volumes. This is often the preferred method because it is easier to set up and recovery is faster. In this scenario, we regularly back up the volumes that host the database data. Another approach relies on the database’s native tools, such as pg_dump for PostgreSQL, to take logical backups. Here is an example configuration for PostgreSQL that uses a Kubernetes CronJob to back up to S3:
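apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-backup
spec:
  schedule: "0 0 * * *"  # every day at midnight
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              # The image must provide both pg_dumpall and the AWS CLI;
              # the stock postgres image does not include the AWS CLI.
              image: postgres
              # PGUSER and the other connection settings (such as PGHOST and PGPASSWORD)
              # must be set in the container environment.
              command: ["sh", "-c", "pg_dumpall -c -U $PGUSER | gzip > /backup/db_backup.gz && aws s3 cp /backup/db_backup.gz s3://your-bucket/db-backup-$(date +%F).gz"]
              volumeMounts:
                - name: backup-volume
                  mountPath: /backup
          restartPolicy: OnFailure
          volumes:
            - name: backup-volume
              emptyDir: {}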
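
For the snapshot route mentioned at the start of this section, provider-side snapshot schedules are configured outside Kubernetes. Where the cluster's CSI driver supports snapshots and the snapshot CRDs are installed, a similar result can be sketched with the Kubernetes VolumeSnapshot API. This is a different mechanism from a provider's scheduled snapshots, and the class names, claim names, and size below are hypothetical.

```yaml
# Sketch: take a snapshot of the database volume, then restore it into a new claim.
# Class names, claim names, and the size are placeholders.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: primary-db-snapshot
spec:
  volumeSnapshotClassName: example-snapshot-class
  source:
    persistentVolumeClaimName: primary-db-pvc
---
# Restore: a new PVC pre-populated from the snapshot.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restored-db-pvc
spec:
  storageClassName: example-csi-storageclass
  dataSource:
    name: primary-db-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi            # must be at least the size of the snapshotted volume
```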

Summary

Even though the initial setup and learning curve might be steep, running your database in Kubernetes provides plenty of advantages. One less-discussed benefit is cost. Running a db.m4.2xlarge (8 vCPUs, 32 GB RAM) instance in RDS costs approximately $1,200/month, while running a similarly sized EC2 instance costs around $150/month. A node in Kubernetes will likely also run more than one pod, further optimizing resource use.

Vendor agnosticism is another key motivation for many people running databases in Kubernetes. Being able to move your workloads across platforms with minimal tweaks is incredibly appealing.

In conclusion, consider the pros and cons before deciding where to run your production database. Many people successfully run their databases in Kubernetes, and the number of such deployments is growing daily.
Kolawole Olowoporoku has over 7 years of experience managing distributed systems. Currently, he works as a Site Reliability Engineer (SRE) at SEKAI, a Swiss-based startup. Kolawole has also worked for several large organizations, including AWS. He is a passionate advocate...