![]() |
VOOZH | about |
We’re so glad you’re here. You can expect all the best TNS content to arrive Monday through Friday to keep you on top of the news and at the top of your game.
Check your inbox for a confirmation email where you can adjust your preferences and even join additional groups.
Follow TNS on your favorite social media networks.
Become a TNS follower on LinkedIn.
Check out the latest featured and trending stories while you wait for your first TNS newsletter.
“Is Kubernetes ready for stateful workloads?” is the first question that pops up when decision-makers consider deploying databases on Kubernetes. For years the answer was “don’t do it,” and for good reasons. Kubernetes was initially designed to handle the orchestration of stateless workload. But the technology has matured, and it is time to reconsider running data on Kubernetes.
There are three important technical aspects to be considered:
While assessing the maturity of any technology isn’t a straightforward process, there are solid signals that can be used. Kubernetes is a Cloud Native Computing Foundation graduated project, meaning that the technology has the “adoption, a healthy rate of changes, and committers from multiple organizations”. Their 2020 Survey Report shows that “91% of respondents using containers report using Kubernetes, 83% of them in production.”
Since November 2017, reputed analyst firm Thoughtworks considers Kubernetes as a mature technology that companies should adopt, explaining that “it has become the default solution for most of our clients when deploying containers into a cluster of machines.”
Chart courtesy of the Cloud Native Computing Foundation.
Kubernetes stateful capabilities are often doubted, and a first-generation stateful technology named Persistent Sets (“PetSet”) is (partially) to blame. This feature was deprecated in favor of the current stateful technology in Kubernetes: StatefulSets. Released for GA (“General Availability”) in 2018, it is used today across countless solutions that provide persistent, non-ephemeral, storage for Kubernetes containers. This is what makes Vitess or other cloud native databases deployment in Kubernetes possible.
Most notably, StatefulSets mount PersistentVolumes (“PVs”) into the containers. These PVs are generally provided by storage external to the Kubernetes node, either in the form of networked drives or software-defined storage solutions, like OpenEBS. In essence, the storage used in Kubernetes and in the cloud is the same EBS volumes you use on AWS, or the Persistent Disks you use on GCP; and we can expect the same level of maturity.
Surely, database performance suffers in Kubernetes, doesn’t it? Containers are wrongly perceived as “lightweight virtual machines.” They are rather extremely thin layers of abstraction wrapping the filesystem, process, and networking spaces, provided by the Linux kernel. There might be some overhead if you use only ephemeral, container storage for the data. But the overhead is negligible if you use external PV storage.
And what about the ephemeral nature of containers? Wouldn’t this affect high availability? Since containers are just “wrappers” around a process, their lifetime is tied to that of the process. In other words, containers will be as stable as the database process running inside of them.
There are obvious advantages to running databases on Kubernetes: the simplicity of deployment, having the whole stack managed by the same orchestration tool, auto-healing, and automatic reprovisioning of failed containers leading to higher availability. For example, if one of the nodes running a database fails, Kubernetes will automatically self-heal, rescheduling the workload on another node. With cooperation with the database management software, it may elect a new database primary running on a previously existing replica, and re-initialize the new node as a new replica, all automatically. But there are other, more important, reasons why you want to run databases in Kubernetes.
Most companies want to operate databases as a DBaaS (“Database-as-a-Service”). To self-provision a self-healing database, including backups, and monitoring. While this is offered by most cloud providers, doing it yourself by using Kubernetes can save significant costs, and offer additional capabilities, such as multicloud and cloud portability.
These capabilities are made available via Kubernetes Operators. Operators are application-specific extensions to Kubernetes that encode deployment and operations automation while exposing simple interfaces to the users. Advanced database Kubernetes operators bring, among others, the following benefits:
Running databases on Kubernetes is not only the future but also the present, as shown by leading companies such as Goldman Sachs, Zalando, and Flipkart. As with any technology, careful and objective evaluation should be performed before deploying production workloads.
Unsurprisingly, the Data on Kubernetes 2021 report found that 90% of the responding companies believe that Kubernetes is ready for stateful workloads. A large majority of these organizations (70%) run stateful workloads in production with databases topping the list. Those running 75% or more of their production workloads on it report an impressive 2x or greater productivity gains!
Considering all the advantages that running databases on Kubernetes offers, companies should ensure to consider it. Running data on Kubernetes was the latest frontier to have fully orchestrated infrastructure and I believe that this shift will unleash considerable value for businesses.