URL: https://thenewstack.io/the-data-protection-challenges-of-kubernetes/

The Data Protection Challenges of Kubernetes - The New Stack



The Data Protection Challenges of Kubernetes

A community-developed standard approach for data protection, one that third-party protection vendors can build on, is urgently needed.
Nov 2nd, 2020 12:29pm by Mathew Ericson
CNCF sponsored this post.

The Cloud Native Computing Foundation sponsored this post in anticipation of the virtual KubeCon + CloudNativeCon North America 2020, Nov. 17-20.

Mathew Ericson
Mathew Ericson is a Sr Product Manager at Commvault. He is currently working in the Cloud and Virtualization area of the Product Management team and is responsible for Amazon Web Services and Kubernetes product integration. He’s been with Commvault for the past 4 years and in the tech industry for 23 years. He has held positions in development, storage, and data management across the Global 500. Follow him on Twitter at @mericsonAU.

Adopting Kubernetes as your de facto standard for container orchestration will accelerate your data-center orchestration and modernization efforts. Companies across all verticals and segments, from SMB to Large Enterprise, have adopted containers as a way to develop differentiated products built on cloud native foundations. But this accelerated adoption is not without its challenges.

In delivering Kubernetes to the enterprise, DevOps engineers have embraced containers as the new virtual machines and started migrating stateful applications (i.e., those using databases and middleware layers) to containers. With self-service access to provision storage whenever and wherever they need it, DevOps engineers are no longer bound by the delays of traditional IT help desk requests, so container deployments have skyrocketed. The resulting container sprawl, and now cluster sprawl, is the next challenge for understaffed IT teams.

The Containerization Journey

Why are DevOps engineers taking a technology designed for stateless applications and layering stateful applications with all the complexity of persistent volume management? Simple: agility. Kubernetes allows the developer to provision, test, QA and even scale their application, based on business needs or demand. The journey from a traditional monolithic application is multi-phased and step one is the virtual machine to container migration.

Businesses are relearning how to build applications that leverage the on-demand nature and resiliency capabilities of cloud native technologies, to respond to always-on customer demands in a mobile world. We are seeing rapid adoption in the VM to container space, followed by multiple re-factoring phases, followed by adoption of flexible and software-defined storage to simplify storage management at scale. The ideal state is the microservices architecture, which many businesses are striving to achieve.

Current Challenges

What is missing in this approach are the underlying tools and automation to deliver end-to-end data management. Some initial projects attempt to move beyond basic scripts (for example Velero, formerly Heptio Ark and now part of VMware Tanzu). There is also the Kubernetes Storage SIG and the recently established Data Protection Working Group (WG). But fundamentally, some basic challenges still need resolution:

  • What is the definition of an application within Kubernetes (see v1beta1 Application CRD)?
  • How does a developer record the dependencies against an application (e.g., custom resource definitions or resources)?
  • How do protection and recovery work in secure multi-tenanted Kubernetes clusters (see Hierarchical Namespace concept)?
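
One way to approach the first two questions, sketched here in Python, is to treat the recommended app.kubernetes.io/name label as the application boundary and list every resource that carries it. The function name, the sample manifests and the use of a single label as the grouping key are assumptions made for illustration; they are not taken from the Application CRD or from any protection product.

```python
from collections import defaultdict

# Recommended Kubernetes label; using it as the application boundary is an assumption.
APP_LABEL = "app.kubernetes.io/name"


def inventory_application(manifests, app_name):
    """Group the names of resources labelled as part of app_name by their kind."""
    inventory = defaultdict(list)
    for manifest in manifests:
        metadata = manifest.get("metadata", {})
        labels = metadata.get("labels", {})
        if labels.get(APP_LABEL) == app_name:
            inventory[manifest["kind"]].append(metadata["name"])
    return dict(inventory)


# Hypothetical manifests, shaped like objects returned by the cluster API.
manifests = [
    {"kind": "StatefulSet", "metadata": {"name": "db", "labels": {APP_LABEL: "shop"}}},
    {"kind": "PersistentVolumeClaim", "metadata": {"name": "db-data", "labels": {APP_LABEL: "shop"}}},
    {"kind": "Service", "metadata": {"name": "web", "labels": {APP_LABEL: "blog"}}},
    {"kind": "ConfigMap", "metadata": {"name": "unlabelled"}},
]

print(inventory_application(manifests, "shop"))
```

Resources without the label, such as the ConfigMap in the sample, fall outside the inventory. That is the dependency gap the second question describes.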

In fact, if we look at the traditional monolithic applications that are actively being migrated to Kubernetes applications, we find another list of challenges. Application consistency must be achieved, without the requirement to insert non-application binaries or agents inside the container. Application consistency is the act of coordinating application state and the protection operation (backup, storage snapshot, etc.).
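
The coordination described here can be sketched as a small Python context manager: pause application writes, run the protection operation, then always resume. The quiesce and resume hooks are hypothetical placeholders; in practice they might invoke a database's own flush or lock command from outside the container, so that no agent has to be installed inside it.

```python
from contextlib import contextmanager


@contextmanager
def application_consistent(quiesce, resume):
    """Pause application writes, yield for the protection step, then always resume."""
    quiesce()
    try:
        yield
    finally:
        # Runs even if the protection step raises, so the app is never left frozen.
        resume()


# Hypothetical hooks that only record the order of events.
events = []

with application_consistent(
    quiesce=lambda: events.append("quiesce"),
    resume=lambda: events.append("resume"),
):
    events.append("snapshot")  # stand-in for the backup or storage snapshot

print(events)
```

The try/finally ordering is the design point: a failed snapshot must not leave the application paused.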

Storage Considerations

Storage consistency must be achieved using snapshot mechanisms, to allow for online or ‘live’ protection without impacting the running application. In fact, as businesses adopt a wide variety of storage solutions, the ability to take a cloud native approach to storage management (API-driven, open interface, seamless scalability) is required.

The Container Storage Interface (CSI) provides this cloud native approach today, and while dynamic provisioning, attach/detach, and mount capabilities are stable, snapshot capability is not yet generally available. Large enterprises have come to know and love snapshot-based protection with their traditional enterprise storage arrays. Snapshot, clone, and consistency group (CG) backups are considered core functionality for a wholesale migration of traditional applications to containers. One example is the ability to provision all-flash storage for production environments, then use CSI cloning to copy snapshots of those volumes to a more cost-effective tier (e.g., for dev/test seeding). These capabilities are still under development within the CSI specification.

It should be noted that while the CSI standard provides a way of providing a storage level point-in-time volume snapshot, it does not move that snapshot to alternative storage media. A snapshot is considered a ‘recovery point’ and requires copying to cloud, disk or tape media, to be considered a true ‘backup copy.’ The Data Protection WG is currently working on this challenge.
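
To make the distinction concrete, the following Python sketch builds the kind of VolumeSnapshot object that requests a point-in-time snapshot of a persistent volume claim. The object names are hypothetical, and the apiVersion shown assumes a cluster that serves the v1 snapshot API, which had not yet reached general availability when this article was written.

```python
import json


def volume_snapshot_manifest(name, namespace, pvc_name, snapshot_class):
    """Build a VolumeSnapshot object asking the CSI driver for a point-in-time copy of a PVC."""
    return {
        # Assumption: the cluster serves the v1 snapshot API.
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "volumeSnapshotClassName": snapshot_class,
            "source": {"persistentVolumeClaimName": pvc_name},
        },
    }


# Hypothetical names throughout.
manifest = volume_snapshot_manifest("db-data-snap", "shop", "db-data", "csi-snapclass")
print(json.dumps(manifest, indent=2))
```

Applying an object like this yields only the recovery point. Nothing in it copies the data to cloud, disk or tape media, which is the step that would turn the snapshot into a backup copy.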

Application Resilience

We have seen a bifurcated approach to application architecture and resiliency. Depending on the development resources available to a business, they may take one of two disaster recovery approaches:

  • Application-centric recovery focused on capturing the entire Kubernetes application (manifests, persistent data, dependent resources) and re-scheduling them in a remote cluster. This approach can be entirely automated, with no reliance on the application owner.
  • Infrastructure-centric recovery focused on leveraging next-generation software-defined storage (SDS) that can be tightly integrated into Kubernetes by way of a custom resource definition (CRD) to provide scheduling, replication, cloning, and recovery from the Kubernetes command line (i.e., kubectl).
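
For the application-centric approach, one small piece of the automation is making a captured manifest portable before it is re-scheduled in a remote cluster. The Python sketch below strips the state that the source cluster's API server populated. The function name and sample object are hypothetical, and a real tool would need to handle more than these fields.

```python
import copy

# Metadata the source cluster's API server populates; it should not be replayed elsewhere.
SERVER_SET_METADATA = (
    "uid",
    "resourceVersion",
    "creationTimestamp",
    "generation",
    "managedFields",
    "selfLink",
)


def portable_manifest(live_object):
    """Return a copy of a live object with cluster-specific state removed."""
    manifest = copy.deepcopy(live_object)
    manifest.pop("status", None)
    metadata = manifest.get("metadata", {})
    for field in SERVER_SET_METADATA:
        metadata.pop(field, None)
    return manifest


# Hypothetical object as read from the source cluster.
live = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web", "namespace": "shop", "uid": "1234", "resourceVersion": "42"},
    "spec": {"replicas": 3},
    "status": {"readyReplicas": 3},
}

print(portable_manifest(live))
```

This sketch deliberately leaves spec untouched. Cluster-assigned spec values, such as a Service's clusterIP or a claim's bound volumeName, would also need attention in practice.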

Both approaches are valid, but each demands a different level of IT operations resources, application development resources, and associated automation. Because recovery is usually a response to an unanticipated event, intelligent automation is required to drive consistent recovery outcomes and meet business recovery time objectives (RTOs).

Beyond the Kubernetes Cluster

Kubernetes-based or container-based applications are made up of a number of new data types distributed throughout the organization. At the recent KubeCon Europe, experienced Kubernetes veterans expressed a desire to not “bypass CI/CD, code reviews, and formal release processes.” The bottom line: for containerization to form a stable building block of the next-generation application landscape, data protection best practices are required.

For example:

  • Are you protecting developer workstations where the majority of development initiates?
  • Are you protecting your source-code control system and CI/CD systems?
  • What is the impact to your customers if your CI/CD system is unavailable?
  • Are you protecting your etcd (etcd.io) data for on-prem clusters?
  • Are you running your own private image registries, and if so, are they protected (goharbor.io)?
  • How will you protect modern persistence stores like cloud object stores?
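
For the etcd question, a minimal Python sketch of a scheduled protection job might assemble an etcdctl snapshot save command like the one below. The endpoint, the certificate paths (shown in the style of a kubeadm-built cluster) and the output path are all assumptions; the sketch only builds and prints the command rather than running it.

```python
import shlex


def etcd_snapshot_command(snapshot_path, endpoint, cacert, cert, key):
    """Build the etcdctl argument list for saving a snapshot over TLS."""
    return [
        "etcdctl",
        f"--endpoints={endpoint}",
        f"--cacert={cacert}",
        f"--cert={cert}",
        f"--key={key}",
        "snapshot",
        "save",
        snapshot_path,
    ]


# Hypothetical endpoint and paths; adjust to the cluster being protected.
cmd = etcd_snapshot_command(
    snapshot_path="/backup/etcd-snapshot.db",
    endpoint="https://127.0.0.1:2379",
    cacert="/etc/kubernetes/pki/etcd/ca.crt",
    cert="/etc/kubernetes/pki/etcd/server.crt",
    key="/etc/kubernetes/pki/etcd/server.key",
)

# A real job would pass cmd to subprocess.run(cmd, check=True) on a control plane node.
print(shlex.join(cmd))
```

As with a volume snapshot, the resulting file is only protected once it has been copied off the node that produced it.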

Considering the End-to-End Challenge

When we step back and review these challenges, we must reflect on why we perform data protection and data management.

  • We need to recover a failed application to production.
  • We need to recover a failed application or container(s) to an alternate location (disaster recovery).
  • We need to migrate applications for infrastructure lifecycle or development (i.e., seeding a new cluster).
  • We need to optimize our deployments by consolidating and reducing infrastructure sprawl.
  • We need to protect applications to a defined SLA.
  • We need to deliver the capabilities regardless of workload location (on-premises, cloud).

These challenges require data management capabilities that enterprises already enjoy, including:

  • Centralized policy-based protection across all Kubernetes deployments.
  • Holistic reporting, dashboards, trending, and alerting across all protected data.
  • Self-service backup, recovery, and insights for authorized individuals.
  • Integration into Single Sign On (SSO) systems to provide granular role-based access control.
  • Policy-based control to access and use protected data.
  • Governance and compliance capabilities for report, audit, log, and persistent data for regulatory requirements.
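
Centralized policy-based protection and cross-cluster reporting can be sketched in a few lines of Python. Everything here is hypothetical: the ProtectionPolicy fields, the label-matching rule and the cluster inventory are illustrative stand-ins, not the data model of any particular product.

```python
from dataclasses import dataclass


@dataclass
class ProtectionPolicy:
    name: str
    match_labels: dict   # an application is covered when it carries all of these labels
    rpo_hours: int       # maximum tolerated age of the newest backup
    retention_days: int  # how long backup copies are kept


def covering_policy(app_labels, policies):
    """Return the first policy whose labels are all present on the application, or None."""
    for policy in policies:
        if all(app_labels.get(k) == v for k, v in policy.match_labels.items()):
            return policy
    return None


def coverage_report(clusters, policies):
    """Map 'cluster/app' to its policy name, or 'UNPROTECTED', across every cluster."""
    report = {}
    for cluster, apps in clusters.items():
        for app, labels in apps.items():
            policy = covering_policy(labels, policies)
            report[f"{cluster}/{app}"] = policy.name if policy else "UNPROTECTED"
    return report


# Hypothetical policies and inventory.
policies = [
    ProtectionPolicy("gold", {"tier": "production"}, rpo_hours=4, retention_days=90),
    ProtectionPolicy("bronze", {"tier": "dev"}, rpo_hours=24, retention_days=7),
]
clusters = {
    "onprem-1": {"shop": {"tier": "production"}, "scratch": {}},
    "cloud-1": {"shop-test": {"tier": "dev"}},
}

print(coverage_report(clusters, policies))
```

Defining the policy once and evaluating it against every cluster is what surfaces the unlabelled "scratch" application, which a cluster-by-cluster schedule could easily miss.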

Many data protection solutions today rely on capturing application manifests and persistent data. Scheduling of protection occurs on a cluster-by-cluster basis, with little visibility across multiple implementations. Additionally, secure multitenancy best practices and even integration with technologies like Open Policy Agent (OPA) are yet to mature.

Kubernetes has certainly delivered application mobility, with the orchestration of an application from one cluster to another now possible. Challenges moving forward are going to require policy controls, reporting, alerting and even Kubernetes manifest transformations, to support migration between disparate cluster versions and technologies. A community-developed standard approach for protection, one that also allows third-party protection vendors to build on it, is urgently needed.
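
A manifest transformation of the kind mentioned here can be sketched in Python as a rewrite of API versions that the target cluster no longer serves. The mapping below covers only Deployments and is illustrative; the function name is hypothetical, and real migrations between versions often involve schema changes beyond the apiVersion field.

```python
import copy

# Illustrative mapping only: these older Deployment API versions
# are no longer served from Kubernetes 1.16 onward.
API_VERSION_UPGRADES = {
    ("extensions/v1beta1", "Deployment"): "apps/v1",
    ("apps/v1beta1", "Deployment"): "apps/v1",
    ("apps/v1beta2", "Deployment"): "apps/v1",
}


def transform_for_target(manifest):
    """Return a copy of the manifest rewritten to an API version the target cluster serves."""
    transformed = copy.deepcopy(manifest)
    key = (transformed.get("apiVersion"), transformed.get("kind"))
    target = API_VERSION_UPGRADES.get(key)
    if target is None:
        return transformed  # nothing known to change
    transformed["apiVersion"] = target
    # apps/v1 requires an explicit selector; fall back to the pod template labels.
    spec = transformed.setdefault("spec", {})
    if "selector" not in spec:
        labels = spec.get("template", {}).get("metadata", {}).get("labels", {})
        spec["selector"] = {"matchLabels": dict(labels)}
    return transformed


# Hypothetical manifest captured from an older cluster.
old = {
    "apiVersion": "extensions/v1beta1",
    "kind": "Deployment",
    "metadata": {"name": "web"},
    "spec": {"template": {"metadata": {"labels": {"app": "web"}}}},
}

print(transform_for_target(old))
```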

To learn more about Kubernetes and other cloud native technologies, consider coming to KubeCon + CloudNativeCon North America 2020, Nov. 17-20, virtually.

The Cloud Native Computing Foundation is a sponsor of The New Stack.

Feature image via Pixabay.
