
URL: https://thenewstack.io/why-is-everyone-ignoring-the-day-2-kubernetes-problem/

Why Is Everyone Ignoring the Day 2 Kubernetes Problem? - The New Stack


Once Kubernetes is implemented, it can mean a never-ending headache for your Ops teams. Here's how to minimize pain from the start.
Jun 17th, 2022 10:17am by Heather Joslyn
Rafay sponsored this post.

Haseeb Budhani has seen it happen again and again.

An organization undertakes a move to adopt cloud native technologies and implements Kubernetes.

And then things start to get … interesting.

Clusters multiply. Changes proliferate. Access demands pile up. Cloud costs spike.

“Whether you’re talking to a high-tech company, or a financial services company, a healthcare company, or a retailer running apps at the edge, the problems are all the same,” Budhani, CEO and co-founder of Rafay, a Kubernetes operations platform, told The New Stack.

“How do I manage access to my clusters? What’s the policy model that I’m going to use across all my environments? What add-ons must I always have in the standard blueprint for my production clusters? What is my strategy to deploy applications that belong to multiple business units? I’ve got to upgrade all my clusters soon since I’m already three versions behind across the board — how do I do that?”

These problems can all be grouped under the heading of “Day 2.” And Day 2 can mean a never-ending headache for site reliability engineers (SRE) and IT operations engineers.

“We’ve got to be honest about the pain here,” Budhani said. “We need a moment of catharsis in this industry.”

Why Is Day 2 So Painful?

The pain has a number of root causes, Budhani said. First, there’s the skills gap: eight years into the Kubernetes era, there still aren’t enough engineers who know the K8s ecosystem well.

A lack of in-house skills is the top challenge that companies encounter when adopting containers and Kubernetes, according to a survey Canonical released in June 2021.

Then there’s the hodgepodge of tools in the cloud native ecosystem that your organization uses to operationalize Kubernetes, each of which also regularly requires upgrades and attention.

“All these tools, they follow their own lifecycle. Every so often, each of these tools will need to be updated across all the clusters,” Budhani said. “So, you have to manage the lifecycle of your Kubernetes cluster, the lifecycle of each of these tools, the lifecycle of your applications, centralized policy and access management as new internal teams deploy more apps, a disaster recovery strategy for each app, charge-back strategies, and more.”

“This is Day 2. Day 2 is about what needs to happen to keep the lights on while the underlying technologies each require custom strategies for their governance, operational security and visibility in a fully automated fashion.”

An overarching issue he sees is that not enough enterprises are using what he calls “automation with governance”: the developer velocity and freedom from unnecessary toil that cloud native architecture promises, coupled with the checks and balances that organizations need to control access to critical data, applications and infrastructure, and to control cloud costs.

“We’re not aligning to the North Pole that we all agreed was the right thing to do – automation,” Budhani said. “By definition, if you’re building it again, and again, you aren’t following the first rule of DevOps: Automate everything.”

What Does Day 2 for Kubernetes Ops Look Like?

On an ongoing basis, managing your Kubernetes operations requires keeping track of a number of things. For organizations that need to deploy into multi-cloud or hybrid environments, this complexity — and the challenges of keeping tabs on all the moving parts — compounds. In fact, the fear of dealing with that complexity can keep organizations from moving toward multi-cloud solutions in the first place and could lead to vendor lock-in that prevents an organization from realizing its business goals.

But the key areas that need attention remain the same, no matter where you’re deploying your applications. Here, according to Budhani and other experts, are five pillars of Kubernetes Ops:

Cluster Standardization and Lifecycle Management

“You know what your cluster looks like today, when you built it,” Budhani said. “But how do you know what it looks like a month from now?” Even if you don’t touch the cluster again, he noted, “an installed add-on with high-enough privileges could end up changing foundational configuration without you knowing about it.”

You will need to keep tabs on your cluster’s entire lifecycle, including how it’s affected by the other tools and users that interact with it. Setting standards for creating and updating clusters across your organization, while ensuring that a sanctioned set of add-ons is always running across your cluster fleet, can help simplify the Day 2 task of identifying anomalies when they occur.
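To make the “how do you know what it looks like a month from now?” question concrete, here is a minimal Python sketch (not Rafay’s implementation): record a snapshot of a cluster’s configuration when it is built, fingerprint it, and later diff a fresh snapshot against that baseline. The snapshot shape, its keys (including `anonymous_auth`) and the helper names `fingerprint` and `diff_config` are hypothetical; a real tool would pull this state from the Kubernetes API.

```python
import hashlib
import json


def fingerprint(config: dict) -> str:
    """Return a stable SHA-256 digest of a configuration snapshot."""
    # sort_keys and fixed separators make the digest independent of key order
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_config(baseline: dict, current: dict, prefix: str = "") -> list[str]:
    """List the dotted paths whose values differ between two nested snapshots."""
    changes = []
    for key in sorted(set(baseline) | set(current)):
        path = f"{prefix}{key}"
        old, new = baseline.get(key), current.get(key)
        if isinstance(old, dict) and isinstance(new, dict):
            changes.extend(diff_config(old, new, path + "."))  # recurse into sections
        elif old != new:
            changes.append(f"{path}: {old!r} -> {new!r}")
    return changes


# Hypothetical snapshot recorded when the cluster was built
baseline = {
    "kubernetes_version": "1.23",
    "addons": {"ingress-controller": "1.4.0", "cert-manager": "1.8.0"},
    "api_server": {"anonymous_auth": False},
}
# Hypothetical snapshot a month later: a privileged add-on flipped a setting
current = {
    "kubernetes_version": "1.23",
    "addons": {"ingress-controller": "1.4.0", "cert-manager": "1.8.0"},
    "api_server": {"anonymous_auth": True},
}

if fingerprint(current) != fingerprint(baseline):
    for change in diff_config(baseline, current):
        print("drift:", change)  # prints "drift: api_server.anonymous_auth: False -> True"
```

Comparing fingerprints first keeps the routine check cheap across a large fleet; the full diff only runs when something has actually changed.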

Secure Access and Isolation

A distributed network, run in full or in part on the cloud with the help of Kubernetes, demands an entirely new approach to operator/developer access and security. A network that lives everywhere is vulnerable to attack anywhere. (Sleep well tonight, dear reader!)

The zero trust approach to security has been gaining ground among organizations that have moved or are moving to the cloud. Zero trust rejects the old “castle and moat” model of security, instead using granular, automated authentication and authorization privileges to protect vital infrastructure and data, wherever they may live.

But many, if not most, organizations are still grappling with the basics when it comes to access controls. Eighty percent of participants in a survey released in January by strongDM said their organization would be working on access management this year; only 30% said a zero trust project was in their plans. (And one in three respondents in that same study called Kubernetes the most challenging technology they work with.)

Securing access to the Kubernetes API server can help prevent unauthorized probing. And when something goes wrong in a particular Kubernetes cluster, such as the injection of malware, that cluster or microservice needs to be isolated to prevent the problem from spreading.
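One standard Kubernetes mechanism for walling off a compromised workload is a NetworkPolicy. The sketch below, written in Python so it stays self-contained, builds a default-deny policy for every pod in a namespace and prints it as JSON, which `kubectl apply -f` accepts. The policy name and the `payments` namespace are hypothetical. Two caveats apply: NetworkPolicy is only enforced if the cluster’s network plugin supports it, and policies are additive, so existing allow rules for those pods keep working until they are removed. This isolates a namespace, not a whole cluster, and denying egress also cuts off DNS, which is usually the point of a quarantine.

```python
import json


def quarantine_policy(namespace: str) -> dict:
    """Build a default-deny NetworkPolicy covering every pod in `namespace`."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        # Hypothetical policy name; any valid name works
        "metadata": {"name": "quarantine-deny-all", "namespace": namespace},
        "spec": {
            # An empty podSelector selects all pods in the namespace
            "podSelector": {},
            # Both types declared with no ingress/egress rules: all traffic to and
            # from the selected pods is denied unless another policy allows it
            "policyTypes": ["Ingress", "Egress"],
        },
    }


# Save the output as quarantine.json and run `kubectl apply -f quarantine.json`
print(json.dumps(quarantine_policy("payments"), indent=2))
```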

Observability and Visibility

The administrators of your Kubernetes clusters need enough visibility into all environments, along with the requisite level of alerting and monitoring, to triage issues as they arise. Solutions such as Rafay’s Kubernetes Operations Platform provide these functions out of the box. Access to long-term metrics and alert data can help SRE and IT Ops teams understand trends across their cluster fleet, which in turn informs planning and forecasting.
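As a small illustration of the forecasting side, the sketch below fits a straight line (ordinary least squares) to a daily series of a hypothetical fleet metric, CPU requests as a percentage of allocatable capacity, and estimates how many days remain before the trend crosses a threshold. The metric, the numbers and the 80% threshold are assumptions; real capacity planning would draw on the data retained by your monitoring stack and a model that accounts for seasonality.

```python
from typing import Optional


def linear_trend(values: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of values against day index 0, 1, 2, ...

    Returns (slope per day, intercept)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    mean_x = (n - 1) / 2  # mean of the indices 0..n-1
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until(values: list[float], limit: float) -> Optional[float]:
    """Days after the last sample until the fitted line reaches `limit`.

    Returns None when the trend is flat or falling; 0.0 if already past it."""
    slope, intercept = linear_trend(values)
    if slope <= 0:
        return None
    crossing_day = (limit - intercept) / slope
    return max(0.0, crossing_day - (len(values) - 1))


# Hypothetical daily readings: fleet CPU requests as % of allocatable capacity
cpu_requests_pct = [50.0, 52.0, 54.0, 56.0, 58.0]
print(days_until(cpu_requests_pct, limit=80.0))  # prints 11.0
```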

Governance and Compliance

Kubernetes is, of course, open source — wide open, like the Wild West. And the companies that use it often struggle to add critical governance and compliance capabilities, such as logging, drift detection and auditability.

Centralized, enforceable cluster configuration models help with enterprise-wide cluster standardization. A way to verify that all mandated security and operational add-ons are deployed helps keep clusters compliant with enterprise policies. Further, detecting when a cluster deviates from enterprise policies, and remediating the issue when it does, is also a critical requirement.
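A blueprint check of the kind described above can be sketched in a few lines. The version below compares each cluster’s installed add-ons against a hypothetical enterprise blueprint and reports mandated add-ons that are missing or at the wrong version, plus add-ons that are not sanctioned at all. The add-on names, version strings and the `check_blueprint` helper are placeholders; the sketch only reports drift and leaves remediation to whatever pipeline owns the cluster.

```python
from dataclasses import dataclass, field


@dataclass
class ComplianceReport:
    cluster: str
    missing: list[str] = field(default_factory=list)
    # add-on name -> (installed version, required version)
    wrong_version: dict[str, tuple[str, str]] = field(default_factory=dict)
    unsanctioned: list[str] = field(default_factory=list)

    @property
    def compliant(self) -> bool:
        return not (self.missing or self.wrong_version or self.unsanctioned)


def check_blueprint(cluster: str, installed: dict[str, str],
                    blueprint: dict[str, str],
                    allowed_extra: frozenset[str] = frozenset()) -> ComplianceReport:
    """Compare one cluster's installed add-ons (name -> version) with the blueprint."""
    report = ComplianceReport(cluster)
    for name, required in sorted(blueprint.items()):
        if name not in installed:
            report.missing.append(name)
        elif installed[name] != required:
            report.wrong_version[name] = (installed[name], required)
    for name in sorted(installed):
        if name not in blueprint and name not in allowed_extra:
            report.unsanctioned.append(name)
    return report


# Hypothetical blueprint and fleet inventory; names and versions are placeholders
blueprint = {"ingress-controller": "1.4.0", "runtime-security": "0.9.2", "policy-engine": "3.2.0"}
fleet = {
    "prod-us-east": {"ingress-controller": "1.4.0", "runtime-security": "0.9.2", "policy-engine": "3.2.0"},
    "prod-eu-west": {"ingress-controller": "1.3.1", "runtime-security": "0.9.2", "debug-shell": "1.0.0"},
}
for cluster, installed in fleet.items():
    r = check_blueprint(cluster, installed, blueprint)
    status = "compliant" if r.compliant else (
        f"missing={r.missing} wrong_version={r.wrong_version} unsanctioned={r.unsanctioned}")
    print(f"{cluster}: {status}")
```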

Rafay’s Kubernetes Operations Platform provides capabilities such as cluster blueprinting, add-on version control, policy enforcement and violation reporting, along with drift detection logic that can block changes to cluster-wide resources (such as ingress controllers and runtime security tooling) and an end-to-end audit trail of cluster activity.

Third-Party Integrations and Maintenance

Making all the services and tools that power your modern, Kubernetes-based infrastructure operate seamlessly and play well together can be tricky. Major cloud providers usually offer a suite of tools that help manage Kubernetes, but these don’t always translate well if you step outside that cloud provider’s universe: to deploy to multiple clouds, in on-premises environments or, in some cases, at the edge.

It’s like a jigsaw puzzle whose pieces must keep fitting together even though each piece has its own lifecycle to manage. Open source tools and components can bring their own problems, such as the Log4j vulnerabilities discovered late in 2021, or simply the toil of upgrading Kubernetes itself, which ships a new minor release every few months.

Outdated tools can result in unplanned downtime, which can directly impact end customers — and, ultimately, the business.
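Upgrades show where that toil comes from. Kubernetes does not support skipping minor versions when upgrading a control plane, so a cluster that is “three versions behind,” in Budhani’s phrase, needs three sequential upgrades. The sketch below lays out that path for each cluster in a hypothetical fleet inventory; it ignores patch releases, node pools and add-on compatibility, all of which a real upgrade plan would have to cover.

```python
def minor_upgrade_path(current: str, target: str) -> list[str]:
    """List each minor release a control plane must pass through, in order.

    Accepts "MAJOR.MINOR" or "MAJOR.MINOR.PATCH"; patch levels are ignored."""
    cur_major, cur_minor = (int(p) for p in current.split(".")[:2])
    tgt_major, tgt_minor = (int(p) for p in target.split(".")[:2])
    if cur_major != tgt_major:
        raise ValueError("cross-major upgrades are out of scope for this sketch")
    if tgt_minor < cur_minor:
        raise ValueError("target is older than the current version")
    # One hop per minor version, because minor versions cannot be skipped
    return [f"{cur_major}.{m}" for m in range(cur_minor + 1, tgt_minor + 1)]


# Hypothetical fleet inventory: cluster name -> current control-plane version
versions = {"prod-us-east": "1.21", "prod-eu-west": "1.21", "staging": "1.22"}
target = "1.24"
for cluster, version in sorted(versions.items()):
    steps = minor_upgrade_path(version, target)
    print(f"{cluster}: {' -> '.join([version] + steps)} ({len(steps)} upgrades)")
```

Multiplied across every cluster and every add-on that must stay compatible at each hop, this is the recurring work that falls behind when nobody owns it.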

The Ongoing Cost of Building a K8s Platform

The burden of Day 2 Kubernetes isn’t just about cloud spend. Operating and maintaining Kubernetes can also pull team members away from working on products and applications that directly generate top-line revenue for the company.

Some of the things that can add to Day 2 headaches stem from a lack of standardization. They can include complex triage and support costs when clusters with unique configurations fail, or security risks resulting from custom access and networking between controllers and clusters. A lack of kubectl access control can also expose the business to compliance and governance risks.
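For illustration, here is what basic kubectl access control can look like with Kubernetes’ built-in RBAC: a namespaced Role that allows only read verbs on a few workload resources, bound to a group supplied by your identity provider. The manifests are built as Python dicts and printed as a JSON `List` that `kubectl apply -f` accepts; the role name, the `payments` namespace and the `payments-devs` group are hypothetical. Secrets are deliberately left out of the readable resources.

```python
import json

RBAC_GROUP = "rbac.authorization.k8s.io"
READ_VERBS = ["get", "list", "watch"]


def read_only_role(namespace: str, name: str = "app-viewer") -> dict:
    """Namespaced Role granting only read verbs on common workload resources."""
    return {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [
            # "" is the core API group; secrets are intentionally not listed
            {"apiGroups": [""], "resources": ["pods", "pods/log", "services", "configmaps"],
             "verbs": READ_VERBS},
            {"apiGroups": ["apps"], "resources": ["deployments", "replicasets"],
             "verbs": READ_VERBS},
        ],
    }


def role_binding(role: dict, group: str) -> dict:
    """Bind the Role to a group name as asserted by the cluster's authenticator."""
    meta = role["metadata"]
    return {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{meta['name']}-binding", "namespace": meta["namespace"]},
        "subjects": [{"kind": "Group", "name": group, "apiGroup": RBAC_GROUP}],
        "roleRef": {"kind": "Role", "name": meta["name"], "apiGroup": RBAC_GROUP},
    }


role = read_only_role("payments")
manifest = {"apiVersion": "v1", "kind": "List", "items": [role, role_binding(role, "payments-devs")]}
print(json.dumps(manifest, indent=2))  # save as rbac.json, then `kubectl apply -f rbac.json`
```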

“Managers are sometimes reluctant to bring up Day 2 issues in the early days of the Kubernetes journey,” Budhani said, because “they don’t want to upset the developer mindset.”

That reticence is misguided, he added: Developers already know their workload is out of balance because of Kubernetes Day 2 problems.


“When I talk to actual developers, they say, ‘I don’t know why I’m writing Helm charts instead of my app,’” Budhani said. “‘Yeah, I wanted to experiment with Kubernetes. I kind of liked this new technology. And I loved learning about it when I was initially exposed to it. But, my God, I’ve got a job to do.’”

C-level executives, DevOps managers, and developers all want the same thing, he said: an efficient way to ship more and better code and generate more revenue for the business. But, he noted, “they’re not talking the same language. And in the process, these large enterprises are way behind schedule on their deliverables.”

Solving the Kubernetes Day 2 Problem

“To prepare your teams and your organization for the long-term commitment of overseeing a cloud architecture built on top of Kubernetes, it’s important to understand exactly what your organization is getting into,” said Budhani.

Before you even start a cloud native project that includes operating Kubernetes, think about what your organization is trying to accomplish, he urged.

Chief information officers and other senior executives, Budhani said, “used to challenge their teams to do more, to experiment more.” Now, he said, the question should be, “Why are you experimenting with Kubernetes add-ons? Prove to me that you need to build something beyond the things that can be purchased off the shelf before you go experiment.”

The true cost of standardizing on K8s includes pricing models, implementation and maintenance. Among the questions to ask:

  • How will you determine which tools work best for your use case?
  • How will you keep up with open-source changes?
  • Will there be continued investment in tool integrations?
  • Who will fix interoperability issues and operational problems when they arise?
  • Can you hire enough people with cloud native skills, or train the team members you already have, fast enough?

Above all, consider talking to other organizations that have also crossed the chasm and implemented Kubernetes.

“First, understand how others are doing things, because that gives you a sense of how hard or simple this problem is,” Budhani said. “Learning by failing is great, but it comes at the cost of time. Why take this path when you can learn from your peers? The beautiful thing about the Kubernetes community is that people are quite open to sharing their experiences and opinions.”

He concluded, “In 2022, you’re not the only company working on Kubernetes. Lots of other companies have been through the journey or are on this journey. There are resources out there to do it right. Look for them.”

Heather Joslyn is the former editor-in-chief of The New Stack. She previously worked as editor-in-chief of Container Solutions, a Cloud Native consulting company, and as an editor/reporter at The Chronicle of Philanthropy and the Baltimore City Paper.
strongDM is a sponsor of The New Stack.
TNS owner Insight Partners is an investor in: Pragma.