URL: https://thenewstack.io/kubernetes-teams-trust-automation/

Kubernetes teams trust automation to ship code but not to touch CPU, and AI is raising the stakes - The New Stack


82% of Kubernetes practitioners say they trust automated delivery, but only 27% will let automation change CPU or memory without a human in the loop.
Jun 23rd, 2026 4:56pm by Yasmin Rajabi

Kubernetes teams automate deployments without thinking about it. CI/CD pipelines fire dozens of times a day, autoscaling adjusts replicas in the background, and rollback is muscle memory. But there is one category of automation where that confidence vanishes: letting a system change CPU and memory requests on a running workload without a human reviewing it first.

And as AI inference lands on Kubernetes at scale, that hesitation is becoming harder to ignore and increasingly expensive.

Why teams trust automation for change but not for constraint

We surveyed 321 Kubernetes practitioners at enterprise organizations earlier this year. The headline finding is one most practitioners will recognize immediately: 82% report high or complete trust in automated delivery controls. But 71% still require human review before applying resource optimization recommendations. Only 27% allow CPU and memory changes to be auto-applied, even within guardrails.

Those numbers describe a specific asymmetry. The same engineers who deploy to production dozens of times a day without hesitation slow down the moment automation wants to adjust resource allocation. And the survey data make it clear why. Deploying code feels additive. You are shipping new value, the rollback path is well understood, and if something breaks you usually see it right away. Meanwhile, rightsizing feels subtractive because you are removing safety margin from a running service, and the failure mode is fundamentally different.

As one practitioner in the survey put it: “Automated right-sizing carries a unique risk because it directly impacts the underlying stability of the application runtime. Unlike a code deployment that follows a tested path, resource changes alter the invisible contract between the workload and the scheduler.”

When you change resource requests, you change how Kubernetes schedules, prioritizes, and allocates resources. Those effects are not visible the way a code change is. You can’t trace them through a deployment pipeline. And you might not discover that something went wrong until two weeks later, when a traffic spike hits a threshold that didn’t exist at the old values. By that point, three other things have changed too, and proving causation is nearly impossible. The people responsible for those workloads are the same people who get paged at 2 a.m., and they know this.
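
To make that "invisible contract" a little more concrete: the Kubernetes scheduler places pods by comparing container requests against a node's allocatable capacity, not against live usage. The sketch below is a deliberately simplified model of that fit check. The function name and all numbers are illustrative assumptions, not survey data, and the real scheduler also weighs memory, affinity, taints, and more.

```python
def pods_that_fit(node_allocatable_millicores: int, request_millicores: int) -> int:
    """Simplified model: how many identical pods fit on one node by CPU request alone.

    Illustrative only. Placement is driven by requests, not by observed usage,
    which is why changing a request changes scheduling outcomes.
    """
    if request_millicores <= 0:
        raise ValueError("request must be positive")
    return node_allocatable_millicores // request_millicores


# Hypothetical node with 8 allocatable cores (8000m).
before = pods_that_fit(8000, 1000)  # each pod requests 1 core
after = pods_that_fit(8000, 500)    # requests rightsized to 500m
print(before, after)  # prints "8 16"
```

In this toy case, halving the request lets twice as many pods share the node. Nothing fails at the moment of the change; the contention only shows up later, when several of those pods spike at once.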

Why AI workloads raise the stakes

That trust gap existed before inference workloads showed up. What’s changed is the cost of not closing it.

For a long time, teams could absorb the cost of manual oversight. They knew their workloads, had intuition for where the safe boundaries were, and the inefficiency of over-provisioning was a price worth paying for stability. GPU-accelerated inference workloads change that math. GPU compute is significantly more expensive per hour than CPU, so the cost of over-provisioning is no longer a rounding error you can absorb quietly. The workload behavior is also less familiar: inference jobs are bursty in ways teams haven’t built intuition for, traffic patterns shift as models are updated and usage changes, and the resource dimensions involved differ from what teams have spent years learning to tune.

That unfamiliarity compounds with scale. Rightsizing isn’t a one-lever problem the way horizontal scaling is. It involves, at minimum, CPU and memory requests, and potentially limits for both. That is up to four dimensions per workload, multiplied across hundreds or thousands of workloads per cluster. The survey data indicate that manual optimization breaks down at around 250 changes a day. Inference workloads push teams past that threshold faster than anything they’ve managed before, because the resource decisions are more frequent and the cost of getting them wrong is higher.
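
A rough back-of-the-envelope calculation shows how quickly the decision count grows. Only the four dimensions and the roughly 250-changes-a-day threshold come from the text above; the workload count and review cadence are made-up inputs for illustration.

```python
DIMENSIONS_PER_WORKLOAD = 4   # CPU request, CPU limit, memory request, memory limit
MANUAL_LIMIT_PER_DAY = 250    # approximate breakdown point reported in the survey


def daily_changes(workloads: int, review_interval_days: int) -> float:
    """Average settings to revisit per day if every workload is re-tuned every N days."""
    return workloads * DIMENSIONS_PER_WORKLOAD / review_interval_days


# Hypothetical cluster: 1,000 workloads, each re-evaluated every 14 days.
per_day = daily_changes(1000, 14)
print(round(per_day), per_day > MANUAL_LIMIT_PER_DAY)  # prints "286 True"
```

Under those assumed inputs, a single mid-sized cluster on a two-week review cycle is already past the point where manual review reportedly stops keeping up.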

The economic case for automated rightsizing has never been stronger. Organizational willingness to delegate hasn’t caught up, because teams are being asked to trust automation with workloads they don’t yet have a track record with.

What the survey says about closing the gap

When we asked practitioners what would actually increase their trust in optimization automation, 48% said visibility and transparency into how decisions are made, 25% wanted proven guardrails, and 23% needed instant rollback.

Nobody asked for full manual control and very few asked for blind autonomy. What they described is automation that earns trust in stages, and that’s consistent with how the teams furthest along in their automation journey actually got there. They didn’t start with production. They started with a single namespace in a dev environment, observed the system’s behavior, compared recommendations with outcomes, and gradually expanded the scope. Different environments remained at different levels of automation maturity simultaneously, and that was intentional. Production carried more scrutiny than dev.

CI/CD followed the same curve, and the timeline is easy to forget. Most organizations took years to get from running their first automated pipeline to trusting it with production deploys without manual approval on every commit. Kubernetes resource automation is earlier in that same process, and AI workloads are extending the timeline because teams are building trust from scratch with a workload category that doesn’t yet have a track record.

Why automation design matters as much as capability

Some automation architectures deliver meaningful value only with full delegation. The system needs complete control to function the way it was designed to. That’s a form of forced autonomy, and it creates an adoption problem because it asks for exactly the level of trust that most organizations haven’t built yet. Force generally doesn’t work. Teams that feel pushed into a level of delegation they aren’t comfortable with tend to pull back entirely after the first incident.

The alternative is what I’d describe as adaptive autonomy: designing the system to work at every stage of the trust curve. A team still evaluating gets useful recommendations in read-only mode. A team ready to act but wanting boundaries can run guardrailed execution within limits they define. As confidence grows, the system handles more decisions autonomously while humans manage exceptions. And for environments where the track record supports it, closed-loop optimization runs in the background and becomes boring, which is the goal. Each stage is a legitimate operating mode, not a stepping stone you have to rush through.
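
One way to picture those operating modes is as a small policy function that decides what happens to each recommendation. This is a minimal sketch under stated assumptions, not a description of any particular product: the mode names, the `Guardrails` fields, the `decide` function, and the environment mapping are all hypothetical, and the stages described above are collapsed into three modes for brevity.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    READ_ONLY = "read_only"      # recommendations only, nothing is applied
    GUARDRAILED = "guardrailed"  # auto-apply only inside team-defined limits
    CLOSED_LOOP = "closed_loop"  # auto-apply; humans see only the exceptions


@dataclass
class Guardrails:
    max_change_pct: float  # largest allowed change relative to the current request
    min_cpu_m: int         # hard floor (millicores) the request may never drop below


def decide(mode: Mode, current_m: int, proposed_m: int, rails: Guardrails) -> str:
    """Return 'recommend', 'apply', or 'needs_review' for one CPU request change."""
    if mode is Mode.READ_ONLY:
        return "recommend"
    if proposed_m < rails.min_cpu_m:
        return "needs_review"  # the hard floor is an exception in every applying mode
    if mode is Mode.CLOSED_LOOP:
        return "apply"
    change_pct = abs(proposed_m - current_m) * 100 / current_m
    return "apply" if change_pct <= rails.max_change_pct else "needs_review"


# Hypothetical mapping: environments sit at different maturity levels on purpose.
ENV_MODES = {"prod": Mode.READ_ONLY, "staging": Mode.GUARDRAILED, "dev": Mode.CLOSED_LOOP}

rails = Guardrails(max_change_pct=20, min_cpu_m=100)
for env, mode in ENV_MODES.items():
    print(env, decide(mode, current_m=1000, proposed_m=500, rails=rails))
```

The same 50% reduction is only surfaced as a recommendation in prod, held for review in staging, and applied in dev. Moving an environment along the trust curve then means changing one entry in the mapping, not adopting a different system.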

That design distinction matters more with AI workloads than it ever did with traditional services, precisely because the trust-building process is starting from zero on workloads where the cost of getting it wrong is highest.

The other piece that makes this sustainable is rollout safety. Trust takes a long time to build and a single production incident to undermine. Start with the workloads showing the most headroom between requests and actual usage. Make changes incrementally, small enough that a bad outcome stays contained. Rollback needs to be fast and tied to the health signals the team already monitors. And start with opt-in, not opt-out. Let the teams willing to go first build a track record that others can look at.
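
The first two rollout steps can be sketched in a few lines: rank workloads by headroom, then cap how far any single change may move. Everything here is an assumption for illustration, including the `Workload` fields, the use of p95 usage as the signal, and the 10% step and 20% buffer defaults. Health-based rollback is not shown.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    cpu_request_m: int    # current CPU request in millicores
    cpu_usage_p95_m: int  # observed p95 CPU usage in millicores (assumed metric)


def headroom_pct(w: Workload) -> float:
    """Share of the request that observed usage does not reach."""
    return (w.cpu_request_m - w.cpu_usage_p95_m) * 100 / w.cpu_request_m


def next_request(w: Workload, max_step_pct: int = 10, buffer_pct: int = 20) -> int:
    """Step toward usage plus a safety buffer, by at most max_step_pct; never increase."""
    target = w.cpu_usage_p95_m * (100 + buffer_pct) // 100
    floor = w.cpu_request_m * (100 - max_step_pct) // 100
    return min(w.cpu_request_m, max(target, floor))


# Hypothetical workloads.
workloads = [Workload("worker", 1000, 900), Workload("api", 2000, 400)]

# Most headroom first: these are the lowest-risk candidates to start with.
ranked = sorted(workloads, key=headroom_pct, reverse=True)
for w in ranked:
    print(w.name, headroom_pct(w), next_request(w))
```

With these inputs, `api` (80% headroom) comes first and moves from 2000m to 1800m, while `worker` (10% headroom) is left untouched because its usage plus buffer already exceeds its request. Small steps like this keep a bad outcome contained and leave time for the health signals the team already watches to react.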

The broader pattern

The 71% figure is sometimes read as resistance to automation. I think it’s a more accurate picture of how operational trust actually forms: conditional, earned over time, and moving at different speeds depending on what’s at stake. AI workloads are raising those stakes significantly, which means the path to trusted automation matters more now than it did when the cost of caution was just some unused CPU.

Most of what gets written about Kubernetes optimization focuses on tooling capability, and the tooling is capable. The harder problem is the human one. If your team is managing AI inference workloads on Kubernetes and your optimization tooling is sitting in read-only mode, the question worth asking isn’t whether to trust the system. It’s whether the system is designed to let you build that trust gradually, starting where the stakes are low and expanding as the evidence supports it, on workloads where getting it wrong costs more than it ever has before.

Yasmin Rajabi is the Chief Operating Officer at CloudBolt Software. She is a recognized leader in the FinOps and Kubernetes communities, and her background as an engineer, product leader, and operator gives her a holistic perspective on the challenges facing...