
URL: https://thenewstack.io/hpa-managed-workloads-why-waste-stays/

HPA-managed workloads: Why the obvious waste stays - The New Stack



HPA-managed workloads: Why the obvious waste stays 

Apr 11th, 2026 9:51pm by Yasmin Rajabi

Teams running Kubernetes can usually see where they’re overprovisioned. Requests are higher than they need to be, there’s consistent headroom, and capacity sits underused. 

This has been true for a while, but it is showing up more often now as more teams run burstier model-serving workloads on Kubernetes and start feeling the cost of overprovisioning more directly. 

But those workloads don’t get touched. 

This shows up most with HPA-managed services. The inefficiency is obvious; as the HPA scales, the waste scales with it. What’s less obvious is what happens when you change it. 

These workloads already scale under real production traffic. Teams have watched how they behave during spikes, launches, and incidents. That history builds trust. And once that trust is there, inefficiency is easier to live with than unpredictability. 

The biggest problem with most optimization approaches isn’t the math. It’s that they treat this as a math problem. Teams aren’t optimizing for average utilization. They’re optimizing for resilience during the worst five minutes of the quarter. Any approach that doesn’t understand that distinction is solving the wrong problem. 

The problem isn’t finding the waste 

Most teams can spot overprovisioned workloads in minutes. I’d bet nearly every organization has at least one Grafana dashboard showing the stark difference between capacity allocated and capacity used. The harder question is what happens after a change gets applied. 

For HPA-managed workloads, requests aren’t just a sizing input. They shape scaling behavior. When an HPA targets resource utilization, it measures each pod’s usage as a percentage of its request, so when requests change, those ratios change too, even if traffic stays exactly the same. That shifts when scaling kicks in and how aggressively replicas increase. 
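To make that coupling concrete, here is a minimal sketch of the replica calculation described in the Kubernetes HPA documentation: desired replicas are the current replicas scaled by the ratio of observed to target utilization, rounded up, with no action when the ratio is within a tolerance (0.1 by default). The function names and the example numbers (550m of CPU per pod, a 60% target, 10 replicas) are illustrative assumptions, not values from any real service, and the sketch ignores min/max replica bounds and stabilization windows.

```python
import math


def utilization_pct(usage_millicores: float, request_millicores: float) -> float:
    """Utilization as the HPA sees it: usage as a percentage of the request."""
    return 100.0 * usage_millicores / request_millicores


def desired_replicas(current_replicas: int,
                     current_util_pct: float,
                     target_util_pct: float,
                     tolerance: float = 0.10) -> int:
    """Simplified HPA formula: ceil(current * observed / target).

    Skips scaling when the ratio is within the tolerance band.
    Ignores minReplicas/maxReplicas and stabilization (assumption).
    """
    ratio = current_util_pct / target_util_pct
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Hypothetical workload: each pod uses 550m CPU, HPA target is 60%.
before = desired_replicas(10, utilization_pct(550, 1000), 60)  # request = 1000m
after = desired_replicas(10, utilization_pct(550, 700), 60)    # request = 700m
print(before, after)
```

With the 1000m request, utilization is 55% and the HPA holds at 10 replicas. With the request trimmed to 700m and identical traffic, utilization reads as roughly 79% and the same formula asks for 14 replicas. Nothing about the load changed; only the denominator did.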

This is what makes resource changes feel fundamentally different from code deploys. A bad deploy has a known rollback path. A bad resource change is more subtle. It shifts an invisible contract between the workload, the scheduler, and the autoscaler, and the failure mode might not surface until Friday afternoon, when a traffic spike hits a threshold that didn’t exist at the old request values. By then, three other things have changed too, and proving causation is nearly impossible. 

Changing requests isn’t a resource adjustment. It’s a change to how the workload scales. That’s what makes teams nervous. 

What teams are actually protecting 

In most cases, this isn’t inertia or ignorance. It’s a deliberate choice. Teams are preserving behavior that already works: 

  • Predictable scale-out during spikes 
  • Stable latency under real traffic 
  • Known behavior during releases and incidents 
  • The ability to explain what the service will do when demand moves 

Once a service seems to be “working,” any change that could shift its scaling behavior looks risky. Most teams would rather tolerate the waste than introduce a new variable into a service they already depend on. 

And it’s worth being honest about why: the people who set those resource values are the same people who get paged at 2am if something breaks. The risk isn’t abstract. A suggestion to downsize might be technically correct, but if it touches a service owned by a team that had an incident six months ago, that team isn’t changing anything. The savings opportunity doesn’t outweigh the personal accountability. 

Why standard rightsizing stops here 

Most rightsizing workflows assume a simple loop: adjust requests, watch what happens, iterate. That works for stable services where changing requests doesn’t also change scaling behavior. 

It breaks with HPA-managed workloads, where requests and scaling are coupled. That gets even harder with model-serving workloads, where traffic can move fast and the cost of carrying extra headroom is unusually visible. 

The failure mode is especially dangerous because it’s not immediate. A service can show low average usage all week and then hit a traffic spike where the headroom that looked wasteful turns out to be the reason it stayed stable. Automation that trims too close to the line based on recent averages doesn’t account for business context: product launches, seasonal spikes, marketing events, or end-of-quarter surges that aren’t in the last two weeks of data. 
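A small sketch of why averages mislead here. The usage samples are made up for illustration (95 quiet intervals at 200m CPU and 5 spike intervals at 900m), and the 20% headroom factor is an assumed policy, not a recommendation from any particular tool.

```python
from statistics import mean

# Hypothetical per-pod CPU usage samples in millicores:
# mostly quiet, with a short spike.
samples = [200] * 95 + [900] * 5


def request_from_mean(usage: list[int], headroom: float = 1.2) -> int:
    """Naive rightsizing: average usage plus a fixed headroom factor."""
    return round(mean(usage) * headroom)


def peak_usage(usage: list[int]) -> int:
    """The worst interval actually observed in the window."""
    return max(usage)


recommended = request_from_mean(samples)
peak = peak_usage(samples)

print(f"mean-based request: {recommended}m, observed peak: {peak}m")
print("covers the spike:", recommended >= peak)
```

The mean-based request lands far below the observed peak, and this is the optimistic case where the spike is in the data at all. A launch or end-of-quarter surge outside the sampling window would not show up in either number.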

That’s why these workloads sit outside routine rightsizing efforts even when the waste is obvious. 

What would need to be true for teams to act 

If teams are going to optimize here, preserving existing scaling behavior is the bar. Changing requests can’t quietly change when the workload scales or how aggressively it responds. 

The approach that works is treating requests and HPA targets as a coupled pair. Adjust both atomically, and the workload’s behavior under load stays intact even as the resource footprint shrinks. 
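One way to sketch the coupling, under stated assumptions: hold constant the absolute per-pod usage at which the HPA starts scaling (request times target utilization), and derive the new target from the new request. The function name, the 90% ceiling, and the example values are hypothetical. Rounding down is a deliberately conservative choice so the workload never scales later than it did before.

```python
import math


def coupled_target(old_request_m: float,
                   old_target_pct: float,
                   new_request_m: float,
                   max_target_pct: int = 90) -> int:
    """Return a new HPA target utilization (%) that keeps the same
    absolute per-pod scaling threshold after a request change.

    max_target_pct is an assumed guardrail, not a Kubernetes limit.
    """
    # Absolute usage per pod at which scale-out currently begins.
    threshold_m = old_request_m * old_target_pct / 100
    # Same threshold expressed against the new request, rounded down.
    new_target = math.floor(100 * threshold_m / new_request_m)
    if new_target > max_target_pct:
        raise ValueError(
            f"new target {new_target}% exceeds guardrail {max_target_pct}%; "
            "the request cut is too deep to preserve scaling behavior"
        )
    return new_target


# Hypothetical change: request 1000m -> 750m with an existing 60% target.
print(coupled_target(1000, 60, 750))
```

In this example the threshold stays at 600m per pod: 60% of 1000m before, 80% of 750m after. The two values would then need to ship in a single change, so the HPA never evaluates the new request against the old target.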

But even the right technical approach isn’t enough on its own. Teams need to see the reasoning behind each change, not just the recommendation. They need guardrails that respect the same SLOs they’re held accountable to. And they need a path that starts with visibility, moves to approved recommendations, and only graduates to automation after trust has been earned. Flipping straight to full autonomy doesn’t build confidence. It skips the part where confidence gets built. 

That trust curve shows up in the broader Kubernetes market too. In CloudBolt’s recent research on the Kubernetes automation trust gap, teams consistently reported that visibility and recommendations are much easier to adopt than autonomous execution. 

Teams also need rollback to be straightforward. Not “file a ticket and wait.” Automatic, fast, and triggered by the same health signals the team already trusts. 

Without all of that, the default answer stays simple: leave it alone. 

The most expensive inefficiencies sit inside the workloads no one feels safe changing. 

CloudBolt helps platform teams turn Kubernetes optimization insights into safe, trusted action—with the guardrails, visibility, and control needed to operate confidently in production.
Yasmin Rajabi is the Chief Operating Officer at CloudBolt Software. She is a recognized leader in the FinOps and Kubernetes communities, and her background as an engineer, product leader, and operator gives her a holistic perspective on the challenges facing...