URL: https://thenewstack.io/node-overhead-the-hidden-cost-eating-your-kubernetes-spend/

Node Overhead: The Hidden Cost Eating Your Kubernetes Spend

Because node costs are generally the largest drivers of Kubernetes spending, a few percent in spending lost to node overhead can greatly impact the bottom line.
May 29th, 2024 10:00am by Alex Meijer

Kubernetes node overhead is a largely unrecognized “cost of doing business” for teams using Kubernetes. It can be defined as the node resources used to run Kubernetes itself.

Every Kubernetes node has a capacity: the total CPU and memory of the underlying machine, which is the sticker price that providers like Amazon Web Services and Google Cloud Platform bill for. A subset of that total is the allocatable capacity, the portion of the node that workloads can actually be scheduled on. Capacity is what you pay for, allocatable capacity is what you can use, and the difference is node overhead.

That overhead includes the kubelet, any control plane infrastructure running on the node, the container runtime (Docker, containerd) and, in general, any software running directly on the node that isn’t in a pod. Node overhead does not include things like Prometheus, Calico/Weave/CNI pods, DNS pods, Cert Manager, kube-system pods or any other workload that runs as a pod in Kubernetes.

Calculating Node Overhead

With kubectl describe or any Kubernetes API request that tells you about the nodes in your cluster, you can immediately see the node’s capacity and its allocatable capacity; the difference between them represents the node overhead.

[Image: kubectl describe output showing the node’s capacity and allocatable blocks]

This example (an n1-standard-2 Kubernetes node) displays the capacity block and the allocatable block. Compare the CPU capacity of 2 cores (the sticker price) with the allocatable 1930m: about 3.5% is lost, which is a relatively small amount of overhead. Memory shows a starker difference: 7.6GB of capacity versus 5.7GB allocatable, or roughly 25% overhead.
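The same subtraction can be scripted. The sketch below is a minimal illustration, not part of any tool named in this article: it assumes a node object shaped like one item from `kubectl get nodes -o json`, where `status.capacity` and `status.allocatable` hold Kubernetes quantity strings. The helper names `parse_quantity` and `node_overhead` are hypothetical, the parser handles only the common suffixes, and the sample memory values are round approximations of the 7.6GB and 5.7GB figures above.

```python
# Multipliers for the Kubernetes quantity suffixes this sketch understands.
# Exponent notation and the larger suffixes (P, E, Pi, Ei) are not handled.
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "n": 1e-9, "u": 1e-6, "m": 1e-3,
    "k": 1e3, "M": 1e6, "G": 1e9, "T": 1e12,
}


def parse_quantity(text):
    """Convert a quantity string such as '1930m' or '7600M' to a float."""
    # Check two-character suffixes first so 'Mi' is not mistaken for 'M'.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * _SUFFIXES[suffix]
    return float(text)


def node_overhead(node):
    """Return {resource: (capacity, allocatable, overhead_fraction)}."""
    status = node["status"]
    result = {}
    for resource in ("cpu", "memory"):
        capacity = parse_quantity(status["capacity"][resource])
        allocatable = parse_quantity(status["allocatable"][resource])
        fraction = (capacity - allocatable) / capacity
        result[resource] = (capacity, allocatable, fraction)
    return result


# Illustrative node: CPU values from the example above, memory approximated.
SAMPLE_NODE = {
    "status": {
        "capacity": {"cpu": "2", "memory": "7600M"},
        "allocatable": {"cpu": "1930m", "memory": "5700M"},
    }
}

if __name__ == "__main__":
    for resource, (cap, alloc, fraction) in node_overhead(SAMPLE_NODE).items():
        print(f"{resource}: capacity={cap:g} allocatable={alloc:g} "
              f"overhead={fraction:.1%}")
```

Real nodes usually report memory in `Ki`, which the parser also accepts.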

Turning to Open Source for Cost Visibility

OpenCost is an open source Cloud Native Computing Foundation (CNCF) sandbox project, with contributors including Microsoft, Kubecost, Adobe, SUSE and many others. Its REST API combines the information you get from kubectl describe with actual node cost information.

To calculate node overhead within OpenCost itself, you can surface the standard kube-state-metrics series for capacity and allocatable CPU and memory. Those metrics are exposed to Prometheus, which collects and stores the data and provides querying and aggregation functions. OpenCost collects query results from Prometheus, calculates the node CPU and memory used for overhead (both in absolute units of vCPU and bytes, and as a percentage) and then provides a single cost-weighted average metric as a final summary showing the fraction of costs spent on Kubernetes overhead.
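To make the Prometheus step concrete, here is a rough sketch of reading those two metrics through the Prometheus HTTP API and computing per-node overhead. It is not OpenCost's implementation. It assumes the metric names `kube_node_status_capacity` and `kube_node_status_allocatable` with `node` and `resource` labels, as exposed by recent kube-state-metrics releases. The function names and the `base_url` parameter are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed kube-state-metrics series names; adjust to match your deployment.
CAPACITY_QUERY = "kube_node_status_capacity"
ALLOCATABLE_QUERY = "kube_node_status_allocatable"


def instant_query(base_url, promql):
    """Run a Prometheus instant query and return the decoded JSON response."""
    url = f"{base_url}/api/v1/query?{urlencode({'query': promql})}"
    with urlopen(url) as response:
        return json.load(response)


def by_node_and_resource(response):
    """Map (node, resource) -> value from an instant-vector response."""
    values = {}
    for series in response["data"]["result"]:
        labels = series["metric"]
        _timestamp, raw_value = series["value"]  # the value arrives as a string
        values[(labels["node"], labels["resource"])] = float(raw_value)
    return values


def overhead_by_node(capacity_response, allocatable_response):
    """Return absolute and fractional overhead per (node, resource) pair."""
    capacity = by_node_and_resource(capacity_response)
    allocatable = by_node_and_resource(allocatable_response)
    overhead = {}
    for key, cap in capacity.items():
        # Skip series with no allocatable counterpart or zero capacity.
        if key in allocatable and cap > 0:
            alloc = allocatable[key]
            overhead[key] = {
                "absolute": cap - alloc,
                "fraction": (cap - alloc) / cap,
            }
    return overhead
```

A call might look like `overhead_by_node(instant_query(url, CAPACITY_QUERY), instant_query(url, ALLOCATABLE_QUERY))`, where `url` points at your own Prometheus server.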

Understanding Kubernetes Overhead

Using these relatively straightforward operations on widely available Prometheus metrics, you can better understand Kubernetes overhead, including how it changes as node size increases, how it varies between node families and how it differs across cloud providers. That understanding is crucial to accurately sizing clusters and informing other key cost-efficiency decisions.
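As one illustration of how simple these operations are, the sketch below computes a cost-weighted average overhead across a set of nodes. It assumes that each node's overhead is the plain mean of its CPU and memory overhead fractions and that nodes are weighted by hourly cost; OpenCost's actual blending may differ. The `NodeSample` type, the function names and all sample figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeSample:
    name: str
    hourly_cost: float      # USD per hour (illustrative)
    cpu_capacity: float     # cores
    cpu_allocatable: float  # cores
    mem_capacity: float     # GB
    mem_allocatable: float  # GB


def overhead_fraction(capacity, allocatable):
    """Fraction of capacity that workloads cannot use."""
    return (capacity - allocatable) / capacity


def node_overhead_fraction(node):
    """Blend CPU and memory overhead with a simple mean (an assumption)."""
    cpu = overhead_fraction(node.cpu_capacity, node.cpu_allocatable)
    mem = overhead_fraction(node.mem_capacity, node.mem_allocatable)
    return (cpu + mem) / 2


def cost_weighted_overhead(nodes):
    """Average the per-node overhead, weighting each node by its cost."""
    total_cost = sum(node.hourly_cost for node in nodes)
    if total_cost == 0:
        raise ValueError("total node cost must be greater than zero")
    weighted = sum(node.hourly_cost * node_overhead_fraction(node)
                   for node in nodes)
    return weighted / total_cost


# Hypothetical cluster: one small node and one much larger node.
NODES = [
    NodeSample("small", 0.10, 2, 1.93, 7.6, 5.7),
    NodeSample("large", 0.90, 32, 31.68, 128, 121.6),
]

if __name__ == "__main__":
    print(f"cost-weighted overhead: {cost_weighted_overhead(NODES):.2%}")
```

Weighting by cost means a high-overhead node matters in proportion to its share of the bill, so in this made-up cluster the large node dominates the summary.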

For example, when we started to survey the overhead of popular node types, we immediately found something odd. Starting with small nodes, we discovered a set of GCP E2 small, E2 micro and E2 medium nodes with 50% CPU overhead, as highlighted in the chart below.

[Image: chart highlighting the GCP E2 nodes with 50% CPU overhead]

These turned out to be special cases and are excluded from the other graphs in this article. They do highlight something interesting about this family of nodes when used with Kubernetes: these are burstable instance types, and the way GCP prices them versus the capacity they declare to Kubernetes makes them appear to have high overhead. In the interest of brevity, we won’t go into details here, but our talk on this subject on YouTube covers it in more depth.

Let’s instead focus on the metric most readers are likely interested in: What percentage of node cost is overhead, and how does that vary as node capacity increases? Most often, nodes that provide more capacity cost more. The overall trend is that low-cost, low-capacity nodes lose a higher percentage of their compute to overhead, and overhead cost percentages decrease as node sizes (and therefore costs) increase. This relationship between node size and node overhead is illustrated in the chart below.

[Image: chart of node overhead percentage versus node size]

Analyzing the relationship between node overheads on different cloud providers, AKS overhead costs are a clear step above those of GKE and EKS. Overall, Kubernetes overhead costs can reach over 20% on the smallest node types for each provider, are generally in the 5–10% range for “medium” size nodes and taper toward zero on the largest available node types.

The above chart uses a blended overhead percentage that incorporates both CPU and memory overhead. Breaking down that composite metric by looking at overhead by individual resource type, we plot the percentage of the node’s overall CPU lost to overhead as the core count increases:

[Image: chart of CPU overhead percentage versus core count]

Overall, CPU overhead stays in the single digits, starting at about 5% of CPU on the smallest nodes and tailing off toward 0.5% as node size increases. Because CPU overhead is this small, memory must be the primary component of node overhead:

[Image: chart of memory overhead percentage versus node size]

In the memory-focused chart, overhead reaches the tens of percent on the smallest nodes and follows the same decreasing trend as node size increases that we have seen in the other line graphs.

Our next line of analysis explored the relationship between node overhead and node family. As the charts below show, the trend follows what we have seen elsewhere, in that smaller members of each family generally have higher overheads. A key takeaway for us from these charts is that family type doesn’t significantly affect overhead. A high CPU node, for example, doesn’t have a meaningfully different amount of overhead from a standard node or a high memory node.

[Image: chart of node overhead by node family]

[Image: chart of node overhead by node family]

Our last line of investigation sought the overall lowest overhead node types in our survey. As the bar chart of the lowest overhead nodes below shows, EKS dominates the field. The 25 nodes with the lowest overhead all clock in at under 0.6% of node capacity lost to overhead. An inspection of the node types in that chart, of course, shows that they are some of the largest instances available in AWS, with many .metal and .48xlarge instance types making the list.

[Image: bar chart of the 25 node types with the lowest overhead]

Key Takeaways

This study reveals a clear trend: the larger the node, the lower the overhead. A significant number of node types have an overhead of 10% or greater, which becomes more impactful as organizations increase their cloud spend. Total overhead cost is largely driven by memory, with CPU overhead generally amounting to 6% or less. Comparing providers, AKS generally has the highest overhead, and EKS offers the lowest-overhead node types available. Specialized node types (high compute, high memory, high disk, etc.) don’t have significantly more or less overhead than other node types.

Optimizing Node Sizes with Kubernetes Overhead in Mind

Aligning node strategies with overhead realities means resolving a real tension. Larger nodes carry lower overhead costs, but organizations certainly shouldn’t use huge nodes if they’re going to sit idle. It’s a big mistake to go to extremes and select an 800% larger node for a 10% overhead savings, unless you can actually use that capacity.

What is valuable is to look at whether smaller nodes can be consolidated into fewer, larger nodes while still meeting availability requirements. Factoring Kubernetes overhead into node sizing decisions makes those strategies more accurate and effective.
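To illustrate the trade-off, the sketch below compares node types for a fixed memory demand. Every number in it is hypothetical: the node sizes, overhead fractions, prices and the 50 GiB demand are invented for illustration and are not taken from the survey. It also ignores CPU and availability requirements, both of which a real sizing exercise would need to include.

```python
import math
from dataclasses import dataclass


@dataclass
class NodeType:
    name: str
    memory_gib: float         # advertised capacity
    overhead_fraction: float  # share of capacity lost to node overhead
    hourly_cost: float        # USD per hour (hypothetical)

    @property
    def allocatable_gib(self):
        """Memory that workloads can actually be scheduled on."""
        return self.memory_gib * (1 - self.overhead_fraction)


def plan(node_type, demand_gib):
    """Return (node_count, hourly_cost, idle_gib) needed to fit the demand."""
    count = math.ceil(demand_gib / node_type.allocatable_gib)
    total_cost = count * node_type.hourly_cost
    idle = count * node_type.allocatable_gib - demand_gib
    return count, total_cost, idle


def cheapest(candidates, demand_gib):
    """Pick the node type with the lowest total cost for the demand."""
    return min(candidates, key=lambda node_type: plan(node_type, demand_gib)[1])


# Hypothetical node types priced linearly per GiB of capacity.
CANDIDATES = [
    NodeType("small", 8, 0.25, 0.10),
    NodeType("medium", 32, 0.10, 0.40),
    NodeType("huge", 256, 0.02, 3.20),
]

if __name__ == "__main__":
    for node_type in CANDIDATES:
        count, cost, idle = plan(node_type, 50)
        print(f"{node_type.name}: {count} node(s), ${cost:.2f}/h, "
              f"{idle:.1f} GiB idle")
```

With these made-up inputs, two medium nodes cost less than nine small ones, while a single huge node has the lowest overhead but the highest bill because most of it sits idle.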

Final Thoughts

Because node costs are generally the largest drivers of Kubernetes spending, a few percent in spending lost to node overhead can greatly impact the bottom line. Every node carries real overhead costs for running Kubernetes, which can mean a node delivers 5–20% less usable capacity than organizations may think they’re getting. Optimizing to reduce that waste through greater visibility into Kubernetes overhead is therefore well worth exploring as organizations look to reduce their overall cloud spend.

Alex Meijer is a Software Engineer at Kubecost, where he builds Kubernetes cost monitoring tools. He has been working with Kubernetes for his entire career, being at various times a user, an operator, and currently as someone working to help...