
URL: https://thenewstack.io/platform9-elastic-machine-pool-for-eks-clusters-for-cost-optimization/

Platform9 Elastic Machine Pool for EKS Cost Optimization - The New Stack



Platform9 Elastic Machine Pool for EKS Cost Optimization

With EMP, enterprises could finally start realizing significant efficiency gains in virtualized data center operations.
May 27th, 2024 10:00am by Joe Thompson

Just about every organization with a significant cloud footprint has issues with wasted spending on enormous amounts of unused cloud resources, and an entire catalog of tools has sprung up to try to help. But what do you do when you choose and deploy a FinOps cost-control tool like an autoscaler in your Amazon Web Services EKS cluster and… it falls short of expectations?

Did you pick the wrong tool? Probably not — at Platform9, our conversations with customers have consistently shown that a wide variety of existing tools do help… but not enough. You need something more than just a tool — you need an integrated solution built to achieve your FinOps goals.

Recap: Kubernetes Right-Sizing Challenges

If you’ve been reading our latest blog posts on Kubernetes FinOps, you’ll know that there are a wide variety of tools intended to help you optimize workload resource usage — from core code built into Kubernetes itself, to components built to interface with external SaaS products, to add-ons intended to run within the cluster itself. But you also know that these bring additional complexity of their own, from infrastructure credential-handling to unexpected interactions with each other when using multiple tools.

Many of them do a good job of scaling the cluster up or down to meet application resource demands and ensure availability, but they aren’t actually intended to optimize those application workloads — and even when they are, they’re limited by the resource-management capabilities of Kubernetes itself.

The net result is that even with one or more of these tools in play, actual cluster resource utilization still typically hovers around 30%. This is significantly less than most enterprises would like, but what more can you do?

This is not a new problem — in fact, it’s a fairly old one: The main drivers of container inefficiency are the same ones that drove early virtualization inefficiency — resource and load management challenges, and the desire to avoid leaving applications starved for resources because of scaling delays during periods of peak demand.

Elastic Machine Pools for EKS Clusters: A Cost-Optimization Layer Built on Proven VM Technology

History Rewind – Solving Utilization Challenges in VM Environments

The issue of intractable underutilization in Kubernetes is an almost direct replay of the same struggle in virtualization environments 15 to 20 years ago. Virtual machines (VMs) promised the ability to run services with hardware-like isolation while sharing resources more efficiently. Instead of either running multiple services in the same operating system or wasting hardware to fully isolate them from each other, you could, in theory, provision multiple small VMs per physical node, with each service running under its own operating system, from the kernel up, to maintain a security boundary between them.

In practice, virtualization in its earliest forms didn’t live up to this promise: If your application had periods of higher utilization, you had to allow it to use enough resources to handle those peaks. Sometimes deploying instances of the application on additional VMs behind a load-balancer was enough to deal with the extra demand, but if you couldn’t provision quickly enough to absorb the load as it increased, your application would still end up hitting a wall — so virtual machines still tended to be configured with a lot of resource overhead. It was also difficult to handle moving VMs between hypervisors to rebalance resource usage without disrupting the workloads running on them.

Before too long, hypervisors started gaining capabilities aimed at better resource management:

  • Overcommitment allowed allocating more memory to VMs running on a hypervisor node than the node itself had; if some VMs weren’t using all the memory they were allocated, others could use it.
  • Memory page merging allowed VMs running similar operating systems and applications to share a single copy of identical portions of memory, increasing the density with which VMs could be placed on nodes.
  • Live migration allowed VMs to be moved to newly provisioned nodes seamlessly when the cluster needed to expand to handle demand, or to consolidate workloads when nodes were underutilized so some could be powered off until needed.
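To make the page-merging idea concrete, here is a minimal Python sketch. It is a toy model, not how any particular hypervisor implements the feature: the 4 KiB page size, the function name `merged_page_count` and the two example VMs are all assumptions made for illustration. It hashes each page of each VM's memory and counts how many distinct pages would need to be stored if identical ones were shared.

```python
import hashlib

PAGE_SIZE = 4096  # bytes; an assumed page size, chosen only for illustration


def merged_page_count(vm_memories: list[bytes]) -> tuple[int, int]:
    """Return (total_pages, unique_pages) across all VMs.

    Pages with identical content are counted once in unique_pages,
    which is the basic idea behind memory page merging.
    """
    total = 0
    unique = set()
    for memory in vm_memories:
        for offset in range(0, len(memory), PAGE_SIZE):
            page = memory[offset:offset + PAGE_SIZE]
            total += 1
            unique.add(hashlib.sha256(page).digest())
    return total, len(unique)


# Two hypothetical VMs that share three identical "operating system" pages
# and each hold one page of their own application data.
os_pages = b"".join(bytes([i]) * PAGE_SIZE for i in range(3))
vm_a = os_pages + b"\xaa" * PAGE_SIZE
vm_b = os_pages + b"\xbb" * PAGE_SIZE

total, unique = merged_page_count([vm_a, vm_b])
print(total, unique)  # prints: 8 5
```

In this toy case, eight allocated pages need only five pages of physical memory. The more similar the VMs on a node are, the larger that gap becomes, and the more densely they can be packed.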

One of the touted benefits of the trend toward containerization over the last decade was improved efficiency over traditional virtualization, using new Linux capabilities like namespaces and control groups; in theory, without the need to run a full operating system kernel and libraries to isolate applications, application processes could share the same hardware at higher density safely. In practice… it hasn’t worked out so nicely. Kubernetes has some mechanisms to help, but within the platform itself, nothing comprehensively solves these issues.

How Elastic Machine Pools Use Proven Virtualization Technology To Solve Kubernetes FinOps Challenges

The solution for Kubernetes resource-management issues is the same today as it was back then for virtual machines: use overcommitment, page merging and live migration to make the necessary consolidation work seamlessly. But Kubernetes itself has no way to do this, and in a cloud environment like AWS, you don’t normally have access to the hypervisor running your instances. Elastic Machine Pool (EMP) bridges the gap by leveraging AWS Bare Metal, which gives it the capability to set up a virtualization layer under EMP’s direct control (in fact, allowing customers to run their own virtualization environments like this was exactly why AWS built the Bare Metal capability in the first place).

With the virtualization layer established, EMP sets up its own virtual machines, called Elastic VMs (EVMs), and joins them to the EKS cluster as new nodes — allowing EMP to use the same production-proven virtualization mechanisms discussed above to automatically optimize Kubernetes utilization without sacrificing availability:

  • EVMs with significant amounts of resources allocated but unused by their workloads are consolidated more densely on EMP-managed Bare Metal to improve utilization — without altering the configuration of individual Kubernetes workloads at all.
  • When more workloads are deployed, or existing ones start to use more of their resource allocations due to additional demand on applications, more Bare Metal instances are provisioned and EVMs are live-migrated to them to rebalance the load — without disrupting the pods running on them (especially beneficial for monolithic disruption-sensitive applications, such as many business apps written in Java). Likewise, if overall cluster utilization decreases again, underutilized EVMs are live-migrated onto a smaller number of Bare Metal instances and the excess compute is deprovisioned without disruption.
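The consolidation described above can be pictured as a bin-packing problem. The following Python sketch is only an illustration under stated assumptions: it uses a simple first-fit-decreasing heuristic, which is not claimed to be EMP's actual placement algorithm, and the `Host` class, the `consolidate` function, the EVM names and all the memory figures are hypothetical. It compares how many hosts are needed when VMs are placed by what they were allocated versus what they actually use.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    capacity_gib: int
    vms: list[str] = field(default_factory=list)
    used_gib: int = 0


def consolidate(vm_sizes_gib: dict[str, int], host_capacity_gib: int) -> list[Host]:
    """Pack VMs onto hosts with a first-fit-decreasing heuristic."""
    hosts: list[Host] = []
    # Place the largest VMs first; each goes on the first host with room.
    for name, size in sorted(vm_sizes_gib.items(), key=lambda kv: kv[1], reverse=True):
        if size > host_capacity_gib:
            raise ValueError(f"{name} does not fit on any host")
        for host in hosts:
            if host.used_gib + size <= host.capacity_gib:
                break
        else:
            # No existing host has room, so "provision" another one.
            host = Host(capacity_gib=host_capacity_gib)
            hosts.append(host)
        host.vms.append(name)
        host.used_gib += size
    return hosts


# Six hypothetical EVMs, each allocated 32 GiB but using less than that.
allocated = {f"evm-{i}": 32 for i in range(1, 7)}
in_use = {"evm-1": 30, "evm-2": 24, "evm-3": 20, "evm-4": 16, "evm-5": 10, "evm-6": 8}

by_allocation = consolidate(allocated, host_capacity_gib=64)
by_usage = consolidate(in_use, host_capacity_gib=64)
print(len(by_allocation), len(by_usage))  # prints: 3 2
```

Packing by actual usage frees one of the three hosts in this example. A real system would also keep headroom for bursts and rely on live migration to rebalance when usage changes, neither of which this sketch models.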

All of this automated optimization takes place at a level below EKS and cluster-based autoscalers — you don’t need to change how you define and run your workloads to benefit, and you’re still using the standard EKS cluster control plane and Kubernetes API. Plus, if you’re already using autoscalers or workload right-sizing tools in your EKS clusters, you can continue to do so — EMP can run alongside them, will not interfere with their actions and will still provide additional optimization via its EVMs.

The capability to manage utilization of the cluster as a whole in this way has been missing from Kubernetes, and that gap is the main reason cluster operators have had such a difficult time achieving higher cluster utilization: tools built on top of the Kubernetes API simply can’t achieve these kinds of results without a negative impact on the workloads in the cluster.

With EMP finally filling this gap, cluster operators no longer have to walk a difficult line between saving money and protecting the availability of applications. As a result, utilization of up to 70% in Kubernetes is now achievable without risking application availability, and the cost savings of not paying for wasted compute resources at EC2 pricing are significant. At general availability, we plan to enable Platform9’s Always-On Assurance: our proactive monitoring and management of your environment to detect and correct issues, usually before you even notice a problem developing.
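As a rough back-of-the-envelope check on what moving from about 30% to up to 70% utilization could mean, the Python sketch below computes how much capacity the same workload would need at the higher figure. The function name `capacity_needed` and the 100-unit starting capacity are assumptions for illustration, and the result is a capacity ratio only: it says nothing about actual prices, instance types or the cost of any tooling.

```python
def capacity_needed(current_capacity: float, current_util: float, target_util: float) -> float:
    """Capacity required to carry the same load at a different utilization."""
    if not (0 < current_util <= 1 and 0 < target_util <= 1):
        raise ValueError("utilization must be in the range (0, 1]")
    load = current_capacity * current_util  # the work actually being done
    return load / target_util


current = 100.0  # arbitrary units of provisioned compute
needed = capacity_needed(current, current_util=0.30, target_util=0.70)
reduction = 1 - needed / current
print(f"{needed:.1f} units of capacity, {reduction:.0%} less than before")
# prints: 42.9 units of capacity, 57% less than before
```

Under these assumptions, the same load fits in a little under half the provisioned capacity. Real savings depend on how closely the 70% ceiling is reached and on how the underlying instances are priced.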

Get Started Optimizing With EMP!

Platform9 Elastic Machine Pool is now available as an early-access offering for Amazon EKS (see our AWS Marketplace listing for details). Get in touch with us for more information and we’ll get you up and running quickly. Within a few weeks, customers typically see significant savings in their EKS clusters compared with using autoscalers alone!

Additional Reading

From the Platform9 blog: Kubernetes FinOps: Right-Sizing Kubernetes Workloads


Joe Thompson's career in IT started when the Linux kernel was still in 1.x versions and home Internet speeds were expressed in kilobits per second. Since 2014, he's worked primarily with cloud native systems, starting with OpenStack and public clouds,...
AWS is a sponsor of The New Stack.
TNS owner Insight Partners is an investor in: Rewind.