
URL: https://thenewstack.io/saving-with-confidence-the-strategic-advantage-of-spot-instances/

Saving with Confidence: The Strategic Advantage of Spot Instances - The New Stack


Saving with Confidence: The Strategic Advantage of Spot Instances

The dynamic nature of spot instance pricing, availability and stability requires a proactive approach, where adjustments to workloads are made in real time.
Mar 18th, 2024 6:57am by Leon Kuperman
CNCF sponsored this post.

As cloud services have gone mainstream, organizations are continually looking for innovative strategies to optimize their spending without compromising performance or uptime.

Amid the rapid growth of hyperscalers like AWS, Microsoft Azure and Google Cloud Platform (GCP), which have all seen double-digit expansion, a significant opportunity for savings lies in an often-misunderstood resource: spot instances.

Despite the potential to slash compute costs by 75% to 90%, many customers remain hesitant, primarily because of the instances’ perceived instability. At CAST AI, we’ve seen this hesitation firsthand. One of our customers, for example, has run most of its apps on spot instances for the past year with zero downtime, even through the busiest part of the holiday season, when spot instance inventory becomes scarce. Even so, senior leaders keep warning about the risk of relying on spot instances. We’ve also seen what works at scale when spot instances are part of a cost-reduction strategy.

The Hesitation: Perceived Instability

The main deterrent to using spot instances is the fear of instability. Cloud providers can reclaim these instances with minimal notice: 2 minutes on AWS and just 30 seconds on GCP and Azure. This unpredictability poses a hard dilemma for businesses that rely on stable, uninterrupted computing resources: Do I save 75% to 90% on compute costs and risk downtime, or do I pay more and worry less about downtime?
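To put the trade-off in numbers, here is a small worked sketch of the savings side at the 75% and 90% discounts cited above. The $0.40/hour on-demand price and the 20-node fleet are hypothetical placeholders, not real list prices, and the model ignores the cost of downtime entirely.

```python
# Illustrative cost comparison; prices and fleet size are hypothetical.
HOURS_PER_MONTH = 730  # common approximation: 24 * 365 / 12


def monthly_cost(hourly_price_usd: float, nodes: int, hours: float = HOURS_PER_MONTH) -> float:
    """Cost of running `nodes` instances for `hours` at a flat hourly price."""
    return hourly_price_usd * nodes * hours


def spot_price(on_demand_usd: float, discount: float) -> float:
    """Spot price implied by a fractional discount off on-demand (0.75 means 75% off)."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return on_demand_usd * (1.0 - discount)


def savings(on_demand_usd: float, discount: float, nodes: int) -> float:
    """Monthly savings from moving `nodes` on-demand instances to spot."""
    return monthly_cost(on_demand_usd, nodes) - monthly_cost(spot_price(on_demand_usd, discount), nodes)
```

With these placeholder inputs, `monthly_cost(0.40, 20)` is $5,840; at 75% off the spot bill drops to $1,460, and at 90% off to about $584, a monthly saving of roughly $5,256. The other side of the dilemma, the cost of an interruption, has to be estimated per workload.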

In Kubernetes environments, spot instances pose several distinct technical challenges, which stem from the inherent unpredictability of spot instance availability and the complexity of workload management. Let’s go over a few examples.

Graceful shutdown and migration of workloads: Upon receiving a termination notice for a spot instance, the Kubernetes cluster needs to perform several operations in a very short window. These include gracefully shutting down running applications, committing any final state to storage and rerouting traffic to ensure availability. They are nontrivial, especially for stateful applications or those with complex shutdown procedures that might require more time than the notice period allows.
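A minimal sketch of the first decision in that window, assuming an AWS node where the EC2 instance metadata service has published a spot `instance-action` notice (a JSON document with `action` and `time` fields): compute how many seconds remain, then flag workloads whose shutdown would not finish in time. Fetching the metadata itself (endpoint, token handling) is left out, and `plan_shutdown`, its 10-second margin and the workload names are hypothetical.

```python
import json
from datetime import datetime, timezone

# Reclaim notice windows cited in the article, in seconds.
NOTICE_SECONDS = {"aws": 120, "gcp": 30, "azure": 30}


def seconds_until_reclaim(instance_action_json: str, now: datetime) -> float:
    """Seconds left before reclaim, parsed from an EC2-style instance-action payload.

    Assumed payload shape: {"action": "terminate", "time": "2024-03-18T06:57:00Z"}.
    """
    payload = json.loads(instance_action_json)
    reclaim_at = datetime.strptime(payload["time"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return max(0.0, (reclaim_at - now).total_seconds())


def plan_shutdown(
    grace_periods_s: dict[str, int], seconds_left: float, margin_s: float = 10.0
) -> tuple[list[str], list[str]]:
    """Split workloads into those whose shutdown fits before reclaim and those at risk.

    `grace_periods_s` maps a workload name to the time its shutdown needs
    (for a pod, roughly its terminationGracePeriodSeconds). `margin_s` is
    hypothetical headroom for committing state and rerouting traffic.
    """
    budget = seconds_left - margin_s
    fits = sorted(name for name, g in grace_periods_s.items() if g <= budget)
    at_risk = sorted(name for name, g in grace_periods_s.items() if g > budget)
    return fits, at_risk
```

For example, a notice time of 06:57:00Z read at 06:55:30Z leaves 90 seconds; with the default margin, a pod needing 60 seconds fits, while a database needing 300 seconds is flagged, which is exactly the stateful case described above. On GCP or Azure, with only 30 seconds of notice, the same database has no chance at all.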

Rescheduling and capacity planning: Kubernetes must quickly reschedule the workloads from the terminated spot instance to another compute resource. This requires real-time capacity planning to identify available resources that can accommodate the evicted workloads without causing resource contention or performance degradation. In a cloud environment, where spot instance availability can fluctuate dramatically, ensuring a smooth transition can be challenging.
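Capacity planning for evicted workloads is, at its core, a bin-packing problem. The sketch below uses first-fit decreasing, a simple heuristic swapped in for illustration (the real Kubernetes scheduler filters and scores nodes through plugins instead), to place evicted pods onto nodes with spare CPU and memory without overcommitting any node. Pods that fit nowhere are reported as pending, which is the signal an autoscaler would act on. The `Node` and `Pod` types and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpu_m: int    # unallocated CPU in millicores
    free_mem_mib: int  # unallocated memory in MiB


@dataclass(frozen=True)
class Pod:
    name: str
    cpu_m: int         # CPU request in millicores
    mem_mib: int       # memory request in MiB


def reschedule(pods: list[Pod], nodes: list[Node]) -> tuple[dict[str, str], list[str]]:
    """First-fit decreasing placement of evicted pods; never overcommits a node.

    Mutates `nodes` to reflect the capacity consumed. Returns (placements, pending),
    where pending pods fit nowhere and would need new capacity.
    """
    placements: dict[str, str] = {}
    pending: list[str] = []
    # Place the largest requests first; they are the hardest to fit.
    for pod in sorted(pods, key=lambda p: (p.cpu_m, p.mem_mib), reverse=True):
        for node in nodes:
            if node.free_cpu_m >= pod.cpu_m and node.free_mem_mib >= pod.mem_mib:
                node.free_cpu_m -= pod.cpu_m
                node.free_mem_mib -= pod.mem_mib
                placements[pod.name] = node.name
                break
        else:
            pending.append(pod.name)  # no node has room: provision more capacity
    return placements, pending
```

With two surviving nodes of 1000m/2048 MiB and 500m/1024 MiB free, evicted pods requesting 800m, 600m and 300m CPU leave the 600m pod pending: a new node is needed even though total free CPU (1500m) exceeds total demand (1700m minus what fits), because capacity is fragmented across nodes.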

Automated, intelligent decision-making: To manage these transitions effectively, Kubernetes clusters need to employ sophisticated automation and decision-making algorithms. This involves not just reacting to spot instance terminations but proactively managing the mix of instance types and purchasing options (spot, on-demand, reserved) based on cost, availability and workload requirements. Developing and tuning these algorithms to balance cost savings with reliability and performance objectives requires deep expertise and continuous adjustment.
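One way to make the purchasing-option decision automatic is to compare offers on a risk-adjusted hourly cost. The model below is hypothetical, not any vendor’s algorithm: it adds the expected cost of interruptions (an assumed per-hour interruption probability times an assumed per-interruption disruption cost) to the hourly price, excludes spot for workloads that cannot tolerate interruptions, and picks the cheapest available offer. Prices, probabilities and instance type names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    instance_type: str
    option: str               # "spot", "on-demand" or "reserved"
    hourly_usd: float         # illustrative price, not a real quote
    interruption_prob: float  # assumed chance of reclaim per hour (0 for non-spot)
    available: bool = True    # False when the provider has no capacity


def risk_adjusted_cost(offer: Offer, disruption_cost_usd: float) -> float:
    """Hourly price plus the expected hourly cost of interruptions."""
    return offer.hourly_usd + offer.interruption_prob * disruption_cost_usd


def choose_offer(offers: list[Offer], spot_tolerant: bool, disruption_cost_usd: float) -> Offer:
    """Cheapest available offer by risk-adjusted cost; spot only if the workload tolerates it."""
    candidates = [o for o in offers if o.available and (spot_tolerant or o.option != "spot")]
    if not candidates:
        raise LookupError("no available offer matches this workload")
    return min(candidates, key=lambda o: risk_adjusted_cost(o, disruption_cost_usd))
```

With a $1 disruption cost, a $0.08/hour spot offer with a 5% hourly interruption probability scores $0.13 and beats $0.40 on-demand; raise the disruption cost to $10 and it scores $0.58, so on-demand wins. The disruption cost is the knob that encodes how much a given workload can tolerate being moved.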

Network and dependency management: Workloads running on spot instances might be part of a larger, interdependent microservices architecture. When an instance is terminated, it’s not just about moving the affected workload; it’s also about ensuring that network configurations, service discovery mechanisms and dependency relationships are updated in real time to reflect the new deployment topology. Kubernetes and adjacent cloud native technologies such as service mesh take care of many of these concerns. However, tight time constraints add to the complexity.
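In Kubernetes, EndpointSlices and service discovery handle most of this bookkeeping. The toy registry below is a hypothetical model, not the Kubernetes API; it shows the core update that has to happen inside the notice window: when a spot node is reclaimed, drop every endpoint on it, then report which services were left with no endpoints and which callers depend on them, since those call paths fail until replacements come up.

```python
from collections import defaultdict


class EndpointRegistry:
    """Toy service-discovery table (hypothetical; not the Kubernetes API)."""

    def __init__(self) -> None:
        # service name -> set of (pod, node) endpoints
        self._endpoints: dict[str, set[tuple[str, str]]] = defaultdict(set)
        # caller service -> services it calls directly
        self._depends_on: dict[str, set[str]] = defaultdict(set)

    def add_endpoint(self, service: str, pod: str, node: str) -> None:
        self._endpoints[service].add((pod, node))

    def add_dependency(self, caller: str, callee: str) -> None:
        self._depends_on[caller].add(callee)

    def remove_node(self, node: str) -> set[str]:
        """Drop every endpoint on `node`; return services left with no endpoints."""
        emptied: set[str] = set()
        for service, endpoints in self._endpoints.items():
            if not endpoints:
                continue
            endpoints.difference_update({ep for ep in endpoints if ep[1] == node})
            if not endpoints:
                emptied.add(service)
        return emptied

    def impacted_callers(self, services: set[str]) -> set[str]:
        """Services that call any of `services` directly."""
        return {caller for caller, callees in self._depends_on.items() if callees & services}
```

If `checkout` runs on a spot node and an on-demand node but `inventory` runs only on the spot node, reclaiming that node empties `inventory` and breaks `checkout`’s calls to it, while `checkout` itself stays reachable. Spreading replicas across capacity types avoids exactly this single point of failure.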

Given all this, it’s understandable why many companies hesitate to embrace spot capacity. Savings plans, reserved instances and cloud providers’ other commitment-based discount programs seem much more straightforward to plan for and utilize. Yet by taking this route, customers pass up the most substantial savings the cloud has to offer, along with the full flexibility of capacity that requires no long-term commitment.

The Reality: Measurable and Manageable Risks

What if perceived instability could be quantified and, therefore, effectively managed through automation? This is the premise behind our latest innovation: a global heat map that provides clear insights into spot instance availability and reliability across regions and availability zones. By tracking metrics such as spot interruption rate and insufficient capacity errors (ICE), the upcoming spot instance heat map will offer a tangible way to assess the risk of using spot instances in specific locations.
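The article does not describe how the heat map computes risk, so the scoring below is an assumption for illustration only: it blends the two metrics named above, interruption rate and ICE rate, into one weighted score per zone and buckets zones into low, medium and high bands. The weights, thresholds, zone names and counts are made up. Zones with no observations score as maximum risk, a conservative default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZoneStats:
    zone: str
    spot_launches: int    # spot instances running during the observation window
    interruptions: int    # how many of them the provider reclaimed
    launch_requests: int  # spot launch requests made
    ice_errors: int       # requests rejected with insufficient capacity errors


def risk_score(s: ZoneStats, ice_weight: float = 0.5) -> float:
    """Weighted blend of interruption rate and ICE rate; no data counts as worst case."""
    interruption_rate = s.interruptions / s.spot_launches if s.spot_launches else 1.0
    ice_rate = s.ice_errors / s.launch_requests if s.launch_requests else 1.0
    return (1.0 - ice_weight) * interruption_rate + ice_weight * ice_rate


def bucket(score: float, low: float = 0.05, high: float = 0.15) -> str:
    """Map a score to a heat-map band (thresholds are assumptions)."""
    if score < low:
        return "low"
    if score < high:
        return "medium"
    return "high"


def heat_map(stats: list[ZoneStats]) -> list[tuple[str, float, str]]:
    """Rows of (zone, score, band), safest zone first."""
    rows = []
    for s in stats:
        score = risk_score(s)
        rows.append((s.zone, round(score, 3), bucket(score)))
    return sorted(rows, key=lambda row: row[1])
```

Sorting zones from safest to riskiest turns the map into an ordered preference list that automation can consume directly.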


Embracing Automation

The key to unlocking the full potential of spot instances lies in automation. The dynamic nature of spot instance pricing, availability and stability requires a proactive approach, where adjustments to workloads are made in real time based on current market conditions. This includes not just choosing the most cost-effective instances, but also preparing for and responding to interruptions without manual intervention. Automation can ensure that workloads are seamlessly transferred to new instances, eliminating downtime and maintaining performance.
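A minimal sketch of that automated response, assuming a hypothetical `try_launch(pool)` callback that returns `False` when a pool reports insufficient capacity: for each interrupted node, try spot pools from lowest to highest risk (for example, in heat-map order) and fall back to on-demand only when every spot pool refuses. Real autoscalers also handle draining, retries and rate limits, which are omitted here.

```python
from typing import Callable


def replace_interrupted(
    interrupted: int,
    spot_pools_by_risk: list[str],
    try_launch: Callable[[str], bool],
    fallback_pool: str = "on-demand",
) -> dict[str, int]:
    """Launch one replacement per interrupted node and count launches per pool.

    Spot pools are tried in the given order (lowest risk first); the on-demand
    fallback is tried only after every spot pool has refused. Nodes that could
    not be replaced anywhere are counted under "failed".
    """
    launched: dict[str, int] = {}
    for _ in range(interrupted):
        for pool in [*spot_pools_by_risk, fallback_pool]:
            if try_launch(pool):
                launched[pool] = launched.get(pool, 0) + 1
                break
        else:
            launched["failed"] = launched.get("failed", 0) + 1
    return launched
```

If three nodes are reclaimed and the two preferred spot pools each have room for one more instance, the result is one launch in each spot pool and a single on-demand node. Falling back only at the margin keeps most of the savings while avoiding downtime.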

The Strategic Advantage

We hope our heat map gives organizations insight into spot instance risk across all cloud regions and availability zones. Observability and risk assessments are not enough, however. With automated management tools, businesses can confidently incorporate spot instances into their cloud infrastructure. This not only leads to substantial cost savings but also empowers organizations to make data-driven decisions about their cloud resources. The fear of instability becomes a manageable risk, overshadowed by the benefits of optimized spending and greater efficiency.

To learn more about Kubernetes and the cloud native ecosystem, join us at KubeCon + CloudNativeCon Europe in Paris, from March 19-22, 2024.

Leon Kuperman is co-founder and CTO at CAST AI. Formerly vice president of Security Products OCI at Oracle, Leon has 20+ years of experience spanning companies such as IBM, Truition and HostedPCI. He founded and served as the CTO of...