
URL: https://thenewstack.io/reducing-cloud-spend-need-not-be-a-paradox/

Reducing Cloud Spend Need Not Be a Paradox - The New Stack


2021-08-17 07:00:09
Reducing Cloud Spend Need Not Be a Paradox


By focusing further up the stack and matching services to the right infrastructure choices, incentivizing optimizing behaviors, automating thoughtfully (not reflexively), and having a repatriation strategy before you reach scale, you can be better positioned to rein in costs and retain value for your shareholders.
Aug 17th, 2021 7:00am by Nati Shalom
Feature image via Pixabay.

Few people understand cloud transformation as well as Martin Casado. Recently, he and Sarah Wang published a chatter-provoking analysis titled “The Cost of Cloud, a Trillion Dollar Paradox” on the blog of venture capital firm Andreessen Horowitz (a16z). The post challenges conventional thinking about the mania surrounding cloud transformation and cloud migration, and it asks us to consider the impact of cloud infrastructure cost (and, in turn, the impact that cost has on company valuation).

A recent blog from @a16z made the case that for SaaS companies, the cost of cloud is a drag on their market caps. What @martin_casado told me about the post and the debate that ensued: https://t.co/1mqSLUMSYT

— Belle Lin (@bellelin_) June 23, 2021

Nati Shalom
Nati Shalom is the CTO and Founder of Cloudify. He is a serial entrepreneur and widely published thought leader and speaker in open source, multicloud orchestration, network virtualization, DevOps, and edge computing. Nati has received multiple recognitions, including from Y Combinator, and is one of the leaders of the cloud native and DevOps Israel groups.

The paradox states that cloud is cheaper earlier in a company’s evolution, and it becomes more costly as the company scales. They state it simply: “You’re crazy if you don’t start in the cloud, then you’re crazy if you stay on it.” The paradox is that cloud infrastructure makes your business model possible at smaller scales, but it transforms into a source of value destruction at scale, apparent only after you’re deeply committed to the cloud. In aggregate, this equates to hundreds of billions of dollars of equity value evaporation.

Casado and Wang emphasize repatriation (bringing workloads back from cloud-only models to private or hybrid infrastructure) as the main strategy for optimizing infrastructure cost. They tell the story of a billion-dollar private software company whose public cloud spend consumes 81% of its cost of revenue (COR). Among the 50 largest publicly traded software companies that disclose their cloud spend, aggregate cloud bills top $8 billion.

It’s curious that cloud repatriation isn’t more popular. Repatriation can drive a big reduction in cloud spend, and an oft-cited figure is 50% savings. Taking that figure at face value to make the point, repatriation would, in the example cited by Casado and Wang, recover $4 billion in profit. Consider the broad universe of at-scale software companies running on public cloud infrastructure, and you can quickly see that this $4 billion of unrealized profit could be far higher.

The a16z post offers a set of useful recommendations for overcoming the trillion-dollar paradox, including making cloud spend a KPI, incentivizing engineers to optimize resource consumption, starting with a subset of your most resource-intensive workloads, and planning for repatriation up front, before inertia and lock-in strip away your options.

My Take

Infrastructure cost does not always grow in proportion to revenue; when it grows faster, profitability shrinks as a company scales. As a result, the growing cost of cloud infrastructure equates to hundreds of billions of dollars of lost equity value among software companies that reach scale.

Let’s dig into this a bit: according to Casado and Wang’s analysis, cloud spend can have a 25x impact on market cap. Applying this multiple, an additional $4 billion of gross profit can be estimated to yield an additional $100 billion of market capitalization among these 50 companies alone.
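The arithmetic behind these figures is simple enough to write down. The sketch below chains the article’s numbers: $8 billion of aggregate cloud spend, the oft-cited 50% repatriation savings, and the roughly 25x market-cap multiple. Both the savings rate and the multiple are illustrative assumptions taken from the text, not measured values, and the function names are hypothetical.

```python
# Back-of-the-envelope model of the repatriation math quoted above.
# The 50% savings rate and the 25x market-cap multiple are the article's
# illustrative assumptions, not measured values.


def recovered_gross_profit(cloud_spend: float, savings_rate: float) -> float:
    """Gross profit recovered if `savings_rate` of cloud spend is eliminated."""
    return cloud_spend * savings_rate


def market_cap_impact(gross_profit_gain: float, multiple: float) -> float:
    """Market cap added if each dollar of gross profit is valued at `multiple`."""
    return gross_profit_gain * multiple


TOP50_CLOUD_SPEND = 8e9  # aggregate disclosed cloud bills of the top 50, in USD

profit = recovered_gross_profit(TOP50_CLOUD_SPEND, savings_rate=0.5)
cap = market_cap_impact(profit, multiple=25)

print(f"recovered gross profit: ${profit / 1e9:.0f}B")  # $4B
print(f"implied market cap gain: ${cap / 1e9:.0f}B")    # $100B
```

Keeping the savings rate and multiple as explicit parameters makes it easy to see how sensitive the headline number is to each assumption.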

Monitoring service provider Datadog, a publicly traded company, recently traded at close to 40 times its estimated 2021 gross profit, and its S-1 disclosed an aggregate three-year commitment of $225 million to Amazon Web Services.

Let’s annualize the committed spend to $75 million in annual AWS costs, and let’s further assume that 50%, or $37.5 million, of this could be recovered via cloud repatriation. At roughly 40 times gross profit, this translates to approximately $1.5 billion of additional market cap for the company from committed spend reductions alone! If we expand to the broader universe of enterprise software and consumer internet companies, this number is likely more than $500 billion, assuming 50% of overall cloud spend is consumed by at-scale technology companies that stand to benefit from cloud repatriation.
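The Datadog estimate follows the same pattern, with one extra step to annualize the multiyear commitment. This sketch reuses the figures from the paragraph above: the $225 million, three-year AWS commitment, the assumed 50% savings, and the approximately 40x gross-profit multiple. The savings rate is the article’s assumption, not a Datadog figure.

```python
# Datadog example: $225M three-year AWS commitment disclosed in the S-1.
# The 50% savings rate is the article's assumption; 40x approximates the
# multiple of estimated 2021 gross profit at which the stock traded.

COMMITMENT_USD = 225e6
COMMITMENT_YEARS = 3
SAVINGS_RATE = 0.5
GROSS_PROFIT_MULTIPLE = 40

annual_spend = COMMITMENT_USD / COMMITMENT_YEARS               # $75M per year
annual_savings = annual_spend * SAVINGS_RATE                   # $37.5M per year
implied_market_cap = annual_savings * GROSS_PROFIT_MULTIPLE    # $1.5B

print(f"annual AWS spend:   ${annual_spend / 1e6:.1f}M")
print(f"annual savings:     ${annual_savings / 1e6:.1f}M")
print(f"implied market cap: ${implied_market_cap / 1e9:.1f}B")
```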

Consider this example of the impact efficiency might have on company valuation. MongoDB and Elastic reported nearly identical fiscal year 2021 revenue ($590 million and $608 million, respectively). Why, then, is MongoDB’s market cap about 1.7 times that of Elastic ($23.4 billion and $13.6 billion, respectively)? One clue might be the difference in infrastructure efficiency: MongoDB uses fine-grained multitenancy for its SaaS offering, whereas Elastic uses a separate cluster per tenant. The difference in resource consumption is dramatic.


In preparing this post, I chatted with Casado, who pointed out to me that the more likely explanation here is the power of a SaaS model. “The market values cloud revenue about three times more than open source on-premises, and the reason is largely about net revenue retention,” he observed. On-premises open source infrastructure tends to have a high churn rate (typically around 18%). In addition to Elastic and MongoDB, Confluent and others felt this effect as well. Elastic is one company that hasn’t been terribly successful in moving large portions of its offering to SaaS. MongoDB and Confluent accomplished this, and Databricks and Snowflake started in the cloud.

Casado also called out the example of Atlassian, which moved its service to a multitenant cloud model on AWS and, in so doing, cut its costs by a factor of three. However, this wasn’t because AWS was cheaper. It wasn’t. It was because the re-architected, multitenant model made the service far more lightweight.

Focus on Services, Not Infrastructure

The job to be done here is to optimize cloud spend, so we need to think about cloud optimization in pragmatic terms. Optimization is hard. To succeed, we need to stop framing the choice as feature velocity versus efficiency. Instead, we should treat efficiency as a first-class feature that is prioritized in our backlog and receives the same management attention as any other feature.

Throwing Automation at the Problem Can Make Things Worse

Using automation as an efficiency tool to reduce cost sounds straightforward. If so, why are so many cloud implementations still so horribly inefficient, often by orders of magnitude? Done right, automation can reduce cost significantly. Quite often, however, automation has a side effect: it makes it easy for developers to spin up cloud resources and leave them running even when they’re not needed. In “Is There an Enterprise Margin Crisis?”, Casado points out that:

  • Automation isn’t magic: Many companies try to improve margins by automating human processes. This can be technically challenging, and the drive for growth makes prioritization difficult.
  • Unoptimized cloud: Private markets push for growth, and so cloud implementations can be inefficient by orders of magnitude. Waiting for when growth slows to correct this — when margins are more important — is rarely trivial.

I’ve run into many companies that moved their monolithic application into Kubernetes, and during that phase they experienced increased efficiency. Fairly quickly, however, the cost of their cloud infrastructure started skyrocketing. Developers started spinning up instances not necessarily for the right reasons: they did so simply because it was significantly easier.
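One way to make “resources left running” visible is a periodic report of resources that look idle. The sketch below is a minimal illustration under stated assumptions: the `Resource` record, its fields, the 5% CPU and 14-day thresholds, and the sample inventory are all hypothetical. A real version would populate these fields from a provider’s billing export and monitoring metrics; no specific cloud API is assumed.

```python
from dataclasses import dataclass


# Hypothetical inventory record. In practice these fields would be assembled
# from a provider's billing export and monitoring metrics; no specific cloud
# API is assumed here.
@dataclass(frozen=True)
class Resource:
    name: str
    owner: str
    hourly_cost: float          # USD per hour
    avg_cpu_percent: float      # average CPU utilization over the lookback window
    days_since_last_deploy: int


HOURS_PER_MONTH = 730  # common approximation (24 * 365 / 12)


def find_idle(resources, cpu_threshold=5.0, stale_days=14):
    """Flag resources that look forgotten: low CPU use and no recent deploys.

    Both thresholds are illustrative defaults, not recommendations.
    """
    return [
        r for r in resources
        if r.avg_cpu_percent < cpu_threshold and r.days_since_last_deploy >= stale_days
    ]


def monthly_waste(idle_resources):
    """Approximate monthly spend on the flagged resources."""
    return sum(r.hourly_cost for r in idle_resources) * HOURS_PER_MONTH


inventory = [
    Resource("ci-runner-7", "team-a", 0.50, 1.2, 30),
    Resource("api-prod-1", "team-b", 0.80, 55.0, 1),
    Resource("demo-env", "team-c", 1.00, 0.5, 60),
]

idle = find_idle(inventory)
for r in idle:
    print(f"{r.name} (owner: {r.owner}) looks idle")
print(f"approximate monthly waste: ${monthly_waste(idle):,.2f}")
```

Attaching an owner to every flagged resource matters as much as the detection itself: it turns the report into something a team can act on, which ties back to incentivizing engineers to optimize resource consumption.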

Take Ownership of Your Workload

In automation, there tends to be too much emphasis on infrastructure automation and almost no focus on automating the service itself. In my experience, there is more room for optimization at the service layer than at the infrastructure layer.

Decouple the Workload from the Infrastructure Choice

To achieve optimization at the service layer we need to be able to decouple the service from the choice of infrastructure. In this way, we can allow for better flexibility on choosing the right infrastructure or cloud for the job, and we also leave enough room for future incremental optimization as we grow.
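To make the decoupling concrete, the sketch below has the workload declare what it needs while each infrastructure option declares what it offers, and a selector picks the cheapest option that satisfies the requirements. Everything here is hypothetical and illustrative: the zone names, the two capability flags, and the `relative_cost` index are not real pricing or a real product’s API. The point is that the workload definition never names a specific cloud, so the choice can change later without touching it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """What the service needs; deliberately says nothing about where it runs."""
    name: str
    needs_gpu: bool
    needs_high_availability: bool


@dataclass(frozen=True)
class InfraZone:
    """A hypothetical infrastructure option: a cloud, a region, or a private DC."""
    name: str
    has_gpu: bool
    high_availability: bool
    relative_cost: float  # illustrative cost index, not real pricing


def choose_zone(workload: Workload, zones: list[InfraZone]) -> InfraZone:
    """Return the cheapest zone whose capabilities cover the workload's needs."""
    candidates = [
        z for z in zones
        if (z.has_gpu or not workload.needs_gpu)
        and (z.high_availability or not workload.needs_high_availability)
    ]
    if not candidates:
        raise ValueError(f"no zone satisfies {workload.name}")
    return min(candidates, key=lambda z: z.relative_cost)


zones = [
    InfraZone("public-cloud-ha", has_gpu=False, high_availability=True, relative_cost=1.0),
    InfraZone("public-cloud-gpu", has_gpu=True, high_availability=True, relative_cost=3.0),
    InfraZone("private-dc", has_gpu=True, high_availability=False, relative_cost=1.8),
    InfraZone("spot-dev", has_gpu=False, high_availability=False, relative_cost=0.3),
]

for w in (
    Workload("checkout-api", needs_gpu=False, needs_high_availability=True),
    Workload("ml-training", needs_gpu=True, needs_high_availability=False),
    Workload("test-suite", needs_gpu=False, needs_high_availability=False),
):
    print(f"{w.name} -> {choose_zone(w, zones).name}")
```

Cost is the only ranking criterion here to keep the sketch short; a real selector would likely also weigh data locality, compliance, and egress.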

Kubernetes, Terraform, and Ansible Are Not Enough

Kubernetes, Terraform, and Ansible are great tools. They help abstract away and simplify infrastructure management. But they’re simply insufficient:

  • Managing infrastructure and the services atop that infrastructure are two different things. This is especially true when you consider day 2 operations such as continuous updates.
  • Managing distributed services across multiple Kubernetes clusters, data centers, or clouds is still fairly complex, and these tools offer limited help.
  • It’s easy to get lost when you have lots of templates and scripts to manage your infrastructure without having anything that maps all this back into your service.

Regaining control of our services: moving up the stack beyond IaC and Kubernetes

I argue that the biggest opportunity for overcoming many of these issues and regaining control over our own applications is moving up the stack: thinking about how we manage our services, not just the infrastructure that runs them. By decoupling the service from the infrastructure, we can create predefined zones of infrastructure that are highly optimized to serve each workload (test, production, ML, networking, etc.). These optimized zones don’t have to live outside the cloud, as there remains ample room for optimization within the same cloud, and obviously between clouds. In that context, moving off the cloud becomes just another special case of those optimized infrastructure zones. I refer to this as Environment-as-a-Service (EaaS).

To see how these ideas map to a real-world scenario, consider running the same workload on two different infrastructure stacks: one optimized for production and the other for development. The same idea can be applied to other areas.

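The production/development split described above can be sketched as one infrastructure-agnostic service definition combined with interchangeable environment profiles. This is a minimal illustration, not Cloudify’s or any tool’s actual model: the `Service` and `Environment` types, the field names, the generic `large`/`small` instance sizes, and the example registry URL are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """What to run: identical in every environment."""
    name: str
    image: str
    port: int


@dataclass(frozen=True)
class Environment:
    """Where and how to run it: a hypothetical optimized infrastructure profile."""
    name: str
    replicas: int
    instance_size: str   # generic size label, not a real instance type
    use_spot: bool       # cheaper, interruptible capacity
    database: str


def render(service: Service, env: Environment) -> dict:
    """Combine one service definition with one environment profile into a plan."""
    return {
        "service": service.name,
        "environment": env.name,
        "image": service.image,
        "port": service.port,
        "replicas": env.replicas,
        "instance_size": env.instance_size,
        "capacity": "spot" if env.use_spot else "on-demand",
        "database": env.database,
    }


storefront = Service("storefront", "registry.example.com/storefront:1.4", 8080)
production = Environment("production", replicas=3, instance_size="large",
                         use_spot=False, database="managed-ha")
development = Environment("development", replicas=1, instance_size="small",
                          use_spot=True, database="in-cluster")

for env in (production, development):
    print(render(storefront, env))
```

Because cost-related choices live only in the environment profiles, optimizing (or repatriating) one environment does not require touching the service definition.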

Frustrated? You are not alone. But there’s hope.

The trillion-dollar paradox need not be a value-destroying trap for successful software companies. By focusing further up the stack and matching services to the right infrastructure choices, incentivizing optimizing behaviors, automating thoughtfully (not reflexively), and having a repatriation strategy before you reach scale, you can be better positioned to rein in costs and retain value for your shareholders.

Cloud migration, automation, and cost optimization are ongoing processes that require continuous iteration, overcoming and learning from failure, and, above all, teamwork. Many tools can help you achieve this goal, but at the end of the day, without the right discipline and partners, they can turn against you. As historian Yuval Noah Harari remarked, “A knife can be used to cut vegetables and make great food but it can also be used to kill people: it all depends on how you use it.”

As a start, solving the paradox requires us to reset our expectations. As noted above, options exist today that can simplify your journey. We must start thinking further up the stack and further up the value chain, focusing on the service itself rather than the infrastructure, and matching the right infrastructure to the service, not the other way around.

Amazon Web Services and MongoDB are sponsors of The New Stack.
TNS owner Insight Partners is an investor in: Databricks.