URL: https://thenewstack.io/the-2020s-will-be-about-scale-out-data/

The 2020s Will Be Defined by Scale-Out Data - The New Stack


If the 2000s was when networking evolved, and the 2010s was all about compute, then the 2020s will see a revolution in scale-out data.
Apr 22nd, 2020 5:00pm by Richard MacManus

If the 2000s was when networking evolved on the internet (I called this the read/write era, others named it ‘Web 2.0’), and the 2010s was all about the compute layer, then the 2020s will see a revolution in the data layer.

That’s according to DataStax Chief Strategy Officer Sam Ramji, who outlined his vision at last week’s The New Stack Virtual Pancake Breakfast webinar.

Specifically, Ramji was talking about how these trends “scaled out” on the internet. As The New Stack readers will know, scale-out architecture refers to adding more power to an application by adding more machines — rather than the “scale-up” approach, which relies on upgrading a machine by adding a faster CPU or more memory. Scale-out is, of course, a cornerstone of the cloud native world we now live in.
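
To make the scale-out idea concrete, here is a minimal sketch of how a workload can be spread across machines by hashing each key to a node. The node names, the key format and the modulo placement rule are illustrative assumptions, not anything described by Ramji; production systems such as Cassandra use more refined schemes (consistent hashing) so that adding a node moves only a fraction of the keys.

```python
import hashlib


def owner(key: str, nodes: list[str]) -> str:
    """Pick the node responsible for a key using simple hash-modulo placement."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def distribution(keys: list[str], nodes: list[str]) -> dict[str, int]:
    """Count how many keys land on each node."""
    counts = {node: 0 for node in nodes}
    for key in keys:
        counts[owner(key, nodes)] += 1
    return counts


# Hypothetical workload: 10,000 keys, first on two machines, then on four.
keys = [f"user-{i}" for i in range(10_000)]
small = distribution(keys, ["node-1", "node-2"])
large = distribution(keys, ["node-1", "node-2", "node-3", "node-4"])

print("2 nodes:", small)
print("4 nodes:", large)
```

Doubling the number of machines roughly halves the share of keys each one has to serve, which is the effect scale-out relies on; scale-up would instead keep one machine and give it more CPU or memory.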


Before we dig into Ramji’s predictions, it’s worth quickly reviewing how we got here.

The 2010s saw the maturation of several major internet platforms: social, mobile and cloud. From an infrastructure perspective, cloud was by far the most important. All the big players now have substantial cloud computing infrastructures — Amazon, Google, Microsoft, Apple and Facebook. With the emergence of containers in the middle of the decade (which The New Stack founder Alex Williams was among the first to cover), a more efficient and scalable way of managing applications on the cloud was discovered.

The next revolution was the open source container orchestration platform Kubernetes, which enabled even more “scale-out” of the compute platform. Kubernetes, which evolved out of an in-house Google platform called “Borg”, has experienced rapid growth over the past couple of years. According to the most recent survey by the Cloud Native Computing Foundation (CNCF), 78% of respondents are now using Kubernetes in production, up from 58% the previous year.

Now that Kubernetes is so prevalent among cloud native companies, attention is starting to focus on the data layer. Apache Cassandra has become the open source database of choice in the cloud native world, and DataStax is among a cadre of startups offering commercial solutions on top of Cassandra.

What is Apache Cassandra? It’s a highly scalable distributed open source database, first developed at Facebook and released as an open source project in July 2008. It’s a so-called NoSQL database, a type of non-relational database “built specifically for scalable applications.” Nowadays, Cassandra is used by corporations like Netflix, Comcast, eBay, Hulu and Intuit. One of its biggest users is Apple, which runs 150,000 Cassandra instances and stores hundreds of petabytes of data.

The idea that Ramji and others are pushing is that Cassandra (the data plane) is a natural complement to Kubernetes (the control plane). Both are open source, both are distributed, and both are highly scalable. As Ramji put it in another interview, “Cassandra and Kube is like peanut butter and chocolate […] kind of a perfect pairing of data and compute for a cloud native world.”

If anyone has insight into how Kubernetes and Cassandra can be used together, it’s Ramji. He led the Kubernetes team at Google during his time there (late 2016 to mid-2018) and now he’s leading strategy for the Cassandra-focused startup DataStax.

“Apache Cassandra has got over a decade of hard-won battle-tested code improvement,” Ramji said on the Virtual Pancake webinar. So it’s ready, he believes, to be the distributed database of choice for major cloud projects.

It’s worth noting, though, that Cassandra will need to be further adapted to scale on Kubernetes, as it isn’t native to that platform. To that end, DataStax launched its open source Kubernetes operator last month. An “operator” is a tool that makes deploying and managing an application on Kubernetes easier.
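
As a rough illustration of the general pattern operators follow, the sketch below compares a desired state with the running state and works out the actions needed to close the gap. It is not the DataStax operator's code: `ClusterState`, `reconcile` and the action strings are hypothetical names, and a real operator would talk to the Kubernetes API rather than mutate a local object.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    desired_nodes: int  # what the user asked for
    running_nodes: int  # what is currently running


def reconcile(state: ClusterState) -> list[str]:
    """Move the running state toward the desired state and return the actions taken."""
    actions = []
    while state.running_nodes < state.desired_nodes:
        state.running_nodes += 1
        actions.append(f"start node {state.running_nodes}")
    while state.running_nodes > state.desired_nodes:
        actions.append(f"stop node {state.running_nodes}")
        state.running_nodes -= 1
    return actions


# Hypothetical example: one database node is running, three are wanted.
state = ClusterState(desired_nodes=3, running_nodes=1)
actions = reconcile(state)
print(actions)
```

Running `reconcile` again once the states match returns an empty list, which is why this kind of loop can be run repeatedly without side effects.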

There are other Kubernetes operators for Cassandra available on the Web, not to mention plenty of competition for DataStax in the scale-out architecture market. Cockroach Labs, Redis Labs and MongoDB all have cloud native database products.

It’s interesting to ponder what future applications the pairing of Kubernetes with Cassandra (or an alternative scalable database) could lead to. Ramji is keeping an eye on artificial intelligence and machine learning apps. Now that the networking and compute layers are solved, he thinks that over the next ten years “there’s an opportunity to make data really easy, really manageable, and create a playground for apps of the future, which will include AI and ML apps.”

That’s because to create truly effective AI and ML apps, you need a database that can scale aggressively.

“You look at the kind of loads that those systems put on modern infrastructures,” Ramji said, “just doing a training set on a set of static image data, you could be looking at sustained demand of many gigabytes a second — let alone image recognition overlaid with video, audio, anything else that you might want to do.”

If you add Kubernetes to the mix, it’s a recipe for the future of AI and ML applications.

“So the demands on the system for raw throughput, times the ability to scale as wide as you might scale a cloud infrastructure like Kubernetes,” said Ramji, “kind of does give us a little peek ahead of time, right? What’s the old, most excellent quote: the future is already here, it’s just unevenly distributed.”

Look no further than Google for an example of AI and ML apps built on cloud native technology. In Google’s case, the control plane was Borg (the mother of Kubernetes) and the data plane was its own massively scalable database management system, Spanner (offered to outside customers as Google Cloud Spanner).

“So when you look at why Google was able to build a modern AI and machine learning business,” said Ramji, referring to Google search, Ads, Gmail and other products, “it was because it had this Borg control plane, and you had Spanner as your data plane. So the marriage of those two things made compute and data so universally addressable, so easy to access, that you could do just about anything that you could imagine.”

The intriguing thing is what tens of thousands of other businesses and startups could do with the same technology (only this time open source). In other words, there’s a good chance the leading AI and ML apps of the coming decade will be built on Kubernetes and Cassandra.

DataStax, CNCF, Redis Labs and MongoDB are sponsors of The New Stack.

Image via Pixabay.

Richard MacManus is a Senior Editor at The New Stack and writes about web and application development trends. Previously he founded ReadWriteWeb in 2003 and built it into one of the world’s most influential technology news sites. From the early...