
URL: https://thenewstack.io/why-nosql-deployments-are-failing-at-scale/

Why NoSQL Deployments Are Failing at Scale - The New Stack



Why NoSQL Deployments Are Failing at Scale

NoSQL struggles at scale; distributed SQL offers a stronger solution.
Oct 4th, 2024 10:00am by Sunny Bains
Photo by Leif Christoph Gottwald on Unsplash.

Why does technology become obsolete? There’s no one answer. Sometimes, it’s surpassed by something strictly better. Other times, the underlying need evolves. Technology that serves the needs of an emerging market might prove insufficient when the market matures.

That’s what many businesses are discovering about NoSQL. And it’s why so many NoSQL implementations are struggling today.

Not so long ago, in the early days of big data, Hadoop was the name on everyone’s lips. Traditional SQL-based data stores were thought to be passé. Every venture-funded startup seemed to have a NoSQL key-value store under the hood. They followed in the footsteps of tech giants like Google, Facebook, and Yahoo, who developed NoSQL technology to manage their rapid growth. It was only natural for startups to reach for the tools that had powered their predecessors’ global success.

But a curious thing happened. The startups that succeeded started tossing their NoSQL databases overboard.

Consider the trajectory of HBase, a database distributed as part of the standard Apache Hadoop package. Modeled on Google’s famed Bigtable, HBase’s popularity soared for a few years and then steadily declined.

[Chart: HBase popularity over time]

Looking at the chart above, one might assume that in 2017, a new database came along to supersede HBase, maybe one that stored and accessed data faster or could address more information. But that’s not what happened. HBase still stores and retrieves data with the best of them. Its decline in popularity has nothing to do with its raw power. It’s about the complexity of the problems its users are trying to solve.

In the early days of SaaS and big data, startups had their hands full just keeping up with customer growth. They needed an inexpensive way to store and manage large amounts of high-velocity data. NoSQL tools like HBase filled that role admirably. But querying that data? Keeping it consistent? Those were problems for another time.

Eventually, that time arrived. When it did, it became apparent that companies built on NoSQL had a massive maintenance problem. They had trouble writing queries. Data became unreliable. New applications were harder and harder to build. NoSQL, which was so cost-effective initially, began imposing costs as the business became more complex.

At this point, many of the companies running HBase were no longer startups. They had expanded worldwide. They had created platforms others used to build businesses. They were hiring data analysts. They were thinking in terms of downtime and SLAs. They weren’t just trying to keep data anymore. They were trying to use it.

That was when NoSQL’s limitations became evident — and a real concern.

For HBase, those included:

  • Lack of transaction support: HBase guarantees atomicity only within a single row, so users get none of the multi-row ACID guarantees typical of a modern relational database. Data can become corrupt or logically inconsistent. The more data you have, the harder it becomes to find the problem through brute force when data quality decays.
  • Lack of a secondary index: HBase’s lack of secondary indexes means anything that cannot be looked up by row key must be found via brute-force scan. Not a problem when you don’t need to find data. Not a problem when you have relatively small amounts of data. But when you need to find a needle in a terabyte-scale haystack, the lack of secondary indexes makes every such query computationally expensive.
  • Single point of failure: HBase’s reliance on HDFS, with its centralized NameNode directory, created dependencies that made it dangerously vulnerable to crashing.
  • Unfriendly interface: NoSQL’s lack of relational architecture is an asset when it comes to quickly storing data but a fundamental problem when it comes to querying it. NoSQL doesn’t eliminate the need for a relational schema. It just forces the burden onto the application, which is much more difficult and expensive to maintain. Altering an explicit SQL schema to match a changed data structure is much easier than modifying an implicit schema embedded within an application.
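
The secondary-index limitation in the list above can be sketched in a few lines of Python. This is an illustration only, not HBase’s API: the dict-backed `rows` store, the `city` attribute, and the helper names are all hypothetical. It shows that a lookup by anything other than the row key has to touch every row unless the application builds and maintains its own index.

```python
from collections import defaultdict

# Hypothetical key-value store: row key -> row (a plain dict).
rows = {
    "user#001": {"name": "Ada", "city": "London"},
    "user#002": {"name": "Grace", "city": "New York"},
    "user#003": {"name": "Alan", "city": "London"},
}


def find_by_city_scan(store, city):
    """No secondary index: every row is examined (a full scan)."""
    return sorted(key for key, row in store.items() if row["city"] == city)


def build_city_index(store):
    """Application-maintained secondary index: city -> set of row keys."""
    index = defaultdict(set)
    for key, row in store.items():
        index[row["city"]].add(key)
    return index


def find_by_city_indexed(index, city):
    """With the index: one dictionary lookup instead of a full scan."""
    return sorted(index.get(city, set()))


city_index = build_city_index(rows)
print(find_by_city_scan(rows, "London"))
print(find_by_city_indexed(city_index, "London"))
```

Both calls return the same row keys, but the scan's cost grows with the size of the table. The hand-built index also has to be updated on every write, and without multi-row transactions the row and its index entry can drift apart, which is how the first two limitations compound each other.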

Over time, these fundamental issues with running NoSQL at scale became impossible to ignore. Some responded by trying to find a compromise solution. Newer NoSQL databases tried to layer structure over HBase’s key-value architecture, adding transactions with SQL or SQL-like capabilities.

As MIT’s Michael Stonebraker put it: “Despite strong protestations that SQL was terrible, by the end of the 2010s, almost every NoSQL DBMS added a SQL interface.” He adds: “Many of the remaining NoSQL DBMSs also added strongly consistent (ACID) transactions. As such, the NoSQL message has morphed from ‘Do not use SQL — it is too slow!’ to ‘Not only SQL’ (i.e., SQL is fine for some things).”

Over time, NoSQL products came to resemble their RDBMS counterparts. But essential differences remained. By definition, NoSQL solutions lack a schema. That’s both their strength and their weakness. The absence of a data schema enables fast storage and retrieval. It also makes analytics and transactions more difficult. If the schema isn’t realized within the database, it has to be instantiated in the query. If, for example, data needs to be sharded onto different servers, the change has to be reflected within the application code. Some NoSQL solutions allow a schema to be defined externally, but this approach is prone to error in practice. Schema migrations are fragile, hair-raising operations.
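
To make the sharding point concrete, here is a minimal sketch of routing logic living in application code. The shard names, the `shard_for` helper, and the hash-modulo scheme are assumptions chosen for illustration; they are not taken from any particular NoSQL product.

```python
import hashlib

# Hypothetical shard list hard-coded in the application.
SHARDS = ["db-0", "db-1", "db-2"]


def shard_for(key: str, shards=SHARDS) -> str:
    """Route a key to a shard using a stable hash modulo the shard count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


# Compare where each key lives before and after adding one server.
keys = [f"user#{i:03d}" for i in range(1000)]
before = {k: shard_for(k) for k in keys}
after = {k: shard_for(k, SHARDS + ["db-3"]) for k in keys}
moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} keys change shard after adding one server")
```

Every service that reads or writes this data has to carry the same routing rule, so under this scheme a change in the shard count becomes a coordinated code change plus a data migration. A database that owns the schema and the data placement keeps that mapping out of the application.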

The difficulty of changing a database discourages new application development. It makes innovation harder, and few businesses will tolerate that for long.

Pinterest is a good example. It was an early adopter of HBase. At one point, according to a Pinterest Engineering blog post, it was running “50 clusters, 9000 AWS EC2 instances, and over 6 PBs of data” on HBase. And HBase did the job. But over time, as Pinterest grew, it decided HBase’s shortcomings outweighed its benefits. It was too light on features and cost too much to manage. As other businesses started to come to the same conclusions, it became harder and harder to find HBase-savvy engineers. Ultimately, Pinterest migrated to an open source, MySQL-compatible distributed SQL solution called TiDB. In doing so, the company improved development velocity and query latency while making performance more predictable.

That might come as a surprise to some. For years, SQL was dogged by the misimpression that it is inherently slower and less efficient than NoSQL. But that’s simply not the case. Advances in cloud computing and horizontal scale-out have brought recent SQL solutions much closer to raw performance parity with their NoSQL counterparts while still providing all the advantages of an RDBMS. Rather than focusing on one dimension of database functionality (storage and retrieval), distributed SQL seeks to provide high performance across a wide range of transactional and analytical use cases, making it attractive to mature businesses with complex needs and a wide variety of stakeholders.

Ironically, in moving from NoSQL to distributed SQL, Pinterest and companies like it are following in Google’s footsteps, just as they did when they adopted NoSQL in the first place. TiDB and other distributed SQL solutions are descendants of Google Spanner. This is software Google created to solve the problems of Bigtable, the technology that gave rise to HBase.

In a way, the SaaS industry simply recapitulates the journey Google and other tech giants have been on for the past two decades. Here, we have a technology (SQL/RDBMS) supposedly made obsolete by another technology (NoSQL), which is now being displaced by a more modern iteration of the technology it ousted.

Who is to say the wheel might not turn again? To cite Stonebraker one last time, “What goes around continues to come around. Another wave of developers will claim that SQL and the [relational model] are insufficient for emerging application domains. People will then propose new query languages and data models to overcome these problems.” But none, he points out, have ever seriously threatened to displace the SQL-based RDBMS.

It’s a useful reminder that over the years, the traditional relational database has proved remarkably capable of absorbing innovation, from clustering to cloud to vector search. Trends in database architecture come and go, but somehow, when the dust settles, SQL always seems to be left standing.

Sunny Bains is a software architect at PingCAP, the company behind TiDB. He has worked on storage engines for more than 22 years. His first acquaintance with database kernel work was in 2001, when he was tasked with writing a...