VOOZH about

URL: https://thenewstack.io/handling-bursty-traffic-in-real-time-analytics-applications/

⇱ Handling Bursty Traffic in Real-Time Analytics Applications - The New Stack


TNS
SUBSCRIBE
Join our community of software engineering leaders and aspirational developers. Always stay in-the-know by getting the most important news and exclusive content delivered fresh to your inbox to learn more about at-scale software development.
REQUIRED
It seems that you've previously unsubscribed from our newsletter in the past. Click the button below to open the re-subscribe form in a new tab. When you're done, simply close that tab and continue with this form to complete your subscription.
The New Stack does not sell your information or share it with unaffiliated third parties. By continuing, you agree to our Terms of Use and Privacy Policy.
Welcome and thank you for joining The New Stack community!
Please answer a few simple questions to help us deliver the news and resources you are interested in.
REQUIRED
REQUIRED
REQUIRED
REQUIRED
REQUIRED
Great to meet you!
Tell us a bit about your job so we can cover the topics you find most relevant.
REQUIRED
REQUIRED
REQUIRED
REQUIRED
REQUIRED
Welcome!

We’re so glad you’re here. You can expect all the best TNS content to arrive Monday through Friday to keep you on top of the news and at the top of your game.

What’s next?

Check your inbox for a confirmation email where you can adjust your preferences and even join additional groups.

Follow TNS on your favorite social media networks.

Become a TNS follower on LinkedIn.

Check out the latest featured and trending stories while you wait for your first TNS newsletter.

PREV
1 of 2
NEXT
VOXPOP
As a JavaScript developer, what non-React tools do you use most often?
Angular
0%
Astro
0%
Svelte
0%
Vue.js
0%
Other
0%
I only use React
0%
I don't use JavaScript
0%
Thanks for your opinion! Subscribe below to get the final results, published exclusively in our TNS Update newsletter:
NEW! Try Stackie AI
From clobbered drafts to real-time sync
Apr 14th 2026 10:00am, by David Moore
TypeScript 6.0 RC arrives as a bridge to a faster future
Mar 14th 2026 9:00am, by Darryl K. Taft
Mastra empowers web devs to build AI agents in TypeScript
Jan 28th 2026 11:00am, by Loraine Lawson
2022-04-07 06:13:28
Handling Bursty Traffic in Real-Time Analytics Applications
contributed,sponsor-rockset,sponsored,sponsored-post-contributed,
Data / Networking

Handling Bursty Traffic in Real-Time Analytics Applications

A database using Aggregator-Leaf-Tailer architecture separates storing and analyzing data, enabling ingestion or queries to scale up and down as needed.
Apr 7th, 2022 6:13am by Dhruba Borthakur
👁 Featued image for: Handling Bursty Traffic in Real-Time Analytics Applications
Photo by Mark Boss on Unsplash
Rockset sponsored this post.
Dhruba Borthakur
Dhruba is CTO and co-founder of Rockset and is responsible for the company's technical direction. He was an engineer on the database team at Facebook, where he was the founding engineer of the RocksDB data store. Earlier at Yahoo, he was one of the founding engineers of the Hadoop Distributed File System. He was also a contributor to the open source Apache HBase project.

Note: This post is the third in the series “Designing the Next Generation of Data Systems for Real-Time Analytics.”

Developers, data engineers and site reliability engineers may disagree on many things, but one thing they can agree on is that bursty data traffic is almost unavoidable.

It’s well documented that web retail traffic can spike 10x during Black Friday. There are many other occasions where data traffic balloons suddenly. Halloween causes consumer social media apps to be inundated with photos. Major news events can set the markets afire with electronic trades. A meme can suddenly go viral among teenagers.

In the old days of batch analytics, bursts of data traffic were easier to manage. Executives didn’t expect reports more than once a week nor dashboards to have up-to-the-minute data. Though some data sources like event streams were starting to arrive in real time, neither data nor queries were time sensitive. Databases could just buffer, ingest and query data on a regular schedule.

Moreover, analytical systems and pipelines were complementary, not mission-critical. Analytics wasn’t embedded into applications or used for day-to-day operations as it is today. Finally, you could always plan ahead for bursty traffic and overprovision your database clusters and pipelines. It was expensive, but it was safe.

Why Bursty Data Traffic Is an Issue Today

These conditions have completely flipped. Companies are rapidly transforming into digital enterprises in order to emulate disruptors such as Uber, Airbnb, Meta and others. Real-time analytics now drive their operations and bottom line, whether it is through a customer recommendation engine, an automated personalization system or an internal business observability platform. There’s no time to buffer data for leisurely ingestion. And because of the massive amounts of data involved today, overprovisioning can be financially ruinous for companies.

Many databases claim to deliver scalability on demand so that you can avoid expensive overprovisioning and keep your data-driven operations humming. Look more closely, and you’ll see these databases usually employ one of these two poor man’s solutions:

  • Manual reconfigurations. Many systems require system administrators to manually deploy new configuration files to scale up databases. Scale-up cannot be triggered automatically through a rule or API call. That creates bottlenecks and delays that are unacceptable in real time.
  • Offloading complex analytics onto data applications. Other databases claim their design provides immunity to bursty data traffic. Key-value and document databases are two good examples. Both are extremely fast at the simple tasks they are designed for — retrieving individual values or whole documents — and that speed is largely unaffected by bursts of data. However, these databases tend to sacrifice support for complex SQL queries at any scale. Instead, these database makers have offloaded complex analytics onto application code and their developers, who have neither the skills nor the time to constantly update queries as data sets evolve. This query optimization is something that all SQL databases excel at and do automatically.

Bursty data traffic also afflicts the many databases that are by default deployed in a balanced configuration or were not designed to segregate the tasks of compute and storage. Not separating ingest from queries means that they directly affect the other. Writing a large amount of data slows down your reads, and vice-versa.

This problem — potential slowdowns caused by contention between ingest and query compute — is common to many Apache Druid and Elasticsearch systems. It’s less of an issue with Snowflake, which avoids contention by scaling up both sides of the system. That’s an effective, albeit expensive, overprovisioning strategy.

Database makers have experimented with different designs to scale for bursts of data traffic without sacrificing speed, features or cost. It turns out there is a cost-effective and performant way and a costly, inefficient way.

Lambda Architecture: Too Many Compromises

A decade ago, a multitiered database architecture called Lambda began to emerge. Lambda systems try to accommodate the needs of both big data-focused data scientists as well as streaming-focused developers by separating data ingestion into two layers. One layer processes batches of historic data. Hadoop was initially used but has since been replaced by Snowflake, Redshift and other databases.

There is also a “speed” layer typically built around a stream-processing technology such as Amazon Kinesis or Spark. It provides instant views of the real-time data. The serving layer — often MongoDB, Elasticsearch or Cassandra — then delivers those results to both dashboards and users’ ad hoc queries.

When systems are created out of compromise, so are their features. Maintaining two data processing paths creates extra work for developers who must write and maintain two versions of code, as well as greater risk of data errors. Developers and data scientists also have little control over the streaming and batch data pipelines.

Finally, most of the data processing in Lambda happens as new data is written to the system. The serving layer is a simpler key-value or document lookup that does not handle complex transformations or queries. Instead, data-application developers must handle all the work of applying new transformations and modifying queries. Not very agile. With these problems and more, it’s no wonder that the calls to “kill Lambda” keep increasing year over year.

👁 Image

ALT: The Best Architecture for Bursty Traffic

There is an elegant solution to the problem of bursty data traffic.

To efficiently scale to handle bursty traffic in real time, a database would separate the functions of storing and analyzing data. Such a disaggregated architecture enables ingestion or queries to scale up and down as needed. This design also removes the bottlenecks created by compute contention, so spikes in queries don’t slow down data writes, and vice-versa. Finally, the database must be cloud native, so all scaling is automatic and hidden from developers and users. No need to overprovision in advance.

👁 Image

Such a serverless real-time architecture exists and it’s called Aggregator-Leaf-Tailer (ALT) for the way it separates the jobs of fetching, indexing and querying data.

👁 Image

Like cruise control on a car, an ALT architecture can easily maintain ingest speeds if queries suddenly spike, and vice-versa. And like a cruise control, those ingest and query speeds can independently scale upward based on application rules, not manual server reconfigurations. With both of those features, there’s no potential for contention-caused slowdowns, nor any need to overprovision your system in advance either. ALT architectures provide the best price performance for real-time analytics.

I witnessed the power of ALT firsthand at Facebook (now Meta) when I was on the team that brought the News Feed (now renamed Feed) — the updates from all of your friends — from an hourly-update schedule into real time. Similarly, when LinkedIn upgraded its real-time FollowFeed to an ALT data architecture, it boosted query speeds and data retention while slashing the number of servers needed by half. Google and other web-scale companies also use ALT. For more details, read my blog on ALT and why it beats the Lambda architecture for real-time analytics.

Companies don’t need to be overstaffed with data engineers like the ones above to deploy ALT. Rockset provides a real-time analytics database in the cloud built around the ALT architecture. Our database lets companies easily handle bursty data traffic for their real-time analytical workloads, as well as solve other key real-time issues such as mutable and out-of-order data, low-latency queries, flexible schemas and more.

If you are picking a system for serving data in real time for applications, evaluate whether it implements the ALT architecture so that it can handle bursty traffic wherever it comes from.

Rockset was founded in 2016 by experts in web-scale data management and distributed systems. The team includes engineers who created the online data and search infrastructure at Facebook, founded the Hadoop project at Yahoo, implemented the Gmail backend at Google, and built databases at Oracle.
Learn More
The latest from Rockset
TRENDING STORIES
Dhruba is CTO and co-founder of Rockset and is responsible for the company's technical direction. He was an engineer on the database team at Facebook, where he was the founding engineer of the RocksDB data store. Earlier at Yahoo, he...
Read more from Dhruba Borthakur
Rockset sponsored this post.
SHARE THIS STORY
TRENDING STORIES
TNS owner Insight Partners is an investor in: Real.
SHARE THIS STORY
TRENDING STORIES
TNS DAILY NEWSLETTER Receive a free roundup of the most recent TNS articles in your inbox each day.
The New Stack does not sell your information or share it with unaffiliated third parties. By continuing, you agree to our Terms of Use and Privacy Policy.