
URL: https://thenewstack.io/real-time-write-heavy-workloads-considerations-and-tips/

Real-Time Write Heavy Workloads: Considerations and Tips - The New Stack



Real-Time Write Heavy Workloads: Considerations and Tips

Take a look at the performance-related complexities that teams commonly face and options for tackling them.
Mar 14th, 2025 9:00am by Felipe Cardeneti Mendes and Lubos Kosco
ScyllaDB sponsored this post.

Write-heavy database workloads bring a distinctly different set of challenges than read-heavy ones. For example:

  • Scaling writes can be costly, especially if you pay per operation and writes cost five times as much as reads.
  • Locking can add delays and reduce throughput.
  • I/O bottlenecks can lead to write amplification and complicate crash recovery.
  • Database back pressure can throttle the incoming load.

While cost matters — quite a lot, in many cases — it’s not a topic we want to cover here. Rather, let’s focus on the performance-related complexities that teams commonly face and discuss your options for tackling them.

What Do We Mean by ‘a Real-Time Write-Heavy Workload’?

First, let’s clarify what we mean by this term. We’re talking about workloads that:

  • Ingest a large amount of data (for instance, over 50K OPS)
  • Involve more writes than reads
  • Are bound by strict latency service-level agreements (SLAs) (such as single-digit millisecond P99 latency)

In the wild, they occur across everything from online gaming to real-time stock exchanges. A few specific examples:

  • Internet of Things (IoT) workloads tend to involve small but frequent append-only writes of time series data. Here, the ingestion rate is primarily determined by the number of endpoints collecting data. Think of smart home sensors or industrial monitoring equipment constantly sending data streams to be processed and stored.
  • Logging and monitoring systems also deal with frequent data ingestion, but they don’t have a fixed ingestion rate. They are not necessarily append-only, and they may be prone to hotspots, such as when one endpoint misbehaves.
  • Online gaming platforms need to process real-time user interactions, including game state changes, player actions and messaging. The workload tends to be spiky, with sudden surges in activity. They’re extremely latency sensitive since even small delays can affect the gaming experience.
  • E-commerce and retail workloads are typically update-heavy and often involve batch processing. These systems must maintain accurate inventory levels, process customer reviews, track order status and manage shopping cart operations, which usually require reading existing data before making updates.
  • Ad tech and real-time bidding systems require split-second decisions. These systems handle complex bid processing, including impression tracking and auction results, while simultaneously monitoring user interactions such as clicks and conversions. They must also detect fraud in real time and manage sophisticated audience segmentation for targeted advertising.
  • Real-time stock exchange systems must support high-frequency trading operations, constant stock price updates and complex order matching processes — all while maintaining absolute data consistency and minimal latency.

Next, let’s look at key architectural and configuration considerations that affect write performance.

Storage Engine Architecture

The choice of storage engine architecture fundamentally affects write performance in databases. Two primary approaches exist: LSM trees and B-trees.

[Figure: Underlying storage engine diagram]

Databases known to handle writes efficiently (such as ScyllaDB, Apache Cassandra, HBase and Google Bigtable) use log-structured merge (LSM) trees. This architecture is ideal for handling large volumes of writes. Because writes are immediately stored in memory, the initial write is very fast. Once the “memtable” in memory fills up, the recent writes are flushed to disk in sorted order, which reduces the need for random I/O.

For example, here’s what the ScyllaDB write path looks like:

[Figure: ScyllaDB write path]
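
To make the memtable-and-flush idea concrete, here is a minimal toy sketch in Python. It is not how ScyllaDB or any real engine is implemented: the class name `ToyLSM`, the `memtable_limit` threshold and the in-memory "SSTables" are all assumptions made for illustration, and durability mechanisms such as a commit log are left out entirely.

```python
class ToyLSM:
    """Toy LSM write path: an in-memory memtable flushed to sorted, immutable runs."""

    def __init__(self, memtable_limit=3):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold, in entries
        self.memtable = {}                    # recent writes, held in memory
        self.sstables = []                    # flushed sorted runs, oldest first

    def write(self, key, value):
        # The write only touches memory; nothing is looked up on "disk" first.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        if not self.memtable:
            return
        # Sorting once at flush time turns many small writes into one sequential run.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.sstables):   # the newest run wins
            for k, v in run:
                if k == key:
                    return v
        return None


db = ToyLSM(memtable_limit=3)
for k, v in [("c", 1), ("a", 2), ("b", 3), ("a", 4)]:
    db.write(k, v)

print(db.sstables)   # one flushed run, sorted by key
print(db.read("a"))  # the newer value, still in the memtable
```

Note how the read path has to consult several places, while the write path never does. That asymmetry is the trade-off an LSM design makes in favor of writes.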

With B-tree structures, each write operation requires locating and modifying a node in the tree, which involves both sequential and random I/O. As the data set grows, the tree can require additional nodes and rebalancing, leading to more disk I/O, which can affect performance. B-trees are generally better suited for workloads involving joins and ad-hoc queries.

Payload Size

Payload size also affects performance. With small payloads, throughput is good but CPU processing is the primary bottleneck. As the payload size increases, you get lower overall throughput and disk utilization also increases.

Ultimately, a small write usually fits in all the buffers, and everything can be processed quite quickly. That’s why it’s easy to get high throughput. For larger payloads, you need to allocate larger buffers or multiple buffers. The larger the payloads, the more resources (network and disk) are required to service those payloads.
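
A quick back-of-the-envelope calculation shows why larger payloads shift the pressure from CPU toward network and disk. The helper below is a hypothetical sketch; it counts raw payload bytes only and ignores replication, protocol overhead and compression.

```python
def required_bandwidth_mb_s(ops_per_sec, payload_bytes):
    """Raw payload bandwidth in MB/s (1 MB = 1,000,000 bytes); overheads are ignored."""
    return ops_per_sec * payload_bytes / 1_000_000


# Same operation rate, very different resource demands:
small = required_bandwidth_mb_s(50_000, 1_000)    # 1 KB payloads
large = required_bandwidth_mb_s(50_000, 100_000)  # 100 KB payloads
print(small, large)  # 50.0 5000.0
```

At the same 50K OPS, moving from 1 KB to 100 KB payloads multiplies the bytes that must cross the network and reach disk by 100.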

Compression

Disk utilization is something to watch closely with a write-heavy workload. Although storage is continuously becoming cheaper, it’s still not free.

Compression can help keep things in check, so choose your compression strategy wisely. Faster compression speeds are important for write-heavy workloads, but also consider your available CPU and memory resources.

Be sure to look at the compression chunk size parameter. Compression basically splits your data into smaller blocks (or chunks) and then compresses each block separately. When tuning this setting, realize that larger chunks are better for reads while smaller ones are better for writes, and take your payload size into consideration.
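
The chunking mechanism itself can be sketched in a few lines of Python. This example uses `zlib` only because it ships with the standard library; it is a stand-in for whatever compressor your database actually uses, and the sample data and chunk sizes are made up. It shows that each chunk is compressed independently and that reading a single byte means decompressing the whole chunk that contains it.

```python
import zlib


def compress_in_chunks(data, chunk_size):
    """Split data into fixed-size chunks and compress each chunk separately."""
    return [zlib.compress(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]


def decompress_chunks(chunks):
    """Reassemble the original data from independently compressed chunks."""
    return b"".join(zlib.decompress(c) for c in chunks)


def read_byte(chunks, chunk_size, offset):
    """Return one byte; the whole chunk holding it must be decompressed first."""
    return zlib.decompress(chunks[offset // chunk_size])[offset % chunk_size]


# Made-up, repetitive sample data resembling small sensor records.
data = b"".join(b"sensor-%04d,temp=21.5;" % (i % 50) for i in range(5000))

for chunk_size in (1024, 4096, 65536):
    chunks = compress_in_chunks(data, chunk_size)
    compressed_bytes = sum(len(c) for c in chunks)
    print(chunk_size, len(chunks), compressed_bytes)
```

Running the loop with your own representative payloads is a cheap way to see how chunk size changes the number of chunks and the total compressed size before you touch a production setting.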

Compaction

For LSM-based databases, the compaction strategy you select also influences write performance. Compaction involves merging multiple SSTables into fewer, more organized files, to optimize read performance, reclaim disk space, reduce data fragmentation and maintain overall system efficiency.
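
As a rough illustration of what "merging multiple SSTables" means, the sketch below merges several sorted runs into one and keeps only the newest value for each key. It is a simplified model under stated assumptions: runs are plain Python lists ordered oldest first, and real-world concerns such as tombstones, timestamps and on-disk formats are ignored.

```python
import heapq


def compact(sstables):
    """Merge sorted runs (oldest first) into one sorted run with the newest value per key."""
    # Tag each entry with its run's age so that, for equal keys, newer runs sort later.
    tagged = (
        [(key, age, value) for key, value in run]
        for age, run in enumerate(sstables)
    )
    merged = {}
    for key, _age, value in heapq.merge(*tagged):
        merged[key] = value  # later (newer) entries overwrite older ones
    return list(merged.items())


older = [("a", 1), ("b", 2)]
newer = [("a", 9), ("c", 3)]
print(compact([older, newer]))  # [('a', 9), ('b', 2), ('c', 3)]
```

The stale value for `a` disappears in the merged output, which is how compaction reclaims space and leaves fewer files for a read to consult. The cost is that data already written once gets rewritten, which is where write amplification comes from.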

When selecting compaction strategies, you could aim for low read amplification, which makes reads as efficient as possible. Or, you could aim for low write amplification by keeping compaction from being too aggressive. Or, you could prioritize low space amplification and have compaction purge data as efficiently as possible. For example, ScyllaDB offers several compaction strategies (and Cassandra offers similar ones):

  • Size-tiered compaction strategy (STCS): Triggered when the system has enough (four by default) similarly sized SSTables.
  • Leveled compaction strategy (LCS): The system uses small, fixed-size (by default 160 MB) SSTables distributed across different levels.
  • Incremental compaction strategy (ICS): Shares the same read and write amplification factors as STCS, but it fixes the 2x temporary space amplification issue of STCS by breaking huge SSTables into SSTable runs, which are composed of a sorted set of smaller (1 GB by default), nonoverlapping SSTables.
  • Time-window compaction strategy (TWCS): Designed for time series data.
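
The size-tiered trigger described above ("enough similarly sized SSTables") can be approximated with a small sketch. This is a simplified model, not the actual ScyllaDB or Cassandra code: the function names are hypothetical, and the `bucket_low`, `bucket_high` and `min_threshold` values are assumptions chosen to illustrate the idea of grouping by similar size.

```python
def stcs_buckets(sizes_mb, bucket_low=0.5, bucket_high=1.5):
    """Group SSTable sizes into buckets of roughly similar size."""
    buckets = []
    for size in sorted(sizes_mb):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            # A size joins a bucket if it is close enough to the bucket's average.
            if avg * bucket_low <= size <= avg * bucket_high:
                bucket.append(size)
                break
        else:
            buckets.append([size])
    return buckets


def compaction_candidates(sizes_mb, min_threshold=4):
    """Return the buckets that hold enough similarly sized SSTables to compact."""
    return [b for b in stcs_buckets(sizes_mb) if len(b) >= min_threshold]


sizes = [10, 11, 9, 12, 100, 400]
print(stcs_buckets(sizes))           # [[9, 10, 11, 12], [100], [400]]
print(compaction_candidates(sizes))  # [[9, 10, 11, 12]]
```

Only the four small SSTables would be compacted together here; the 100 MB and 400 MB files wait until enough peers of similar size accumulate.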

For write-heavy workloads, we warn users to avoid leveled compaction at all costs. That strategy is designed for read-heavy use cases. Using it can result in a regrettable 40x write amplification.

Batching

In databases like ScyllaDB and Cassandra, batching can be a bit of a trap, especially for write-heavy workloads. If you’re used to relational databases, batching might seem like a good option for handling a high volume of writes. But it can actually slow things down if it’s not done carefully. Mainly, that’s because large or unstructured batches end up creating a lot of coordination and network overhead between nodes, which is really not what you want in a distributed database.

Here’s how to think about batching when you’re dealing with heavy writes:

  • Batch by the partition key: Group your writes by the partition key so the batch goes to a coordinator node that also owns the data. That way, the coordinator doesn’t have to reach out to other nodes for extra data. Instead, it just handles its own, which cuts down on unnecessary network traffic.
  • Keep batches small and targeted: Breaking up large batches into smaller ones by partition keeps things efficient. It avoids overloading the network and lets each node work on only the data it owns. You still get the benefits of batching, but without the overhead that can bog things down.
  • Stick to unlogged batches: Provided you follow the earlier points, it’s best to use unlogged batches. Logged batches add extra consistency checks, which can really slow down the write.

So, if you’re in a write-heavy situation, structure your batches carefully to avoid the delays that big, cross-node batches can introduce.
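
The grouping advice above can be sketched without any database driver. The helper below is hypothetical: it assumes each pending write is a dict, that `partition_key` names the column used as the partition key, and that `max_batch_size` is a limit you pick yourself (the value here is arbitrary).

```python
from collections import defaultdict


def group_by_partition(rows, partition_key, max_batch_size=50):
    """Split pending writes into small batches that each touch a single partition."""
    by_partition = defaultdict(list)
    for row in rows:
        by_partition[row[partition_key]].append(row)

    batches = []
    for partition_rows in by_partition.values():
        # Cap the batch size so one busy partition cannot produce a huge batch.
        for i in range(0, len(partition_rows), max_batch_size):
            batches.append(partition_rows[i:i + max_batch_size])
    return batches


rows = [
    {"sensor_id": "s1", "ts": 1},
    {"sensor_id": "s2", "ts": 1},
    {"sensor_id": "s1", "ts": 2},
    {"sensor_id": "s1", "ts": 3},
]
for batch in group_by_partition(rows, "sensor_id", max_batch_size=2):
    print(batch)
```

Each resulting group could then be sent as one unlogged batch. With the Python driver for Cassandra and ScyllaDB, that would typically mean a `BatchStatement` created with `BatchType.UNLOGGED`, ideally combined with token-aware routing so the batch reaches a node that owns the partition. Treat that as an assumption to verify against your driver version rather than a recipe.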

Wrapping Up

We offered quite a few warnings, but don’t worry. It was easy to compile a list of lessons learned because so many teams are extremely successful working with real-time write-heavy workloads. Now you know many of their secrets — without having to experience their mistakes. :-)

If you want to learn more, here are some firsthand perspectives from teams who tackled quite interesting write-heavy challenges:

  • Zillow: Consuming records from multiple data producers, which led to out-of-order writes that could cause incorrect updates.
  • Tractian: Preparing for 10X growth in high-frequency data writes from IoT devices.
  • Fanatics: Heavy write operations like handling orders, shopping carts and product updates for this online sports retailer.

Also, take a look at the following video, where we go into even greater depth on these write-heavy challenges, and also walk you through what these workloads look like on ScyllaDB.

Felipe Cardeneti Mendes is an IT specialist with years of experience on distributed systems and open source technologies. He is co-author of three Linux books and is a frequent speaker at public events and conferences to promote open source technologies....
Lubos Kosco is a software engineer on the ScyllaDB Professional Services team. He helps customers get the most out of their ScyllaDB clusters. Previously, he worked in ad tech real-time bidding with Sizmek/Rocketfuel and Oracle/Sun Microsystems on products that managed...
Google is a sponsor of The New Stack.
TNS owner Insight Partners is an investor in: Real, Fanatics.