VOOZH about

URL: https://thenewstack.io/benchmarking-apache-cassandra-40-nodes-vs-scylladb-4-nodes/

⇱ Benchmarking Apache Cassandra (40 Nodes) vs. ScyllaDB (4 Nodes) - The New Stack


TNS
SUBSCRIBE
Join our community of software engineering leaders and aspirational developers. Always stay in-the-know by getting the most important news and exclusive content delivered fresh to your inbox to learn more about at-scale software development.
REQUIRED
It seems that you've previously unsubscribed from our newsletter in the past. Click the button below to open the re-subscribe form in a new tab. When you're done, simply close that tab and continue with this form to complete your subscription.
The New Stack does not sell your information or share it with unaffiliated third parties. By continuing, you agree to our Terms of Use and Privacy Policy.
Welcome and thank you for joining The New Stack community!
Please answer a few simple questions to help us deliver the news and resources you are interested in.
REQUIRED
REQUIRED
REQUIRED
REQUIRED
REQUIRED
Great to meet you!
Tell us a bit about your job so we can cover the topics you find most relevant.
REQUIRED
REQUIRED
REQUIRED
REQUIRED
REQUIRED
Welcome!

We’re so glad you’re here. You can expect all the best TNS content to arrive Monday through Friday to keep you on top of the news and at the top of your game.

What’s next?

Check your inbox for a confirmation email where you can adjust your preferences and even join additional groups.

Follow TNS on your favorite social media networks.

Become a TNS follower on LinkedIn.

Check out the latest featured and trending stories while you wait for your first TNS newsletter.

PREV
1 of 2
NEXT
VOXPOP
As a JavaScript developer, what non-React tools do you use most often?
Angular
0%
Astro
0%
Svelte
0%
Vue.js
0%
Other
0%
I only use React
0%
I don't use JavaScript
0%
Thanks for your opinion! Subscribe below to get the final results, published exclusively in our TNS Update newsletter:
NEW! Try Stackie AI
From clobbered drafts to real-time sync
Apr 14th 2026 10:00am, by David Moore
TypeScript 6.0 RC arrives as a bridge to a faster future
Mar 14th 2026 9:00am, by Darryl K. Taft
Mastra empowers web devs to build AI agents in TypeScript
Jan 28th 2026 11:00am, by Loraine Lawson
2022-05-02 08:37:02
Benchmarking Apache Cassandra (40 Nodes) vs. ScyllaDB (4 Nodes)
contributed,sponsor-scylladb,sponsored,sponsored-post-contributed,
Data / Open Source / Storage

Benchmarking Apache Cassandra (40 Nodes) vs. ScyllaDB (4 Nodes)

We were confident that scaling up beats scaling out for massive workloads. So, we put it to the test.
May 2nd, 2022 8:37am by Juliusz Stasiewicz, Piotr Grabowski and Karol Baryla
👁 Featued image for: Benchmarking Apache Cassandra (40 Nodes) vs. ScyllaDB (4 Nodes)
Image by Adrian Malec from Pixabay 
ScyllaDB sponsored this post.
Juliusz Stasiewicz
Juliusz is a software engineer at ScyllaDB. He has also worked at a number of companies on C++ programming, including Samsung and DataStax.

Since the NoSQL revolution in database management systems kicked off over a decade ago, organizations of all sizes have benefitted from a key feature: massive scale using relatively inexpensive commodity hardware. It has enabled organizations to deploy architectures that would have been prohibitively expensive and impossible to scale with traditional relational database systems.

Commodity hardware itself has undergone a transformation over the same decade, but most modern software doesn’t take advantage of modern computing resources. Most frameworks that scale out for data-intensive applications don’t scale up. They aren’t able to take advantage of the resources offered by large nodes, such as the added CPU, memory and solid-state drives (SSDs), nor can they store large amounts of data on disk efficiently.

Piotr Grabowski
From a young age, Piotr participated in many competitive programming contests. Always striving to stay on top of the latest business and tech news, he never finished a day without checking his RSS reader. Piotr holds a bachelor’s degree in computer science from the University of Warsaw and is now pursuing a master’s degree.

Managed runtimes, like Java, are further constrained by heap size. Multithreaded code, with its locking overhead and lack of attention for non-uniform memory architecture (NUMA), imposes a significant performance penalty against modern hardware architectures.

Software’s inability to keep up with hardware advancements has led to the widespread belief that running database infrastructure on many small nodes is the optimal architecture for scaling massive workloads. The alternative, using small clusters of large nodes, is often viewed with skepticism. A few common concerns are that large nodes won’t be fully utilized, that they have a hard time streaming data when scaling out and, finally, they might have a catastrophic effect on recovery times.

Based on our experiences with ScyllaDB, a fast and scalable database that takes full advantage of modern infrastructure and networking capabilities, we were confident that scaling up beats scaling out. So, we put it to the test.

Karol Baryla
Karol is a junior software engineer at ScyllaDB. He often participates in security CTF (Capture the Flag) competitions as a member of team ‘Armia Prezesa,’ where he solves web security and reverse engineering tasks. He is currently pursuing a master’s degree in computer science at the University of Warsaw.

ScyllaDB is API-compatible with Apache Cassandra (and DynamoDB compatible too); it provides the same Cassandra Query Language (CQL) interface and queries, the same drivers, even the same on-disk SSTable format. Our core Apache Cassandra 4.0 performance benchmarking used identical three-node hardware for both ScyllaDB and Cassandra (TL;DR, ScyllaDB performed 3x to 8x better). Since ScyllaDB scales well vertically, we executed what we are calling a “4×40” test, with a large-scale setup where we used node sizes optimal for each database. Since ScyllaDB’s architecture takes full advantage of extremely large nodes, we compared a setup of four i3.metal machines (288 vCPUs in total) vs. 40 i3.4xlarge Cassandra machines (640 vCPUs in total, almost 2.5 times ScyllaDB’s resources).

In terms of CPU count, RAM volume or cluster topology, this would appear to be an apples-to-oranges comparison. You might wonder why anyone would ever consider such a test. After all, we’re comparing four machines to 40 very different machines. Due to ScyllaDB’s shard-per-core architecture and custom memory management, we know that ScyllaDB can utilize very powerful hardware. Meanwhile, Cassandra and its JVM’s garbage collectors are optimized to be heavily distributed with many smaller nodes.

The true purpose of this test was to see whether both CQL solutions could perform similarly in this duel, even with Cassandra requiring about 2.5 times more hardware, for 2.5x the cost. What’s really at stake here is a reduction in administrative burden: Could a DBA be maintaining just four servers instead of 40?

The 4 x 40 Node Setup

We set up clusters on Amazon EC2 in a single Availability Zone within us-east-2 data center. The ScyllaDB cluster consisted of four i3.metal VMs and the competing Cassandra cluster consisted of 40 i3.4xlarge VMs. Servers were initialized with clean machine images (AMIs) of Ubuntu 20.04 (Cassandra 4.0) or CentOS 7.9 (ScyllaDB 4.4).

Apart from the cluster, 15 loader machines were used to run cassandra-stress to insert data, and – later – to provide background load at CL=QUORUM to mess with the administrative operations.

ScyllaDB Cassandra Loaders
EC2 Instance type i3.metal i3.4xlarge c5n.9xlarge
Cluster size 4 40 15
Storage (total) 8x 1.9 TB NVMe in RAID0

(60.8 TB)

2x 1.9 TB NVMe in RAID0

(152 TB)

Not important for a loader (EBS-only)
Network 25 Gbps Up to 10 Gbps 50 Gbps
vCPUs (total) 72 (288) 16 (640) 36 (540)
RAM (total) 512 (2048) GiB 122 (4880) GiB 96 (1440) GiB

Once up and running, both databases were loaded with random data at RF=3 until the cluster’s total disk usage reached approximately 40 TB. This translated to 1 TB of data per Cassandra node and 10 TB of data per ScyllaDB node. After loading was done, we flushed the data and waited until the compactions finished, so we could start the actual benchmarking.

Spoiler: We found that a ScyllaDB cluster can be 10 times smaller in node count and run on a cluster 2.5x less expensively, yet maintain the equivalent performance of Cassandra 4. Here’s how it played out.

Throughput and Latencies

UPDATE Queries

The following shows the 90- and 99-percentile latencies of UPDATE queries, as measured on:

  • Four-node ScyllaDB cluster (4 x i3.metal, 288 vCPUs in total)
  • 40-node Cassandra cluster (40 x i3.4xlarge, 640 vCPUs in total)

👁 Image

The workload was uniformly distributed — every partition in the multi-terabyte dataset had an equal chance of being selected/updated.

Under low load, Cassandra slightly outperformed ScyllaDB. The reason is that ScyllaDB runs more compaction automatically when it is idle and the default scheduler tick of 0.5 milliseconds hurts the P99 latency. (Note, there is a parameter that controls this, but we wanted to provide out-of-the-box results with zero custom tuning or configuration.)

Under high load with P99 latency <10 milliseconds, ScyllaDB’s throughput on four nodes was 33% higher than Cassandra’s on 40 nodes.

SELECT Queries

The following shows the 99-percentile latencies of SELECT queries, as measured on:

  • Four-node ScyllaDB cluster (4 x i3.metal, 288 vCPUs in total)
  • 40-node Cassandra cluster (40 x i3.4xlarge, 640 vCPUs in total)

👁 Image
The workload was uniformly distributed — every partition in the multi-terabyte dataset had an equal chance of being selected/updated. Under low load Cassandra slightly outperformed ScyllaDB. Under high load with P99 latency <10 milliseconds, ScyllaDB’s throughput on four nodes was again 33% higher than Cassandra’s on 40 nodes.

Scaling up the Cluster by 25%

In this benchmark, we increase the capacity of the cluster by 25%:

  • By adding a single ScyllaDB node to the cluster (from four nodes to five)
  • By adding 10 Cassandra nodes to the cluster (from 40 nodes to 50 nodes)

👁 Image

ScyllaDB 4.4.3 Cassandra 4.0 ScyllaDB 4.4 vs. Cassandra 4.0
Add 25% capacity 1 hour, 29 minutes 16 hours, 54 minutes 11x faster

Performing Major Compaction

In this benchmark, we measure the throughput of a major compaction. To compensate for Cassandra having 10 times more nodes (each having 1/10th of the data), this benchmark measures the throughput of a single ScyllaDB node performing major compaction and the collective throughput of 10 Cassandra nodes performing major compactions concurrently.

👁 Image

Here, ScyllaDB ran on a single i3.metal machine (72 vCPUs) and competed with a 10-node cluster of Cassandra 4 (10x i3.4xlarge machines; 160 vCPUs in total). ScyllaDB can split this problem across CPU cores, which Cassandra cannot. ScyllaDB performed 32x better in this case.

ScyllaDB is engineered to deliver predictable performance at scale. It’s adopted by organizations that need ultra-low latency, even over millions of ops/sec & PBs of data. Our unique architecture leverages the power of modern infrastructure – translating to fewer nodes, less admin & lower costs.
Learn More
The latest from ScyllaDB
Hear more from our sponsor

The Bottom Line

The bottom line is that a ScyllaDB cluster can be 10 times smaller and 2.5 times less expensive, yet still outperform Cassandra 4.0 by 42%. In this use case, choosing ScyllaDB over Cassandra 4.0 would result in hundreds of thousands annual savings in hardware costs alone, without factoring in reduced administration costs or environmental impact. Scaling the cluster is 11 times faster, and ScyllaDB provides additional features, from change data capture (CDC) to cache bypass and native Prometheus support and more. That’s why teams at companies such as Discord, Expedia, Fanatics and Rakuten have switched.

For more details on how this test was configured and how to replicate it, read the complete benchmark.

ScyllaDB is engineered to deliver predictable performance at scale. It’s adopted by organizations that need ultra-low latency, even over millions of ops/sec & PBs of data. Our unique architecture leverages the power of modern infrastructure – translating to fewer nodes, less admin & lower costs.
Learn More
The latest from ScyllaDB
Hear more from our sponsor
TRENDING STORIES
Juliusz is a software engineer at ScyllaDB. He has also worked at a number of companies on C++ programming, including Samsung and DataStax.
Read more from Juliusz Stasiewicz
From a young age, Piotr participated in many competitive programming contests. Always striving to stay on top of the latest business and tech news, he never finished a day without checking his RSS reader. Piotr holds a bachelor’s degree in...
Read more from Piotr Grabowski
Karol is a junior software engineer at ScyllaDB. He often participates in security CTF (Capture the Flag) competitions as a member of team ‘Armia Prezesa,’ where he solves web security and reverse engineering tasks. He is currently pursuing a master’s...
Read more from Karol Baryla
ScyllaDB sponsored this post.
SHARE THIS STORY
TRENDING STORIES
TNS owner Insight Partners is an investor in: Pragma, Fanatics.
SHARE THIS STORY
TRENDING STORIES
TNS DAILY NEWSLETTER Receive a free roundup of the most recent TNS articles in your inbox each day.
The New Stack does not sell your information or share it with unaffiliated third parties. By continuing, you agree to our Terms of Use and Privacy Policy.