URL: https://thenewstack.io/store-more-pay-less-welcome-to-kafka-tiered-storage/

Kafka Tiered Storage: Store More, But Pay Less - The New Stack



Kafka Tiered Storage: Store More, But Pay Less

Tiered Storage transforms how organizations tap Kafka at scale, enabling new use cases while simplifying operations and ensuring longer-term data consistency.
May 7th, 2025 2:00pm by Anil Inamdar
Featured image from Sabina Akter on Shutterstock.
Instaclustr sponsored this post.

Longtime Kafka users are familiar with a fork in the proverbial road as their applications scale and data accumulates and accumulates and accumulates. Data storage isn’t free, and eventually, the moment of truth arrives when you have to decide to either hang on to all your historical data or keep your storage costs realistic.

The arrival of Kafka’s Tiered Storage eliminates that dilemma with a third option: Why not both?

With Tiered Storage, the popular open source distributed event streaming platform now lets you automatically split data into two tiers: one that delivers high performance by storing recent and crucial hot data locally and another that places historical data in low-cost cloud object storage.

Tiered Storage transforms the way organizations can tap Kafka at scale, enabling new use cases while simplifying operations and ensuring longer-term data consistency. Here’s how it works and why it’s a game-changer for every data-hungry Kafka deployment.

Cut Costs While Saving Business-Valuable Data

Kafka Tiered Storage preserves the platform’s core semantics and APIs, allowing existing applications and their Kafka producers and consumers to function without modification. The architecture functions much like a write-through cache, except that the write to the remote tier is asynchronous: Data initially lands on local broker storage and is copied to remote storage once segments close. Consumers seamlessly read from either local or remote storage as needed, with the underlying complexity completely abstracted away.
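To make that flow concrete, here is a minimal in-memory sketch of a single partition with a local and a remote tier. It is a toy model, not Kafka's implementation: the `TieredLog` and `Segment` classes, the `segment_size` and `local_segments_to_keep` knobs and the explicit `tier()` call (standing in for the asynchronous background copy) are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    base_offset: int
    records: list = field(default_factory=list)

    def contains(self, offset: int) -> bool:
        return self.base_offset <= offset < self.base_offset + len(self.records)


class TieredLog:
    """Toy model of one partition with a local tier and a remote tier."""

    def __init__(self, segment_size: int, local_segments_to_keep: int):
        self.segment_size = segment_size                      # records per segment (hypothetical)
        self.local_segments_to_keep = local_segments_to_keep  # closed segments kept locally
        self.active = Segment(base_offset=0)                  # segment currently being written
        self.local = []    # closed segments still on "broker disk"
        self.remote = []   # closed segments copied to "object storage"
        self.pending = []  # closed segments waiting for the background copy

    def append(self, record) -> int:
        """Producer path: always writes to the local active segment."""
        offset = self.active.base_offset + len(self.active.records)
        self.active.records.append(record)
        if len(self.active.records) >= self.segment_size:
            self._roll()
        return offset

    def _roll(self) -> None:
        """Close the active segment and queue it for copying."""
        closed = self.active
        self.local.append(closed)
        self.pending.append(closed)
        self.active = Segment(base_offset=closed.base_offset + len(closed.records))

    def tier(self) -> None:
        """Stand-in for the asynchronous copy plus local retention cleanup."""
        self.remote.extend(self.pending)
        self.pending.clear()
        uploaded = {seg.base_offset for seg in self.remote}
        # Only drop a local segment once it is safely in the remote tier.
        while (len(self.local) > self.local_segments_to_keep
               and self.local[0].base_offset in uploaded):
            self.local.pop(0)

    def read(self, offset: int):
        """Consumer path: same call whether the data is local or remote."""
        for tier_name, segments in (("local", [*self.local, self.active]),
                                    ("remote", self.remote)):
            for seg in segments:
                if seg.contains(offset):
                    return seg.records[offset - seg.base_offset], tier_name
        raise KeyError(f"offset {offset} not found")


log = TieredLog(segment_size=2, local_segments_to_keep=1)
for i in range(5):
    log.append(f"r{i}")
log.tier()
print(log.read(0))  # oldest record, now served from the remote tier
print(log.read(4))  # newest record, still in the active local segment
```

The point of the sketch is that `append` never waits on the remote tier and `read` has a single signature, which is the property that lets existing producers and consumers keep working unchanged.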

As organizations grow, their data accumulation accelerates and eventually reaches a point where simply expanding broker storage becomes financially unsustainable. Cloud object storage costs a fraction of high-performance SSDs, making the economic case for tiered storage immediately compelling to financial stakeholders (in other words, Kafka Tiered Storage is going to make your CFO happy). Meanwhile, technical teams gain powerful new capabilities for historical data analysis and reprocessing that were previously cost-prohibitive.
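A rough back-of-the-envelope comparison shows why the numbers tend to favor tiering. The function below is a sketch under stated assumptions: broker disks hold every byte once per replica, the remote tier is billed for a single logical copy, and request, egress and compute charges are ignored. The prices in the example call are placeholders, not quotes from any provider.

```python
def monthly_storage_cost(total_gb, local_gb, replication_factor,
                         local_price_per_gb, remote_price_per_gb):
    """Return (all_local_cost, tiered_cost) per month.

    Assumptions (not from any price list):
    - local broker storage is paid once per replica
    - the remote tier is paid for one logical copy of the data
    - request, egress and compute charges are left out
    """
    all_local = total_gb * replication_factor * local_price_per_gb
    tiered = (local_gb * replication_factor * local_price_per_gb
              + total_gb * remote_price_per_gb)
    return all_local, tiered


# Placeholder inputs: 10 TB retained, 500 GB kept hot, replication factor 3.
all_local, tiered = monthly_storage_cost(
    total_gb=10_000, local_gb=500, replication_factor=3,
    local_price_per_gb=0.10, remote_price_per_gb=0.02,
)
print(f"all local: {all_local:.2f}, tiered: {tiered:.2f}")
```

Swap in your own retention volume and your provider's actual rates before showing the result to anyone in finance.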

Building a Better Time Machine

While Kafka has always enabled enterprises to “time travel” through their data streams, unlocking critical insights and capabilities, the high cost of historical data retention has severely limited this functionality’s scope until now.

Kafka Tiered Storage makes extended time travel across years of historical data economically viable, opening up transformative opportunities. Teams can now train machine learning (ML) models on complete historical data sets rather than samples, execute seamless migrations to new sink systems and perform comprehensive compliance auditing across all past transactions.

This functionality also helps modernize application development practices. Engineering teams can address bugs by reverting to the exact state before they were introduced, even months after the fact. Applications can undergo thorough A/B testing using parallel processing pipelines against historical data.

Time-shifted operations, such as running accurate what-if simulations with historical operational data, have now become practical use cases. Even disaster recovery strategies evolve as organizations replace expensive hot infrastructure duplicates with far more affordable cold data storage that can be rapidly deployed on new Kafka clusters when needed.

Managing Tiered Performance and Right-Sizing Capacity

Tiered Storage maintains seamless, high-performance access to mission-critical data held on local disks. That said, a few smart adjustments can also optimize performance when accessing historical data in cold cloud storage.

Retention policies should mirror your tiered storage strategy, keeping frequently accessed data local and using remote storage for less commonly needed data. Remote copying occurs asynchronously, so Kafka producers function the same as always. However, you should plan for around 10% more cluster CPU and network capacity to absorb those tiering operations.
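In practice, that split is expressed as two retention settings on a topic. The sketch below builds such a set of topic-level overrides. The property names `remote.storage.enable`, `retention.ms` and `local.retention.ms` come from Apache Kafka's topic configuration rather than from this article, and they only take effect on a cluster whose brokers have tiered storage switched on and a remote storage plugin configured. The helper names and the retention values are hypothetical.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def tiered_topic_config(total_retention_days: int, local_retention_days: int) -> dict:
    """Build topic-level overrides for a tiered topic (values are strings, as Kafka expects)."""
    if local_retention_days > total_retention_days:
        raise ValueError("local retention cannot exceed total retention")
    return {
        "remote.storage.enable": "true",
        # Total retention across both tiers.
        "retention.ms": str(total_retention_days * MS_PER_DAY),
        # How long closed segments also stay on broker disks.
        "local.retention.ms": str(local_retention_days * MS_PER_DAY),
    }


def as_add_config(config: dict) -> str:
    """Render the overrides as a comma-separated key=value list."""
    return ",".join(f"{key}={value}" for key, value in config.items())


# Hypothetical policy: keep one year in total, one day on local disks.
print(as_add_config(tiered_topic_config(total_retention_days=365, local_retention_days=1)))
```

The comma-separated form is the shape that Kafka's topic configuration tooling generally accepts, but check the documentation for your Kafka version and managed platform before applying it.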

Kafka Tiered Storage also changes the equation when planning how much capacity to make available. According to NetApp Instaclustr’s benchmark data, reads from hot local storage are often two or three times as fast as reads from remote cloud storage, and small segment sizes can degrade remote reads by up to 20x. To right-size capacity, separate your workloads and determine your producer input rate, your consumer access patterns and how much data to keep locally.
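Once you know the producer input rate and how long data should stay hot, local disk capacity becomes simple arithmetic. This is a sizing sketch, not a benchmark-backed formula: it assumes a steady ingest rate, counts every replica and adds a headroom fraction that you should pick yourself. The function name and the 30% default are illustrative assumptions.

```python
def local_storage_gb(producer_mb_per_s: float, local_retention_hours: float,
                     replication_factor: int, headroom: float = 0.3) -> float:
    """Estimate cluster-wide local disk needed for the hot tier, in GB.

    Assumes a steady producer rate; `headroom` is an arbitrary safety margin
    for bursts, active segments and copies that have not finished yet.
    """
    raw_mb = producer_mb_per_s * local_retention_hours * 3600 * replication_factor
    return raw_mb / 1024 * (1 + headroom)


# Hypothetical workload: 10 MB/s in, 24 hours kept locally, replication factor 3.
print(f"{local_storage_gb(10, 24, 3):.1f} GB")
```

Notice that total retention does not appear in the calculation at all. With tiering, broker disks scale with ingest rate and local retention, not with how much history you keep.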

Looking at access patterns, rather than total volume, will help to right-size local retention. Size topics to best serve the parallel processing required for performant access to cloud-stored data, keeping in mind that the partition count greatly affects read performance. Increasing partitions for those topics processing historical data will increase cold storage throughput by enabling more consumers to read data concurrently. If you want to go deep into Kafka Tiered Storage sizing, my Instaclustr colleague Paul Brebner has you covered.
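To see how partition count shapes a historical replay, consider this simplified estimate. It assumes that each active reader sustains a fixed throughput from remote storage and that throughput scales linearly with the number of readers, which real clusters will only approximate. Within a consumer group, no more consumers than partitions can be reading at once, so the partition count caps the parallelism. The function name and the throughput figure are hypothetical.

```python
def replay_hours(historical_gb: float, partitions: int, consumers: int,
                 remote_mb_per_s_per_reader: float) -> float:
    """Estimate hours needed to re-read historical data from the remote tier.

    Assumes linear scaling and a fixed per-reader throughput (both simplifications).
    """
    readers = min(partitions, consumers)  # extra consumers beyond the partition count sit idle
    if readers <= 0 or remote_mb_per_s_per_reader <= 0:
        raise ValueError("need at least one reader and a positive throughput")
    seconds = historical_gb * 1024 / (readers * remote_mb_per_s_per_reader)
    return seconds / 3600


# Hypothetical replay of 100 GB at 10 MB/s per reader.
for partitions in (1, 4, 16):
    hours = replay_hours(100, partitions, consumers=16, remote_mb_per_s_per_reader=10)
    print(f"{partitions:>2} partitions: {hours:.2f} h")
```

Treat the output as a planning aid for comparing topic layouts, then validate it against measured remote read rates on your own cluster.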

Kafka’s Evolutionary Leap Forward

Kafka Tiered Storage represents the first step in Kafka’s evolution toward invisible infrastructure, freeing development teams from storage management concerns to focus on business logic and application development. By automating the complex decisions around data retention and placement, Tiered Storage allows enterprises to concentrate on extracting value from their data rather than managing its underlying infrastructure.

Future Kafka releases will likely continue this trajectory, further automating operations while optimizing to meet organizations’ increasingly sophisticated data management requirements and rapidly scaling demands.

Anil Inamdar is the global head of Data Services at NetApp Instaclustr, which provides a managed platform around open source data technologies including Cassandra, Kafka, Postgres, ClickHouse and OpenSearch. Anil has more than 20 years of experience in data and...