
URL: https://thenewstack.io/historical-data-and-streaming-friends-not-foes/

Historical Data and Streaming: Friends, Not Foes - The New Stack



Historical Data and Streaming: Friends, Not Foes

The cloud's access to object storage provides near-limitless storage capacity, meaning you can keep data in a stream as long as it makes sense.
Mar 20th, 2023 4:00am by Michael Drogalis
Confluent sponsored this post.

Real-time event streaming has become one of the most prominent tools for software engineers over the last decade. In Stack Overflow’s 2022 Developer Survey, Apache Kafka, the de facto event-streaming platform, is ranked as one of the highest-paying tech skills and most-loved frameworks.

Although Kafka was obscure at its outset, there are now countless stories of companies using it at massive scale for use cases like gaming and ride-sharing, where latency must remain incredibly low. Because these examples are talked about the most, many people believe event streaming (also called data streaming) is only appropriate for use cases with demanding real-time requirements and not suitable for older, historical data. This thinking, however, is shortsighted and highlights a missed architectural opportunity.

Regardless of how fast your business needs to process data, streaming can make your software more understandable, more robust and less vulnerable to bugs — if it’s the right tool for the job. Here are three key factors to think about when you consider adding streaming to your architecture.

Confluent, founded by the original creators of Apache Kafka, pioneered a complete data streaming platform that streams, connects, processes, and governs data as it flows throughout a business. With Confluent, any organization can modernize their business and run it in real-time.

Factor 1: Understand Your Data’s Time/Value Curve

How valuable is your data? That’s a trick question. It depends on when the data point happened. The vast majority of data has a time/value curve. In general, data becomes less valuable the older it gets.

Now, older data hasn’t commonly been something people talk about in the same breath as streaming. Why? Until somewhat recently, most streaming platforms were created to have relatively small storage capacity. This made sense for their initial homes in bare-metal data centers but has become an unsound pattern since nearly everything has moved to the cloud. The cloud’s access to object storage provides near-limitless storage capacity.

Many streaming platforms integrate directly with those stores and carry through the same storage capacity improvements. This matters because it takes forced retention decisions out of the equation when it comes to streaming. You no longer need to decide how long you can keep data in a stream — you simply keep it as long as it makes sense.

One of the most exciting use cases for historical streams is backtesting online machine learning models. Teams often find that when they deploy a trained model to production, they need to change it in some way. But how can they be sure their new model works well? The very best outcome is to test it against all of the historical traffic, and because streaming is lossless, that is exactly what you get.
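
As a rough illustration of backtesting, the sketch below replays a retained stream through two models and compares their accuracy. It assumes the stream's history can be read back as an ordered list of events; the event fields (`amount`, `is_fraud`) and both model rules are hypothetical and stand in for a real consumer that reads a topic from its earliest offset.

```python
from typing import Callable, Iterable

# Hypothetical retained stream: every event ever produced, in order.
HISTORY = [
    {"amount": 20.0, "is_fraud": False},
    {"amount": 950.0, "is_fraud": True},
    {"amount": 400.0, "is_fraud": False},
    {"amount": 700.0, "is_fraud": True},
]

def current_model(event: dict) -> bool:
    # Deployed model (illustrative rule): flag anything above 300.
    return event["amount"] > 300

def candidate_model(event: dict) -> bool:
    # Proposed replacement (illustrative rule): flag anything above 500.
    return event["amount"] > 500

def backtest(model: Callable[[dict], bool], events: Iterable[dict]) -> float:
    """Replay every historical event through the model and return its accuracy."""
    events = list(events)
    correct = sum(model(e) == e["is_fraud"] for e in events)
    return correct / len(events)

current_accuracy = backtest(current_model, HISTORY)
candidate_accuracy = backtest(candidate_model, HISTORY)
print(current_accuracy, candidate_accuracy)
```

Because both models see exactly the same events in the same order, any difference in the scores comes from the models rather than from sampling.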

If your data’s time/value relationship makes sense, streaming is a great way to get value out of both ends of the curve.

Factor 2: Decide on the Direction of Data Flow

In the old days of software engineering, many things were written with polling: periodic checks to see if something happened. For instance, you might periodically poll a database table to see if a row was added or changed. This is a recipe for disaster because many things can change between checks, and a poll shows you only the latest state, not each change that led to it.

Streaming’s superpower is that it forces you to think in terms of lossless, unidirectional dataflows instead of mutable, bidirectional process calls. This gives you a simple model for understanding how systems communicate, regardless of whether data is real time or historical. Instead of polling, you can listen for updates from a system and be sure that you’ll see every change, in the order the changes occurred. To address the example above, change data capture has become the de facto solution for listening to database changes.
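
The difference between the two models can be sketched in a few lines. This is a simplified simulation, not a change data capture implementation: the `writes` list, the in-memory `table` and the `change_log` are all hypothetical stand-ins for a database row and the stream of changes emitted for it.

```python
# Hypothetical sequence of writes to one database row between two polls.
writes = [
    ("order-1", "created"),
    ("order-1", "paid"),
    ("order-1", "refunded"),
]

# Polling: the table holds only the latest value, so a poll taken after
# all three writes sees one state and misses the two before it.
table = {}
for key, status in writes:
    table[key] = status
polled = [table["order-1"]]

# Streaming: each write is appended to a log, and a listener reads
# every entry in the order it was written.
change_log = []
for key, status in writes:
    change_log.append((key, status))
streamed = [status for key, status in change_log if key == "order-1"]

print(polled)
print(streamed)
```

The poller learns only that the order ended up refunded, while the listener also sees that it was created and paid first.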

When you think about whether streams are useful for your problem, set aside latency and ask yourself: Does my system benefit from this kind of push model? Are lossless updates important?

Factor 3: Pick an Expiration Strategy

Unbounded, historical streams are great, but there will always come a time when it makes sense to delete your data, perhaps due to GDPR compliance or changes to your business. How do you reconcile streaming’s key primitive — an immutable log of data — with deletion, a mutable operation?

There are two common ways to address this. The first is to implement expiry policies that let data systems get rid of data after a certain time period, like a time-to-live (TTL). A variation on that is compaction, where a record’s historical revisions get purged after a certain timeframe.
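
A minimal sketch of both policies over an in-memory log follows. The `Record` type, its field names and the sample data are assumptions made for illustration. For brevity, `compact` drops older revisions immediately rather than after a delay, which a real platform would typically make configurable.

```python
from typing import NamedTuple

class Record(NamedTuple):
    key: str
    value: str
    timestamp: int  # seconds on an arbitrary clock (illustrative)

def expire(log, now, ttl):
    """TTL policy: keep only records no older than ttl seconds."""
    return [r for r in log if now - r.timestamp <= ttl]

def compact(log):
    """Compaction: keep only the latest revision of each key, in log order."""
    latest = {}
    for offset, r in enumerate(log):
        latest[r.key] = offset  # later offsets overwrite earlier ones
    keep = set(latest.values())
    return [r for offset, r in enumerate(log) if offset in keep]

log = [
    Record("user-1", "alice@old.example", 100),
    Record("user-2", "bob@example", 200),
    Record("user-1", "alice@new.example", 300),
]

expired = expire(log, now=350, ttl=100)   # only the record from t=300 survives
compacted = compact(log)                  # one record per key survives
print(expired)
print(compacted)
```

TTL removes records by age regardless of key, while compaction keeps the current value for every key and discards only superseded revisions.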

The second is a bit more sophisticated and uses encryption. An encrypted payload is only useful if you have the decryption key. In general, deleting a payload’s encryption key is seen as a mistake, but not if you want to prevent anyone from ever seeing that data again! In some systems, intentionally deleting encryption keys, and then later deleting the actual dataset, is a simple solution to taking data offline.
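
The key-deletion approach can be sketched as follows. Everything here is an assumption for illustration: the per-user `key_store`, the `read` helper, and especially `xor_cipher`, which is a toy stand-in built from the standard library so the example runs anywhere. It is not secure; a real system would use a vetted authenticated cipher and a managed key store.

```python
import hashlib
import secrets

def _keystream(key: bytes, length: int) -> bytes:
    # Toy keystream from SHA-256 in counter mode. NOT secure; illustration only.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor_cipher(key: bytes, data: bytes) -> bytes:
    # XOR with the keystream; applying it twice with the same key restores data.
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))

# Hypothetical per-user encryption keys, stored outside the log.
key_store = {"user-1": secrets.token_bytes(32)}

# The immutable log only ever stores ciphertext.
log = [("user-1", xor_cipher(key_store["user-1"], b"alice@example.com"))]

def read(log, key_store):
    """Return plaintext for records whose key still exists, else None."""
    return [
        xor_cipher(key_store[owner], payload) if owner in key_store else None
        for owner, payload in log
    ]

before = read(log, key_store)
del key_store["user-1"]  # take the data offline by deleting its key
after = read(log, key_store)
print(before, after)
```

The log itself is never modified, which preserves its immutability; the ciphertext simply becomes unreadable once the key is gone and can be physically removed later.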

It’s hard to predict the future of software, but one constant is that there will always be new technologies on the horizon. When you consider streaming for your use case, it’s important to think about these key questions: Is a push model helpful? Is ordered access to older data useful? Is there a simple way to delete old data? If the answer is yes to these questions, you’re investing in streaming technology for the right reasons, and it’s hard to go wrong when you do that.

Michael Drogalis is a principal technologist on the Office of the CTO (OCTO) team at Confluent, where he evaluates noteworthy concepts, trends, and technologies relevant to the data streaming category. Before joining Confluent, Michael served as the CEO of Distributed...
TNS owner Insight Partners is an investor in: Real, Pragma.