VOOZH about

URL: https://thenewstack.io/the-big-data-debate-batch-processing-vs-streaming-processing/

⇱ The Big Data Debate: Batch Versus Stream Processing - The New Stack


TNS
SUBSCRIBE
Join our community of software engineering leaders and aspirational developers. Always stay in-the-know by getting the most important news and exclusive content delivered fresh to your inbox to learn more about at-scale software development.
REQUIRED
It seems that you've previously unsubscribed from our newsletter in the past. Click the button below to open the re-subscribe form in a new tab. When you're done, simply close that tab and continue with this form to complete your subscription.
The New Stack does not sell your information or share it with unaffiliated third parties. By continuing, you agree to our Terms of Use and Privacy Policy.
Welcome and thank you for joining The New Stack community!
Please answer a few simple questions to help us deliver the news and resources you are interested in.
REQUIRED
REQUIRED
REQUIRED
REQUIRED
REQUIRED
Great to meet you!
Tell us a bit about your job so we can cover the topics you find most relevant.
REQUIRED
REQUIRED
REQUIRED
REQUIRED
REQUIRED
Welcome!

We’re so glad you’re here. You can expect all the best TNS content to arrive Monday through Friday to keep you on top of the news and at the top of your game.

What’s next?

Check your inbox for a confirmation email where you can adjust your preferences and even join additional groups.

Follow TNS on your favorite social media networks.

Become a TNS follower on LinkedIn.

Check out the latest featured and trending stories while you wait for your first TNS newsletter.

PREV
1 of 2
NEXT
VOXPOP
As a JavaScript developer, what non-React tools do you use most often?
Angular
0%
Astro
0%
Svelte
0%
Vue.js
0%
Other
0%
I only use React
0%
I don't use JavaScript
0%
Thanks for your opinion! Subscribe below to get the final results, published exclusively in our TNS Update newsletter:
NEW! Try Stackie AI
From clobbered drafts to real-time sync
Apr 14th 2026 10:00am, by David Moore
TypeScript 6.0 RC arrives as a bridge to a faster future
Mar 14th 2026 9:00am, by Darryl K. Taft
Mastra empowers web devs to build AI agents in TypeScript
Jan 28th 2026 11:00am, by Loraine Lawson
2018-06-25 06:00:53
The Big Data Debate: Batch Versus Stream Processing
contributed,

The Big Data Debate: Batch Versus Stream Processing

Jun 25th, 2018 6:00am by Mark Balkenende
👁 Featued image for: The Big Data Debate: Batch Versus Stream Processing
Feature image via Pixabay.
Mark Balkenende, Director of Technical Product Marketing
Mark Balkenende is a Sales Solution Architects Manager at Talend. Prior to joining Talend, Mark has had a long career of mastering and integrating data at a number of companies, including Motorola, Abbott Labs and Walgreens. Mark holds an Information Systems Management degree and is also an extreme cycling enthusiast.

While data is the new currency in today’s digital economy, it’s still a struggle to keep pace with the changes in enterprise data and the growing business demands for information. That’s why companies are liberating data from legacy infrastructures by moving over to the cloud to scale data-driven decision making. This ensures that their precious resource — data — is governed, trusted, managed and accessible.

While businesses can agree that cloud-based technologies are key to ensuring the data management, security, privacy and process compliance across enterprises, there’s still an interesting debate on how to get data processed faster — batch vs. streaming processing.

Each approach has its pros and cons, but your choice of batch or streaming all comes down to your business use case. Let’s dive deep into the debate to see exactly which use cases require the use of batch vs. streaming processing.

What’s the Difference Between Batch and Streaming Processing?

A batch is a collection of data points that have been grouped together within a specific time interval. Another term often used for this is a window of data. Streaming processing deals with continuous data and is key to turning big data into fast data. Both models are valuable and each can be used to address different use cases. And to make it even more confusing you can do windows of batch in streaming often referred to as micro-batches.

While the batch processing model requires a set of data collected over time, streaming processing requires data to be fed into an analytics tool, often in micro-batches, and in real-time. Batch processing is often used when dealing with large volumes of data or data sources from legacy systems, where it’s not feasible to deliver data in streams. Batch data also by definition requires all the data needed for the batch to be loaded to some type of storage, a database or file system to then be processed. At times, IT teams may be idly sitting around and waiting for all the data to be loaded before starting the analysis phase.

Data streams can also be involved in processing large quantities of data, but batch works best when you don’t need real-time analytics. Because stream processing is in charge of processing data in motion and providing analytics results quickly, it generates near-instant results using platforms like Apache Spark and Apache Beam. For example, Talend’s recently announced Talend Data Streams, is a free, Amazon marketplace application, powered by Apache Beam, that simplifies and accelerates ingestion of massive volumes and wide varieties of real-time data.

Is One Better Than the Other?

Whether you are pro-batch or pro-streaming processing, both are better when working together. Although streaming processing is best for use cases where time matters, and batch processing works well when all the data has been collected, it’s not a matter of which one is better than the other — it really depends on your business objective.

However, we’ve seen a big shift in companies trying to take advantage of streaming. A recent survey of more than 16,000 data professionals showed the most common challenges to data science including everything from dirty data to overall access or availability of data. Unfortunately, streaming tends to accentuate those challenges because data is in motion. Before jumping into real-time, it is key to solve those accessibility and quality data issues.

When we talk to organizations about how they collect data and accelerate time-to-innovation, they usually share that they want data in real-time, which prompts us to ask, “What does real-time mean to you?” The business use cases may vary, but real-time depends on how much time to the event creation or data creation relative to the processing time, which could be every hour, every five minutes or every millisecond.

To draw an analogy for why organizations would convert their batch data processes into streaming data processes, let’s take a look at one of my favorite beverages — beer. Imagine you just ordered a flight of beers from your favorite brewery, and they’re ready for drinking. But before you can consume the beers, perhaps you have to score them based on their hop flavor and rate each beer using online reviews. If you know you have to complete this same repetitive process on each beer, it’s going to take quite some time to get from one beer to the next. For a business, the beer translates into your pipeline data. Rather than wait until you have all the data for processing, instead you can process it in micro-batches, in seconds or milliseconds (which means you get to drink your beer flight faster!).

Why Use One Over the Other?

If you don’t have a long history working with streaming processing, you may ask, “Why can’t we just batch like we used to?” You certainly can, but if you have enormous volumes of data, it’s not a matter of when you need to pull data, but when you need to use it.

Companies view real-time data as a game changer, but it can still be a challenge to get there without the proper tools, particularly because businesses need to work with increasing volumes, varieties and types of data from numerous disparate data systems such as social media, web, mobile, sensors, the cloud, etc. At Talend, we’re seeing enterprises typically want to have more agile data processes so they can move from imagination to innovation faster and respond to competitive threats more quickly. For example, data from the sensors on a wind turbine are always-on. So, the stream of data is non-stop and flowing all the time. A typical batch approach to ingest or process this data is obsolete as there is no start or stop of the data. This is a perfect use case where stream processing is the way to go.

The Big Data Debate

It is clear enterprises are shifting priorities toward real-time analytics and data streams to glean actionable information in real time. While outdated tools can’t cope with the speed or scale involved in analyzing data, today’s databases and streaming applications are well equipped to handle today’s business problems.

Here’s the big takeaway from the big data debate: just because you have a hammer doesn’t mean that’s the right tool for the job. Batch and streaming processing are two different models and it’s not a matter of choosing one over the other, it’s about being smart and determining which one is better for your use case.

TRENDING STORIES
SHARE THIS STORY
TRENDING STORIES
SHARE THIS STORY
TRENDING STORIES
TNS DAILY NEWSLETTER Receive a free roundup of the most recent TNS articles in your inbox each day.
The New Stack does not sell your information or share it with unaffiliated third parties. By continuing, you agree to our Terms of Use and Privacy Policy.