URL: https://thenewstack.io/confluent-wants-to-make-batch-processing-a-thing-of-the-past/

Confluent Wants to Make Batch Processing a Thing of the Past - The New Stack


2023-10-03 06:39:35

Confluent Wants to Make Batch Processing a Thing of the Past

And Apache Flink is Confluent's secret weapon to get developers on board.
Oct 3rd, 2023 6:39am by Joab Jackson
Confluent CEO and co-founder Jay Kreps explaining his vision to reporters, Current23.

For Jay Kreps, Confluent CEO and one of the creators of the Apache Kafka distributed event streaming platform, all business work done on computers should be executed in real-time.

A decade ago, as a staff engineer at LinkedIn, Kreps saw, in his words, “a fundamental disconnect” between how businesses operated and the computing systems that supported these businesses.

At the social media service, all the data was generated all the time, and this was true of pretty much every business Kreps knew of. “Business is fundamentally an activity happening. It’s continuous and happens all day, throughout the day,” he recalled.

Yet most enterprise data processing still happens in batch processes: routines set to execute once a day, or at some other scheduled time (mostly at night, when there was an excess of computing power). Dashboards showed last week’s data (and still do); bank customers had to wait three days for a money transfer to come through. Databases captured data, which could then be queried at some later point.

“Batch processing logic, I think, is probably on the decline…”
— Jay Kreps

In effect, the Kafka project, founded at LinkedIn, was about bypassing batch-based systems. It provides a high-throughput, low-latency platform for handling real-time data feeds.

Today, Kafka serves as the basis for operations for many webscale companies including Uber and Lyft, PayPal, Twitter and Netflix, all of whom interact with their customer bases in real-time. Kreps went on to co-found Confluent, which offers an enterprise-supported version of the software.

But data streaming today is still largely seen as a niche, albeit an important one. Kreps wants to see it everywhere, rendering batch processing a relic of the past. And this year, at the company’s annual Current user conference, held in San Jose last week, Kreps introduced a new tool that he feels will win over many more converts to data stream processing: Apache Flink.


What Is Wrong with Data Streaming?

In his Current keynote, Kreps went over the reasons most people consider data streaming a niche technology. The common objections are that stream processing is too difficult for developers to work with and is not as expressive as batch processing, that it does not scale well, that it loses data, and that it is not as efficient.

Kreps shared his vision of how ubiquitous data processing would take care of all these issues.

To date, most organizations keep two sets of data processing systems: one for handling events that have already taken place (last week, last month or whenever) and one that acts on data as it arrives. Most real-time data systems do not also work on historical data, hence the need for at least two separate analysis and processing systems.

But this does not have to be the case, Kreps argued. A data streaming system can be built such that it works on historical data, through batch queries, as well as on new data as it comes in. As an aside, he argued that even most batch processing systems, such as data warehouses, process data through limited stream processes.

There has been a lot of work around building fault-tolerant models for processing in parallel, which makes stream processing systems such as Kafka scalable and transactional.

With Kafka as the streaming hub, everything else can be plugged into it to query and process the data. In effect, it can serve as the database layer for the streaming environment, much as databases have served as the basis for almost all enterprise applications.

This is where Apache Flink comes in. It can provide a unified interface for developers to write against, initially using a language everyone is familiar with, SQL. It can be used for event-driven applications, streaming analytics or streaming data pipelines, scaling to whatever size the job requires.


Enter Flink

In January, Confluent acquired Immerok, a startup with expertise in maintaining a cloud native, fully managed Apache Flink. This led to last week’s launch of Flink on Confluent Cloud.

In the long run, Flink will be every bit as important to Confluent as Kafka itself, Kreps said, in a roundtable interview. What Flink does is give developers tools to query the streaming data in the same way they would interrogate a database or data store itself — through SQL and, next year, Python or Java.

What attracted Confluent to Flink was that it was not a separate system for working only with streaming data. In fact, you can use Flink to build business logic from either streaming or batch data, using the same tools. Unlike other stream processing tools, Flink treats batch (or bounded) data the same way it treats streaming (or unbounded) data.

“Whether you need to do batch processing or stream processing, you can write your code once and run it with both execution models,” further explained Confluent software practice lead David Anderson, in a technical session at the conference.
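The idea of writing the logic once and running it over either kind of input can be illustrated without Flink at all. The following is a minimal plain-Python sketch, not Flink code: the function name `running_totals`, the sample records and the `live_feed` generator are all hypothetical. The same function consumes a finite list (bounded data) and a never-ending generator (unbounded data) without modification.

```python
from itertools import count, islice
from typing import Iterable, Iterator


def running_totals(events: Iterable[tuple[str, int]]) -> Iterator[tuple[str, int]]:
    """Emit (key, running total) after each event, one record at a time.

    The function never asks how long its input is, so it works the same
    on a finite list and on a source that never ends.
    """
    totals: dict[str, int] = {}
    for key, amount in events:
        totals[key] = totals.get(key, 0) + amount
        yield key, totals[key]


# Bounded ("batch") input: a finite list of historical records.
historical = [("alice", 5), ("bob", 3), ("alice", 2)]
batch_result = list(running_totals(historical))


def live_feed() -> Iterator[tuple[str, int]]:
    """Unbounded ("streaming") input: a hypothetical feed that never ends."""
    for i in count():
        yield ("alice" if i % 2 == 0 else "bob", 1)


# Take only the first four outputs so the demonstration terminates.
stream_result = list(islice(running_totals(live_feed()), 4))

print(batch_result)
print(stream_result)
```

In this sketch the only difference between "batch" and "streaming" is whether the source eventually stops, which is the distinction between bounded and unbounded data that Anderson describes.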


Confluent’s David Anderson.

Not only does this lower the barrier to entry for those who want to try data streaming, but it ultimately would streamline the business processes by reducing the number of tools needed to a single set.

“If you look at reality, companies are quite federated across different technologies. How do you put that together and make it all feel like one experience to the customers?” Kreps asked.

Evidently, Confluent is not the only company thinking about this. Amazon Web Services has also put together a data streaming package that relies on Flink and Kafka.

Another advantage of Flink is its easy scalability, Anderson pointed out. Because of the way the APIs are organized, the developer does not need to worry about managing concurrent threads.

“Flink has the runtime architecture needed to achieve really high scale and really high performance. It scales all the way from simple applications, ingesting 1000s of events per day, up to really huge applications ingesting billions of events per second,” Anderson said. “So it’s fault-tolerant, and provides high availability.”

It can handle both stateless and stateful applications, with the state being saved either in local storage or in a small, fast key-value store such as RocksDB.
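To make the stateless/stateful distinction concrete, here is a small plain-Python sketch. It does not use Flink's API: the event shape, the `to_cents` function and the `KeyedSum` class are assumptions made for illustration, and an in-memory dictionary stands in for whatever state store a real deployment would use.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def to_cents(events: Iterable[dict]) -> Iterator[dict]:
    """Stateless step: each output depends only on the current event."""
    for event in events:
        yield {"user": event["user"], "cents": round(event["amount"] * 100)}


class KeyedSum:
    """Stateful step: keeps one running sum per user.

    The dict below is a stand-in for a state store; a production system
    would persist this state so that it survives failures.
    """

    def __init__(self) -> None:
        self.state: dict[str, int] = defaultdict(int)

    def process(self, events: Iterable[dict]) -> Iterator[tuple[str, int]]:
        for event in events:
            self.state[event["user"]] += event["cents"]
            yield event["user"], self.state[event["user"]]


# Hypothetical payment events.
payments = [
    {"user": "alice", "amount": 1.50},
    {"user": "bob", "amount": 2.00},
    {"user": "alice", "amount": 0.25},
]

op = KeyedSum()
results = list(op.process(to_cents(payments)))
print(results)
```

The stateless step could be restarted at any point without losing anything, whereas the stateful step is only correct if its per-key sums are preserved, which is why where the state lives matters.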

Flink can be used in cases where data is being ingested into another system but requires some enhancement. This could be for dashboards, or even for real-time observability and metering. It is also used as the basis of event-driven programming, where one action can trigger another.

In a customer panel with the press, two engineers from website-building giant Wix (Avi Perez, head of backend engineering, and Natan Silnitsky, backend infrastructure tech lead) shared why it was essential to run its event-driven architecture on a streaming platform. The company hosts at least 200 million active websites.

Every time a user hits a button on the site, perhaps to spin up a new service, it triggers hundreds if not thousands of events to deliver that. There is simply no way such an architecture could run, at least not responsively, if it were built on a standard database alone, Silnitsky noted.
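The fan-out pattern described here, where one action triggers further events, can be sketched in a few lines of plain Python. This is an illustrative toy, not Wix's architecture or any Kafka or Flink API: the `EventBus` class and the topic names are hypothetical, and everything runs in a single process rather than on a distributed log.

```python
from collections import defaultdict, deque
from typing import Callable

# A handler receives an event payload and returns follow-up (topic, payload) events.
Handler = Callable[[dict], list[tuple[str, dict]]]


class EventBus:
    """Toy in-memory dispatcher showing how one event can trigger others."""

    def __init__(self) -> None:
        self.handlers: dict[str, list[Handler]] = defaultdict(list)
        self.log: list[str] = []  # topics in the order they were processed

    def subscribe(self, topic: str, handler: Handler) -> None:
        self.handlers[topic].append(handler)

    def publish(self, topic: str, payload: dict) -> None:
        queue = deque([(topic, payload)])
        while queue:
            current_topic, current_payload = queue.popleft()
            self.log.append(current_topic)
            for handler in self.handlers[current_topic]:
                # Each handler may emit more events, which are queued in turn.
                queue.extend(handler(current_payload))


bus = EventBus()
# Hypothetical topics: creating a site triggers provisioning and a welcome email.
bus.subscribe(
    "site.created",
    lambda p: [("domain.provision", p), ("email.welcome", p)],
)
# Provisioning a domain in turn triggers a DNS update.
bus.subscribe("domain.provision", lambda p: [("dns.update", p)])

bus.publish("site.created", {"site_id": 42})
print(bus.log)
```

A single published event here results in four processed events. At the scale Silnitsky describes, the queue would be replaced by a durable, partitioned log so that handlers can run on many machines and recover from failures.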

Joab Jackson is a senior editor for The New Stack, covering cloud native computing and system operations. He has reported on IT infrastructure and development for over 30 years, including stints at IDG and Government Computer News. Before that, he...
AWS and Confluent are sponsors of The New Stack.
TNS owner Insight Partners is an investor in: Pragma.