URL: https://thenewstack.io/data-transformations-apache-flink-vs-redpanda-data-transforms/

Data Transformations: Apache Flink vs. Redpanda Data Transforms 

Let’s compare the two data transformation options across their best features, along with potential use cases and benefits.
Mar 29th, 2024 11:04am by Dunith Danushka
Redpanda sponsored this post.

Transforming a data stream usually requires an external consumer, like a stream processor or an event-driven microservice. This leads to the data first being transferred from the event broker to the consumer and then returned to the broker after transformation. No matter how simple or complicated the transformation is, this data ping-pong results in added latency, increased networking costs and complex stream-processing pipelines.

Here’s where in-broker data transformations step into the limelight, enabling stateless transformation functions like filtering, scrubbing and transcoding to execute within the broker itself, entirely eliminating the need to move data between external systems.

Apache Flink is a popular choice for both stateless and stateful stream processing. However, Redpanda recently announced the public beta of Data Transforms, powered by WebAssembly (Wasm), for simpler and less expensive in-broker data transformations. In brief, Redpanda Data Transforms allows developers to read data, prepare messages and make API calls without the “data ping-pong” for more cost-efficient pipelines and improved data quality for downstream consumers.

Naturally, those working with stream processing workloads are wondering what distinguishes the two technologies and when to use them. Let’s compare Redpanda Data Transforms and Apache Flink across their best features, along with potential use cases and benefits.

When to Use Apache Flink for Data Transformations

A stream processing architecture using Apache Flink. (Source: Flink Docs)

Your Operations Are Complex and Stateful

Choose Apache Flink if your transformations are stateful, meaning that state is maintained across multiple invocations. For example, to count the occurrences of different event types in a topic, the transformation must keep counters in state and also ensure that the state is fault-tolerant.
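
To make the counting example concrete, here is a minimal sketch in plain Python. It is not Flink's API, and the "type" field name is an assumption made for illustration. The point is only that each output depends on every event seen before it, which is what makes the operation stateful.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def count_event_types(events: Iterable[dict]) -> Iterator[Tuple[str, int]]:
    """Yield (event_type, running_count) for each incoming event.

    The Counter is the operator's state: it persists across events,
    so every output depends on the full history of the stream.
    """
    counts = Counter()  # in-memory only; lost if the process crashes
    for event in events:
        event_type = event["type"]  # "type" is a hypothetical field name
        counts[event_type] += 1
        yield event_type, counts[event_type]


stream = [{"type": "click"}, {"type": "view"}, {"type": "click"}]
running_counts = list(count_event_types(stream))
```

The counter here lives in process memory, so a crash loses it. Making that state durable and recoverable is the fault-tolerance requirement the paragraph above refers to.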

Flink excels at stateful stream processing on unbounded streams, which involves aggregations, joins, window operations and event time processing. If your transformation falls into any of these categories, Flink is your best choice.
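
As a rough illustration of a window operation on event time, the sketch below buckets events into fixed-size (tumbling) windows using the timestamp carried by each event rather than its arrival order. It is plain Python, not Flink code. The function name and the (timestamp, key) tuple layout are assumptions, and it ignores what a real engine has to handle, such as unbounded input, late data and state recovery.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[int, str]], window_ms: int
) -> Dict[Tuple[int, str], int]:
    """Count events per (window_start_ms, key), bucketing by event timestamp."""
    counts = defaultdict(int)
    for timestamp_ms, key in events:
        # Align the timestamp down to the start of its window.
        window_start = timestamp_ms - (timestamp_ms % window_ms)
        counts[(window_start, key)] += 1
    return dict(counts)


# The third event arrives out of order but still lands in the 1000 ms window.
events = [(1000, "a"), (2100, "a"), (1500, "a"), (900, "b")]
window_counts = tumbling_window_counts(events, window_ms=1000)
```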

Your Transformations Need to Access Disks and Network

Choose Flink if your transformations need to read from and write to external systems, such as databases, services and file systems.

At the time of this writing, Redpanda Data Transforms doesn’t support making network calls or accessing the local disk. This is where Flink and its rich connector ecosystem come in handy, especially if your transformation is data-intensive.

When to Use Redpanda Data Transforms

How Redpanda Data Transforms simplifies stream processing. (Source: Redpanda Blog)

Your Operations Are Trivial and Stateless

Choose Redpanda Data Transforms if your transformations are stateless. This means the processing of an event does not depend on any events seen in the past, and no history is kept. Stateless transformations are self-contained and the output is purely a function of the input data.
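
A stateless transformation can be pictured as a pure function over a single record. The sketch below is plain Python rather than Redpanda's transform SDK, and the "level" field name is hypothetical. It keeps error events and drops everything else, without consulting any earlier record.

```python
from typing import Optional


def keep_errors_only(record: dict) -> Optional[dict]:
    """Return the record unchanged if it is an error event, otherwise None."""
    # "level" is a hypothetical field name used for illustration.
    if record.get("level") == "error":
        return record
    return None  # None means "drop this record"
```

Because the function holds no state, calling it twice with the same record always gives the same result, regardless of what was processed in between.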

Redpanda Data Transforms is ideal for the following transformations:

  • Redacting sensitive personally identifiable information (PII) from the event payload
  • Transcoding data (for instance, JSON to Avro transformation)
  • Event filtering
  • Field-level encryption
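
As an example of the first item in the list above, here is a minimal redaction sketch. It assumes the payload is a UTF-8 JSON object and that the sensitive fields sit at the top level. The field names are made up, and the function is plain Python rather than code written against Redpanda's SDK.

```python
import json

# Hypothetical field names; a real deployment would define its own list.
SENSITIVE_FIELDS = {"email", "ssn", "password"}


def redact(payload: bytes) -> bytes:
    """Replace the values of sensitive top-level fields with a placeholder."""
    record = json.loads(payload)
    for field in record.keys() & SENSITIVE_FIELDS:
        record[field] = "[REDACTED]"
    # sort_keys gives a deterministic byte output for identical input.
    return json.dumps(record, sort_keys=True).encode("utf-8")
```

Nested objects are left untouched in this sketch. Handling them would require walking the structure recursively.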

Processing Latency Is a Concern

Choose Redpanda Data Transforms if the processing latency of transformations is critical and you absolutely need real-time processing. Redpanda transformations execute in-process, near where the data resides, which reduces processing latency because less data is transferred across the network.

Moreover, stateless transformations lend themselves to parallelization. Since each event can be processed independently, multiple events can be processed simultaneously across multiple processing units or threads. This parallelization can lead to significant performance improvements, especially on multicore or distributed systems.
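
The independence property described above can be sketched with Python's standard thread pool. This says nothing about how Redpanda schedules transforms internally; it only shows that a stateless function can be applied to many records concurrently without coordination. The helper name is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def transform_in_parallel(
    records: Sequence[str], transform: Callable[[str], str], workers: int = 4
) -> List[str]:
    """Apply a stateless transform to every record using a pool of workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order, even though the
        # individual calls may finish in any order.
        return list(pool.map(transform, records))


upper_cased = transform_in_parallel(["a", "b", "c"], str.upper)
```

In CPython, threads do not speed up CPU-bound work because of the global interpreter lock, so treat this as an illustration of the structure rather than a benchmark.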

Your Transformations Should Support Multiple Programming Languages

Choose Redpanda Data Transforms if you want to give developers the freedom to pick their preferred language for the transformations.

Each transformation is deployed inside a WebAssembly (Wasm) engine that runs within a Redpanda broker. Wasm allows developers to write code in languages like C/C++ and Rust, or in higher-level languages like Python or TypeScript, and then compile it to Wasm bytecode. This bytecode is then distributed across a Redpanda cluster and deployed. As of this writing, Redpanda supports Wasm transformations written in Go and Rust. Support for C++ and JavaScript is well underway, and Kotlin support is planned for later.

Despite Flink’s support for polyglot programming with Java, Scala and Python, Wasm offers a broader range of options while providing performance, portability, security and flexibility.

You Don’t Need to Manage an Additional System

Flink is a distributed system that requires additional resource provisioning, monitoring and staffing. Sometimes using such a system for a trivial task, such as redacting a password, can be overkill.

The Verdict

Apache Flink is a great choice if most of your workloads are stateful and embody complex processing logic, such as event time semantics, state fault tolerance and large-scale aggregations. If your transformation use case is stateless and latency-sensitive, and you want to keep costs down, go with Redpanda Data Transforms.

For example, if you’re kicking off a streaming data project and your requirements are simple, like masking a PII field or converting JSON to CSV, you might not want to manage an external distributed system just for that.
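
To show how small such a requirement can be, here is a sketch of a JSON-to-CSV conversion for a single record using only Python's standard library. The function name and the idea of passing an explicit column list are assumptions, and missing fields are written as empty strings.

```python
import csv
import io
import json
from typing import List


def json_to_csv_line(payload: bytes, columns: List[str]) -> str:
    """Convert one JSON object into a single CSV line with the given column order."""
    record = json.loads(payload)
    buffer = io.StringIO()
    # csv.writer handles quoting of commas, quotes and newlines inside values.
    csv.writer(buffer).writerow([record.get(column, "") for column in columns])
    return buffer.getvalue().rstrip("\r\n")  # drop the writer's line terminator


line = json_to_csv_line(b'{"id": 7, "name": "Doe, Jane"}', ["id", "name"])
```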

If you’re still on the fence or have a different question, ask away in the Redpanda Community on Slack.

Dunith Danushka is a senior developer advocate at Redpanda Data where he creates developer-friendly content about building modern streaming data applications. He has over 10 years of experience designing, building and operating real-time, event-driven architectures and loves to share his...