URL: https://thenewstack.io/how-to-introduce-real-time-data-predictions-with-redpanda/

How to Introduce Real-Time Data Predictions with Redpanda

In sectors that handle high volumes of data in real time, Redpanda Data Transforms can prepare data for machine learning on the fly.
Mar 22nd, 2024 8:18am by Christina Lin
Featured image by Robert Anasch on Unsplash.
Redpanda sponsored this post.

In the world of machine learning, change is the only constant. The traditional reliance on large, batch-processed data sets is giving way to a more dynamic, real-time approach to data. This evolution is being driven by the understanding that being able to process and analyze data in real time is not just an advantage — it’s a necessity.

This is particularly true in sectors like the food delivery ecosystem, where customer expectations and business needs can switch at the drop of a hat. Here, streaming data engines emerge as key players transforming the landscape of data processing and machine learning.

The Predicament with Batch-Processed Data

Food delivery time prediction has traditionally relied on batch-processed data. This method, while somewhat effective, often leads to stale insights due to the latency between data collection and processing. The data variables typically include the delivery partner’s mode of transport, age, ratings and the crucial metric of distance between the restaurant and delivery location.

Enter Streaming Data: The Real-Time Revolution

In recent years, the food delivery industry experienced a tremendous spike in demand. This surge, partially driven by the pandemic, highlighted the painful limitations of batch-processed data models and underlined the need for real-time data processing. Real-time data processing allows immediate insights and adaptability — key components in an industry driven by time-sensitive customer expectations.

Streaming technologies like Apache Kafka bubbled up to solve the challenges created by the influx of real-time data. Kafka, known for its ability to handle high-throughput data streams, provides the backbone for real-time data ingestion and processing. However, Kafka’s architecture, while robust, often requires additional components for data transformation and processing.

Redpanda is a modern implementation of the Kafka API positioned as a more streamlined alternative to Kafka. It addresses some of Kafka’s complexities by providing a simpler setup and operational experience for developers.

For example, Redpanda Data Transforms is powered by WebAssembly (Wasm) and allows in-place data processing. This means data can be cleaned, transformed and prepared for machine learning models directly within the Redpanda broker, eliminating the need for additional data-processing layers.

Implementing Redpanda in Real-Time Predictive Models

To illustrate Redpanda’s role in machine learning (ML) applications that handle high volumes of data in real time, I’ll continue the example of a food delivery service.

Architecture of how Redpanda fits into a real-time delivery service powered by machine learning (Source: Redpanda)

In the “food delivery time” prediction model, Redpanda’s architecture involves these key components:

  • Data ingestion: Order and delivery data arrives from various sources and is often raw and unstructured, which presents the first challenge.
  • Instant data transformation: Once the data is ingested, a custom Go transform, compiled to Wasm and run through Redpanda Data Transforms, processes it on the fly. This includes calculating the missing “distance” metric, a critical feature for this predictive model. This is feature engineering in ML terms: key data features are created or transformed to improve model accuracy. Because the transformation happens in real time, features can be created and modified as soon as data arrives.
  • ML model training with TensorFlow: The transformed data is then fed into an ML model built with TensorFlow. TensorFlow I/O handles consumption of the real-time data streams, allowing the model to be continuously updated with fresh data. Note, however, that initial training still requires a batch of historical data to establish a baseline.
  • Model deployment and inference: Once trained, the model is deployed for real-time inference. As new data streams in, the model dynamically adjusts its predictions, providing up-to-date delivery time estimates.
  • User-facing application: The final component is a user-facing application that uses the model’s predictions to provide customers and delivery partners with accurate, real-time delivery estimates.
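
The “distance” feature from the transformation step can be sketched as follows. The article's transform is written in Go and compiled to Wasm; this Python version only illustrates the logic. It assumes each raw record is JSON carrying restaurant and delivery coordinates under hypothetical field names (restaurant_lat, restaurant_lon, delivery_lat, delivery_lon), and it uses the haversine great-circle formula, which is one plausible choice rather than a confirmed detail of the original transform.

```python
import json
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def add_distance(raw: bytes) -> bytes:
    """Decode one JSON record, attach a 'distance' field, and re-encode it.

    The field names are hypothetical; the source does not specify the schema.
    """
    record = json.loads(raw)
    record["distance"] = round(
        haversine_km(
            record["restaurant_lat"], record["restaurant_lon"],
            record["delivery_lat"], record["delivery_lon"],
        ),
        3,
    )
    return json.dumps(record).encode("utf-8")
```

A function shaped like add_distance (bytes in, bytes out) mirrors how a per-record transform behaves: it reads one record from the input topic and emits one enriched record for the output topic.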

Set Up the Infrastructure

The following diagram illustrates the setup process, which involves several key steps.

Components of the proposed food delivery service infrastructure. (Source: Redpanda)

1. Simulate Data Streams

A Python script simulates the continuous flow of data, mimicking real-world scenarios of frequent order updates.
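
A simulator along these lines might look like the sketch below. The event schema, value ranges and transport modes are assumptions made for illustration; the source only says that the variables include the delivery partner's mode of transport, age and ratings, and that distance is missing from the raw stream. Sending each payload to a Redpanda topic is left out, since the source does not name the client library or topic.

```python
import json
import random
import time
from typing import Iterator

TRANSPORT_MODES = ["bicycle", "scooter", "motorcycle", "car"]  # hypothetical values


def simulate_orders(count: int, seed: int = 0, delay_s: float = 0.0) -> Iterator[bytes]:
    """Yield `count` JSON-encoded order events, optionally pausing between them."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    for order_id in range(count):
        event = {
            "order_id": order_id,
            "partner_age": rng.randint(18, 60),
            "partner_rating": round(rng.uniform(1.0, 5.0), 1),
            "transport": rng.choice(TRANSPORT_MODES),
            # Raw coordinates only: the distance is derived later by the transform.
            "restaurant_lat": round(rng.uniform(12.90, 13.00), 6),
            "restaurant_lon": round(rng.uniform(77.50, 77.60), 6),
            "delivery_lat": round(rng.uniform(12.90, 13.00), 6),
            "delivery_lon": round(rng.uniform(77.50, 77.60), 6),
        }
        yield json.dumps(event).encode("utf-8")
        if delay_s:
            time.sleep(delay_s)  # mimic the pacing of real order updates
```

Each yielded payload could then be produced to the input topic with any Kafka-compatible Python client.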

2. Configure the Cluster

A Redpanda cluster is set up to handle the data streams. This involves configuring the number of brokers and setting up Redpanda Console for monitoring.

3. Deploy Data Transformations

The Go data transform is deployed with Redpanda’s rpk transform deploy command. This ensures that the transformation logic is applied uniformly across all broker nodes.

Data is processed in the broker of the partition it is sent to, and the result is written directly into memory. (Source: Redpanda)

Initialize the Redpanda Data Transforms project with rpk transform init, then build the transform into a WebAssembly (Wasm) module with rpk transform build.

Deploy the module to the Redpanda cluster with rpk transform deploy. Redpanda distributes the deployed module across all brokers in the cluster, which is vital for load balancing and fault tolerance. Regardless of which broker manages a particular partition or topic, the transform logic is available locally to process the data. This reduces latency and increases efficiency, since there’s no need to move data across the network for processing.

4. Train the TensorFlow Model

The TensorFlow model is trained using both historical batch data and real-time data streams consumed through TensorFlow I/O. This hybrid approach helps ensure the model benefits from the depth of historical data while staying agile with real-time updates.
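
The hybrid idea can be shown with a deliberately tiny stand-in. The sketch below is not TensorFlow and not the article's model: it swaps in a plain linear regressor trained by stochastic gradient descent, purely to show the two phases, a baseline fit on historical data followed by incremental updates from streamed records. The class name, learning rate and sample values are all hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], float]  # (features, delivery time in minutes)


class OnlineLinearModel:
    """Linear regressor updated one sample at a time (a stand-in for a real model)."""

    def __init__(self, n_features: int, lr: float = 0.05):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def update(self, x: Sequence[float], y: float) -> None:
        """One SGD step on squared error for a single streamed sample."""
        err = self.predict(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err

    def fit_batch(self, samples: Sequence[Sample], epochs: int = 500) -> None:
        """Baseline training: repeated passes over the historical batch."""
        for _ in range(epochs):
            for x, y in samples:
                self.update(x, y)
```

In use, fit_batch would run once over historical deliveries, and update would then be called for each new record arriving from the stream, so predictions drift toward current conditions without retraining from scratch.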

Wasm assists in preprocessing data into the desired format and prepares it for ML model training. (Source: Redpanda)

To stream data directly from Redpanda topics into a TensorFlow data set, configure the data set to ingest data from the “model data” topic on a Redpanda cluster. The main processing loop handles data in batches: It accumulates messages, and then shuffles and decodes them before using them for training. Subsequently, the model is trained for one epoch with each batch and then saved and exported.
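
The processing loop described above can be sketched without any streaming or ML dependency. This version assumes messages arrive as JSON-encoded bytes and takes the training step as a callback (train_one_epoch, a hypothetical name), where a real pipeline would read from the topic through TensorFlow I/O and fit a TensorFlow model. Saving and exporting the model are omitted.

```python
import json
import random
from typing import Callable, Iterable, List


def train_from_stream(
    messages: Iterable[bytes],
    train_one_epoch: Callable[[List[dict]], None],
    batch_size: int = 32,
    seed: int = 0,
) -> int:
    """Accumulate messages into batches, shuffle and decode each, then train on it.

    Returns the number of batches passed to `train_one_epoch`.
    """
    rng = random.Random(seed)
    buffer: List[bytes] = []
    batches = 0

    def flush() -> None:
        nonlocal batches
        rng.shuffle(buffer)                        # shuffle within the batch
        decoded = [json.loads(m) for m in buffer]  # decode raw bytes into records
        train_one_epoch(decoded)                   # one epoch over this batch
        buffer.clear()
        batches += 1

    for message in messages:
        buffer.append(message)
        if len(buffer) >= batch_size:
            flush()
    if buffer:  # train on the final partial batch as well
        flush()
    return batches
```

Training on the trailing partial batch is a design choice in this sketch; a pipeline on an unbounded stream might instead wait until the next full batch has accumulated.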

Advantages and Future Applications

Integrating Redpanda in predictive modeling offers several advantages:

  • Reduced latency: By processing data in real time, the latency between data collection and insight generation is significantly reduced.
  • Dynamic model updates: The continuous data flow allows the model to adapt and improve over time, leading to more accurate predictions.
  • Streamlined architecture: Performing data transformations within the broker reduces the need for additional data-processing layers, simplifying the overall architecture.

This approach, while demonstrated through the example of food delivery time prediction, has far-reaching implications. It can be applied to many sectors where real-time data analysis is crucial, such as financial markets, health-care monitoring and smart city management.

Modern streaming-data engines like Redpanda aren’t just transforming the way we handle data — they’re reshaping the future of real-time ML applications. As we continue to explore and innovate, the possibilities are as vast and exciting as the data streams we seek to harness.

Christina Lin is the Director of Developer Advocacy at Redpanda Data where she turns innovative data streaming solutions into easily accessible content for everyone to learn from. She has 20+ years of experience in software development and has worked as...