URL: https://thenewstack.io/real-time-stream-processing-apps-edge-computing-and-kafka/

Real-Time Stream Processing Apps, Edge Computing, and Kafka

Many different use cases can benefit from Kafka's low response times and ultra-low latency, enabling real-time processing at the edge.
Dec 27th, 2021 7:27am by Joshua Bradley
Featured image via Pixabay
Cox Edge sponsored this post.
Joshua Bradley
Josh is the director of technology and a former DevOps engineer at Cox Edge, America’s first last-mile edge cloud provider. He’s a longtime member of the Cox Communications family, starting his career there 20 years ago, and an early Kubernetes advocate.

It was 3 a.m.

In a network operations center (NOC), engineers were alert and ready to spring into action as data points flowed across their screens. Occasionally, they lifted their heads to look up at giant monitors populated with colorful dashboards.

No one spoke. After a few minutes, something flashed across the computer screens, and the dashboards lit up. The engineers yelled in excitement, and customer support reached for the hotline.

These engineers belonged to a CDN company's live-streaming command center, set up to monitor live OTT (over-the-top) streams produced by a major sports broadcaster. The CDN company's objective was to alert the customer to possible live-stream failures before the customer realized that a fault had occurred. The dashboard alerts showed TCP (Transmission Control Protocol) retransmits, and the sports broadcaster was thrilled to be alerted a full minute before its own in-house network operations center received the warning.

It might seem magical that real-time alerts like these are even possible for large-scale online events, but for companies using the power of edge computing combined with Apache Kafka, it’s just another routine monitoring event.

So, What Is Apache Kafka?

Apache Kafka is an open source distributed event-streaming platform that enables real-time stream processing applications at the edge. Applications in industrial IoT, telecommunications, healthcare, automotive and other use cases can benefit from Kafka’s low response times and ultra-low latency.

Working with Kafka requires an understanding of its architecture as well as its terminology. Central to Kafka’s architecture is the Kafka broker, generally configured as a cluster of servers. A Kafka broker receives and stores data from producers. Producers are sources of data that send data to a Kafka broker. Data sources could be connected cars, IoT devices in factories, other smart devices, patient health data and other sources.

Topics are where data is written and logically divided: each topic groups a stream of related messages. A topic is split into multiple partitions spread across numerous brokers to achieve parallelism, resulting in better throughput and lower latency. A retention time can be configured, at the end of which messages in a topic expire. This keeps storage space available for new incoming data.
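To make the idea of partitions concrete, here is a minimal sketch in plain Python of how records with the same key can be routed to the same partition. It is an illustration only: `choose_partition` and the device IDs are hypothetical, and `zlib.crc32` stands in for whatever hash a real Kafka client uses.

```python
import zlib

def choose_partition(key: str, num_partitions: int) -> int:
    """Map a record key to a partition index (crc32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# Hypothetical device IDs used as record keys.
records = [("car-17", "speed=62"), ("car-42", "speed=48"), ("car-17", "speed=64")]

# Three in-memory lists stand in for three partitions of one topic.
partitions = {p: [] for p in range(3)}
for key, value in records:
    partitions[choose_partition(key, 3)].append((key, value))
```

Because the partition is derived from the key, both `car-17` records land in the same partition and keep their relative order, while records with other keys can be written to other partitions in parallel.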

Kafka consumers read data from one or more topics and process the stream of events that producers publish. Producers and consumers are decoupled, which gives producers the freedom to publish data to topics without worrying about whether consumers are consuming it.

Slow consumers are not affected by fast producers. A consumer can come online once, consume messages from a topic and go offline again. Messages are the individual data elements written into log files spread across multiple partitions.

When we write data into a partition, we create a log of data. The log is immutable. It will remain there until its retention time expires. The log allows us to structure and track our data over time as changes occur. Producers can write to the log, and the data can be read by hundreds of different consumers at different points in the log.
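The log described above can be sketched as a small in-memory class. This is a simplified model, not Kafka's storage engine: `PartitionLog`, its methods and the event names are all hypothetical, and timestamps are passed in explicitly so the example is deterministic.

```python
class PartitionLog:
    """Append-only log for one partition; entries are never modified in place."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._entries = []       # (offset, timestamp, value) tuples
        self._next_offset = 0

    def append(self, value, now):
        """Write a value at the end of the log and return its offset."""
        offset = self._next_offset
        self._entries.append((offset, now, value))
        self._next_offset += 1
        return offset

    def read(self, from_offset):
        """Return (offset, value) pairs at or after from_offset."""
        return [(o, v) for o, _, v in self._entries if o >= from_offset]

    def expire(self, now):
        """Drop entries whose age has reached the retention time."""
        self._entries = [e for e in self._entries
                         if now - e[1] < self.retention_seconds]

log = PartitionLog(retention_seconds=60)
for second, event in [(0, "connect"), (30, "retransmit"), (60, "disconnect")]:
    log.append(event, now=second)

dashboard_view = log.read(from_offset=0)   # a consumer starting at the beginning
alerting_view = log.read(from_offset=2)    # another consumer further along
log.expire(now=70)                         # "connect" is now 70 seconds old
after_expiry = log.read(from_offset=0)
```

Each consumer tracks its own position, so two readers can sit at different offsets in the same log without affecting each other. After retention runs, the oldest entry is gone but the surviving entries keep their original offsets.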

[Figure: Log structure data flow]

Each data point recorded in the log is referred to as an event. What distinguishes Kafka as an event-streaming platform is that producers deliver these events continuously and consumers process them continuously.

Consumer Parallelism

On the consumer side, multiple consumers read messages from various partitions across brokers. So instead of reading messages serially, various messages are read at the same time.

This leads to a scalable data pipeline. The parallelism at the producer and consumer ends allows us to scale the system quickly and efficiently, providing the high throughput and low latency environment required by real-time streaming applications.
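As a rough sketch of consumer-side parallelism, the function below spreads partitions over a set of consumers in round-robin fashion. It assumes the consumers cooperate as one group in which each partition is read by exactly one member. `assign_partitions` and the consumer names are hypothetical, and Kafka's real assignment strategies are more involved.

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(sorted(partitions)):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment

# Six partitions shared by three consumers: two partitions each.
balanced = assign_partitions(range(6), ["c1", "c2", "c3"])

# Two partitions and three consumers: one consumer has nothing to read.
oversized = assign_partitions(range(2), ["c1", "c2", "c3"])
```

The second call shows the practical limit of this kind of scaling: adding more consumers than there are partitions leaves the extra consumers idle.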

Kafka brokers use ZooKeeper to manage cluster membership and to elect a cluster controller.

A resilient Kafka deployment at the edge will consist of a minimum of three Kafka brokers, each running on a separate server, and a ZooKeeper ensemble of three members.

This allows us to replicate each Kafka partition three times, so a copy of every partition survives the failure of up to two brokers. The three-member ZooKeeper ensemble, which needs a majority of its members to operate, tolerates the failure of one member.
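The sizing arithmetic behind a three-broker, three-member deployment can be written out in a few lines. This is a back-of-the-envelope sketch with hypothetical function names. It assumes that a partition's data survives as long as one replica remains, and that the ZooKeeper ensemble needs a strict majority of members to keep operating.

```python
def replica_failures_survivable(replication_factor: int) -> int:
    """Brokers that can fail while at least one copy of a partition remains."""
    return replication_factor - 1

def quorum_failures_tolerated(ensemble_size: int) -> int:
    """Members that can fail while a strict majority is still alive."""
    return (ensemble_size - 1) // 2

brokers_lost_ok = replica_failures_survivable(3)   # copies of the data
zk_members_lost_ok = quorum_failures_tolerated(3)  # coordination quorum
```

With three replicas, a copy of the data outlives two broker failures, while a three-member ensemble keeps its majority through one member failure. Growing the ensemble to five members raises that tolerance to two.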

Apache Kafka at the Edge

The architecture for Kafka at the edge can be thought of as a three-layer architecture.

[Figure: Kafka edge architecture diagram]

  • The client layer: IoT devices and the applications running at the customer location (retail stores, coffee shops, hospitals, etc.). IoT devices generate data and send it to the Kafka edge layer.
  • The edge layer: A small cluster of Kafka servers, deployed at the customer's site or in edge data centers, processes data sent to it by IoT devices and returns it to applications that use it for monitoring and reporting.
  • The cloud layer: Larger Kafka clusters residing in the cloud receive data from the edge layer and combine it with data from other customer sites for further aggregation and analysis.
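The division of labor between the edge and cloud layers can be sketched in plain Python. This is an illustrative model only, with no Kafka involved: `edge_summarize`, `cloud_combine`, the site names and the metrics are all hypothetical.

```python
from collections import Counter

def edge_summarize(site, readings):
    """Edge layer: reduce raw (metric, value) readings to totals for one site."""
    totals = Counter()
    for metric, value in readings:
        totals[metric] += value
    return {"site": site, "totals": dict(totals)}

def cloud_combine(summaries):
    """Cloud layer: merge per-site summaries into one cross-site view."""
    combined = Counter()
    for summary in summaries:
        combined.update(summary["totals"])
    return dict(combined)

# Client layer: hypothetical readings from devices at two locations.
store_a = edge_summarize("store-a", [("orders", 3), ("orders", 2), ("errors", 1)])
store_b = edge_summarize("store-b", [("orders", 4)])

all_sites = cloud_combine([store_a, store_b])
```

Each site can act on its own summary locally, and only the compact summaries travel to the cloud for aggregation across sites.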

Let’s look at an example of how Kafka is used in real-time stream processing applications at the edge.

Monitoring and Alerting for Live OTT Streaming

CDN providers can leverage their base CDN offering to provide their customers with deep visibility, 24/7 support and mitigation for live OTT streams.

A fraction of the live streaming data (~10%) is sent over a special-purpose network of servers, spread across edge data centers for monitoring and alerting on stream failures like TCP retransmits, buffering, latency issues and other failures.

Here’s a high-level architectural diagram:

[Figure: High-level architectural diagram]

Producers send live event data to Kafka clusters that store this data in multiple topics across several partitions distributed across multiple Kafka brokers.

Real-time stream processing applications like Spark Streaming divide the incoming streams into micro-batches of specified intervals and process them as a DStream (discretized stream). Internally, a DStream is represented as a continuous series of RDDs (resilient distributed datasets).

Each RDD in a DStream contains data from a specified interval.
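The micro-batching idea can be illustrated without Spark. In the sketch below, plain Python lists stand in for RDDs, and the hypothetical `micro_batches` function groups timestamped events into consecutive fixed-length intervals.

```python
from collections import defaultdict

def micro_batches(events, interval_seconds):
    """Group (timestamp, value) events into consecutive fixed-length intervals."""
    batches = defaultdict(list)
    for timestamp, value in events:
        batches[int(timestamp // interval_seconds)].append(value)
    # One list per interval that saw data, oldest first.
    return [batches[index] for index in sorted(batches)]

# Hypothetical events with timestamps in seconds.
events = [(0.2, "a"), (0.9, "b"), (1.1, "c"), (2.5, "d")]
stream = micro_batches(events, interval_seconds=1)
```

Here the one-second interval yields three batches, each holding only the events that arrived during its interval. For brevity, this sketch skips intervals in which no events arrived.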

[Figure: DStream diagram]

DStreams are processed, and the results can be pushed to NoSQL databases for further processing, aggregation and reporting. In this case, Spark Streaming calculates aggregates and stores the results in an Apache Cassandra database. An example of an aggregate would be “the total number of retransmits within the last 10 seconds.”
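An aggregate such as "the total number of retransmits within the last 10 seconds" is a sliding-window sum. The sketch below shows one way to compute it in plain Python. `SlidingWindowSum` is a hypothetical helper rather than a Spark or Cassandra API, and it assumes samples arrive in non-decreasing timestamp order.

```python
from collections import deque

class SlidingWindowSum:
    """Running total of counts observed within the last window_seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._samples = deque()   # (timestamp, count), oldest first
        self._total = 0

    def add(self, timestamp, count):
        """Record a sample, drop expired ones and return the current total."""
        self._samples.append((timestamp, count))
        self._total += count
        while self._samples and timestamp - self._samples[0][0] >= self.window_seconds:
            _, expired = self._samples.popleft()
            self._total -= expired
        return self._total

retransmits = SlidingWindowSum(window_seconds=10)
totals = [retransmits.add(t, n) for t, n in [(0, 3), (4, 2), (9, 1), (12, 4)]]
```

By the sample at second 12, the one from second 0 has aged out of the window, so it no longer contributes to the total.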

Applications query the Cassandra database and generate alerts when the results show that a monitored metric has breached its predefined threshold. Sending live streams to edge data center servers rather than centralized cloud servers can reduce latency and improve the response times of real-time monitoring and alerting systems like this.
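The threshold check itself is simple. In the sketch below, a dictionary stands in for the results of a database query, and `find_breaches`, the metric names and the limits are hypothetical values chosen for illustration.

```python
def find_breaches(metrics, thresholds):
    """Return (name, value, limit) for every metric above its threshold."""
    return [
        (name, value, thresholds[name])
        for name, value in sorted(metrics.items())
        if name in thresholds and value > thresholds[name]
    ]

# Stand-in for the latest aggregates read back from the database.
latest = {"tcp_retransmits_10s": 120, "buffering_events_10s": 3}
limits = {"tcp_retransmits_10s": 100, "buffering_events_10s": 5}

alerts = find_breaches(latest, limits)
```

Only the retransmit metric exceeds its limit here, so it is the only entry that would be passed on to an alerting channel.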

Cox is the largest private telecom company in America, connecting 6.5 million homes and businesses nationwide. Cox is investing in edge capabilities and working with developers and engineers across the ecosystem to deliver low-latency performance compute services.

Note: Applications that cannot tolerate latencies in the seconds range can use Kafka Streams, which processes records one at a time, in place of Spark Streaming's micro-batches.

Edge computing combined with Kafka can improve processing speed, lower network costs and enable developers to build large-scale real-time stream-processing systems with zero data loss.

TNS owner Insight Partners is an investor in: Real, Pragma.