URL: https://thenewstack.io/stream-data-across-multiple-regions-and-clouds-with-kafka/

Stream Data Across Multiple Regions and Clouds with Kafka

A look at various architectures and use cases for multiple Kafka clusters, all with trade-offs in effort, cost and risks. Make sure you understand them.
Jun 21st, 2023 9:31am by Kai Waehner
Confluent sponsored this post.

Multicluster and cross-data-center deployments of Apache Kafka have become the norm rather than the exception as businesses aim for uptime and reliability. In this article, I'll dive into several scenarios that may require multicluster solutions and showcase real-world examples with their specific requirements and trade-offs, including disaster recovery, aggregation for analytics, cloud migration, mission-critical stretched deployments and global Kafka.

Apache Kafka is a distributed data streaming platform that automatically handles failures, such as issues with a disk or the network, to avoid downtime or data loss. However, a cluster running in a single data center cannot survive the outage of that data center, so Kafka is often deployed across data centers or clouds. Let's explore the use cases, each with its trade-offs and concrete real-world examples.

Confluent, founded by the original creators of Apache Kafka, pioneered a complete data streaming platform that streams, connects, processes, and governs data as it flows throughout a business. With Confluent, any organization can modernize their business and run it in real-time.

1. Disaster Recovery Between Regions

Critical business transactions require failover and recovery in the case of a disaster, such as the outage of a data center or cloud region. Data is replicated in real time between two independent Kafka clusters in separate data centers, cloud regions or even two cloud providers. Both active-active (both clusters serve clients) and active-passive (one cluster is a standby) architectures are possible. If a disaster strikes, applications usually switch to the other cluster, which ensures business continuity.

The biggest trade-off is that replication between the clusters happens asynchronously. Hence, messages that were written to the failed cluster but not yet replicated may be lost. If you need zero data loss, there is a more advanced (and complex) option: stretched clusters.
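
To make this trade-off concrete, here is a toy Python simulation, not Kafka client code. Each cluster is reduced to a single-partition list, and the hypothetical `replicate` function stands in for an asynchronous replication tool that periodically catches up with the producer. Whatever the standby has not copied when the primary fails is lost; that window is the recovery point objective (RPO) you accept with this architecture.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterLog:
    """A single-partition log; the list index doubles as the Kafka offset."""
    records: list[str] = field(default_factory=list)

    def append(self, value: str) -> int:
        self.records.append(value)
        return len(self.records) - 1  # offset of the new record


def replicate(source: ClusterLog, target: ClusterLog, max_records: int) -> None:
    """Hypothetical async replicator: copy up to max_records missing records, in order."""
    start = len(target.records)
    target.records.extend(source.records[start:start + max_records])


def lost_on_failover(source: ClusterLog, target: ClusterLog) -> list[str]:
    """Records written to the primary but never copied to the standby."""
    return source.records[len(target.records):]


primary, standby = ClusterLog(), ClusterLog()
for i in range(10):
    primary.append(f"payment-{i}")
    if i % 3 == 2:  # the replicator only catches up periodically and lags behind
        replicate(primary, standby, max_records=2)

# The primary region fails at this point.
print(lost_on_failover(primary, standby))
# ['payment-6', 'payment-7', 'payment-8', 'payment-9']
```

With this catch-up schedule, the last four payments never reach the standby. In a real deployment, the size of this window depends on replication lag, which is why lag monitoring is part of any disaster recovery setup.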

2. Stretched Clusters for Zero Downtime and Data Loss

A stretched Kafka cluster operates as a single deployment across different data centers or cloud regions, replicating data between them synchronously within the cluster. The benefits are zero downtime and zero data loss, even in the case of a disaster. This architecture can meet the most challenging legal and business requirements.

However, this architecture comes with significant disadvantages and requirements, so I'd recommend it only if there is no other way:

  • Low and stable latency is required between the regions
  • Operation is much more complex than a local Kafka cluster in a single region
  • Additional features, like choosing which data to replicate synchronously (e.g., critical payment data) vs. asynchronously (e.g., non-critical log data), are usually needed and only available in commercial platforms.
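
Part of what makes stretched clusters hard to operate is replica placement. The Python sketch below checks one partition's placement under stated assumptions: producers use acks=all, unclean leader election is disabled (the Kafka default), all replicas are in sync when an outage hits, and the metadata quorum (ZooKeeper or KRaft controllers) survives, which in practice usually requires a third site. The function and its inputs are hypothetical; `min_insync_replicas` corresponds to the `min.insync.replicas` setting.

```python
from collections import Counter


def check_stretched_placement(replica_regions: list[str], min_insync_replicas: int) -> dict:
    """Evaluate one partition's replica placement across regions.

    Assumes acks=all: an acknowledged write is on every in-sync replica (ISR),
    and the ISR may shrink to min.insync.replicas members before writes fail.
    """
    per_region = Counter(replica_regions)

    # No acked data loss: if one region holds min.insync.replicas or more replicas,
    # the ISR could shrink to that region alone, and losing it loses acked writes.
    no_acked_data_loss = max(per_region.values()) < min_insync_replicas

    # Zero downtime: after losing any single region, enough replicas must remain
    # to satisfy min.insync.replicas, otherwise acks=all producers get errors.
    writable = all(
        len(replica_regions) - count >= min_insync_replicas
        for count in per_region.values()
    )
    return {
        "no_acked_data_loss": no_acked_data_loss,
        "writable_after_any_region_outage": writable,
    }


# Three regions, one replica each (replication factor 3, min.insync.replicas=2).
print(check_stretched_placement(["east", "central", "west"], 2))
# {'no_acked_data_loss': True, 'writable_after_any_region_outage': True}

# Two regions, two replicas each (replication factor 4, min.insync.replicas=3).
print(check_stretched_placement(["east", "east", "west", "west"], 3))
# {'no_acked_data_loss': True, 'writable_after_any_region_outage': False}
```

The second placement shows why two-region stretched clusters are tricky: `min.insync.replicas=3` protects acknowledged data but blocks writes after losing a region, while `min.insync.replicas=2` keeps accepting writes but can lose data. Three regions avoid that dilemma, at the cost of cross-region round trips on every acks=all write.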

3. Hybrid Integration Between Data Center and Public Cloud

The Kafka cluster in the data center connects to existing legacy applications like a database, mainframe or on-premise ERP system. The Kafka cluster in the cloud connects to SaaS offerings, cloud native microservices, analytics platforms, etc.

With true decoupling and automatic backpressure handling, Kafka acts not just as a messaging system but also as an event store.

Replication between two or more Kafka clusters is set up with a Kafka-native replication tool, keeping Kafka the single source of truth. This creates many benefits:

  • Avoiding a spaghetti architecture with many point-to-point integrations
  • The heart of the integration is real-time, reliable, and scalable
  • Guaranteed ordering within each partition, even across the data center and cloud
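
Because Kafka guarantees ordering per partition, records that must stay in order share a key and therefore a partition. The toy Python sketch below assumes a replication tool that maps each source partition to the same partition number on the target and copies records in order. As a clearly labeled stand-in, it hashes keys with CRC32; Kafka's default partitioner uses murmur2, so real partition numbers differ, but the property shown is the same.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in hash (CRC32, not Kafka's murmur2): a given key always maps
    # to the same partition, which is what preserves per-key order.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(events: list[tuple[str, str]], num_partitions: int) -> list[list[tuple[str, str]]]:
    partitions: list[list[tuple[str, str]]] = [[] for _ in range(num_partitions)]
    for key, value in events:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


def mirror(source_partitions: list[list[tuple[str, str]]]) -> list[list[tuple[str, str]]]:
    """Copy each partition to the same partition number, keeping record order."""
    return [list(records) for records in source_partitions]


def values_for_key(partitions: list[list[tuple[str, str]]], key: str) -> list[str]:
    return [value for records in partitions for k, value in records if k == key]


events = [("order-1", "created"), ("order-2", "created"), ("order-1", "paid"),
          ("order-2", "cancelled"), ("order-1", "shipped")]
on_prem = produce(events, num_partitions=3)
cloud = mirror(on_prem)
print(values_for_key(cloud, "order-1"))  # ['created', 'paid', 'shipped']
```

Events for `order-1` arrive in the cloud in the order they were produced on premises. No such guarantee exists between `order-1` and `order-2` if they land in different partitions.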

Real-world example: Siemens AG (Berlin and Munich) is a global technology powerhouse that has stood for engineering excellence, innovation, quality, reliability and internationality for more than 170 years. Siemens connected its SAP system to Kafka on premises. By optimizing the SAP integration, it moved business processes and integration workflows from daily or weekly batches to real-time communication. Siemens later migrated from self-managed on-premises Kafka to Confluent Cloud via Confluent Replicator. Integrating Salesforce via Kafka Connect was the first step of Siemens' cloud strategy. More and more projects and applications join the data streaming journey, because once the initial streaming pipeline is built, it is easy to tap into the event stream and connect it to other tools, APIs and SaaS products.

4. Edge Computing and Aggregation in the Data Center or Public Cloud

Each edge site (retail store, factory, etc.) operates a small Kafka cluster (sometimes just a single node without high availability) for edge operations like pre-processing and filtering or advanced analytics with stream processing. The curated data is ingested into the large Kafka cluster in the data center or cloud where the integration with the rest of the IT infrastructure runs, like the data warehouse and the data lake.

This architecture has a few benefits compared to using a different technology at the edge:

  • The core is real time, scalable, and reliable even end-to-end across edge sites and the cloud
  • Using the same technology, API, development tools and vendor for edge and cloud deployments and the replication. This usually enables better end-to-end service-level agreements (SLAs), cost-efficiency and time to market.
  • Disconnected and air-gapped environments can be used for safety- or privacy-critical use cases while analytics operates in elastic and more flexible cloud infrastructure
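
The edge pre-processing step can be illustrated with a small filter-and-aggregate function. In a real deployment, this logic would typically run in a stream processing application (for example Kafka Streams, ksqlDB or Flink) that reads a raw topic on the edge cluster and writes a curated topic, which is then replicated. Here it is plain Python over an in-memory list; the field names and the valid temperature range are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def curate(readings: list[dict], valid_range: tuple[float, float] = (-40.0, 125.0)) -> list[dict]:
    """Filter and aggregate raw sensor readings at the edge before forwarding.

    Drops readings outside valid_range (hypothetical sensor limits) and emits
    one summary record per sensor, shrinking what is sent to the central cluster.
    """
    low, high = valid_range
    by_sensor: dict[str, list[float]] = defaultdict(list)
    for reading in readings:
        if low <= reading["celsius"] <= high:
            by_sensor[reading["sensor"]].append(reading["celsius"])
    return [
        {"sensor": sensor, "count": len(values),
         "avg_celsius": round(mean(values), 2), "max_celsius": max(values)}
        for sensor, values in sorted(by_sensor.items())
    ]


raw = [
    {"sensor": "press-1", "celsius": 71.5},
    {"sensor": "press-1", "celsius": 999.0},  # sensor glitch, filtered out
    {"sensor": "press-1", "celsius": 72.5},
    {"sensor": "oven-2", "celsius": 110.0},
]
print(curate(raw))
```

Four raw readings become two curated summary records, which is the kind of reduction that keeps replication to the data center or cloud cheap.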

Real-world example: A major cruise line implemented one of the most famous use cases for Kafka at the edge. Each cruise ship runs a Kafka cluster locally because internet connectivity at sea is poor and expensive. Use cases include payment processing, loyalty information, customer recommendations, etc. When a ship is back in the harbor with a stable internet connection for a few hours, relevant data is replicated to a large Kafka cluster for big data analytics and other use cases.
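
The cruise ship pattern boils down to store-and-forward replication with a checkpoint. The Python class below is a hypothetical simplification (in-memory lists instead of Kafka topics, a boolean instead of real connectivity), meant only to show why an offset checkpoint lets each harbor visit resume where the previous one stopped.

```python
class ShipToShoreReplicator:
    """Hypothetical store-and-forward replicator for an intermittently connected site."""

    def __init__(self) -> None:
        self.local_log: list[str] = []  # the ship's local topic (simplified)
        self.shore_log: list[str] = []  # the large cluster on shore (simplified)
        self.checkpoint = 0             # next local offset not yet replicated

    def record(self, event: str) -> None:
        # Local writes always succeed, with or without connectivity.
        self.local_log.append(event)

    def sync(self, connected: bool, max_records: int) -> int:
        """Replicate up to max_records while connected; return how many were sent."""
        if not connected:
            return 0
        batch = self.local_log[self.checkpoint:self.checkpoint + max_records]
        self.shore_log.extend(batch)
        self.checkpoint += len(batch)  # advance only after the batch is written
        return len(batch)


ship = ShipToShoreReplicator()
for i in range(5):
    ship.record(f"payment-{i}")
ship.sync(connected=False, max_records=10)  # at sea: nothing leaves the ship
ship.sync(connected=True, max_records=3)    # short harbor window
ship.record("payment-5")
ship.sync(connected=True, max_records=10)   # next harbor visit ships the rest
print(ship.checkpoint, ship.shore_log)
```

Nothing is resent and nothing is skipped across the two harbor windows, because replication always resumes from the stored offset.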

5. Migration from Self-Managed Kafka to a Fully Managed Cloud Service

While many multicluster Kafka deployments run long-term for hybrid integration or disaster recovery, some use cases only require two clusters temporarily, for a planned infrastructure migration with a final cutover. Two common scenarios are the migration from open source Kafka to a commercial vendor or cloud service, and the move from on-premises infrastructure to the public cloud.
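
Before the final cutover, you typically want evidence that the target has caught up. Here is a minimal Python sketch, assuming you can obtain two hypothetical inputs per partition from your own tooling after pausing producers on the source: the source log end offset, and the next source offset the replication tool still has to copy. Whether consumer offsets also need translation at cutover depends on the replication tool, so that step is left out.

```python
def ready_for_cutover(source_end_offsets: dict[int, int],
                      replicated_up_to: dict[int, int]) -> tuple[bool, dict[int, int]]:
    """Return (ready, lag_per_partition) for a planned migration cutover.

    source_end_offsets: log end offset per source partition, read after pausing
        producers on the source (hypothetical input).
    replicated_up_to: next source offset the replication tool still has to copy,
        per partition (hypothetical input; how to obtain it depends on the tool).
    """
    lag = {
        partition: end - replicated_up_to.get(partition, 0)
        for partition, end in source_end_offsets.items()
    }
    # Cut over only when every partition has zero remaining lag.
    return all(remaining == 0 for remaining in lag.values()), lag


ready, lag = ready_for_cutover({0: 1200, 1: 980, 2: 1010},
                               {0: 1200, 1: 975, 2: 1010})
print(ready, lag)  # False {0: 0, 1: 5, 2: 0}
```

One common sequence is to pause producers, wait until this check passes, move consumers to the new cluster and then move producers; partition 1 above still has five records to go.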

6. Multiple Kafka Clusters Are the Norm, Not an Exception!

This article showed various architectures and use cases for multiple Kafka clusters. All alternatives have trade-offs in effort, cost and risk. Make sure to begin the evaluation with your service-level agreement requirements (downtime, data loss, compliance, security) before digging deeper into the potential deployment options. Many projects are multiyear journeys. Kafka allows you to connect legacy and cloud native applications with any kind of protocol (Kafka, message queue, file, database, etc.) or communication paradigm (real-time, batch, request-response) and progress at your own pace. The heart of the infrastructure and data replication is real-time, scalable and reliable.

Many tools exist on the market for replication between Kafka clusters. MirrorMaker 2 is part of open source Apache Kafka. More advanced commercial tools bring additional benefits. For instance, Confluent Cluster Linking uses the native Kafka protocol for replication. This makes operations much easier and less costly, and provides more capabilities for critical scenarios like failover in case of a disaster, or for common security requirements like initiating the connection from the source site.
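
For orientation, here is a small Python helper that renders a minimal MirrorMaker 2 configuration for one-way replication from a `primary` to a `backup` cluster. The property names (`clusters`, `<alias>.bootstrap.servers`, `<source>-><target>.enabled`, `<source>-><target>.topics`, the replication factors, `emit.checkpoints.enabled` and `sync.group.offsets.enabled`) are MirrorMaker 2 settings; the cluster aliases, hostnames and the helper itself are hypothetical. Check the documentation for your Kafka version, since defaults and options have changed across releases.

```python
def mm2_properties(source: str, target: str, bootstrap: dict[str, str],
                   topics: str = ".*", replication_factor: int = 3) -> str:
    """Render a minimal MirrorMaker 2 (dedicated mode) config for source -> target."""
    rf = str(replication_factor)
    props = {
        "clusters": f"{source}, {target}",
        f"{source}.bootstrap.servers": bootstrap[source],
        f"{target}.bootstrap.servers": bootstrap[target],
        # Enable only the one-way flow; "topics" is a regex of topics to mirror.
        f"{source}->{target}.enabled": "true",
        f"{source}->{target}.topics": topics,
        # Emit consumer group checkpoints and write translated group offsets
        # to the target, so consumers can resume there after a failover.
        "emit.checkpoints.enabled": "true",
        "sync.group.offsets.enabled": "true",
        # Replication factor for mirrored topics and MirrorMaker 2's internal topics.
        "replication.factor": rf,
        "checkpoints.topic.replication.factor": rf,
        "heartbeats.topic.replication.factor": rf,
        "offset-syncs.topic.replication.factor": rf,
        "offset.storage.replication.factor": rf,
        "status.storage.replication.factor": rf,
        "config.storage.replication.factor": rf,
    }
    return "\n".join(f"{key} = {value}" for key, value in props.items()) + "\n"


config = mm2_properties(
    "primary", "backup",
    {"primary": "primary-kafka:9092", "backup": "backup-kafka:9092"},  # hypothetical hosts
)
print(config)
```

The output can be saved to a file and passed to `bin/connect-mirror-maker.sh`. With the default replication policy, mirrored topics get the source alias as a prefix on the target (`orders` becomes `primary.orders`). Cluster Linking is configured differently and is not shown here.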

Whether you choose open source, a commercial platform or a cloud service, make sure to understand the trade-offs between the different architectures and products. Also be aware that even the best technologies alone do not make a critical multicluster Kafka project successful. Get help from trusted experts who do similar projects on a daily basis to understand all the best practices and trade-offs.

Kai Waehner is field CTO at Confluent. He works with customers and partners across the globe and with internal teams like engineering and marketing. Kai’s main area of expertise lies within the fields of Data Streaming with Apache Kafka and...