URL: https://thenewstack.io/pulsar-nifi-better-together-for-messaging-streaming/


Pulsar, NiFi: Better Together for Messaging, Streaming

Cloudera and StreamNative have released a new open source integration between Apache Pulsar and Apache NiFi.
Mar 10th, 2022 11:25am by Susan Hall

In a combination seemingly as natural as Reese’s marriage of peanut butter and chocolate, Cloudera and StreamNative have released a new open source integration between Apache Pulsar and Apache NiFi. The two together create a cloud native, scalable, real-time streaming data platform that can ingest, transform, and analyze massive amounts of data.

“[NiFi is] a really nice way to get data in and out of Pulsar very easily and very fast. It’s a really nice way to be able to build streaming applications very simply with low code or no code,” said Tim Spann, developer advocate at StreamNative and a longtime contributor to the NiFi project.

StreamNative was founded by the original creators of Apache Pulsar. Many of NiFi’s creators have worked on the technology since its origins at the National Security Agency (NSA) and stayed with it through Hortonworks’ acquisition of Onyara in 2015. Cloudera bought out Hortonworks in 2018.

While there’s been an open source connector between the two for a while, it wasn’t up to date, Spann said, so he decided it was time to do something about that. The two companies, working with the open source communities, brought the two projects into sync and put the integration through its paces against their existing test cases.

With this update, users can consume and produce messages from Pulsar topics at scale with simple configuration settings within Apache NiFi. Cloudera is making four processors available with its Cloudera Dataflow for Data Hub 7.2.14 and newer.

“Cloudera is putting it out there as the first supported processor from another company, so that’s nice to see,” Spann said. “The NiFi ecosystem’s growing, the Pulsar ecosystem’s growing. It’s nice to see that interaction and overlap between the two projects.”

Apache Pulsar

Apache Pulsar is a distributed messaging and streaming platform originally created at Yahoo! and now a top-level Apache Software Foundation project. Its claim to fame is providing both scalable messaging and scalable streaming. While streaming systems like Apache Kafka can scale, they require a lot of work around data rebalancing, Addison Higham, chief architect at StreamNative, wrote in a blog post for The New Stack.

It uses a distributed publish-subscribe pattern designed to route messages from one endpoint to another without data loss. At its core, Pulsar uses a replicated distributed ledger to provide durable stream storage that can easily scale to retain petabytes of data, making long-term retention of event data feasible.

Pulsar makes scaling easy and provides more flexibility, Higham said in an interview.

“It’s very capable as a streaming system comparable to Kafka, so it can move large amounts of data; it can handle lots of parallelism, but it also has some advantages,” he said.

Pulsar clusters can support millions of different topics, offering organizations more flexibility in the way they use it, he said. Messages might be organized into topics by customer or by user, for example.

“Pulsar’s model actually looks more like a messaging API, so it supports a traditional work queue. You can have as many consumers connected as you would like. And you can get higher throughput for out-of-order processing as well as the flexibility to do traditional messaging and fanout workloads, with a lot of consumers getting their own copy of the message,” he said, explaining that makes it a favored technology among marketing companies.
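The work queue and fanout behavior Higham describes can be illustrated with a toy in-memory model. This is a minimal sketch, not the Pulsar client API: the `ToyTopic` class, its methods, and the subscription and consumer names are all hypothetical. It assumes only the general idea that each subscription sees every message, while consumers sharing one subscription split the messages between them.

```python
from collections import defaultdict


class ToyTopic:
    """In-memory stand-in for a topic; NOT the Pulsar client API."""

    def __init__(self):
        # subscription name -> consumer names attached to it
        self._subscriptions = defaultdict(list)
        # subscription name -> count of messages dispatched so far
        self._dispatched = defaultdict(int)
        # consumer name -> messages delivered to that consumer
        self.delivered = defaultdict(list)

    def subscribe(self, subscription, consumer):
        """Attach a consumer to a (hypothetical) named subscription."""
        self._subscriptions[subscription].append(consumer)

    def publish(self, message):
        """Give every subscription one copy of the message (fanout).

        Within a subscription, consumers take turns round-robin,
        which is the work-queue behavior.
        """
        for subscription, consumers in self._subscriptions.items():
            turn = self._dispatched[subscription] % len(consumers)
            self.delivered[consumers[turn]].append(message)
            self._dispatched[subscription] += 1


topic = ToyTopic()
topic.subscribe("workers", "w1")  # w1 and w2 share one subscription
topic.subscribe("workers", "w2")
topic.subscribe("audit", "a1")    # a1 has a subscription of its own
for i in range(4):
    topic.publish(f"m{i}")
```

In this sketch the two "workers" consumers each receive half of the messages, while the "audit" consumer receives its own copy of all four.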

Rather than having a different cluster for each team, organizations can use one Pulsar cluster and, with NiFi, create a kind of data mesh platform, making enriched data available to less technical users as well, Higham said.

He describes it as one technology that works across a broad range of different use cases and workloads. And at the same time, it aims to provide a lot of simplicity operationally.

“So [you have] this ability to have millions of topics without degrading performance,” he said.

Its users include Tencent, Verizon Media, Comcast and Overstock. In 2020, Splunk unveiled its Pulsar-based Splunk Data Stream Processor (DSP).

Apache NiFi

The NSA made NiFi available to the Apache Software Foundation in 2014. It became a top-level project the next year.

NiFi supports powerful and scalable directed graphs of data routing, transformation and system mediation logic.

This visual tool uses flow-based programming, enabling users to construct data flows that automate moving data from one platform (databases, cloud storage, messaging systems) to another, making data ingestion fast, easy and secure. It also provides event-level data provenance, allowing you to trace every piece of data back to its origin.

It takes care of dataflow-management needs including prioritization, back pressure and edge intelligence.
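Back pressure, one of the needs listed above, is the idea that a full queue makes the upstream step stop sending until the downstream step catches up. The following is a minimal sketch of that idea using a bounded in-memory queue. It is not NiFi code: the `Connection` class, the `threshold` parameter, and the `run_source` helper are hypothetical names chosen for illustration.

```python
from collections import deque


class Connection:
    """Bounded queue between two steps; the threshold is a hypothetical setting."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.queue = deque()

    def is_full(self):
        return len(self.queue) >= self.threshold

    def offer(self, item):
        """Accept the item only while the queue is below its threshold."""
        if self.is_full():
            return False
        self.queue.append(item)
        return True

    def poll(self):
        """Hand the oldest queued item to the downstream step, if any."""
        return self.queue.popleft() if self.queue else None


def run_source(items, connection):
    """Push items until the connection pushes back; return what is left over."""
    pending = deque(items)
    while pending and connection.offer(pending[0]):
        pending.popleft()
    return list(pending)


conn = Connection(threshold=2)
leftover = run_source(["a", "b", "c"], conn)  # "c" is held back
conn.poll()                                   # downstream drains one item
leftover = run_source(leftover, conn)         # now "c" fits
```

The design choice here is that the source keeps whatever it could not send rather than dropping it, so no data is lost while the downstream step is slow.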

The NiFi platform also includes more than 100 pre-built processors that can be used to perform enrichment, routing and other transformations on the data as it flows from the source to destination.

Why the Combo?

NiFi is focused on making it easy to move data between software systems, rather than doing anything with it long term. Pulsar, meanwhile, was designed to act as a long-term repository of event data and provides strong integration with popular stream processing frameworks such as Flink and Spark.

With NiFi, data can be processed and transformed en route, then routed directly to Pulsar’s durable stream storage for long-term retention and made available for a host of more complex streaming processing and analytics use cases.
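The division of labor described above (transform and route in flight, then retain in a durable log) can be sketched in a few lines. This is a toy illustration only, with plain Python lists standing in for topics. The record fields, the topic names "alerts" and "readings", and the helper functions are all assumptions made for the example and do not come from either project.

```python
import json
from collections import defaultdict


def enrich(record):
    """Add a derived field; stands in for an enrichment step in the flow."""
    enriched = dict(record)
    enriched["fahrenheit"] = record["celsius"] * 9 / 5 + 32
    return enriched


def route(record):
    """Pick a (hypothetical) destination topic from the record's content."""
    return "alerts" if record["celsius"] >= 30 else "readings"


def run_flow(records, topics):
    """Enrich each record, then append it to the chosen topic's log."""
    for record in records:
        enriched = enrich(record)
        topics[route(enriched)].append(json.dumps(enriched, sort_keys=True))


# topic name -> append-only list of serialized messages
topics = defaultdict(list)
run_flow(
    [{"sensor": "a", "celsius": 21}, {"sensor": "b", "celsius": 35}],
    topics,
)
```

Once the enriched messages sit in the append-only lists, any number of later readers could replay them, which is the role the article assigns to Pulsar's long-term storage.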

“NiFi is designed to do integration, and it’s really good in grabbing a lot of sources, letting you do your basic enrichment, transformation, lookup routing. Pulsar’s great for fast transportation of messaging and a lot of other things,” said Spann. “With the flexibility of Pulsar, once the NiFi messages are in there, so many other options can be used, whether it’s for streaming applications, work use, lots of different styles of messaging.

“And Pulsar also has gateways to a lot of other messaging protocols, which makes it like we’re connecting two gateways together. Once you get data into one or the other system, you can connect pretty much anywhere in the modern data stack. Regardless of what source or sink it is in, between the two of them, you have all the connections you need.”

The integration consists of four processors: two for publishing data to Pulsar (PublishPulsar and PublishPulsarRecord) and two for consuming data from Pulsar (ConsumePulsar and ConsumePulsarRecord). Two controller services are also included: one for creating Pulsar clients and another for authenticating to secured Pulsar clusters.

In addition to the Cloudera offering, the artifacts can be downloaded directly from the Maven Central repository, or you can build them from the source code.

The Apache Software Foundation hosts a number of streaming technology projects. Analyst Janakiram MSV created a guide to them.

Join StreamNative’s Spann and John Kuchmek, principal solutions engineer at Cloudera, for the meetup: “Apache Pulsar and Apache NiFi for Cloud Data Lakes” today at 3 p.m. PST / 6 p.m. EST.

Susan Hall is the Sponsor Editor for The New Stack. Her job is to help sponsors attain the widest readership possible for their contributed content. She has written for The New Stack since its early days, as well as sites...