URL: https://thenewstack.io/apache-beam-will-make-big-difference-organization/

Future-Proof Your Big Data Processing with Apache Beam - The New Stack



Future-Proof Your Big Data Processing with Apache Beam

Jul 8th, 2016 10:02am by Jesse Anderson
Jesse Anderson
Jesse Anderson is a Data Engineer, Creative Engineer and CEO of Smoking Hand. He trains at companies ranging from startups to Fortune 100 companies on Big Data. This includes training on cutting edge technology like Apache Kafka, Apache Hadoop and Apache Spark.

I was sitting there at Strata SJC 2016 eating lunch and chatting with two engineers. They were talking about how they’re excited to go back to work and rewrite their system to use the new frameworks they’ve just learned about. The engineer in me thinks “awesome!” The business person in me thinks “I hope they run it by their manager first. That’s a massive time sink and probably a waste of time.”

I seriously doubt they ran it by their manager first. They probably spent a few days writing and debugging code that won’t help the business’s bottom line.

You might sit back and think “Our engineers are more disciplined than that. We’d never have to rewrite code for a new platform.” If your organization went from MapReduce to Spark, then, yes, your team rewrote code for a new platform. And they shouldn’t have had to.

Stopping the Rewrites

Your team’s code was written to directly use the MapReduce API. Their rewrite to use Spark was, in turn, written directly to Spark’s API. The next time there is a new platform, there will be another direct API rewrite and so on.

How do we stop rewriting code for a new platform? We start using an intermediary API and stop writing directly to an API.

This is where Apache Beam comes in. You write only to its API and then choose a technology to run the code on. Writing the code and executing it are decoupled.

Beam would change the lunch conversation above to “I learned about this new technology. I’m going to make a configuration change and see if our jobs run faster.” You’re no longer doing rewrites to change from one technology to another.
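To make the configuration-change idea concrete, here is a minimal sketch in Python. It is a toy and not Beam’s API: `Pipeline`, `run_sequential`, `run_threaded`, `RUNNERS` and the `config` dictionary are all hypothetical names invented for illustration. The point is only that the job is written once against an intermediary API, and the execution engine is picked by a setting.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Iterable, List


class Pipeline:
    """Hypothetical intermediary API: an ordered list of per-element steps."""

    def __init__(self) -> None:
        self.steps: List[Callable[[Any], Any]] = []

    def apply(self, fn: Callable[[Any], Any]) -> "Pipeline":
        self.steps.append(fn)
        return self

    def process(self, element: Any) -> Any:
        # Run one element through every step, in order.
        for fn in self.steps:
            element = fn(element)
        return element


def run_sequential(pipeline: Pipeline, data: Iterable[Any]) -> List[Any]:
    """Toy engine #1: process elements one at a time."""
    return [pipeline.process(x) for x in data]


def run_threaded(pipeline: Pipeline, data: Iterable[Any]) -> List[Any]:
    """Toy engine #2: process elements on a thread pool (order is preserved)."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(pipeline.process, data))


# The engine is looked up by name, so switching engines is a config change.
RUNNERS: Dict[str, Callable[[Pipeline, Iterable[Any]], List[Any]]] = {
    "sequential": run_sequential,
    "threaded": run_threaded,
}


def run(pipeline: Pipeline, data: Iterable[Any], config: Dict[str, str]) -> List[Any]:
    return RUNNERS[config["runner"]](pipeline, data)


# The job itself is written once and never mentions an engine.
pipeline = Pipeline().apply(str.strip).apply(str.upper)
records = [" spark ", " flink ", " storm "]

seq = run(pipeline, records, {"runner": "sequential"})
thr = run(pipeline, records, {"runner": "threaded"})
```

Both calls run the same pipeline definition; only the `runner` value in the configuration differs. A real engine swap involves more than a dictionary lookup, but the division of labor is the same.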

Placing Difficult Bets

Technical leaders face the difficult decision of placing a long-term bet on rapidly changing technologies. We’re already seeing a transition from Hadoop MapReduce to Spark. Those who chose Hadoop MapReduce didn’t make a strategic mistake. A better batch processing engine came along later and they were forced to rewrite.

Now leaders face another difficult question. Which batch or stream processing engine should you use? Spark, Flink, Storm, or another up-and-coming technology? This is where Beam becomes even more interesting. It uses the same API for both batch and stream (real-time) processing.
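As a rough illustration of one API serving both modes, the sketch below applies the same transform to a bounded list (batch) and to an unbounded generator (stream). This is plain Python, not Beam: `parse_and_filter` and `endless_events` are hypothetical names, and real stream processing involves concerns, such as windowing and late data, that this toy ignores.

```python
import itertools
from typing import Iterable, Iterator


def parse_and_filter(events: Iterable[str]) -> Iterator[int]:
    """Hypothetical transform: parse integers and keep the even ones, lazily."""
    for raw in events:
        value = int(raw)
        if value % 2 == 0:
            yield value


# Batch: the input is bounded, so the whole result can be collected.
batch_input = ["1", "2", "3", "4"]
batch_result = list(parse_and_filter(batch_input))


def endless_events() -> Iterator[str]:
    """Stand-in for an unbounded source: emits "1", "2", "3", ... forever."""
    for n in itertools.count(1):
        yield str(n)


# Stream: the input never ends, so take results as they arrive.
# Here we stop after the first three for the sake of the example.
stream_result = list(itertools.islice(parse_and_filter(endless_events()), 3))
```

The transform is written once and has no idea whether its input ever ends. That is the property that matters when the same job has to serve both batch and streaming use cases.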

Which one of these technologies is the next big thing? Which one will be the one that gains a community, general acceptance, and lives on? I couldn’t tell you, but I don’t have to tell you now. As these new technologies come out, you make configuration changes and move to the new framework.

One API to Rule Them All

Simplicity saves businesses time and money. Think about how many APIs your engineers need to use to write data processing jobs. They’ll need to know the Hadoop MapReduce API, the Spark API, the Flink API, and possibly others. With a single Beam API, your engineers need to know only one. That same API can handle everything from small jobs over megabytes of data up to large jobs over terabytes and petabytes.

Engineers will need to know only conceptually how each framework, such as Hadoop MapReduce or Spark, works. They’ll need to understand the general tradeoffs among the systems. However, the coding for each one will be the same. That will result in a big productivity increase.

The Future

I can’t tell what’s going to happen in the future for Big Data. This is especially true for streaming frameworks. We used to have to predict what’s coming next to prevent the next rewrite. Beam allows us to future-proof without worrying about the next big thing.

Feature image via Pixabay.
