URL: https://thenewstack.io/apache-hop-harnesses-metadata-to-create-visual-data-pipelines/

Apache Hop Harnesses Metadata to Create Visual Data Pipelines - The New Stack

With Hop's graphical user interface, data workflows and pipelines can be set up visually and described with metadata.
Feb 2nd, 2022 1:19pm by Susan Hall

Apache Hop, the open source metadata-based data engineering and data orchestration platform, was recently named an Apache Software Foundation top-level project.

Everything in Hop is treated as metadata, which lets it work flexibly with hundreds of data platforms and their configurations.

The metadata describes how data should be processed or how workflows and pipelines should be orchestrated.
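
To make the idea concrete, here is a minimal Python sketch of a pipeline that exists only as metadata, plus a small generic engine that interprets it. The JSON layout and the step types (`filter`, `add_constant`, `select`) are invented for illustration; Hop's own pipeline files use a different and much richer format.

```python
import json
from typing import Any, Dict, List

Row = Dict[str, Any]

# Hypothetical metadata: the pipeline is plain data (JSON here), not code.
PIPELINE_JSON = """
[
  {"type": "filter", "field": "country", "equals": "IT"},
  {"type": "add_constant", "field": "source", "value": "crm"},
  {"type": "select", "fields": ["name", "source"]}
]
"""


def apply_step(step: Dict[str, Any], rows: List[Row]) -> List[Row]:
    """Interpret one metadata entry; the engine holds no pipeline-specific logic."""
    kind = step["type"]
    if kind == "filter":
        return [r for r in rows if r.get(step["field"]) == step["equals"]]
    if kind == "add_constant":
        return [{**r, step["field"]: step["value"]} for r in rows]
    if kind == "select":
        return [{f: r[f] for f in step["fields"]} for r in rows]
    raise ValueError(f"unknown step type: {kind!r}")


def run_pipeline(metadata_json: str, rows: List[Row]) -> List[Row]:
    """Parse the metadata, then apply each described step in order."""
    for step in json.loads(metadata_json):
        rows = apply_step(step, rows)
    return rows
```

Calling `run_pipeline(PIPELINE_JSON, [{"name": "Ada", "country": "IT"}, {"name": "Bob", "country": "US"}])` returns `[{"name": "Ada", "source": "crm"}]`. Changing what the pipeline does means editing the JSON, not the engine.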

The project originated more than two decades ago as the Extract-Transform-Load (ETL) platform Kettle, which was acquired by Pentaho (now Hitachi Vantara) and brought to market as Pentaho Data Integration (PDI).

The software was refactored over several years, and a fork of it named Hop, short for Hop Orchestration Platform, entered the Apache Incubator in September 2020.

Hop's drag-and-drop graphical user interface (GUI) lets users set up workflows and pipelines visually, with each one described as metadata. Users don't need specific programming knowledge to design, test and run them; programmers and developers can work from the command line instead.

Hop runs on Java, so it can be used independently of the operating system. It is designed to run anywhere: on-premises, in the cloud, on a bare OS, in containers and in IoT environments, on Windows, Linux and macOS.

The Hop engine uses a kernel architecture that contains only core functionality; everything else is added through plugins. More than 250 plugins ship with the standard installation, and you can easily add your own or third-party plugins.
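
The kernel-plus-plugins pattern can be illustrated with a tiny registry: the core only knows how to look transforms up by name, and every concrete transform, whether bundled or third-party, registers itself. This is a hypothetical Python sketch of the general pattern; `PluginRegistry` and the two transforms are invented names, not Hop's plugin API (Hop's real plugins are Java classes).

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Transform = Callable[[List[Row], dict], List[Row]]


class PluginRegistry:
    """Hypothetical 'kernel': it knows no transforms itself, only how to find them."""

    def __init__(self) -> None:
        self._transforms: Dict[str, Transform] = {}

    def register(self, name: str) -> Callable[[Transform], Transform]:
        """Decorator that installs a transform plugin under `name`."""
        def decorator(fn: Transform) -> Transform:
            if name in self._transforms:
                raise ValueError(f"plugin already registered: {name!r}")
            self._transforms[name] = fn
            return fn
        return decorator

    def get(self, name: str) -> Transform:
        if name not in self._transforms:
            raise LookupError(f"no plugin installed for {name!r}")
        return self._transforms[name]

    def names(self) -> List[str]:
        return sorted(self._transforms)


registry = PluginRegistry()


@registry.register("uppercase")  # stands in for a plugin bundled with the install
def uppercase(rows: List[Row], options: dict) -> List[Row]:
    field = options["field"]
    return [{**r, field: str(r[field]).upper()} for r in rows]


@registry.register("dedupe")  # stands in for a third-party plugin; the kernel is unchanged
def dedupe(rows: List[Row], options: dict) -> List[Row]:
    key, seen, out = options["key"], set(), []
    for r in rows:
        if r[key] not in seen:
            seen.add(r[key])
            out.append(r)
    return out


def run(steps: List[Tuple[str, dict]], rows: List[Row]) -> List[Row]:
    """Execute (plugin name, options) pairs in order, resolving each via the registry."""
    for name, options in steps:
        rows = registry.get(name)(rows, options)
    return rows
```

Adding a capability means registering one more function; `run` and `PluginRegistry` never need to change.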

Hop 1.0, released in October, included a massive architectural redesign and code refactoring toward the current kernel-plus-plugins architecture.

“This architecture significantly improves the development process and allows Hop to adapt to the architecture it needs to be deployed in, not the other way around,” members of the Hop Project Management Committee said in an email.

The integration with Apache Beam allows Hop pipelines and workflows to run not only on Hop's native engine, locally and remotely, but also on Apache Spark, Apache Flink and Google Cloud Dataflow. This lets project teams take their projects to wherever the data lives without modifying their work.
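
That portability rests on separating what a pipeline does from the engine that executes it. The Python sketch below shows the separation in miniature: one pipeline definition, two interchangeable runners. `LocalRunner` and `ThreadPoolRunner` are hypothetical stand-ins; Apache Beam's actual runner model, and its Spark, Flink and Dataflow runners, are far more involved and are not shown here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Protocol

Row = Dict[str, object]
Step = Callable[[Row], Row]


class Runner(Protocol):
    def run(self, steps: List[Step], rows: List[Row]) -> List[Row]: ...


def _apply_all(steps: List[Step], row: Row) -> Row:
    for step in steps:
        row = step(row)
    return row


class LocalRunner:
    """Processes rows one after another in the current thread."""

    def run(self, steps: List[Step], rows: List[Row]) -> List[Row]:
        return [_apply_all(steps, r) for r in rows]


class ThreadPoolRunner:
    """Hypothetical stand-in for a distributed engine: fans rows out to worker threads."""

    def __init__(self, workers: int = 4) -> None:
        self.workers = workers

    def run(self, steps: List[Step], rows: List[Row]) -> List[Row]:
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            # map() preserves input order, so results match LocalRunner's output.
            return list(pool.map(lambda r: _apply_all(steps, r), rows))


# The pipeline definition is written once and never refers to a runner.
STEPS: List[Step] = [
    lambda r: {**r, "total": r["qty"] * r["price"]},
    lambda r: {**r, "large": r["total"] > 100},
]


def execute(runner: Runner, rows: List[Row]) -> List[Row]:
    return runner.run(STEPS, rows)
```

Switching engines is a one-argument change, for example `execute(ThreadPoolRunner(workers=2), rows)` instead of `execute(LocalRunner(), rows)`, while `STEPS` stays untouched.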

Hop supports project life cycle management through best practices, integrated version control, unit testing, projects and deployment environments, and more.

Hop includes a library of integration tests and templates for metadata injection pipelines. With metadata injection, a template pipeline's settings are filled in at runtime instead of being configured by hand, reducing the need for manual development.
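
Metadata injection can be pictured as filling the open properties of a template pipeline from rows of metadata at runtime, so that a single template serves many sources. The Python sketch below is hypothetical: the template layout, the dotted property paths and the `inject` helper are illustrative assumptions, not Hop's own implementation.

```python
import copy
from typing import Any, Dict, List

# A template pipeline whose file and table properties are deliberately left open (None).
TEMPLATE: Dict[str, Any] = {
    "name": "load_table",
    "input": {"file": None, "delimiter": ","},
    "output": {"table": None},
}


def inject(template: Dict[str, Any], values: Dict[str, Any]) -> Dict[str, Any]:
    """Return a concrete pipeline: a deep copy of the template with values injected."""
    pipeline = copy.deepcopy(template)  # the template itself is never modified
    for path, value in values.items():
        section, key = path.split(".")
        if key not in pipeline[section]:
            raise KeyError(f"template has no property {path!r}")
        pipeline[section][key] = value
    return pipeline


# One metadata row per source: the same template yields a pipeline for each at runtime.
SOURCES: List[Dict[str, Any]] = [
    {"input.file": "customers.csv", "output.table": "dim_customer"},
    {"input.file": "orders.csv", "input.delimiter": ";", "output.table": "fact_order"},
]

pipelines = [inject(TEMPLATE, row) for row in SOURCES]
```

Onboarding another source means adding a row to `SOURCES` rather than building and maintaining another pipeline by hand.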

Hop supports multiple projects and environments. Project environments contain the configuration for deploying a project to development, test, production or any other stage of its life cycle.
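
A common way to keep one project deployable to several stages is to reference variables in the project's metadata and let each environment supply the values. This Python sketch assumes the `${NAME}` placeholder style familiar from Kettle-derived tools; the environment names, variables and `resolve` helper are hypothetical.

```python
import re
from typing import Dict

# Hypothetical per-environment settings; the project metadata itself never changes.
ENVIRONMENTS: Dict[str, Dict[str, str]] = {
    "dev": {"DB_HOST": "localhost", "DB_NAME": "sales_dev"},
    "prod": {"DB_HOST": "db.example.com", "DB_NAME": "sales"},
}

_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve(text: str, environment: str) -> str:
    """Replace each ${NAME} in `text` with the value defined by the chosen environment."""
    variables = ENVIRONMENTS[environment]

    def lookup(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"{name} is not defined in environment {environment!r}")
        return variables[name]

    return _VAR.sub(lookup, text)


# A connection string stored once in the project, with environment-specific parts left open.
CONNECTION_URL = "jdbc:postgresql://${DB_HOST}:5432/${DB_NAME}"
```

`resolve(CONNECTION_URL, "dev")` yields `jdbc:postgresql://localhost:5432/sales_dev`, and switching to `"prod"` changes only the values, never the stored metadata. Failing loudly on an undefined variable keeps a misconfigured environment from silently producing a broken connection string.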

Project files are version controlled through the Git integration in Hop GUI's file explorer, which lets users, for example, visually compare two versions of a workflow or pipeline.

Using the Hop GUI, developers and engineers can manage the entire project life cycle: switching between projects, environments and runtime configurations, managing Git versions and more.

“We started adopting Apache Hop in our data integration projects in early 2021 because of its flexibility, scalability and ease of use, in various scenarios ranging from classical DWH ETL processes to highly critical, real-time processes,” said Sergio Ramazzina, CEO and chief architect at Italian business analytics firm Serasoft S.r.l., and member of the Apache Hop Project Management Committee.

“We are impressed by how responsive the community is in solving issues and helping users approaching the platform — an important point to increase users’ adoption and trust.”

The Hop community continued working on the Hop 1.1.0 release throughout the graduation process, according to the committee.

Hop 1.1.0 will include work on more than 200 tickets, covering numerous UI improvements and bug fixes, Apache Tika support and asynchronous web services.

Beyond extended integration with Apache Beam, work continues on new and deeper integrations with other Apache projects, such as Airflow and PLC4X.

“In the longer term, we’ll build a marketplace where third-party plugins can easily be shared, a new GUI for improved monitoring, logging, debugging and previewing of pipelines and lots of other exciting new features the community is already working on,” the committee said.

“We consider the graduation as a top-level project as the start of an exciting new era for Hop.”
