
URL: https://thenewstack.io/how-apache-airflow-better-manages-machine-learning-pipelines/




How Apache Airflow Better Manages Machine Learning Pipelines

In this episode of The New Stack Makers, three technologists who all work with the Amazon Web Services Managed Service for Airflow team talked about improving the Apache Airflow user experience.
Jun 8th, 2023 3:13pm by Alex Williams
AWS sponsored this post.

VANCOUVER — What is apparent with Apache Airflow, the open source project for building pipelines in machine learning? The experience is getting even easier, as illustrated in a discussion on The New Stack Makers with three technologists from Amazon Web Services.

Apache Airflow is a Python-based platform to programmatically author, schedule and monitor workflows. It is well suited to machine learning work: building pipelines, managing data, training models and deploying them.

Airflow is generic enough for the whole pipeline in machine learning. Airflow fetches data and performs extraction, transformation and loading (ETL). It tags the data, does the training, deploys the model, tests it and sends it to production.
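
To make the idea of a whole pipeline expressed in Python concrete, here is a minimal sketch that uses only the standard library. It is not Airflow's API. The stage names are hypothetical labels taken from the description above, and the sketch only shows how a set of dependent stages can be declared as a graph and run in dependency order, which is the general model a workflow scheduler builds on.

```python
from graphlib import TopologicalSorter

# Hypothetical stage names based on the description above. Each stage maps to
# the set of stages that must finish before it can start.
PIPELINE = {
    "fetch_data": set(),
    "etl": {"fetch_data"},
    "tag_data": {"etl"},
    "train_model": {"tag_data"},
    "deploy_model": {"train_model"},
    "test_model": {"deploy_model"},
    "promote_to_production": {"test_model"},
}


def run_pipeline(stages):
    """Visit every stage once, in an order that respects the dependencies."""
    completed = []
    for name in TopologicalSorter(stages).static_order():
        # A real task would do work here; this sketch only records the order.
        completed.append(name)
    return completed


order = run_pipeline(PIPELINE)
```

In Airflow itself, the same ordering would be declared with operators inside a DAG rather than with a plain dictionary.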

Since its inception, Amazon Web Services (AWS) has been the best place for customers to build and run open source software in the cloud. AWS is proud to support open source projects, foundations, and partners.

In an On the Road episode of Makers recorded at the Linux Foundation’s Open Source Summit North America, our guests, who all work with the AWS Managed Service for Airflow team, reflected on the work on Apache Airflow to improve the overall experience:

Dennis Ferruzzi, a software developer at AWS, is an Airflow contributor working on AIP-49, which will update Airflow’s logging and metrics backend to the OpenTelemetry standard. The update will allow for more granular metrics and better visibility into Airflow environments.

Niko Oliveira, a senior software development engineer at AWS, is a committer/maintainer for Apache Airflow. He spends much of his time reviewing, approving and merging pull requests. A recent project included writing and implementing AIP-51 (Airflow Improvement Proposal), which modifies and updates the Executor interface in Airflow. It gives Airflow a more pluggable architecture, which makes it easier for users to build and write their own Airflow Executors.

Raphaël Vandon, a senior software engineer at AWS, is an Apache Airflow contributor working on performance improvements for Airflow and leveraging async capabilities in AWS Operators, the part of Airflow that allows for seamless interactions with AWS.

“The beautiful thing about Airflow that has made it so popular is that it’s so easy,” Oliveira said. “For one, it’s Python. Python is easy to learn and pick up. And two, we have this operator ecosystem. So companies like AWS and Google and Databricks are all contributing these operators, which really wrap their underlying SDK.”

‘That Blueprint Exists for Everyone’

Operators are like generic building blocks. Each operator does one specific task, Ferruzzi said.

“You just chain them together in different ways,” he said. “So, for example, there’s an operator to write data to [Amazon Simple Storage Service]. And then there’s an operator that will send the data to an SQL server or something like that. And basically, the community develops and contributes to these operators so that the users, in the end, are basically saying the task I want to do is pull data from here. So I’m going to use that operator, and then I want to send the data somewhere else.

“So I’m going to go and look at, say, the Google Cloud operators and find one that fits what I want to do there. It’s cross-cloud. You can interact with so many different services and cloud providers. And it’s just growing. We’re at 2,500 contributors now, I believe. And it’s just like people find a need, and they contribute it back. And now that block, that blueprint exists for everyone.”
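
The chaining Ferruzzi describes can be sketched as a toy model in plain Python. This is not Airflow code: the `Task` class, the `run_chain` helper and the three task names are hypothetical, and passing a payload directly from one task to the next is a simplification. The one detail borrowed from Airflow is the `>>` operator, which Airflow also uses to say that one task runs before another.

```python
class Task:
    """Toy stand-in for an operator: one named unit of work."""

    def __init__(self, task_id, action):
        self.task_id = task_id
        self.action = action  # callable that takes and returns a payload
        self.downstream = []

    def __rshift__(self, other):
        # `a >> b` records that b runs after a and returns b, so that
        # chains such as `a >> b >> c` read left to right.
        self.downstream.append(other)
        return other


def run_chain(first, payload):
    """Run a linear chain starting at `first`, passing the payload along."""
    log = []
    task = first
    while task is not None:
        payload = task.action(payload)
        log.append(task.task_id)
        task = task.downstream[0] if task.downstream else None
    return payload, log


# Hypothetical blocks: pull rows from one place, transform, send elsewhere.
pull = Task("pull_from_storage", lambda _: ["a", "b"])
upper = Task("transform", lambda rows: [r.upper() for r in rows])
send = Task("send_to_database", lambda rows: {"written": len(rows), "rows": rows})

pull >> upper >> send
result, log = run_chain(pull, None)
```

Each block does one thing, and swapping `send` for a block that targets a different service would leave the rest of the chain unchanged.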

Airflow 2.6 has an alpha for sensors, Vandon said. Sensors are operators that wait for something to happen. There are also notifiers, which are placed at the end of the workflow and act depending on whether the workflow succeeded or failed.

As Vandon said, “It’s just making things simpler for users.”
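
The two ideas, waiting for something to happen and reacting to the outcome of a workflow, can be illustrated with a small standard-library sketch. None of these names come from Airflow: `wait_for`, `run_workflow`, `file_has_arrived` and `record` are hypothetical, and the polling interval and timeout values are arbitrary choices for the example.

```python
import time


def wait_for(condition, poke_interval=0.01, timeout=1.0):
    """Toy sensor: call `condition` repeatedly until it returns True.

    Returns True on success and False if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(poke_interval)
    return False


def run_workflow(steps, notify):
    """Run steps in order, then call `notify` with the outcome."""
    try:
        for step in steps:
            step()
    except Exception as exc:
        notify("failed", str(exc))
        return False
    notify("success", "")
    return True


# Hypothetical condition that becomes true on the third check.
checks = {"count": 0}


def file_has_arrived():
    checks["count"] += 1
    return checks["count"] >= 3


messages = []


def record(status, detail):
    # Toy notifier: a real one might send an email or a chat message.
    messages.append((status, detail))


def wait_step():
    if not wait_for(file_has_arrived):
        raise TimeoutError("file never arrived")


def failing_step():
    raise ValueError("bad input")


ok = run_workflow([wait_step], record)
bad = run_workflow([failing_step], record)
```

The notifier is called exactly once per workflow run, with a status that tells it whether to report success or failure.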

Alex Williams is founder and publisher of The New Stack. He's a longtime technology journalist who did stints at TechCrunch, SiliconAngle and what is now known as ReadWrite. Alex has been a journalist since the late 1980s, starting at the...
TNS owner Insight Partners is an investor in: Pragma, Databricks.