URL: https://thenewstack.io/apache-airflow-3-0-from-data-pipelines-to-ai-inference/

Apache Airflow 3.0: From Data Pipelines to AI Inference - The New Stack


AI / Data / Python

Latest edition provides DAG versioning, remote execution capabilities, a range of scheduling options, and more.
Apr 28th, 2025 7:00am by Jelani Harper
Approximately 10 years ago, Apache Airflow launched with a relatively simple, yet timeless premise. It was initially devised as a means to allow developers and data engineers to write data pipelines as code.

With the recent release of the 3.0 version of the solution, the increasingly popular open source workflow management resource now offers a host of new features to support enterprise-scale applications.

There are version controls for data pipelines — termed Directed Acyclic Graphs (DAGs) — enhanced security features, and constructs underpinning AI inference execution.

Driven by an ever-active open source community, the platform’s new developments are significantly enlarging the array of use cases it supports.

Although still a mainstay of data integration and data orchestration efforts, it’s now expanding into deployments of data science and machine learning.

According to Vikram Koka, Chief Strategy Officer at Astronomer and Apache Airflow Committer, “As Airflow adoption has grown, now we see 30% of our users using Airflow for MLOps. We’re seeing 10% of our users using it for generative AI applications.”

The 3.0 release has several capabilities that support each of these developments, while reinforcing its core value proposition of servicing data-driven workflows via Python-based code.

DAG Version Controls

One of the most broadly applicable features of Airflow 3.0 is the versioning it provides for data pipelines, or DAGs.

Prior to the release, the system functioned as though users solely cared about the most recent version of the code for these tasks.

The new version of the platform lets developers see previous incarnations of DAGs, as well as a multitude of other relevant concerns, including “all the operation elements,” Koka said. “Logs, diagnostics, metrics… everything about that, you can actually go back now and look at.”

This functionality is pivotal when multiple developer teams work on DAGs, or even on different parts of the same DAG. It's also helpful when inheriting data pipelines whose original authors have switched jobs or projects.

The most common use case is likely debugging: working out why parts of a data pipeline are broken, or how they can be improved to maximize efficiency.

Airflow’s DAG versioning is fairly detailed, including aspects of the prior history of DAGs, like “What were the logs of that prior history; what was the structure of that prior history,” Koka commented. “How long did it take to run in a historical version? Being able to look at all the DAG runs based on prior incarnations of that pipeline or DAG then becomes more important.”
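
To make the idea concrete, here is a minimal, self-contained Python sketch of versioned pipeline history. It does not use Airflow's API: `DagHistory`, `register`, `record_run`, and `runs_for` are hypothetical names, and detecting a new version by hashing the task graph is an assumption made for illustration, not a description of how Airflow 3.0 stores versions.

```python
import hashlib
import json
from dataclasses import dataclass, field


def structure_hash(structure: dict[str, list[str]]) -> str:
    """Fingerprint a task graph given as {task: [upstream tasks]}."""
    canonical = json.dumps(
        {task: sorted(deps) for task, deps in structure.items()}, sort_keys=True
    )
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]


@dataclass
class DagHistory:
    """Hypothetical store that keeps every structure a pipeline has had."""

    versions: list[dict] = field(default_factory=list)
    runs: list[dict] = field(default_factory=list)

    def register(self, structure: dict[str, list[str]]) -> int:
        """Record a new version only when the latest structure changed."""
        digest = structure_hash(structure)
        if not self.versions or self.versions[-1]["hash"] != digest:
            self.versions.append({"hash": digest, "structure": structure})
        return len(self.versions)  # 1-based version number

    def record_run(self, version: int, duration_s: float, log: str) -> None:
        """Tie a run's duration and log to the version it executed under."""
        self.runs.append({"version": version, "duration_s": duration_s, "log": log})

    def runs_for(self, version: int) -> list[dict]:
        """Return every recorded run that executed under the given version."""
        return [run for run in self.runs if run["version"] == version]


history = DagHistory()
v1 = history.register({"extract": [], "load": ["extract"]})
history.record_run(v1, 42.0, "load ok")
v2 = history.register({"extract": [], "transform": ["extract"], "load": ["transform"]})
history.record_run(v2, 57.5, "transform ok; load ok")

# The old structure and its runs remain inspectable after the pipeline changes.
print(history.versions[v1 - 1]["structure"])
print(history.runs_for(v1))
```

Because each run carries the version it executed under, questions such as "how long did the previous incarnation take?" become a simple lookup rather than guesswork.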

Decoupled Security Enhancements

Airflow 3.0 has also strengthened its security features to make the platform suitable for enterprise-scale production deployments. Its chief security upgrade is separating task execution from the administration, scheduling, and overall orchestration capabilities the solution provides.

Airflow’s server components now include “an API server which can basically read and write into the Airflow metadata database,” Koka said. “And then, we provide what’s called a task SDK, which is a client component, which is initially in Python. So, all the user-defined code only runs in the context of this task SDK.”

With this paradigm, the task SDK’s code doesn’t directly connect to Airflow’s metadata database, preventing worker processes from directly writing to it. Instead, the jobs specified in the task SDK interface with the API server to report and receive the status of jobs. The result is “a stronger security access control posture,” Koka explained. Koka also mentioned that a task SDK for Golang will be available imminently, and that community members have been asking for a task SDK with support for Rust.
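
The separation Koka describes can be illustrated with a small Python sketch. Everything here is hypothetical: `MetadataDB`, `ApiServer`, and `TaskClient` are stand-ins invented for this example rather than Airflow classes, and the in-process method calls take the place of what would be network requests in a real deployment. The point is only the shape of the boundary: user code runs inside a client that can reach the API, and the API alone touches the database.

```python
class MetadataDB:
    """Stand-in for the metadata database; only the API server holds one."""

    def __init__(self) -> None:
        self._task_states: dict[str, str] = {}

    def write_state(self, task_id: str, state: str) -> None:
        self._task_states[task_id] = state

    def read_state(self, task_id: str) -> str:
        return self._task_states.get(task_id, "none")


class ApiServer:
    """The single component allowed to read and write the database."""

    ALLOWED_STATES = {"running", "success", "failed"}

    def __init__(self, db: MetadataDB) -> None:
        self._db = db

    def report_state(self, task_id: str, state: str) -> None:
        # Validation happens server-side, so a worker cannot write arbitrary data.
        if state not in self.ALLOWED_STATES:
            raise ValueError(f"unsupported state: {state!r}")
        self._db.write_state(task_id, state)

    def get_state(self, task_id: str) -> str:
        return self._db.read_state(task_id)


class TaskClient:
    """Hypothetical client in the role of the task SDK: it knows the API only."""

    def __init__(self, api: ApiServer) -> None:
        self._api = api

    def run(self, task_id: str, user_code) -> str:
        """Run user code and report each state change through the API."""
        self._api.report_state(task_id, "running")
        try:
            user_code()
            self._api.report_state(task_id, "success")
        except Exception:
            self._api.report_state(task_id, "failed")
        return self._api.get_state(task_id)


api = ApiServer(MetadataDB())
client = TaskClient(api)
print(client.run("clean_orders", lambda: None))  # user code succeeds
print(client.run("bad_task", lambda: 1 / 0))     # user code raises
```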

Remote Execution

One of the more compelling consequences of the decoupling of Airflow’s task execution capabilities from its other core functions is that it effectively allows for tasks to run wherever users would like them to. In some cases, this latitude can reinforce security and data governance controls — like running jobs on data complying with financial industry regulations in a private cloud, so it doesn’t leave a particular data center.

For this use case, such data “can be orchestrated centrally but still remain completely local to that particular datacenter for data sovereignty,” Koka said.

The API server provides the centralized orchestration Koka referenced, while its separation from the task SDK enables jobs to run in completely different clusters, in public or private clouds, or wherever else organizations specify. “You might have some ML jobs which would benefit from GPUs,” Koka commented.

“You can run those on a completely separate GPU cluster. You don’t need to add the expense of having those GPUs on your same cluster. You can just go and rent the GPU cluster when the need comes up.”
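
As a rough illustration of central orchestration with distributed execution, the sketch below groups tasks by the place they must run. It is plain Python, not Airflow code: the `Task` class, its `target` label, the `route` function, and the target names are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    target: str = "default"  # hypothetical label for where the task must run


def route(tasks: list[Task]) -> dict[str, list[str]]:
    """Group task names by execution target; the plan itself stays central."""
    plan: dict[str, list[str]] = defaultdict(list)
    for task in tasks:
        plan[task.target].append(task.name)
    return dict(plan)


pipeline = [
    Task("ingest_trades", target="private-dc"),  # regulated data stays on site
    Task("train_model", target="gpu-cluster"),   # rented only when needed
    Task("send_report"),                         # everything else runs by default
]

for target, names in route(pipeline).items():
    print(f"{target}: {names}")
```

The orchestrator keeps the whole plan, while each group of tasks can be handed to workers in a different cluster or data center.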

Scheduling Options

The 3.0 release still supports, yet substantially expands beyond, the traditional batch paradigm for scheduling data pipeline jobs. Several modes for scheduling tasks are now available, including:

  • Event-Driven Scheduling: With this option, organizations can trigger workflows based on data changes in external systems. There are also low-latency implications, such as relying on this scheduling method to trigger certain pipeline components based on data arriving in Kafka. “It enables Airflow to be reactive to data changes in the rest of the ecosystem,” Koka mentioned. “It’s more near real-time event processing.”
  • Simultaneous DAG Execution: This scheduling approach is helpful for machine learning model inferences. “You really want to be able to run many of those at the same time,” Koka said. “We’ve added support for inference execution so you can run multiple of these pipelines of incoming data at the same time.”
  • Ad-Hoc Scheduling: Conceptually, there’s some overlap between this scheduling variety and event-driven scheduling. However, “I generally tend to think about it as being something which is triggered based on almost like a human event or human-triggered event,” Koka said. “Something like a mortgage application showing up, or somebody saying I want to run a DAG as the result of an API, which is from some other system, triggered by a human action.”
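
The first two modes in the list above can be sketched together in a few lines of standard-library Python. This is not Airflow's scheduler or its API: `run_pipeline`, `drain_and_run`, and the event fields are hypothetical, and an in-memory queue stands in for an external system such as Kafka. Each queued event starts its own run, and the runs are allowed to overlap.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(event: dict) -> str:
    """Placeholder for one pipeline run, for example scoring one record."""
    return f"run for {event['source']}:{event['id']}"


def drain_and_run(events: "queue.Queue[dict]", max_parallel: int = 4) -> list[str]:
    """Start one run per queued event and let the runs overlap."""
    pending = []
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        while True:
            try:
                event = events.get_nowait()
            except queue.Empty:
                break
            pending.append(pool.submit(run_pipeline, event))
    # Leaving the with-block waits for every submitted run to finish.
    return [future.result() for future in pending]


events: "queue.Queue[dict]" = queue.Queue()
events.put({"source": "kafka", "id": 1})  # data arriving in an external system
events.put({"source": "kafka", "id": 2})
events.put({"source": "api", "id": 3})    # a request triggered by a human action

for line in drain_and_run(events):
    print(line)
```

The third event shows where ad-hoc scheduling fits in this picture: from the runner's point of view it is just another trigger, only one that originates with a person rather than a data change.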

Enterprise Maturation

Apache Airflow 3.0’s DAG versioning controls, security upgrades, remote execution capabilities, and job scheduling flexibility make it useful for a broadening number of use cases. The latest edition also allows for backfills, so organizations can asynchronously rerun missed tasks, monitor their progress, and cancel them.
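
A backfill boils down to finding the intervals that have no completed run and rerunning them. The sketch below shows that logic in plain Python under simplifying assumptions: runs are daily, the helper names `missing_days` and `backfill` are hypothetical, and the reruns happen one after another in the caller, whereas the article describes Airflow running them asynchronously.

```python
from datetime import date, timedelta


def missing_days(start: date, end: date, completed: set[date]) -> list[date]:
    """List each day in [start, end] that has no completed run."""
    days = (end - start).days + 1
    return [
        start + timedelta(days=offset)
        for offset in range(days)
        if start + timedelta(days=offset) not in completed
    ]


def backfill(days: list[date], run, should_cancel=lambda: False) -> dict[date, str]:
    """Rerun each missed day in order, recording progress until cancelled."""
    progress: dict[date, str] = {}
    for day in days:
        if should_cancel():
            progress[day] = "cancelled"
            continue
        progress[day] = run(day)
    return progress


done = {date(2025, 4, 1), date(2025, 4, 3)}
gaps = missing_days(date(2025, 4, 1), date(2025, 4, 5), done)
print(gaps)  # the days without a completed run
print(backfill(gaps, run=lambda day: "success"))
```

The `progress` mapping is what makes the backfill observable, and the `should_cancel` callback is one simple way to let an operator stop it partway through.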

Each of these developments signals the transition of the pipeline authoring and deployment tool from one favored by developers and engineers in back rooms to one deployed in enterprise applications.

Jelani Harper has worked as a research analyst, research lead, information technology editorial consultant, and journalist for over 10 years. During that time he has helped myriad vendors and publications in the data management space strategize, develop, compose, and place...
TNS owner Insight Partners is an investor in: Astronomer.