
URL: https://thenewstack.io/databricks-brings-data-pipeline-service-to-ga/

Databricks Brings Data Pipeline Service to GA - The New Stack


2022-04-05 08:25:15

Databricks Brings Data Pipeline Service to GA

Databricks, the cloud data platform company, has announced the general availability (GA) of Delta Live Tables (DLT).
Apr 5th, 2022 8:25am by Andrew Brust

Databricks, the cloud data platform company that coined the phrase “data lakehouse” and was founded by the creators of Apache Spark, is today announcing general availability (GA) of Delta Live Tables (DLT). DLT is a data transformation and data pipeline service that Databricks launched in preview form, in May 2021.

The New Stack was briefed on DLT’s GA by Databricks Distinguished Software Engineer Michael Armbrust, who created Delta Live Tables, and Databricks CEO Ali Ghodsi. In the briefing, the two explained some of the finer points of DLT that help it avoid being “just another” ETL (extract-transform-load) solution on the market.

Hot Mess, Cool Cleanup

Let’s start with the problem Delta Live Tables seeks to address. As Ghodsi describes it: “… people are… stitching together so many different things. They have the data, they use these tools to get [it] in, but then they have to use Airflow, or maybe they’re using Oozie, they’re writing a bunch of custom ETL scripts, they’re moving it into data warehouses, they’re moving it into data lakes… they have to do their own monitoring to make sure that this stuff doesn’t break… there’s just behind-the-scenes hell, that everybody has to do.”

Now contrast this with Databricks’ view of how things should be: data engineers should only have to provide a declarative specification of the data transformations they wish to perform in a data pipeline, and do it in a language they already know. Moreover, data engineers shouldn’t have to concern themselves with the logistics behind, or special performance considerations around, executing their pipelines. Instead, they should only have to define a spec; the system should then take over, managing execution on an on-demand, continuous or scheduled basis.

In a nutshell, that’s what Delta Live Tables seeks to do.

Sweet Syntactic Sugar

Since Databricks thinks data engineers should be able to build data pipelines with skills they already have, DLT’s bread and butter is SQL and Python code snippets in a notebook.

On the SQL side, the output of a pipeline is defined by a query whose result set indicates an output table’s schema and content. Extensions to the SQL syntax allow specification of “expectations” — data quality rules and actions to be taken when rows of data don’t comply.

On the Python side, rather than writing imperative code, the developer uses extensions to the DataFrame API that provide a declarative syntax for specifying calculations, destination table column names and filter conditions, along with attributes that specify the same data quality “expectations” supported in SQL.
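The “expectations” idea can be illustrated without any Databricks dependency. What follows is a minimal sketch in plain Python, not the DLT API: the `Expectation` class, the `apply_expectations` helper and the three action names (`warn`, `drop`, `fail`) are hypothetical, chosen only for illustration. Each rule pairs a predicate with the action to take when a row does not comply.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]


@dataclass(frozen=True)
class Expectation:
    """A named data quality rule plus the action to take on violation."""
    name: str
    predicate: Callable[[Row], bool]
    on_violation: str = "warn"  # hypothetical actions: "warn", "drop", "fail"


def apply_expectations(
    rows: List[Row], expectations: List[Expectation]
) -> Tuple[List[Row], Dict[str, int]]:
    """Return the rows that survive, plus a per-rule count of violations."""
    violations = {e.name: 0 for e in expectations}
    kept: List[Row] = []
    for row in rows:
        keep = True
        for e in expectations:
            if e.predicate(row):
                continue  # row complies with this rule
            violations[e.name] += 1
            if e.on_violation == "fail":
                raise ValueError(f"expectation {e.name!r} failed for row {row!r}")
            if e.on_violation == "drop":
                keep = False  # "warn" only counts the violation
        if keep:
            kept.append(row)
    return kept, violations


rules = [
    Expectation("valid_id", lambda r: r.get("id") is not None, "drop"),
    Expectation("positive_amount", lambda r: r.get("amount", 0) > 0, "warn"),
]
raw = [
    {"id": 1, "amount": 10.0},
    {"id": None, "amount": 5.0},
    {"id": 2, "amount": -3.0},
]
clean, report = apply_expectations(raw, rules)
print(clean)   # the row with a missing id is dropped; the negative amount is kept
print(report)  # one violation recorded against each rule
```

The point of the sketch is that the rule and its consequence are declared together as data, so the engine, rather than hand-written pipeline code, decides how to enforce and report them.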

In Armbrust’s words: “In both cases… you are giving a declarative description of what tables should exist inside of your lakehouse, and then the system is figuring out how to create and keep those tables up-to-date.”
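Armbrust’s description can be made concrete with a toy model. The sketch below is plain Python and only an analogy for what DLT does: the `table` decorator, the `TABLES` registry and the `materialize` function are hypothetical names, and inferring dependencies from parameter names is purely an illustrative convention. The pipeline author declares what each table contains; the “system” works out what to compute and in what order.

```python
import inspect
from typing import Any, Callable, Dict, List

Rows = List[Dict[str, Any]]
TABLES: Dict[str, Callable[..., Rows]] = {}  # hypothetical registry of declared tables


def table(fn: Callable[..., Rows]) -> Callable[..., Rows]:
    """Declare a table: the function name becomes the table name."""
    TABLES[fn.__name__] = fn
    return fn


def materialize(name: str, cache: Dict[str, Rows]) -> Rows:
    """Build a table, first building any tables named in its parameters.

    No cycle detection: this toy assumes the declarations form a DAG.
    """
    if name not in cache:
        fn = TABLES[name]
        upstream = [materialize(p, cache) for p in inspect.signature(fn).parameters]
        cache[name] = fn(*upstream)
    return cache[name]


@table
def raw_orders() -> Rows:
    return [{"id": 1, "amount": 10}, {"id": 2, "amount": -3}, {"id": 3, "amount": 7}]


@table
def valid_orders(raw_orders: Rows) -> Rows:
    return [r for r in raw_orders if r["amount"] > 0]


@table
def order_totals(valid_orders: Rows) -> Rows:
    return [{"total": sum(r["amount"] for r in valid_orders)}]


built: Dict[str, Rows] = {}
print(materialize("order_totals", built))  # only the final table is requested
print(sorted(built))                       # upstream tables were built along the way
```

Nothing in the three declarations says when or in what order to run; that is left entirely to `materialize`, which is the separation of concerns the quote describes.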

Execution Sans Naivete


Databricks user interface for Delta Live Tables jobs. Note the list of status messages from the previous run at the bottom and the execution graph visualization in the center.

Notebooks with DLT code can be scheduled as a special kind of job in Databricks, which triggers analysis of the notebook’s code and generation of an intelligent execution graph. The analysis permits parallel execution of subtasks that are determined not to have mutual dependencies and proper sequencing of subtasks that do. This allows Databricks to go beyond mere agnostic scheduling of the notebook’s code. As Ghodsi explained it, pipelines generated by other platforms whose execution might be orchestrated by Apache Airflow, for example, would not enjoy such boosted execution.
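The scheduling idea described here, running independent subtasks in parallel while sequencing dependent ones, can be sketched with Python’s standard-library `graphlib` module (Python 3.9 or later). This illustrates the general technique, not Databricks’ implementation; the table names and the `execution_stages` helper are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def execution_stages(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into stages; tasks within a stage have no mutual dependencies."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    stages: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        stages.append(ready)
        sorter.done(*ready)  # mark the stage complete to unlock its dependents
    return stages


# Each key depends on the tables in its set (hypothetical pipeline).
pipeline = {
    "raw_orders": set(),
    "raw_customers": set(),
    "clean_orders": {"raw_orders"},
    "clean_customers": {"raw_customers"},
    "orders_by_customer": {"clean_orders", "clean_customers"},
}

for number, stage in enumerate(execution_stages(pipeline), start=1):
    print(number, stage)
```

In this example the two raw tables form the first stage, the two cleaned tables the second, and the join the third. A real engine would dispatch each stage’s tasks concurrently, and could start a task as soon as its own inputs finish rather than waiting for the whole stage.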

The acceleration this brings is comparable to that of conventional SQL commands executed on a database with a query optimizer. In fact, Spark SQL’s query optimizer is responsible for generating the execution graph in the first place. This makes sense, because Armbrust also created Spark SQL. In addition, Delta Live Tables works for both streaming data and data at rest, since Spark Structured Streaming, also created by Armbrust, works with the same data access constructs used by the rest of the Databricks platform.

Think Different?

To date, most ETL implementations have involved completely code-driven efforts, or the use of a standalone ETL platform with a visual design surface. Delta Live Tables finds a middle ground, taking a code-based yet declarative approach. While the dbt platform takes a similar SQL-based declarative approach, it’s a standalone solution, whereas DLT’s engine is deeply integrated into the very same Databricks platform used for data science and analytics.


Meanwhile, there’s no reason that Databricks couldn’t create a visual designer for DLT that would generate the underlying SQL code. In fact, the Databricks workspace user interface generates a visualization of the execution graph when a job is built around a DLT notebook (as seen in the screenshot above). And while the graph visualization is a management/monitoring feature and not an authoring interface, there’s no reason it couldn’t work in both directions, generally speaking. Maybe that’s why I got the distinct feeling when speaking with Armbrust and Ghodsi that a visual designer might be on the horizon.

A Market Execution Engine, Too

For now, though, Databricks is focused on making its platform an omni-data workbench and execution environment that spans data ingest, exploration, storage, transformation, analytics, data science, machine learning and MLOps. And as Databricks continues to square off with Snowflake in the battle to be the leading independent data cloud provider and ecosystem, its combination of functional breadth and technical depth makes a great deal of sense.

TNS owner Insight Partners is an investor in: Databricks.