
URL: https://thenewstack.io/showdown-at-the-lakehouse-databricks-muscles-up-with-tabular/

Showdown at the Lakehouse: Databricks Muscles Up With Tabular - The New Stack


2024-07-11 13:23:17

Showdown at the Lakehouse: Databricks Muscles Up With Tabular

By acquiring Tabular, Databricks can combine Apache Iceberg expertise with its own Delta Lake format, and promises to unify the increasingly fragmented market for data lakehouses.
Jul 11th, 2024 1:23pm by Joab Jackson

A data warehouse is where you store structured historical data.

Most organizations, once they bought into the value of Big Data, found that not all of their information could be confined to a relational structure. This need led to the data lake, where you put all your unstructured object data in the hope of fishing useful bits out of it later.

And now, as generative AI apps hunger for even more data, a new architecture called the data lakehouse has emerged to keep both structured and unstructured data in the same location, offering ACID-level transactions on object storage. It is a natural fit for streaming data, AI modeling and training, and other new workloads.

With a lakehouse, a retail company, for example, could combine weather forecasts with customer buying data to better stock shelves with the seasonally appropriate items customers want.
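To make the retail example concrete, here is a minimal Python sketch of that kind of combination. All of the data, field names and the `expected_demand` function are hypothetical and invented for illustration; a real lakehouse would run a query like this through an analytics engine over tables in object storage, not over in-memory lists.

```python
from collections import defaultdict

# Hypothetical structured data: average units sold per day, by item and weather.
sales = [
    {"item": "umbrella", "weather": "rain", "units": 120},
    {"item": "umbrella", "weather": "sun", "units": 15},
    {"item": "sunscreen", "weather": "rain", "units": 5},
    {"item": "sunscreen", "weather": "sun", "units": 90},
]

# Hypothetical external feed: the weather forecast for the coming days.
forecast = [
    {"day": "Mon", "weather": "rain"},
    {"day": "Tue", "weather": "rain"},
    {"day": "Wed", "weather": "sun"},
]


def expected_demand(sales, forecast):
    """Join the forecast to sales history on weather and total the units per item."""
    units_by_weather = defaultdict(dict)
    for row in sales:
        units_by_weather[row["weather"]][row["item"]] = row["units"]

    demand = defaultdict(int)
    for day in forecast:
        # Days with weather that has no sales history contribute nothing.
        for item, units in units_by_weather[day["weather"]].items():
            demand[item] += units
    return dict(demand)


demand = expected_demand(sales, forecast)
print(demand)
```

With two rainy days and one sunny day in the forecast, the totals lean toward umbrellas, which is the stocking signal the retailer is after.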

Databricks Buys Tabular

Now Databricks, the originator of the lakehouse concept, wants to unify the field by building an “open data lakehouse.” So last month it purchased the data management company Tabular.

With this purchase, reportedly worth US$1 billion, the company plans to unify the two most popular formats for lakehouses, Apache Iceberg and Databricks’ own Delta Lake.

Tabular was founded by Ryan Blue and Daniel Weeks, who created Apache Iceberg while they were working at Netflix, and Jason Reid.

“A big part of [the acquisition] is having the creators of Iceberg in the company,” said Adam Conway, senior vice president of products at Databricks, in a phone interview with TNS.

The company will work to steer the two projects closer together, he said.

There are some slight differences between the two formats. Delta is really well suited for streaming workloads, whereas Iceberg is built on a strong data catalog, which provides many data management capabilities.

“Our goal is just to make it so that [it] doesn’t matter really which one you choose,” Conway said.

The Emergence of the Lakehouse

About a decade ago, Databricks recognized an emergent behavior of “people using their data lakes as data warehouses,” Conway said.

The beauty of the lakehouse was that it was an architecture that allowed the user to pick the best analytics engine for the job — as long as the data was in an open format.

This approach would disrupt the traditional data warehouse vendors (Google’s BigQuery, Amazon Web Services’ Redshift, Teradata and Snowflake), which built business models around storing data on their own proprietary systems.

“It’s part of their business, that lock-in,” Conway said.

In response, Databricks developed its own open source lakehouse format, Delta Lake, which was subsequently donated to the Linux Foundation.

The Delta Lake format was open in that it could be used by open source analytics engines, primarily Apache Spark but also others such as Trino and Presto. Thus far, over 10,000 companies globally use Delta Lake (it has a larger user base than Iceberg, Conway argued). It is used to process over 4EB of data on average each day.

Despite the fact that both Iceberg and Delta Lake use the underlying Apache Parquet columnar data format, and they both offer largely the same functionality, the development of each format progressed independently, and so they have been largely incompatible. “They had different features but they were done in different ways,” Conway said.

Customers had started using both formats, though in many cases, the deployments were happening in different parts of the organization, which defeated the point of a unifying lakehouse altogether.

“Unfortunately, the lakehouse paradigm has been split between the two most popular formats,” admitted Ali Ghodsi, co-founder and CEO at Databricks, in a statement.

Apache Iceberg and Delta Lake, Together Forever

Tabular also has some cool user-focused features that Databricks users will no doubt enjoy. One in particular is the data catalog format.

But the overall goal is clear: Databricks wants to get both formats as compatible with each other as possible.

The Tabular folks can also help in the development of Delta Lake UniForm, a Databricks open source format that allows users to read both Iceberg and Delta Lake formats. Databricks opened UniForm to general availability during its user conference last month in San Francisco.

[Figure: Data Lake with UniForm. Credit: Databricks]
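As a rough illustration of what using UniForm looks like in practice, the sketch below builds the kind of SQL statement that turns on Iceberg metadata generation for a new Delta table. The helper name `uniform_create_table_sql` and the `sales` table are hypothetical. The two table properties are the ones described in the Delta Lake documentation as best understood here; treat them as an assumption and check them against the documentation for the Delta Lake or Databricks version in use.

```python
def uniform_create_table_sql(table_name, columns):
    """Build a CREATE TABLE statement for a Delta table with UniForm (Iceberg) enabled.

    `columns` is a list of (name, sql_type) pairs. The table properties below are
    assumed from the Delta Lake documentation and may differ between versions.
    """
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE {table_name} ({cols}) USING DELTA "
        "TBLPROPERTIES ("
        "'delta.enableIcebergCompatV2' = 'true', "
        "'delta.universalFormat.enabledFormats' = 'iceberg')"
    )


# Hypothetical table; the resulting string would be passed to a SQL engine.
statement = uniform_create_table_sql("sales", [("item", "STRING"), ("units", "INT")])
print(statement)
```

The point of the design is that the table is still written once as Delta, with Iceberg metadata generated alongside it so that Iceberg readers can query the same underlying Parquet files.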

Set the Data Free

The Tabular acquisition announcement fell in the same week as Snowflake’s own annual user conference. There, Snowflake announced that it would support Iceberg Tables as a format, and that users could store that data either in Snowflake or on their own servers. It also launched its own open source data catalog, Polaris, which could index data from non-Snowflake sources and be used by any analytics engine, not just Snowflake’s.

Like most other data warehouses, Snowflake keeps the user data mostly on its own servers. The company’s move to bring-your-own-storage is a validation of the lakehouse format, Conway argued.

This bring-your-own-storage approach seems to track with current industry best practices.

At Aerospike’s virtual Real-time Data Summit last month, Google Vice President of Engineering Sameet Agarwal discussed the importance of disconnecting storage and compute.

Storage should be the global foundation of any business, he said. A uniform data format should span data centers and combine hot, warm and cold workloads in the same storage system.

The cost and scalability of data storage should not be yoked to those of compute power, he said. “It’s very important that the cost of managing the system does not scale linearly with the amount of storage. As the amount of storage grows, the cost of management cannot grow linearly with it,” he advised.

All of this leads to why cloud storage is the best option for data lakehouses, he said.

“We think of its evolution from a data lakehouse to an AI lakehouse,” Agarwal said. “We want the Data AI Lakehouse to be a single source of truth, not just for structured and semi-structured data, but also unstructured data.”

Joab Jackson is a senior editor for The New Stack, covering cloud native computing and system operations. He has reported on IT infrastructure and development for over 30 years, including stints at IDG and Government Computer News. Before that, he...
Aerospike, Amazon Web Services, Google, the Linux Foundation and Snowflake are sponsors of The New Stack. 
TNS owner Insight Partners is an investor in: Real, Databricks.