URL: https://thenewstack.io/architects-guide-to-apache-iceberg/

Architect’s Guide to Apache Iceberg

Learn what's new in Iceberg and why convergence with Delta Lake and other open table formats is good for everyone working with data lakehouses.
May 21st, 2025 11:00am by Brenna Buuck
Featured image from Song_about_summer on Shutterstock.
MinIO sponsored this post.

Apache Iceberg 1.9.0, released April 28, delivers a set of updates that do more than just extend its feature set. They signal something bigger: the gap between Delta Lake and Iceberg is closing. Features once exclusive to Delta Lake, like row-level operations with lineage and fast semi-structured data handling, are now available in Iceberg. And Iceberg now supports easier migration off Delta Lake, a sure sign that the two are no longer competitors; instead, Iceberg looks like a victor inheriting the spoils.

Let’s explore what’s new in Iceberg 1.9.0, how it reflects Delta Lake’s historical advantages and what this convergence means for the future of the lakehouse.

Original Differences Between Iceberg and Delta

Originally, Iceberg and Delta Lake made different architectural bets.

Delta Lake prioritized performance early, optimizing tightly around Parquet and Spark with a transaction log model. Iceberg, on the other hand, focused on long-term data organization: building a format-agnostic table spec, introducing snapshot-based versioning and defining a layered metadata hierarchy. Delta used flat transaction logs; Iceberg used manifest trees. Delta required Parquet; Iceberg supported multiple formats, including Avro, ORC and of course Parquet. These distinctions gave each project a unique edge.

With Iceberg 1.9.0, however, the story shifts. Iceberg is closing performance gaps while preserving architectural clarity. Delta is adding compatibility layers. What once were differentiators are now shared capabilities.

Iceberg 1.9.0: What’s New?

Enhanced Row-Level Operations

Iceberg 1.9.0 allows the coexistence of equality deletes and row-lineage tracking. This advancement enables precise deletion of rows based on specified conditions and assignment of unique row IDs to inserted or updated rows, facilitating accurate data versioning and auditing.

Delta Lake has long supported this kind of row-level mutation and lineage tracking. Iceberg now matches that capability, closing one of the functional gaps between the two.
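
To make the two ideas concrete, here is a minimal sketch in plain Python. It is not Iceberg's implementation: the `Row` class, the `insert` and `read` helpers and the dictionary-based delete predicates are all hypothetical, and real equality deletes live in delete files that are scoped by sequence number. The sketch only shows the principle that a delete is recorded as a condition and applied at read time, while each row carries an ID assigned once at insert.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Row:
    row_id: int   # hypothetical lineage ID, assigned once when the row is written
    user_id: int
    country: str


_next_row_id = count(0)


def insert(table, user_id, country):
    """Append a row and give it a fresh, never-reused row ID."""
    row = Row(next(_next_row_id), user_id, country)
    table.append(row)
    return row


def read(table, equality_deletes):
    """Apply equality deletes at read time: skip rows matching any delete condition."""
    def is_deleted(row):
        return any(
            all(getattr(row, col) == val for col, val in cond.items())
            for cond in equality_deletes
        )
    return [row for row in table if not is_deleted(row)]


table = []
insert(table, 1, "DE")
insert(table, 2, "US")
insert(table, 3, "DE")

# The delete is stored as a condition; the data rows are not rewritten.
deletes = [{"country": "DE"}]
live = read(table, deletes)
print([(row.row_id, row.user_id) for row in live])  # [(1, 2)]
```

The surviving row keeps the ID it was given at insert time, which is what makes auditing and change tracking possible across later rewrites.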

Delta Lake to Iceberg Migration

Iceberg offers a structured approach to migrating from Delta Lake through the `iceberg-delta-lake` module. This module provides the `snapshotDeltaLakeTable` action, enabling the creation of an Iceberg table that references the data files of an existing Delta Lake table without data duplication. It also supports maintaining the transactional history during migration, ensuring continuity in data operations.

The result is a more direct and efficient way to move from Delta to Iceberg and a clear sign that Iceberg is becoming the dominant open table format.
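
The "without data duplication" part is the key design point, and it can be sketched in a few lines of Python. This is a conceptual model only, not the `iceberg-delta-lake` API: `DeltaTable`, `IcebergTable`, `snapshot_delta_table` and the `s3://lake/...` paths are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DeltaTable:
    """Hypothetical stand-in for a Delta table and its list of active data files."""
    location: str
    active_files: list


@dataclass
class IcebergTable:
    """Hypothetical stand-in for Iceberg metadata that tracks data files by path."""
    location: str
    data_files: list = field(default_factory=list)


def snapshot_delta_table(delta, new_location):
    """Create new table metadata that points at the existing Parquet files.

    Only paths are copied into the new metadata; no file contents are read or moved.
    """
    return IcebergTable(location=new_location, data_files=list(delta.active_files))


delta = DeltaTable(
    location="s3://lake/events_delta",
    active_files=[
        "s3://lake/events_delta/part-0000.parquet",
        "s3://lake/events_delta/part-0001.parquet",
    ],
)
iceberg = snapshot_delta_table(delta, "s3://lake/events_iceberg")

# Metadata lives at the new location; data files stay where Delta wrote them.
print(iceberg.location)
print(all(path.startswith(delta.location) for path in iceberg.data_files))
```

Because the new table only holds references, the cost of the snapshot scales with the number of files rather than the volume of data.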

Variant Data Type Support

Iceberg 1.9.0 introduces a `variant` logical type for storing semi-structured data (like JSON) in a binary format. This avoids the performance overhead of parsing and storing JSON as strings.

The idea comes directly from Delta Lake, which introduced the same feature to improve query performance by up to eight times in benchmarked scenarios. Iceberg adopting this capability makes it a viable option for low-latency workloads with semi-structured data, such as logs and events.
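
A toy comparison shows why a binary layout helps. The encoding below is deliberately simplified and is not the actual Variant format: the `encode` and `get_field` helpers are hypothetical, they handle only flat objects with numeric values, and they assume keys shorter than 256 bytes. The point is that a small field directory lets a reader jump straight to one value instead of parsing the whole JSON string.

```python
import json
import struct


def encode(obj):
    """Toy layout: [field count][directory length][key directory][8-byte double values]."""
    keys = sorted(obj)
    directory = b"".join(
        struct.pack("<B", len(k.encode())) + k.encode() for k in keys
    )
    values = b"".join(struct.pack("<d", float(obj[k])) for k in keys)
    return struct.pack("<HH", len(keys), len(directory)) + directory + values


def get_field(blob, name):
    """Find the key's position in the directory, then read exactly one 8-byte value."""
    n, dir_len = struct.unpack_from("<HH", blob, 0)
    values_start = 4 + dir_len
    target = name.encode()
    pos = 4
    for i in range(n):
        key_len = blob[pos]
        if blob[pos + 1 : pos + 1 + key_len] == target:
            return struct.unpack_from("<d", blob, values_start + 8 * i)[0]
        pos += 1 + key_len
    raise KeyError(name)


event = {"latency_ms": 12.5, "status": 200, "bytes": 5120}

as_json = json.dumps(event)   # string storage: every read re-parses the document
as_binary = encode(event)     # binary storage: reads use offsets

print(json.loads(as_json)["status"])    # 200
print(get_field(as_binary, "status"))   # 200.0
```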

Native Geospatial Support

Iceberg 1.9.0 adds a new `geometry` logical type, enabling efficient storage and querying of spatial data sets. Key features include:

  • Support for Well-Known Binary (WKB) encoding.
  • Default Coordinate Reference System (CRS) set to OGC:CRS84.
  • Multidimensional support for XY, XYZ, XYM and XYZM coordinate formats.
  • Optional spatial statistics like bounding boxes to enhance query performance and spatial indexing.

This geospatial model aligns with the GeoParquet specification, ensuring compatibility with open data standards. It’s an example of Iceberg, and by extension the data community, converging on a common standard.
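
Two of the features listed above, WKB encoding and bounding-box statistics, are easy to illustrate with the Python standard library. The sketch below encodes 2D points as little-endian WKB and uses per-file bounding boxes to skip files that cannot match a spatial query. The file names, the sample coordinates (longitude, latitude) and the helper functions are assumptions for illustration; this is not how an Iceberg engine exposes these statistics.

```python
import struct


def point_to_wkb(x, y):
    """Encode a 2D point as WKB: byte order 1 (little-endian), geometry type 1 (Point), x, y."""
    return struct.pack("<BIdd", 1, 1, x, y)


def wkb_to_point(wkb):
    """Decode a little-endian WKB point back into (x, y)."""
    byte_order, geom_type, x, y = struct.unpack("<BIdd", wkb)
    if byte_order != 1 or geom_type != 1:
        raise ValueError("only little-endian WKB points are handled in this sketch")
    return x, y


def bounding_box(points):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


def boxes_overlap(a, b):
    """True if two (min_x, min_y, max_x, max_y) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# Hypothetical data files, each holding WKB-encoded points.
files = {
    "file_a": [point_to_wkb(13.405, 52.52), point_to_wkb(2.3522, 48.8566)],
    "file_b": [point_to_wkb(-74.006, 40.7128), point_to_wkb(-71.0589, 42.3601)],
}

# Per-file statistics, computed once at write time.
stats = {
    name: bounding_box([wkb_to_point(w) for w in wkbs])
    for name, wkbs in files.items()
}

# A query window roughly covering Europe; only overlapping files need to be scanned.
query = (-10.0, 35.0, 30.0, 60.0)
to_scan = [name for name, box in stats.items() if boxes_overlap(box, query)]
print(to_scan)  # ['file_a']
```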

REST Catalog: More Enterprise-Ready

Improvements to REST catalog authentication include:

  • Support for pluggable authentication handlers.
  • Clearer separation between auth and request logic.
  • Expanded testing for enterprise identity systems.

This is a foundational update for production-grade deployments that use the Iceberg REST catalogs for multiengine or multi-tenant environments, a very common use case in enterprise data lakehouse deployments.
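
The value of separating auth from request logic is easiest to see in code. The following is a minimal sketch of the pattern, not Iceberg's classes: `AuthHandler`, `BearerAuth`, `BasicAuth`, `CatalogClient` and the catalog URL are hypothetical, and no network call is made. The client builds requests without knowing which identity scheme is in use.

```python
import base64
from typing import Protocol


class AuthHandler(Protocol):
    """Anything that can produce auth headers can be plugged into the client."""
    def headers(self) -> dict: ...


class BearerAuth:
    def __init__(self, token):
        self.token = token

    def headers(self):
        return {"Authorization": f"Bearer {self.token}"}


class BasicAuth:
    def __init__(self, user, password):
        self.user = user
        self.password = password

    def headers(self):
        encoded = base64.b64encode(f"{self.user}:{self.password}".encode()).decode()
        return {"Authorization": f"Basic {encoded}"}


class CatalogClient:
    """Request logic only; authentication is delegated to the handler."""
    def __init__(self, base_url, auth):
        self.base_url = base_url
        self.auth = auth

    def build_request(self, path):
        return {
            "url": f"{self.base_url}{path}",
            "headers": {"Accept": "application/json", **self.auth.headers()},
        }


# Swapping identity schemes does not touch the request-building code.
for handler in (BearerAuth("example-token"), BasicAuth("user", "pass")):
    client = CatalogClient("https://catalog.example.com", handler)
    print(client.build_request("/v1/namespaces")["headers"]["Authorization"])
```

A new identity system then only needs a new handler class, which is the kind of extension point that multi-tenant deployments tend to rely on.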

Deprecating the Past: Hadoop 2 and Spark 3.3 Dropped

Support for Hadoop 2 and Spark 3.3 has been removed. This isn’t just house cleaning: it’s a signal. If you’re still tied to legacy Hadoop infrastructure, it’s time to plan your exit. Iceberg is moving forward with modern runtimes, cloud native storage and scale-out compute.

Other Notables

  • Partition statistics APIs: Exposes partition-level metadata for better planning and pruning.
  • Nanosecond timestamp support: Extended precision for Parquet backends.
  • InternalData API: Improved integration paths for engines like Spark, Flink and Trino.
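
As a small illustration of why the nanosecond timestamp bullet matters, consider two events that occur within the same microsecond. The epoch values below are made-up sample data; the arithmetic simply shows what is lost when nanosecond values are truncated to microsecond precision.

```python
# Two hypothetical event times, in nanoseconds since the Unix epoch.
event_a_ns = 1_747_825_221_000_000_123
event_b_ns = 1_747_825_221_000_000_987

# Truncating to microseconds drops the last three digits.
event_a_us = event_a_ns // 1_000
event_b_us = event_b_ns // 1_000

print(event_a_ns < event_b_ns)    # True: ordering is preserved at nanosecond precision
print(event_a_us == event_b_us)   # True: the events collapse to the same microsecond
print(event_b_ns - event_a_ns)    # 864 nanoseconds apart
```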

You can find the full list of changes in the Iceberg 1.9.0 release notes.

Convergence Is Good for Everybody

Delta Lake has long been the default choice for Databricks users. But since Databricks acquired Tabular, the catalog company founded by Iceberg’s creators, the future of open table formats looks a lot more unified.

Iceberg is gaining the performance and usability features that made Delta Lake popular while staying true to its architectural clarity: independent catalog support, spec-driven evolution and openness. Delta Lake is starting to expose REST interfaces and compatibility layers like UniForm.

This is convergence, and it’s good for everyone building on top of lakehouses. Organizing around standards lowers the cognitive and operational overhead for teams adopting or migrating lakehouses, which means that data engineers, architects, analysts and AI engineers don’t have to relearn tools, platforms or functionality. When things work the way you expect them to, everything is easier.

Why Storage Matters

If the storage can’t keep up, nothing else matters.

Iceberg depends on fast scans, fast metadata operations and high throughput.

Modern object storage is designed for that. It runs on commodity hardware and deploys to private clouds, data centers, colos or edge locations, all while delivering the best performance on the least amount of hardware. The economics of a private cloud Iceberg deployment are unbeatable: no egress fees and no charges for GETs and PUTs mean you can scale up as far and as fast as you need without worrying about a sky-high cloud bill. Not to mention that the most secure deployments are still those in air-gapped environments.

Storage isn’t the exciting part of the stack, but if it’s too slow, too expensive or not safe enough, everything else falls through.

The Path Forward

The convergence of Delta Lake and Iceberg isn’t about one winning over the other. It’s about the ecosystem maturing. As both projects evolve to adopt each other’s strengths, the real winner is the user. Teams can now choose tools based on architectural fit and operational goals, not just feature checklists or vendor alignment.

This shift pushes the industry toward greater interoperability, more open standards and simpler decisions. It lowers switching costs, encourages best practices and frees teams to focus on building reliable, high-performance data systems rather than navigating format silos.

This is progress.

MinIO delivers high-performance, Kubernetes-native object storage. Open source, software-defined and S3 compatible, it is optimized for the multicloud. MinIO runs across any public, private, colo or edge cloud and is performant enough for any primary storage workload, from databases to AI/ML.
Brenna Buuck is the subject matter expert at MinIO for databases and datalakes. A data engineer turned developer evangelist, she is passionate about coding, data, and learning. She endeavors to inspire and educate other developers about the latest tools and...
TNS owner Insight Partners is an investor in: Databricks.