
URL: https://thenewstack.io/aws-debuts-a-distributed-sql-database-s3-tables-for-iceberg/

AWS Debuts a Distributed SQL Database, Amazon S3 Tables for Iceberg - The New Stack




AWS re:Invent 2024: AWS plants lakehouse tables on its S3 object storage and debuts a globally distributed SQL database.
Dec 4th, 2024 2:00pm by Joab Jackson
Feature image from the AWS re:Invent keynote livestream. 

Staying abreast of the latest trends in data management, Amazon Web Services has introduced support for Apache Iceberg lakehouse tables in its S3 object storage service and debuted a distributed SQL database that offers virtually unlimited scalability with transactional consistency and low latency.

Matt Garman, AWS's new CEO, introduced the technologies at the company's annual AWS re:Invent conference, being held this week in Las Vegas.

A New Bucket Type

For organizations building multisource, open source data lakehouses for analytics, the company has introduced Amazon S3 Tables, a managed service for Apache Iceberg tables.

The company claims the service offers up to three times faster query performance and up to 10 times more transactions per second for analytics workloads, compared with storing the data in a general-purpose S3 bucket.

AWS claims that Amazon S3 Tables, now generally available, is “the first cloud object store with fully-managed support for Apache Iceberg,” though the company is following in the footsteps of Snowflake and Databricks, both of which expanded their support for Apache Iceberg earlier this year.

Amazon S3 Tables introduces a new bucket type, and it brings several benefits for faster analytics, including quicker data discovery through queryable object metadata.

S3 is currently the largest object store in the world, holding over 400 trillion objects for millions of customers. AWS has found that Apache Parquet has become one of the fastest-growing file formats on S3. Parquet is used to store tabular data, a preferred format for data querying. Iceberg is one of several open table formats (OTFs) that can manage Apache Parquet files.

Iceberg is one of the most widely used of these formats, and it lets users query the data with SQL through their preferred query engine, such as Apache Spark or Apache Flink.

Iceberg comes with its own challenges, Garman told the audience.

“A lot of customers will tell you that, as many open source projects are, Iceberg is actually really challenging to manage, particularly at scale. It’s hard to manage the performance, the scalability, the security,” Garman said. “And so what happens is, you hire dedicated teams to do this, to take care of things like table maintenance, data compaction, access controls, all of these things that you go into managing and trying to get better performance out of your Iceberg implementations.”

“S3 is completely reinventing object storage specifically for the data lake world to deliver better performance, better cost and better scale.” — Matt Garman, AWS re:Invent

That is the selling point of Amazon S3 Tables: it takes care of all these chores automatically. “We basically improve the performance and scalability of all of your Iceberg tables,” Garman said.

Amazon S3 Tables takes care of the maintenance that comes with Iceberg tables, such as compaction and snapshot management. It also offers row-level transactions, queryable snapshots via time travel, schema evolution, and table-level access controls.
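To make those maintenance chores concrete, here is a toy, in-memory Python model of how an Iceberg-style table tracks data through snapshots. It is a sketch under simplifying assumptions, not the Iceberg specification or the S3 Tables API: every commit records the set of immutable data files, compaction rewrites many small files into one, time travel reads an older snapshot, and snapshot expiration discards history. `ToyTable`, `DataFile` and every method name are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Toy model of an Iceberg-style table. NOT the Iceberg spec or the S3 Tables
# API; every class and method name here is hypothetical.


@dataclass(frozen=True)
class DataFile:
    path: str
    rows: Tuple[tuple, ...]  # files are immutable once written


class ToyTable:
    def __init__(self) -> None:
        # snapshot id -> the complete set of data files visible in that snapshot
        self.snapshots: Dict[int, Tuple[DataFile, ...]] = {}
        self.current: Optional[int] = None
        self._counter = 0

    def _files(self, snapshot_id: Optional[int] = None) -> Tuple[DataFile, ...]:
        sid = self.current if snapshot_id is None else snapshot_id
        return () if sid is None else self.snapshots[sid]  # KeyError if expired

    def _commit(self, files: Tuple[DataFile, ...]) -> int:
        self._counter += 1
        self.snapshots[self._counter] = files
        self.current = self._counter
        return self._counter

    def append(self, rows) -> int:
        """Write rows to a new small file and commit a new snapshot."""
        new_file = DataFile(f"data/part-{self._counter + 1}.parquet", tuple(rows))
        return self._commit(self._files() + (new_file,))

    def compact(self) -> int:
        """Rewrite all current files into one larger file; the rows are unchanged."""
        merged = tuple(row for f in self._files() for row in f.rows)
        path = f"data/compacted-{self._counter + 1}.parquet"
        return self._commit((DataFile(path, merged),))

    def scan(self, snapshot_id: Optional[int] = None) -> List[tuple]:
        """Read the latest snapshot, or an older one ("time travel")."""
        return [row for f in self._files(snapshot_id) for row in f.rows]

    def expire_snapshots(self, keep_last: int) -> None:
        """Forget all but the newest `keep_last` snapshots."""
        ids = sorted(self.snapshots)
        for sid in ids[: max(len(ids) - keep_last, 0)]:
            del self.snapshots[sid]


table = ToyTable()
first = table.append([(1, "a")])
table.append([(2, "b")])
compacted = table.compact()
print(table.scan(first))                # [(1, 'a')]  (time travel)
print(table.scan())                     # [(1, 'a'), (2, 'b')]
print(len(table.snapshots[compacted]))  # 1 (one file after compaction)
```

Because files are never modified in place, an older snapshot stays readable until it is expired; keeping that bookkeeping in check is the kind of chore S3 Tables says it automates.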

Amazon S3 Tables is integrated (in preview) with the AWS Glue Data Catalog, which provides a gateway to AWS’ own visualization and analysis services such as Amazon Athena, Redshift, EMR, and QuickSight.

Better Metadata

Amazon S3 Tables also eliminates the need for customers to build and/or maintain their own metadata systems.

As user data grows to petabyte scale, object metadata becomes ever more important; attributes such as the date and location of origin can be essential for finding the data you need, Garman explained.

Managing metadata can also be a chore, Garman pointed out. You have to store the metadata, associate it with the relevant object, and then build an event-processing pipeline to surface it during searches.

A related feature, S3 Metadata, now in preview, automatically generates metadata for each new object, including system information about the object itself (size, source and so on). Users and applications can add their own custom metadata as well (e.g., product SKUs, transaction IDs, content ratings or customer details).

“S3 Metadata is the fastest and easiest way for you to instantly discover information about your S3 data,” Garman said.

“We automatically store all of your object metadata in an Iceberg table, and then you can use your favorite analytics tool to easily interact and query that data, so you can quickly learn more about your objects and find the object you’re looking for,” he said. “And as objects change, S3 automatically actually updates that metadata in minutes, so it’s always up to date.”
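The idea of treating object metadata as a table you query with SQL can be sketched with Python's built-in `sqlite3` module. This is an assumption-laden stand-in: the in-memory SQLite table plays the role of the Iceberg table that S3 Metadata maintains, and the table name, columns, bucket, object keys and SKU values are all hypothetical rather than the real S3 Metadata schema.

```python
import sqlite3

# Hypothetical stand-in: an in-memory SQLite table plays the role of the Iceberg
# table that S3 Metadata maintains. The schema and values are illustrative only;
# product_sku represents custom, user-supplied metadata.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE object_metadata (
           bucket TEXT, object_key TEXT, size_bytes INTEGER,
           last_modified TEXT, product_sku TEXT
       )"""
)

rows = [
    ("media-bucket", "videos/launch.mp4", 52_428_800, "2024-12-01", "SKU-100"),
    ("media-bucket", "images/banner.png", 204_800, "2024-12-02", "SKU-200"),
    ("media-bucket", "videos/demo.mp4", 10_485_760, "2024-12-03", "SKU-100"),
]
conn.executemany("INSERT INTO object_metadata VALUES (?, ?, ?, ?, ?)", rows)

# Find every video tagged with a given SKU, largest first, by querying the
# metadata table rather than listing the bucket's objects.
matches = conn.execute(
    """SELECT object_key, size_bytes FROM object_metadata
       WHERE product_sku = ? AND object_key LIKE 'videos/%'
       ORDER BY size_bytes DESC""",
    ("SKU-100",),
).fetchall()
print(matches)  # [('videos/launch.mp4', 52428800), ('videos/demo.mp4', 10485760)]
```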

Amazon Aurora DSQL, a Distributed SQL Database

Garman also introduced a new, distributed version of the company’s Aurora SQL database service, called Amazon Aurora DSQL.

PostgreSQL-compatible DSQL offers nearly unlimited scalability, according to the company, as partitions can be spread out across multiple disks and even across multiple availability zones.
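As a rough illustration of how spreading partitions adds capacity, the Python sketch below hash-partitions keys and places the partitions round-robin across availability zones. AWS has not described DSQL's partitioning scheme at this level of detail, so the hashing approach, zone labels and partition count are assumptions for illustration only.

```python
import hashlib
from collections import Counter

AZS = ["az-a", "az-b", "az-c"]  # hypothetical availability zone labels
NUM_PARTITIONS = 12             # hypothetical partition count


def partition_for(key: str) -> int:
    # A stable hash (Python's built-in hash() is salted per process for str).
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def zone_for(partition: int) -> str:
    # Round-robin placement puts an equal number of partitions in each zone.
    return AZS[partition % len(AZS)]


partitions_per_zone = Counter(zone_for(p) for p in range(NUM_PARTITIONS))
print(partitions_per_zone)  # 4 partitions per zone

keys = [f"customer:{i}" for i in range(1_000)]
keys_per_zone = Counter(zone_for(partition_for(k)) for k in keys)
print(keys_per_zone)  # how many of the 1,000 keys each zone stores
```

Adding partitions, and placing them on more disks or zones, spreads the same key space over more storage, which is the intuition behind the scalability claim.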

DSQL also offers strong consistency and 99.999% multiregion availability.

Single-server database systems can offer strong consistency, though they are confined to the region where the server lives. Distributed databases, on the other hand, can offer multiregion availability, but they suffer in performance because it takes time to synchronize the database cluster across all the regions.

AWS built DSQL to do both, Garman said.

As a managed service, DSQL has no infrastructure for users to manage. There is no need to provision, patch, or manage database instances, and updates and security patches are applied with no downtime.

Several high-performance distributed relational databases already exist, such as CockroachDB, though AWS claims that DSQL is four times faster than competing distributed SQL databases.

Aurora DSQL does this by decoupling transaction processing from storage.

“We actually separated the transaction processing from the storage layer so you don’t need every single statement to go check at commit time,” Garman explained. “Instead, you do the single [check] on commit; we parallelize all of the writes at the same time across all of the regions, so you can get strong consistency across regions with super fast writes to the database.”
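One common way to defer checks until commit time is optimistic concurrency control: a transaction reads and buffers its writes without coordinating, then validates once when it commits. The single-process Python sketch below illustrates that general technique; it is not Aurora DSQL's actual protocol, and `Store`, `Transaction` and `ConflictError` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict


class ConflictError(Exception):
    """Raised when data a transaction read was changed by another commit."""


@dataclass
class Store:
    data: Dict[str, int] = field(default_factory=dict)
    versions: Dict[str, int] = field(default_factory=dict)  # key -> version number

    def begin(self) -> "Transaction":
        return Transaction(self)


@dataclass
class Transaction:
    store: Store
    read_versions: Dict[str, int] = field(default_factory=dict)
    writes: Dict[str, int] = field(default_factory=dict)

    def read(self, key: str) -> int:
        if key in self.writes:
            return self.writes[key]
        # No lock and no coordination per statement: just remember the version seen.
        self.read_versions[key] = self.store.versions.get(key, 0)
        return self.store.data.get(key, 0)

    def write(self, key: str, value: int) -> None:
        self.writes[key] = value  # buffered locally until commit

    def commit(self) -> None:
        # The single validation step happens here, at commit time.
        for key, seen in self.read_versions.items():
            if self.store.versions.get(key, 0) != seen:
                raise ConflictError(key)
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.versions[key] = self.store.versions.get(key, 0) + 1


store = Store()
t1, t2 = store.begin(), store.begin()
t1.write("balance", t1.read("balance") + 100)
t2.write("balance", t2.read("balance") + 50)
t1.commit()  # validates cleanly and applies its write
try:
    t2.commit()  # its read of "balance" is now stale
    t2_conflicted = False
except ConflictError:
    t2_conflicted = True  # the caller would retry t2 from the beginning
print(store.data["balance"], t2_conflicted)  # 100 True
```

The trade-off is that conflicting transactions are discovered late and must be retried, in exchange for avoiding a coordination round trip on every statement.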

The Amazon Elastic Compute Cloud (EC2) instances hosting the database are synchronized through the Amazon Time Sync Service, which provides microsecond-level time precision. As a result, each region's copy of the database sees database operations in the exact order in which they occurred.
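Why synchronized clocks help can be shown with a deliberately simplified Python sketch: if every operation carries a timestamp from a shared, precise clock, each replica can apply operations in timestamp order no matter when they arrive over the network, and all replicas end up in the same state. The `Op` type, region names and keys are hypothetical, and a real system must also account for residual clock error, which this sketch ignores.

```python
from typing import Dict, List, NamedTuple


class Op(NamedTuple):
    timestamp_us: int  # from a shared, synchronized clock (microseconds)
    origin: str        # issuing region; breaks exact timestamp ties
    key: str
    value: str


def apply_in_timestamp_order(ops: List[Op]) -> Dict[str, str]:
    """Replay ops sorted by (timestamp, origin, ...), ignoring arrival order."""
    state: Dict[str, str] = {}
    for op in sorted(ops):  # NamedTuples compare field by field
        state[op.key] = op.value
    return state


ops = [
    Op(1_000_002, "us-east-1", "cart:42", "checked_out"),
    Op(1_000_001, "eu-west-1", "cart:42", "item_added"),
]
# Two regions receive the same operations in opposite network order...
region_a = apply_in_timestamp_order(ops)
region_b = apply_in_timestamp_order(list(reversed(ops)))
# ...but replaying by timestamp makes both converge on the same state.
print(region_a == region_b, region_a)  # True {'cart:42': 'checked_out'}
```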

As companies build an international customer base, they find that a single-node database can’t offer global consistency and sufficiently low latency.

Companies such as Autodesk, Electronic Arts, Klarna, QRT, and Razorpay are exploring the additional benefits a multiregion distributed database would bring. For instance, Razorpay, an Indian financial technology company, could use DSQL to support its growing user base with the strong multiregional consistency needed for financial use cases.

DynamoDB, AWS's serverless NoSQL database, has gone global as well, offering a multiregion, multi-active database with 99.999% availability (using the same technology and architecture as DSQL).

Joab Jackson is a senior editor for The New Stack, covering cloud native computing and system operations. He has reported on IT infrastructure and development for over 30 years, including stints at IDG and Government Computer News. Before that, he...
Amazon Web Services and Snowflake are sponsors of The New Stack. 
TNS owner Insight Partners is an investor in: Databricks.