URL: https://thenewstack.io/tidb-x-open-source-database/

S3 is the new network: Rethinking data architecture for the cloud era - The New Stack


Cloud object storage provides a highly durable, always-on, strongly consistent single source of truth. It’s not as fast as local storage, but it doesn’t have to be. Cloud object storage will, for all intents and purposes, be the network.
Feb 2nd, 2026 4:00am by Max Liu
PingCAP sponsored this post.

For decades, distributed databases have been built around the assumption that storage will live close to compute.

The farther data travels over the network, the reasoning goes, the greater the potential for delay. Local RAID (redundant array of independent disks) volumes, network-attached storage (NAS), and cluster file systems keep data close, making it quick and easy to access.

But in a distributed system, keeping the entire data store close to compute makes scaling slow, cumbersome, and expensive. Each time a node or cluster is replicated, its associated data must be replicated as well.

It isn’t ideal, but until recently, there wasn’t any reasonable alternative. Databases had to scale. Service-level agreements (SLAs) had to be met. Wide-area networks weren’t reliable enough to support high-performance databases at scale. Database designers accordingly spent a great deal of energy solving problems related to coordination, consistency, and replication logic.

But imagine things were different. What if they didn’t have to worry about the network, where their data lived, or how to get it from Point A to Point B? How would they design a database then?

That’s the intriguing question raised by the advent of cloud object storage services like Amazon S3, Google Cloud Storage, and Microsoft Azure Blob Storage.

What is cloud object storage?

The structure of cloud object storage services couldn’t be simpler. They’re essentially giant heaps of data in which each object is written and read by key through an API.
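To make the key/value model concrete, here is a minimal sketch of an object store's interface. It is an in-memory toy, not a client for any real service: the class name `InMemoryObjectStore` and its methods are assumptions made for illustration.

```python
from typing import Dict, List


class InMemoryObjectStore:
    """Toy stand-in for a cloud object store: a flat map from key to bytes."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        # Objects are written whole; a second put to the same key replaces the first.
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        # Raises KeyError if the key does not exist.
        return self._objects[key]

    def list_keys(self, prefix: str = "") -> List[str]:
        # There are no real directories; "folders" are just shared key prefixes.
        return sorted(k for k in self._objects if k.startswith(prefix))


store = InMemoryObjectStore()
store.put("logs/2026/02/02/app.log", b"started")
store.put("images/cat.png", b"\x89PNG")
print(store.list_keys("logs/"))
```

Everything else a database builds on top (caching, indexing, transactions) has to be layered over these few operations.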

Their practically unlimited storage capacity and their “everywhere” availability make them revolutionary. They can hold billions of objects (images, logs, training data, whatever you need) and, crucially, they can make every one of those objects available to compute anywhere in the world, at any level of workload.

S3 is extremely reliable. Its standard storage class is designed for 11 nines of durability (that’s 99.999999999%) and 99.99% availability, and it replicates data automatically across multiple Availability Zones within a region. This means data on S3 is extremely safe and highly available without the need to manage physical disks or replication.
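A quick back-of-the-envelope calculation shows what those figures mean in practice. This sketch assumes the durability target can be read as an annual per-object loss probability and uses a 365-day year; the function names are illustrative only.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def expected_annual_object_loss(num_objects: int, nines: int) -> float:
    """Expected number of objects lost per year at the given durability."""
    annual_loss_probability = 10.0 ** -nines   # 11 nines -> 1e-11 per object
    return num_objects * annual_loss_probability


def downtime_minutes_per_year(availability: float) -> float:
    """Minutes per year a service may be unavailable at a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


loss = expected_annual_object_loss(10_000_000, nines=11)
print(f"expected objects lost per year: {loss:.6f}")
print(f"years per lost object: {1 / loss:,.0f}")
print(f"downtime budget: {downtime_minutes_per_year(0.9999):.1f} min/year")
```

Under these assumptions, 10 million stored objects work out to roughly one lost object every 10,000 years, and 99.99% availability leaves a budget of just under an hour of downtime per year.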

In addition, S3 scales seamlessly. There are no fixed volumes. No need for capacity planning. The amount of data you can store is practically unlimited, and performance scales with parallel access rather than being limited by a single-server bottleneck. These guarantees free architects from worrying about low-level storage failures, capacity, and edge cases involving consistency.
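The point about parallel access can be illustrated with a small simulation. The sketch below splits one object into byte ranges and fetches them concurrently. `get_range` is a hypothetical stand-in for a ranged GET request, and the 50 ms latency is an assumed figure, not a measurement of any service.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

OBJECT = bytes(range(256)) * 64          # 16 KiB stand-in for a stored object
SIMULATED_LATENCY_S = 0.05               # assumed per-request latency


def get_range(start: int, end: int) -> bytes:
    """Hypothetical ranged GET: returns bytes [start, end) after a fixed delay."""
    time.sleep(SIMULATED_LATENCY_S)
    return OBJECT[start:end]


def split_ranges(size: int, part_size: int) -> List[Tuple[int, int]]:
    """Cut [0, size) into consecutive half-open ranges of at most part_size bytes."""
    return [(s, min(s + part_size, size)) for s in range(0, size, part_size)]


def parallel_read(size: int, part_size: int, workers: int) -> bytes:
    ranges = split_ranges(size, part_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the parts reassemble correctly.
        parts = pool.map(lambda r: get_range(*r), ranges)
        return b"".join(parts)


start = time.perf_counter()
data = parallel_read(len(OBJECT), part_size=2048, workers=8)
elapsed = time.perf_counter() - start
print(f"read {len(data)} bytes in {elapsed:.2f}s using 8 parallel requests")
```

With eight ranges and eight workers, the requests overlap, so the total time stays close to one request's latency rather than eight. That is the sense in which throughput comes from parallelism instead of a faster single server.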

What services like S3 lack in sheer speed, they more than make up for in reliability and ease of maintenance.

In short, cloud object storage provides a highly durable, always-on, strongly consistent single source of truth. It’s not as fast as local storage, but it doesn’t have to be. Instead of worrying about shards, segmentation, and software-defined networks, a database can simply retrieve data with confidence that it will be delivered in a reasonable amount of time.

What this means is that for the next generation of distributed databases, cloud object storage will, for all intents and purposes, be the network.

Architectural patterns emerging around object storage

Building on cloud object storage enables several architectural patterns that were previously impractical.

  • Ephemeral compute clusters: Keeping object storage separate from compute makes it easier to spin up clusters temporarily for a specific job and tear them down afterward. This is especially useful for AI agents, which often construct temporary databases to accomplish tasks. Compute can be spun up at will without the overhead of data replication.
  • Event-driven workflows: The arrival of a new object in S3 can trigger a Lambda function, start a training job, or notify downstream consumers. This sort of workflow would be impractical in a system with highly replicated data, but it’s trivial when data is centralized in a single store.
  • AI and ML pipelines: Many distributed machine learning workflows benefit from a centralized object store. Training datasets, feature stores, model checkpoints, and experiment logs all commonly live in object stores. Frameworks and services like TensorFlow, PyTorch, and Amazon SageMaker can stream data directly from object storage.
  • Tiering storage at large scale: Databases often classify data as either in-demand (“hot”) or rarely accessed (“cold”). Hot data is stored on high-speed flash storage, while cold data is stored on more cost-efficient spinning disks. Provisioning hot and cold storage normally requires manual intervention and careful capacity planning. But with cloud object storage, the database can automatically handle tiering, shuffling data between the object store and the high-speed cache based on demand. The availability and effectively unlimited capacity of the object store make that planning largely unnecessary.
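The tiering pattern in the last item can be sketched as a read-through cache. In this minimal example, a small least-recently-used (LRU) hot tier sits in front of a dictionary standing in for the object store. The class name `TieredReader` and the LRU eviction policy are assumptions chosen for illustration; a real database would weigh access frequency, object size, and latency targets as well.

```python
from collections import OrderedDict
from typing import Dict, List


class TieredReader:
    """Read-through LRU cache (hot tier) in front of an object store (cold tier)."""

    def __init__(self, object_store: Dict[str, bytes], hot_capacity: int) -> None:
        self._cold = object_store
        self._hot: "OrderedDict[str, bytes]" = OrderedDict()
        self._capacity = hot_capacity
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self._hot:
            self.hits += 1
            self._hot.move_to_end(key)       # mark as most recently used
            return self._hot[key]
        self.misses += 1
        value = self._cold[key]              # fall back to the object store
        self._hot[key] = value
        if len(self._hot) > self._capacity:
            # Evict the least recently used entry; the object store still holds it.
            self._hot.popitem(last=False)
        return value

    def hot_keys(self) -> List[str]:
        return list(self._hot)


cold = {f"page/{i}": f"data-{i}".encode() for i in range(5)}
reader = TieredReader(cold, hot_capacity=2)
for key in ["page/0", "page/1", "page/0", "page/2", "page/0"]:
    reader.get(key)
print(reader.hits, reader.misses, reader.hot_keys())
```

Because eviction only drops a cached copy, no capacity planning is needed for the cold tier: anything that falls out of the hot tier can be fetched again on the next miss.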

Example: TiDB X

Now let’s see how these capabilities translate into a real-world design. PingCAP uses cloud object storage as the foundation for TiDB X, the latest version of our popular open source distributed SQL database, TiDB.

Figure: TiDB X’s architecture with built-in object storage.

As shown in the diagram above, TiDB X fully separates compute and storage, using S3 for the shared backend. Compute nodes scale up and down independently. Raft provides consistency, while fast local caches provide low-latency access to hot data. Instead of keeping the entire data store close by, TiDB X keeps only the most active data near compute. TiDB X monitors query patterns, latency targets, and data characteristics, then reshapes itself in response to demand.

Its object storage-based architecture streamlines recovery and backup processes. By using S3 for primary data persistence, TiDB X reduces the overhead of traditional backup maintenance, enabling significantly faster completion times. This design also mitigates the impact of node failures: since local state functions primarily as a cache for durable, replicated storage, a failed instance can be replaced by retrieving its required state directly from object storage to resume operations.
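The recovery idea can be sketched in a few lines. This is not TiDB X's implementation: `ComputeNode` is a hypothetical class, a plain dictionary stands in for S3, and the sketch ignores the consensus and logging a real database needs for recent writes. It shows only why losing a node whose local state is a cache does not lose data.

```python
from typing import Dict, List


class ComputeNode:
    """Hypothetical compute node whose local state is only a cache."""

    def __init__(self, name: str, shared_store: Dict[str, bytes]) -> None:
        self.name = name
        self._store = shared_store            # durable and shared (stands in for S3)
        self._cache: Dict[str, bytes] = {}    # local; lost if the node fails

    def write(self, key: str, value: bytes) -> None:
        # Persist to the shared store first, then update the local cache.
        self._store[key] = value
        self._cache[key] = value

    def read(self, key: str) -> bytes:
        # On a cache miss, rehydrate the entry from the shared store.
        if key not in self._cache:
            self._cache[key] = self._store[key]
        return self._cache[key]

    def cached_keys(self) -> List[str]:
        return sorted(self._cache)


shared: Dict[str, bytes] = {}
node_a = ComputeNode("node-a", shared)
node_a.write("orders/1", b"paid")
node_a.write("orders/2", b"shipped")

del node_a                                    # simulate the node failing
node_b = ComputeNode("node-b", shared)        # replacement starts with an empty cache
print(node_b.cached_keys(), node_b.read("orders/1"), node_b.cached_keys())
```

The replacement node starts cold and pulls state on demand, so recovery time depends on how much hot data it needs rather than on the size of the whole data set.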

From an operational perspective, cloud object storage makes TiDB X both highly adaptable and extremely cost-efficient. Its autoscaler responds not just to preset infrastructure thresholds, but to contextual signals like query patterns, latency targets, and data types. This enables it to reshape its resources in real time to address different tasks.
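As a rough illustration of scaling on contextual signals rather than a single infrastructure threshold, the sketch below sizes a cluster from query rate and a latency target together. It is a toy policy, not TiDB X's autoscaler: the `Signals` fields, the `qps_per_node` capacity figure, and the one-node headroom rule are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Signals:
    queries_per_second: float
    p99_latency_ms: float


def desired_nodes(current: int, signals: Signals, *, qps_per_node: float,
                  target_p99_ms: float, min_nodes: int = 1,
                  max_nodes: int = 16) -> int:
    """Return a node count based on load and on whether latency meets its target."""
    # Capacity needed for the observed query rate.
    needed = math.ceil(signals.queries_per_second / qps_per_node)
    # If latency misses its target, never shrink, and add one node of headroom.
    if signals.p99_latency_ms > target_p99_ms:
        needed = max(needed, current) + 1
    # Clamp to the allowed cluster size.
    return max(min_nodes, min(max_nodes, needed))


busy = Signals(queries_per_second=4500, p99_latency_ms=20)
slow = Signals(queries_per_second=1000, p99_latency_ms=80)
idle = Signals(queries_per_second=500, p99_latency_ms=10)
for label, s, current in [("busy", busy, 2), ("slow", slow, 4), ("idle", idle, 4)]:
    print(label, desired_nodes(current, s, qps_per_node=1000, target_p99_ms=50))
```

Because compute holds no durable state, acting on such a decision is cheap: adding or removing a node does not require copying the data set first.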

In sum, by building atop Amazon S3’s highly durable object store, TiDB X demonstrates how a cloud database can achieve elasticity, performance, and simplicity without sacrificing consistency or scale.

S3 as the communication fabric

Keeping large relational data stores close to compute resources has always been a compromise. It was an expensive solution to a problem created by the limitations of traditional networking.

With architectures like TiDB X, we see that the scale and reliability of services like S3 have made the old workarounds unnecessary and traditional architectures increasingly obsolete. More than that, they’ve enabled practices, such as ephemeral compute, suited to a world where users are more likely to be AI agents than humans.

As AI reshapes business organizations and best practices, the database itself is changing form. In large part, it’s services like S3 that are making that shift possible. By making data placeless, ubiquitous, and effortlessly accessible, cloud object storage is overturning the assumptions that once guided database design. The result will be databases that are more flexible and resilient, simpler to manage, and able to scale almost effortlessly.

Max Liu is the co-founder and CEO of TiDB, powered by PingCAP. He has more than 10 years of experience in system infrastructure and software technologies. He is the co-author of the following open source projects: TiDB, TiKV and Codis,...