URL: https://thenewstack.io/why-latency-is-quietly-breaking-enterprise-ai-at-scale/

Why Latency Is Quietly Breaking Enterprise AI at Scale - The New Stack



Enterprise AI systems are failing due to database latency issues. Learn how geo-distributed architectures and optimized data layers solve AI performance problems at scale.
Aug 6th, 2025 11:00am by Andrew Marshall
Featured image by vilax on Shutterstock.
Yugabyte sponsored this post.

As enterprises spend an ever-increasing share of their tech budgets on AI, they expect it to deliver groundbreaking efficiencies and more informed decision-making. But there’s a problem many don’t see coming: latency.

For AI systems to be beneficial, they must be able to access and process data quickly, whether they’re generating content, classifying data or making real-time decisions. Every millisecond counts. The root cause of lag in many AI pipelines isn’t the model or the compute layer; it’s the database.

The AI-Latency Connection: Why Speed Matters

AI work happens in two critical phases: training and inference. Both depend heavily on fast, reliable access to large volumes of data. Latency matters most during inference, when a model makes decisions or generates outputs in real time. Any delay in fetching the necessary data can slow down results, degrade the user experience or, worse, cause outright system failures.

Think of a fraud detection system scanning a transaction or an AI assistant generating a response. If the underlying database can’t keep up, the AI model stalls. Latency isn’t just an inconvenience; it undermines the entire value proposition of AI.
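To make the stall concrete, here is a minimal sketch of a per-request latency budget. It assumes the model needs several database lookups that run one after another; the function names and all numbers are hypothetical and chosen only for illustration.

```python
def request_latency_ms(model_ms, lookups, db_round_trip_ms):
    """Total latency when each lookup waits for the previous one to finish."""
    return model_ms + lookups * db_round_trip_ms


def within_budget(total_ms, budget_ms):
    """True if the request finishes inside its latency budget."""
    return total_ms <= budget_ms


# Hypothetical fraud check: 20 ms of model time plus 5 sequential lookups.
local_db = request_latency_ms(model_ms=20, lookups=5, db_round_trip_ms=2)
remote_db = request_latency_ms(model_ms=20, lookups=5, db_round_trip_ms=70)

print(local_db, within_budget(local_db, budget_ms=100))
print(remote_db, within_budget(remote_db, budget_ms=100))
```

With these made-up figures the model time is identical in both cases; only the database round trip changes, and it alone decides whether the request fits the budget.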

As these systems scale, the problem compounds. More users, more data and more regions introduce more potential points of failure unless the data infrastructure is built for low-latency, distributed access.

When Latency Breaks AI

Recent outages at generative AI platforms show how seemingly minor delays in database responsiveness can cascade into large-scale failures. In another domain, autonomous vehicles depend on real-time decisions backed by massive AI models. Even minor delays in accessing sensor data or environment maps can compromise safe navigation and lead to delays or accidents.

Low latency doesn’t just enhance performance. It also ensures trust, safety and business continuity.

Making the Most of Your Data Layer

It’s easy to overlook the database when talking about AI. But that’s a mistake. If the model is the brain, the database is the circulatory system. The brain will stop functioning if data isn’t moving quickly enough.

This means that a robust architecture is required to secure fast and reliable access to data, regardless of where users, applications or models are located. This is where geo-distributed databases become vital.

Building for AI Resilience: Geo-Distributed Architectures

Geo-distribution reduces both the physical and the network distance between your AI models and your data. It does this by replicating data and placing it closer to where it’s needed. The result is consistently low-latency access, even across regions and availability zones.
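Distance matters because signal propagation sets a floor that no amount of tuning removes. The sketch below estimates that floor from the great-circle distance between two sites. It assumes light travels through optical fiber at roughly 200 km per millisecond (about two-thirds of its speed in a vacuum) and ignores routing detours, queuing and processing time, so real round trips will be higher. The coordinates are approximate and purely illustrative.

```python
import math

# Assumption: light in optical fiber covers roughly 200 km per millisecond.
FIBER_KM_PER_MS = 200.0
EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two (latitude, longitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def min_round_trip_ms(distance_km):
    """Lower bound on round-trip time from propagation delay alone."""
    return 2 * distance_km / FIBER_KM_PER_MS


# Approximate coordinates for New York and Frankfurt (illustrative only).
km = great_circle_km(40.7, -74.0, 50.1, 8.7)
print(f"{km:.0f} km, at least {min_round_trip_ms(km):.0f} ms per round trip")
```

A query that needs several such round trips pays this floor each time, which is why moving a copy of the data into the caller’s region helps more than making the remote database faster.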

Here are six deployment topologies that support low-latency, resilient AI operations, plus the potential tradeoffs:

1. Single-Region Multizone Cluster

A single-region multizone cluster is made up of three or more nodes that work together and share data across zones within the same region. While this setup offers advantages, it also comes with drawbacks like increased read and write latency for applications accessing data from outside the region, and limited protection against region-wide outages caused by weather-related events and natural disasters. This configuration is best suited for situations where you need strong consistency, high availability and resilience within a single region, especially if your users or applications are located nearby and can benefit from low-latency access.

2. Synchronous Replication

Clusters using synchronous replication provide high availability and resilience, ensuring zero data loss (a recovery point objective, or RPO, of zero) and minimal recovery time (a low recovery time objective, or RTO). However, deploying across multiple regions can increase write latency, and follower reads, which are used to lower read latency, may sacrifice consistency.
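The write-latency cost can be estimated with a short sketch. It assumes a majority-quorum commit in the style of Raft, where the leader acknowledges a write once more than half of the replicas have it; the article does not spell out the protocol, so treat this as an illustration. Disk and processing time are ignored, and the round-trip values are hypothetical.

```python
def quorum_commit_latency_ms(follower_rtts_ms):
    """Estimate commit latency for a leader that waits for a majority.

    follower_rtts_ms: round-trip times from the leader to each follower.
    The leader counts as one vote, so it needs (majority - 1) follower acks.
    """
    replicas = len(follower_rtts_ms) + 1
    majority = replicas // 2 + 1
    acks_needed = majority - 1
    if acks_needed == 0:
        return 0.0
    # The write commits when the slowest of the fastest needed acks arrives.
    return float(sorted(follower_rtts_ms)[acks_needed - 1])


# Three replicas in one region versus three replicas spread across regions.
print(quorum_commit_latency_ms([1, 2]))     # followers in nearby zones
print(quorum_commit_latency_ms([70, 140]))  # followers in remote regions
```

Under this model the leader waits only for the nearest majority, so placing at least one follower close to the leader keeps commit latency down even when another replica is far away.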

3. Unidirectional Asynchronous Replication

Multi-region clusters using unidirectional asynchronous replication provide disaster recovery with a non-zero recovery point objective (RPO) and recovery time objective (RTO). They offer strong consistency and low-latency reads and writes within the source cluster’s region, while the sink cluster maintains eventual (timeline) consistency. However, because the sink cluster is read-only and doesn’t handle writes, clients located outside the source region may experience high write latency. In YugabyteDB, this topology is implemented with xCluster replication, which bypasses the query layer for replicated data; as a result, database triggers won’t execute on the sink, which can cause unpredictable behavior.
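A non-zero RPO simply means that writes committed on the source shortly before a failure may never reach the sink. The sketch below illustrates that window with a toy model; the function names, timestamps and keys are hypothetical and do not come from any real replication API.

```python
def rpo_seconds(failure_time_s, replicated_up_to_s):
    """Window of committed writes that the sink has not yet received."""
    return max(0.0, failure_time_s - replicated_up_to_s)


def lost_writes(source_commits, replicated_up_to_s):
    """Keys committed on the source after the sink's last applied timestamp.

    source_commits: list of (commit_time_s, key) tuples.
    """
    return [key for commit_time_s, key in source_commits
            if commit_time_s > replicated_up_to_s]


# Hypothetical timeline: the sink has applied everything up to t = 2.0 s,
# and the source region fails at t = 3.0 s.
commits = [(1.0, "order-1"), (2.0, "order-2"), (2.8, "order-3")]
print(rpo_seconds(failure_time_s=3.0, replicated_up_to_s=2.0))
print(lost_writes(commits, replicated_up_to_s=2.0))
```

The smaller the replication lag, the smaller this window, but with asynchronous replication it never reaches zero.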

4. Bidirectional Asynchronous Replication

Bidirectional asynchronous replication aids in disaster recovery with non-zero RPO and RTO, delivering strong consistency in the cluster that handles a given write and eventual consistency in the remote cluster, along with low-latency reads and writes. However, it comes with tradeoffs: Database triggers won’t fire because replication bypasses the query layer; unique constraints aren’t enforced because replication occurs at the write-ahead log (WAL) level, risking data inconsistencies; and auto-increment IDs can cause conflicts in active-active setups, so using universally unique identifiers (UUIDs) is recommended instead.
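The auto-increment problem is easy to reproduce in miniature. In the sketch below, two hypothetical clusters each hand out IDs from their own local counter, then do the same with random UUIDs from Python’s standard `uuid` module. The counters stand in for per-cluster sequences and are an assumption for illustration, not a model of any specific database.

```python
import itertools
import uuid


def sequence_ids(n, start=1):
    """IDs a cluster would hand out from its own local auto-increment counter."""
    counter = itertools.count(start)
    return [next(counter) for _ in range(n)]


def uuid_ids(n):
    """Random (version 4) UUIDs, generated without any coordination."""
    return [str(uuid.uuid4()) for _ in range(n)]


# Two hypothetical clusters in an active-active pair, each inserting 3 rows.
cluster_a_seq = sequence_ids(3)
cluster_b_seq = sequence_ids(3)
seq_conflicts = set(cluster_a_seq) & set(cluster_b_seq)

cluster_a_uuid = uuid_ids(3)
cluster_b_uuid = uuid_ids(3)
uuid_conflicts = set(cluster_a_uuid) & set(cluster_b_uuid)

print("sequence conflicts:", sorted(seq_conflicts))
print("uuid conflicts:", sorted(uuid_conflicts))
```

Both counters start at the same value, so every sequence ID collides once the rows are replicated in both directions, while a collision between random UUIDs is so improbable that it can be ignored in practice.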

5. Geo-Partitioning With Data Pinning

Geo-partitioning with data pinning is best for use cases that require data to reside in specific geographic regions, because it delivers regulatory compliance, strong consistency and low-latency access within each region. It’s suited to logically partitioned data sets, such as country-specific user accounts or localized product catalogs. Keep in mind that users who access their data from outside the pinned region will incur cross-region latency.
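The pinning rule itself amounts to a lookup from a partition key to a region. The sketch below shows that idea at the application level; the country codes, region names and function names are hypothetical, and in a real deployment the database would typically enforce placement through its own partitioning configuration.

```python
# Hypothetical mapping from country code to the region where rows are pinned.
REGION_BY_COUNTRY = {
    "US": "us-east",
    "DE": "eu-central",
    "FR": "eu-central",
    "IN": "ap-south",
}


def pinned_region(country_code):
    """Return the region that must store rows for this country."""
    try:
        return REGION_BY_COUNTRY[country_code]
    except KeyError:
        raise ValueError(f"no pinning rule for country {country_code!r}")


def is_cross_region(country_code, client_region):
    """True when a client reads pinned data from outside its home region."""
    return pinned_region(country_code) != client_region


print(pinned_region("DE"))
print(is_cross_region("DE", client_region="us-east"))
```

Raising an error for an unmapped country is a deliberate choice here: silently falling back to a default region could place regulated data in the wrong jurisdiction.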

6. Read Replicas

Read replicas offer fast, timeline-consistent reads close to remote users, while the primary cluster retains strong consistency and low-latency writes for nearby clients. However, replicas don’t improve resilience because they’re tied to the primary and cannot handle writes. Write latency may remain high for remote clients, even if a nearby read replica exists.
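The routing consequence can be sketched as follows: reads may be served locally, but every write still travels to the primary. The function and region names are hypothetical, and the logic is a simplification that ignores replica health and staleness limits.

```python
def choose_endpoint(operation, client_region, primary_region, replica_regions):
    """Pick the region that should serve a request.

    Writes always go to the primary; reads go to a local replica if one
    exists, otherwise to the primary.
    """
    if operation == "write":
        return primary_region
    if operation == "read":
        if client_region in replica_regions:
            return client_region
        return primary_region
    raise ValueError(f"unknown operation {operation!r}")


replicas = {"eu-central", "ap-south"}
print(choose_endpoint("read", "eu-central", "us-east", replicas))
print(choose_endpoint("write", "eu-central", "us-east", replicas))
```

In this example a client in the hypothetical "eu-central" region reads locally but still writes to "us-east", which is the remote write latency the paragraph above warns about.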

Latency isn’t a bug; it’s often the result of architectural decisions that were made too early and revisited too late. For AI to succeed at scale, latency must be addressed at the database layer and treated as a primary design concern.

Enterprises that invest in a low-latency, geo-aware data infrastructure will not only be able to keep their AI systems running but also ensure that they’re faster, smarter and truly transformative.

Andrew Marshall is the VP of Product Marketing for Yugabyte, maker of YugabyteDB. His passion for technology and developer tools spans 25 years, encompassing stops at companies such as AWS, Microsoft, PagerDuty, and New Relic.