URL: https://thenewstack.io/future-proof-ai-infrastructure/


The infrastructure lock-in costing AI companies hundreds of millions

AI is evolving faster than the hardware beneath it. Here's why Nvidia, AMD and the hyperscalers are rethinking AI infrastructure to avoid costly lock-in.
Jun 30th, 2026 3:04pm by Amanda Caswell
Featured image: Egor Komarov for Unsplash.

For two years, the AI infrastructure race has been dominated by one question: Who has the fastest GPU? Jim Keller thinks that’s becoming the wrong question.

In a recent interview with EE Times, the Tenstorrent CEO argued that the riskiest move an organization can make right now is optimizing its AI infrastructure for the models it’s running today. The problem isn’t that those models are bad; it’s that they won’t be the models it’s running in 18 months. Keller invoked Rent’s Rule and Amdahl’s Law to argue that memory, networking, and system-level balance now matter more than peak floating-point performance.

It sounds like the start of Keller’s product pitch, but there’s real weight behind it, because AI has evolved faster than the infrastructure underneath it. And the companies that spent hundreds of millions building around one generation of models are now staring down the cost of doing it all over again.

That fear is called lock-in, and it’s reshaping how the biggest players in AI think about hardware.
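Keller's appeal to Amdahl's Law can be made concrete with a small calculation. The sketch below is illustrative only: the 60/40 split between accelerator math and everything else (memory, networking, orchestration) is a hypothetical workload, not a figure from the interview, and `amdahl_speedup` is a name chosen for this example.

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when only part of the runtime gets faster (Amdahl's Law)."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    # The part of the runtime that the faster chip does not touch.
    untouched = 1.0 - accelerated_fraction
    return 1.0 / (untouched + accelerated_fraction / factor)


# Hypothetical workload: 60% of wall-clock time is accelerator math,
# 40% is memory traffic, networking and orchestration.
for factor in (2, 10, 100):
    overall = amdahl_speedup(0.6, factor)
    print(f"{factor}x faster compute -> {overall:.2f}x overall")
```

Under that assumed split, the overall gain can never reach 2.5x no matter how fast the accelerator becomes, because the untouched 40% sets the ceiling. That is the arithmetic behind the claim that system-level balance can matter more than peak floating-point performance.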

Workloads outgrew the GPU

In 2023 and 2024, AI infrastructure was a relatively simple procurement problem: Train large language models, serve them to users, and buy as many GPUs as Nvidia could ship. The workloads were predictable and GPUs handled them well.

Then AI outgrew the infrastructure it had been built for.

Reasoning models spend more time working through problems instead of jumping straight to an answer. Agents bounce between APIs, databases and code before completing a task. Multimodal models mix text with images, audio and video. None of those workloads stress hardware in quite the same way — and that’s forcing infrastructure teams to rethink assumptions that made perfect sense just two years ago.

No single chip architecture handles all of that equally well. And the organizations building AI infrastructure are starting to realize that the question isn’t just which accelerator is fastest. It’s how to build systems that won’t need to be torn apart every time AI takes another leap.

Nvidia is already selling one answer

Look at what Jensen Huang has been talking about, and it’s not GPUs anymore.

At GTC 2026, Nvidia unveiled the Vera Rubin platform, seven chips designed to operate as a single system: the Rubin GPU, Vera CPU, NVLink 6 switch, ConnectX-9 networking, BlueField-4 DPU, and more. The Vera CPU exists for the CPU-intensive work of agentic AI: tool calls, code execution, orchestration. Nvidia calls these deployments “AI factories,” and the language is deliberate. They’re selling complete infrastructure, not individual accelerators.

That reframing matters. When the company with 70% market share stops leading with GPU benchmarks and starts talking about system-level co-design, it tells you where the center of gravity in this market is moving. Compute still matters. But Nvidia is conceding — through its product architecture if not its marketing — that raw accelerator performance alone won’t be enough for what’s coming.

Hyperscalers design their own silicon

AMD sees the same problem, even if it’s taking a different route. Helios brings together CPUs, GPUs and networking into one rack-scale platform, reflecting a broader shift away from treating the GPU as the center of the universe. The pitch isn’t “our accelerator is faster.” It’s that the infrastructure surrounding the chip increasingly matters just as much as the chip itself.

The hyperscalers have been making this argument with their wallets for even longer. Google has spent a decade co-designing its TPU silicon, interconnects and software framework — its seventh-generation Ironwood chip is now generally available — giving it unusual control over the full stack. Amazon went the opposite direction, building separate chips for separate jobs: Trainium for training, Inferentia for inference, with Trainium3 now in production and serving customers like Anthropic. Microsoft’s Maia 200 targets inference costs, while the company simultaneously deploys Nvidia’s Vera Rubin NVL72 for training and experimentation — arguably the most pragmatic dual-track strategy in the market.

Behind all of them sits Broadcom, whose AI semiconductor revenue recently crossed $10 billion in a single quarter, driven by staggering demand for custom accelerators and data center switches. Broadcom designs custom accelerators for Google, Meta and others while supplying the Tomahawk and Jericho switch silicon that connects those accelerators at data center scale. Custom ASIC shipments are projected to grow roughly 45% year over year in 2026 — triple the growth rate of merchant GPUs.

Then there are the companies that decided building a better GPU wasn’t the answer.

Cerebras questioned the need for thousands of interconnected chips, opting instead for a wafer-scale processor that keeps far more of the workload on a single piece of silicon. Groq took the opposite approach, optimizing almost entirely for inference. SambaNova focused on enterprise AI, building systems where efficiently serving multiple models matters more than posting the fastest benchmark.

Adaptability beats raw speed

The first wave of generative AI rewarded whoever could buy the most compute. That made sense when most organizations were solving the same problem. Today, AI workloads are changing so quickly that infrastructure teams are starting to optimize for something different: adaptability.

Keller’s example illustrates the point. Tenstorrent’s Blackhole architecture uses standard Ethernet instead of proprietary interconnects, allowing its hardware to slot in alongside existing GPU deployments rather than replacing them. Keller told EE Times that one customer used Tenstorrent’s Galaxy servers to increase token throughput on GPUs it already owned instead of rebuilding its infrastructure from scratch.

Whether Tenstorrent’s approach becomes the industry standard is almost beside the point.

The bigger idea is already spreading. Across the industry, companies are spending less time asking how to build the fastest AI hardware and more time asking how to build hardware that won’t have to be replaced every time AI takes another leap forward.

The question that matters now

No one knows what AI workloads will look like three or five years from now. That’s the problem.

Infrastructure refresh cycles are measured in years. AI models seem to reinvent themselves every few months. Building around today’s workloads is starting to look like a risky bet when tomorrow’s could demand something different.

Every company is responding in its own way. They have different strategies but are still asking the same question: How do you build infrastructure that outlasts the AI running on it?

That may prove to be a more important engineering challenge than building the next record-breaking accelerator. And it’s the challenge driving companies from Nvidia and AMD to Google, Amazon, Broadcom and Tenstorrent.

Amanda Caswell is an AI journalist, certified prompt engineer, and technology commentator whose work and expertise have been featured on Fox News and CBS News. She covers artificial intelligence, developer tools, foundation models, and emerging technologies, with a particular focus...
TNS owner Insight Partners is an investor in: Anthropic.