
URL: https://thenewstack.io/confronting-ais-next-big-challenge-inference-compute/

Confronting AI’s Next Big Challenge: Inference Compute - The New Stack



Confronting AI’s Next Big Challenge: Inference Compute

Inference computing will become a very heterogeneous space, with solutions tailored to different use cases — and agentic AI will turbocharge demand, said Sid Sheth of d-Matrix in this episode of The New Stack Makers.
Aug 6th, 2025 6:00am by Heather Joslyn

The computing demands of training AI models may get a lot of the attention from the tech industry — just ask NVIDIA’s shareholders. But the needs posed by AI inference may leave today’s cutting-edge GPUs in the dust.

“If you look at the world of pretraining, it has been kind of monolithic,” said Sid Sheth, founder and CEO of d-Matrix, in this episode of The New Stack Makers. “GPUs have dominated. Specifically, GPUs from one company have dominated the landscape. But as you enter the world of inference, it is not really a one-size-fits-all.

“There are too many different workloads, each workload with very different requirements. … you might be a user who cares all about cost. There might be some user who cares all about interactivity, where I really want to be able to interact with the model. There might be users who don’t really care about either, and just care about throughput.”

With this variety of user profiles, Sheth said, “it’s not like the same hardware or the same computing infrastructure can serve all these needs simultaneously.”

He summed up, “The world of inference is going to be truly heterogeneous, where you will have dedicated, best-in-class hardware to serve specific needs from specific users.”

Keeping Memory and Compute Close

One of the biggest challenges with inference compute, Sheth said, is keeping the memory (which holds the data) as close as possible to the compute. “The reason for that is you are kind of making a lot of trips to memory. When you talk about generative AI workloads, you’re generating content that relies on caching data. So all the previous data gets cached. And every time you generate a new token you are, essentially, tapping into that cache data to figure out what the next best token needs to be.”
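To make the "trips to memory" concrete, here is a minimal Python sketch of token-by-token generation with a cache of earlier tokens. It is a toy illustration, not d-Matrix's design or any real model: the `KVCache` class, the `decode` function and the sine-based vectors standing in for learned projections are all hypothetical. What it shows is that every new token reads every entry cached so far, so total reads grow much faster than the number of tokens generated.

```python
import math


def attend(query, keys, values):
    """Single-head dot-product attention over every cached position."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]


class KVCache:
    """Hypothetical cache holding one key/value pair per generated token."""

    def __init__(self):
        self.keys = []
        self.values = []
        self.entries_read = 0  # counts trips to cached entries

    def append(self, key, value):
        self.keys.append(key)
        self.values.append(value)

    def read_all(self):
        self.entries_read += len(self.keys)
        return self.keys, self.values


def decode(num_tokens, dim=4):
    """Generate num_tokens toy outputs, reading the whole cache each step."""
    cache = KVCache()
    outputs = []
    for step in range(num_tokens):
        # Assumption: a sine pattern stands in for learned projections.
        vec = [math.sin(step + i) for i in range(dim)]
        cache.append(vec, vec)
        keys, values = cache.read_all()
        outputs.append(attend(vec, keys, values))
    return outputs, cache.entries_read


outputs, reads = decode(8)
print(reads)  # 1 + 2 + ... + 8 = 36 cache entries read for 8 tokens
```

Doubling the number of generated tokens roughly quadruples the reads in this toy, which is one way to see why the distance between memory and compute matters so much for inference.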

With AI agents, “that problem gets multiplied — 10x or 100x. So, the memory footprint becomes very, very, very important, and keeping that memory close to compute becomes very important. The less distance the data has to travel to get to the compute, the faster your inference is going to be. And the more optimal your inference is going to be, the lower cost your inference is going to be.”
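A back-of-the-envelope calculation gives a feel for that footprint. The sketch below uses the common estimate that a transformer caches one key and one value vector per layer, per attention head, per token. The model dimensions (32 layers, 8 key/value heads, head size 128, 16-bit values) and the 4,096-token baseline are assumptions chosen for illustration, not figures from the episode, and reading Sheth's "10x or 100x" as growth in cached context is likewise an interpretation.

```python
def kv_cache_bytes(tokens, layers=32, kv_heads=8, head_dim=128,
                   bytes_per_value=2):
    """Estimate cache size for one sequence (hypothetical model shape)."""
    # Factor of 2: one key vector and one value vector per head per layer.
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return tokens * per_token


base_tokens = 4096  # assumed baseline context for a single chat turn
for multiplier in (1, 10, 100):
    gib = kv_cache_bytes(base_tokens * multiplier) / 2**30
    print(f"{multiplier:>3}x context: {gib:.1f} GiB")
# Prints 0.5 GiB, 5.0 GiB and 50.0 GiB for 1x, 10x and 100x.
```

Under these assumptions, a single sequence at 100x context needs about 50 GiB for its cache alone, before counting model weights or other concurrent users.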

In this episode, Sheth discussed and showcased d-Matrix’s AI inference platform, Corsair, which takes an innovative approach to architecting and locating memory and compute. d-Matrix builds specialized chiplets, he said, “and then we co-package these chiplets into a fabric, and that gives us that elasticity and modularity in the platform. We can always scale it up or scale it down, depending on the customer’s requirements.”

In Corsair, memory and compute are layered directly on top of each other — like a stack of pancakes — cutting the travel distance down significantly. “The data is sitting inside this memory, and it’s raining down into the compute, which is sitting right underneath it,” Sheth said. “The surface area is much greater when you package things this way. Obviously, there’s more surface area between the memory and the compute, and a lot more data can drop down into the compute.”

Check out the full episode to learn more about inference, why it needs different infrastructure than AI model training, and what Sheth sees ahead for AI infra more generally.

Heather Joslyn is the former editor-in-chief of The New Stack. She previously worked as editor-in-chief of Container Solutions, a Cloud Native consulting company, and as an editor/reporter at The Chronicle of Philanthropy and the Baltimore City Paper.