VOOZH about

URL: https://thenewstack.io/inception-labs-mercury-2-diffusion/

⇱ Inception says its diffusion LLM is 10x faster than Claude, ChatGPT, Gemini - The New Stack


TNS
SUBSCRIBE
Join our community of software engineering leaders and aspirational developers. Always stay in-the-know by getting the most important news and exclusive content delivered fresh to your inbox to learn more about at-scale software development.
REQUIRED
It seems that you've previously unsubscribed from our newsletter in the past. Click the button below to open the re-subscribe form in a new tab. When you're done, simply close that tab and continue with this form to complete your subscription.
The New Stack does not sell your information or share it with unaffiliated third parties. By continuing, you agree to our Terms of Use and Privacy Policy.
Welcome and thank you for joining The New Stack community!
Please answer a few simple questions to help us deliver the news and resources you are interested in.
REQUIRED
REQUIRED
REQUIRED
REQUIRED
REQUIRED
Great to meet you!
Tell us a bit about your job so we can cover the topics you find most relevant.
REQUIRED
REQUIRED
REQUIRED
REQUIRED
REQUIRED
Welcome!

We’re so glad you’re here. You can expect all the best TNS content to arrive Monday through Friday to keep you on top of the news and at the top of your game.

What’s next?

Check your inbox for a confirmation email where you can adjust your preferences and even join additional groups.

Follow TNS on your favorite social media networks.

Become a TNS follower on LinkedIn.

Check out the latest featured and trending stories while you wait for your first TNS newsletter.

PREV
1 of 2
NEXT
VOXPOP
As a JavaScript developer, what non-React tools do you use most often?
Angular
0%
Astro
0%
Svelte
0%
Vue.js
0%
Other
0%
I only use React
0%
I don't use JavaScript
0%
Thanks for your opinion! Subscribe below to get the final results, published exclusively in our TNS Update newsletter:
NEW! Try Stackie AI
From clobbered drafts to real-time sync
Apr 14th 2026 10:00am, by David Moore
TypeScript 6.0 RC arrives as a bridge to a faster future
Mar 14th 2026 9:00am, by Darryl K. Taft
Mastra empowers web devs to build AI agents in TypeScript
Jan 28th 2026 11:00am, by Loraine Lawson
2026-03-02 13:30:30
Inception says its diffusion LLM is 10x faster than Claude, ChatGPT, Gemini
podcast,video,
AI / AI Engineering / AI Models

Inception says its diffusion LLM is 10x faster than Claude, ChatGPT, Gemini

Move over, "fancy autocomplete." Inception CEO Stefano Ermon joins The New Stack to explain how Mercury 2 uses diffusion to deliver LLM speeds 10x faster.
Mar 2nd, 2026 1:30pm by TNS Staff
👁 Featued image for: Inception says its diffusion LLM is 10x faster than Claude, ChatGPT, Gemini

Last week, Inception launched Mercury 2, a large language model based on diffusion rather than the autoregressive approach used by every major AI lab. And on this week’s episode of The New Stack Agents, Inception CEO and co-founder Stefano Ermon explains how the diffusion model of generative AI could reshape how we build AI applications.

But first, some background: Traditional LLMs generate text one token at a time, left to right, a system that Ermon calls “fancy autocomplete.” Meanwhile, diffusion models work differently: They start with a rough answer and refine it in parallel, much like image models like Stable Diffusion crystallize images from noise. The result is a model that produces over 1,000 tokens per second — five to ten times faster than speed-optimized models from OpenAI, Anthropic, and Google, according to Inception’s own testing.

“What we’re seeing is that our Mercury 2 model, which is a reasoning model, is actually able to match the quality of these speed optimized models from [frontier labs OpenAI, Anthropic, Meta, and Google], while being five to 10x faster in terms of, like, the end to end latency, how long you need to wait before it gives you an answer,” Ermon tells TNS Senior Editor for AI Frederic Lardinois.

Autoregressive models are slower because they move data through memory instead of doing math. Diffusion models focus on parallel computation, which is what GPUs were built for. And GPU giant Nvidia, an investor in Inception, is helping optimize the serving engine, Ermon says.

Ermon, who pioneered diffusion models for images at Stanford and published the foundational text diffusion paper that won Best Paper at ICML 2024, is candid about the trade-offs: Mercury 2 matches the quality of Claude Haiku and Google Flash-class models, not Claude Opus or OpenAI GPT-4. But he argues the economics will win out as models scale. Reinforcement learning, the technique behind today’s reasoning models, is also naturally faster on diffusion architectures since its bottleneck is inference.

Inception is the only company shipping a production diffusion LLM — Google’s text diffusion model is still “experimental.” Mercury 2 is now available via an OpenAI-compatible API, with AWS Bedrock integration coming soon.

Listen to the full conversation on The New Stack Agents.

TRENDING STORIES
SHARE THIS STORY
TRENDING STORIES
TNS owner Insight Partners is an investor in: Anthropic, OpenAI.
SHARE THIS STORY
TRENDING STORIES
TNS DAILY NEWSLETTER Receive a free roundup of the most recent TNS articles in your inbox each day.
The New Stack does not sell your information or share it with unaffiliated third parties. By continuing, you agree to our Terms of Use and Privacy Policy.