URL: https://thenewstack.io/openais-new-codex-spark-is-optimized-for-speed/

OpenAI's new Codex Spark model is built for speed - The New Stack


OpenAI’s new Codex Spark model is built for speed

Feb 12th, 2026 10:00am by Frederic Lardinois
OpenAI’s new GPT-5.3-Codex-Spark model is a bit of a departure for the company’s family of Codex software development models: its focus is squarely on reducing latency.

Powered by Cerebras’ 125-petaflop Wafer Scale Engine 3, the Codex Spark model is meant for use cases where latency matters as much as, or more than, intelligence. And fast it is: Codex Spark can deliver more than 1,000 tokens per second.
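To put that throughput figure in perspective, here is a rough back-of-the-envelope sketch. The 1,000 tokens-per-second rate is the figure cited above; the response sizes and the slower comparison rate are illustrative assumptions, not measurements of any real model.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to stream num_tokens at a steady decode rate (ignores time to first token)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


SPARK_RATE = 1_000.0        # tokens/s, the lower bound cited in the article
HYPOTHETICAL_RATE = 100.0   # tokens/s, an assumed slower model for comparison only

for tokens in (200, 2_000):  # assumed sizes: a small edit and a longer rewrite
    fast = generation_seconds(tokens, SPARK_RATE)
    slow = generation_seconds(tokens, HYPOTHETICAL_RATE)
    print(f"{tokens} tokens: {fast:.1f}s vs {slow:.1f}s")
```

Under these assumptions, a small edit streams back in a fraction of a second, which is the regime where interrupting and redirecting the model becomes practical.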

When OpenAI launched GPT-5.3-Codex only a few days ago, it highlighted how the team was able to bring down latency by 25 percent. However, whereas that model excels at long-running coding and agentic tasks, where latency is less important, Codex Spark is designed for rapid prototyping and obtaining answers quickly.

The core idea here is to have two models that are complementary: a fast one for real-time collaboration and a slower one for long-running tasks where deeper reasoning is called for.
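One way a tool could act on this split is a simple router that sends short, interactive requests to the fast model and everything else to the slower one. This is only a sketch: the `Task` fields, the step threshold, and the model labels are hypothetical and are not taken from OpenAI's products or API.

```python
from dataclasses import dataclass

FAST_MODEL = "codex-spark"  # placeholder label, not a confirmed API model ID
DEEP_MODEL = "codex"        # placeholder label, not a confirmed API model ID


@dataclass
class Task:
    description: str
    interactive: bool      # is a developer actively waiting on the answer?
    estimated_steps: int   # rough guess at how many agentic steps are needed


def pick_model(task: Task, max_fast_steps: int = 3) -> str:
    """Route short, interactive work to the fast model; everything else to the deep one."""
    if task.interactive and task.estimated_steps <= max_fast_steps:
        return FAST_MODEL
    return DEEP_MODEL


print(pick_model(Task("rename a variable", interactive=True, estimated_steps=1)))
print(pick_model(Task("migrate the build system", interactive=False, estimated_steps=40)))
```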

OpenAI notes that its new model is best suited for making small, very targeted edits to code. An additional benefit of the speed, though, is that the model can be easily interrupted and redirected, helping developers iterate quickly.

Because it is optimized for this use case, Codex Spark will offer only a 128,000-token context window at launch, and it is text-only. Over time, the team plans to add capabilities to this faster model family, including larger models, longer context lengths, and multimodal inputs.

Benchmarks

The company admits that the new model will underperform GPT-5.3-Codex, “but can accomplish the tasks in a fraction of the time.”

On the standard SWE-Bench Pro benchmark, Codex Spark indeed scores significantly lower than GPT-5.3-Codex, but it does get to usable results much faster, which may just be enough for many use cases.

[Image. Credit: OpenAI.]

On Terminal-Bench 2.0, which looks at how well a model performs at agentic workflows in the terminal, Codex Spark also scores significantly lower than the larger GPT-5.3-Codex (58.4% vs. 77.3%).
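For a sense of scale, the two Terminal-Bench 2.0 scores can be compared directly. The short sketch below uses only the two percentages quoted above; the variable names are mine.

```python
SPARK_SCORE = 58.4   # Terminal-Bench 2.0, GPT-5.3-Codex-Spark (from the article)
CODEX_SCORE = 77.3   # Terminal-Bench 2.0, GPT-5.3-Codex (from the article)

gap_points = CODEX_SCORE - SPARK_SCORE     # absolute gap in percentage points
relative_score = SPARK_SCORE / CODEX_SCORE  # Spark's score as a share of the larger model's

print(f"Gap: {gap_points:.1f} percentage points")
print(f"Spark reaches {relative_score:.1%} of the larger model's score")
```

That works out to a gap of about 18.9 percentage points, with Codex Spark reaching roughly three-quarters of the larger model's score.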

Availability

The GPT-5.3-Codex-Spark tier is now available as a research preview for ChatGPT Pro users in the CLI, VS Code and the Codex app (which has now been downloaded more than 1 million times). A select number of OpenAI partners will also get early access to Codex Spark in the API.

OpenAI notes that capacity for the new Codex Spark model may be constrained, with slower access and temporary queuing. The model will have its own rate limits, and using it will not count toward the company’s regular rate limits.

Since it’s not available in the API yet, OpenAI has not published any pricing information.

Why OpenAI opted for Cerebras’ wafer-scale AI accelerators

Using different model tiers is not exactly a new idea, of course. Anthropic, with its three tiers of models (Haiku, Sonnet, and Opus), and others have long used a similar approach of offering models that are mostly differentiated by their intelligence, speed, and pricing. OpenAI itself has long offered nano versions of its models.

The major difference here is that OpenAI is also using a very different hardware platform for this new model.

It’s no coincidence that OpenAI chose to run this model on Cerebras’ hardware. Early in 2026, the two companies announced a multi-year partnership agreement that is reportedly worth up to $10 billion. Under this agreement, Cerebras will build and host data centers that will deliver 750 megawatts of capacity to run its wafer-scale chips for OpenAI.

Cerebras’ chips are gigantic when compared with most standard GPUs and AI accelerators. NVIDIA’s flagship Blackwell B200 accelerator features 208 billion transistors. Cerebras’ chip features four trillion, spread across almost 900,000 cores.
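The size difference is easier to grasp as a ratio. This quick sketch uses only the transistor and core counts quoted above; since the article says "almost 900,000" cores, the per-core figure is an approximation.

```python
B200_TRANSISTORS = 208e9   # NVIDIA Blackwell B200, from the article
WSE3_TRANSISTORS = 4e12    # Cerebras Wafer Scale Engine 3, from the article
WSE3_CORES = 900_000       # "almost 900,000" in the article, so an upper-bound estimate

transistor_ratio = WSE3_TRANSISTORS / B200_TRANSISTORS
transistors_per_core = WSE3_TRANSISTORS / WSE3_CORES

print(f"WSE-3 has about {transistor_ratio:.1f}x the transistors of a B200")
print(f"Roughly {transistors_per_core / 1e6:.1f} million transistors per core")
```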

But it’s not just the pure computing power. At this point, the real bottleneck for inference isn’t compute but memory bandwidth. Cerebras promises to do away with this bottleneck by using on-chip memory and up to 27 petabytes per second of internal bandwidth.
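Why bandwidth matters can be shown with a simplified upper bound: when a dense model generates one token at a time for a single request, it has to read roughly all of its weights once per token, so the decode rate cannot exceed bandwidth divided by weight size. The sketch below is an estimate under stated assumptions only. The 27 PB/s figure comes from the article; the 20 GB weight size and the 8 TB/s off-chip bandwidth are hypothetical comparison values, and the on-chip bound only applies if the weights actually fit in on-chip memory.

```python
PB = 1e15
TB = 1e12
GB = 1e9


def max_decode_tokens_per_second(bandwidth_bytes_per_s: float, weight_bytes: float) -> float:
    """Upper bound on single-stream decode rate when every token reads all weights once."""
    if weight_bytes <= 0:
        raise ValueError("weight_bytes must be positive")
    return bandwidth_bytes_per_s / weight_bytes


WEIGHTS = 20 * GB            # hypothetical model size, not a published figure
ON_CHIP_BANDWIDTH = 27 * PB  # internal bandwidth cited in the article
OFF_CHIP_BANDWIDTH = 8 * TB  # assumed off-chip memory bandwidth, for comparison only

print(f"Off-chip bound: {max_decode_tokens_per_second(OFF_CHIP_BANDWIDTH, WEIGHTS):,.0f} tokens/s")
print(f"On-chip bound:  {max_decode_tokens_per_second(ON_CHIP_BANDWIDTH, WEIGHTS):,.0f} tokens/s")
```

Real systems land well below either bound because of compute limits, batching, and communication overhead, but the gap between the two numbers illustrates why memory bandwidth, not raw compute, tends to cap token throughput.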

In its announcement, OpenAI stresses that GPUs remain foundational for its training and inference pipelines. But the company also notes that “Cerebras complements that foundation by excelling at workflows that demand extremely low latency, tightening the end-to-end loop so Codex feels more responsive as you iterate.”

As Sean Lie, the CTO and co-founder of Cerebras, puts it: “What excites us most about GPT-5.3-Codex Spark is partnering with OpenAI and the developer community to discover what fast inference makes possible — new interaction patterns, new use cases, and a fundamentally different model experience. This preview is just the beginning.”

Before joining The New Stack as its senior editor for AI, Frederic was the enterprise editor at TechCrunch, where he covered everything from the rise of the cloud and the earliest days of Kubernetes to the advent of quantum computing....
TNS owner Insight Partners is an investor in: OpenAI, Anthropic.