URL: https://thenewstack.io/openais-gpt-5-3-codex-helped-build-itself/

⇱ OpenAI's GPT-5.3-Codex helped build itself - The New Stack



OpenAI’s GPT-5.3-Codex helped build itself

GPT-5.3-Codex helped debug its own training and is OpenAI's first model designated "high-capability" for cybersecurity tasks.
Feb 5th, 2026 10:58am by Frederic Lardinois
Featured image credit: The New Stack.

OpenAI’s new GPT-5.3-Codex model is the company’s most capable agentic coding model yet. Unlike previous Codex models, however, it is not focused solely on coding.

With this model, which OpenAI made available on Thursday to its paid users across Codex-powered tools and APIs, the company has set a new goal: to create an agent that can write code and do everything else developers or any professional would do on a computer.

OpenAI says that “the model advances both the frontier coding performance of GPT-5.2-Codex and the reasoning and professional knowledge capabilities of GPT-5.2, together in one model, which is also 25% faster.”

There’s another aspect in which building this model differed from previous efforts: according to OpenAI, the model was “instrumental in creating itself.” The team says it used an early version of the model to debug training runs, manage the model’s deployment, and analyze test results and evaluations.

According to OpenAI, combining coding and general agentic capabilities in a single model made training and deployment especially challenging.

That’s where Codex itself came in. “The engineering team used Codex to optimize and adapt the harness for GPT-5.3-Codex,” the team writes in the announcement. “When we started seeing strange edge cases impacting users, team members used Codex to identify context rendering bugs and root cause low cache hit rates. GPT-5.3-Codex is continuing to help the team throughout the launch by dynamically scaling GPU clusters to adjust to traffic surges and keeping latency stable.”

That’s only a first step in having these models build and improve themselves. Yet, just as we’ve seen a speed-up in feature launches for agentic coding tools because developers now use those same tools to make them, we’ll likely see more of this in models from frontier labs.

GPT-5.3-Codex benchmarks

Unsurprisingly, the new model performs well on coding benchmarks, but in today’s announcement, OpenAI rightly downplays them and focuses more on the practical advances these improvements bring. The company notes that the new model can now build complex games and apps from scratch over the course of days. OpenAI also stresses that the model better understands the user’s intent and chooses more sensible defaults when there is ambiguity.

OpenAI is also emphasizing the new model’s ability to handle cybersecurity tasks, in part because it’s the first model the company has directly trained to identify vulnerabilities. But that also means it should be pretty good at exploiting security issues, something OpenAI acknowledges: “While we don’t have definitive evidence it can automate cyber attacks end-to-end, we’re taking a precautionary approach and deploying our most comprehensive cybersecurity safety stack to date. Our mitigations include safety training, automated monitoring, trusted access for advanced capabilities, and enforcement pipelines including threat intelligence.”

To compare models, benchmarks still matter, though, and the new model does quite well there. Across the board, it delivers leading scores, including on TerminalBench 2.0, which tests the models’ agentic coding skills, as well as SWE-bench Verified (which tests the models’ Python skills) and SWE-Bench Pro (which tests across four programming languages).

With a score of 77.3% on TerminalBench 2.0, it easily beats Anthropic’s just-launched Opus 4.6 model.

But since OpenAI specifically notes that this model isn’t just about coding, it’s especially noteworthy that it scores 64.7% on the OSWorld-Verified benchmark, which tests agents on open-ended tasks in real computer environments.

In its announcement, the OpenAI team argues that, “together, these results across coding, frontend, and computer-use and real-world tasks show that GPT‑5.3-Codex isn’t just better at individual tasks, but marks a step change toward a single, general-purpose agent that can reason, build, and execute across the full spectrum of real-world technical work.”

Before joining The New Stack as its senior editor for AI, Frederic was the enterprise editor at TechCrunch, where he covered everything from the rise of the cloud and the earliest days of Kubernetes to the advent of quantum computing....
TNS owner Insight Partners is an investor in: OpenAI, Anthropic.