I have a problem where a first impression of a tool really, really sticks around. If an app disappointed me massively the first time around, I struggle to convince myself it deserves another shot, even when everyone around me is saying otherwise. But if there's anything I've learned about the AI and tech industry, it's that tools tend to bounce back when you least expect it.

An example I like to give here is Google's AI efforts. They began with Bard, which was a disaster, and then rebuilt everything under Gemini. While that rebuild played out fairly publicly, Google kept working on extremely impressive tools within the Google Labs experimental playground, which paved the way for some genuinely excellent products like NotebookLM, Jules, Stitch, Learn Your Way, Antigravity, and others. Some of these are admittedly still in their experimental phase, but some have made it out, and NotebookLM in particular turned out to be a breakout hit that nobody would have predicted from the company that completely botched Bard.

Comebacks happen extremely fast in this industry. Codex, it turns out, had one of its own, and I didn't realize it until I gave it a genuine second try.

The limits alone make Codex worth a second look

Your wallet will thank you

I don't need to tell you this, but AI tools use a crazy amount of compute under the hood. Every prompt, every code generation, and every iteration you ask for eats into a budget. And unfortunately, that budget gets passed down to you in the form of usage limits. Every major AI company is constantly balancing the same question: how much can they realistically allow users to do before the math stops working for them and starts working against them? And given how pricey AI tools have gotten (a product of rising demand and higher compute costs), you need to make sure you're getting the most out of every dollar you spend.

Right now, Codex is making that easier than most. Claude's brutal limits are well-documented, and I don't really need to reiterate how quickly you can burn through your five-hour or weekly allowance on even a moderately complex task. It's been a sore point in the community for months, and Anthropic knows it well. While the company did announce doubled limits for Claude Code across all plans during its developer conference, I'm not going to comment on that right now since I haven't gone hands-on with it just yet.

What I will point out, though, is that Codex gives you noticeably more room to work with — and it has for a while. In fact, I'm on the $20 Codex tier and the $100 Claude Max tier, and I've found that the former still gives me more usable runway on a day-to-day basis. Five times the price for less room to work. That says everything you need to know about where the value is right now.

Codex is superior at reasoning and more autonomous

Codex doesn't need you as much as you think

A few weeks ago, I wrote an article comparing Claude Code and Codex, and I mentioned that I wasn't the biggest fan of Codex going off and doing its own thing. I wanted more back-and-forth, more questions upfront, and more involvement in the process. Claude Code asked me all the questions it had before beginning to build, whereas Codex often just went straight to work. That was one of my main criticisms of the tool back then. However, after spending more time with Codex, I've started to appreciate its approach.

What I've come to realize is that the right approach depends entirely on what you're doing. If I'm starting something from scratch with a vague idea, I still want Claude Code's upfront questions. But if I know what I want and just need it built, Codex's bias toward action saves me real time. The other thing I failed to account for in my earlier article is that asking lots of questions upfront isn't necessarily a sign of a smarter tool.

Codex makes assumptions, yes, but they're usually reasonable assumptions. And when it gets something wrong, iterating on the result is often faster than the Q&A session would have been in the first place. Codex also seems to understand context a lot better and suggests complete implementations that work right out of the gate.

Codex feels like it finally knows what it wants to be

This isn't just a coding tool anymore

All the major AI labs have been shipping features left and right, and while Anthropic has seemingly been expanding the Claude ecosystem in every possible direction, OpenAI has taken a more focused approach with Codex. Instead of trying to be everything at once, it has been steadily layering capabilities within Codex. The best part is that the company isn't just making the CLI better; it's improving the desktop app as a whole. For instance, the tool recently got computer use, which allows it to navigate any app on your Mac. Parallel agents mean you can hand off multiple tasks at once without waiting around, and there's even a built-in browser that lets you open pages, annotate what you're seeing, and feed that context directly back to Codex.

The company also added a /goal slash command a few weeks ago, which essentially turns Codex into a long-horizon autonomous agent. It lets you define a persistent objective that Codex works toward across multiple turns, sometimes for hours, without you babysitting a single step. It plans, executes, verifies, and course-corrects on its own. It's the kind of feature that makes you realize how much time you've been spending micromanaging your coding agent.

Codex is also testing something called Chronicle, which I found super impressive from the description alone. It's currently in opt-in research preview for Pro subscribers on macOS only, so I haven't had the chance to try it yet. The feature sounds somewhat like Microsoft's Recall, but built more around making your life easier than simply helping you "recall" stuff.

It watches your screen in the background and turns that context into persistent memories so Codex knows what you've been working on without you having to explain it every session. These get saved as local markdown files that Codex can reference later. The screen captures themselves are deleted after six hours, but the memories they create persist until you manually remove them. Obvious privacy concerns aside, if it works as well as it sounds, it could eliminate one of the most tedious parts of working with any AI coding agent: the constant re-explaining.
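If the split between short-lived captures and long-lived memories sounds abstract, here's a toy Python sketch of that retention rule. To be clear, this is not how Chronicle is implemented, and I have no visibility into its internals. The names `MemoryStore`, `add_capture`, `purge_expired`, and `forget` are all made up for illustration. The only things taken from the feature description are the six-hour capture window and the idea that markdown memories stick around until you delete them.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Six-hour capture window, as stated in the feature description.
CAPTURE_TTL = timedelta(hours=6)


@dataclass
class MemoryStore:
    """Hypothetical store: captures expire, memories do not."""

    # capture id -> time the capture was taken
    captures: dict[str, datetime] = field(default_factory=dict)
    # memory file name -> markdown text
    memories: dict[str, str] = field(default_factory=dict)

    def add_capture(self, capture_id: str, taken_at: datetime, summary: str) -> None:
        # Record the raw capture and derive a markdown memory from it.
        self.captures[capture_id] = taken_at
        self.memories[f"{capture_id}.md"] = f"# Memory\n\n{summary}\n"

    def purge_expired(self, now: datetime) -> list[str]:
        # Drop captures older than the TTL; memories are left untouched.
        expired = [cid for cid, taken in self.captures.items()
                   if now - taken >= CAPTURE_TTL]
        for cid in expired:
            del self.captures[cid]
        return expired

    def forget(self, memory_name: str) -> None:
        # Memories only go away when the user removes them explicitly.
        self.memories.pop(memory_name, None)
```

The point of the sketch is simply that the two lifetimes are independent: purging an old capture never touches the memory it produced, which is why the manual "forget" step matters for privacy.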

GPT-5.5 is what makes all of this actually work

Finally, while you'll always hear people praise Claude's models for coding-related work, it turns out that the model powering Codex right now is genuinely impressive. GPT-5.5 scored 82.7% on Terminal-Bench 2.0, blowing past Opus 4.7's 69.4% by 13.3 percentage points. Claude still leads on SWE-Bench Pro, so it's not a clean sweep, but the overall picture has shifted. I've been using GPT-5.5 extensively over the past few weeks, and it's genuinely making me rethink how I pick my tools. I used to choose based on brand loyalty and first impressions. Now I'm choosing based on what actually performs.

That said, I do think Claude Code still dominates when it comes to UI and frontend work. The code it produces for anything visual just tends to be cleaner, more polished, and closer to what you'd actually want to ship. GPT-5.5 is strong across the board, but if I'm building something people are going to look at, Claude is still my first call.