Every AI lab out there seems practically unstoppable right now, launching new capabilities at a pace that's genuinely hard to keep up with, but Anthropic has been shipping updates at a rate that makes the rest of the industry look like it's standing still. In the past few days alone, the company launched Claude Design, routines in Claude Code, Claude for Word, and Managed Agents, made Cowork generally available, and the list goes on.
For many, the most anticipated release in the entire wave was Anthropic's newest flagship model, Opus 4.7. Now, users voicing disappointment when a new model drops is nothing new. Every release cycle comes with its share of enraged users, and you'll even find people getting genuinely emotional about it, mourning the old model like they lost a friend. But after spending a week with Opus 4.7, I think the complaints this time actually hold weight.
Opus 4.7 is Anthropic's most capable model yet
Classic marketing speak, or is it?
On the 16th of April, Anthropic announced that Claude Opus 4.7 is generally available. The AI lab highlighted that the model is a "notable improvement on Opus 4.6 in advanced software engineering, with particular gains on the most difficult tasks." Despite being less capable than Mythos (the model Anthropic decided is way too dangerous to release publicly right now), Opus 4.7 is still positioned as the company's most powerful model you can actually get your hands on.
Its strengths are in software engineering, vision, and real-world professional tasks. Benchmark-wise, Opus 4.7 beats Opus 4.6 on 12 of 14 reported benchmarks. Finally, the model shows significant improvement in agentic safety, meaning it's a lot better at recognizing and refusing prompt injection attacks when you're using it as an agent.
Opus 4.7 finally does what you tell it to
It actually puts your needs first
Before I go any further, I want to point out that I'm not going to focus on benchmarks here, or on how much better the new model is at the tasks Anthropic claims it excels at. Instead, I want to focus on what the changes actually mean for the average user who relies on the model day-to-day, and whether Opus 4.7 genuinely feels like a better experience than what came before it.
One of the biggest problems with AI models from the very beginning has been their tendency to ignore your instructions and do what they think you want instead. Sure, what they do might be better than what you originally asked for, but that's not really the point. If I tell a model to do something specific and it decides it knows better, that's not helpful at all. Opus 4.7 is noticeably better about this. Anthropic itself pointed this out in the announcement post, noting that 4.7 takes instructions literally, whereas previous models "interpreted instructions loosely or skipped parts entirely," and that users should re-tune their prompts and harnesses accordingly.
For instance, if you tell the model to modify a specific line within your code, Opus 4.7 seems to do exactly that and nothing more. It doesn't refactor surrounding code it wasn't asked to touch, and it doesn't second-guess your choices with its own suggestions. I can see this being a huge win for a lot of people, especially developers who are tired of fighting with LLMs that rewrite half the file when all they asked for was a one-line change.
Unfortunately, the better listening comes at a cost
It follows orders so well it forgot to think
At first, I thought Opus 4.7 taking instructions more literally, and asking for your input rather than making its own assumptions, was a straight upgrade. That was until I realized this behavior seems to apply almost everywhere, including when you actually want the model to just... figure it out.
For instance, if you don't explicitly tell the model to search the web for something, there's a good chance it just won't, even when it clearly needs to. I asked Opus 4.7 what OpenClaw is. It's a pretty basic question, and one a single Google search can easily answer. Interestingly, this is the response I got:
OpenClaw is an open-source reimplementation of Captain Claw, a 2D platformer originally released by Monolith Productions in 1997. The original game stars Captain Nathaniel J. Claw, a swashbuckling pirate cat imprisoned by the evil Cocker Spaniards, who escapes and hunts for the nine pieces of the Amulet of Nine Lives.
I switched to Opus 4.6 in a new chat, and though it didn't search the web as a first resort either, it simply told me it wasn't sure which OpenClaw I was referring to and asked for more context. The above is just one very simple example, but over a few days of testing, I've noticed Opus 4.7 hallucinates far more than previous models, and it all links back to this point. And that's where things get interesting.
Anthropic claimed in the announcement that Opus 4.7 is measurably more honest than its predecessor, scoring 91.7% on the MASK honesty benchmark compared with 90.3% for Opus 4.6. If anything, my experience shows that honesty on a benchmark and honesty in practice are two very different things. A model that confidently makes something up because it never thought to search the web first isn't really "lying." It just doesn't know what it doesn't know.
Opus 4.7 is also somehow... lazier?
It does the bare minimum
This follows closely from the previous points. Because Opus 4.7 is so fixated on doing exactly what you tell it and nothing more, it often feels like the model just isn't trying anymore. It'll do the bare minimum of what you asked and give you a surface-level answer. For instance, I was setting up a Claude Project for my expense-tracking workflow, and I asked Claude whether there was any way to connect it to Excel.
For reference, there is literally a Claude for Excel add-in that Anthropic shipped a few months ago. Instead of telling me about it, Opus 4.7 simply checked the tools I had connected to Claude and said it didn't have an Excel connector. Only when I asked about Claude for Excel directly did it search the web to find "accurate info rather than guess," and then it said: "Yes, that exists — I was wrong to leave it out. Let me give you the straight facts."
To be clear, this isn't an isolated complaint or something only I've noticed. One Reddit user, for example, was talking to Opus 4.7 about some changes they were making to their home network.
When they asked the model whether a particular setup was possible, Opus 4.7 said it wasn't. However, when the OP looked it up, the first result was a post explaining exactly how to set it up. They then asked Claude why it didn't find that information, and it responded: "Fair criticism. The honest answer: I didn't search when I should have."
The OP posted a series of screenshots from the same conversation in which Opus 4.7 kept apologizing for being wrong and saying it should have verified by searching the web, only to make the exact same mistake on the very next question. Worse still, when the OP asked the model to list the mistakes, hallucinations, and wrong assumptions it had made throughout the conversation, it listed seven!
To top it off, it's also burning through your tokens
This may not surprise anyone, but Opus 4.7 also chews through significantly more tokens than its predecessor, thanks to its updated tokenizer. According to Anthropic, the new tokenizer can map the same text to roughly 1.0 to 1.35 times as many tokens as Opus 4.6 did, depending on the content.
In other words, since usage is metered in tokens, the exact same prompt you were running before could now cost you up to 35% more. And that's before you factor in the extra hand-holding and back-and-forth the model needs because it won't search or dig deeper unless you explicitly tell it to.
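To put the tokenizer change in concrete terms, here's a rough back-of-the-envelope sketch in Python. Only the 1.0x to 1.35x multiplier range comes from Anthropic. The token count and the per-million-token price are hypothetical placeholders, not Anthropic's actual rates, and the sketch assumes the per-token price is the same for both models, so cost scales linearly with token count.

```python
# Back-of-the-envelope estimate of how Opus 4.7's tokenizer change affects cost.
# Assumption: per-token pricing is identical for both models, so cost scales
# linearly with token count. Only the 1.0x-1.35x range comes from Anthropic.

TOKENIZER_MULTIPLIER_RANGE = (1.0, 1.35)


def cost_usd(tokens: int, price_per_million: float) -> float:
    """Cost of a given number of tokens at a per-million-token price."""
    return tokens / 1_000_000 * price_per_million


def opus_47_cost_range(old_tokens: int, price_per_million: float) -> tuple[float, float]:
    """Estimated (min, max) cost of the same text once re-tokenized."""
    low, high = TOKENIZER_MULTIPLIER_RANGE
    return (
        cost_usd(round(old_tokens * low), price_per_million),
        cost_usd(round(old_tokens * high), price_per_million),
    )


if __name__ == "__main__":
    # Hypothetical workload: text that used 200,000 tokens on Opus 4.6,
    # priced at a placeholder $10 per million tokens.
    old_tokens = 200_000
    price = 10.0
    before = cost_usd(old_tokens, price)
    low, high = opus_47_cost_range(old_tokens, price)
    print(f"Opus 4.6: ${before:.2f}")
    print(f"Opus 4.7: ${low:.2f} to ${high:.2f} ({high / before - 1:.0%} more at worst)")
```

With these placeholder numbers, the same $2.00 job lands somewhere between $2.00 and $2.70. Swap in your own token counts and current pricing to get a real figure, and keep in mind that extra follow-up turns add to the total on top of this.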
