URL: https://thenewstack.io/claude-opus-47-flaky-performance/

AI shrinkflation: Why Anthropic's Claude Opus 4.7 may be less capable than the model it replaced - The New Stack



AI shrinkflation: Why Anthropic’s Claude Opus 4.7 may be less capable than the model it replaced

Claude Opus 4.7 users report self-contradicting responses and degraded performance, raising questions about AI model quality, safety tradeoffs, and shrinkflation.
Apr 23rd, 2026 8:52am by Adrian Bridgwater
Featured image credit: Erry S. Nugroho

A week is a long time in politics, but it’s even longer in AI developer power user circles. That’s the feeling many are voicing after Anthropic made Claude Opus 4.7 available last Wednesday, with its pledge to outperform its predecessor as an AI services model built to handle complex reasoning and nuanced analysis.

The fanfare from Anthropic promised an ability to handle “complex, long-running tasks with rigor and consistency” along with capabilities designed to pay precise attention to instructions. The model even devises ways to verify its own outputs before reporting back.

Literally, not the problem

As The New Stack has reported, Opus 4.7’s more literal instruction-following means that some “prompts written for earlier models can sometimes now produce unexpected results,” so some Claude users may need to adjust their prompt-writing style.

But that’s not what users are fired up about; the real beef is out there, and it doesn’t smell of roses.

Reddit user and Ph.D. student JulioMcLaughlin2 explains in the subreddit r/artificial how they asked Claude Opus 4.7 (with adaptive thinking turned on) to work through a detailed proof, only for it to spiral into answers that read, “oh wait, that doesn’t work, let me try again,” five times in a single response.

“Yes, there’s a workaround to explicitly tell it to think before answering. But… why is that necessary? I’m paying $20/month. This is supposed to be a top-tier model. Instead, it burns through time, second-guesses itself mid-response, and often fails to land anywhere useful on problems I’m fairly sure 4.6 would have handled more coherently a month ago,” they lament.

The disquiet appears to be emanating from every channel. Opus 4.7 user, AI blogger and Shakespearean sonnet invoker Upali R. is also having a rough time.

Writing on Medium, Upali says that he had been using Opus 4.7 to develop a MicroSaaS productivity app with a few API integrations, a mid backend and a Flutter (Google’s open source cross-platform UI toolkit) frontend. He calls it the “mythical single-person developer project,” i.e., one with ambitions to a fault.

Sucking on a Flutter project 

“It was nearly the third hour, when something was amiss, and I was watching it sucking on a Flutter project which I had been attempting to make in two weeks. I had wrongly estimated and overrated the strengths of these models,” groaned Upali.

He explains that he had been using AI as an intelligent autocomplete, which was useful, but he had found a “lesson in the ceiling,” i.e., how far the tool could go before failing.

“A single or two nice exchanges, and the model is off course. It debilitates your functions. It spoils your edifice. That roof was alright. Thou [all] round wouldst toil,” he wrote.

The feeling among users appears to be that Anthropic has pared back the edges of model functionality. Users find the models more cautious and ultimately less intelligent, perhaps in the name of alignment to safe usage standards, as far as they exist. Upali himself suggests that the models developers pay for are “effectively the ‘lite’ versions” and that they are “throttled by safeguards” that act as a drag on performance.

AI shrinkflation

All of this has led a commentator writing on Trading View to say that developers have started calling these moves “AI shrinkflation,” i.e., same model, next version iteration, less form and function. In other words, the real story is what’s not in the box.

Citing Project Glasswing as Anthropic’s move to gatekeep top-tier intelligence to work within stricter safety guidelines, the commentator sees it as a double-edged sword.

“On one hand, it builds the ‘God-model’ hype,” they write. “On the other, it confirms the suspicion that the models we can pay for are effectively the ‘lite’ versions, throttled by safeguards that act as a drag on performance, or that there is some kind of degradation or throttling due to overuse.”

Ghosts in the machine?

The question then arises: are we seeing some kind of degradation or throttling due to overuse, or is there some deeper ghost in the machine that now manifests itself as a bug?

Setting the record straight, Guy Currier, analyst at The Futurum Group, tells The New Stack that what we’re seeing here with Opus 4.7 should absolutely not be characterized as a bug. He insists that it’s the “uncomfortable truth” of the second stage of every transformative technology cycle.

“The first stage was euphoria: throw everything at generative AI, marvel at the output, promote your use of it,” Currier says. “Now, in the second stage, we’re experiencing failures and discouragement. Anthropic is trying to get ahead of this with a model that questions its own confidence, directly addressing the common complaint that AI’s authoritative tone misleads users into blind trust.”

He points to a wider irony here, saying that it’s a kind of tautological Catch-22 when “confident self-doubt can become a doom loop,” trading time for intended quality.

“Most users still lack the prompt craft to steer AI effectively. There is no free lunch. The market, meaning human beings, always and naturally needs normal human time to mature into disciplined, skilled use of transformative tools and begin to experience enduring value,” Currier adds.

An open door for OpenAI Codex?

Do these developments open the door for OpenAI’s Codex to flourish? For developers, of course, Claude and Opus still win in the real-world usage game, but trust in Anthropic’s much-touted forthcoming Mythos model may now be somewhat tarnished.

OpenAI, meanwhile, is capitalizing on the market for all it can. The organization this month aimed to match Anthropic updates with a new Codex release that promises to give developers more of the tools and apps they use every day.

According to an OpenAI blog post last Thursday, “The Codex app also now includes deeper support for developer workflows, like reviewing PRs, viewing multiple files & terminals, connecting to remote devboxes via SSH, and an in-app browser to make it faster to iterate on frontend designs, apps, and games.”

Got context rot, or not?

In search of some kind of summary insight, Jan Hauser, co-founder of UK-based digital product building specialist Applifting, tells The New Stack that the “perceived dumbness” of Opus 4.6 and 4.7 comes down to at least a couple of factors.

“First is how much ‘effort’ (i.e. reasoning tokens) can actually be burned on a query,” Hauser says. “Historically, 4.6 was quite generous with this, and the model would return really good outputs as a result. Before the 4.7 release, Anthropic started limiting this and people began to notice.”

Second, he thinks, is expectations – people get used to high-quality outputs very quickly and are fairly sensitive to any drop in quality. 

“One could say this development mirrors the direction all AI labs are heading: less intelligence for the same or more money. Most people also point to context rot as a contributing factor, which is made even worse by the fact that Anthropic set the 1M context version as the default,” adds Hauser.

In search of the real issue and agenda playing out here, developers will naturally be wondering whether the AI model giants are trying to save on inference costs, laying down more robust security controls, feeling the impact of context rot, or playing some other element of the game as they tie down harness updates.

What may ultimately be the unerring truth is that model behavior consistency is an impossibly complex measure to deliver as a constant in the face of multi-faceted multi-modal optimization and extension.

Adrian Bridgwater is a technology journalist with three decades of press experience. He has an extensive background in communications, starting in print media, newspapers and also television. Primarily working as an analysis writer dedicated to a software application development ‘beat’,...
TNS owner Insight Partners is an investor in: Anthropic, OpenAI.