VOOZH about

URL: https://thenewstack.io/anthropics-opus-4-6-is-a-step-change-for-the-enterprise/

⇱ Anthropic debuts Opus 4.6 with standout scores for solving hard problems that other AIs miss - The New Stack


TNS
SUBSCRIBE
Join our community of software engineering leaders and aspirational developers. Always stay in-the-know by getting the most important news and exclusive content delivered fresh to your inbox to learn more about at-scale software development.
REQUIRED
It seems that you've previously unsubscribed from our newsletter in the past. Click the button below to open the re-subscribe form in a new tab. When you're done, simply close that tab and continue with this form to complete your subscription.
The New Stack does not sell your information or share it with unaffiliated third parties. By continuing, you agree to our Terms of Use and Privacy Policy.
Welcome and thank you for joining The New Stack community!
Please answer a few simple questions to help us deliver the news and resources you are interested in.
REQUIRED
REQUIRED
REQUIRED
REQUIRED
REQUIRED
Great to meet you!
Tell us a bit about your job so we can cover the topics you find most relevant.
REQUIRED
REQUIRED
REQUIRED
REQUIRED
REQUIRED
Welcome!

We’re so glad you’re here. You can expect all the best TNS content to arrive Monday through Friday to keep you on top of the news and at the top of your game.

What’s next?

Check your inbox for a confirmation email where you can adjust your preferences and even join additional groups.

Follow TNS on your favorite social media networks.

Become a TNS follower on LinkedIn.

Check out the latest featured and trending stories while you wait for your first TNS newsletter.

PREV
1 of 2
NEXT
VOXPOP
As a JavaScript developer, what non-React tools do you use most often?
Angular
0%
Astro
0%
Svelte
0%
Vue.js
0%
Other
0%
I only use React
0%
I don't use JavaScript
0%
Thanks for your opinion! Subscribe below to get the final results, published exclusively in our TNS Update newsletter:
NEW! Try Stackie AI
From clobbered drafts to real-time sync
Apr 14th 2026 10:00am, by David Moore
TypeScript 6.0 RC arrives as a bridge to a faster future
Mar 14th 2026 9:00am, by Darryl K. Taft
Mastra empowers web devs to build AI agents in TypeScript
Jan 28th 2026 11:00am, by Loraine Lawson
2026-02-05 09:45:21
Anthropic debuts Opus 4.6 with standout scores for solving hard problems that other AIs miss
AI / AI Agents

Anthropic debuts Opus 4.6 with standout scores for solving hard problems that other AIs miss

Anthropic's new flagship model also powers agent teams in Claude Code and now features a one-million token context window.
Feb 5th, 2026 9:45am by Frederic Lardinois
👁 Featued image for: Anthropic debuts Opus 4.6 with standout scores for solving hard problems that other AIs miss

Anthropic launched Opus 4.6 on Thursday, an update to its flagship Opus model that delivers major improvements over its predecessor—and many competitors—across virtually every benchmark.

Opus 4.6 also adds several useful new features, including a one-million-token context window, the ability to output up to 128,000 tokens, and agent teams in Claude code that can work on tasks in parallel.

Pricing remains the same as before: $5/$25 per million input/output tokens.

A step-change for enterprise users

The company argues that Opus 4.6 is a step change in using large language models (LLMs) for enterprise workflows, thanks to its ability to handle more complex tasks and deliver results faster.

As an Anthropic spokesperson tells The New Stack, “it gets much closer to production-ready quality on the first try than what we’ve seen with any model – documents, spreadsheets, and presentations will need less back-and-forth on iterations.”

Anthropic notes that Claude in Excel, for example, can now handle longer-running, more complex tasks and multi-step changes in a single pass.

👁 Image

Claude Opus 4.6 in PowerPoint (credit: Anthropic).

Benchmarks

As has been the tradition for Anthropic models, Opus 4.6 once again improves on coding benchmarks, except for the SWE-bench verified tests and the MCP Atlas benchmark for testing tool usage, both of which show small regressions. That’s a bit of an anomaly, especially given that the model performs exceedingly well on similar benchmarks that examine agentic coding in the terminal (Terminal Bench 2.0) and agentic tool use (t2-bench).

On Terminal Bench, Opus 4.6 scores 65.4%, up from 59.8% for Opus 4.5, and on the OSWorld agentic computer use benchmark, its score rose from 66.3% to 72.7%. This now puts it ahead of OpenAI’s GPT-5.2 and Google’s Gemini 3 Pro, and according to Anthropic, the new model performs especially well at diagnosing more complex bugs.

👁 Image

Claude Opus 4.6 Terminal Bench benchmark (credit: Anthropic).

Anthropic reports similar gains across benchmarks. The standout, though, is its score of 68.8% on the ARC AGI 2 benchmark, which is less about achieving PhD-level performance in specialized tasks and more about solving problems that are easy for humans and very hard for AI systems. Opus 4.5 scored only 37.6%, while Gemini 3 Pro scored 45.1% and GPT-5.2 scored 54.2%.

Benchmarks, of course, only tell some of the story and don’t always reflect how these models work in practice. Anthropic argues that, in its internal use, Opus 4.6 has handled far more challenging tasks, even without explicit instructions, and has done so more quickly and with better results.

In its safety evaluations, Anthropic found that Opus 4.6 matches Opus 4.5 in terms of misalignments such as deception, sycophancy, and encouraging user delusions.

👁 Image

More features

Even though the version number change is small, there are other updates here beyond improved reasoning performance. Opus 4.6 is the first model in the Opus family to feature a one-million-token context window, for example.

It’s also the first Anthropic model to use adaptive thinking, which allows it to consider contextual clues to determine how much effort to invest in a prompt. Developers still get more control over this with the /effort parameter to make explicit tradeoffs between quality, inference speed, and cost. Previously, though, the option was only to enable or disable extended thinking, so this now gives them a bit more control.

For API users, Claude can now use compaction to summarize context, allowing it to handle longer-running tasks without hitting its context limits.

There’s also a nod to digital sovereignty in this update. If your workloads can only run in the United States, that’s now an option, but you will pay 10% more for it. More typically, we see companies offer this for workloads that can’t run in the United States,

Agent teams

For developers, the most interesting new feature may be agent teams, though. While developers have found ways around this, by default, Claude Code has only run one agent at a time until now. Now, Anthropic is introducing agent teams, which allow developers to split work across agents. Those agents can then work in parallel and coordinate their efforts autonomously.

Anthropic notes that this is especially useful for read-heavy work, such as codebase reviews.

TRENDING STORIES
Before joining The New Stack as its senior editor for AI, Frederic was the enterprise editor at TechCrunch, where he covered everything from the rise of the cloud and the earliest days of Kubernetes to the advent of quantum computing....
Read more from Frederic Lardinois
SHARE THIS STORY
TRENDING STORIES
TNS owner Insight Partners is an investor in: Anthropic, OpenAI.
SHARE THIS STORY
TRENDING STORIES
TNS DAILY NEWSLETTER Receive a free roundup of the most recent TNS articles in your inbox each day.
The New Stack does not sell your information or share it with unaffiliated third parties. By continuing, you agree to our Terms of Use and Privacy Policy.