URL: https://thenewstack.io/claude-million-token-pricing/

Anthropic makes a pricing change that matters for Claude's longest prompts - The New Stack


AI Agents / AI Models

Anthropic makes a pricing change that matters for Claude’s longest prompts

Anthropic removes long-context pricing surcharge for Claude Opus 4.6 and Sonnet 4.6, making 1-million-token context windows available at standard per-token rates.
Mar 16th, 2026 6:33am by Paul Sawers
Featured image by Rifky Nur Setyadi for Unsplash+

Anthropic announced Friday that the 1-million-token context window for Claude Opus 4.6 and Claude Sonnet 4.6 is now generally available, with standard pricing replacing the premium long-context rates that previously kicked in once prompts crossed a certain size threshold.

The company debuted the two models within weeks of each other in February. Claude Opus 4.6 is Anthropic’s flagship model for enterprise workloads that require sustained reasoning across large internal datasets and complex coding tasks. Claude Sonnet 4.6, meanwhile, is the company’s more efficient general-purpose model, designed for high-throughput developer use and production applications that need strong reasoning performance at lower cost than Opus.

Both models shipped with a 1-million-token context window — a limit that allows developers to place very large amounts of information into a single prompt. That can include entire code repositories, lengthy research papers, legal filings, or large collections of internal documents that an AI system needs to analyze together.

There was, however, an important caveat: While the models technically supported prompts approaching the 1-million-token limit, requests exceeding roughly 200,000 tokens were billed at higher “long-context” pricing tiers, moving the entire request into a premium rate band.

That pricing distinction is what Anthropic has now removed.

Under the new arrangement, requests are billed at the same per-token rate regardless of prompt size: a prompt containing hundreds of thousands of tokens is priced at the same rate as a much smaller one.

The road to 1 million tokens

Anthropic’s move toward million-token context windows has unfolded in stages over the past two years.

Earlier Claude models launched with a 200,000-token limit, already one of the largest publicly available context windows at the time. When Anthropic introduced the Claude 3 family in early 2024, the company noted that the models were technically capable of processing inputs exceeding one million tokens, though access to those larger contexts was initially limited to “specific use-cases” and available only on request.

The first public release of a 1-million-token window arrived in August 2025, when Anthropic introduced the capability in Claude Sonnet 4. The jump represented a fivefold increase over the earlier Sonnet models, albeit with a tiered pricing structure tied to prompt size.

It’s also worth noting that Anthropic was, in some respects, playing catch-up: both Google and OpenAI had already introduced models capable of handling prompts approaching one million tokens.

Still, the million-token milestone has become an increasingly visible benchmark among AI model providers. Larger context windows allow models to process longer documents or broader datasets without breaking the task into multiple steps.

Under the current pricing, Claude Opus 4.6 costs about $5 per million input tokens and $25 per million output tokens, while Claude Sonnet 4.6 costs roughly $3 per million input tokens and $15 per million output tokens. Previously, Sonnet input pricing rose from about $3 to roughly $6 per million tokens once prompts exceeded the long-context threshold, while Opus input pricing increased from about $5 to around $10 per million tokens. Output token pricing also rose under the premium tier.
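To make the difference concrete, here is a minimal sketch of the input-side arithmetic, using only the approximate figures quoted above. The model keys and function names are hypothetical labels chosen for illustration, not API identifiers, and output tokens are left out because the article does not give the former premium output rates.

```python
# Approximate input rates from the article, in USD per million tokens.
# Keys are hypothetical labels, not official model identifiers.
INPUT_RATES = {
    # label: (standard rate, former long-context rate)
    "opus-4.6": (5.00, 10.00),
    "sonnet-4.6": (3.00, 6.00),
}

# The article describes the old threshold as "roughly 200,000" tokens.
LONG_CONTEXT_THRESHOLD = 200_000


def input_cost_flat(model: str, input_tokens: int) -> float:
    """Input cost under the current flat pricing."""
    standard, _ = INPUT_RATES[model]
    return input_tokens / 1_000_000 * standard


def input_cost_tiered(model: str, input_tokens: int) -> float:
    """Input cost under the former tiered pricing.

    Crossing the threshold moved the entire request to the premium
    rate, not just the tokens above the threshold.
    """
    standard, premium = INPUT_RATES[model]
    rate = premium if input_tokens > LONG_CONTEXT_THRESHOLD else standard
    return input_tokens / 1_000_000 * rate


if __name__ == "__main__":
    for model in INPUT_RATES:
        for tokens in (150_000, 800_000):
            old = input_cost_tiered(model, tokens)
            new = input_cost_flat(model, tokens)
            print(f"{model} {tokens:>7,} tokens: was ${old:.2f}, now ${new:.2f}")
```

On these figures, an 800,000-token Sonnet 4.6 prompt that would have cost about $4.80 in input tokens now costs about $2.40, while a 150,000-token prompt is unchanged because it never crossed the threshold.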

Anthropic said the 1-million-token context window is available on the Claude Platform natively and through Amazon Bedrock, Google Cloud’s Vertex AI, and Microsoft Foundry. Claude Code Max, Team, and Enterprise users running Opus 4.6 will also get the full 1-million-token context window by default.

What cheaper long prompts change for developers

For developers, the removal of the long-context surcharge could influence how applications are designed.

A popular mechanism for keeping costs down has been to minimize the amount of information sent to a model at once. Retrieval systems — which pull only the most relevant snippets of data — became a common architectural pattern partly because sending very large prompts could quickly become expensive.

With the premium tier gone, that constraint eases. Developers can still rely on retrieval systems to manage token usage, but they may also choose to send larger bodies of information directly to the model when broader context is useful.

That could make certain workflows simpler. Instead of chunking documents into smaller segments or orchestrating multiple model calls, developers can sometimes place a larger slice of data into a single prompt and ask the model to reason across it.
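As a rough illustration of that trade-off, the sketch below packs a set of documents into as few prompts as a given context window allows. Everything in it is an assumption made for the example: the four-characters-per-token estimate is a crude heuristic rather than a real tokenizer, and `plan_calls` and the `reserved` budget are hypothetical names, not part of any Anthropic SDK.

```python
import math

CHARS_PER_TOKEN = 4  # crude heuristic (assumption); real tokenizers vary


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def plan_calls(documents: list[str], context_window: int,
               reserved: int = 20_000) -> list[list[int]]:
    """Greedily group document indices into batches that fit one prompt.

    `reserved` holds back tokens for instructions and the model's reply.
    Each returned batch corresponds to one model call.
    """
    budget = context_window - reserved
    batches: list[list[int]] = []
    current: list[int] = []
    used = 0
    for index, doc in enumerate(documents):
        tokens = estimate_tokens(doc)
        if tokens > budget:
            raise ValueError(f"document {index} does not fit in one prompt")
        # Start a new batch when this document would overflow the budget.
        if current and used + tokens > budget:
            batches.append(current)
            current, used = [], 0
        current.append(index)
        used += tokens
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # Ten hypothetical documents of about 60,000 estimated tokens each.
    docs = ["x" * 240_000 for _ in range(10)]
    print(len(plan_calls(docs, context_window=200_000)), "calls at 200K")
    print(len(plan_calls(docs, context_window=1_000_000)), "calls at 1M")
```

With these made-up inputs, the 200,000-token window needs four separate calls, each seeing only part of the material, while the 1-million-token window fits everything in one.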

For AI-native coding tools, this approach is particularly attractive. A model with access to a large context window can inspect more of a codebase at once — including multiple files, documentation, and previous conversations — which can improve tasks such as debugging, code refactoring, or generating pull requests.

Brad Feld, Techstars co-founder and venture capitalist, said the larger context window can remove some of the engineering workarounds developers previously needed to manage limited context sizes.

“The 1M token context window for Claude Code changes the engineering calculus completely,” Feld writes in a LinkedIn post. “I built four markdown state machines totaling 4,700 lines to manage my development workflow — from ticket to deployment. Most of that complexity existed because of the 200K context limit.”

With a larger window, he writes, many of those mechanisms become unnecessary.

“With 1M tokens, reliability is largely solved by having enough room. The constraint shifts to wall-clock speed — and speed comes from parallelism.”

In other words, the model now has enough working context to keep track of long tasks, and the main bottleneck becomes how quickly it can process all that information.

It’s worth stressing that removing the surcharge doesn’t make large prompts free. Token usage still increases with input size, and developers must weigh this cost against other architectural approaches.
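A back-of-the-envelope comparison shows why. The sketch below uses the approximate Sonnet 4.6 input rate quoted earlier in the article; the workload numbers (request volume and prompt sizes) are invented purely for illustration, and the function name is hypothetical.

```python
SONNET_INPUT_USD_PER_MTOK = 3.00  # approximate rate quoted in the article


def daily_input_cost(tokens_per_request: int, requests_per_day: int,
                     usd_per_mtok: float = SONNET_INPUT_USD_PER_MTOK) -> float:
    """Input-token spend per day at a flat per-token rate."""
    total_tokens = tokens_per_request * requests_per_day
    return total_tokens / 1_000_000 * usd_per_mtok


if __name__ == "__main__":
    requests_per_day = 1_000  # hypothetical workload

    # Retrieval-style prompt: only the most relevant snippets.
    retrieval = daily_input_cost(20_000, requests_per_day)
    # Full-context prompt: the whole corpus on every request.
    full_context = daily_input_cost(600_000, requests_per_day)

    print(f"retrieval:    ${retrieval:,.2f} per day")
    print(f"full context: ${full_context:,.2f} per day")
    print(f"ratio:        {full_context / retrieval:.0f}x")
```

Under these assumptions the full-context design costs 30 times as much in input tokens as the retrieval design, even with no surcharge, because flat pricing still scales linearly with prompt size.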

But by eliminating the pricing threshold, Anthropic has made long-context workloads easier to experiment with — and potentially easier to deploy in production systems.

Paul is an experienced technology journalist covering some of the biggest stories from Europe and beyond, most recently at TechCrunch where he covered startups, enterprise, Big Tech, infrastructure, open source, AI, regulation, and more. Based in London, these days Paul...
TNS owner Insight Partners is an investor in: Anthropic, OpenAI.