![]() |
VOOZH | about |
TrueFoundry recognized in Gartner Hype Cycle for Platform Engineering 2026. Read the full report β
Join our VAR & VAD ecosystem β deliver enterprise AI governance across LLMs, MCPs & Agents. Become a Partner β
Get instant access to a live TrueFoundry environment. Deploy models, route LLM traffic, and explore the full platform β your sandbox is ready in seconds, no credit card required.
Blazingly fast way to build, track and deploy your models!
Claude Code enforces usage through two overlapping limits β a 5-hour rolling window plus a weekly cap on active compute hours β and that quota is shared across Claude Code, Claude.ai, and Cowork.
Anthropic governs Claude Code through a dual-layer usage system: a 5-hour rolling window for short-term activity, and a weekly cap on active compute hours. The same bucket is shared across Claude Code, Claude.ai chat, and Cowork β burn tokens in one, you lose capacity in the others.
Here's what each plan gets, followed by exactly how the limits work and how to stay productive within them.
Note: Anthropic reduced 5-hour limits during weekday peak hours (5β11 AM PT) starting March 2026, and acknowledged on March 31 that users are hitting limits faster than expected. The numbers below reflect the post-March 28 reality.
| Plan | Price | 5-hour Window | Weekly Cap | Models | Best For |
|---|---|---|---|---|---|
| Free | $0 | ~40 short messages/day (Claude.ai only) | β | Limited Sonnet | No Claude Code access |
| Pro | $20/mo $17/mo annual | ~10β45 prompts | ~40β80 Sonnet hours | Sonnet 4.6 only | Solo devs, small codebases |
| Max 5x | $100/mo | ~50β225 prompts | ~140β240 Sonnet hrs + 15β35 Opus hrs | Sonnet 4.6 + Opus 4.6/4.7 | Full-time Claude Code users |
| Max 20x | $200/mo | ~200β900 prompts | 240β480 Sonnet hrs + 24β40 Opus hrs | Sonnet 4.6 + Opus 4.6/4.7 | Heavy daily use, midβlarge codebases |
| Team Premium | $100/seat 5-seat minimum | Max 5x equivalent per seat | Custom | Sonnet + Opus | Teams needing shared admin + analytics |
| API (pay-as-you-go) | Sonnet 4.6: $3 / $15 Opus 4.6: $5 / $25 (per MTok) |
Tier-based RPM/TPM (not session) |
None |
All models, all regions |
Production workloads, custom integrations |
Per-seat Claude Code subscriptions break at team scale. Pool a single enterprise Anthropic contract across your whole team through a gateway β add Bedrock and Vertex Claude for failover and extra capacity. Same models, central cost tracking, no per-seat tax.
A small share of power users were running 24-hour sessions, sharing credentials across teams, and burning thousands of dollars of compute on $20 subscriptions β degrading service for everyone else. The 5-hour window and weekly cap are Anthropic's guardrails against that: fair-use enforcement, anti-abuse, and a way to keep plan pricing sustainable without quietly restricting features.
Claude Codeβs usage model operates on two distinct control layers β one managing short-term activity bursts, and another regulating total weekly compute consumption. Together, they define how Anthropic balances fairness, scalability, and system reliability across its user base.
1. The Five-Hour Rolling Window
The 5-hour rolling window caps how many prompts you can send in a session. The counter starts on your first prompt, not on a fixed clock β fire your first prompt at 10 AM and the window resets at 3 PM, regardless of how many prompts you sent in between.
Capacity scales with your plan: roughly 10β45 prompts per window on Pro, up to 900 on Max 20x. The exact number depends on prompt length, context size, and model choice.
This rolling-window model is common across LLM providers β OpenAI and Google use similar quota structures. If you're working across multiple providers, comparing how each handles rate limits becomes important.
2. The Weekly Active Hours Cap
In parallel, a weekly cap restricts the total number of βactive compute hoursβ available per subscription. Anthropic defines an active hour not as wall-clock time, but as periods when Claude models are actively processing tokens or executing code-related reasoning. Idle moments such as file browsing or conversational pauses do not count toward this quota.
For Pro plans, this equates to roughly 40β80 active hours per week using Sonnet models, while Max tiers extend that range up to 480 Sonnet hours or 40 Opus hours, depending on session concurrency and model complexity.
3. Unified Enforcement and Visibility
These two limit types β rolling and weekly β are tightly coupled. Once either boundary is reached, all new prompts are blocked, even if the other counter remains under its limit. No manual resets or support overrides are allowed.
Developers have access only to basic countdown timers for usage visibility, leaving limited insight into granular token or model-level consumption. For teams managing multiple projects, this can make quota planning and observability difficult β a challenge thatβs increasingly common in modern AI workloads.
From an infrastructure perspective, this rate-limiting approach resembles a centralized quota manager: efficient for fairness, but rigid for flexibility. Enterprise-grade systems β such as TrueFoundryβs AI Gateway β solve this by offering API-driven governance, Otel-compliant observability, and fine-grained usage analytics, allowing teams to monitor and optimize model calls in real time without arbitrary hard stops.
Key Metrics for Evaluating Gateway
| Criteria | What should you evaluate ? | Priority | TrueFoundry |
|---|---|---|---|
| Latency | Adds <10ms p95 overhead for time-to-first-token? | Must Have | β Supported |
| Data Residency | Keeps logs within your region (EU/US)? | Depends on use case | β Supported |
| Latency-Based Routing | Automatically reroutes based on real-time latency/failures? | Must Have | β Supported |
| Key Rotation & Revocation | Rotate or revoke keys without downtime? | Must Have | β Supported |
| Key Rotation & Revocation | Rotate or revoke keys without downtime? | Must Have | β Supported |
| Key Rotation & Revocation | Rotate or revoke keys without downtime? | Must Have | β Supported |
| Key Rotation & Revocation | Rotate or revoke keys without downtime? | Must Have | β Supported |
| Key Rotation & Revocation | Rotate or revoke keys without downtime? | Must Have | β Supported |
Selecting the right plan depends on how frequently and deeply you expect to work with Claude Code.
The Free tier offers about 40 short messages per day, but excludes access to the agentic Claude Code capabilities. It is best suited for casual experimentation, testing smaller snippets, or initial onboarding before adopting a paid plan.
The Pro tier, priced at $20/month, unlocks the full Claude Code functionality β providing roughly 45 prompts per five-hour window along with a weekly usage cap suitable for individual developers. Users managing smaller codebases or coding in shorter bursts will find it ideal. Notably, the Pro tier includes Sonnet model access, but does not support Opus, which is reserved for deeper architectural reasoning and advanced refactoring tasks.
The Max plans deliver up to 20Γ higher throughput, scaling proportionally with pricing. The Max 5x plan ($100/month) and Max 20x plan ($200/month) are designed for enterprise teams, heavy solo developers, and agencies handling multiple concurrent projects. These tiers combine Sonnet and Opus hours to power intensive, multi-session workflows. However, even these plans have boundaries β once 50 sessions per month are reached, access throttling may occur.
Finally, Team and Enterprise plans include administrative controls, usage analytics, and the ability to purchase custom volume limits or overflow capacity. These options best serve organizations seeking predictable throughput and centralized governance across distributed teams.
The "messages per 5 hours" you see in the UI is a simplification. Claude actually meters by tokens β every prompt, file attachment, tool definition, and line of conversation history draws from the same quota.
This matters because token usage doesn't scale with message count. It scales with context. Referencing five medium-sized files can burn 30,000+ tokens β equivalent to dozens of plain prompts. Agentic sessions are even worse: each turn replays your system prompt, file references, and tool definitions. "Ultrathink" mode can multiply consumption 5x compared to a regular session.
Advanced developers model their requests against Anthropic's free token-counting API before execution to avoid premature quota exhaustion. For Claude Code workflows specifically, tools and MCP servers often consume more tokens than the actual prompt β teams running MCP servers across Claude Code tend to underestimate this overhead.
Skip the workarounds. Try TrueFoundry live.
Spin up a sandbox in seconds β route Claude Code across Anthropic, Bedrock, and Vertex from one endpoint. No setup, no credit card.
Reaching a rate limit immediately pauses all new prompts. Both the web interface and CLI display explicit error messages indicating window expiry and the exact time of reset. Existing threads remain in read-only mode, allowing users to review or copy code, but no further requests can be processed.
This block persists until the timer resets, whether after the five-hour rolling window or the weekly usage cycle. Developers requiring immediate overflow must switch to API pay-as-you-go plans or alternate tools β support teams cannot manually reset or extend quotas in real time.
Unlike some SaaS systems, Claude does not provide detailed per-prompt or per-token breakdowns, requiring developers to self-monitor usage. For heavily sessioned workflows, teams often maintain manual tracking or use custom scripts to estimate remaining capacity.
Developers on Pro plans can upgrade for greater throughput, but should remain realistic about ceilings even on the Max tiers. Large-scale codebase refactoring or architecture-level debugging often demands disciplined context management, strategic prompt design, and awareness of token costs to operate efficiently within defined limits.
To make the most of Claude Code under its rate limits, developers must optimize how they structure prompts, manage context, and plan usage windows. The most effective users adopt disciplined, token-aware workflows that maximize output while minimizing unnecessary consumption.
Some best practices to improve efficiency and stay within quota limits are:
CLAUDE.md and attached project documentation concise. Every added or updated line is reprocessed with each message, making context bloat a costly mistake.By adopting these practices, teams can significantly extend their effective throughput, prevent workflow interruptions, and maintain a consistent development pace β even under tight compute and token constraints.
These quota controls constitute a major evolution in how agentic coding tools are consumed. For solo developers, limits are rarely felt in short, intermittent sessions. However, frequent and intensive users must adjust expectations, moving toward disciplined session planning, backup tooling, and hybridized workflows.
Large organizations and agencies benefit most from the Team and Enterprise options, with administrative dashboards, usage analytics, and extra controls for cross-team planning. Those running heavy-duty operations may mix Claude Code with Cursor, Copilot, Gemini, or roll their overflow workload to Anthropicβs API with usage-based billing.
The economic calculation should align subscription choice with expected productivity and project complexity. For most Pro users, the savings generated by using Claude Code far outstrip the subscription cost. For Max plans, high-billable developers and teams are best served by intentional, quota-aware workflow management.
As the competitive landscape evolves, and as new model versions bring improved capability at greater computational cost, users should expect quotas to tighten further rather than loosen. Proactive adaptation and a willingness to blend tools will define the most effective development operations going forward.
Claude Code represents a new era of agentic, autonomous software assistance, enabling developers to offload repetitive and complex coding tasks, reflect on architecture, and execute deep refactoring at scale. With the introduction of rate limits and usage quotas, getting the most out of Claude now requires a blend of technical planning, workflow optimization, and strategic tool selection.
By understanding how quotas and token accounting work, staying vigilant about context management and prompt design, and aligning coding patterns with rolling and weekly allocation windows, teams can preserve both performance and accessibility. Those with heavier or always-on workloads should explore API-based integrations or deploy Claude as part of a multi-tool development pipeline.
This is where infrastructure platforms like TrueFoundry play a crucial role. TrueFoundryβs AI Gateway enables teams to integrate models like Claude β along with OpenAI, Gemini, or custom LLMs β through a unified, vendor-agnostic interface. It provides governance, observability, and scalability without enforcing hard usage ceilings, ensuring that enterprises maintain flexibility and control over their AI workloads across any provider.
Managing rate limits and compute costs is becoming essential for both individual developers and enterprise AI teams. Beyond understanding how Claudeβs rolling and weekly limits work, you can also take proactive control over your usage budgets and API consumption with infrastructure platforms like TrueFoundryβs AI Gateway.
Hereβs how teams can maintain cost and quota efficiency at scale:
This kind of infrastructure-level control helps organizations balance innovation with governance β letting developers work freely while ensuring usage remains predictable, auditable, and within budget.
For a practical walkthrough on setting up visibility, we recommend reading our guide on cost tracking Claude code with TrueFoundry's AI Gateway, which details how to visualize token spend and prevent budget overages.
Anthropicβs quota system reflects a broader challenge in modern AI infrastructure: governing resource usage while maintaining high performance. As organizations adopt more agentic and model-intensive workloads, it becomes essential to manage compute, observability, and governance without being locked into vendor-specific rate limits or SDKs.
This is where TrueFoundryβs AI Gateway acts as a powerful abstraction layer. Rather than replacing the model, it provides the operational scaffolding that allows teams to integrate Claude Code alongside other endpoints through a single, unified interface. This approach ensures that while Claude provides the agentic intelligence, TrueFoundry supplies the operational flexibility needed to scale it.
For a technical walkthrough on connecting your CLI and IDEs, you can refer to our documentation on Claude code integration.
Using the AI Gateway enables teams to:
By combining the reasoning capabilities of tools like Claude with the governance of TrueFoundry, teams can build resilient, scalable AI development pipelines that evolve alongside the technology.
Ready to scale your AI operations? Book a demo to see TrueFoundry in action
Yes, there are strict Claude code limits governing usage, including a five-hour rolling window and weekly caps. While Claude Pro offers higher capacity for these language models, heavy workloads often hit these ceilings. TrueFoundryβs AI Gateway helps manage these constraints by enabling fallback to other providers when quotas are reached.
The 5-hour window functions as claude code rate limit, capping the burst activity for a user. It restricts the number of messages or input tokens allowed before a reset occurs. TrueFoundry mitigates this by allowing you to set custom rate limits and route traffic dynamically.
Rather than reducing them, Anthropic restructured the Claude quota to prevent abuse by heavy users. They introduced weekly rate limits to ensure fairness and system reliability. TrueFoundry ensures your use case remains scalable by balancing loads across multiple accounts or API endpoints.
Claude code max limits depend on your subscription, with token limits varying significantly between models. A large context window accelerates consumption, as every file and message counts. TrueFoundry provides visibility into these costs, helping you optimize token limits better than the default console.
These Claude limits restrict total active compute time, offering roughly 40-80 hours of Sonnet or fewer hours of Opus for Pro users. Once hit, you must wait for a reset. TrueFoundry's AI Gateway helps teams track usage and switch providers to avoid downtime.
Claude limits are not strictly daily but operate on a five-hour rolling window. Heavy usage impacts your context window limit quickly. TrueFoundry mitigates this by allowing you to set custom budgets and rate limits across all your AI models, ensuring Claude AI usage remains efficient.
To bypass Claude code rate limits, you must wait for the window to reset or switch to the Claude API for pay-as-you-go API usage. For a better way, TrueFoundry enables seamless failover to other large language models, ensuring uninterrupted code generation workflows.
TrueFoundry AI Gateway delivers ~3β4 ms latency, handles 350+ RPS on 1 vCPU, scales horizontally with ease, and is production-ready, while LiteLLM suffers from high latency, struggles beyond moderate RPS, lacks built-in scaling, and is best for light or prototype workloads.
Product
Company
Resources