
Claude Code Limits Explained (2026 Edition)


By Sahajmeet Kaur

Published: June 12, 2026


⚡ TL;DR

Claude Code enforces usage through two overlapping limits — a 5-hour rolling window plus a weekly cap on active compute hours — and that quota is shared across Claude Code, Claude.ai, and Cowork.

Anthropic governs Claude Code through a dual-layer usage system: a 5-hour rolling window for short-term activity and a weekly cap on active compute hours. Because Claude Code, Claude.ai chat, and Cowork all draw from the same bucket, tokens burned in one reduce the capacity left in the others.

Here's what each plan gets, followed by exactly how the limits work and how to stay productive within them.

Note: Anthropic reduced 5-hour limits during weekday peak hours (5–11 AM PT) starting March 2026, and acknowledged on March 31 that users are hitting limits faster than expected. The numbers below reflect the post-March 28 reality.

Claude Code Limits by Plan: Pro, Max, and API Compared

Plan	Price	5-Hour Window	Weekly Cap	Models	Best For
Free	$0	~40 short messages/day (Claude.ai only)	N/A	Limited Sonnet	No Claude Code access
Pro	$20/mo ($17/mo billed annually)	~10–45 prompts	~40–80 Sonnet hrs	Sonnet 4.6 only	Solo devs, small codebases
Max 5x	$100/mo	~50–225 prompts	~140–240 Sonnet hrs + ~15–35 Opus hrs	Sonnet 4.6 + Opus 4.6/4.7	Full-time Claude Code users
Max 20x	$200/mo	~200–900 prompts	~240–480 Sonnet hrs + ~24–40 Opus hrs	Sonnet 4.6 + Opus 4.6/4.7	Heavy daily use, mid-to-large codebases
Team Premium	$100/seat (5-seat minimum)	Max 5x equivalent per seat	Custom	Sonnet + Opus	Teams needing shared admin + analytics
API (pay-as-you-go)	Sonnet 4.6: $3 in / $15 out; Opus 4.6: $5 in / $25 out (per MTok)	Tier-based RPM/TPM (not session-based)	None	All models, all regions	Production workloads, custom integrations

Prompt counts are ranges from independent testing — actual usage varies based on prompt length, context, model choice, and server load.
Subscription limits are shared across Claude Code, Claude.ai chat, and Cowork.
Peak-hour throttling: 5-hour limits are reduced weekdays 5–11 AM PT.
Input prompts over 200K tokens are billed at 2× the standard API rate.
Opus 4.7 can generate ~35% more tokens than 4.6 for the same input, increasing effective costs.
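
To make the API row concrete, here is a rough per-request cost sketch built from the per-MTok prices in the table and the 200K-token note above. The function name, the dictionary keys, and the choice to apply the 2× multiplier to the input side only are illustrative assumptions, not Anthropic's actual billing logic.

```python
# Per-million-token (MTok) prices from the table above: (input, output) in USD.
PRICES_PER_MTOK = {
    "sonnet-4.6": (3.00, 15.00),
    "opus-4.6": (5.00, 25.00),
}

LONG_CONTEXT_THRESHOLD = 200_000  # input tokens; above this the 2x rate applies
LONG_CONTEXT_MULTIPLIER = 2.0     # assumption: applied to input tokens only


def estimate_request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return an approximate USD cost for a single API request."""
    input_price, output_price = PRICES_PER_MTOK[model]
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        input_price *= LONG_CONTEXT_MULTIPLIER
    cost = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return round(cost, 6)


if __name__ == "__main__":
    # A 30K-token prompt (about five medium files) with a 2K-token reply.
    print(estimate_request_cost("sonnet-4.6", 30_000, 2_000))
    # The same reply on a 250K-token prompt crosses the long-context threshold.
    print(estimate_request_cost("sonnet-4.6", 250_000, 2_000))
```

Under these assumptions the first request costs about $0.12 and the second about $1.53, which shows how quickly large contexts dominate the bill.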

For platform & infra teams

20 developers × $200/mo = $4,000/mo. And they're still hitting limits.

Per-seat Claude Code subscriptions break at team scale. Pool a single enterprise Anthropic contract across your whole team through a gateway — add Bedrock and Vertex Claude for failover and extra capacity. Same models, central cost tracking, no per-seat tax.

Route Claude across Anthropic + Bedrock + Vertex from one endpoint
Per-team budgets, rate limits, and RBAC out of the box
Token-level cost attribution by developer or project
BYOK with your existing enterprise contract


Why Limits Are Necessary

A small share of power users were running 24-hour sessions, sharing credentials across teams, and burning thousands of dollars of compute on $20 subscriptions — degrading service for everyone else. The 5-hour window and weekly cap are Anthropic's guardrails against that: fair-use enforcement, anti-abuse, and a way to keep plan pricing sustainable without quietly restricting features.

Understanding the Rate Limits Structure

Claude Code’s usage model operates on two distinct control layers — one managing short-term activity bursts, and another regulating total weekly compute consumption. Together, they define how Anthropic balances fairness, scalability, and system reliability across its user base.

1. The Five-Hour Rolling Window
The 5-hour rolling window caps how many prompts you can send in a session. The counter starts on your first prompt, not on a fixed clock — fire your first prompt at 10 AM and the window resets at 3 PM, regardless of how many prompts you sent in between.

Capacity scales with your plan: roughly 10–45 prompts per window on Pro, up to 900 on Max 20x. The exact number depends on prompt length, context size, and model choice.
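
The reset rule is simple enough to sketch in a few lines. This is a minimal illustration of the behavior described above (the window is anchored to the first prompt); the function names are hypothetical and nothing here calls an Anthropic API.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=5)


def window_reset(first_prompt_at: datetime) -> datetime:
    """The window resets five hours after the first prompt, not on a fixed clock."""
    return first_prompt_at + WINDOW


def time_until_reset(first_prompt_at: datetime, now: datetime) -> timedelta:
    """Remaining wait before the window resets; zero once it has already passed."""
    remaining = window_reset(first_prompt_at) - now
    return max(remaining, timedelta(0))


if __name__ == "__main__":
    first = datetime(2026, 6, 12, 10, 0)           # first prompt at 10 AM
    print(window_reset(first))                      # resets at 3 PM the same day
    print(time_until_reset(first, datetime(2026, 6, 12, 13, 30)))
```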

This rolling-window model is common across LLM providers — OpenAI and Google use similar quota structures. If you're working across multiple providers, comparing how each handles rate limits becomes important.

2. The Weekly Active Hours Cap
In parallel, a weekly cap restricts the total number of “active compute hours” available per subscription. Anthropic defines an active hour not as wall-clock time, but as periods when Claude models are actively processing tokens or executing code-related reasoning. Idle moments such as file browsing or conversational pauses do not count toward this quota.

For Pro plans, this equates to roughly 40–80 active hours per week using Sonnet models, while the Max tiers extend that range to as much as 480 Sonnet hours plus up to 40 Opus hours, depending on session concurrency and model complexity.

3. Unified Enforcement and Visibility
These two limit types — rolling and weekly — are tightly coupled. Once either boundary is reached, all new prompts are blocked, even if the other counter remains under its limit. No manual resets or support overrides are allowed.
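
The coupling can be expressed as a simple either-or check. The sketch below is an illustration of the rule as described here, not Anthropic's implementation; the `Quota` class and the example numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Quota:
    used: float
    limit: float

    @property
    def exhausted(self) -> bool:
        return self.used >= self.limit


def can_send_prompt(five_hour: Quota, weekly: Quota) -> bool:
    """A prompt goes through only if neither limit has been reached."""
    return not (five_hour.exhausted or weekly.exhausted)


if __name__ == "__main__":
    # Plenty of room in the 5-hour window, but the weekly cap is spent: blocked.
    print(can_send_prompt(Quota(used=10, limit=45), Quota(used=80, limit=80)))
    # Both counters under their limits: allowed.
    print(can_send_prompt(Quota(used=10, limit=45), Quota(used=35, limit=80)))
```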

Developers have access only to basic countdown timers for usage visibility, leaving limited insight into granular token or model-level consumption. For teams managing multiple projects, this can make quota planning and observability difficult — a challenge that’s increasingly common in modern AI workloads.

From an infrastructure perspective, this rate-limiting approach resembles a centralized quota manager: efficient for fairness, but rigid for flexibility. Enterprise-grade systems such as TrueFoundry’s AI Gateway address this with API-driven governance, OpenTelemetry (OTel)-compliant observability, and fine-grained usage analytics, allowing teams to monitor and optimize model calls in real time without arbitrary hard stops.

Key Metrics for Evaluating Gateway

Criteria	What to Evaluate	Priority	TrueFoundry
Latency	Adds <10ms p95 overhead for time-to-first-token?	Must Have	✅ Supported
Data Residency	Keeps logs within your region (EU/US)?	Depends on use case	✅ Supported
Latency-Based Routing	Automatically reroutes based on real-time latency/failures?	Must Have	✅ Supported
Key Rotation & Revocation	Rotate or revoke keys without downtime?	Must Have	✅ Supported


Differences Across Free, Pro, and Max Plans

Selecting the right plan depends on how frequently and deeply you expect to work with Claude Code.

The Free tier offers about 40 short messages per day, but excludes access to the agentic Claude Code capabilities. It is best suited for casual experimentation, testing smaller snippets, or initial onboarding before adopting a paid plan.

The Pro tier, priced at $20/month, unlocks the full Claude Code functionality, providing roughly 10–45 prompts per five-hour window along with a weekly usage cap suitable for individual developers. It is a good fit for users managing smaller codebases or coding in shorter bursts. Notably, the Pro tier includes Sonnet model access but does not support Opus, which is reserved for deeper architectural reasoning and advanced refactoring tasks.

The Max plans scale throughput with price, up to roughly 20× the Pro allowance. The Max 5x plan ($100/month) and Max 20x plan ($200/month) are designed for full-time Claude Code users, heavy solo developers, and agencies handling multiple concurrent projects. These tiers combine Sonnet and Opus hours to power intensive, multi-session workflows. Even these plans have boundaries, however: once 50 sessions per month are reached, access throttling may occur.

Finally, Team and Enterprise plans include administrative controls, usage analytics, and the ability to purchase custom volume limits or overflow capacity. These options best serve organizations seeking predictable throughput and centralized governance across distributed teams.

How Claude Code Counts Tokens (And Why It Matters)

The "messages per 5 hours" you see in the UI is a simplification. Claude actually meters by tokens — every prompt, file attachment, tool definition, and line of conversation history draws from the same quota.

This matters because token usage doesn't scale with message count. It scales with context. Referencing five medium-sized files can burn 30,000+ tokens — equivalent to dozens of plain prompts. Agentic sessions are even worse: each turn replays your system prompt, file references, and tool definitions. "Ultrathink" mode can multiply consumption 5x compared to a regular session.
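
A back-of-the-envelope model shows why context, not message count, drives consumption. The sketch below assumes every turn resends a fixed overhead (system prompt, tool definitions, file references) plus the full history so far, and it ignores prompt caching and compaction; the function name and the example numbers are illustrative only.

```python
def session_input_tokens(turns: int, fixed_overhead: int, tokens_per_turn: int) -> int:
    """Estimate total input tokens processed across an agentic session.

    Each turn resends the fixed overhead plus the conversation history
    accumulated so far, so the total grows faster than the turn count.
    """
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_turn         # the new exchange joins the history
        total += fixed_overhead + history  # the whole context is sent again
    return total


if __name__ == "__main__":
    # One turn with 30K tokens of files and tools versus ten turns of the same session.
    print(session_input_tokens(turns=1, fixed_overhead=30_000, tokens_per_turn=500))
    print(session_input_tokens(turns=10, fixed_overhead=30_000, tokens_per_turn=500))
```

With these example numbers, ten turns process 327,500 input tokens even though only 5,000 tokens of new conversation were added; the rest is replayed context.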

Model choice is the biggest single lever:

Opus 4.6 / 4.7 — Highest cost per request, deepest reasoning. Use for architecture-level refactors.
Sonnet 4.6 — Balanced cost and capability. Right for most day-to-day refactoring and analysis.
Haiku 4.5 — Cheapest and fastest. Best for well-scoped tasks or quick edits.

Advanced developers model their requests against Anthropic's free token-counting API before execution to avoid premature quota exhaustion. For Claude Code workflows specifically, tools and MCP servers often consume more tokens than the actual prompt — teams running MCP servers across Claude Code tend to underestimate this overhead.


What Happens When You Hit the Limit?

Reaching a rate limit immediately pauses all new prompts. Both the web interface and CLI display explicit error messages indicating window expiry and the exact time of reset. Existing threads remain in read-only mode, allowing users to review or copy code, but no further requests can be processed.

This block persists until the timer resets, whether after the five-hour rolling window or the weekly usage cycle. Developers requiring immediate overflow must switch to API pay-as-you-go plans or alternate tools — support teams cannot manually reset or extend quotas in real time.

Unlike some SaaS systems, Claude does not provide detailed per-prompt or per-token breakdowns, requiring developers to self-monitor usage. For heavily sessioned workflows, teams often maintain manual tracking or use custom scripts to estimate remaining capacity.

Developers on Pro plans can upgrade for greater throughput, but should remain realistic about ceilings even on the Max tiers. Large-scale codebase refactoring or architecture-level debugging often demands disciplined context management, strategic prompt design, and awareness of token costs to operate efficiently within defined limits.

Optimizing Your Workflow for Claude Code

To make the most of Claude Code under its rate limits, developers must optimize how they structure prompts, manage context, and plan usage windows. The most effective users adopt disciplined, token-aware workflows that maximize output while minimizing unnecessary consumption.

Some best practices to improve efficiency and stay within quota limits are:

Design for token and context awareness: Structure interactions to focus on high-impact coding tasks. Avoid unnecessary or repetitive exchanges that increase token load without adding value.
Clear context regularly: End long-running sessions after key milestones and start fresh ones to reset context and maintain prompt relevance. This helps control hidden token buildup over time.
Keep context files lean: Keep your CLAUDE.md and attached project documentation concise. Every added or updated line is reprocessed with each message, making context bloat a costly mistake.
Disable unused tools or plugins: Turn off integrations not needed in a session to reduce incidental token and compute usage.
Use auto-compact strategically: Summarization tools can help, but excessive use may introduce hidden token costs if old logs and references persist.
Optimize prompt structure: Combine multiple related instructions into a single, well-scoped prompt instead of spreading them across multiple exchanges. Teams often use centralized tools for prompt management to version-control these system instructions, ensuring that optimized, token-efficient prompts are reused across the organization.
Time sessions around rolling windows: Because Claude operates on rolling usage windows, start major development tasks right after a reset to ensure maximum quota availability. Some teams even schedule coding sessions to align with reset cycles.
Select models intentionally: Use Sonnet for most daily coding and refactoring work, Opus for deep architectural reasoning or debugging across large codebases, and Haiku for short, targeted tasks such as writing tests or formatting.
Use extended thinking modes sparingly: “Ultrathink” or extended reasoning modes are powerful but computationally expensive — deploy them only when the additional context depth delivers clear value.
Batch and automate with backoff logic: Implement exponential backoff, batching scripts, or queued orchestration to manage retries efficiently and spread workloads within quota boundaries.
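
The backoff advice in the last item can be sketched as a small retry helper. This is a generic pattern under stated assumptions: `RateLimitError` is a placeholder for whatever exception your client raises, and the delay values are arbitrary defaults. It suits short-lived API rate limits (RPM/TPM); it will not help once a subscription window is exhausted, because that reset is hours away.

```python
import random
import time


class RateLimitError(Exception):
    """Placeholder for the error your client raises when it is rate limited."""


def call_with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            # Double the wait on each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% jitter keeps parallel workers from retrying in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

Passing `sleep` as a parameter keeps the helper easy to test: a test can substitute a function that records delays instead of actually waiting.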

By adopting these practices, teams can significantly extend their effective throughput, prevent workflow interruptions, and maintain a consistent development pace — even under tight compute and token constraints.

The Implications for Developers and Organizations

These quota controls constitute a major evolution in how agentic coding tools are consumed. For solo developers, limits are rarely felt in short, intermittent sessions. However, frequent and intensive users must adjust expectations, moving toward disciplined session planning, backup tooling, and hybridized workflows.

Large organizations and agencies benefit most from the Team and Enterprise options, with administrative dashboards, usage analytics, and extra controls for cross-team planning. Those running heavy-duty operations may mix Claude Code with Cursor, Copilot, Gemini, or roll their overflow workload to Anthropic’s API with usage-based billing.

The economic calculation should align subscription choice with expected productivity and project complexity. For most Pro users, the savings generated by using Claude Code far outstrip the subscription cost. For Max plans, high-billable developers and teams are best served by intentional, quota-aware workflow management.

As the competitive landscape evolves, and as new model versions bring improved capability at greater computational cost, users should expect quotas to tighten further rather than loosen. Proactive adaptation and a willingness to blend tools will define the most effective development operations going forward.

Claude Code represents a new era of agentic, autonomous software assistance, enabling developers to offload repetitive and complex coding tasks, reflect on architecture, and execute deep refactoring at scale. With the introduction of rate limits and usage quotas, getting the most out of Claude now requires a blend of technical planning, workflow optimization, and strategic tool selection.

By understanding how quotas and token accounting work, staying vigilant about context management and prompt design, and aligning coding patterns with rolling and weekly allocation windows, teams can preserve both performance and accessibility. Those with heavier or always-on workloads should explore API-based integrations or deploy Claude as part of a multi-tool development pipeline.

This is where infrastructure platforms like TrueFoundry play a crucial role. TrueFoundry’s AI Gateway enables teams to integrate models like Claude — along with OpenAI, Gemini, or custom LLMs — through a unified, vendor-agnostic interface. It provides governance, observability, and scalability without enforcing hard usage ceilings, ensuring that enterprises maintain flexibility and control over their AI workloads across any provider.

Controlling AI Costs and Usage Effectively

Managing rate limits and compute costs is becoming essential for both individual developers and enterprise AI teams. Beyond understanding how Claude’s rolling and weekly limits work, you can also take proactive control over your usage budgets and API consumption with infrastructure platforms like TrueFoundry’s AI Gateway.

Here’s how teams can maintain cost and quota efficiency at scale:

Set Dynamic Rate Limits per Model or Endpoint
With TrueFoundry’s AI Gateway, teams can define per-endpoint rate limits across providers like Claude, OpenAI, or Gemini. This ensures that no individual service or user exceeds compute capacity or quota unexpectedly.
Define Budget Caps for Each Project or Team
You can configure monthly or project-based budget thresholds, automatically pausing or throttling workloads when spend approaches predefined limits. This helps control cloud GPU costs and prevents runaway usage.
Monitor and Optimize with Real-Time Analytics
All model calls and compute metrics are OpenTelemetry (OTel)-compliant, meaning you can export usage data into existing monitoring tools like Grafana, Datadog, or Prometheus for unified observability.
Automate Policy Enforcement via API or GitOps
The platform is fully API-driven, allowing teams to script and enforce their own governance logic — whether through CI/CD workflows or infrastructure-as-code.
Gain Visibility with a Centralized Dashboard
The AI Gateway provides a unified dashboard showing model-level consumption, cost trends, and traffic analytics.
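
The budget-cap behavior described above (throttle as spend approaches the limit, pause once it is reached) can be sketched as a small decision function. This illustrates the logic only and is not TrueFoundry's API or configuration format; the `Action` names and the 80% throttle threshold are assumptions.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    THROTTLE = "throttle"
    PAUSE = "pause"


def budget_action(spend_usd: float, budget_usd: float, throttle_at: float = 0.8) -> Action:
    """Decide how to treat new requests given current spend against a budget."""
    if budget_usd <= 0:
        raise ValueError("budget_usd must be positive")
    ratio = spend_usd / budget_usd
    if ratio >= 1.0:
        return Action.PAUSE       # budget exhausted: stop new workloads
    if ratio >= throttle_at:
        return Action.THROTTLE    # approaching the cap: slow traffic down
    return Action.ALLOW


if __name__ == "__main__":
    # A hypothetical team with a $4,000 monthly budget at three spend levels.
    for spend in (500, 3_400, 4_000):
        print(spend, budget_action(spend, 4_000).value)
```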

Figure: The TrueFoundry AI Gateway interface, showing how to configure rate-limiting rules through the Config tab.

This kind of infrastructure-level control helps organizations balance innovation with governance — letting developers work freely while ensuring usage remains predictable, auditable, and within budget.

For a practical walkthrough on setting up visibility, we recommend reading our guide on cost tracking for Claude Code with TrueFoundry's AI Gateway, which details how to visualize token spend and prevent budget overages.

Enhancing Claude Code Governance with TrueFoundry

Anthropic’s quota system reflects a broader challenge in modern AI infrastructure: governing resource usage while maintaining high performance. As organizations adopt more agentic and model-intensive workloads, it becomes essential to manage compute, observability, and governance without being locked into vendor-specific rate limits or SDKs.

This is where TrueFoundry’s AI Gateway acts as a powerful abstraction layer. Rather than replacing the model, it provides the operational scaffolding that allows teams to integrate Claude Code alongside other endpoints through a single, unified interface. This approach ensures that while Claude provides the agentic intelligence, TrueFoundry supplies the operational flexibility needed to scale it.

For a technical walkthrough on connecting your CLI and IDEs, you can refer to our documentation on Claude Code integration.

The AI Gateway gives teams:

Unified Integration: Integrate any OpenAI-compatible endpoint, custom model, or Claude via one interface.
Seamless Governance: Maintain API-level governance and rate management without needing to alter application code.
Deep Observability: Gain fine-grained visibility via OpenTelemetry-compliant logs that are exportable to any monitoring tool.
Strategic Portability: Retain control and flexibility by allowing deployments on any Kubernetes cluster, avoiding vendor lock-in.

By combining the reasoning capabilities of tools like Claude with the governance of TrueFoundry, teams can build resilient, scalable AI development pipelines that evolve alongside the technology.

Ready to scale your AI operations? Book a demo to see TrueFoundry in action

Frequently Asked Questions

Does Claude Code have usage limits?

Yes. Claude Code enforces a five-hour rolling window and a weekly cap. Paid plans such as Pro and Max raise those ceilings, but heavy workloads can still hit them. TrueFoundry’s AI Gateway helps manage these constraints by enabling fallback to other providers when quotas are reached.

What is the 5-hour limit on Claude Code?

The 5-hour window acts as Claude Code's short-term rate limit, capping burst activity per user. It restricts how many prompts, and ultimately how many tokens, you can send before the window resets. TrueFoundry mitigates this by allowing you to set custom rate limits and route traffic dynamically.

Did Claude reduce limits?

Partly. Anthropic introduced weekly rate limits to curb abuse by heavy users and protect fairness and system reliability, and starting March 2026 it also reduced 5-hour limits during weekday peak hours (5–11 AM PT). TrueFoundry helps your use case remain scalable by balancing loads across multiple accounts or API endpoints.

What is the maximum number of tokens for Claude Code?

There is no single token maximum for Claude Code. Effective limits depend on your subscription and vary significantly between models. Large contexts accelerate consumption, because every file and message counts against the quota. TrueFoundry provides visibility into these costs, helping you track token usage in more detail than the built-in countdown timers.

What is the weekly limit for Claude Code?

The weekly limit restricts total active compute time: roughly 40–80 Sonnet hours for Pro users, with Opus hours available only on the Max plans. Once you hit it, you must wait for the weekly reset. TrueFoundry's AI Gateway helps teams track usage and switch providers to avoid downtime.

Does Claude AI have a daily limit?

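Claude Code limits are not strictly daily; they operate on a five-hour rolling window plus a weekly cap. The exception is the Free tier of Claude.ai, which allows roughly 40 short messages per day and has no Claude Code access. Heavy usage with large contexts drains the five-hour window quickly. TrueFoundry mitigates this by allowing you to set custom budgets and rate limits across all your AI models, helping Claude usage remain efficient.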

How to get past the Claude message limit?

To bypass Claude code rate limits, you must wait for the window to reset or switch to the Claude API for pay-as-you-go API usage. For a better way, TrueFoundry enables seamless failover to other large language models, ensuring uninterrupted code generation workflows.
