Spend limits cap how much each developer can spend through your Claude apps gateway in a given day, week, or month. When a developer passes their cap, the gateway returns 429 on their next request and blocks them until the period resets or an admin raises the cap. Use spend limits to give each developer, group, or the whole organization a ceiling on a credential everyone shares.
A Claude apps gateway forwards all inference through one shared upstream credential, so your provider’s bill attributes everything to that credential, not to individual developers. Without per-developer limits, one runaway agent fleet can spend the organization’s entire commitment. Spend limits are the gateway’s per-developer view and circuit breaker on top of that shared bill.
Set a cap
With the admin: block configured in gateway.yaml, the gateway serves an admin API at /v1/organizations/spend_limits and enforces caps live on every inference request. Caps themselves are set through that API, not in gateway.yaml; each POST /v1/organizations/spend_limits request creates or replaces one cap from {scope, amount, period}. The API mirrors the wire shapes of Anthropic’s public Admin API spend-limits endpoints, so an HTTP client written against that contract can target the gateway by changing its base URL.
This request sets an org-wide default of $500 per month for every developer:
curl -sS https://claude-gateway.internal.example.com/v1/organizations/spend_limits \
-H "x-api-key: $GATEWAY_ADMIN_WRITE_KEY" \
-H "Content-Type: application/json" \
-d '{"scope": {"type": "organization"}, "amount": "50000", "period": "monthly"}'
This request layers a tighter $100-per-day cap on each member of the contractors group:
curl -sS https://claude-gateway.internal.example.com/v1/organizations/spend_limits \
-H "x-api-key: $GATEWAY_ADMIN_WRITE_KEY" \
-H "Content-Type: application/json" \
-d '{"scope": {"type": "rbac_group", "rbac_group_id": "contractors"}, "amount": "10000", "period": "daily"}'
| Field | Values | Description |
|---|
scope.type | user, rbac_group, organization | user targets one developer by their OpenID Connect (OIDC) sub, the stable user ID your identity provider assigns; pass it as scope.user_id. rbac_group targets an IdP group by name; pass it as scope.rbac_group_id. organization is the org-wide default. The gateway accepts all three; Anthropic’s public POST is user-only today. |
amount | Whole-number string of USD cents, or null | null is unlimited. "0" is a zero cap, which blocks every request. |
period | daily, weekly, monthly | A scope can hold one cap per period, and each enforces independently: a developer is blocked if over any of them. |
A group or organization cap is a per-seat default that each member inherits, not a shared pool. Per period, a developer’s effective cap resolves in this order: a per-user override, then the most restrictive of their group caps, then the org default, then unlimited. admin.group_limit_mode: max flips the multi-group tie-break to least-restrictive instead.
Authenticate to the admin API
Send one of:
- An
x-api-key header matching a key in admin.write_keys for full access, or admin.read_keys for GET-only access. Each key carries an id that appears in the audit log as admin-key:<id>, so give Terraform, CI, and each automation its own.
- A gateway bearer token whose
groups claim includes one of admin.admin_groups. This is full access and audits as oidc:<sub>, so prefer it for human admins.
How enforcement works
On each /v1/messages request, the gateway resolves the developer’s caps and period-to-date spend in one Postgres query. If they’re over any cap, the request returns 429 with error.type: billing_error and the header x-should-retry: false. The message is spend limit reached, followed by your admin.blocked_message if set.
/v1/messages/count_tokens is exempt. Token counting is free, so it runs regardless of cap state.
After each response, a usage meter reads token counts off the response as it streams to the client, prices them at USD list price, and increments Postgres counters for all three period buckets. The meter is a single reader on the stream, so the client’s bytes are untouched and a metering failure doesn’t break the response.
Spend limits estimate spend from token counts at USD list price; they’re a circuit breaker, not an invoice. For authoritative billing, reconcile against your provider’s own usage reporting, such as the Anthropic Usage & Cost Admin API, invocation logs on Bedrock, or Cloud Monitoring on Google Cloud.
Pricing uses the same table the Claude Code CLI uses for its own cost display, with the same model-ID canonicalization across Anthropic, Bedrock (us.anthropic.…-v1:0), Agent Platform (claude-…@date), and Foundry ID forms. A model ID the table can’t place, such as a Foundry deployment name or an inference-profile ARN, is priced at the unknown-model default tier of $5/$25 per million input/output tokens rather than zero, so an unrecognized ID can’t bypass a cap by going unmetered. The gateway warns at boot and once per ID at runtime when a model prices through the fallback.
Client aborts are billed too. The upstream reports output tokens only in the stream’s terminal frame, so an aborted stream doesn’t carry them. The meter keeps a conservative floor estimate from the streamed content size, about four characters per token, and bills it when and only when the terminal usage frame is missing. A complete stream always bills the upstream-reported count. Without this, a capped developer could stream output and abort each request immediately before the end, spending without ever being counted.
Postgres availability
The pre-check queries Postgres with a two-second timeout. If the store is unreachable or times out, enforcement fails open by default: the request proceeds and the gateway logs a warning. Set enforcement.fail_closed_on_error: true to fail closed instead, which returns the same 429 billing_error with the message spend limit unavailable. Fail-open keeps a store outage from becoming an inference outage; fail-closed guarantees no unmetered spend.
Admin API reference
The endpoints below are served under /v1/organizations/spend_limits.
| Method and path | Description |
|---|
GET /v1/organizations/spend_limits | List configured caps. Query: ?limit=&after_id=&before_id=. |
POST /v1/organizations/spend_limits | Create or replace a cap for {scope, period}. |
GET /v1/organizations/spend_limits/{id} | Fetch one cap by its spl_-prefixed ID. |
DELETE /v1/organizations/spend_limits/{id} | Delete one cap. Returns {type: "spend_limit_deleted", id}. |
GET /v1/organizations/spend_limits/effective | Resolved cap and to-date spend per principal per period. |
GET /v1/organizations/spend_limits/audit | Admin mutation trail, newest-first. Query: ?limit=. |
Conventions mirror Anthropic’s Admin API:
- A
type on every object
spl_-prefixed IDs
- Amounts as whole-number strings of USD cents;
POST rejects any other currency with 400
- The
{type: "error", error: {type, message}, request_id} error envelope
- A
request-id response header on every admin response, success or error, matching the body’s request_id
Every mutation writes a before/after row to admin_audit in the same transaction, attributed to admin-key:<id> or oidc:<sub>.
The spend_limit_increase_requests queue is not implemented.
/effective
GET /v1/organizations/spend_limits/effective returns Anthropic’s SpendSummary schema: each row is a principal for a period, with the resolved cap, period-to-date spend, and an actor object. Gateway-specific differences:
user_id is the OIDC sub.
actor.name and actor.email_address are null until the principal’s first inference request through the gateway. The gateway has no user directory; it records last-seen values from each user’s own session JWT.
- Each row also carries a
groups array, the principal’s last-seen IdP groups. This is a gateway extension so an admin UI can show every cap tier that applies; Anthropic-shaped clients ignore it.
- Without a
user_ids[] filter, it lists principals with recorded spend, because the gateway can’t enumerate all org members.
Group-sourced caps resolve against those last-seen groups with the same group_limit_mode tie-break that enforcement uses, so the viewer shows the cap that actually applies.
| Query parameter | Description |
|---|
user_ids[] | Repeatable. Filter to specific principals by OIDC sub. |
period[] | Repeatable. Filter to daily, weekly, or monthly rows. |
sort | spend_desc lists top spenders first. Requires exactly one period[]. |
q | Case-insensitive substring filter over the OIDC sub, last-seen email, and last-seen display name. |
limit / page | Page size (1–1000, default 20) and the opaque cursor from the previous response’s next_page. |
q= and user_ids[]= ride GET query strings, so any fronting proxy or load balancer captures them in its access logs. If your PII log policy is strict, scrub these parameters there.
/audit
Returns the spend-limit mutation trail: who changed which cap, before/after snapshots, and the optional reason, newest-first. has_more is exact. This endpoint follows the local Admin API conventions rather than a first-party wire shape.
The raw list pages by after_id and before_id, which are mutually exclusive spl_… IDs; results are ordered by creation and has_more reflects the traversal direction. /effective pages by the opaque next_page token passed back as ?page=, with principals ordered ascending so pages stay stable while spend is being recorded. limit is 1–1000, default 20, on both.
Data lifecycle
The gateway holds four spend-related tables; an hourly sweep enforces the retention windows:
identity_retention_days is deliberately shorter than spend_retention_months: a deprovisioned identity stops refreshing and ages out, while its anonymous spend counters remain for year-over-year reporting.
When a developer leaves, delete any per-user cap via DELETE /v1/organizations/spend_limits/{id}; their spend and identity rows age out on the retention windows above. To erase one person immediately, for offboarding or a data subject access request (DSAR), run DELETE FROM principal_emails WHERE principal = '<sub>' directly against the gateway database. That removes the only table holding their email, name, and groups. The spend and admin_audit rows reference the pseudonymous OIDC sub only and age out on their own windows.