VOOZH about

URL: https://dev.to/nucleusos/running-claude-code-at-zero-per-token-cost-the-max-plan-oauth-shim-pattern-501a

⇱ Running Claude Code at zero per-token cost: the Max-plan OAuth shim pattern - DEV Community


If you're running Claude Code or any agentic system that calls the Anthropic Messages API, you're probably paying per token. For light use, that's fine. For multi-agent systems with parallel workloads, it adds up fast.

There's a different model: Claude Max subscription. Flat monthly cost, no per-token billing. The problem is that Max exposes a browser OAuth flow, not an API key. Your agent code expects ANTHROPIC_API_KEY. Max doesn't give you one.

We built a shim that bridges the two.

What it is

oauth.nucleusos.dev is an HTTP wrapper that exposes the Anthropic Messages API endpoint at /v1/messages. Internally, it routes each request through a Max-plan OAuth bearer token instead of an API key. The interface is 1:1 with the native Anthropic API.

To use it, set two environment variables in your agent:

ANTHROPIC_BASE_URL=https://oauth.nucleusos.dev
ANTHROPIC_API_KEY=<your-shared-secret>

Your existing Claude Code installation, any Claude SDK wrapper, or raw HTTP client keeps working without code changes. The billing model changes; the interface doesn't.

The wire shape

Each incoming request hits an HMAC-validated gate (the ANTHROPIC_API_KEY value you set is treated as a shared secret, compared with hmac.compare_digest — no timing oracle). Valid requests get the Bearer token substituted and are forwarded to api.anthropic.com. The response streams back unchanged.

Security model: the shim trusts whoever holds the shared secret. This is for personal or team use — not for handing out to arbitrary clients. If you're running it on a VPS for your own agent fleet, that's the threat model it's designed for.

Smoke test results (actual, not synthetic)

  • GET /health → 200 {"status": "ok"}
  • Wrong shared secret → 401
  • Valid request to /v1/messages with a real Claude model → returned actual Anthropic response shape, usage: {"input_tokens": 17, "output_tokens": 6}

What we added after the initial ship

The initial PR (#578, 2026-06-16) covered Claude only. Subsequent PRs added:

  • Gemini API routing on a separate port (8890) — same deployment, same auth model, second provider. One shim, Claude + Gemini.
  • Non-root Dockerfile (USER nobody) — the initial container ran as root, caught in review.
  • _HTTP_TIMEOUT_S bumped to 600 — the default 300s timeout was clipping long agent runs. Opus calls on complex prompts run long; you need the headroom.
  • gemini_keys.txt env override — you can bake a key file into the image for air-gapped deployments.

Current deployment: OCI A1 ARM (24GB, Mumbai). CPU is the binding constraint on this instance, not RAM.

The cost math

Claude Max subscription: $100/month (Pro tier) or $200/month (Max tier with higher usage limits).

Anthropic API comparison for equivalent workloads depends heavily on your token mix. For a team running Claude Code across multiple parallel sessions with shared context, API costs can exceed $200/month easily. The breakeven calculation is specific to your token volume.

We're not publishing specific numbers because our workload (5 AI agents, agentic coding sessions, multi-file context) may not generalize to yours. The pattern is worth knowing; the math is worth running against your own usage.

What this isn't

This isn't a production multi-tenant API. It's a single-org shim. If you need rate-limiting, per-user billing, or audit logs, you need more infrastructure than this.

It also doesn't give you higher rate limits than Max plan imposes. If you're hitting Max-plan throttles under heavy load, the shim won't help.

Self-hosting

The Dockerfile and compose file are in the repo. The README covers the non-root deployment, env vars, and the gemini_keys.txt override.

The shim is ~200 lines of Python (FastAPI). Not a framework. Inspectable in an afternoon.

Why we built it in-house

We're building Eidetic Works on our own agentic substrate. Claude Code is the primary execution surface. Token costs are a real operational cost for us, not a theoretical one. The shim was born from a direct cost problem, not from wanting to build infrastructure.

The telemetry endpoint we shipped in the same cohort (eidetic.works/api/telemetry/metrics) is what lets us measure whether it's working — we can now count daemon installs without relying on download counts or manual surveys.

What's next

The Gemini routing is v1 — it works but hasn't been load-tested under the same conditions as the Claude path. The next substantive addition is probably a structured logging layer so we can see which models are taking which paths at what latency. Right now it's effective but opaque.

If you build on this pattern, drop a comment below — interested in what you add.


Read more on what we're building at eidetic.works