---
title: "Your MCP Server Is Eating Your Context Window. There's a Simpler Way"
description: "TL;DR: MCP tool definitions can burn 55,000+ tokens before an agent processes a single user message. We built the Apideck CLI as an AI-agent interface instead: an ~80-token agent prompt replaces tens of thousands of tokens of schema, with progressive disclosure via `--help` and structural safety baked into the binary. Any agent that can run shell commands can use it. No protocol support required."
author: "Samir Amzani"
published: "2026-03-16T08:00+00:00"
updated: "2026-03-17T11:28:06.432Z"
url: "https://www.apideck.com/blog/mcp-server-eating-context-window-cli-alternative"
tags: ["AI", "Industry insights", "Product"]
---

# Your MCP Server Is Eating Your Context Window. There's a Simpler Way

## The problem demos never show you

Here's a scenario that'll feel familiar if you've wired up MCP servers for anything beyond a demo.

You connect GitHub, Slack, and Sentry. Three services, maybe 40 tools total. Before your agent has read a single user message, [55,000 tokens of tool definitions](https://www.mmntm.net/articles/mcp-context-tax) are sitting in the context window. That's over a quarter of Claude's 200k limit. Gone.

It gets worse. Each MCP tool costs [550-1,400 tokens](https://www.mmntm.net/articles/mcp-context-tax) for its name, description, JSON schema, field descriptions, enums, and system instructions. Connect a real API surface, say a SaaS platform with 50+ endpoints, and you're looking at 50,000+ tokens just to describe what the agent *could* do, with almost nothing left for what it *should* do.

One team [reported](https://www.agentpmt.com/articles/thousands-of-mcp-tools-zero-context-left-the-bloat-tax-breaking-ai-agents) three MCP servers consuming 143,000 of 200,000 tokens. That's 72% of the context window burned on tool definitions. The agent had 57,000 tokens left for the actual conversation, retrieved documents, reasoning, and response.
Good luck building anything useful in that space.

This isn't a theoretical concern. David Zhang ([@dzhng](https://x.com/dzhng/status/2029518820872945889)), building Duet, described ripping out their MCP integrations entirely, even after getting OAuth and dynamic client registration working. The tradeoff was impossible:

- **Load everything up front** → lose working memory for reasoning and history
- **Limit integrations** → agent can only talk to a few services
- **Build dynamic tool loading** → add latency and middleware complexity

He called it a "trilemma." And the numbers hold up under controlled testing. A [recent benchmark by Scalekit](https://www.scalekit.com/blog/mcp-vs-cli-use) ran 75 head-to-head comparisons (same model, Claude Sonnet 4, same tasks, same prompts) and found MCP costing **4 to 32x more tokens** than CLI for identical operations. Their simplest task, checking a repo's language, consumed 1,365 tokens via CLI and 44,026 via MCP. The overhead is almost entirely schema: 43 tool definitions injected into every conversation, of which the agent uses one or two.

## Three approaches to the same problem

The industry is converging on three responses to context bloat. Each has a sweet spot.

### MCP with compression tricks

The first response is to keep MCP but fight the bloat. Teams compress schemas, use tool search to load definitions on demand, or build middleware that slices OpenAPI specs into smaller chunks.

This works for small, well-defined interactions like looking up an issue, creating a ticket, or fetching a document. MCP's structured tool calls and typed schemas are genuinely useful when you have a tight set of operations that agents use frequently.

But it adds infrastructure. You need a tool registry, search logic, caching, and routing. You're building a service to manage your services. And you're still paying per-tool token costs every time the agent decides it needs a new capability.
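
The budget arithmetic above, and the trade-off behind tool search, fit in a few lines of code. The following is a minimal Python sketch, not any MCP SDK's API: `ToolDef`, `ToolRegistry`, and the keyword-based `search` are hypothetical. The flat 1,375-token cost per tool is an assumption picked from the quoted 550-1,400 range so that 40 tools come to 55,000 tokens.

```python
# Rough context-budget arithmetic: eager vs. on-demand (tool search) loading.
# Hypothetical sketch: ToolDef/ToolRegistry are illustrative, not an MCP SDK API,
# and the per-tool token cost is an assumed value from the quoted 550-1,400 range.
from dataclasses import dataclass

CONTEXT_WINDOW = 200_000  # tokens (Claude's 200k limit cited above)


@dataclass(frozen=True)
class ToolDef:
    name: str
    keywords: frozenset[str]
    schema_tokens: int  # assumed cost of name + description + JSON schema


class ToolRegistry:
    """Keeps tool schemas out of the prompt until a search asks for them."""

    def __init__(self, tools: list[ToolDef]) -> None:
        self.tools = list(tools)

    def eager_cost(self) -> int:
        # Every definition is injected before the first user message.
        return sum(t.schema_tokens for t in self.tools)

    def search(self, query: str) -> list[ToolDef]:
        # Naive keyword overlap standing in for a real tool-search index.
        words = set(query.lower().split())
        return [t for t in self.tools if t.keywords & words]

    def on_demand_cost(self, query: str) -> int:
        # Only matched definitions are loaded, but each is still paid in full.
        return sum(t.schema_tokens for t in self.search(query))


def remaining(window: int, overhead: int) -> tuple[int, float]:
    """Tokens left after fixed overhead, plus the fraction of the window consumed."""
    return window - overhead, overhead / window


# 40 tools at an assumed 1,375 tokens each: 40 * 1,375 = 55,000 tokens.
tools = [ToolDef("github_create_issue", frozenset({"issue", "create"}), 1_375)]
tools += [ToolDef(f"other_tool_{i}", frozenset({f"topic{i}"}), 1_375) for i in range(39)]
registry = ToolRegistry(tools)

print(registry.eager_cost())                     # 55000
print(registry.on_demand_cost("open an issue"))  # 1375
print(remaining(CONTEXT_WINDOW, 143_000))        # (57000, 0.715)
```

On-demand loading shrinks the up-front bill, but it moves the cost rather than removing it. Every tool the search returns is still paid for in full, and the registry and search layer are the extra infrastructure described above.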
### Code execution

The code execution approach treats the agent like a developer with a persistent workspace. When the agent needs a new integration, it reads the API docs, writes code against the SDK, runs it, and saves the script for reuse. Duet pioneered this pattern by letting agents write and maintain their own integration scripts.

This is powerful for long-lived workspace agents that maintain state across sessions and need complex workflows: loops, conditionals, polling, batch operations. Things that are awkward to express as individual tool calls become natural in code.

There's a more targeted variant worth watching: [Code Mode](https://blog.cloudflare.com/code-mode/). Instead of writing arbitrary code against raw APIs, the agent writes short orchestration scripts that call structured MCP tools underneath. A [benchmark by Sideko](https://www.portofcontext.com/blog/cli-vs-mcp-vs-code-mode) across 12 Stripe tasks showed Code Mode MCP using 58% fewer tokens than raw MCP and 56% fewer than CLI.

The key insight: on multi-step tasks like creating an invoice with line items, CLI required 19 LLM round trips, raw MCP needed 12, and Code Mode collapsed it to 4. The agent writes a TypeScript program that handles the looping internally, without going back to the LLM at each step.

This matters because CLI's efficiency advantage, which is real for single-step discovery and reads, can erode on complex chained writes where each round trip compounds context. Code Mode offers a middle ground: structured tool access without the schema bloat, plus the ability to batch operations without per-step LLM overhead.

The tradeoff is that your agent is writing and executing code against production APIs. Even sandboxed, the safety surface is larger than a CLI with structural permissions. You need review mechanisms and trust in your agent's judgment. But for workflows that involve loops and dependent state, it's a pattern worth considering alongside CLI.
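
The Code Mode idea can be illustrated with a short script. The following is a minimal sketch in Python; the benchmark's agent wrote TypeScript. `FakeBillingTools`, its three methods, and `create_invoice_with_items` are hypothetical in-memory stand-ins for structured tools, not Stripe's or Cloudflare's API. The point is the shape: the model writes the script once, and the script makes all five dependent calls, threading `invoice_id` from the first call into the rest, without a model round trip between steps.

```python
# Sketch of a Code Mode-style orchestration script (in Python here; the
# benchmark above describes TypeScript). The billing "tools" are hypothetical
# in-memory stand-ins for structured MCP tools, not Stripe's actual API.
import itertools
from dataclasses import dataclass, field


@dataclass
class FakeBillingTools:
    """Stand-in for a set of structured tools; records every call it receives."""

    calls: list[str] = field(default_factory=list)
    invoices: dict[str, dict] = field(default_factory=dict)
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    def create_invoice(self, customer_id: str) -> str:
        self.calls.append("create_invoice")
        invoice_id = f"inv_{next(self._ids)}"
        self.invoices[invoice_id] = {"customer": customer_id, "items": [], "status": "draft"}
        return invoice_id

    def add_line_item(self, invoice_id: str, description: str, amount_cents: int) -> None:
        self.calls.append("add_line_item")
        self.invoices[invoice_id]["items"].append((description, amount_cents))

    def finalize_invoice(self, invoice_id: str) -> int:
        self.calls.append("finalize_invoice")
        invoice = self.invoices[invoice_id]
        invoice["status"] = "open"
        return sum(amount for _, amount in invoice["items"])


def create_invoice_with_items(tools, customer_id, items):
    """The kind of script an agent might generate once. The loop and the
    dependency on invoice_id are handled here, not by the model."""
    invoice_id = tools.create_invoice(customer_id)
    for description, amount_cents in items:
        tools.add_line_item(invoice_id, description, amount_cents)
    total_cents = tools.finalize_invoice(invoice_id)
    return invoice_id, total_cents


billing = FakeBillingTools()
items = [("Seat license", 4_900), ("Onboarding", 15_000), ("Support", 2_500)]
invoice_id, total = create_invoice_with_items(billing, "customer-42", items)
print(invoice_id, total, len(billing.calls))  # inv_1 22400 5
```

Issued as individual tool calls, each step's result would flow back to the model before it chose the next call. That per-step overhead is what Code Mode avoids. The cost is that the model's output is now executable code, which is the larger safety surface noted above.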
### CLI as the agent interface

The third approach is the one we took. Instead of loading schemas into the context window or letting the agent write integration code, you give it a CLI.

A well-designed CLI is a progressive disclosure system by nature. When a human developer needs to use a tool they haven't touched before, they don't read the entire API reference. They run `tool --help`, find the subcommand they need, run `tool subcommand --help`, and get the specific flags for that operation. They spend attention in proportion to what they actually need.

Agents can do exactly the same thing. And the token economics are dramatically different.

## Why CLIs are the pragmatic sweet spot

### Progressive disclosure saves tokens

Here's what the [Apideck CLI](https://github.com/apideck-libraries/cli) agent prompt looks like. This is the entire thing an AI agent needs in its system prompt:

```
Use `apideck` to interact with the Apideck Unified API.
Available APIs: `apideck --list`
List resources: `apideck
