
You've probably seen the benchmarks by now: Bifrost adds about 11 microseconds of overhead per request, while the Python LiteLLM proxy adds 40-50ms. The messaging is simple: latency matters for gateways. But this misses what teams actually building with Claude Code and Codex have discovered: the real problem isn't gateway latency alone. It's that coding agents need two completely different infrastructure layers, and teams are treating them like one.

The Two Layers of Coding Agent Infrastructure

When you deploy Claude Code or Codex into a team, you're actually solving two separate problems:

1. The Control Plane: Agent Management & Governance

What it does: Manage agent lifecycle, sessions, memory, permissions, scheduling, tool access, and audit trails.

What teams need here:

Create agents without touching provider consoles
Run agents on a schedule or via webhook
Persist session state across restarts
Control which teams/users can access which tools
Log every agent action for compliance
Manage MCP (Model Context Protocol) tool access across a team
Integrate agents with internal systems (databases, git, APIs)

Why a gateway can't do this: A gateway sees request-response pairs. It doesn't know that Agent-A is trying to access Customer Database and needs to be blocked, or that you want to run a code review agent every night at 2 AM. These are control decisions that live above the request layer.
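To make the Agent-A example concrete, here is a minimal sketch of a deny-by-default capability check that a control plane could run before an agent ever reaches a tool. The `ToolPolicy` class and the agent and tool names are hypothetical, not taken from any particular platform.

```python
from dataclasses import dataclass, field


@dataclass
class ToolPolicy:
    """Hypothetical per-agent capability matrix: agent name -> allowed tool names."""
    allowed: dict[str, set[str]] = field(default_factory=dict)

    def grant(self, agent: str, tool: str) -> None:
        self.allowed.setdefault(agent, set()).add(tool)

    def is_allowed(self, agent: str, tool: str) -> bool:
        # Deny by default: unknown agents and unlisted tools are blocked.
        return tool in self.allowed.get(agent, set())


policy = ToolPolicy()
policy.grant("agent-a", "github")
policy.grant("agent-b", "github")
policy.grant("agent-b", "customer-database")

print(policy.is_allowed("agent-a", "customer-database"))  # False
print(policy.is_allowed("agent-b", "customer-database"))  # True
```

The point of the sketch is where the decision lives: it is keyed on agent identity and tool name, neither of which a gateway sees when it only handles individual LLM requests.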

2. The Data Plane: Fast LLM Routing & Reliability

What it does: Route LLM requests to the right provider, handle fallbacks, track costs, log traffic, enforce budgets.

Why it matters for coding agents: Each code edit, test run, or tool invocation is an LLM call. Claude Code can make 30-50 calls per task. If your gateway adds 1ms per call, that's 30-50ms of cumulative overhead. At 40-50ms per call through a Python gateway, a 30-call task picks up 1.2-1.5 seconds of pure gateway latency, and a 50-call task up to 2.5 seconds. That's the latency you actually feel.
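The arithmetic is simple enough to check yourself. This small sketch assumes the calls run sequentially, so per-call overhead adds up linearly. The function name is illustrative.

```python
def gateway_overhead_ms(calls: int, per_call_ms: float) -> float:
    """Total added latency for sequential calls: per-call overhead times call count."""
    return calls * per_call_ms


for calls in (30, 50):
    for per_call_ms in (1.0, 40.0, 50.0):
        total_ms = gateway_overhead_ms(calls, per_call_ms)
        print(f"{calls} calls x {per_call_ms:g} ms = {total_ms / 1000:.2f} s")
```

If an agent issues some calls in parallel, the wall-clock cost will be lower than this estimate, so treat it as an upper bound.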

The Gap Teams Are Hitting

Most gateway discussions treat these as one problem, solved by a fast gateway that routes LLM requests. That's necessary, but not sufficient.

Here's what I'm seeing in production teams running Claude Code:

Someone needs to create agents without coding → Control plane job
Agents need to persist across pod restarts → Control plane job
Tool definitions need to stay in sync → Control plane job
Agent A needs MCP server X, Agent B can't have it → Control plane job
We need to route to Claude when latency matters, fall back to Gemini on rate limits → Data plane job
We need sub-millisecond overhead so 30 calls don't add 1.5 seconds → Data plane job
We need cost tracking per agent → Data plane job
We need audit logs of every tool call → Hybrid: agent control + data logging

You can't do the first four with a pure gateway. You can't efficiently do the last four with a control platform that doesn't understand routing.

Teams are currently solving this by bolting together:

A managed service for agent orchestration (Bedrock, Anthropic Hosted)
A separate gateway for LLM routing (Bifrost, Portkey, Kong)
Custom scripts to sync them
Extra glue for observability

That works, but it creates operational friction.

What Production Teams Actually Need

The teams I've talked to who are scaling Claude Code and Codex beyond a single developer describe it like this:

"Claude Code is incredible when it's one engineer using it locally. The moment we try to run it on a team, we need:

A place to define 'these are our agents' (not 30 copies in 30 notebooks)

A way to say 'this agent runs on a schedule'

Control over which tools each agent can access

Visibility into what each agent is doing

The ability to route to Claude or Gemini based on task type without editing the agent

Sub-millisecond gateway latency so 30 LLM calls don't turn into 1.5 seconds of overhead"

That's control plane + data plane thinking. Two distinct layers.

Where the Separation Matters

Control Plane (Agent Platform)

Harness abstraction: swap Claude Code ↔ Codex ↔ OpenCode without rewriting agents
Session persistence: pause an agent, restart, pick up from where it left off
Scheduling: cron, webhooks, API triggers
Tool management: centralized MCP registry, per-agent capability matrix
Memory: persistent context across runs
Access control: who can create/run/modify which agents
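Session persistence is the item on this list that is easiest to underestimate. Here is a minimal sketch of checkpoint-and-resume, assuming session state fits in a JSON file. The `Session` fields and file layout are hypothetical, and a real platform would more likely use a database.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class Session:
    agent: str
    step: int = 0
    memory: list[str] = field(default_factory=list)


def save_session(session: Session, path: Path) -> None:
    # Write to a temp file, then rename, so a crash mid-write
    # cannot leave a half-written checkpoint behind.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(asdict(session)))
    tmp.replace(path)


def load_session(path: Path, agent: str) -> Session:
    # No checkpoint yet means a fresh session.
    if not path.exists():
        return Session(agent=agent)
    return Session(**json.loads(path.read_text()))


checkpoint = Path(tempfile.mkdtemp()) / "nightly-test-fixer.json"
session = load_session(checkpoint, "nightly-test-fixer")
session.step += 1
session.memory.append("fixed flaky test in auth module")
save_session(session, checkpoint)

# Simulate a pod restart: load from disk instead of reusing the object.
resumed = load_session(checkpoint, "nightly-test-fixer")
print(resumed.step, resumed.memory)
```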

Example: You define an agent in the platform UI, attach it to Claude Code, give it GitHub + AWS MCPs, schedule it to run every night on your codebase's failing tests, and it automatically creates PRs with fixes. All without anyone touching provider consoles.
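Behind a UI like that, the agent definition usually boils down to a small declarative record. The sketch below shows one way it could look. `AgentSpec`, its fields, and the harness identifiers are assumptions for illustration, not any platform's actual schema.

```python
from dataclasses import dataclass

# Hypothetical harness identifiers.
SUPPORTED_HARNESSES = {"claude-code", "codex", "opencode"}


@dataclass(frozen=True)
class AgentSpec:
    name: str
    harness: str
    mcp_servers: tuple[str, ...]
    schedule: str  # five-field cron expression
    task: str


def validate(spec: AgentSpec) -> list[str]:
    """Return a list of problems; an empty list means the spec is acceptable."""
    problems = []
    if spec.harness not in SUPPORTED_HARNESSES:
        problems.append(f"unknown harness: {spec.harness}")
    if len(spec.schedule.split()) != 5:
        problems.append(f"schedule must have 5 cron fields: {spec.schedule!r}")
    if not spec.mcp_servers:
        problems.append("agent has no MCP servers attached")
    return problems


nightly_fixer = AgentSpec(
    name="nightly-test-fixer",
    harness="claude-code",
    mcp_servers=("github", "aws"),
    schedule="0 2 * * *",  # every night at 02:00
    task="Fix failing tests and open a PR with the changes",
)
print(validate(nightly_fixer))  # []
```

Keeping the harness as a plain field is what makes swapping Claude Code for Codex a one-line change in the spec rather than a rewrite of the agent.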

Data Plane (Fast Gateway)

Sub-millisecond routing: multiple calls per task, latency compounds
Multi-provider routing: route to Claude for complex tasks, Gemini for simple ones
Fallback chains: if Claude rate-limits, automatically try Gemini
Cost tracking: per-provider, per-model, per-agent visibility
Budget enforcement: hard caps that prevent runaway costs
Observability: structured logs of every LLM call
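Budget enforcement is the one item here that has to sit in the request path, because a cap that is only checked after the fact is just a report. Here is a minimal sketch of a hard per-agent cap. `BudgetGuard` and `BudgetExceeded` are hypothetical names, and it assumes the cost of a call is known or estimated before it is charged.

```python
class BudgetExceeded(Exception):
    """Raised when a call would push an agent past its spending cap."""


class BudgetGuard:
    def __init__(self, caps_usd: dict[str, float]):
        self.caps_usd = caps_usd
        self.spent_usd: dict[str, float] = {}

    def charge(self, agent: str, cost_usd: float) -> None:
        """Record a call's cost, refusing it if the agent's cap would be exceeded."""
        spent = self.spent_usd.get(agent, 0.0)
        cap = self.caps_usd.get(agent, 0.0)  # agents without a cap get no budget
        if spent + cost_usd > cap:
            raise BudgetExceeded(
                f"{agent}: {spent + cost_usd:.2f} USD would exceed cap of {cap:.2f} USD"
            )
        self.spent_usd[agent] = spent + cost_usd


guard = BudgetGuard({"nightly-test-fixer": 0.25})
guard.charge("nightly-test-fixer", 0.23)
try:
    guard.charge("nightly-test-fixer", 0.05)
except BudgetExceeded as err:
    print(f"refused: {err}")
```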

Example: Your coding agent makes 40 calls. The gateway adds <1ms per call (under 40ms total). Per-agent cost tracking shows that this agent used Claude on 25 calls and Gemini on 15, for a cost of $0.23. If Claude hits rate limits, the gateway transparently retries on Gemini without the agent knowing.
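The fallback behavior in that example can be sketched in a few lines. This is an illustration with stubbed providers, not any gateway's real implementation. `RateLimited`, `call_with_fallback`, and the stub functions are all hypothetical.

```python
from collections import Counter
from typing import Callable


class RateLimited(Exception):
    """Hypothetical error a provider client raises on an HTTP 429."""


def call_with_fallback(
    prompt: str,
    providers: list[tuple[str, Callable[[str], str]]],
    usage: Counter,
) -> str:
    """Try providers in order, moving to the next one only on a rate limit."""
    last_error = None
    for name, send in providers:
        try:
            reply = send(prompt)
        except RateLimited as err:
            last_error = err
            continue
        usage[name] += 1  # count successful calls per provider
        return reply
    raise RuntimeError("all providers are rate-limited") from last_error


def claude_stub(prompt: str) -> str:
    raise RateLimited("429 from primary provider")


def gemini_stub(prompt: str) -> str:
    return f"gemini: {prompt}"


usage = Counter()
reply = call_with_fallback(
    "rename this function",
    [("claude", claude_stub), ("gemini", gemini_stub)],
    usage,
)
print(reply, dict(usage))
```

The caller gets a reply either way, which is what "without the agent knowing" means in practice. The usage counter is where per-provider call counts like "25 on Claude, 15 on Gemini" would come from.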

Recent Signals That This Separation Is Hardening

Claude Code: Now supports Hooks for policy enforcement at key lifecycle events (for example PreToolUse, PostToolUse, and Stop). That's control-plane-level thinking inside the agent harness.

Codex: Added Managed Agents API for creating/running agents from your own infrastructure. That's recognizing the control plane problem.

LiteLLM-Rust: Launched June 2026 specifically for "coding agent workloads" with <1ms target on Claude Code calls, integrated sandbox support (E2B, Daytona), and durable sessions on the roadmap. That's explicitly targeting the data plane + agent runtime integration.

TrueFoundry, Kong, Portkey: All shipping "agent gateway" features that blur the line — they're trying to build control + data in one platform.

The market is recognizing that governance + routing are different concerns, even if some platforms try to unify them.
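To show what hook-based policy enforcement looks like in practice, here is a rough sketch of the decision logic for a Claude Code tool-call hook. It rests on several assumptions you should check against the current hooks documentation: that the hook receives a JSON event with a `tool_name` field, that MCP tools are named `mcp__<server>__<tool>`, and that exit code 2 blocks the call. The `customer_db` server name is made up.

```python
import json

# Hypothetical MCP server this agent must never touch.
BLOCKED_PREFIXES = ("mcp__customer_db__",)


def decide(event: dict) -> tuple[int, str]:
    """Return (exit_code, message); exit code 2 is assumed to block the tool call."""
    tool = event.get("tool_name", "")
    if tool.startswith(BLOCKED_PREFIXES):
        return 2, f"blocked by policy: {tool}"
    return 0, ""


# Sample event shaped like the assumed hook payload.
sample = json.loads(
    '{"hook_event_name": "PreToolUse",'
    ' "tool_name": "mcp__customer_db__query",'
    ' "tool_input": {"sql": "SELECT 1"}}'
)
code, message = decide(sample)
print(code, message)
```

Wired up as a real hook, the script would read the event from stdin, print the message to stderr, and exit with the returned code.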

The Practical Decision Framework

If you're running Claude Code for a single developer or a small team:

Local Claude Code + direct API calls to Claude/OpenAI
No gateway needed yet

If you're scaling Claude Code or Codex across a team:

Pick your control plane: self-hosted (LiteLLM Agent Platform), managed (Bedrock), or API-first (Codex API)
Pick your data plane: fast gateway (LiteLLM-Rust, Bifrost) or managed routing (OpenRouter, Portkey)
Ensure they can talk: same config format, shared database, compatible APIs

If you're mixing multiple harnesses (Claude Code + Codex + OpenCode):

Control plane becomes critical: you need abstraction over harness differences
Data plane must support all providers those harnesses call

If you need compliance/audit/data residency:

Control plane must self-host
Data plane must self-host
Consider platforms that do both (LiteLLM Agent Platform + LiteLLM-Rust + LiteLLM core)

What I'd Measure

When evaluating control + data infrastructure for coding agents:

Control Plane:

Can I create an agent in a UI? How much YAML/JSON?
Does it understand session state? Can agents pause and resume?
Can I attach MCPs to agents and control per-agent access?
Do audit logs show me every tool call and agent action?
Can I schedule agents on cron/webhook/API?

Data Plane:

What's the measured overhead per request? (target: <1ms for coding agents)
Does it support all the providers my harnesses call?
Can I route based on task type or cost? (not just round-robin)
Does it fail gracefully when a provider is rate-limited?
Is cost/token tracking per-agent visible?
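For the overhead question, one reasonable approach is to send the same request directly and through the gateway many times and compare the distributions. The sketch below only does the comparison. It assumes you have already collected latency samples in milliseconds, and the function name, sample values, and 1ms target are illustrative.

```python
import statistics


def overhead_stats(direct_ms: list[float], gateway_ms: list[float]) -> dict[str, float]:
    """Compare latency samples for the same request sent directly and via the gateway."""
    p50_direct = statistics.median(direct_ms)
    p50_gateway = statistics.median(gateway_ms)
    # quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
    p99_gateway = statistics.quantiles(gateway_ms, n=100, method="inclusive")[98]
    return {
        "p50_overhead_ms": p50_gateway - p50_direct,
        "p99_gateway_ms": p99_gateway,
    }


direct = [100.0, 102.0, 101.0, 99.0, 100.0]
via_gateway = [100.5, 102.4, 101.6, 99.5, 100.4]

stats = overhead_stats(direct, via_gateway)
print(stats)
print("meets <1ms target:", stats["p50_overhead_ms"] < 1.0)
```

Provider latency varies far more than gateway overhead does, so a handful of samples like these is only for illustration. A real measurement needs enough requests for the medians to settle.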

Integration:

Do the two layers use the same config format?
Do they share the same database or API?
Can I swap data planes without redeploying agents?

The Unsexy Part

The sexiest discussion is always about latency. But in teams I talk to running Claude Code at scale, the conversation goes:

"Okay, gateway overhead is solved. Now: how do we keep prod from running experiments? How do we keep Agent-A from accessing the customer database? How do I know what happened yesterday when Agent-B deleted something? Can I run this every night? Can I give the junior engineer the ability to create agents without giving them API access?"

That's all control plane work. And it's unglamorous, but it's what stops you from sleeping at 3 AM.

Wrapping Up

Coding agents (Claude Code, Codex, OpenCode) need two halves:

Control Plane: Agent creation, sessions, memory, scheduling, tool governance, audit trails. This is where you keep your team safe.
Data Plane: Sub-millisecond LLM routing, fallbacks, cost tracking, multi-provider support. This is where you keep your latency low.

A fast gateway solves problem #2. A control platform solves problem #1. Both are table stakes for production.

The platforms that will win here are the ones that make the separation clear and let teams pick the right tool for each job — or that do both well without unnecessary coupling.

What's your experience been? Are you running coding agents on a team? What's the first thing that broke when you tried to scale from one developer to five?

Resources

LiteLLM Agent Platform docs — Control plane for multi-harness agent management
LiteLLM-Rust launch — Rust-based data plane targeting <1ms overhead on coding agents
LiteLLM core routing & fallbacks — Multi-provider routing for LLM calls
Anthropic Hooks for Claude Code — Policy enforcement at agent lifecycle events
OpenAI Codex Managed Agents API — API-first control of Codex agents

Paul Twist is an AI infrastructure engineer based in Berlin, focused on production agent systems and multi-provider LLM routing.

URL: https://dev.to/paultwist/why-coding-agents-need-two-halves-of-infrastructure-control-plane-fast-data-plane-2eda
