Deterministic AI Testing with Session Recording in cagent
AI agents introduce a challenge that traditional software doesn’t have: non-determinism. The same prompt can produce different outputs across runs, making reliable testing difficult. Add API costs and latency to the mix, and developer productivity takes a hit.
Session recording in cagent addresses this directly. Record an AI interaction once, replay it indefinitely—with identical results, zero API costs, and millisecond execution times.
How session recording works
cagent implements the VCR pattern, a proven approach for HTTP mocking. During recording, cagent proxies requests to the AI provider, captures the full request/response cycle, and saves it to a YAML “cassette” file. During replay, incoming requests are matched against the recording and served from cache—no network calls required.
Getting started
Recording a session requires a single flag:
cagent run my-agent.yaml --record "What is Docker?" # creates: cagent-recording-1736089234.yaml cagent run my-agent.yaml --record="my-test" "Explain containers" # creates: my-test.yaml
Replaying uses the --fake flag with the cassette path:
cagent exec my-agent.yaml --fake my-test "Explain containers"
The replay completes in milliseconds with no API calls.
One implementation detail worth noting: tool call IDs are normalized before matching. OpenAI generates random IDs on each request, which would otherwise break replay. cagent handles this automatically.
Example: CI/CD integration testing
Consider a code review agent:
# code-reviewer.yaml agents: root: model: anthropic/claude-sonnet-4-0 description: Code review assistant instruction: | You are an expert code reviewer. Analyze code for best practices, security issues, performance concerns, and readability. toolsets: - type: filesystem
Record the interaction with --yolo to auto-approve tool calls:
cagent exec code-reviewer.yaml --record="code-review" --yolo \\ "Review pkg/auth/handler.go for security issues"
In CI, replay without API keys or network access:
cagent exec code-reviewer.yaml --fake code-review \\ "Review pkg/auth/handler.go for security issues"
Cassettes can be version-controlled alongside test code. When agent instructions change significantly, delete the cassette and re-record to capture the new behaviour.
Other use cases
Cost-effective prompt iteration. Record a single interaction with an expensive model, then iterate on agent configuration against that recording. The first run incurs API costs; subsequent iterations are free.
cagent exec ./agent.yaml --record="expensive-test" "Complex task"
for i in {1..100}; do
cagent exec ./agent-v$i.yaml --fake expensive-test "Complex task"
done
Issue reproduction. Users can record a session with --record bug-report and share the cassette file. Support teams replay the exact interaction locally for debugging.
Multi-agent systems. Recording captures the complete delegation graph: root agent decisions, sub-agent tool calls, and inter-agent communication.
Security and provider support
Cassettes automatically strip sensitive headers (Authorization, X-Api-Key) before saving, making them safe to commit to version control. The format is human-readable YAML:
version: 2
interactions:
- id: 0
request:
method: POST
url: <https://api.openai.com/v1/chat/completions>
body: "{...}"
response:
status: 200 OK
body: "data: {...}"
Session recording works with all supported providers: OpenAI, Anthropic, Google, Mistral, xAI, and Nebius.
Get started
Session recording is available now in cagent. To try it:
cagent run ./your-agent.yaml --record="my-session" "Your prompt here"
For questions, feedback, or feature requests, visit the cagent repository or join the GitHub Discussions.
Related Posts
-
May 12, 2026
Docker AI Governance: Unlock Agent Autonomy, Safely
Introducing Docker AI Governance: centralized control over how agents execute, what they can reach on the network, which credentials they can use, and which MCP tools they can call, so every developer in your company can run AI agents safely, wherever they work. Your laptop is the new prod Agents are the biggest productivity unlock…
Srini SekaranRead now
-
Jun 18, 2026
Coding Agent Horror Stories: The 13-Hour AWS Outage
Learn how an AI coding agent caused a 13-hour AWS outage and how Docker Sandboxes help reduce risk with scoped identities and isolated execution.
Ajeet Singh RainaRead now
-
Jun 16, 2026
Docker Content Trust: Retirement and Migration Guidance
Docker Content Trust (DCT) and the Notary v1 service at notary.docker.io are being fully retired (first announced in July of 2025). This blog explains what is changing, who is affected, and how to move to modern alternatives.
Julia WilsonandAditya TripathiRead now
-
Jun 15, 2026
Docker joins the Athena coalition: a cross-industry collaboration for supply chain security
AI is lowering the bar for supply chain attacks. Docker is joining the Athena alliance, a cross-industry effort to coordinate the defense of open source, building on our work to give every developer secure-by-default tools and our track record of sharing signals across the ecosystem.
Tushar JainRead now
