![]() |
VOOZH | about |
Jun 16, 2026
One week ago "loop engineering" was a term most developers hadn't heard. Today it is trending across X with 2,200+ posts, championed by Anthropic's Boris Cherny and OpenAI's Peter Steinberger, critiqued by Matt Pocock, and joked about by everyone who has watched Claude say "You're right to push back! I over-engineered this!" 87 times in a row. Here is the full picture.
Jun 20, 2026
Every developer asking "how do I actually build one of these loops?" gets the same answer: five components, three levels, and one feedback gate that says no. This guide walks you from a blank terminal to a working autonomous agent loop in under an hour β no orchestration framework required.
Jun 19, 2026
If you need agent loops today, start at explainx.ai/loops β around 100 copy-ready workflows with kickoff prompts, guardrails, and new entries every week. Matthew Berman's Forward Future library is a similarly strong option with 26 practitioner-contributed recipes; here is how both compare and what early adopters like Theo (t3.gg) are running in production.
Every week in mid-2026, a new term lands on Hacker News β context engineering, loop engineering, harness engineering β and teams treat them as interchangeable upgrades to "prompt engineering."
They are not interchangeable. They are four layers of the same stack, each with different units of work, failure modes, and tools.
Confusing them is expensive. A team that rewrites prompts when the harness has no verification step will never fix silent failure loops. A team that builds a sophisticated harness with vague goals will burn tokens forever.
This guide maps the full stack β with diagrams, diagnostics, and links to explainx.ai's deeper guides on each layer.
| Layer | Unit of work | You design⦠| Typical artifacts | When it dominates |
|---|---|---|---|---|
| Prompt | One message | Wording, format, CoT, few-shot | System prompt text, user template | Single-turn tasks, prototyping |
| Context | One model call | Full context package | CLAUDE.md, RAG chunks, tool list, history prune rules | Multi-turn agents, RAG, cost control |
| Loop | Entire run | Trigger β goal β verify β memory | /goal, cron, Agent View, triage specs | Autonomous coding, scheduled agents |
| Harness | Runtime execution | Tool sandbox, retries, checkpoints | Claude Code, LangGraph, custom orchestrator | Production reliability, benchmarks |
Nesting rule: Harness implements loops β each loop step assembles context β context contains prompts.
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β HARNESS ENGINEERING β
β Runtime: tool exec, sandbox, retries, checkpoints, logs β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β LOOP ENGINEERING β β
β β Workflow: trigger, goal, actions, verification, memory β β
β β βββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β β β CONTEXT ENGINEERING (per iteration) β β β
β β β Assembly: history, RAG, tools, CLAUDE.md, MCP β β β
β β β βββββββββββββββββββββββββββββββββββββββββββββ β β β
β β β β PROMPT ENGINEERING (messages inside) β β β β
β β β β Wording: role, format, constraints, CoT β β β β
β β β βββββββββββββββββββββββββββββββββββββββββββββ β β β
β β βββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β² β²
β β
Boris Cherny: Andrej Karpathy:
"build loops" "context engineering"
(loop + harness) (context + prompt)
Think of it like web development:
| Agent stack | Web analogy |
|---|---|
| Prompt | Copy on a single button label |
| Context | Full page layout + data fetched for this view |
| Loop | User journey / multi-step checkout flow |
| Harness | Browser, server, DB, auth, error handling |
You would not fix a broken payment API by rewriting button copy. Same logic applies here.
Definition: Optimizing the text of individual messages β usually the system prompt and current user turn β to elicit better model behavior.
Techniques: Chain-of-thought, few-shot examples, role assignment, JSON schema instructions, temperature and sampling controls, system prompt structure.
Unit of work: One message pair (system + user) or one turn in a chat.
# Before (vague)
Summarize this doc.
# After (prompt-engineered)
You are a staff engineer writing release notes for developers.
Output: 3 bullets, each β€25 words, past tense, no marketing language.
If the doc lacks version numbers, say "Version unclear" β do not invent one.
That improves a single call when the right document is already in context.
Deep dive: Context vs prompt engineering β precise distinction (two-layer guide; this post extends it to four).
Definition: Designing the full package the model conditions on for each API call β not just message wording.
Karpathy's mid-2026 framing popularized the term: the hard part is not the clever user message; it is curating what fills the context window.
| Component | Context decision | Not a prompt decision |
|---|---|---|
| System prompt | Yes (wording) | Also placement β constraints at top |
| Conversation history | β | Keep, summarize, or drop turns |
| Retrieved docs | β | Which chunks, how many tokens |
| Tool definitions | β | Expose 3 tools or 30? |
| Tool outputs | β | Full stdout vs truncated summary |
| CLAUDE.md | β | Always-loaded project rules |
| SKILL.md | β | Load on trigger only |
| MCP results | β | Live data injected at query time |
From explainx.ai's context engineering guide:
[SYSTEM β always loaded]
Project: explainx.ai monorepo. Package manager: pnpm. Tests: pnpm test --filter web.
Never edit apps/mobile without explicit ask.
[RETRIEVED β grep hit, apps/web/lib/pathway-data.ts, lines 40-95]
{relevant_snippet_only}
[TOOLS β this task only]
Read, Edit, Bash(pnpm test:*)
[USER]
Fix the failing pathway progress test.
The user message is boring on purpose. Quality comes from what surrounds it.
| Surface | Layer | Loads when |
|---|---|---|
~/.claude/CLAUDE.md | Context | Every session |
./CLAUDE.md | Context | Project session |
SKILL.md | Context | Task trigger |
| MCP servers | Context + Harness | Tool call time |
| Context Mode / sandbox MCP | Context + Harness | Isolated file reads |
Full stack breakdown: CLAUDE.md vs SKILL.md vs MCP.
Learn the discipline: Context Engineering pathway.
Definition: Designing the autonomous workflow that decides when to call the model, what goal must be satisfied, and how to know the run is done β without you typing each turn.
Addy Osmani popularized loop engineering in June 2026, building on Boris Cherny at Anthropic:
"I don't prompt Claude anymore. I have loops that are running."
That quote is about layer 3, not layer 1. The loop still contains prompts β something generates them each iteration. You stop being that something.
From What is loop engineering?:
| Component | Question it answers | Bad design symptom |
|---|---|---|
| Trigger | What starts the run? | You still paste prompts manually |
| Goal | What verifiable state ends it? | Agent "finishes" but tests fail |
| Actions | What tools can it use? | Agent can't reach GitHub/DB |
| Verification | How do we check progress? | Infinite loops or premature stop |
| Memory | What persists across steps? | Re-reads same files, repeats edits |
name: morning_p1_triage
trigger: cron "0 8 * * 1-5"
goal: zero open GitHub issues labeled P1 without assignee
actions: [github_mcp.list_issues, github_mcp.comment, github_mcp.assign]
verify: script checks assignee field on all P1 issues
memory: log file of triaged issue IDs this week
max_iterations: 20
human_gate: none # read/write on issues only
No clever phrasing β workflow architecture.
| Dimension | Prompt | Loop |
|---|---|---|
| Who drives turns | You | System |
| Duration | Seconds | Minutes to hours |
| Output | Text | Verified outcome |
| Leverage | 1Γ | 10β100Γ |
| Primary skill | Phrasing | Systems design |
Implementation guides:
| Symptom | Loop fix |
|---|---|
| Runs forever | Add max_iterations + no-progress detector |
| Stops after one file | Tighten goal; add test verification |
| Does wrong work confidently | Goal too vague β use verifiable criteria |
| Repeats same edit | Memory / checkpoint missing |
Human oversight: When to let the agent run vs gate it.
Definition: Building or configuring the orchestration code that executes loops β parsing model output, calling tools safely, managing retries, assembling context each turn, and enforcing exit conditions.
Boris Cherny and Anthropic engineers use harness engineering for the systems that prompt Claude iteratively β observe, plan, act, reflect β over hours.
The agent harness is the concrete artifact:
Goal in β context assembly β model call β parse β execute tools
β capture results β verify β loop or exit β result out
| Component | What it does | Loop vs harness |
|---|---|---|
| Task definition | Converts goal to first prompt | Loop designs; harness encodes |
| Context manager | Prunes history, injects memory | Context rules; harness implements |
| Tool executor | Sandboxed bash, MCP, file I/O | Harness |
| Loop controller | Iteration limits, exit signals | Harness |
| Verification | Runs tests, scripts, diff checks | Loop specifies; harness runs |
| Retry / checkpoint | Idempotent retries, resume | Harness |
| Observability | Logs, traces, cost meters | Harness |
LangChain's Deep Agents team reported gains on Terminal-Bench 2.0 from harness changes alone β same underlying model. The pattern generalizes:
Better harness on the same model > same harness on a better model β for many agentic tasks.
Reason: the model only sees what the harness assembles and only gets the retries the harness allows. See agent harness engineering on Terminal-Bench.
| Product | Harness features |
|---|---|
| Claude Code | Tools, hooks, permission modes, sessions, subagents |
| Cursor / Codex | IDE integration, model routing, agent modes |
| LangGraph | Stateful graphs, checkpoints, human-in-the-loop |
| OpenCode | Open-source coding agent harness |
| Custom | Your retry logic, your verification scripts |
Minimal harness philosophy: Pi agent harness (Mario Zechner).
Self-improving harnesses: Self-harness agents (arxiv).
Deep read: Anthropic engineer on loops vs single prompts.
Task: "Migrate our auth module to Clerk and make all tests pass."
Migrate auth to Clerk. Make tests pass.
Works for a toy repo. Fails on a monorepo in one shot.
CLAUDE.md with package manager, test command, auth file mapapps/web/lib/auth/* via grep β not whole repoEach iteration of the agent sees a sane package.
trigger: developer runs /goal
goal: pnpm test --filter web exits 0 AND auth routes use Clerk SDK
verify: test command + grep for legacy auth imports
max_iterations: 50
memory: PROGRESS.md updated each checkpoint
Agent runs until verified β not until it says "done."
rm -rfProduction shipping requires all four.
Use this when an agent underperforms:
| Symptom | Likely layer | First fix |
|---|---|---|
| Model misunderstands instruction wording | Prompt | Rewrite system prompt; add few-shot |
| Model lacks facts not in prompt text | Context | Add RAG, CLAUDE.md, or file read |
| Model ignores constraints mid-session | Context | Move constraints to top; repeat before user msg |
| Wrong tool selected | Context | Reduce tool surface; improve schemas |
| Quality degrades after turn 15 | Context | Summarize/prune history |
| Never completes task | Loop | Add verifiable goal + test verification |
| Completes but wrongly | Loop | Strengthen verify step |
| Repeats same action | Loop | Add memory + no-progress detector |
| Duplicate emails / double writes | Harness | Idempotent retries, checkpoints |
| Hangs on subprocess | Harness | Timeouts, kill switches |
| Can't debug what happened | Harness | Structured logging, traces |
Fix bottom-up within a layer, outer layers first across layers: if verification never runs, no prompt edit helps.
| Role focus | Primary layers | Secondary |
|---|---|---|
| Content / marketing AI | Prompt, Context | β |
| Support bot with KB | Context, Prompt | Loop (escalation) |
| Internal coding assistant | Context, Loop | Harness (CI integration) |
| Autonomous coding agent | Loop, Harness | Context |
| Platform / agent infra | Harness | Loop, Context |
Prompt engineering is table stakes β like knowing SQL if you build backends.
Context engineering is required for any agent that reads your data.
Loop engineering is required when you want autonomy measured in hours, not seconds.
Harness engineering is required when failure has a cost β money, data, reputation.
Week 1 β Prompt + context basics
System prompts guide β Context vs prompt β CLAUDE.md vs SKILL.md vs MCP
Week 2 β Loop design
What is loop engineering? β Loop architecture β Claude Code loop guide
Week 3 β Harness hardening
Agent harness guide β Hooks β Human-in-the-loop gates
Ongoing β Pathways
Context Engineering pathway Β· Loop Engineering pathway Β· MCP pathway
Prompt, context, loop, and harness engineering are not four names for the same job. They are four layers of one stack:
Karpathy named the context crisis. Cherny named the loop shift. Production teams name the harness when benchmarks move without new models.
When someone says "we need better prompts" on a long-running agent, ask: Which layer is actually failing? The answer determines whether you edit a paragraph, redesign retrieval, rewrite the goal spec, or fix retry logic.
Get the layer right β then optimize inward.
Terminology and product features reflect the agent tooling landscape as of June 29, 2026.