VOOZH about

URL: https://www.explainx.ai/blog/context-prompt-loop-harness-engineering-stack-2026

⇱


← Back to blog
go deep
πŸ‘ Context vs Prompt vs Loop vs Harness Engineering: The Four-Layer Agent Stack

Related posts

Jun 16, 2026

Loop Engineering Is Now the Most-Discussed AI Skill on Developer Twitter

One week ago "loop engineering" was a term most developers hadn't heard. Today it is trending across X with 2,200+ posts, championed by Anthropic's Boris Cherny and OpenAI's Peter Steinberger, critiqued by Matt Pocock, and joked about by everyone who has watched Claude say "You're right to push back! I over-engineered this!" 87 times in a row. Here is the full picture.

Jun 20, 2026

How to Build Your First Agent Loop: A Step-by-Step Guide (2026)

Every developer asking "how do I actually build one of these loops?" gets the same answer: five components, three levels, and one feedback gate that says no. This guide walks you from a blank terminal to a working autonomous agent loop in under an hour β€” no orchestration framework required.

Jun 19, 2026

Matthew Berman Loop Library: Free Agent Workflows for Developers (2026)

If you need agent loops today, start at explainx.ai/loops β€” around 100 copy-ready workflows with kickoff prompts, guardrails, and new entries every week. Matthew Berman's Forward Future library is a similarly strong option with 26 practitioner-contributed recipes; here is how both compare and what early adopters like Theo (t3.gg) are running in production.

Every week in mid-2026, a new term lands on Hacker News β€” context engineering, loop engineering, harness engineering β€” and teams treat them as interchangeable upgrades to "prompt engineering."

They are not interchangeable. They are four layers of the same stack, each with different units of work, failure modes, and tools.

Confusing them is expensive. A team that rewrites prompts when the harness has no verification step will never fix silent failure loops. A team that builds a sophisticated harness with vague goals will burn tokens forever.

This guide maps the full stack β€” with diagrams, diagnostics, and links to explainx.ai's deeper guides on each layer.


TL;DR β€” the four layers at a glance

LayerUnit of workYou design…Typical artifactsWhen it dominates
PromptOne messageWording, format, CoT, few-shotSystem prompt text, user templateSingle-turn tasks, prototyping
ContextOne model callFull context packageCLAUDE.md, RAG chunks, tool list, history prune rulesMulti-turn agents, RAG, cost control
LoopEntire runTrigger β†’ goal β†’ verify β†’ memory/goal, cron, Agent View, triage specsAutonomous coding, scheduled agents
HarnessRuntime executionTool sandbox, retries, checkpointsClaude Code, LangGraph, custom orchestratorProduction reliability, benchmarks

Nesting rule: Harness implements loops β†’ each loop step assembles context β†’ context contains prompts.


The stack diagram

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ HARNESS ENGINEERING β”‚
β”‚ Runtime: tool exec, sandbox, retries, checkpoints, logs β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ LOOP ENGINEERING β”‚ β”‚
β”‚ β”‚ Workflow: trigger, goal, actions, verification, memory β”‚ β”‚
β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚
β”‚ β”‚ β”‚ CONTEXT ENGINEERING (per iteration) β”‚ β”‚ β”‚
β”‚ β”‚ β”‚ Assembly: history, RAG, tools, CLAUDE.md, MCP β”‚ β”‚ β”‚
β”‚ β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚ β”‚
β”‚ β”‚ β”‚ β”‚ PROMPT ENGINEERING (messages inside) β”‚ β”‚ β”‚ β”‚
β”‚ β”‚ β”‚ β”‚ Wording: role, format, constraints, CoT β”‚ β”‚ β”‚ β”‚
β”‚ β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚ β”‚
β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
 β–² β–²
 β”‚ β”‚
 Boris Cherny: Andrej Karpathy:
 "build loops" "context engineering"
 (loop + harness) (context + prompt)

Think of it like web development:

Agent stackWeb analogy
PromptCopy on a single button label
ContextFull page layout + data fetched for this view
LoopUser journey / multi-step checkout flow
HarnessBrowser, server, DB, auth, error handling

You would not fix a broken payment API by rewriting button copy. Same logic applies here.


Layer 1 β€” Prompt engineering (innermost)

Definition: Optimizing the text of individual messages β€” usually the system prompt and current user turn β€” to elicit better model behavior.

Techniques: Chain-of-thought, few-shot examples, role assignment, JSON schema instructions, temperature and sampling controls, system prompt structure.

Unit of work: One message pair (system + user) or one turn in a chat.

Example β€” prompt-level fix

# Before (vague)
Summarize this doc.

# After (prompt-engineered)
You are a staff engineer writing release notes for developers.
Output: 3 bullets, each ≀25 words, past tense, no marketing language.
If the doc lacks version numbers, say "Version unclear" β€” do not invent one.

That improves a single call when the right document is already in context.

When prompt engineering is enough

When it is not enough

Deep dive: Context vs prompt engineering β€” precise distinction (two-layer guide; this post extends it to four).


Layer 2 β€” Context engineering (per call)

Definition: Designing the full package the model conditions on for each API call β€” not just message wording.

Karpathy's mid-2026 framing popularized the term: the hard part is not the clever user message; it is curating what fills the context window.

What lives in the context package

ComponentContext decisionNot a prompt decision
System promptYes (wording)Also placement β€” constraints at top
Conversation historyβ€”Keep, summarize, or drop turns
Retrieved docsβ€”Which chunks, how many tokens
Tool definitionsβ€”Expose 3 tools or 30?
Tool outputsβ€”Full stdout vs truncated summary
CLAUDE.mdβ€”Always-loaded project rules
SKILL.mdβ€”Load on trigger only
MCP resultsβ€”Live data injected at query time

The four context levers

From explainx.ai's context engineering guide:

  1. Content selection β€” does this token earn its place?
  2. Structure and ordering β€” constraints first; docs before user message
  3. Token budget β€” protect budget for variable content
  4. Cache placement β€” stable prefix (system + tools) before cache breakpoint

Example β€” context-level fix (same user message)

[SYSTEM β€” always loaded]
Project: explainx.ai monorepo. Package manager: pnpm. Tests: pnpm test --filter web.
Never edit apps/mobile without explicit ask.

[RETRIEVED β€” grep hit, apps/web/lib/pathway-data.ts, lines 40-95]
{relevant_snippet_only}

[TOOLS β€” this task only]
Read, Edit, Bash(pnpm test:*)

[USER]
Fix the failing pathway progress test.

The user message is boring on purpose. Quality comes from what surrounds it.

Context engineering surfaces in Claude Code

SurfaceLayerLoads when
~/.claude/CLAUDE.mdContextEvery session
./CLAUDE.mdContextProject session
SKILL.mdContextTask trigger
MCP serversContext + HarnessTool call time
Context Mode / sandbox MCPContext + HarnessIsolated file reads

Full stack breakdown: CLAUDE.md vs SKILL.md vs MCP.

When context engineering dominates

Learn the discipline: Context Engineering pathway.


Layer 3 β€” Loop engineering (workflow)

Definition: Designing the autonomous workflow that decides when to call the model, what goal must be satisfied, and how to know the run is done β€” without you typing each turn.

Addy Osmani popularized loop engineering in June 2026, building on Boris Cherny at Anthropic:

"I don't prompt Claude anymore. I have loops that are running."

That quote is about layer 3, not layer 1. The loop still contains prompts β€” something generates them each iteration. You stop being that something.

The five loop components

From What is loop engineering?:

ComponentQuestion it answersBad design symptom
TriggerWhat starts the run?You still paste prompts manually
GoalWhat verifiable state ends it?Agent "finishes" but tests fail
ActionsWhat tools can it use?Agent can't reach GitHub/DB
VerificationHow do we check progress?Infinite loops or premature stop
MemoryWhat persists across steps?Re-reads same files, repeats edits

Example β€” loop spec (not a prompt)

name: morning_p1_triage
trigger: cron "0 8 * * 1-5"
goal: zero open GitHub issues labeled P1 without assignee
actions: [github_mcp.list_issues, github_mcp.comment, github_mcp.assign]
verify: script checks assignee field on all P1 issues
memory: log file of triaged issue IDs this week
max_iterations: 20
human_gate: none # read/write on issues only

No clever phrasing β€” workflow architecture.

Loop engineering vs prompt engineering

DimensionPromptLoop
Who drives turnsYouSystem
DurationSecondsMinutes to hours
OutputTextVerified outcome
Leverage1Γ—10–100Γ—
Primary skillPhrasingSystems design

Implementation guides:

When loops fail (and it's not the prompt)

SymptomLoop fix
Runs foreverAdd max_iterations + no-progress detector
Stops after one fileTighten goal; add test verification
Does wrong work confidentlyGoal too vague β€” use verifiable criteria
Repeats same editMemory / checkpoint missing

Human oversight: When to let the agent run vs gate it.


Layer 4 β€” Harness engineering (runtime)

Definition: Building or configuring the orchestration code that executes loops β€” parsing model output, calling tools safely, managing retries, assembling context each turn, and enforcing exit conditions.

Boris Cherny and Anthropic engineers use harness engineering for the systems that prompt Claude iteratively β€” observe, plan, act, reflect β€” over hours.

The agent harness is the concrete artifact:

Goal in β†’ context assembly β†’ model call β†’ parse β†’ execute tools
 β†’ capture results β†’ verify β†’ loop or exit β†’ result out

Harness components

ComponentWhat it doesLoop vs harness
Task definitionConverts goal to first promptLoop designs; harness encodes
Context managerPrunes history, injects memoryContext rules; harness implements
Tool executorSandboxed bash, MCP, file I/OHarness
Loop controllerIteration limits, exit signalsHarness
VerificationRuns tests, scripts, diff checksLoop specifies; harness runs
Retry / checkpointIdempotent retries, resumeHarness
ObservabilityLogs, traces, cost metersHarness

Why harness beats model upgrades on benchmarks

LangChain's Deep Agents team reported gains on Terminal-Bench 2.0 from harness changes alone β€” same underlying model. The pattern generalizes:

Better harness on the same model > same harness on a better model β€” for many agentic tasks.

Reason: the model only sees what the harness assembles and only gets the retries the harness allows. See agent harness engineering on Terminal-Bench.

Products as harnesses

ProductHarness features
Claude CodeTools, hooks, permission modes, sessions, subagents
Cursor / CodexIDE integration, model routing, agent modes
LangGraphStateful graphs, checkpoints, human-in-the-loop
OpenCodeOpen-source coding agent harness
CustomYour retry logic, your verification scripts

Minimal harness philosophy: Pi agent harness (Mario Zechner).

Self-improving harnesses: Self-harness agents (arxiv).

Deep read: Anthropic engineer on loops vs single prompts.


How the layers interact on one real task

Task: "Migrate our auth module to Clerk and make all tests pass."

Prompt layer (insufficient alone)

Migrate auth to Clerk. Make tests pass.

Works for a toy repo. Fails on a monorepo in one shot.

+ Context layer

Each iteration of the agent sees a sane package.

+ Loop layer

trigger: developer runs /goal
goal: pnpm test --filter web exits 0 AND auth routes use Clerk SDK
verify: test command + grep for legacy auth imports
max_iterations: 50
memory: PROGRESS.md updated each checkpoint

Agent runs until verified β€” not until it says "done."

+ Harness layer

Production shipping requires all four.


Diagnostic β€” which layer is broken?

Use this when an agent underperforms:

SymptomLikely layerFirst fix
Model misunderstands instruction wordingPromptRewrite system prompt; add few-shot
Model lacks facts not in prompt textContextAdd RAG, CLAUDE.md, or file read
Model ignores constraints mid-sessionContextMove constraints to top; repeat before user msg
Wrong tool selectedContextReduce tool surface; improve schemas
Quality degrades after turn 15ContextSummarize/prune history
Never completes taskLoopAdd verifiable goal + test verification
Completes but wronglyLoopStrengthen verify step
Repeats same actionLoopAdd memory + no-progress detector
Duplicate emails / double writesHarnessIdempotent retries, checkpoints
Hangs on subprocessHarnessTimeouts, kill switches
Can't debug what happenedHarnessStructured logging, traces

Fix bottom-up within a layer, outer layers first across layers: if verification never runs, no prompt edit helps.


The 2026 career map

Role focusPrimary layersSecondary
Content / marketing AIPrompt, Contextβ€”
Support bot with KBContext, PromptLoop (escalation)
Internal coding assistantContext, LoopHarness (CI integration)
Autonomous coding agentLoop, HarnessContext
Platform / agent infraHarnessLoop, Context

Prompt engineering is table stakes β€” like knowing SQL if you build backends.

Context engineering is required for any agent that reads your data.

Loop engineering is required when you want autonomy measured in hours, not seconds.

Harness engineering is required when failure has a cost β€” money, data, reputation.


Practical learning path

  1. Week 1 β€” Prompt + context basics
    System prompts guide β†’ Context vs prompt β†’ CLAUDE.md vs SKILL.md vs MCP

  2. Week 2 β€” Loop design
    What is loop engineering? β†’ Loop architecture β†’ Claude Code loop guide

  3. Week 3 β€” Harness hardening
    Agent harness guide β†’ Hooks β†’ Human-in-the-loop gates

  4. Ongoing β€” Pathways
    Context Engineering pathway Β· Loop Engineering pathway Β· MCP pathway


Bottom line

Prompt, context, loop, and harness engineering are not four names for the same job. They are four layers of one stack:

Karpathy named the context crisis. Cherny named the loop shift. Production teams name the harness when benchmarks move without new models.

When someone says "we need better prompts" on a long-running agent, ask: Which layer is actually failing? The answer determines whether you edit a paragraph, redesign retrieval, rewrite the goal spec, or fix retry logic.

Get the layer right β€” then optimize inward.


Related reading

Terminology and product features reflect the agent tooling landscape as of June 29, 2026.