
URL: https://dev.to/tacoda/the-harness-stack-4a7d

The Harness Stack


Ask five developers what an "agent harness" is and you will get five different answers. Some mean the model. Some mean a CLAUDE.md file. Some mean orchestration infrastructure. Everyone is building something real. But without shared vocabulary, we cannot learn from each other, cannot reason across systems, cannot even agree on where a problem lives when something goes wrong.

That is where we are with AI agent configuration. The word harness is everywhere, and it means everything. Which is another way of saying it means nothing precise enough to be useful.

This is not a minor inconvenience. In a field this young, the words we settle on shape the mental models we build. And mental models shape what we think to build next. Naming things carefully is an act of collective infrastructure.

This post proposes a taxonomy: The Harness Stack. Five named harnesses, each with a clear scope and responsibility. It is not prescriptive. You do not need all five. It is a shared map, offered as a starting point for a conversation the field needs to have.


The harness defined

A harness is the deliberately shaped configuration around an AI coding agent: everything that sits between the raw model and the work it does.

It spans the tool you chose, the global preferences that travel with you, the project-level scaffolding inside a codebase, the cross-project conventions an organization shares, and the orchestration that coordinates multiple agents at once.

A harness is not the agent. It is not the code the agent edits. It is the context that decides how the agent behaves when it encounters a task.


The five harnesses

The Model Harness

The AI coding tool itself. Claude Code, Cursor, Copilot, Pi, whatever you are running.

This is the product layer: the capabilities, interfaces, and built-in behaviors the tool ships with. You do not configure the Model Harness. You choose it. And that choice matters more than it might seem, because everything above it is built on assumptions the tool makes about how agents should work, what context they can hold, what hooks they expose.

The discipline worth cultivating here is loose coupling. Your higher-level configuration should not be written for a specific tool. It should be written for a class of tools that the Model Harness happens to satisfy today. We are not quite at the point where swapping tools is frictionless, but designing toward that portability now is an investment that compounds.
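One way to picture loose coupling is a tool-neutral rule set that gets rendered into whatever file a given tool reads. This is a minimal sketch, not a recommendation of any real layout: `Rule`, `render_markdown`, `TARGETS`, and `plan_outputs` are hypothetical names, CLAUDE.md is the only file name taken from this post, and `AGENT_RULES.md` is a placeholder standing in for some other tool's convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One tool-neutral instruction, e.g. a naming or error-handling convention."""
    topic: str
    text: str


def render_markdown(rules: list[Rule]) -> str:
    """Render rules as a markdown list grouped under alphabetically sorted topics."""
    lines: list[str] = []
    for topic in sorted({r.topic for r in rules}):
        lines.append(f"## {topic}")
        lines.extend(f"- {r.text}" for r in rules if r.topic == topic)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical mapping from tool name to the file that tool is assumed to read.
TARGETS = {
    "claude-code": "CLAUDE.md",
    "other-tool": "AGENT_RULES.md",  # placeholder, not a real convention
}


def plan_outputs(rules: list[Rule]) -> dict[str, str]:
    """Return {file name: content}; the content is the same for every tool."""
    body = render_markdown(rules)
    return {path: body for path in TARGETS.values()}
```

The point of the design is that the rules are authored once and the tool-specific part shrinks to a file name. Swapping the Model Harness then means editing `TARGETS`, not rewriting the rules.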

The Agent Harness

How the tool is configured globally, across all your work, not just one project.

This is where memory lives, along with persistent preferences, user-level settings, and the context that travels with you from codebase to codebase. In Claude Code, this is your global CLAUDE.md. In claude.ai, it is memory and system-level instructions. The Agent Harness answers a deceptively important question: how is this agent configured to behave before it encounters any specific project?

The distinction between the Model Harness and the Agent Harness is easy to collapse and important to preserve. The tool is what it ships as. The agent is what you have made of it. That gap, between default behavior and deliberately shaped behavior, is where a surprising amount of leverage lives. An agent that understands your preferred coding style, your tolerance for verbosity, your conventions around naming and error handling, arrives at every project already partially oriented. That orientation is the Agent Harness.

The Project Harness

The codebase-level scaffolding an agent operates within.

This is where most developers are actively building right now. It is also where the tooling is most mature. A project harness includes:

  • Slash commands and MCP plugins
  • Hook scripts (PreToolUse, PostToolUse, and Stop events, often matched against a tool such as Bash)
  • Subdirectory CLAUDE.md files scoped to specific modules
  • Characterization tests and static analysis configuration
  • Skills, sensors, rules, flywheels, and other "code as markdown" artifacts

Think of the Project Harness as terrain. It shapes what the agent encounters as it moves through your codebase: what guardrails exist, what patterns it is expected to follow, what tools are available and where. A well-designed project harness does not just constrain the agent. It makes the right path the easy path. This is the harness that has had my attention recently.
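As a sketch of a guardrail in that terrain, here is the decision logic a PreToolUse-style hook might carry. It assumes the hook receives a JSON event describing the tool call and signals a block through its exit code. The field names (`tool_name`, `tool_input`, `command`), the exit codes, and the patterns in `BLOCKED_PATTERNS` are all assumptions for illustration, so check them against your tool's hook documentation before relying on them.

```python
import json
import re

# Hypothetical project policy: commands the agent should not run unprompted.
BLOCKED_PATTERNS = [
    (re.compile(r"\bgit\s+push\b.*--force"), "Force-pushing is disabled; open a PR instead."),
    (re.compile(r"\brm\s+-rf\s+/"), "Recursive deletes of absolute paths need a human."),
]

ALLOW, BLOCK = 0, 2  # assumed exit codes: 0 lets the call proceed, 2 blocks it


def decide(event: dict) -> tuple[int, str]:
    """Return (exit_code, message) for one tool-call event."""
    if event.get("tool_name") != "Bash":
        return ALLOW, ""  # this hook only has opinions about shell commands
    command = event.get("tool_input", {}).get("command", "")
    for pattern, reason in BLOCKED_PATTERNS:
        if pattern.search(command):
            return BLOCK, reason
    return ALLOW, ""


def handle(raw_event: str) -> tuple[int, str]:
    """Parse the raw JSON payload a hook would receive and decide on it."""
    return decide(json.loads(raw_event))
```

A real hook script would pass `sys.stdin.read()` to `handle`, write the message to stderr, and exit with the returned code. Note that each message names the alternative path, which is the difference between a hook that encodes wisdom and one that only says no.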

The open questions here are genuinely interesting. How granular should subdirectory context be before it becomes noise? When does a hook encode wisdom and when does it encode fear? How do you keep a project harness from calcifying, from becoming a set of rules that made sense six months ago and now just get in the way? These are craft questions, and we are only beginning to develop shared answers.

The Organization Harness

The cross-project consistency layer. And the most underbuilt harness in the stack.

If the Project Harness is the terrain of a single project, the Organization Harness is the survey that makes multiple terrains legible to the same agent. Its purpose, at any scale, is to make sure an agent moving from one project to another does not have to relearn the fundamentals. Shared conventions. Common tool configurations. Policies that apply everywhere so they do not have to be restated anywhere.

The Organization Harness does not require an enterprise. In a monorepo, it might be nothing more than a root-level CLAUDE.md and a shared lint config. For larger organizations it scales up to approved tool registries, compliance guardrails, and governance policies. But the intent is the same whether you are a solo developer across multiple repos or a platform team serving dozens of product teams.

Here is the honest state of things: almost nobody is building the Organization Harness deliberately yet. Most teams have it accidentally. A convention that emerged organically. A root CLAUDE.md someone added and others quietly inherited. That is not nothing, but it is not design.

Purpose-built tooling for this harness does not really exist yet. But the primitives do, and they are ones developers already know. A version-controlled shared repo can hold your org-level CLAUDE.md, hook templates, and lint configs. Package managers can distribute them. For teams managing multiple separate repos today, git submodules are an underrated pragmatic option: pull the org configuration into each project as a submodule, update it centrally, and let projects inherit changes on their own schedule.

MCP servers are another workaround worth considering: an internal MCP server can expose org-wide tools, prompts, and resources to any agent that connects, without each project needing to vendor the configuration. It solves the distribution problem in a different way than submodules. It does not solve the harder problems: how an org-level harness gets authored, how conflicts with project-level configuration get resolved, or how drift gets detected. Those gaps remain wherever the bytes live.
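Drift detection, at least, can start small. The sketch below compares content hashes of org-level files against the copies a project carries, however those copies arrived. It is an assumption-laden illustration: `digest` and `detect_drift` are hypothetical names, and the three status labels are my own.

```python
import hashlib


def digest(content: str) -> str:
    """Stable fingerprint of a config file's content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def detect_drift(org_files: dict[str, str], project_files: dict[str, str]) -> dict[str, str]:
    """Classify each org-level file as 'in-sync', 'modified', or 'missing' in a project.

    Both arguments map a relative path to file content. Files that exist only
    in the project are ignored: they belong to the Project Harness.
    """
    report: dict[str, str] = {}
    for path, content in org_files.items():
        if path not in project_files:
            report[path] = "missing"
        elif digest(project_files[path]) == digest(content):
            report[path] = "in-sync"
        else:
            report[path] = "modified"
    return report
```

This only reports that a project has diverged. It says nothing about which side should win, which is the authoring and conflict-resolution problem that remains open.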

The real gap is semantic, not technical. Which makes it exactly the kind of gap that shared vocabulary can close.

This is the most interesting empty harness in the stack. As agentic workflows mature and projects multiply, inconsistency compounds quietly. The team that invests in the Organization Harness early is building something that will pay dividends in ways that are hard to attribute but impossible to miss.

The Orchestration Harness

Fleet-level coordination of agents. The harness where the products and frameworks are arriving faster than the patterns.

Devin lives here. So do CrewAI, AutoGen, LangGraph, and swarm frameworks. So does any infrastructure that treats individual agents as nodes in a larger graph: routing work between them, managing their lifecycles, composing their outputs into something coherent. This is not configuration in the traditional sense. It is choreography. The Orchestration Harness does not shape how an agent thinks. It shapes how agents relate to each other.

LangGraph makes this concrete: you define a graph of agent nodes, edges that represent conditional routing between them, and state that flows through the graph as work progresses. The harness is the graph itself, the encoded decisions about which agent handles what, under what conditions, and what happens when something fails. Devin operates similarly in spirit, if not in implementation: a task enters the system, gets decomposed, gets distributed, gets reassembled. The Orchestration Harness is what holds that process together.
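The shape of that idea can be sketched without any framework. To be clear, the code below is not LangGraph's API. It is plain Python with hypothetical names (`Graph`, `add_node`, `add_router`, `END`) and stand-in functions where real agents would go, showing only the three ingredients: nodes, conditional edges, and state that flows between them.

```python
from typing import Callable

State = dict  # shared state that flows through the graph
Node = Callable[[State], State]
Router = Callable[[State], str]

END = "__end__"


class Graph:
    def __init__(self, entry: str) -> None:
        self.entry = entry
        self.nodes: dict[str, Node] = {}
        self.routers: dict[str, Router] = {}

    def add_node(self, name: str, fn: Node) -> None:
        self.nodes[name] = fn

    def add_router(self, source: str, router: Router) -> None:
        """Conditional edge: after `source` runs, `router` picks the next node."""
        self.routers[source] = router

    def run(self, state: State, max_steps: int = 20) -> State:
        current = self.entry
        for _ in range(max_steps):
            state = self.nodes[current](state)
            state.setdefault("trace", []).append(current)  # record who ran
            router = self.routers.get(current)
            current = router(state) if router else END  # no router means stop
            if current == END:
                return state
        raise RuntimeError("graph did not terminate; possible routing loop")


def planner(state: State) -> State:
    # Stand-in for an agent that decomposes the task into subtasks.
    return {**state, "subtasks": state["task"].split(" and ")}


def worker(state: State) -> State:
    # Stand-in for an agent that completes the next unfinished subtask.
    done = state.get("done", [])
    return {**state, "done": done + [state["subtasks"][len(done)]]}


def route_after_worker(state: State) -> str:
    return "worker" if len(state["done"]) < len(state["subtasks"]) else END


graph = Graph(entry="planner")
graph.add_node("planner", planner)
graph.add_node("worker", worker)
graph.add_router("planner", lambda s: "worker")
graph.add_router("worker", route_after_worker)
```

Calling `graph.run({"task": "write tests and fix lint"})` runs the planner once and the worker twice. The `trace` list and the `max_steps` guard are small gestures at the hard part: knowing what ran, and noticing when routing goes wrong.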

What makes the Orchestration Harness genuinely hard is not the tooling. LangGraph and its peers are increasingly capable. It is the design questions that do not have settled answers yet. When a fleet of agents is doing something you did not intend, how do you know? How do you trace causation across spawned instances? How do you encode organizational intent in a way that survives decomposition into subtasks? How do you reason about failure when the failing component is itself an agent with its own harness?

These are not small questions. The Orchestration Harness is where the absence of shared vocabulary is most costly, because the systems are complex enough that imprecise language leads directly to imprecise design. And imprecise design at this scale fails in ways that are hard to diagnose and expensive to untangle.


Products do not respect the taxonomy

The reason "harness" gets muddled is that real products do not sit cleanly in one harness. They span two or three at once.

Claude Code is primarily a Model Harness, but it ships Project Harness primitives: skills, commands, the .claude/ directory shape. Cursor straddles the Model Harness and the Project Harness. CrewAI and AutoGen blur the Agent Harness and the Orchestration Harness at the same time: they define how one agent runs and how many coordinate. LangChain sprawls across the Agent Harness, the Project Harness, and sometimes the Orchestration Harness. Devin reaches into all five.

This is why the word collapses. The products are not lying. They really do span harnesses. The fix is not to pretend they do not. The fix is to name which harness a product touches when we talk about it.
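Naming the harnesses a product touches can be as literal as writing them down. The mapping below transcribes the paragraph above into data. The `Harness` enum, `PRODUCT_SPAN`, and `products_touching` are hypothetical names, and the entries are my reading of these products, not anything the vendors publish.

```python
from enum import Enum


class Harness(Enum):
    MODEL = "Model Harness"
    AGENT = "Agent Harness"
    PROJECT = "Project Harness"
    ORGANIZATION = "Organization Harness"
    ORCHESTRATION = "Orchestration Harness"


# One opinionated reading; LangChain's Orchestration entry is only "sometimes".
PRODUCT_SPAN: dict[str, set[Harness]] = {
    "Claude Code": {Harness.MODEL, Harness.PROJECT},
    "Cursor": {Harness.MODEL, Harness.PROJECT},
    "CrewAI": {Harness.AGENT, Harness.ORCHESTRATION},
    "AutoGen": {Harness.AGENT, Harness.ORCHESTRATION},
    "LangChain": {Harness.AGENT, Harness.PROJECT, Harness.ORCHESTRATION},
    "Devin": set(Harness),
}


def products_touching(harness: Harness) -> list[str]:
    """List the products whose span includes the given harness, sorted by name."""
    return sorted(name for name, span in PRODUCT_SPAN.items() if harness in span)
```

Querying `products_touching(Harness.ORGANIZATION)` returns a single entry, which is the same gap the rest of this post keeps pointing at.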


A debugging ladder

The taxonomy earns its keep when something goes wrong.

When an agent behaves unexpectedly, the instinct is to poke at whatever is most visible, usually a prompt or a config file. But the question "which harness is this a problem in?" is more useful:

  • Is the tool itself underperforming for this task? (Model Harness)
  • Is global memory or agent configuration incomplete or contradictory? (Agent Harness)
  • Is a hook misconfigured, or is a subdirectory CLAUDE.md missing critical context? (Project Harness)
  • Are there conflicting conventions across projects that this agent is inheriting inconsistently? (Organization Harness)
  • Is the orchestration logic routing or spawning incorrectly? (Orchestration Harness)

Five questions. Five places to look. That is not a debugging methodology. It is what shared vocabulary makes possible.
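For teams that like checklists, the ladder fits in a few lines. This sketch assumes the questions are asked in the order listed, bottom of the stack first, and that a human supplies the yes or no answers. `LADDER` and `first_suspect` are hypothetical names.

```python
from typing import Optional

# The five questions from the list above, in stack order.
LADDER = [
    ("Model Harness", "Is the tool itself underperforming for this task?"),
    ("Agent Harness", "Is global memory or agent configuration incomplete or contradictory?"),
    ("Project Harness", "Is a hook misconfigured, or is a subdirectory CLAUDE.md missing critical context?"),
    ("Organization Harness", "Are there conflicting conventions across projects that this agent is inheriting inconsistently?"),
    ("Orchestration Harness", "Is the orchestration logic routing or spawning incorrectly?"),
]


def first_suspect(answers: dict[str, bool]) -> Optional[str]:
    """Walk the ladder in order and return the first harness answered 'yes'.

    `answers` maps a harness name to True or False; unanswered rungs count as 'no'.
    Returns None when no rung is implicated.
    """
    for harness, _question in LADDER:
        if answers.get(harness, False):
            return harness
    return None
```

Returning only the first suspect is a deliberate simplification. Real incidents can implicate more than one harness, and the order of the ladder is a starting heuristic, not a claim about where problems usually live.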


The attention map

The taxonomy also makes the field's attention map visible. Most of the work right now is happening in the Model Harness (the tool wars), the Project Harness (the explosion of project-level scaffolding), and the Orchestration Harness (the multi-agent frameworks). The Agent Harness is catching up. The Organization Harness is empty.

If you are looking for where the next interesting work lives, look at the empty harness.


Why naming this matters

We are, collectively, in a period of rapid accumulation. Patterns are emerging faster than they are being named. The result is that knowledge stays local: buried in individual CLAUDE.md files, undocumented hook scripts, tribal conventions that do not survive team changes.

Taxonomies feel like housekeeping until suddenly they are load-bearing. The goal of the Harness Stack is not to add ceremony to a field that is moving fast. It is to give the field something specific to argue about. "We need a better harness" is unanswerable today, because the next person is allowed to interpret it however they want. "We need a better Organization Harness" is an argument you can act on.

I hold this loosely. The edges are genuinely blurry. The Agent Harness and the Project Harness blur when global memory starts referencing project-specific context. The Organization Harness and the Orchestration Harness blur when org policies begin governing agent spawning behavior. That is fine. A taxonomy does not need to be perfect to be useful. It needs to be shared.

The rule is simple: when you say "harness," say which one. The taxonomy is surely wrong somewhere; it is a first attempt. But I would rather argue about whether the Organization Harness should be called something else than keep watching engineers nod at each other and walk out of the room with five different mental models.


Does this map to how you are building, or does it break somewhere meaningful? I am curious where the names hold and where they need to be argued with. If you are working in this space, I would rather have a conversation than be right.
