VOOZH about

URL: https://dev.to/wonderlab/agent-series-20-harness-in-production-from-single-file-to-reusable-package-2chd

⇱ Agent Series (20): Harness in Production — From Single File to Reusable Package - DEV Community


From Demo Code to a Reusable Package

Article 19 used a 900-line harness_full_demo.py to demonstrate eight defense layers. That file is good for explaining concepts, but not for reuse — all layers are coupled together, nothing can be tested in isolation, and nothing can be imported by another project.

A production-grade Agent project needs something you can actually import:

harness/
├── __init__.py Public API exports
├── registry.py Layer 2: ActionRegistry + PermissionLevel
├── budget.py Layer 3: PermissionBudget (with refund())
├── sandbox.py Layer 4: sanitise_input + sandboxed_eval
├── audit.py Layer 6: ImmutableAuditLog (hash-chained)
├── rollback.py Layer 7: RollbackCoordinator
└── harness.py Unified entry point: AgentHarness

This article starts with package design, covers three key API decisions, and finishes with two integration styles: standalone Python and LangGraph graph embedding.


Module Design

registry.py — Layer 2

class PermissionLevel(Enum):
 READ = 1
 WRITE = 2
 ADMIN = 3
 IRREVERSIBLE = 4

@dataclass
class RegisteredAction:
 name: str
 level: PermissionLevel
 budget_cost: int
 description: "str"
 handler: Any # Callable or BaseTool

class ActionRegistry:
 def register(self, action: RegisteredAction) -> None: ...
 def get(self, name: str) -> RegisteredAction: ... # not found → PermissionError
 def is_allowed(self, name: str) -> bool: ...
 def names(self) -> list[str]: ...

get() rather than __getitem__: raises a consistent PermissionError, without leaking the internal KeyError detail.


budget.py — Layer 3

class PermissionBudget:
 def spend(self, action_name: str, cost: int) -> None:
 if self.remaining < cost:
 raise BudgetExhaustedError(...)
 self.remaining -= cost

 def refund(self, action_name: str, cost: int) -> None:
 self.remaining = min(self.total, self.remaining + cost)

The new refund() method fixes a design flaw from Article 19: budget was deducted before approval, and never returned on rejection. The production package corrects this — when an IRREVERSIBLE action is intercepted, harness.py proactively calls refund() to keep budget accounting accurate.


sandbox.py — Layer 4

INJECTION_PATTERN = re.compile(
 r"(ignore.*(previous|above|prior)|forget.*instruction|"
 r"you are now|act as|jailbreak|bypass|"
 r"override.*system|system.*override|" # both word orders covered
 r"</s>|\n\n###|###\s*system|<\|im_start\|>|system prompt)",
 re.IGNORECASE,
)

Two subtle points:

  1. Both SYSTEM OVERRIDE (system first) and override.*system (override first) are covered
  2. \n\n### matches a real newline, not the literal string \\n\\n###

Both bugs were discovered and fixed during the adversarial tests in Article 21.


audit.py — Layer 6

class ImmutableAuditLog:
 def log(self, action, actor, target, result, metadata=None) -> str:
 entry = {..., "prev_hash": self._last_hash}
 entry["hash"] = self._hash(json.dumps(entry, sort_keys=True) + self._last_hash)
 with self._path.open("a") as f: # append-only
 f.write(json.dumps(entry) + "\n")
 return entry["hash"]

 def verify_integrity(self) -> bool:
 # Replays the hash chain; any modified field returns False
 ...

The __len__() helper lets tests use len(audit) to check entry count directly.


rollback.py — Layer 7

class RollbackCoordinator:
 @contextmanager
 def transaction(self, state: dict, op_name: str):
 snapshot = copy.deepcopy(state)
 self._snapshots.append({"op": op_name, "snapshot": snapshot})
 try:
 yield state
 except Exception:
 state.clear()
 state.update(snapshot)
 self._snapshots.pop()
 raise

 def rollback_last(self, state: dict) -> str | None:
 """Manual trigger: undo the most recent committed transaction."""
 if not self._snapshots:
 return None
 entry = self._snapshots.pop()
 state.clear()
 state.update(entry["snapshot"])
 return entry["op"]

rollback_last() enables manual rollback: after a transaction commits, the snapshot is retained until explicitly confirmed or cleared by the caller.


Unified Entry Point: AgentHarness

class AgentHarness:
 def __init__(self, budget: int = 100, log_path: str = ...):
 self.registry = ActionRegistry()
 self.budget = PermissionBudget(total=budget)
 self.audit = ImmutableAuditLog(log_path=log_path)
 self.rollback = RollbackCoordinator()
 self._state: dict = {}

 def execute(self, action_name: str, actor: str = "agent", **kwargs) -> Any:
 # Layer 4: sanitise string arguments
 # Layer 2: registry check (missing → PermissionError)
 # Layer 3: budget deduction (insufficient → BudgetExhaustedError)
 # Layer 5: IRREVERSIBLE → refund budget + raise HumanApprovalRequired
 # Layer 7: WRITE/ADMIN wrapped in rollback.transaction
 # Layer 6: audit record
 ...

 def approve_and_execute(self, action_name: str, actor: str = "human", **kwargs) -> Any:
 """Call this after catching HumanApprovalRequired to complete execution."""
 ...

Why the two methods are separate:

  • execute() is the automated path: all checks pass, execute immediately
  • approve_and_execute() is the human path: the caller explicitly signals "this has been approved"

Merging them (e.g., with an approved=False parameter) makes intent ambiguous and harder to test.


Standalone Usage

Basic Flow

harness = AgentHarness(budget=50)

# Register actions
harness.registry.register(RegisteredAction(
 "read_ticket", PermissionLevel.READ, 1, "Read Jira ticket", handler_fn))
harness.registry.register(RegisteredAction(
 "write_draft", PermissionLevel.WRITE, 3, "Write draft fix", handler_fn))
harness.registry.register(RegisteredAction(
 "create_pr", PermissionLevel.ADMIN, 8, "Open pull request", handler_fn))
harness.registry.register(RegisteredAction(
 "merge_to_main", PermissionLevel.IRREVERSIBLE, 20, "Merge to main", handler_fn))

READ → WRITE → ADMIN normal flow:

r1 = harness.execute("read_ticket", ticket_id="BUG-101")
r2 = harness.execute("write_draft", ticket_id="BUG-101", patch="fix: add null check")
r3 = harness.execute("create_pr", ticket_id="BUG-101", title="fix: BUG-101")
# read=1 + write=3 + admin=8 = 12 spent, 38 remaining

Unregistered Action Blocked

try:
 harness.execute("delete_all_data")
except PermissionError as e:
 # "Action 'delete_all_data' not in registry. Execution blocked."
 ...

IRREVERSIBLE Two-Phase Execution

try:
 harness.execute("merge_to_main", pr_id=1)
except HumanApprovalRequired as e:
 print(e.action_name) # "merge_to_main"
 print(e.action_args) # {"pr_id": 1}
 # After human review:
 result = harness.approve_and_execute("merge_to_main", pr_id=1)

Key point: when execute() intercepts an IRREVERSIBLE action, it calls budget.refund() first. The net budget cost is zero. Only approve_and_execute() actually charges the budget.

Budget Exhaustion

# budget=5, write cost=3
h = AgentHarness(budget=5)
h.execute("write_draft", ...) # OK, 2 remaining
h.execute("write_draft", ...) # BudgetExhaustedError: need 3, remaining 2

LangGraph Integration

Embedding the harness inside LangGraph's tools_node:

def tools_node(state: HState) -> dict:
 last = state["messages"][-1]
 results = []
 for tc in last.tool_calls:
 name, args = tc["name"], tc["args"]
 try:
 reg = harness.registry.get(name) # Layer 2
 harness.budget.spend(name, reg.budget_cost) # Layer 3

 if reg.level == PermissionLevel.IRREVERSIBLE:
 decision = interrupt({...}) # Layer 5: LangGraph primitive
 if decision != "approved":
 harness.budget.refund(name, reg.budget_cost)
 harness.audit.log(name, "checkpoint", ..., "HUMAN_REJECTED")
 results.append(ToolMessage(content="rejected", ...))
 continue

 if reg.level in (WRITE, ADMIN):
 with harness.rollback.transaction(harness._state, name): # Layer 7
 output = TOOL_MAP[name].invoke(args)
 else:
 output = TOOL_MAP[name].invoke(args)

 harness.audit.log(name, "agent", ..., "EXECUTED") # Layer 6
 results.append(ToolMessage(content=str(output), ...))

 except PermissionError as e:
 harness.audit.log(name, "registry", ..., "BLOCKED")
 results.append(ToolMessage(content=str(e), ...))
 except BudgetExhaustedError as e:
 results.append(ToolMessage(content=str(e), ...))

 return {"messages": results}

tools_node is the harness's natural insertion point: it intercepts before tool execution without touching any agent_node (reasoning layer) logic.


Article 21 Test Results (45/45)

This package's behavior is fully verified by Article 21's test suite:

Functional (Layer 1–7 basic behaviour) ████████████████████████████████ 19/19 PASS
Adversarial (injection / escalation) ████████████████████████████████ 17/17 PASS
Chaos (fault injection / partial) ████████████████████████████████ 9/ 9 PASS

Total 45/ 45 tests passed

Two real bugs found by the tests:

  1. INJECTION_PATTERN only matched override.*system, missing [SYSTEM OVERRIDE] (reversed word order)
  2. \\n\\n### matched the literal string \n, not a real newline — jailbreak pattern ### System: slipped through

Both fixed in sandbox.py with a one-line regex adjustment.


Design Checklist

Package Structure

  • [ ] One file per layer; each file does exactly one thing
  • [ ] __init__.py exports only the public API; internal classes stay private
  • [ ] AgentHarness acts as Facade; callers don't reach into subsystems directly

API Design

  • [ ] execute() is the automated path covering the full Layer 2→7 chain
  • [ ] approve_and_execute() is the human path; the caller signals "approved"
  • [ ] Budget is refunded (refund()) when IRREVERSIBLE is intercepted, keeping accounting accurate
  • [ ] All exception types (PermissionError / BudgetExhaustedError / HumanApprovalRequired) exported from __init__.py

Sandbox

  • [ ] Injection pattern covers both forward and reverse word orders
  • [ ] \n is a real newline character, not the literal \\n

LangGraph Integration

  • [ ] Harness is embedded only in tools_node, not in agent_node
  • [ ] Each tool call runs through the harness check chain independently
  • [ ] IRREVERSIBLE uses LangGraph interrupt(), not a Python exception

Summary

Five core conclusions:

  1. Modularity is a prerequisite for testability: you can't test a single layer in isolation when everything is one file; splitting into a package lets each module be independently mocked and verified
  2. Refund budget on IRREVERSIBLE interception: the Article 19 design flaw, fixed here — "intercept before charging" is cleaner than "charge then refund," though both are valid; pick one and document it
  3. Separating execute() and approve_and_execute() makes intent explicit: automated and human paths are distinct; caller intent is unambiguous
  4. Tests found real production bugs: two regex vulnerabilities were invisible during development; adversarial tests exposed them on the first run
  5. LangGraph's tools_node is the harness's natural slot: no changes to agent logic needed; add the harness only at the tool execution layer, keeping concerns separated

References


Check out PrimeSkills — a curated marketplace of AI agents and skills that have been validated in real-world, enterprise-grade workflows. No fluff, just what actually works.

Find more useful knowledge and interesting products on my Homepage