VOOZH about

URL: https://glama.ai/mcp/servers/WasmAgent/wasmagent-js

โ‡ฑ wasmagent-mcp-server by WasmAgent | Glama


wasmagent

WASM Agent Kernel & Portable Code Executor

๐Ÿ‘ npm version
๐Ÿ‘ License: Apache-2.0
๐Ÿ‘ CI
๐Ÿ‘ Docs
๐Ÿ‘ Glama MCP server

TypeScript agent runtime with WASM sandboxing, prompt-cache optimization, and parallel quality runners.

Build production-grade AI agents in TypeScript โ€” code-execution agents, tool-calling agents, or multi-path reasoning pipelines โ€” with built-in cost controls and Cloudflare Workers deployment.

# For Anthropic (Claude)
npm add @wasmagent/core @anthropic-ai/sdk

# For OpenAI / compatible endpoints (Ollama, vLLM, etc.)
npm add @wasmagent/core openai

๐Ÿ“š Docs site: https://WasmAgent.github.io/wasmagent-js/ ยท Getting started in 5 min: docs/guides/getting-started.md ยท Benchmarks: docs/benchmarks.md ยท Changelog: CHANGELOG.md ยท API stability: docs/strategy/api-stability.md ยท Strategy memo: docs/strategy/2026-06-competitiveness.md ยท Trust Page (D4): docs/strategy/trust.md ยท Enterprise security face: docs/strategy/security-face.md

๐Ÿ†• 2026-06-17 strategy update: code-mode is now table stakes (CF / OpenAI / Anthropic all ship it). The differentiation tightened to portable executor + governance + paired-statistics referee. New: docs/security/capability-manifest-owasp.md (OWASP Agentic Top 10 mapping) ยท docs/strategy/2026-06-17-update.md (delta on top of S1โ€“S4) ยท docs/reports/arm-batch-grammar-2026-06-17/analysis.md (worked example: how evals-runner falsified our own hypothesis in 30 minutes).

๐ŸŽฏ 2026-06-18 โ€” GoalDirectedAgent shipped. New first-class loop primitive: agent synthesises its own success criteria, verifies them deterministically (or with adversarial-defaulted LLM judge), retries with hints. The eighth axis of differentiation โ€” see docs/guides/goal-directed.md. One-shot ToolCallingAgent is still the default; goal-directed is opt-in for tasks where "did this actually deliver" matters.

๐Ÿค Looking for a co-maintainer. @wasmagent/core@1.0.0 is on the calendar for 2026-12-15. If you ship to the Vercel AI SDK / Mastra / Claude Agent SDK / OpenAI Agents JS / Cloudflare Agents / LangGraph.js communities and want npm-publish + merge rights, see CONTRIBUTING.md and GOVERNANCE.md. Release cadence ledger: docs/strategy/release-cadence-log.md. Sandbox-escape SLA drill log: docs/strategy/security-drill-log.md.


Nine differentiation axes โ€” at a glance

What this table is. The eight surfaces where wasmagent is doing something the other JS agent frameworks (Vercel AI SDK, OpenAI Agents JS, Mastra, LangGraph.js, CF Agents SDK, smolagents-ts) are not โ€” at least not all in one place. Each row links to the guide that explains it. The detailed feature grid below this section breaks each axis into the specific cells where competitors do or don't ship the same capability. [Last verified 2026-06-18.]

๐Ÿ†• A ninth axis is in flight. "Adaptive execution" โ€” tool fallback when a tool fails, tool synthesis when none fits, goal adaptation when the goal turns out unreachable. RFC drafted 2026-06-18, phased implementation. See docs/strategy/2026-06-18-adaptive-execution.md and docs/rfcs/adaptive-execution.md.

#

Axis

One-line value

Status

Doc

1

Multi-provider model adapters

One Model interface across Anthropic / OpenAI / Doubao / DeepSeek / Kimi / Qwen / GLM / MiniMax / local node-llama-cpp โ€” bring your own vendor, swap with one line.

shipped

getting-started ยท openai-compat-recipes

2

Multi-tier kernel matrix

Three execution tiers (VmKernel in-process ยท QuickJS/Pyodide/Wasmtime WASM ยท RemoteSandboxKernel microVM) under one Kernel API โ€” same CapabilityManifest (network/fs/env/cpu/memory) across every tier.

shipped

kernels/comparison

3

Cross-runtime + offline closure

Same kernel API on Node / any edge runtime / browser / air-gapped laptop. @wasmagent/model-local + WASM kernel = full agent loop, zero outbound traffic.

shipped

model-local ยท getting-started

4

Memory layers

MemoryBlockSet (Letta-style in-context blocks, prompt-cache safe) + observational memory + Checkpointer + 4 KV backends (CF KV / DO / Redis / Upstash).

shipped

memory ยท observational-memory

5

Durable workflow engine

LocalWorkflowEngine + CloudflareWorkflowEngine against the same WorkflowDefinition โ€” observable, terminable, resumable, with the same contract test on both.

shipped

durable-runtime

6

Code-mode (compress N MCP tools into 1)

Single execute_code tool that compresses N MCP tools into one. Token cost on tool registries that grow stays flat instead of linear. Drop-in for Cloudflare codemode DynamicWorkerExecutor / OpenAI Agents SDK / Mastra.

shipped

code-mode

7

Devtools + GenAI semconv OTel

Zero-deploy local Studio (run-overview UI). OtelBridge emits standard gen_ai.* attributes (Datadog / Honeycomb / Grafana GenAI compatible) alongside legacy names.

shipped

devtools ยท otel-exporter

8

Goal-directed loop

Agent synthesises its own success criteria, executes, verifies (deterministic + adversarial-defaulted LLM judge), retries with hints. The user states a goal; the framework supplies the loop.

shipped 2026-06-18

goal-directed ยท baseline ยท strategy

9

Adaptive execution

When a tool fails, framework offers registered alternatives (L1). When none fits, agent synthesises a one-off via execute_code (L2). When the goal itself looks unreachable, agent proposes a relaxed criteria set the caller can accept/reject (L3). Repeat-hint stop-loss bounds blast radius.

fully shipped + paired-stat verified 2026-06-18

strategy ยท RFC ยท ablation

The eight (soon nine) axes compose. A goal-directed run can use a local model (axis 1+3) running in a WASM kernel (axis 2), with a verifier that checks workspace state via the workflow engine (axis 5), traced through OTel (axis 7), and persisted to memory (axis 4). The eighth axis raises the ceiling on what the first seven can deliver. The ninth, when it lands, will let that ceiling survive a tool failing or a goal turning out infeasible.


Related MCP server: Code Executor MCP Server

Comparison

There are several mature TypeScript agent frameworks. Here is an honest assessment of where wasmagent fits.

Last verified: 2026-06-15. Each โš ๏ธ/โŒ cell links to its source on the column header's project. The "sandbox" rows have been re-framed (D2, 2026-06-13) so that having some sandbox is no longer the differentiator โ€” three competitors now ship one. The remaining axes โ€” isolation tier composability, cross-runtime neutrality, offline closure โ€” are what no other framework offers in one package.

Vercel AI SDK

LangGraph.js

OpenAI Agents JS

Mastra

CF Agents SDK

wasmagent

npm downloads/month

~57M

~10M

~3.8M

~4M

~3.2M

early-stage

ToolCallingAgent

โœ…

โœ…

โœ…

โœ…

โœ…

โœ…

Sandboxed code execution

โŒ (none in core)

โŒ (none in core)

โœ… SandboxAgent โ€” Unix-local / Docker / hosted

โœ… Workspace โ€” E2B / Daytona / Modal / Blaxel / Railway

โœ… @cloudflare/sandbox container

โœ… kernels

Isolation tiers โ€” composable in one process

n/a

n/a

โš ๏ธ 1 tier (process / container, picked at run time per client)

โš ๏ธ 1 tier per provider (you swap providers, not tiers)

โš ๏ธ 1 tier (container-per-DO, vendor-bound)

โœ… 3 tiers, swap with one line โ€” VmKernel (in-process) / WASM kernel (QuickJS / Pyodide / Wasmtime) / RemoteSandboxKernel (microVM); factory.createKernel() selects per call

Cross-runtime neutrality

โš ๏ธ Node + edge runtime patches

โœ… Node + edge

โš ๏ธ Node + Docker hosts (sandbox path needs a host process)

โš ๏ธ provider-specific (each provider has its own runtime constraint)

โŒ Cloudflare-only (sandbox is a CF Container)

โœ… same kernel API on Node, any edge runtime, browser, and offline laptop

Offline / air-gapped closure

โŒ requires provider HTTP

โŒ requires provider HTTP

โŒ Sandbox + model both need network

โŒ all sandbox providers are cloud SaaS

โŒ vendor-bound

โœ… @wasmagent/model-local + WASM kernel = full agent loop with zero outbound traffic

Python execution (edge-safe, no container)

โŒ

โŒ

โŒ (containers required)

โŒ (containers required)

โŒ

โœ… Pyodide-in-WASM, runs inside a Worker isolate

Anthropic prompt-cache management

โš ๏ธ pass-through

โš ๏ธ pass-through

โš ๏ธ via adapter

โš ๏ธ pass-through

โŒ

โœ… auto breakpoints + 1h TTL

Self-consistency / reflect-refine runners

โŒ

โŒ manual

โŒ

โŒ

โŒ

โœ… built-in

Budget forcing

โŒ

โŒ

โŒ

โŒ

โŒ

โœ…

DAG tool scheduler + speculative exec

โŒ

โš ๏ธ graph-level

โŒ

โš ๏ธ workflow graph

โŒ

โœ…

Long-history compaction

โš ๏ธ syntactic prune

โŒ manual

โŒ

โš ๏ธ observational memory

โŒ

โœ… model-summarised

MCP support

โœ…

โœ…

โœ…

โœ…

โœ…

โœ…

Cloudflare Workers

โš ๏ธ partial

โœ…

โš ๏ธ experimental

โš ๏ธ alpha

โœ… native

โœ…

UI hooks (React/Next.js)

โœ… best-in-class

โŒ

โŒ

โš ๏ธ via AI SDK

โš ๏ธ

โœ… useAgentRun

Provider integrations

40+

300+

OpenAI-primary

40+

CF Workers AI

Anthropic ยท OpenAI ยท Doubao ยท DeepSeek ยท Kimi ยท Qwen ยท GLM ยท MiniMax

Evals framework

โŒ

โš ๏ธ LangSmith

โŒ

โœ… 12+ scorers

โŒ

โœ… 16 scorers + 2 multi-criterion judges

Observability (OTel)

โš ๏ธ LangSmith

โš ๏ธ LangSmith

โŒ

โœ…

โŒ

โœ… OtelBridge + GenAI semconv

Retry / resilience

โœ…

โœ…

โœ…

โœ…

โœ…

โœ… RetryPolicy

Durable workflows / checkpointing

โœ… DurableAgent (AI SDK 6)

โœ… LangGraph

โŒ (Assistants API retiring 2026-08-26)

โš ๏ธ partial

โœ… Durable Objects

โœ… Checkpointer + 4 backends (CF KV / DO / Redis / Upstash)

SSE Last-Event-ID resume

โš ๏ธ via DurableAgent

โœ… runtime

โŒ

โŒ

โŒ

โœ… EventLog primitive + worker-native

HITL persisted suspend/resume

โœ…

โœ…

โŒ

โš ๏ธ partial

โš ๏ธ via DO

โœ… stateless /resume endpoint, hours-to-days durations

Embedded local LLM (in-process, offline)

โš ๏ธ via Ollama HTTP

โš ๏ธ via Ollama HTTP

โŒ

โš ๏ธ via Ollama HTTP

โŒ

โœ… @wasmagent/model-local โ€” node-llama-cpp + grammar-constrained tool calls + multi-mirror downloads (HF / hf-mirror / ModelScope)

Where competitors are stronger

  • Vercel AI SDK โ€” If you're building a chat UI with Next.js, use this. The React hooks (useChat, useAgent), DurableAgent for stateful/resumable workflows (AI SDK 6), native MCP support, and DevTools panel are all best-in-class. 57M monthly downloads.

  • LangChain/LangGraph.js โ€” If you need 300+ integrations (vector stores, document loaders, obscure providers) or graph-based durable workflows with checkpointing and human-in-the-loop, LangGraph is battle-tested at LinkedIn, Uber, and GitLab scale.

  • Mastra โ€” Best eval framework (12+ built-in scorers including trajectory and tool accuracy). Strong developer onboarding. Their "Observational memory" pattern was first-mover; wasmagent now ships an equivalent (ObservationalMemory) plus extra prompt-cache-aware compression โ€” see docs/guides/observational-memory.md.

  • Cloudflare Agents SDK โ€” If you're building on Cloudflare specifically, Durable Objects give you stateful agents with persistent scheduling that nothing else matches natively.

  • OpenAI Agents JS โ€” If your stack is OpenAI-only and you want first-party support, the cleanest path. The 2026-04 release added SandboxAgent with Unix-local, Docker, and hosted clients; for OS-level isolation backed by OpenAI itself, this is the path of least resistance.

Where wasmagent is differentiated

  • Three isolation tiers under one swappable interface (D2, 2026-06-13). OpenAI Agents JS now ships SandboxAgent (Unix-local / Docker / hosted) and Mastra ships Workspace providers (E2B / Daytona / Modal / Blaxel / Railway). The differentiator is no longer "has a sandbox" โ€” it's that wasmagent exposes three tiers (VmKernel in-process, QuickJSKernel / PyodideKernel / WasmtimeKernel true WASM, RemoteSandboxKernel microVM) under one Kernel interface, swap them with one line at call time, and apply one CapabilityManifest (network/fs/env/cpu/memory) across every tier. Competitors give you one tier wired to one provider. See docs/kernels/comparison.md for the decision tree.

  • Cross-runtime neutrality. Cloudflare's sandbox is fast on Cloudflare. Mastra's providers are SaaS. wasmagent kernels run on Node, on any edge runtime, in a browser tab, and on a laptop with the network unplugged โ€” same Kernel API, same security manifest. This is the structural advantage no platform-bound competitor can match.

  • Offline / air-gapped closure. @wasmagent/model-local (node-llama-cpp + grammar-constrained tool calls + multi-mirror downloads HF / hf-mirror / ModelScope) plus a WASM kernel = full agent loop with zero outbound traffic. For compliance-bound and air-gapped deployments, no other framework gives you this without writing the integration yourself.

  • Durable runtime โ€” Same KvBackend powers checkpoints, the SSE event log, and structured memory. Four production backends ship out of the box (Cloudflare KV / Durable Objects / Redis / Upstash REST). A paused await_human_input survives worker recycle for hours/days; POST /resume is stateless. See docs/guides/durable-runtime.md.

  • Quality runners โ€” Self-consistency with answer extraction (boxed / last-line / custom), reflect-refine, budget forcing ("Wait" prefill), and parallel fork-join are not shipped as first-class APIs by any competitor.

  • Anthropic prompt-cache optimization โ€” Framework actively manages cache_control breakpoint placement across multi-turn history, supports the 1-hour extended TTL (ttl:"1h"), and reports per-TTL cache usage. Competitors pass through or validate limits but do not optimise placement.

  • Speculative tool execution โ€” Read-only, idempotent tools are pre-executed ahead of write barriers within a DAG step. The scheduler is awakened by $<callId> dependency references in the system prompt, enabling true parallel + ordered hybrid scheduling. No competitor implements this.

  • GenAI semantic conventions โ€” OtelBridge emits standard gen_ai.* attributes (Datadog / Honeycomb / Grafana GenAI view compatible) alongside legacy names, switchable via semconvMode.

  • Observational Memory + cache-stable prefix (A1) โ€” Background "observer" model continuously compresses history into ranked observation paragraphs. The compressed prefix is byte-stable so Anthropic prompt cache hits stay hot across observations โ€” Mastra's reference work has no equivalent. ~22% of baseline tokens on a 50-turn synthetic trace; see examples/benchmarks/observational-memory.mjs.

  • Time-travel debugger (A2) โ€” @wasmagent/devtools exposes the existing EventLog + Checkpointer data through a navigable step timeline + "fork from any step" UI. LangGraph Studio's headline feature, shipped as a tiny opt-in package (logic core ~250 LOC, React UI optional).

  • Skills + lifecycle hooks (A3) โ€” SkillRegistry for progressive instruction/tool disclosure (Claude Agent SDK / CrewAI v1.12 convention). ToolPostHook chain (redact, truncate, audit) sits beside the existing ToolGuardrail โ€” pre/post symmetry without confusing block vs transform semantics.

  • Multi-criterion LLM judges (A4) โ€” judgeScorer extends llmJudge with weighted criterion-level scoring + configurable scale. Two built-in judges (trajectoryQualityJudge, answerCompletenessJudge) work with any cheap Model adapter so judges run on Haiku/Doubao while the agent stays on Sonnet/Opus.

  • Reproducible benchmarks โ€” Every percentage in this README (โˆ’37%, 72โ†’90%, โˆ’85%, โˆ’84%) is verified by an offline benchmark in examples/benchmarks/. Run pnpm bench to reproduce. CI fails the PR if any number drifts outside its tolerance.

Honest caveats

wasmagent is early-stage. The differentiating features (code execution kernels, durable runtime, quality runners, speculative scheduling) are technically novel but also niche โ€” most teams pick a framework based on ecosystem breadth and documentation volume, where the mature options above win. Choose wasmagent when sandboxed code execution, durable agent runs, prompt-cache cost control, or output quality runners are first-order concerns.

Verified status

Number

Verified by

Tests passing (all packages)

1341

bun run test (CI matrix on every push) โ€” @wasmagent/core 716 ยท @wasmagent/mcp-server 47 (D1: +11 portal) ยท @wasmagent/devtools 34 ยท @wasmagent/evals-runner 31 ยท @wasmagent/aisdk 17 (D3: +3 memory) ยท @wasmagent/claude-agent-sdk 7 ยท @wasmagent/openai-agents 6 ยท others 483

README percentages reproducible

8 / 8

bun run bench โ€” runs in CI; non-zero exit blocks the PR (incl. A1 โ‰ค25% target + S1/A1 code-mode โ‰ค50% target + D1 Portal โ‰ค10% / โ‰ค66.7%)

Cross-process kill-and-resume (A1 DoD โ‘ )

โœ“ Redis + โœ“ Cloudflare KV + โœ“ Durable Object

redis.test.ts + kvAdapters.test.ts

SSE Last-Event-ID gap-free replay (A2 DoD โ‘ )

โœ“

EventLog.test.ts round-trip test

Stateless HITL resume (A3 DoD โ‘ )

โœ“

hitl.test.ts โ€” three simulated processes

Observational memory โ‰ฅ4ร— compression (A1)

โœ“ 22% of baseline

examples/benchmarks/observational-memory.mjs

Code-mode bootstrap O(1) vs direct-MCP O(N) (S1/A1)

โœ“ 13.6% of direct at N=30 tools

examples/benchmarks/code-mode-tokens.mjs

MCP Portal: O(1) bootstrap across M upstreams (D1)

โœ“ 3.1% of direct multi-MCP at M=5ร—N=30

examples/benchmarks/portal-tokens.mjs + packages/mcp-server/src/portal.test.ts (11 tests) + examples/mcp-portal/ end-to-end demo

Step-fork bundle (A2 DevTools)

โœ“ 9 unit + 8 jsdom render tests

packages/devtools/src/EventLogReplay.test.ts + react/DevTools.test.tsx

Skill lazy-load + post-hook chain (A3)

โœ“

packages/core/src/skills/Skill.test.ts + guardrails/index.test.ts

Judge scorer weighted breakdown (A4)

โœ“

packages/core/src/evals/JudgeScorer.test.ts

Paired-statistics parity vs scipy (evals-runner)

โœ“ 31 reference values to ยฑ1e-7

packages/evals-runner/src/stats/index.test.ts

Local Studio HTTP overview (A4 of 2026-06-12 plan)

โœ“

agentkit devtools --events-file <ndjson>

Framework-agnostic GenAI semconv ingest (D5)

โœ“ 9 adapter tests

agentkit devtools --otel-events-file <path> โ€” accepts NDJSON or OTLP/JSON from any producer (Vercel AI SDK, Mastra, OpenAI Agents JS, Anthropic SDK)

Multi-model evaluation across 17ร— size range

โœ“ 5 models, 2026-06-12

docs/reports/longmemeval-5model-2026-06-12.md


Features

  • Two agent modes โ€” CodeAgent (writes + executes code) and ToolCallingAgent (native tool_use)

  • Code execution โ€” three isolation tiers โ€” VmKernel (node:vm, in-process dev/test), QuickJSKernel / PyodideKernel / WasmtimeKernel (true WASM, language-level isolation, edge-safe), RemoteSandboxKernel (E2B / Cloudflare Sandbox microVM, full process isolation). Mix tiers via factory.createKernel().

  • Programmatic Tool Calling (PTC) โ€” ProgrammaticOrchestrator executes model-generated scripts inside any kernel; callTool() calls registered tools without surfacing intermediate results to the context (โˆ’37% tokens). Self-hosted alternative to Anthropic's managed PTC container.

  • Prompt-cache optimization โ€” MessageAssembler builds cache-stable prefixes; Anthropic cache_control breakpoints respect the 4-breakpoint limit, per-chunk token thresholds, and the 1-hour extended TTL (ttl:"1h"); per-TTL usage metering (5m vs 1h); OpenAI automatic prefix cache hit tracking

  • Tool deferred loading โ€” deferLoading: true on any tool (or McpToolCollection.deferAll()) excludes its schema from the system prefix and loads on-demand via Anthropic Tool Search (โˆ’85% tokens for large MCP server collections)

  • Tool Use Examples โ€” inputExamples on any tool maps to Anthropic's input_examples wire field (72%โ†’90% parameter accuracy)

  • Context editing โ€” assembler.editToolResults({ maxTokens, keepRecent }) truncates old tool outputs reversibly without breaking conversation structure (+29% task performance, โˆ’84% tokens on web search)

  • Cross-session Memory Tool โ€” createMemoryTool({ backend }) gives agents persistent read/write/list/delete memory backed by any KvBackend (Cloudflare KV, Redis, in-memory Map)

  • Quality runners โ€” majority-vote self-consistency with answer extraction (boxed / last-line / custom hook), critique-refine cycles, "Wait" prefill budget forcing, parallel fork-join with synthesis

  • DAG scheduling โ€” independent tool calls execute concurrently via Scheduler; read-only tools speculatively pre-execute ahead of write barriers; $<callId> dependency syntax in system prompt enables true data-dependency ordering; wired into ToolCallingAgent by default

  • Long-history compaction โ€” agent.assembler.compact(model, keepRecentSteps) summarises old steps; inject a custom MessageAssembler via assembler option

  • Production resilience โ€” automatic exponential backoff + jitter retry for 429 / 5xx / network errors on all model adapters; configurable via RetryPolicy

  • Evals framework โ€” runEval() with 16 built-in scorers covering correctness (exactMatch, toolCallAccuracy, trajectoryValidity, finalAnswerLength, guardrailCompliance), faithfulness, relevance, recovery, efficiency, constraints, plus two multi-criterion JudgeScorer judges (trajectoryQualityJudge, answerCompletenessJudge)

  • Evaluation harness (@wasmagent/evals-runner) โ€” runEvaluation() plus agentkit evals run CLI: multi-model ร— multi-suite ร— multi-seed Pareto reports over (accuracy, cost, p95 wall). Six reference suites cover the gaps single-task benchmarks miss (long-context recall, multi-turn memory, agent trajectory, latency-under-budget, cost-per-correct, tool-sequence). Built-in paired statistics (McNemar exact / Wilson CI / paired bootstrap / G1 gate) match scipy reference values to ยฑ1e-7. All synthetic fixtures โ€” no overlap with public training corpora.

  • Code-mode MCP server (@wasmagent/mcp-server) โ€” createCodeModeServer() collapses N downstream tools into a docs_search + execute_code two-tool MCP surface. At 30 tools the bootstrap-token cost drops to 13.6% of direct MCP (codemode-lite reported 53%); pairs with any agentkit kernel for unified security policy.

  • MCP Portal โ€” federate N upstream servers behind one neutral two-tool surface (D1, 2026-06-13) โ€” createPortalServer() wraps multiple ToolRegistry / MCP upstreams (filesystem + GitHub + memory + โ€ฆ) into one code-mode face. Bootstrap stays O(1) regardless of how many upstreams are federated; at 5 servers ร— 30 tools = 150 tools, the Portal is 3.1% of direct multi-MCP and 19.8% of code-mode-per-server (examples/benchmarks/portal-tokens.mjs). One CapabilityManifest spans every upstream โ€” the audit boundary platform-bound Portals (Cloudflare's announced version) cannot give you across heterogeneous providers. See examples/mcp-portal/.

  • AI SDK + Mastra + Claude Agent SDK + OpenAI Agents JS plugin packages (@wasmagent/aisdk, @wasmagent/mastra-sandbox, @wasmagent/claude-agent-sdk, @wasmagent/openai-agents) โ€” drop agentkit's WASM kernels into Vercel AI SDK 4โ€“6 (sandboxedJsTool, codeModeTool), Mastra (agentkitMastraSandbox), Anthropic Claude Agent SDK (sandboxedJsClaudeTool, codeModeClaudeTool), or OpenAI Agents JS (sandboxedJsAgentTool, codeModeAgentTool) without an external sandbox provider.

  • Observability โ€” OtelBridge maps AgentEvent streams to OTel-compatible spans; emits gen_ai.* semantic convention attributes (Datadog/Honeycomb/Grafana GenAI view compatible) with semconvMode: "both" | "stable" | "legacy"

  • Durable runtime โ€” KvCheckpointer with four production backends: CloudflareKvBackend, DurableObjectKvBackend, RedisKvBackend (ioredis-style), RedisRestKvBackend (Upstash REST, edge-safe). CheckpointableRun saves state after every step; await_human_input persists pendingHumanInput and exits the iterator so the worker can recycle while a human reviews.

  • SSE Last-Event-ID resume โ€” EventLog tags every event with a monotonic id, persists to the same KvBackend, and replays only the missing tail when a client reconnects. The reference Cloudflare Worker honors Last-Event-ID natively; useAgentRun({ resume: { maxAttempts } }) retries automatically.

  • Stateless human-in-the-loop โ€” resumeFromHuman(checkpointer, traceId, promptId, response) writes the human's reply into a paused snapshot. Because there is no in-memory state, the worker that pauses and the worker that resumes can be different processes (and different days). See examples/durable-runtime/.

  • React hooks โ€” @wasmagent/react provides useAgentRun() for streaming SSE agent events in Next.js / React apps

  • Multi-model โ€” Anthropic (Claude) and OpenAI-compatible endpoints (Ollama, vLLM, llama.cpp)

  • MCP support โ€” McpToolCollection wraps any MCP server's tools as first-class agentkit tools

  • Cloudflare Workers โ€” HTTP API entry point with KV session caching, ready to deploy with Wrangler


Quick Start

Code Agent

import { CodeAgent, AnthropicModel, AnthropicModels } from "@wasmagent/core";

const agent = new CodeAgent({
 tools: [],
 model: new AnthropicModel(AnthropicModels.SONNET_LATEST),
 maxSteps: 10,
});

for await (const event of agent.run("What is 42 * 1337?")) {
 if (event.event === "final_answer") console.log(event.data.answer);
}

Tool-Calling Agent

import { ToolCallingAgent, AnthropicModel, AnthropicModels } from "@wasmagent/core";
import { z } from "zod";

const searchTool = {
 name: "search",
 description: "Search the web",
 inputSchema: z.object({ query: z.string() }),
 outputSchema: z.string(),
 readOnly: true,
 idempotent: true,
 forward: async ({ query }) => `Results for: ${query}`,
};

const agent = new ToolCallingAgent({
 tools: [searchTool],
 model: new AnthropicModel(AnthropicModels.SONNET_LATEST),
 maxSteps: 5,
});

for await (const event of agent.run("Search for recent AI news")) {
 if (event.event === "final_answer") console.log(event.data.answer);
}

CLI

# Install globally
npm install -g @wasmagent/cli

# Run a task
agentkit run "What is the square root of 144?"

# Stream all events as NDJSON
agentkit run "Summarise recent AI news" --stream | jq .

# Use a specific model
agentkit run "Write a haiku" --model claude-opus-4-8 --max-steps 5

Quality Runners

Self-Consistency โ€” majority vote across N independent runs

import { SelfConsistencyRunner, AnthropicModel, AnthropicModels } from "@wasmagent/core";

const runner = new SelfConsistencyRunner({
 model: new AnthropicModel(AnthropicModels.SONNET_LATEST),
 tools: [],
 n: 5,
 concurrency: 3,
 earlyStop: true,
});

const answer = await runner.run("What is the capital of France?");

Reflect-Refine โ€” critique loop until quality signal passes

import { ReflectRefineRunner, AnthropicModel, AnthropicModels } from "@wasmagent/core";

const runner = new ReflectRefineRunner({
 model: new AnthropicModel(AnthropicModels.SONNET_LATEST),
 tools: [],
 maxCycles: 3,
 qualitySignal: (answer) => answer.length > 100,
});

const answer = await runner.run("Write a detailed analysis of...");

Parallel Fork-Join โ€” diverse reasoning paths, synthesised answer

import { ParallelForkJoinRunner, AnthropicModel, AnthropicModels } from "@wasmagent/core";

const runner = new ParallelForkJoinRunner({
 branches: 3,
 concurrency: 3,
 aggregation: "summary",
 branchPrompt: (i, msgs) => [
 ...msgs,
 { role: "user", content: `Analyse from perspective ${i + 1} of 3.` },
 ],
});

const result = await runner.run(
 new AnthropicModel(AnthropicModels.SONNET_LATEST),
 [{ role: "user", content: "What are the trade-offs of microservices?" }]
);
console.log(result.answer); // synthesised
console.log(result.branches); // individual paths

Long-history compaction

import { CodeAgent, AnthropicModel, AnthropicModels, MessageAssembler } from "@wasmagent/core";

const model = new AnthropicModel(AnthropicModels.SONNET_LATEST);
const assembler = new MessageAssembler({ chunkSizeSteps: 8 });
const agent = new CodeAgent({
 tools: [],
 model,
 maxSteps: 50,
 assembler,
});

// Summarise old steps, keep context window in check
await agent.assembler.compact(model, 5);

Custom Endpoints & Local Models

Both adapters accept an optional baseURL to point at any compatible endpoint โ€” local models, third-party proxies, or private deployments.

OpenAI-compatible (Ollama / vLLM / llama.cpp / any proxy)

import { OpenAIModel, OpenAIModels } from "@wasmagent/core";

// Hosted OpenAI
const gpt4o = new OpenAIModel(OpenAIModels.GPT_4O);

// Local Ollama
const local = new OpenAIModel("mistral-7b", {
 baseURL: "http://localhost:11434/v1",
 apiKey: "ollama",
 samplingParams: { temperature: 0.7, seed: 42 },
});

Anthropic-compatible proxy or private deployment

import { AnthropicModel, AnthropicModels } from "@wasmagent/core";

// Standard usage โ€” reads ANTHROPIC_API_KEY from environment
const model = new AnthropicModel(AnthropicModels.SONNET_LATEST);

// Third-party proxy or private endpoint
const proxied = new AnthropicModel(AnthropicModels.SONNET_LATEST, {
 apiKey: "your-proxy-key",
 baseURL: "https://your-proxy.example.com",
});

Chinese model providers (first-class adapters)

Seven providers ship as dedicated packages with full thinking-mode, reasoning-field, and cache-strategy support:

// Doubao / Volcengine Ark (first-class thinking + effort tiers)
import { DoubaoModel, DoubaoModels } from "@wasmagent/model-doubao";
const doubao = new DoubaoModel(DoubaoModels.LATEST, process.env.ARK_API_KEY);
for await (const e of doubao.generate(msgs, { thinking: { mode: "enabled", effort: "high" } })) { ... }

// DeepSeek V4 (thinking:{type} + effort, V4_FLASH available)
import { DeepSeekModel, DeepSeekModels } from "@wasmagent/model-deepseek";
const ds = new DeepSeekModel(DeepSeekModels.V4_PRO, process.env.DEEPSEEK_API_KEY);

// Kimi K2.6 (reasoning field: delta.reasoning, thinking:{type} via extra_body)
import { MoonshotModel, KimiModels } from "@wasmagent/model-moonshot";
const kimi = new MoonshotModel(KimiModels.LATEST, process.env.MOONSHOT_API_KEY);

// Qwen3 (enable_thinking + thinking_budget, intl region option)
import { QwenModel, QwenModels } from "@wasmagent/model-qwen";
const qwen = new QwenModel(QwenModels.QWEN3_MAX, { region: "cn" });

// GLM-5 (Zhipu self-hosted, thinking:{type} via extra_body)
import { ZhipuModel, GLMModels } from "@wasmagent/model-zhipu";
const glm = new ZhipuModel(GLMModels.GLM_5, process.env.ZHIPU_API_KEY);

// MiniMax M3 (reasoning_split=true โ†’ reasoning_details; or <think> tag parsing)
import { MiniMaxModel, MiniMaxModels } from "@wasmagent/model-minimax";
const mm = new MiniMaxModel(MiniMaxModels.M3, process.env.MINIMAX_API_KEY);

Provider capability reference:

Provider

Package

Thinking switch

Reasoning field

Cache strategy

Multi-turn round-trip

Doubao/Ark

model-doubao

extra_body.thinking.{type,level}

delta.reasoning_content

auto-prefix (transparent) / ark-context (explicit)

tool-turns-only

DeepSeek V4

model-deepseek

extra_body.thinking.{type,effort}

delta.reasoning_content

auto-prefix

tool-turns-only

Kimi K2.6

model-moonshot

extra_body.thinking.{type}

delta.reasoning (K2.6) / delta.reasoning_content (K2)

auto-prefix

tool-turns-only

Qwen3

model-qwen

enable_thinking + thinking_budget

delta.reasoning_content

auto-prefix

never

GLM-5

model-zhipu

extra_body.thinking.{type}

delta.reasoning_content

auto-prefix

never

MiniMax M3

model-minimax

reasoning_split:true

delta.reasoning_details (or <think> in content)

auto-prefix

never

Note on multi-turn round-trip: DeepSeek/Doubao/Kimi require reasoning_content echoed back in assistant messages containing tool_use (not in text-only turns โ€” that causes a 400 error). The adapters implement this automatically via reasoningRoundTripPolicy: "tool-turns-only".


Deploy to Cloudflare Workers

cd packages/cloudflare-worker
cp wrangler.toml.example wrangler.toml # edit account_id and kv_namespaces
wrangler secret put ANTHROPIC_API_KEY
wrangler deploy

The Worker exposes a POST /run endpoint. Session state is stored in KV for cost-efficient prompt caching across requests.


Packages

wasmagent is a 33-package monorepo. See docs/packages.md for the canonical, tier-classified list (โ˜… Core / โ—† Narrative / โ–ฝ Maintenance), with one-line descriptions of each package and links to its README.

For the maintenance-tier rationale, see docs/strategy/maintenance-tiers.md.


Production APIs

Retry / Resilience (C1)

All model adapters automatically retry 429 / 5xx / network errors with exponential backoff + jitter:

import { AnthropicModel } from "@wasmagent/core";

const model = new AnthropicModel("claude-sonnet-4-6", {
 apiKey: process.env.ANTHROPIC_API_KEY,
 retry: { maxRetries: 3, baseDelayMs: 500, maxDelayMs: 30_000 },
});

Evals (B1)

import { runEval, exactMatch, toolCallAccuracy } from "@wasmagent/core";

const results = await runEval(dataset, async function* (task) {
 yield* agent.run(task);
}, [exactMatch, toolCallAccuracy]);

OpenTelemetry Bridge (C2)

import { OtelBridge, InMemorySpanExporter, withOtel } from "@wasmagent/core";

const exporter = new InMemorySpanExporter(); // swap for OTLP in production
const bridge = new OtelBridge({ exporter });
for await (const ev of withOtel(agent.run(task), bridge)) {
 console.log(ev);
}
bridge.flush();

Durable runtime โ€” Checkpoints, SSE resume, HITL

Pick one KvBackend and use it for checkpoints, the SSE event log, and structured memory โ€” there is one canonical contract.

import {
 CheckpointableRun,
 EventLog,
 KvCheckpointer,
 resumeFromHuman,
 applyHumanResponse,
 restoreFromSnapshot,
} from "@wasmagent/core";
// Pick a backend that matches your runtime.
import { CloudflareKvBackend } from "@wasmagent/cloudflare-worker";
// Other options: DurableObjectKvBackend (CF), RedisKvBackend (Node/Bun),
// RedisRestKvBackend (Upstash, edge-safe), MapKvBackend (tests).

const kv = new CloudflareKvBackend(env.MY_KV);
const checkpointer = new KvCheckpointer(kv);
const log = new EventLog(kv); // SSE Last-Event-ID resume
const wrapper = new CheckpointableRun({ checkpointer }, agent.assembler);

// Stream + persist + tag every event with a monotonic id.
for await (const { eventId, event } of log.tap(
 wrapper.run(agent.run(task), task, traceId),
 traceId,
)) {
 // emit `id: ${eventId}\nevent: ${event.event}\ndata: ${...}\n\n` over SSE
 if (event.event === "await_human_input") {
 // Snapshot is already persisted; the worker is free to exit.
 return;
 }
}

Resume after a worker recycle (different process, possibly different machine):

const lastId = req.headers.get("Last-Event-ID");
for await (const { eventId, event } of log.replay(traceId, lastId)) { /* re-emit */ }
const startSeq = await log.nextSeq(traceId);
for await (const { eventId, event } of log.tap(agent.run(task, traceId), traceId, { startSeq })) { /* live tail */ }

Resume after human approval (could be hours/days later):

// In the /resume HTTP handler โ€” stateless, returns immediately.
await resumeFromHuman(checkpointer, traceId, promptId, response);

// Later, when a worker picks up the trace:
const snap = await checkpointer.load(traceId);
restoreFromSnapshot(snap, agent.assembler);
applyHumanResponse(snap, agent.assembler); // injects user_message into history
// Then continue with `wrapper.run(agent.run(snap.task, traceId), ...)`.

The reference Cloudflare Worker (@wasmagent/cloudflare-worker) wires all of this for you โ€” bind AGENTKIT_EVENT_LOG and AGENTKIT_CHECKPOINTS in wrangler.toml and you get Last-Event-ID resume + a POST /resume endpoint out of the box. Full guide: docs/guides/durable-runtime.md.

React Hook (B2)

import { useAgentRun } from "@wasmagent/react";

function ChatUI() {
 const { messages, isRunning, run } = useAgentRun("/api/run");
 return (
 <>
 {messages.map((m) => <div key={m.id}>{m.content}</div>)}
 <button onClick={() => run({ task: "What is 2 + 2?" })} disabled={isRunning}>
 Ask
 </button>
 </>
 );
}

Tool Deferred Loading (L1-1)

Exclude large MCP server tool schemas from the context prefix; load on-demand via Anthropic Tool Search. Reduces token usage by up to 85% on servers with many tools.

import { McpToolCollection, ToolCallingAgent, AnthropicModel, AnthropicModels } from "@wasmagent/core";

// Option A: defer all tools from an MCP server with many tools.
const tools = await McpToolCollection.fromHttp("https://big-mcp-server.example.com");
tools.deferAll(); // marks all tools as deferLoading: true

// Option B: defer individual tools via the ToolDefinition field.
const myTool = {
 name: "my_tool",
 deferLoading: true, // excluded from system prefix
 // ... other fields
};

const agent = new ToolCallingAgent({
 tools: tools.list(),
 model: new AnthropicModel(AnthropicModels.SONNET_LATEST),
});

Tool Use Examples (L1-2)

Provide few-shot examples to improve parameter accuracy from ~72% to ~90%.

const searchTool = {
 name: "search",
 description: "Search the web for information",
 inputSchema: z.object({ query: z.string(), maxResults: z.number().optional() }),
 inputExamples: [
 { query: "latest AI research 2026", maxResults: 5 },
 { query: "TypeScript best practices" },
 ],
 // ...
};

Context Editing (L2-1)

Truncate old tool outputs reversibly to reduce context size without breaking conversation structure.

import { MessageAssembler, AnthropicModel, AnthropicModels } from "@wasmagent/core";

const model = new AnthropicModel(AnthropicModels.SONNET_LATEST);
const assembler = new MessageAssembler({ chunkSizeSteps: 8 });
const agent = new ToolCallingAgent({ tools, model, assembler, maxSteps: 50 });

// After many steps, truncate old tool outputs that are taking too many tokens.
// Keeps the 3 most recent tool steps verbatim; truncates older ones.
const truncated = agent.assembler.editToolResults({ maxTokens: 4096, keepRecent: 3 });
console.log(`Truncated ${truncated} tool outputs`);

Cross-Session Memory Tool (L2-2)

Give agents persistent memory that survives across separate run() calls.

import { createMemoryTool, MapKvBackend, ToolCallingAgent, AnthropicModel, AnthropicModels } from "@wasmagent/core";

// Use MapKvBackend for in-process use, or KvCheckpointer's backend for persistence.
const memory = createMemoryTool({ backend: new MapKvBackend() });

const agent = new ToolCallingAgent({
 tools: [memory, ...otherTools],
 model: new AnthropicModel(AnthropicModels.SONNET_LATEST),
});

// Session 1: agent learns something
for await (const ev of agent.run("What's the capital of France? Remember it for later.")) { }

// Session 2: agent recalls it
for await (const ev of agent.run("What did you remember about France's capital?")) {
 if (ev.event === "final_answer") console.log(ev.data.answer); // "Paris"
}

Programmatic Tool Calling / Self-Hosted PTC (L3-1)

Execute model-generated orchestration scripts inside a kernel; only the final result enters the context window.

import { ProgrammaticOrchestrator, JsKernel, ToolRegistry } from "@wasmagent/core";

const kernel = new JsKernel();
const registry = new ToolRegistry();
registry.register(searchTool);
registry.register(calcTool);

const orchestrator = new ProgrammaticOrchestrator(kernel, registry, {
 extraCapabilities: ["tool:search", "tool:calc"],
});

// Model-generated script โ€” intermediate results never enter the LLM context.
const script = `
 const results = callTool('search', { query: 'AI news 2026' });
 const count = callTool('calc', { expr: results.length + ' items' });
 count + ' found';
`;
const { finalOutput, toolCallCount } = await orchestrator.run(script);
console.log(finalOutput); // Only this enters the context window.
console.log(toolCallCount); // e.g. 2 โ€” intermediate results stayed in the kernel.

Development

pnpm install
pnpm build
pnpm test
pnpm typecheck

# Reproduce every percentage in the "Differentiated" section above.
pnpm bench

# Cloudflare Worker local dev
cd packages/cloudflare-worker && wrangler dev

Examples

Example

What it shows

examples/basic-agent/

Minimal CodeAgent end-to-end

examples/tool-calling-agent/

ToolCallingAgent with tools

examples/tool-search-rag/

RAG-style retrieval tool

examples/durable-runtime/

Checkpoint + SSE resume + HITL across three simulated processes (no model needed)

examples/eval-suite/

Composite scorer over a small dataset

examples/benchmarks/

Reproducible verification of every README percentage

examples/cf-production/

Production-style Worker deployment

examples/otel-jaeger/

OTel bridge with Jaeger backend

examples/observational-memory/ (in benchmarks)

A1 โ€” measure compression ratio of ObservationalMemory vs baseline

examples/devtools-replay/

A2 โ€” synthetic event trace + EventLogReplay fork-from-step demo (offline)

examples/skills-demo/

A3 โ€” three lazily-loaded skills + post-hook chain (redact + truncate)

examples/judge-scorer-demo/

A4 โ€” code-based vs LLM-judge scorer divergence on a synthetic trace

Documentation

Environment Variables

Variable

Purpose

ANTHROPIC_API_KEY

Anthropic model access

OPENAI_API_KEY

OpenAI / compatible endpoint

CLOUDFLARE_API_TOKEN

CI/CD Worker deployment

CLOUDFLARE_ACCOUNT_ID

CI/CD Worker deployment


Acknowledgements

Inspired by Hugging Face's smolagents. wasmagent is a ground-up TypeScript reimplementation โ€” not a port โ€” targeting async-first execution, WASM sandboxing, and edge deployment.

License

Apache 2.0

A
license - permissive license
A
quality
A
maintenance

Maintenance

โ€“Maintainers
โ€“Response time
โ€“Release cycle
1Releases (12mo)
Commit activity

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/WasmAgent/wasmagent-js'

If you have feedback or need assistance with the MCP directory API, please join our Discord server