Crawlio Browser

@Crawlio-app

Visit Server

3 months ago

# browser-automation

100-tool browser automation for AI agents via MCP. Screenshots, DOM inspection, network capture, form filling, cookie management, tab orchestration, session recording, and structured data extraction via Chrome extension. One-command setup: npx crawlio-browser init auto-configures 14 MCP clients.recording + 1 compiler) with framework-aware intelligence, typed evidence infrastructure, and confidence-tracked findings — captures what static crawlers can't see.

Overview Tools Comments

Overview

Crawlio Agent

👁 npm version
👁 License: MIT

Documentation | API Reference | Chrome Extension

MCP server that gives AI full control of a live Chrome browser via CDP. 100 tools (93 browser + 3 extraction + 3 recording + 1 compiler) with framework-aware intelligence, typed evidence infrastructure, and confidence-tracked findings — captures what static crawlers can't see.

Note: This repo supersedes crawlio-browser-mcp. All development now happens here.

Links:

When to use Crawlio Agent

Use Crawlio Agent when your AI needs to interact with a real browser — SPAs, authenticated pages, dynamic content, JS-rendered frameworks. Unlike headless browser tools, Crawlio Agent connects to your actual Chrome via a lightweight extension, giving the AI access to your logged-in sessions, cookies, and full browser state.

Crawlio Agent vs headless browser tools: Headless tools launch a separate browser process. Crawlio Agent connects to your existing Chrome — no separate browser, no login flows, full access to your tabs and sessions.

Quick Start

Install the Chrome Extension
Run the init wizard:
```
npx crawlio-browser init
```

That's it. Auto-detects and configures 14 MCP clients: Claude Code, Cursor, VS Code, Codex, Gemini CLI, Claude Desktop, ChatGPT Desktop, Windsurf, Cline, Zed, Goose, OpenCode, MCPorter, and Cline CLI.

Init wizard options

npx crawlio-browser init # Default: code mode, stdio transport
npx crawlio-browser init --full # Full mode (100 individual tools)
npx crawlio-browser init --portal # Portal mode (persistent HTTP server)
npx crawlio-browser init --cloudflare # Add Cloudflare MCP (89 tools, no wrangler)
npx crawlio-browser init --dry-run # Show what would happen
npx crawlio-browser init --yes # Skip prompts (CI / scripted installs)
npx crawlio-browser init -a claude # Target specific MCP client

Transport Modes

Mode	Command / URL	Protocol	Best For
stdio	`npx crawlio-browser`	JSON-RPC over stdin/stdout	Claude Desktop, Cursor, Windsurf — client manages process lifecycle
Portal (HTTP)	`POST http://127.0.0.1:3001/mcp`	MCP Streamable HTTP	Claude Code, ChatGPT Desktop — server survives session restarts
Portal (SSE)	`GET /sse` + `POST /message`	Server-Sent Events	Legacy clients needing SSE transport

Portal mode is recommended for Claude Code — the server persists across context compaction and session restarts. On macOS, --portal installs a launchd agent for auto-start on login.

Manual setup (any client)

How It Works

AI Client (stdio/http) --> MCP Server (Node.js) --> Chrome Extension (MV3)
 crawlio-browser WebSocket -> CDP

The MCP server communicates with the Chrome extension via WebSocket. The extension controls the browser through Chrome DevTools Protocol (CDP).

Capabilities

Framework-Aware Intelligence

Every execute call probes the browser for framework signatures and injects a shape-shifting smart object with framework-native accessors. React state, Vue reactivity, Next.js routing, Shopify cart data — 17 framework namespaces across 4 tiers, detected at runtime and rebuilt on every navigation. The AI doesn't query a generic DOM; it queries the framework's own data structures.

Evidence-Based Analysis

Method Mode adds higher-order methods and a typed evidence system on top of Code Mode. smart.extractPage() runs 7 parallel operations in a single call — page capture, performance metrics, security state, font detection, meta extraction, accessibility audit, and mobile-readiness check. Failed operations produce typed CoverageGap records instead of silent nulls. Findings created with smart.finding() get their confidence automatically adjusted when supporting data is missing. The result: structured, auditable research output with gap tracking and confidence propagation.

Session Recording & Replay

Record browser interactions as structured data, then compile them into reusable SKILL.md automations. 12 interaction tools are automatically intercepted during recording — clicks, typing, navigation, scrolling — each capturing args, result, timing, and page URL. One compileRecording() call converts the session into a deterministic automation script.

Auto-Settling & Actionability

Every mutative action (click, type, navigate, select_option) runs actionability checks before acting — polling visibility, dimensions, enabled state, and overlay detection. After the action, a progressive backoff settle delay ([0, 20, 100, 100, 500]ms) waits for DOM mutations to quiesce. The AI doesn't need manual sleep() calls between actions.

Architecture: JIT Context Runtime

The JIT Context MCP Runtime is a layered execution architecture where each layer absorbs a category of complexity that would otherwise fall on the model. The model sees three tools and a clean SDK. Everything beneath that surface is the runtime absorbing reality.

 ┌───────────────────────────────────┐
 │ AI Model (LLM) │
 │ Writes code, reads errors, loops │
 └───────────────┬───────────────────┘
 │ 3 tools: search, execute, connect_tab
 ▼
┌─────────────────────────────────────────────────────────────────┐
│ JIT Context MCP Runtime │
│ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ METHOD MODE │ │
│ │ Behavioral protocol + higher-order methods │ │
│ │ scrollCapture · waitForIdle · extractPage · comparePages │ │
│ │ detectTables · extractTable · waitForNetworkIdle · │ │
│ │ extractData │ │
│ │ │ │
│ │ ↳ Absorbs: behavioral variance, ad-hoc composition, │ │
│ │ inconsistent output shapes, data extraction patterns │ │
│ ├────────────────────────────────────────────────────────────┤ │
│ │ POLYMORPHIC CONTEXT │ │
│ │ 17 framework namespaces, injected Just-In-Time │ │
│ │ react · vue · angular · nextjs · shopify · ... │ │
│ │ │ │
│ │ ↳ Absorbs: framework opacity, minified code, │ │
│ │ devtools hook complexity │ │
│ ├────────────────────────────────────────────────────────────┤ │
│ │ ACTIONABILITY ENGINE │ │
│ │ 7 core smart methods with built-in resilience │ │
│ │ click · type · navigate · waitFor · evaluate · │ │
│ │ snapshot · screenshot │ │
│ │ │ │
│ │ ↳ Absorbs: DOM timing, hydration delays, CSS animations, │ │
│ │ disabled states, overlapping elements │ │
│ ├────────────────────────────────────────────────────────────┤ │
│ │ TETHERED IPC BRIDGE │ │
│ │ WebSocket ↔ Chrome extension, message queue, │ │
│ │ heartbeat, auto-reconnect, stale detection │ │
│ │ │ │
│ │ ↳ Absorbs: connection drops, tab refreshes, │ │
│ │ port conflicts, extension lifecycle │ │
│ ├────────────────────────────────────────────────────────────┤ │
│ │ 133 RAW COMMANDS (bridge.send) │ │
│ │ CDP-level browser control via Chrome extension │ │
│ └────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
 │
 ▼
 ┌───────────────────────────────────┐
 │ Live Chrome Browser │
 │ Persistent session, real DOM, │
 │ framework runtime, user state │
 └───────────────────────────────────┘

What Each Layer Absorbs

Layer	Without It	With It
Tethered IPC Bridge	Script crashes on tab refresh, pending commands lost on reconnect, port conflicts on startup	Resilient WebSocket with message queue (100-msg capacity), heartbeat stale detection (15s intervals), auto-reconnect with drain
Actionability Engine	`click('#btn')` fires before the button renders, during CSS transitions, or while an overlay covers it	Progressive polling (exists → has dimensions → visible → not disabled → not obscured) with `[0, 20, 100, 100, 500]ms` backoff
Polymorphic Context	Model sees minified `<div>` elements; reading React state requires knowing exact hook paths, renderer maps, and fiber root API	Runtime probes live JS environment, detects 17 frameworks, injects namespace methods (`smart.react.getVersion()`, `smart.nextjs.getData()`)
Method Mode	Model composes primitives ad-hoc — inconsistent scroll loops, missed edge cases, varying return shapes	8 tested methods encode correct patterns; behavioral protocol constrains workflow

Execution Lifecycle

Discovery — Model calls search("page capture performance") and gets documentation for relevant commands
Framework Detection — Runtime probes the live DOM, detects active frameworks, constructs polymorphic smart object with appropriate namespaces
Scope Assembly — Model's code is compiled into an async function with injected parameters: bridge (133 commands), crawlio (HTTP client), sleep, TIMEOUTS, smart (7 core + 8 higher-order + up to 17 framework namespaces), compileRecording
Execution — Method Mode methods compose the lower layers: extractPage() fires 7 parallel bridge.send() calls; click() runs the actionability engine; react.getVersion() evaluates framework-specific expressions
Error Recovery (Agentic REPL) — On failure, the browser stays in the exact state that produced the error. The model reads the structured error, adjusts, and calls execute again. Framework cache persists — no re-detection unless URL changed

Design Principles

Absorb complexity downward — Every category of difficulty (connection management, DOM timing, framework detection, multi-step composition) is handled by the layer best equipped for it. The model only encounters the clean interface at the top.
Shape the SDK to the target — The polymorphic context system detects what the page is and reshapes available methods to match. The model writes against a stable interface; the runtime adapts underneath.
Preserve state across cycles — The tethered architecture means the model can fail, learn, and retry against the same live environment — transforming error handling from "restart from scratch" into "adjust and continue."

How It Compares

Dimension	Standard MCP	Cloudflare Code Mode	JIT Context Runtime
Tools in context	50-100+ schemas	2 (`search`, `execute`)	3 (`search`, `execute`, `connect_tab`)
Execution environment	N/A (tool calls)	V8 isolate (stateless)	Local async sandbox (stateful, tethered to live browser)
DOM access	Via individual tool calls	None	Live, persistent, framework-aware
Framework awareness	None	None	17 namespaces, injected JIT
Action resilience	Model must handle timing	N/A (no DOM)	Built-in actionability polling + settle delays
Error recovery	Re-call individual tool	Re-create isolate	Re-execute against same live state (Agentic REPL)
Multi-step patterns	Model improvises	Model writes loops	8 tested higher-order methods + behavioral protocol

Read the full architecture guide →

Two Modes

Code Mode (3 tools) — default

Collapses 100 tools into 3 high-level tools with ~95% schema token reduction:

Tool	Description
`search`	Discover available commands by keyword
`execute`	Run async JS with `bridge`, `crawlio`, `smart`, `sleep`, and `compileRecording` in scope
`connect_tab`	Connect to a browser tab

// Navigate and screenshot
await bridge.send({ type: 'browser_navigate', url: 'https://example.com' }, 30000);
await sleep(2000);
const screenshot = await bridge.send({ type: 'take_screenshot' }, 10000);
return screenshot;

Full Mode (100 tools)

Every tool exposed directly to the LLM. Enable with --full:

npx crawlio-browser init --full

Smart Object

In Code Mode, the smart object provides framework-aware helpers with auto-waiting and actionability checks.

Core Methods

Method	Description
`smart.evaluate(expression)`	Execute JS in the page via CDP
`smart.click(selector, opts?)`	Auto-waiting click with 500ms settle
`smart.type(selector, text, opts?)`	Auto-waiting type with 300ms settle
`smart.navigate(url, opts?)`	Navigate with 1000ms settle
`smart.waitFor(selector, timeout?)`	Poll until element is actionable
`smart.snapshot()`	Accessibility tree snapshot
`smart.screenshot()`	Full-page screenshot (base64 PNG)

Higher-Order Methods

Method	Description
`smart.scrollCapture(opts?)`	Scroll to bottom, capturing screenshots at each position. Handles stuck-scroll detection, bottom detection, section capping, and scroll reset.
`smart.waitForIdle(timeout?)`	MutationObserver-based idle detection — waits for 500ms quiet window. Timeout hard-capped at 15s. Replaces blind `sleep()` calls.
`smart.extractPage(opts?)`	7 parallel operations in one call — page capture, performance, security, fonts, meta, accessibility, mobile-readiness. Returns typed `PageEvidence` with `CoverageGap[]` for anything that failed.
`smart.comparePages(urlA, urlB)`	Navigates to both URLs, runs `extractPage()` on each, returns a `ComparisonScaffold` with 11 dimensions, shared/missing fields, and comparable metrics.

Typed Evidence

Methods for structured analysis findings with confidence propagation:

Method	Description
`smart.finding(data)`	Create a validated `Finding` with claim, evidence, sourceUrl, confidence, and method. Rejects malformed input with specific errors.
`smart.findings()`	Get all session-accumulated findings (returns a copy)
`smart.clearFindings()`	Reset session findings and coverage gaps

When a finding's dimension matches an active coverage gap, confidence is automatically capped:

Input Confidence	Active Gap	Output
`high`	`reducesConfidence: true`	`medium` + `confidenceCapped: true`
`medium`	`reducesConfidence: true`	`low` + `confidenceCapped: true`
`low`	any	`low` (floor)
any	no matching gap	unchanged

Framework Namespaces

When a framework is detected, the smart object exposes framework-specific helpers:

Method Mode

Method Mode is a domain layer built on top of Code Mode. It adds higher-order methods, a typed evidence system, and a behavioral protocol to the execute sandbox — without changing the tool surface. The model still sees three tools. The same smart object. The same 133-command catalog underneath. What changes is what happens inside execute.

The Maturity Ladder

Layer	Optimizes For	Behavioral Variance	Evidence Quality
Raw MCP (100 tools)	Completeness	High — flat tool list, no composition guidance	None — unstructured text
Code Mode (3 tools)	Token efficiency	Medium — right primitives, ad-hoc composition	None — model-defined shapes
Method Mode v1 (+ 8 methods + protocol)	Consistency	Low — proper methods, protocol constraints	Convention — `{ finding, evidence, url }`
Method Mode v2 (+ typed evidence + gaps + confidence)	Correctness	Minimal — typed schemas, tool-enforced findings	Structural — typed records, gap tracking, confidence propagation

Architecture

┌────────────────────────────────────────────────────────────┐
│ execute sandbox │
│ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Behavioral Protocol (web-research skill) │ │
│ │ Acquire → Normalize → Analyze │ │
│ ├──────────────────────────────────────────────────────┤ │
│ │ Evidence Infrastructure │ │
│ │ finding() · findings() · clearFindings() │ │
│ │ Typed records · Coverage gaps · Confidence prop. │ │
│ ├──────────────────────────────────────────────────────┤ │
│ │ Higher-Order Methods [8] │ │
│ │ scrollCapture · waitForIdle · extractPage · │ │
│ │ comparePages · detectTables · extractTable · │ │
│ │ waitForNetworkIdle · extractData │ │
│ ├──────────────────────────────────────────────────────┤ │
│ │ Smart Core [7 methods] │ │
│ │ evaluate · click · type · navigate · waitFor · │ │
│ │ snapshot · screenshot │ │
│ ├──────────────────────────────────────────────────────┤ │
│ │ Framework Namespaces [up to 17, injected JIT] │ │
│ │ react · vue · angular · nextjs · shopify · ... │ │
│ ├──────────────────────────────────────────────────────┤ │
│ │ bridge.send() — 133 raw commands │ │
│ └──────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────┘

Each layer up encodes more domain knowledge. bridge.send({ type: "capture_page" }) captures a page. smart.extractPage() captures a page AND runs performance metrics, security state, font detection, accessibility analysis, and mobile-readiness checks in parallel — seven operations, one call, graceful failure on supplementary data, typed gaps for anything that fails.

Evidence Infrastructure

Coverage Gaps — When supplementary operations in extractPage() fail, they don't silently return null. A typed gap is recorded with the dimension, reason, impact, and whether it reduces confidence on related findings:

// Example gap from a failed performance metrics call
{ dimension: "performance", reason: "CDP domain disabled", impact: "method-failed", reducesConfidence: true }

Tool-Enforced Findings — smart.finding() validates every field at the tool level. The model cannot produce a finding without meeting the schema — it either returns a valid Finding or gets a clear error. Findings accumulate across execute calls within a session via smart.findings().

Session Aggregation — Findings and coverage gaps persist across execute calls. A model can make findings across multiple calls, then retrieve the full set with smart.findings(). Reset with smart.clearFindings().

End-to-End Example: Competitive Audit

// 1. Extract and compare both sites (scaffold + gaps included)
const comparison = await smart.comparePages(
 'https://acme.com',
 'https://rival.com'
);
// 2. Make findings — confidence auto-adjusts based on data availability
smart.finding({
 claim: 'Rival loads 2.3x faster on Largest Contentful Paint',
 evidence: [
 `Acme LCP: ${comparison.siteA.performance?.webVitals?.lcp}ms`,
 `Rival LCP: ${comparison.siteB.performance?.webVitals?.lcp}ms`,
 ],
 sourceUrl: 'https://acme.com',
 confidence: 'high',
 method: 'comparePages + extractPage performance metrics',
 dimension: 'performance', // if perf data failed, confidence caps to "medium"
});
smart.finding({
 claim: 'Acme has 12 images without alt text; Rival has 0',
 evidence: [
 `Acme imagesWithoutAlt: ${comparison.siteA.accessibility?.imagesWithoutAlt}`,
 `Rival imagesWithoutAlt: ${comparison.siteB.accessibility?.imagesWithoutAlt}`,
 ],
 sourceUrl: 'https://acme.com',
 confidence: 'high',
 method: 'comparePages + extractPage accessibility summary',
 dimension: 'accessibility',
});
// 3. Capture visual evidence
await smart.navigate('https://acme.com');
await smart.waitForIdle();
const acmeVisuals = await smart.scrollCapture({ maxSections: 5 });
// 4. Return accumulated session findings + visual evidence
return {
 findings: smart.findings(),
 scaffold: comparison.scaffold,
 gaps: { acme: comparison.siteA.gaps, rival: comparison.siteB.gaps },
 visualEvidence: { acme: acmeVisuals.sectionCount + ' sections captured' },
};

Examples

Navigate, extract, and analyze

// Connect to active tab, extract structured page evidence
const page = await smart.extractPage();
const finding = smart.finding({
 claim: `Site uses ${page.capture.framework?.name || 'no detected framework'}`,
 evidence: [`Framework: ${JSON.stringify(page.capture.framework)}`],
 sourceUrl: page.meta?.canonical || 'active tab',
 confidence: 'high',
 method: 'extractPage framework detection',
});
return { page: page.meta, finding };

Mobile emulation + screenshot

// Emulate iPhone and capture
await bridge.send({ type: 'emulate_device', device: 'iPhone 14' }, 10000);
await smart.navigate('https://example.com');
await smart.waitForIdle();
const screenshot = await smart.screenshot();
return screenshot;

Record and compile automation

// Record a browser session, then compile to reusable skill
await bridge.send({ type: 'start_recording' }, 10000);
await smart.navigate('https://example.com');
await smart.click('button.submit');
await smart.type('#email', 'test@example.com');
const session = await bridge.send({ type: 'stop_recording' }, 10000);
return compileRecording(session.session, 'signup-flow');

Intercept and mock network

// Block analytics, mock API response
await bridge.send({
 type: 'browser_intercept',
 pattern: '*analytics*',
 action: 'block'
}, 10000);
await bridge.send({
 type: 'browser_intercept',
 pattern: '*/api/user',
 action: 'mock',
 body: JSON.stringify({ name: 'Test User' }),
 statusCode: 200
}, 10000);
await smart.navigate('https://example.com');
return await smart.snapshot();

Session Recording

Record browser sessions as structured data, then compile them into reusable automation skills. 12 interaction tools are automatically intercepted during recording (click, type, navigate, scroll, etc.), capturing args, result, timing, and page URL.

// In code mode: record, interact, compile
await bridge.send({ type: 'start_recording' }, 10000);
// ... interact with the page ...
const session = await bridge.send({ type: 'stop_recording' }, 10000);
const skill = compileRecording(session.session, 'my-automation');
return skill;

In full mode, recording is available as 4 individual tools: start_recording, stop_recording, get_recording_status, and compile_recording.

Auto-Settling

Mutative tools (browser_click, browser_type, browser_navigate, browser_select_option) use actionability checks:

Pre-flight: Polls element visibility, stability, and enabled state before acting
Action: Dispatches the CDP command
Post-settle: Waits for DOM mutations to quiesce with progressive backoff [0, 20, 100, 100, 500]ms

This means the AI doesn't need to manually add sleep() or waitFor() calls between actions — the tools handle SPA rendering delays automatically.

Framework Detection

Detects 64 technologies across 4 tiers using globals, DOM markers, meta tags, HTTP headers, and script URLs:

Tier	Frameworks	Signal Strength
Meta-frameworks	Next.js, Nuxt, SvelteKit, Remix, Gatsby	Unique globals + parent detection
Core	React, Vue.js, Angular, Svelte, Astro, Qwik, SolidJS, Lit, Preact	Globals + DOM markers
CMS & Platforms	WordPress, Shopify, Webflow, Squarespace, Wix, Drupal, Magento, Ghost, Bubble	Meta tags + globals
Libraries & Tools	jQuery, Bootstrap, Tailwind CSS, Alpine.js, HTMX, Turbo, Stencil, Redux, Ember.js, Backbone.js	DOM + globals

Multi-framework detection returns a primary framework (meta-framework takes priority) plus a subFrameworks array for the full stack.

Tools Reference

Requirements

Node.js >= 18
Chrome (or Chromium) with the Crawlio Agent extension installed
Crawlio.app (optional) — for site crawling and enrichment

Resources

Documentation
API Reference
Product Page
Chrome Extension
npm Package
Changelog
Previous repo — this project supersedes crawlio-browser-mcp

License

MIT

Try in Playground

Server Config

{
 "mcpServers": {
 "crawlio-browser": {
 "command": "npx",
 "args": [
 "-y",
 "crawlio-browser"
 ]
 }
 }
}

Build with ShipAny.

URL: https://mcp.so/server/crawlio-browser/Crawlio-app

⇱