URL: https://glama.ai/mcp/servers/0xMassi/webclaw

webclaw by 0xMassi



Most web scraping tools give your agent one of two bad outputs:

  • a blocked page, login wall, or empty app shell

  • raw HTML full of nav, scripts, styling, ads, and duplicated boilerplate

webclaw.io is the hosted web extraction API for webclaw. This repo contains the open-source CLI, MCP server, extraction engine, and self-hostable server.

webclaw turns a URL into clean content your tools can actually use.

webclaw https://example.com --format markdown
# Example Domain

This domain is for use in illustrative examples in documents.

You may use this domain in literature without prior coordination or asking for permission.

Use it from the terminal, wire it into Claude/Cursor through MCP, call the hosted API from your app, or self-host the OSS server.


Install

Agent setup

The fastest way to connect webclaw to Claude Code, Claude Desktop, Cursor, Windsurf, OpenCode, Codex CLI, and other MCP-compatible tools:

npx create-webclaw

The installer detects supported clients and configures the MCP server for you.

Homebrew

brew tap 0xMassi/webclaw
brew install webclaw

Prebuilt binaries

Download macOS and Linux binaries from GitHub Releases.

Docker

docker run --rm ghcr.io/0xmassi/webclaw https://example.com

Cargo

cargo install --git https://github.com/0xMassi/webclaw.git webclaw-cli
cargo install --git https://github.com/0xMassi/webclaw.git webclaw-mcp

If building from source fails because native build tools are missing, install the platform prerequisites:

| OS | Command |
| --- | --- |
| Debian / Ubuntu | sudo apt install -y pkg-config libssl-dev cmake clang git build-essential |
| Fedora / RHEL | sudo dnf install -y pkg-config openssl-devel cmake clang git make gcc |
| Arch | sudo pacman -S pkg-config openssl cmake clang git base-devel |
| macOS | xcode-select --install |


Quick Start

Scrape one page

webclaw https://stripe.com --format markdown

Return LLM-optimized text

webclaw https://docs.anthropic.com --format llm

Keep only the main content

webclaw https://example.com/blog/post --only-main-content

Include or exclude selectors

webclaw https://example.com \
 --include "article, main, .content" \
 --exclude "nav, footer, .sidebar, .ad"

Crawl a documentation site

webclaw https://docs.rust-lang.org --crawl --depth 2 --max-pages 50
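When the CLI is driven from a script, the flags shown in this Quick Start can be assembled programmatically. The following is a minimal sketch, assuming `webclaw` is on your PATH. `build_webclaw_args` and `run_webclaw` are hypothetical helper names, not part of webclaw, and only flags that appear in this README are used.

```python
import subprocess
from typing import List, Optional, Sequence


def build_webclaw_args(
    url: str,
    fmt: Optional[str] = None,
    only_main_content: bool = False,
    include: Sequence[str] = (),
    exclude: Sequence[str] = (),
    crawl: bool = False,
    depth: Optional[int] = None,
    max_pages: Optional[int] = None,
) -> List[str]:
    """Build an argv list for the webclaw CLI (hypothetical helper)."""
    args = ["webclaw", url]
    if fmt:
        args += ["--format", fmt]
    if only_main_content:
        args.append("--only-main-content")
    if include:
        # Selectors are passed as one comma-separated string, as in the README.
        args += ["--include", ", ".join(include)]
    if exclude:
        args += ["--exclude", ", ".join(exclude)]
    if crawl:
        args.append("--crawl")
        if depth is not None:
            args += ["--depth", str(depth)]
        if max_pages is not None:
            args += ["--max-pages", str(max_pages)]
    return args


def run_webclaw(args: List[str]) -> str:
    """Run the CLI and return stdout; raises CalledProcessError on a non-zero exit."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout
```

Calling `build_webclaw_args("https://docs.rust-lang.org", crawl=True, depth=2, max_pages=50)` yields the same arguments as the crawl command above. Passing a list to `subprocess.run` avoids shell quoting problems with selectors that contain spaces or commas.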

Workflow examples

Extract brand assets

webclaw https://github.com --brand

Compare a page over time

webclaw https://example.com/pricing --format json > pricing-old.json
webclaw https://example.com/pricing --diff-with pricing-old.json
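To make the idea of a snapshot comparison concrete, here is a small sketch that lists the lines added or removed between two text snapshots. It uses Python's standard `difflib` and is not webclaw's diff algorithm; `diff_snapshots` is a hypothetical name, and the inputs are assumed to be plain text (for example, two markdown extractions saved at different times).

```python
import difflib
from typing import List


def diff_snapshots(old: str, new: str) -> List[str]:
    """Return added ("+") and removed ("-") lines between two text snapshots."""
    lines = list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="old",
            tofile="new",
            lineterm="",
            n=0,  # no context lines: report only what changed
        )
    )
    # The first two lines are the "---"/"+++" file headers; hunk markers start with "@@".
    body = lines[2:]
    return [line for line in body if line.startswith(("+", "-"))]
```

An empty result means the two snapshots are identical line for line.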

MCP Server

webclaw ships with an MCP server for AI agents.

npx create-webclaw

Manual config:

{
  "mcpServers": {
    "webclaw": {
      "command": "~/.webclaw/webclaw-mcp"
    }
  }
}
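If you maintain the client config by script, the entry above can be merged into an existing file without touching other servers. This is a sketch under assumptions: the config file location differs per client and is not specified here, the helper names are hypothetical, and `~` is expanded to an absolute path because some clients may not expand it themselves.

```python
import json
import os
from pathlib import Path


def merge_webclaw_server(config: dict, binary: str = "~/.webclaw/webclaw-mcp") -> dict:
    """Add or update the "webclaw" entry under "mcpServers", keeping other entries."""
    servers = config.setdefault("mcpServers", {})
    # Store an absolute path so the client does not need to expand "~".
    servers["webclaw"] = {"command": os.path.expanduser(binary)}
    return config


def update_config_file(config_path: Path, binary: str = "~/.webclaw/webclaw-mcp") -> None:
    """Read a JSON config file (if present), merge the entry, and write it back."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        config = json.loads(text) if text.strip() else {}
    merge_webclaw_server(config, binary)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
```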

Then ask your agent things like:

Scrape these competitor pricing pages and summarize the differences.
Crawl this documentation site and prepare clean context for a RAG index.
Extract the brand colors, fonts, and logos from this company website.

Tools

| Tool | What it does | Local |
| --- | --- | --- |
| scrape | Extract one URL as markdown, text, JSON, LLM format, or HTML | Yes |
| crawl | Follow same-origin links and extract discovered pages | Yes |
| map | Discover URLs without extracting every page | Yes |
| batch | Scrape multiple URLs in parallel | Yes |
| extract | Convert page content into structured data | Yes, with local or configured LLM |
| summarize | Summarize a page | Yes, with local or configured LLM |
| diff | Compare page content snapshots | Yes |
| brand | Extract colors, fonts, logos, and metadata | Yes |
| search | Search the web and scrape results | Hosted API |
| research | Multi-source research workflow | Hosted API |
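Agents normally issue these tool calls for you, but it can help to see what one looks like on the wire. MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a message; the argument names (`url`, `format`) are assumptions, since this README does not list each tool's input schema. The server reports the actual schema in response to `tools/list`.

```python
import json
from typing import Any, Dict


def build_tool_call(request_id: int, tool: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Argument names here are assumed for illustration, not taken from webclaw's schema.
example = build_tool_call(1, "scrape", {"url": "https://example.com", "format": "markdown"})
```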


SDKs

npm install @webclaw/sdk
pip install webclaw
go get github.com/0xMassi/webclaw-go

TypeScript

import { Webclaw } from "@webclaw/sdk";

const client = new Webclaw({ apiKey: process.env.WEBCLAW_API_KEY! });

const page = await client.scrape({
  url: "https://example.com",
  formats: ["markdown"],
  only_main_content: true,
});

console.log(page.markdown);

Python

from webclaw import Webclaw

client = Webclaw(api_key="wc_your_key")

page = client.scrape(
    "https://example.com",
    formats=["markdown"],
    only_main_content=True,
)

print(page.markdown)

cURL

curl -X POST https://api.webclaw.io/v1/scrape \
  -H "Authorization: Bearer $WEBCLAW_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "formats": ["markdown"],
    "only_main_content": true
  }'
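If you would rather not add an SDK dependency, the same HTTP request can be made with Python's standard library. This sketch mirrors the curl example (endpoint, headers, and body fields are copied from it); the function names are hypothetical, and the response is returned as raw text because its schema is not documented here.

```python
import json
import urllib.request


def build_scrape_request(url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/scrape request from the curl example."""
    payload = {"url": url, "formats": ["markdown"], "only_main_content": True}
    return urllib.request.Request(
        "https://api.webclaw.io/v1/scrape",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Read the key from the `WEBCLAW_API_KEY` environment variable rather than hard-coding it, as the curl example does.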

Output Formats

| Format | Use it when you need |
| --- | --- |
| markdown | Clean page content with structure preserved |
| llm | Compact context for agents and RAG pipelines |
| text | Plain text with minimal formatting |
| json | Structured metadata, links, images, and extracted fields |
| html | Cleaned HTML for custom processing |


Local First, Hosted When Needed

The CLI and MCP server work locally without an account for the core extraction path.

Use the hosted API at webclaw.io when you need:

  • protected-site access without managing infrastructure

  • JavaScript rendering

  • async crawl and research jobs

  • web search

  • watches and production usage tracking

  • SDKs for application code

export WEBCLAW_API_KEY=wc_your_key

webclaw https://example.com --cloud

What You Can Build

| Use case | Example |
| --- | --- |
| AI agent web access | Give Claude, Cursor, or another MCP client clean page context |
| RAG ingestion | Crawl docs, help centers, blogs, and knowledge bases |
| Competitor monitoring | Track pricing pages, changelogs, docs, and product pages |
| Structured extraction | Turn messy pages into typed JSON for automations |
| Research workflows | Search, scrape, summarize, and cite multiple sources |
| Brand intelligence | Extract logos, colors, fonts, and social metadata |

Architecture

webclaw/
  crates/
    webclaw-core    HTML to markdown, text, JSON, and LLM-ready output
    webclaw-fetch   Fetching, crawling, batching, and mapping
    webclaw-llm     Local and hosted LLM provider support
    webclaw-pdf     PDF text extraction
    webclaw-mcp     MCP server for AI agents
    webclaw-cli     Command-line interface

webclaw-core is pure extraction logic: no network I/O, small surface area, and usable independently from the fetching layer.
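The benefit of keeping extraction free of network I/O is that it becomes a pure function: HTML in, content out, easy to test with fixtures. The sketch below illustrates that separation with Python's standard `html.parser`. It is not webclaw's extraction engine (which lives in the Rust crates listed above), and the set of skipped tags is an assumption chosen for the example.

```python
from html.parser import HTMLParser
from typing import List

# Elements whose contents are treated as boilerplate in this sketch (assumed list).
SKIPPED_TAGS = {"script", "style", "nav", "footer", "aside"}


class _TextExtractor(HTMLParser):
    """Collect visible text, ignoring anything nested inside a skipped element."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> str:
    """Pure function: takes an HTML string, returns text. No network access."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

Because `extract_text` never fetches anything, a failing page can be saved as a fixture and replayed in a unit test.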


Configuration

| Variable | Description |
| --- | --- |
| WEBCLAW_API_KEY | Hosted API key |
| OLLAMA_HOST | Ollama URL for local LLM features |
| OPENAI_API_KEY | OpenAI-compatible LLM provider key |
| OPENAI_BASE_URL | OpenAI-compatible base URL |
| ANTHROPIC_API_KEY | Anthropic-compatible LLM provider key |
| ANTHROPIC_BASE_URL | Anthropic-compatible base URL |
| WEBCLAW_PROXY | Single proxy URL |
| WEBCLAW_PROXY_FILE | Proxy pool file |
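Before debugging an LLM or proxy feature, it can be useful to check which of these variables are actually set in the current environment. A minimal sketch follows; `configured_variables` is a hypothetical helper, and it reports only whether each variable is set so that key values never end up in logs. It does not model which provider webclaw prefers when several are configured, since that is not stated here.

```python
import os
from typing import Dict, Mapping, Optional

# Variable names taken from the configuration table above.
WEBCLAW_ENV_VARS = (
    "WEBCLAW_API_KEY",
    "OLLAMA_HOST",
    "OPENAI_API_KEY",
    "OPENAI_BASE_URL",
    "ANTHROPIC_API_KEY",
    "ANTHROPIC_BASE_URL",
    "WEBCLAW_PROXY",
    "WEBCLAW_PROXY_FILE",
)


def configured_variables(env: Optional[Mapping[str, str]] = None) -> Dict[str, bool]:
    """Map each documented variable to whether it holds a non-empty value."""
    env = os.environ if env is None else env
    return {name: bool(env.get(name, "").strip()) for name in WEBCLAW_ENV_VARS}
```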


Contributing

The most useful contributions right now are practical and small:

  • add examples for real agent and RAG workflows

  • improve SDK snippets

  • report pages that extract poorly

  • add failing fixtures for messy HTML

  • improve docs for MCP clients and local setup

  • test the CLI on more Linux/macOS environments

If a page extracts badly, include:

URL:
Command or API request:
Expected output:
Actual output:
Format used: markdown / llm / text / json / html
CLI, MCP, SDK, or API:

Please remove secrets, cookies, private tokens, and customer data from logs before posting.


Community Plugins

Third-party plugins that integrate webclaw with AI agent platforms:

| Plugin | Platform | What it does |
| --- | --- | --- |
| openclaw-webclaw | OpenClaw | Native webclaw v1 API plugin with 9 tools: scrape, search, crawl, extract, summarize, diff, map, batch, brand |
| hermes-webclaw | Hermes Agent | Web search provider and 9 dedicated tools for the full v1 API surface. Install with: hermes plugins install jal-co/hermes-webclaw |

Built a webclaw integration? Open a PR to add it here.


Contributors

Thanks to everyone improving webclaw through issues, examples, docs, bug reports, and pull requests.


License

AGPL-3.0
