VOOZH about

URL: https://glama.ai/mcp/servers/joedank/mcpcrawl4ai

⇱ Crawl4AI MCP Wrapper by joedank | Glama


Crawl4AI MCP Wrapper

Custom Model Context Protocol (MCP) server that wraps the Crawl4AI Docker API with reliable stdio transport for Claude Code integration.

Overview

This MCP wrapper provides 7 tools for web scraping, crawling, and content extraction:

  1. scrape_markdown - Extract clean markdown from webpages

  2. extract_html - Get preprocessed HTML structures

  3. capture_screenshot - Take full-page PNG screenshots

  4. generate_pdf - Create PDF documents from webpages

  5. execute_javascript - Run JavaScript in browser context

  6. crawl_urls - Crawl multiple URLs with configuration options

  7. ask_crawl4ai - Query Crawl4AI documentation and examples

Related MCP server: Enhanced Fetch MCP

Prerequisites

  • Python 3.8 or higher

  • Crawl4AI Docker container running on port 11235

  • Claude Code installed

Installation

1. Start Crawl4AI Docker Container

docker run -d -p 11235:11235 --name crawl4ai --shm-size=2g unclecode/crawl4ai:latest

Verify it's running:

curl http://localhost:11235/health

2. Set Up Virtual Environment

cd /Volumes/4TB/Users/josephmcmyne/myProjects/mcp/crawl4ai-wrapper
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

3. Test the Wrapper

python test_wrapper.py

You should see all tests pass:

✅ All tests passed! MCP wrapper is ready to use.

Configuration

For Claude Code (Project-Level)

Add to /Volumes/4TB/Users/josephmcmyne/general/.mcp.json:

{
 "mcpServers": {
 "iterm-mcp": {
 "args": ["-y", "iterm-mcp"],
 "command": "npx"
 },
 "crawl4ai-custom": {
 "command": "/Volumes/4TB/Users/josephmcmyne/myProjects/mcp/crawl4ai-wrapper/venv/bin/python",
 "args": ["/Volumes/4TB/Users/josephmcmyne/myProjects/mcp/crawl4ai-wrapper/crawl4ai_mcp.py"],
 "env": {
 "CRAWL4AI_BASE_URL": "http://localhost:11235"
 }
 }
 }
}

For Claude Code (Global)

Add to ~/.claude/claude_desktop_config.json (if this file exists):

{
 "mcpServers": {
 "crawl4ai-custom": {
 "command": "/Volumes/4TB/Users/josephmcmyne/myProjects/mcp/crawl4ai-wrapper/venv/bin/python",
 "args": ["/Volumes/4TB/Users/josephmcmyne/myProjects/mcp/crawl4ai-wrapper/crawl4ai_mcp.py"],
 "env": {
 "CRAWL4AI_BASE_URL": "http://localhost:11235"
 }
 }
 }
}

Usage

Restart Claude Code

After configuration, restart Claude Code to load the new MCP server.

Verify Connection

claude mcp list

You should see:

crawl4ai-custom: ... - ✓ Connected

Use in Claude Code

Example prompts:

Scrape the content from https://example.com and summarize it.
Take a screenshot of https://github.com
Crawl these URLs and extract their main content: https://example.com, https://example.org

Tool Reference

scrape_markdown

Extract clean markdown from a webpage.

Parameters:

  • url (required): URL to scrape

  • filter_type (optional): 'fit', 'raw', 'bm25', or 'llm' (default: 'fit')

  • query (optional): Query string for BM25/LLM filters

  • cache_bust (optional): Cache-bust counter (default: '0')

Example:

result = await scrape_markdown("https://example.com")

extract_html

Get preprocessed HTML from a webpage.

Parameters:

  • url (required): URL to extract HTML from

capture_screenshot

Capture a full-page PNG screenshot.

Parameters:

  • url (required): URL to screenshot

  • screenshot_wait_for (optional): Seconds to wait before capture (default: 2.0)

  • output_path (optional): Path to save the screenshot file

generate_pdf

Generate a PDF document from a webpage.

Parameters:

  • url (required): URL to convert to PDF

  • output_path (optional): Path to save the PDF file

execute_javascript

Run JavaScript code on a webpage.

Parameters:

  • url (required): URL to execute scripts on

  • scripts (required): List of JavaScript snippets to execute

crawl_urls

Crawl multiple URLs with configuration.

Parameters:

  • urls (required): List of 1-100 URLs to crawl

  • browser_config (optional): Browser configuration overrides

  • crawler_config (optional): Crawler configuration overrides

  • hooks (optional): Custom hook functions

ask_crawl4ai

Query Crawl4AI documentation.

Parameters:

  • query (required): Search query

  • context_type (optional): 'code', 'doc', or 'all' (default: 'all')

  • score_ratio (optional): Minimum score threshold (default: 0.5)

  • max_results (optional): Maximum results (default: 20)

Troubleshooting

Container Not Running

If tests fail with connection errors:

docker ps | grep crawl4ai

If not running, start it:

docker start crawl4ai

MCP Server Not Connecting

Check Claude Code logs:

tail -f ~/Library/Logs/Claude/mcp*.log

Verify the Python path in your .mcp.json is correct:

which python
# Should match the venv path in your config

Port Conflicts

If port 11235 is in use:

  1. Stop the Crawl4AI container: docker stop crawl4ai

  2. Find the conflicting process: lsof -i :11235

  3. Either stop that process or change the port in both Docker and this wrapper

Development

Adding New Tools

To add a new tool:

  1. Add an async function decorated with @mcp.tool() in crawl4ai_mcp.py

  2. Make an HTTP request to the appropriate Crawl4AI endpoint

  3. Add error handling

  4. Add a test in test_wrapper.py

  5. Update this README

Running in Debug Mode

# Enable debug logging
export CRAWL4AI_MCP_LOG=DEBUG

# Run the server
python crawl4ai_mcp.py

Architecture

┌─────────────────┐
│ Claude Code │
└────────┬────────┘
 │ stdio (MCP protocol)
 ▼
┌─────────────────┐
│ FastMCP Server │ (this wrapper)
│ crawl4ai_mcp.py│
└────────┬────────┘
 │ HTTP/REST API
 ▼
┌─────────────────┐
│ Crawl4AI Docker │
│ Container │
│ (port 11235) │
└─────────────────┘

Benefits Over Official SSE Server

  • Reliable stdio transport: No SSE connection issues

  • Full control: Easy to extend and customize

  • Better error handling: Graceful degradation

  • Simple debugging: Standard Python stack traces

  • No protocol mismatches: Direct HTTP to Crawl4AI API

License

MIT License - feel free to modify and distribute.

Credits

F
license - not found
-
quality - not tested
D
maintenance

Maintenance

Maintainers
Response time
Release cycle
Releases (12mo)
Commit activity

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/joedank/mcpcrawl4ai'

If you have feedback or need assistance with the MCP directory API, please join our Discord server