URL: https://apify.com/crawlerbros/markdownify-mcp-server

Markdownify MCP Server · Apify


Pricing

from $1.00 / 1,000 results

Markdownify MCP Server

Convert any webpage to clean, formatted Markdown perfect for AI consumption. Ideal for building knowledge bases, documentation scrapers, and content migration tools.

Rating

5.0 (3)

Developer

Crawler Bros

Maintained by Community

Actor stats

  • Bookmarked: 2
  • Total users: 18
  • Monthly active users: 3
  • Last modified: 8 days ago

Convert any webpage to clean, formatted Markdown perfect for AI consumption. This Actor is ideal for building knowledge bases, documentation scrapers, and content migration tools.

Features

✅ Convert any webpage to Markdown - Clean, formatted output
✅ CSS Selector Support - Include/exclude specific sections
✅ JavaScript Rendering - Optional Playwright support for dynamic content
✅ Authentication Support - HTTP Basic Auth for restricted content
✅ Customizable Output - Configure heading styles, strip tags, etc.
✅ Error Handling - Graceful failures with detailed error messages
✅ MCP Server Ready - Structured output for AI consumption

How It Works

  1. Input - Provide URL(s) and optional configuration
  2. Fetch - Download webpage content (HTTP or Playwright)
  3. Extract - Apply include/exclude selectors
  4. Convert - Transform HTML to clean Markdown
  5. Output - Save to Apify dataset with metadata
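
The steps above can be pictured with a small sketch that uses only the Python standard library. It is not the Actor's implementation, which this page does not show: `html_to_markdown` and `_MiniConverter` are hypothetical names, the fetch step is replaced by a literal HTML string, selectors are left out, and only headings and paragraphs are converted.

```python
from html.parser import HTMLParser


class _MiniConverter(HTMLParser):
    """Toy HTML-to-Markdown converter (hypothetical): h1-h3 and p only."""

    def __init__(self, strip_tags, heading_style="ATX"):
        super().__init__()
        self.strip_tags = set(strip_tags)
        self.heading_style = heading_style
        self.blocks = []       # finished Markdown blocks, in document order
        self._skip_depth = 0   # > 0 while inside a stripped tag
        self._tag = None       # block tag currently being collected
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in self.strip_tags:
            self._skip_depth += 1
        elif self._skip_depth == 0 and tag in ("h1", "h2", "h3", "p"):
            self._tag, self._text = tag, []

    def handle_data(self, data):
        # Keep text only when it is inside a collected block and not stripped
        if self._skip_depth == 0 and self._tag:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in self.strip_tags:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == self._tag:
            text = " ".join("".join(self._text).split())  # collapse whitespace
            if text:
                self.blocks.append(self._render(tag, text))
            self._tag = None

    def _render(self, tag, text):
        if tag == "p":
            return text
        level = int(tag[1])
        # Setext headings exist only for levels 1 and 2; fall back to ATX below
        if self.heading_style == "SETEXT" and level <= 2:
            underline = "=" if level == 1 else "-"
            return f"{text}\n{underline * len(text)}"
        return f"{'#' * level} {text}"


def html_to_markdown(html, strip_tags=("script", "style"), heading_style="ATX"):
    """Extract and convert steps applied to an already-fetched HTML string."""
    parser = _MiniConverter(strip_tags, heading_style)
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks)


sample = (
    "<html><body><script>var x = 1;</script>"
    "<h1>Title</h1><p>Hello   world</p></body></html>"
)
print(html_to_markdown(sample))
print(html_to_markdown(sample, heading_style="SETEXT"))
```

The script contents never reach the output because text is ignored while the parser is inside a stripped tag, which mirrors the role `stripTags` plays in the Actor's input.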

Input Parameters

Required

  • urls (array of strings) - List of webpage URLs to convert

Optional

  • includeSelectors (array of strings) - CSS selectors to include specific sections
    Example: ["article", ".main-content", "#documentation"]

  • excludeSelectors (array of strings) - CSS selectors to exclude
    Example: ["nav", "footer", ".advertisement", "script", "style"]

  • useJavaScript (boolean) - Enable Playwright for JavaScript-heavy pages
    Default: false

  • headingStyle (string) - Markdown heading style
    Options: "ATX" (# Heading) or "SETEXT" (Heading\n=======)
    Default: "ATX"

  • stripTags (array of strings) - HTML tags to completely remove
    Default: ["script", "style", "iframe", "noscript"]

  • auth (object) - HTTP Basic Authentication credentials
    Example: {"username": "user", "password": "pass"}

  • timeout (integer) - Request timeout in seconds
    Default: 30, Range: 10-120
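
A client-side pre-check can catch invalid values before a run is started. The sketch below applies the defaults and ranges listed above; `build_input` is a hypothetical helper, not part of the Actor or of `apify-client`, and the Actor presumably performs its own validation through its input schema.

```python
# Defaults copied from the parameter list above
DEFAULTS = {
    "useJavaScript": False,
    "headingStyle": "ATX",
    "stripTags": ["script", "style", "iframe", "noscript"],
    "timeout": 30,
}
OPTIONAL_KEYS = set(DEFAULTS) | {"includeSelectors", "excludeSelectors", "auth"}


def build_input(urls, **options):
    """Return an input dict with documented defaults applied (hypothetical helper)."""
    if isinstance(urls, str) or not urls:
        raise ValueError("urls must be a non-empty list of strings")
    if not all(isinstance(u, str) and u for u in urls):
        raise ValueError("urls must be a non-empty list of strings")

    unknown = set(options) - OPTIONAL_KEYS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    run_input = {"urls": list(urls), **DEFAULTS, **options}
    run_input["stripTags"] = list(run_input["stripTags"])  # do not share the default list

    if run_input["headingStyle"] not in ("ATX", "SETEXT"):
        raise ValueError('headingStyle must be "ATX" or "SETEXT"')

    timeout = run_input["timeout"]
    if isinstance(timeout, bool) or not isinstance(timeout, int) or not 10 <= timeout <= 120:
        raise ValueError("timeout must be an integer between 10 and 120")

    auth = run_input.get("auth")
    if auth is not None and not {"username", "password"} <= set(auth):
        raise ValueError("auth needs both 'username' and 'password'")

    return run_input


run_input = build_input(
    ["https://example.com"],
    excludeSelectors=["nav", "footer"],
    timeout=60,
)
print(run_input)
```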

Input Example

{
  "urls": ["https://apify.com/docs", "https://en.wikipedia.org/wiki/Markdown"],
  "excludeSelectors": ["nav", "footer", ".advertisement"],
  "useJavaScript": false,
  "headingStyle": "ATX",
  "timeout": 30
}

Output Format

Each converted page is saved as a separate record in the dataset:

{
  "url": "https://example.com",
  "title": "Example Domain",
  "markdown": "# Example Domain\n\nThis domain is for use...",
  "markdown_length": 1234,
  "success": true,
  "error": null,
  "scraped_at": "2025-10-24T10:30:00.000Z",
  "meta": {
    "method": "http",
    "heading_style": "ATX",
    "stripped_tags": ["script", "style"],
    "used_include_selectors": false,
    "used_exclude_selectors": true
  }
}
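
Because every record carries `success` and `error`, downstream code can separate converted pages from failures before using the Markdown. The following is a minimal sketch with hypothetical helper names; the shape of the failed record (null `markdown`, an error string) is an assumption, since the source only shows a successful one.

```python
def split_results(items):
    """Separate successful records from failed ones using the `success` flag."""
    ok, failed = [], []
    for item in items:
        (ok if item.get("success") else failed).append(item)
    return ok, failed


def summarize(items):
    """Build a small report: counts, total Markdown size, and errors by URL."""
    ok, failed = split_results(items)
    return {
        "converted": len(ok),
        "failed": len(failed),
        "total_markdown_chars": sum(i.get("markdown_length") or 0 for i in ok),
        "errors": {i["url"]: i.get("error") for i in failed},
    }


items = [
    {
        "url": "https://example.com",
        "title": "Example Domain",
        "markdown": "# Example Domain\n\nThis domain is for use...",
        "markdown_length": 1234,
        "success": True,
        "error": None,
    },
    {
        # Assumed shape of a failed record; the error text is made up
        "url": "https://example.org/missing",
        "title": None,
        "markdown": None,
        "markdown_length": 0,
        "success": False,
        "error": "HTTP 404",
    },
]
report = summarize(items)
print(report)
```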

Use Cases

📚 Build AI-Ready Knowledge Bases

Convert documentation, wikis, and help centers into Markdown for AI training or RAG systems.

📝 Content Migration

Migrate existing web content to Markdown for static site generators (Jekyll, Hugo, etc.).
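
As an illustration of such a migration, one dataset record could be turned into a page with YAML front matter, the format both Jekyll and Hugo read. This is a sketch under assumptions: `slug_from_url` and `to_front_matter_page` are hypothetical helpers, and `source_url` is an invented front matter key, not something either generator requires.

```python
import json
import re
from urllib.parse import urlparse


def slug_from_url(url):
    """Derive a filesystem-friendly slug from the host and path of a URL."""
    parsed = urlparse(url)
    raw = f"{parsed.netloc}{parsed.path}".strip("/")
    slug = re.sub(r"[^a-zA-Z0-9]+", "-", raw).strip("-").lower()
    return slug or "index"


def to_front_matter_page(record):
    """Return (filename, file_text) for one dataset record."""
    # json.dumps produces double-quoted strings, which are valid YAML scalars
    lines = [
        "---",
        f"title: {json.dumps(record['title'])}",
        f"source_url: {json.dumps(record['url'])}",  # hypothetical key
        f"date: {json.dumps(record['scraped_at'])}",
        "---",
        "",
        record["markdown"].rstrip(),
        "",
    ]
    return slug_from_url(record["url"]) + ".md", "\n".join(lines)


record = {
    "url": "https://example.com/docs/intro",
    "title": "Example Domain",
    "markdown": "# Example Domain\n\nThis domain is for use...",
    "scraped_at": "2025-10-24T10:30:00.000Z",
}
filename, text = to_front_matter_page(record)
print(filename)
print(text)
```

Writing `text` to `filename` inside the generator's content directory is left out so the sketch has no side effects.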

🤖 AI Agent Integration

Enable AI agents to consume web content in a clean, structured format.

📄 Documentation Scraping

Extract and format technical documentation from multiple sources.

🔄 Content Synchronization

Keep Markdown versions of web pages up-to-date automatically.

API Integration

JavaScript/Node.js

const { ApifyClient } = require("apify-client");

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

const input = {
  urls: ["https://example.com"],
  excludeSelectors: ["nav", "footer"],
};

// CommonJS modules do not support top-level await, so wrap the calls
(async () => {
  // Start the Actor and wait for the run to finish
  const run = await client.actor("YOUR_ACTOR_ID").call(input);
  // Read the converted pages from the run's default dataset
  const { items } = await client.dataset(run.defaultDatasetId).listItems();
  items.forEach((item) => {
    console.log(`Title: ${item.title}`);
    console.log(`Markdown length: ${item.markdown_length}`);
    console.log(item.markdown);
  });
})();

Python

from apify_client import ApifyClient

client = ApifyClient('YOUR_API_TOKEN')

input_data = {
    'urls': ['https://example.com'],
    'excludeSelectors': ['nav', 'footer'],
}

# Start the Actor and wait for the run to finish
run = client.actor('YOUR_ACTOR_ID').call(run_input=input_data)

# Read the converted pages from the run's default dataset
for item in client.dataset(run['defaultDatasetId']).iterate_items():
    print(f"Title: {item['title']}")
    print(f"Markdown length: {item['markdown_length']}")
    print(item['markdown'])

cURL

curl -X POST https://api.apify.com/v2/acts/YOUR_ACTOR_ID/runs \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com"],
    "excludeSelectors": ["nav", "footer"]
  }'

Tips & Best Practices

🚀 Performance

  • Use useJavaScript: false for static pages (much faster)
  • Only enable useJavaScript: true for dynamic content
  • Use includeSelectors to extract only what you need
  • Batch multiple URLs in a single run

🎯 Accuracy

  • Test selectors in browser DevTools first
  • Use specific includeSelectors for precise extraction
  • Combine include and exclude for best results
  • Add common noise elements to excludeSelectors

🔧 Troubleshooting

  • Empty markdown? Check if selectors are correct
  • Missing content? Try enabling useJavaScript
  • Timeout errors? Increase timeout value
  • Authentication issues? Verify auth credentials
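
The first three troubleshooting hints can be turned into a simple retry policy. The sketch below is a hypothetical helper, not part of the Actor: it assumes that timeout failures mention "timeout" or "timed out" in the `error` field, which the source does not document, and it leaves authentication and other errors for manual inspection.

```python
def next_attempt(run_input, record, max_timeout=120):
    """Return an adjusted input for retrying one URL, or None if nothing applies."""
    retry = {**run_input, "urls": [record["url"]]}
    error = (record.get("error") or "").lower()
    timeout = run_input.get("timeout", 30)

    if not record.get("success"):
        # Assumption: timeout failures are recognizable from the error text
        if "timeout" in error or "timed out" in error:
            if timeout >= max_timeout:
                return None  # already at the documented upper bound
            retry["timeout"] = min(timeout * 2, max_timeout)
            return retry
        return None  # auth and other errors need a human to look at them

    # Successful fetch but empty Markdown: content may be rendered by JavaScript
    if not record.get("markdown_length") and not run_input.get("useJavaScript", False):
        retry["useJavaScript"] = True
        return retry

    return None


base = {"urls": ["https://example.com/a", "https://example.com/b"], "timeout": 30}

timed_out = {"url": "https://example.com/a", "success": False, "error": "Request timeout"}
empty = {"url": "https://example.com/b", "success": True, "error": None, "markdown_length": 0}

print(next_attempt(base, timed_out))
print(next_attempt(base, empty))
```

If the selectors themselves are wrong, enabling JavaScript will not help, so an empty result after the retry is a cue to re-check `includeSelectors` and `excludeSelectors` in browser DevTools.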

Development

Local Testing

# Install dependencies
pip install -r requirements.txt
# Install Playwright browsers
playwright install chromium
# Run locally
python -m src

Project Structure

markdownify-mcp/
├── .actor/
│   ├── actor.json          # Actor configuration
│   ├── input_schema.json   # Input validation
│   └── output_schema.json  # Output structure
├── src/
│   ├── __main__.py         # Main entry point
│   ├── fetcher.py          # HTTP & Playwright fetchers
│   ├── extractor.py        # Content extraction
│   └── converter.py        # HTML to Markdown
├── Dockerfile              # Docker configuration
├── requirements.txt        # Python dependencies
└── README.md               # This file

License

Apache 2.0

Support

For issues, questions, or feature requests, please contact support or open an issue in the repository.


Made with ❤️ for the AI community
