Webpage to Markdown
Get the main content of any page as Markdown. Great for LLMs and AI agent workflows.
Pricing: from $1.00 / 1,000 results
Convert Any Web Page to Clean Markdown, HTML, or JSON: Content Extraction Tool for AI, Web Scraping, and Automation
Submit a URL and get the page's main content back as clean Markdown or HTML in seconds. The Actor automatically strips navigation bars, sidebars, headers, footers, ads, and other clutter from any page type: articles, documentation, landing pages, and more. Each result also includes metadata such as title, description, author, publication date, language, word count, and featured image, wherever the page provides them.
Features
- One-shot extraction: Submit any URL and receive clean, structured content in seconds. No configuration required.
- Markdown, HTML, and JSON output: Pick the format that fits your pipeline. Markdown suits LLM and AI workflows, HTML gives full-fidelity rendering, and JSON returns the full metadata.
- Rich page metadata: Title, author, description, publication date, language, word count, domain, site name, and featured image are extracted automatically wherever the page provides them.
- Schema.org structured data: Extracts JSON-LD and microdata where available.
- Language-aware extraction: Set a preferred BCP 47 language tag to improve content selection on multilingual pages.
- Manual content targeting: Override auto-detection with a custom CSS selector when you need content from a specific page region.
- Debug mode: Inspect which elements were removed and why, to fine-tune extraction on challenging pages.
- SPA fallback: Automatically handles client-side rendered single-page applications via third-party APIs.
Output example
{"url":"https://tim.blog/2026/04/24/how-to-keep-your-brain-sharp/","title":"How to Keep Your Brain Sharp: A Practical Playbook Beyond the Basics","description":"The following is a guest post from Dr. Tommy Wood (@drtommywood), associate professor of pediatrics and neuroscience at the University of Washington, where his research focuses on brain health.","author":"Tim Ferriss","published":"2026-04-24T18:46:08+00:00","domain":"tim.blog","site":"The Blog of Author Tim Ferriss","image":"https://tim.blog/wp-content/uploads/2026/04/milad-fakurian-58Z17lnVS4U-unsplash-scaled.jpg","favicon":"https://i0.wp.com/tim.blog/wp-content/uploads/2025/05/favicon.png?fit=32%2C32&quality=80&ssl=1","language":"en-US","wordCount":7961,"parseTime":167,"outputFormat":"markdown","content":"..."}
Input
| Field | Type | Default | Description |
|---|---|---|---|
| urls | string[] | (none) | Required. List of URLs to process |
| outputFormat | enum | markdown | Output format: markdown, html, or json (full metadata) |
| debug | boolean | false | Enable debug logging and debug info in results |
| language | string | (none) | Preferred BCP 47 language tag (e.g. en, fr, ja) |
| contentSelector | string | (none) | CSS selector to override auto-detection of main content |
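The input table maps onto a plain JSON object. The following Python sketch assembles and validates one before a run. The helper name `build_run_input` is hypothetical. The sketch assumes that optional fields without a default (`language`, `contentSelector`) can simply be omitted when unset.

```python
import json
from typing import Optional

# Allowed values for outputFormat, taken from the Input table.
OUTPUT_FORMATS = {"markdown", "html", "json"}


def build_run_input(
    urls: list[str],
    output_format: str = "markdown",
    debug: bool = False,
    language: Optional[str] = None,
    content_selector: Optional[str] = None,
) -> dict:
    """Assemble an Actor input object (hypothetical helper, not part of the Actor)."""
    if not urls:
        raise ValueError("urls is required and must contain at least one URL")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"outputFormat must be one of {sorted(OUTPUT_FORMATS)}")
    run_input = {"urls": list(urls), "outputFormat": output_format, "debug": debug}
    # Assumption: optional fields with no default are left out entirely when unset.
    if language is not None:
        run_input["language"] = language  # BCP 47 tag, e.g. "en" or "fr"
    if content_selector is not None:
        run_input["contentSelector"] = content_selector  # overrides auto-detection
    return run_input


if __name__ == "__main__":
    payload = build_run_input(
        ["https://apify.com"],
        language="en",
        content_selector="main article",  # hypothetical selector for illustration
    )
    print(json.dumps(payload, indent=2))
```

Validating the `outputFormat` enum locally catches typos such as `"md"` before a paid run is started.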
Output
Each URL produces a dataset entry with the following fields:
| Field | Type | Description |
|---|---|---|
| url | string | Source URL |
| title | string | Page title |
| content | string | Extracted content (Markdown or HTML depending on outputFormat) |
| description | string | Page description / summary |
| author | string | Author of the article |
| published | string | Publication date |
| domain | string | Domain name |
| site | string | Website name |
| image | string | Main image URL |
| favicon | string | Favicon URL |
| language | string | Detected language (BCP 47) |
| wordCount | number | Word count |
| parseTime | number | Parse time in milliseconds |
| outputFormat | string | The format used (markdown, html, or json) |
In JSON mode, additional fields such as metaTags, schemaOrgData, and debug info are included. If an error occurs, the entry contains an error field instead of content.
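Because a failed URL yields an entry with `error` instead of `content`, downstream code should branch on that field before reading the content. The sketch below is a minimal illustration. The helper `split_results` and the sample values are hypothetical. It also assumes that `error` holds a printable value such as a message string; the exact error format is not documented here.

```python
def split_results(items: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate successful dataset entries from failed ones (hypothetical helper)."""
    succeeded, failed = [], []
    for item in items:
        # A failed URL carries an "error" field instead of "content".
        if "error" in item:
            failed.append(item)
        else:
            succeeded.append(item)
    return succeeded, failed


if __name__ == "__main__":
    # Hypothetical entries shaped like the Output table.
    sample = [
        {"url": "https://apify.com", "wordCount": 771, "content": "## Get real-time web data..."},
        {"url": "https://example.invalid", "error": "hypothetical error message"},
    ]
    ok, bad = split_results(sample)
    for item in ok:
        print(f"{item['url']}: {item.get('wordCount', 0)} words")
    for item in bad:
        print(f"{item['url']} failed: {item['error']}")
```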
Sample output
Running against https://apify.com produces a dataset entry with the full page content converted to Markdown and rich metadata extracted automatically:
{"url":"https://apify.com","title":"Apify: Full-stack web scraping and data extraction platform","description":"Cloud platform for web scraping, browser automation, AI agents, and data for AI.","domain":"apify.com","site":"Apify","language":"en","wordCount":771,"parseTime":128,"outputFormat":"markdown","content":"## Get real-time web data for your AI\n\nApify Actors scrape up-to-date web data..."}
The content field contains the page's main content as clean Markdown, with images, links, and headings preserved. Set outputFormat to "html" or "json" for different views of the same data.
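When saving results, the file type can follow the entry's `outputFormat`. The sketch below shows one way to do this. The helper `save_result`, the extension mapping, and the file-naming scheme (host plus path, with slashes turned into underscores) are all assumptions for illustration. In JSON mode it writes the whole entry, so that fields such as metaTags and schemaOrgData are kept.

```python
import json
from pathlib import Path
from urllib.parse import urlparse

# Assumed mapping from outputFormat to a file extension.
EXTENSIONS = {"markdown": ".md", "html": ".html", "json": ".json"}


def save_result(item: dict, out_dir: Path) -> Path:
    """Write one successful dataset entry to disk (hypothetical helper)."""
    fmt = item.get("outputFormat", "markdown")
    parsed = urlparse(item["url"])
    # Build a file stem from host and path, e.g. "apify.com" or "tim.blog_2026_04_24_x".
    stem = (parsed.netloc + parsed.path).strip("/").replace("/", "_") or "page"
    path = out_dir / (stem + EXTENSIONS.get(fmt, ".txt"))
    if fmt == "json":
        # Keep the full entry so the extra JSON-mode metadata survives.
        path.write_text(json.dumps(item, indent=2, ensure_ascii=False), encoding="utf-8")
    else:
        # Markdown or HTML: the content field already holds the rendered text.
        path.write_text(item["content"], encoding="utf-8")
    return path
```

This naming ignores query strings, so two URLs that differ only in their query would overwrite each other. Add a hash of the full URL if that matters for your crawl.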
