Webpage to Markdown

Pricing

from $1.00 / 1,000 results

Get the main content of any page as Markdown. Great for LLMs and AI agent workflows.

Rating

0.0

(0)

Developer

Epic Scrapers

Maintained by Community

Actor stats

Last modified: a month ago

Convert Any Web Page to Clean Markdown, HTML, or JSON: Content Extraction Tool for AI, Web Scraping, and Automation

Submit a URL and get the page's core content back as clean Markdown or HTML in seconds. The actor automatically strips navigation bars, sidebars, headers, footers, ads, and other clutter from any page type (articles, documentation, landing pages, and more). Every result also includes rich metadata: title, description, author, publish date, language, word count, and featured image.
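This page does not say how the actor decides what counts as clutter. The Python sketch below illustrates only the general idea. It keeps visible text and drops anything inside a hypothetical, fixed set of tags (`nav`, `header`, `footer`, `aside`, `script`, `style`), using the standard-library `html.parser`. It is not the actor's algorithm, and it emits plain text rather than Markdown. It is also crude: an `<article>` whose title sits in its own `<header>` would lose that title. Libraries such as Mozilla's Readability add scoring heuristics like text length and link density on top of simple tag rules.

```python
from html.parser import HTMLParser

# Hypothetical clutter list for illustration; not the actor's documented rules.
CLUTTER_TAGS = {"nav", "header", "footer", "aside", "script", "style"}


class MainTextExtractor(HTMLParser):
    """Collects text nodes that are not nested inside any clutter tag."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0  # number of currently open clutter tags
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so a plain set lookup is enough.
        if tag in CLUTTER_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in CLUTTER_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep non-empty text only when we are outside every clutter tag.
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_main_text(html: str) -> str:
    """Return the non-clutter text of `html`, one text node per line."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "\n".join(parser.chunks)
```

For example, `extract_main_text("<nav>Home</nav><article><h1>Title</h1><p>Body text.</p></article>")` keeps the heading and paragraph text and drops the navigation link.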

Features

One-shot extraction: Submit any URL and receive clean, structured content in seconds. No configuration required.
Markdown and HTML output: Get content in the format that fits your pipeline. Use Markdown for LLM and AI workflows and HTML for full-fidelity rendering.
Rich page metadata: Title, author, description, publication date, language, word count, domain, site name, and featured image are extracted automatically from every page.
Schema.org structured data: Extracts JSON-LD and microdata where available.
Language-aware extraction: Set a preferred BCP 47 language tag to improve content selection on multilingual pages.
Manual content targeting: Override auto-detection with a custom CSS selector when you need content from a specific page region.
Debug mode: Inspect which elements were removed and why, to fine-tune extraction on challenging pages.
SPA fallback: Automatically handles client-side-rendered single-page applications via third-party APIs.
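The `language` input takes a BCP 47 tag, and each result reports its detected language in the same notation. The examples on this page show values such as `en-US` and `en`. To post-filter results on your side, one reasonable choice is the "basic filtering" scheme from RFC 4647, sketched below. This is a client-side helper with a hypothetical name, `basic_filter_matches`. It is not a description of how the actor itself uses the `language` input.

```python
def basic_filter_matches(language_range: str, tag: str) -> bool:
    """RFC 4647 basic filtering: does `language_range` match the BCP 47 `tag`?

    "*" matches every tag. Otherwise the range must equal the tag, or be a
    prefix of it followed immediately by "-". Comparison is case-insensitive.
    """
    r = language_range.lower()
    t = tag.lower()
    if r == "*":
        return True
    return t == r or t.startswith(r + "-")
```

With a list of dataset entries, `[e for e in items if basic_filter_matches("en", e.get("language", ""))]` keeps both `en` and `en-US` results. Because of the trailing-hyphen rule, a range like `en` cannot accidentally match an unrelated tag that merely starts with the same letters.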

Output example

```json
{
  "url": "https://tim.blog/2026/04/24/how-to-keep-your-brain-sharp/",
  "title": "How to Keep Your Brain Sharp: A Practical Playbook Beyond the Basics",
  "description": "The following is a guest post from Dr. Tommy Wood (@drtommywood), associate professor of pediatrics and neuroscience at the University of Washington, where his research focuses on brain health.",
  "author": "Tim Ferriss",
  "published": "2026-04-24T18:46:08+00:00",
  "domain": "tim.blog",
  "site": "The Blog of Author Tim Ferriss",
  "image": "https://tim.blog/wp-content/uploads/2026/04/milad-fakurian-58Z17lnVS4U-unsplash-scaled.jpg",
  "favicon": "https://i0.wp.com/tim.blog/wp-content/uploads/2025/05/favicon.png?fit=32%2C32&quality=80&ssl=1",
  "language": "en-US",
  "wordCount": 7961,
  "parseTime": 167,
  "outputFormat": "markdown",
  "content": "..."
}
```

Input

| Field | Type | Default | Description |
|---|---|---|---|
| `urls` | `string[]` | none | Required. List of URLs to process |
| `outputFormat` | `enum` | `markdown` | Output format: `markdown`, `html`, or `json` (full metadata) |
| `debug` | `boolean` | `false` | Enable debug logging and debug info in results |
| `language` | `string` | none | Preferred BCP 47 language tag (e.g. `en`, `fr`, `ja`) |
| `contentSelector` | `string` | none | CSS selector to override auto-detection of main content |
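This page does not show the actor's ID, so the sketch below takes it as a parameter. The helper names `build_run_input` and `build_sync_request` are hypothetical, and `someuser~webpage-to-markdown` in the usage note is a placeholder, not the real ID. The sketch checks arguments against the input table above. It then prepares, without sending, a POST to Apify's `run-sync-get-dataset-items` endpoint, using only the Python standard library.

```python
import json
import urllib.request

# Allowed values and defaults follow the input table above.
OUTPUT_FORMATS = {"markdown", "html", "json"}
API_BASE = "https://api.apify.com/v2"


def build_run_input(urls, output_format="markdown", debug=False,
                    language=None, content_selector=None):
    """Validate arguments against the documented input fields and return the input dict."""
    if isinstance(urls, str):
        raise TypeError("`urls` must be a list of URLs, not a single string")
    if not urls:
        raise ValueError("`urls` is required and must contain at least one URL")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported outputFormat: {output_format!r}")
    run_input = {"urls": list(urls), "outputFormat": output_format, "debug": debug}
    # Optional fields are only sent when set, so the actor's defaults apply otherwise.
    if language is not None:
        run_input["language"] = language
    if content_selector is not None:
        run_input["contentSelector"] = content_selector
    return run_input


def build_sync_request(actor_id, token, run_input):
    """Prepare (but do not send) a POST that runs the actor and returns its dataset items."""
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )
```

Build the request with `build_sync_request("someuser~webpage-to-markdown", "<APIFY_TOKEN>", build_run_input(["https://apify.com"], content_selector="main"))`. Sending it with `urllib.request.urlopen(req)` and calling `json.load` on the response should then yield the list of dataset entries described under Output.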

Output

Each URL produces a dataset entry with the following fields:

| Field | Type | Description |
|---|---|---|
| `url` | `string` | Source URL |
| `title` | `string` | Page title |
| `content` | `string` | Extracted content (Markdown or HTML depending on `outputFormat`) |
| `description` | `string` | Page description / summary |
| `author` | `string` | Author of the article |
| `published` | `string` | Publication date |
| `domain` | `string` | Domain name |
| `site` | `string` | Website name |
| `image` | `string` | Main image URL |
| `favicon` | `string` | Favicon URL |
| `language` | `string` | Detected language (BCP 47) |
| `wordCount` | `number` | Word count |
| `parseTime` | `number` | Parse time in milliseconds |
| `outputFormat` | `string` | The format used (`markdown`, `html`, or `json`) |

In JSON mode, the entry also includes additional fields such as `metaTags`, `schemaOrgData`, and debug information. If an error occurs, the entry contains an `error` field instead of `content`.
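As described above, a failed URL produces an entry carrying `error` instead of `content`. Downstream code therefore has to check for that field before reading `content`. The minimal sketch below uses the presence of `error` as the failure signal. The helper name `split_results` is hypothetical. It reads `url` with `.get`, because this page does not say whether error entries always carry it.

```python
from typing import Any


def split_results(items: list[dict[str, Any]]):
    """Separate successful dataset entries from failed ones.

    Returns (ok, failed): `ok` holds the full successful entries and `failed`
    holds (url, error) pairs. `url` may be None if an error entry lacks it.
    """
    ok, failed = [], []
    for item in items:
        if "error" in item:
            failed.append((item.get("url"), item["error"]))
        else:
            ok.append(item)
    return ok, failed
```

A typical loop then processes `ok` (for example, `entry["title"]` and `entry["wordCount"]`) and logs or retries the URLs in `failed`.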

Sample output

Running against https://apify.com produces a dataset entry with the page's main content converted to Markdown and rich metadata extracted automatically:

```json
{
  "url": "https://apify.com",
  "title": "Apify: Full-stack web scraping and data extraction platform",
  "description": "Cloud platform for web scraping, browser automation, AI agents, and data for AI.",
  "domain": "apify.com",
  "site": "Apify",
  "language": "en",
  "wordCount": 771,
  "parseTime": 128,
  "outputFormat": "markdown",
  "content": "## Get real-time web data for your AI\n\nApify Actors scrape up-to-date web data..."
}
```

The `content` field contains the page's main content as clean Markdown, with images, links, and headings preserved. Set `outputFormat` to `"html"` or `"json"` for different views of the same data.
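The same URL can come back as Markdown, HTML, or JSON depending on `outputFormat`. A pipeline that archives results may therefore want to pick the file type from that field. The sketch below assumes a hypothetical `save_entry` helper and a hypothetical extension mapping. It also makes one assumption about JSON mode: it stores the whole entry rather than just `content`, since this page only says that mode carries full metadata. File names come from host and path, so URLs that differ only in their query string would overwrite each other.

```python
import json
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical mapping from the documented outputFormat values to file extensions.
EXTENSIONS = {"markdown": ".md", "html": ".html", "json": ".json"}


def save_entry(entry: dict, out_dir: Path) -> Path:
    """Write one dataset entry to `out_dir`, choosing the file type from `outputFormat`."""
    fmt = entry.get("outputFormat", "markdown")
    parsed = urlparse(entry["url"])
    # Simple file name from host + path, e.g. "https://apify.com" -> "apify.com".
    slug = (parsed.netloc + parsed.path).strip("/").replace("/", "_").replace(":", "_")
    path = out_dir / ((slug or "page") + EXTENSIONS.get(fmt, ".txt"))
    if fmt == "json":
        # Assumption: in JSON mode the whole entry (metadata included) is worth keeping.
        path.write_text(json.dumps(entry, ensure_ascii=False, indent=2), encoding="utf-8")
    else:
        path.write_text(entry["content"], encoding="utf-8")
    return path
```

For the sample entry above, `save_entry(entry, Path("out"))` would write `out/apify.com.md` (the `out` directory must already exist).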

Webpage To Clean Markdown

technicaldost/webpage-to-clean-markdown

Technical Dost Solutions

AI Web-to-Markdown Extract API — URL to Clean JSON for LLMs

olican/ai-web-to-markdown-extract

Scrapes any webpage, automatically cleans HTML clutter (nav, footers, scripts, ads, cookie consent banners), and transforms the main content into clean, structured Markdown for LLMs and RAG.

Sergio Calvo

5.0

Markdown API

vivid_astronaut/markdown

Fabio Suizu

Ai Ready Web Page To Markdown Converter

mustafa.irshaid.113/ai-ready-web-page-to-markdown-converter

Convert any webpage into structured Markdown and HTML using just a URL. Get the page title, link, and content—perfect for SEO, devs, and AI crawlers. Fast, clean, and ideal for repurposing or analysis. Start turning websites into Markdown instantly.

Mustafa Irshaid

URL to Markdown for LLMs (polite, robots-respecting)

weltverbenzer/url-to-markdown-for-llms

Turn any URL into clean, LLM-ready Markdown for AI agents and RAG pipelines. Enforces robots.txt, extracts main content (Readability) and converts to Markdown. Returns title, byline and markdown.

Johannes Witt

Webpage to Markdown

extremescrapes/webpage-to-markdown

This actor cost-effectively converts websites into structured markdown optimized for AI processing. It extracts webpage content, formats it into clean markdown, and ensures compatibility with AI models.

Extreme Scrapes

212

5.0
