
URL: https://apify.com/epicscrapers/webpage-to-markdown

Webpage to Markdown · Apify


Pricing: from $1.00 / 1,000 results


Webpage to Markdown

Get the main content of any page as Markdown. Great for LLMs and AI agent workflows.


Rating: 0.0 (0)

Developer: Epic Scrapers

Maintained by Community

Actor stats

Bookmarked: 2
Total users: 8
Monthly active users: 2
Last modified: a month ago


Convert Any Web Page to Clean Markdown/HTML/JSON: Content Extraction Tool for AI, Web Scraping, and Automation

Submit a URL and get the page's core content back as clean Markdown or HTML in seconds. The Actor automatically strips navigation bars, sidebars, headers, footers, ads, and other clutter from any page type (articles, documentation, landing pages, and more). Every result also includes rich metadata: title, description, author, publish date, language, word count, and featured image.
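The Actor's extraction logic is not published on this page. Purely to illustrate the clutter-stripping idea, the sketch below uses Python's standard-library `html.parser` to drop all text nested inside `nav`, `header`, `footer`, `aside`, `script`, and `style` elements. The tag list and the names `strip_boilerplate` and `_BoilerplateStripper` are assumptions for illustration. Real main-content extractors usually also score text density and link ratios, which this sketch does not attempt.

```python
from html.parser import HTMLParser

# Assumed set of "clutter" elements; a production extractor uses richer heuristics.
SKIP_TAGS = {"nav", "header", "footer", "aside", "script", "style"}


class _BoilerplateStripper(HTMLParser):
    """Collects text nodes that are not nested inside any SKIP_TAGS element."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0  # number of open SKIP_TAGS elements around the cursor
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside every clutter region.
        if self.skip_depth == 0:
            self.parts.append(data)


def strip_boilerplate(html: str) -> str:
    """Return the visible text of `html` with clutter regions removed."""
    parser = _BoilerplateStripper()
    parser.feed(html)
    parser.close()
    # Collapse the whitespace runs left behind by removed elements.
    return " ".join(" ".join(parser.parts).split())
```

Counting depth only for the skipped tags (rather than for every element) keeps void elements such as `<img>` or `<br>`, which never receive an end tag, from corrupting the nesting count.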

Features

  • One-shot extraction: submit any URL and receive clean, structured content in seconds. No configuration required.
  • Markdown and HTML output: get content in the format that fits your pipeline. Markdown for LLM and AI workflows, HTML for full-fidelity rendering.
  • Rich page metadata: title, author, description, publication date, language, word count, domain, site name, and featured image are extracted automatically from every page.
  • Schema.org structured data: extracts JSON-LD and microdata where available.
  • Language-aware extraction: set a preferred BCP 47 language tag to improve content selection on multilingual pages.
  • Manual content targeting: override auto-detection with a custom CSS selector when you need content from a specific page region.
  • Debug mode: inspect which elements were removed and why, to fine-tune extraction on challenging pages.
  • SPA fallback: automatically handles client-side rendered single-page applications via third-party APIs.

Output example

```json
{
  "url": "https://tim.blog/2026/04/24/how-to-keep-your-brain-sharp/",
  "title": "How to Keep Your Brain Sharp: A Practical Playbook Beyond the Basics",
  "description": "The following is a guest post from Dr. Tommy Wood (@drtommywood), associate professor of pediatrics and neuroscience at the University of Washington, where his research focuses on brain health.",
  "author": "Tim Ferriss",
  "published": "2026-04-24T18:46:08+00:00",
  "domain": "tim.blog",
  "site": "The Blog of Author Tim Ferriss",
  "image": "https://tim.blog/wp-content/uploads/2026/04/milad-fakurian-58Z17lnVS4U-unsplash-scaled.jpg",
  "favicon": "https://i0.wp.com/tim.blog/wp-content/uploads/2025/05/favicon.png?fit=32%2C32&quality=80&ssl=1",
  "language": "en-US",
  "wordCount": 7961,
  "parseTime": 167,
  "outputFormat": "markdown",
  "content": "..."
}
```

Input

| Field | Type | Default | Description |
|---|---|---|---|
| urls | string[] | (none) | Required. List of URLs to process |
| outputFormat | enum | markdown | Output format: markdown, html, or json (full metadata) |
| debug | boolean | false | Enable debug logging and include debug info in results |
| language | string | (none) | Preferred BCP 47 language tag (e.g. en, fr, ja) |
| contentSelector | string | (none) | CSS selector that overrides auto-detection of the main content |
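A minimal sketch of assembling the input object from the table above and starting a run from Python. It assumes the `apify-client` Python package (1.x-style API: `ApifyClient`, `actor(...).call(run_input=...)`, `dataset(...).iterate_items()`) and an API token you supply; the Actor ID `epicscrapers/webpage-to-markdown` is taken from the store URL. The helper names `build_run_input` and `run_actor`, the client-side validation, and the choice to omit unset optional fields instead of sending nulls are illustrative assumptions, not documented behavior of the Actor.

```python
from typing import Any, Optional

# Allowed outputFormat values, as listed in the input table.
OUTPUT_FORMATS = {"markdown", "html", "json"}


def build_run_input(
    urls: list[str],
    output_format: str = "markdown",
    debug: bool = False,
    language: Optional[str] = None,
    content_selector: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble an input object using the field names from the input table."""
    if not urls:
        raise ValueError("urls is required and must not be empty")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"outputFormat must be one of {sorted(OUTPUT_FORMATS)}")
    run_input: dict[str, Any] = {
        "urls": list(urls),
        "outputFormat": output_format,
        "debug": debug,
    }
    # Assumption: unset optional fields are simply omitted rather than sent as null.
    if language:
        run_input["language"] = language
    if content_selector:
        run_input["contentSelector"] = content_selector
    return run_input


def run_actor(token: str, run_input: dict[str, Any]) -> list[dict[str, Any]]:
    """Start the Actor, wait for it to finish, and return its dataset items."""
    # Imported lazily so build_run_input works without apify-client installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor("epicscrapers/webpage-to-markdown").call(run_input=run_input)
    if run is None:
        raise RuntimeError("the Actor run did not return run information")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

For example, `run_actor("<your token>", build_run_input(["https://apify.com"], language="en"))` should return a list of dataset entries shaped like the output example.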

Output

Each URL produces a dataset entry with the following fields:

| Field | Type | Description |
|---|---|---|
| url | string | Source URL |
| title | string | Page title |
| content | string | Extracted content (Markdown or HTML, depending on outputFormat) |
| description | string | Page description / summary |
| author | string | Author of the article |
| published | string | Publication date |
| domain | string | Domain name |
| site | string | Website name |
| image | string | Main image URL |
| favicon | string | Favicon URL |
| language | string | Detected language (BCP 47) |
| wordCount | number | Word count |
| parseTime | number | Parse time in milliseconds |
| outputFormat | string | The format used (markdown, html, or json) |
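If you consume results in Python, a small typed wrapper can make the field list above easier to work with. The sketch below is a hypothetical `PageResult` dataclass that maps the camelCase dataset keys onto snake_case attributes; treating every field except `url` as optional, and `wordCount`/`parseTime` as integers, are assumptions rather than guarantees from the Actor. The `published_at` helper assumes an ISO 8601 date string like the one in the output example and returns `None` otherwise.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional


@dataclass
class PageResult:
    """Typed view of one dataset entry; field meanings follow the output table."""

    url: str
    title: Optional[str] = None
    content: Optional[str] = None
    description: Optional[str] = None
    author: Optional[str] = None
    published: Optional[str] = None
    domain: Optional[str] = None
    site: Optional[str] = None
    image: Optional[str] = None
    favicon: Optional[str] = None
    language: Optional[str] = None
    word_count: Optional[int] = None
    parse_time_ms: Optional[int] = None
    output_format: Optional[str] = None

    @classmethod
    def from_item(cls, item: dict[str, Any]) -> "PageResult":
        """Map camelCase dataset keys onto snake_case attributes; extra keys are ignored."""
        return cls(
            url=item["url"],
            title=item.get("title"),
            content=item.get("content"),
            description=item.get("description"),
            author=item.get("author"),
            published=item.get("published"),
            domain=item.get("domain"),
            site=item.get("site"),
            image=item.get("image"),
            favicon=item.get("favicon"),
            language=item.get("language"),
            word_count=item.get("wordCount"),
            parse_time_ms=item.get("parseTime"),
            output_format=item.get("outputFormat"),
        )

    def published_at(self) -> Optional[datetime]:
        """Parse `published` as ISO 8601, or return None if absent or unparseable."""
        if not self.published:
            return None
        try:
            return datetime.fromisoformat(self.published)
        except ValueError:
            return None  # the date format is not guaranteed by the documentation
```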

In JSON mode, additional fields such as metaTags, schemaOrgData, and debug info are included. If an error occurs, the entry contains an error field in place of the content field.
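Because a failed URL still produces a dataset entry, downstream code usually needs to separate failures from successes before feeding content to an LLM. A minimal sketch, assuming (per the sentence above) that the presence of an `error` key marks a failure; the error value's format is not documented, so it is treated as opaque. The function name `split_results` is hypothetical.

```python
from typing import Any, Iterable


def split_results(
    items: Iterable[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Return (successful entries, failed entries) from a run's dataset items."""
    ok: list[dict[str, Any]] = []
    failed: list[dict[str, Any]] = []
    for item in items:
        # A failed entry carries "error" instead of "content".
        (failed if "error" in item else ok).append(item)
    return ok, failed
```

The failed entries still carry their `url`, so they can be collected and resubmitted in a follow-up run.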

Sample output

Running against https://apify.com produces a dataset entry with the full page content converted to Markdown and rich metadata extracted automatically:

```json
{
  "url": "https://apify.com",
  "title": "Apify: Full-stack web scraping and data extraction platform",
  "description": "Cloud platform for web scraping, browser automation, AI agents, and data for AI.",
  "domain": "apify.com",
  "site": "Apify",
  "language": "en",
  "wordCount": 771,
  "parseTime": 128,
  "outputFormat": "markdown",
  "content": "## Get real-time web data for your AI\n\nApify Actors scrape up-to-date web data..."
}
```

The content field contains the full page rendered as clean Markdown, with images, links, and headings preserved. Switch to outputFormat: "html" or "json" for different views of the same data.
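The Actor's HTML-to-Markdown converter is not documented here. To show roughly what "images, links, and headings preserved" means, the sketch below maps just those elements (plus paragraph breaks) to Markdown using Python's standard-library `html.parser`; all other text passes through unchanged. It is a deliberately tiny illustration with hypothetical names (`html_to_markdown`, `_MarkdownWriter`), not the Actor's implementation, and it ignores lists, tables, emphasis, and code.

```python
import re
from html.parser import HTMLParser

# Heading tag -> number of leading '#' characters.
HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


class _MarkdownWriter(HTMLParser):
    """Emits Markdown for headings, paragraphs, links, and images only."""

    def __init__(self) -> None:
        super().__init__()
        self.out: list[str] = []
        self.hrefs: list[str] = []  # stack of targets for currently open <a> tags

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag in HEADINGS:
            self.out.append("\n\n" + "#" * HEADINGS[tag] + " ")
        elif tag == "p":
            self.out.append("\n\n")
        elif tag == "a":
            self.hrefs.append(attr.get("href") or "")
            self.out.append("[")
        elif tag == "img":
            self.out.append(f"![{attr.get('alt') or ''}]({attr.get('src') or ''})")

    def handle_endtag(self, tag):
        if tag in HEADINGS or tag == "p":
            self.out.append("\n\n")
        elif tag == "a" and self.hrefs:
            self.out.append(f"]({self.hrefs.pop()})")

    def handle_data(self, data):
        self.out.append(data)


def html_to_markdown(html: str) -> str:
    """Convert a small HTML fragment to Markdown (headings, links, images)."""
    writer = _MarkdownWriter()
    writer.feed(html)
    writer.close()
    # Collapse the stacked blank lines produced by adjacent block elements.
    return re.sub(r"\n{3,}", "\n\n", "".join(writer.out)).strip()
```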

You might also like

AI Web-to-Markdown Extract API - URL to Clean JSON for LLMs

olican/ai-web-to-markdown-extract

Scrapes any webpage, automatically cleans HTML clutter (nav, footers, scripts, ads, cookie consent banners), and transforms the main content into clean, structured Markdown for LLMs and RAG.

2

5.0

Ai Ready Web Page To Markdown Converter

mustafa.irshaid.113/ai-ready-web-page-to-markdown-converter

Convert any webpage into structured Markdown and HTML using just a URL. Get the page title, link, and content, perfect for SEO, devs, and AI crawlers. Fast, clean, and ideal for repurposing or analysis. Start turning websites into Markdown instantly.

πŸ‘ User avatar

Mustafa Irshaid

16

Webpage to Markdown

extremescrapes/webpage-to-markdown

This actor cost-effectively converts websites into structured markdown optimized for AI processing. It extracts webpage content, formats it into clean markdown, and ensures compatibility with AI models.

πŸ‘ User avatar

Extreme Scrapes

212

5.0

Website To Markdown

swarmgarden/website-to-markdown

Convert any webpage to clean, readable Markdown format. Perfect for content extraction and readability.

70

Universal Markdown Scraper for LLMs

botflowtech/universal-markdown-scraper-for-llms


URL to markdown

apify/url-to-markdown

An Apify Actor that takes a URL as input and returns the content of the page in Markdown format.

Website To Markdown

smart_api/website-to-markdown

Convert any webpage into clean, LLM-ready Markdown in seconds, perfect for AI training data, RAG pipelines, and content archiving.