URL: https://apify.com/shoebill-dev27/html-to-markdown

HTML to Markdown: clean conversion, boilerplate stripping · Apify


๐Ÿ‘ HTML to Markdown โ€” clean conversion, boilerplate stripping avatar

HTML to Markdown โ€” clean conversion, boilerplate stripping

Pricing: from $2.00 / 1,000 results

Convert scraped HTML into clean Markdown and plain text: headings, nested lists, links, images, code blocks, blockquotes, and tables. Drops scripts, styles, and structural boilerplate (nav/footer/aside) so only content remains. Pure parsing, no LLM cost.

Rating: 0.0 (0)

Developer: Shinobu Otani

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 6 days ago

HTML to Markdown

Convert scraped HTML into clean Markdown and plain text (pure parsing, no LLM cost). Pairs well with crawlers upstream and with Doc Structure Extractor or RAG Text Chunker downstream.

What it does

  • Headings, paragraphs, nested lists, links, images, emphasis, inline code, fenced code blocks, blockquotes, simple tables, horizontal rules.
  • Always drops <script>, <style> and other non-content tags; drops structural boilerplate (nav, footer, aside, form) by default so only the article content remains.
  • Extracts the page title (<title>, falling back to the first <h1>).
  • Also returns a plain-text rendering and basic stats.
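
To make the tag-dropping behaviour concrete, here is a minimal sketch built on Python's standard `html.parser`. It is an illustration only, not the Actor's actual implementation: the names `TextExtractor` and `visible_text` are hypothetical, and the sketch produces plain text only, with no Markdown formatting.

```python
from html.parser import HTMLParser

# Tags that are always dropped, and tags treated as structural boilerplate.
ALWAYS_DROP = {"script", "style"}
BOILERPLATE = {"nav", "footer", "aside", "form"}


class TextExtractor(HTMLParser):
    """Collects visible text and skips dropped subtrees (illustrative only)."""

    def __init__(self, drop_boilerplate=True):
        super().__init__()
        extra = BOILERPLATE if drop_boilerplate else set()
        self.skip_tags = ALWAYS_DROP | extra
        self.skip_depth = 0  # greater than 0 while inside a dropped subtree
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.skip_tags:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.skip_tags and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside every dropped subtree.
        if self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def visible_text(html, drop_boilerplate=True):
    """Return the text that survives tag dropping, joined by spaces."""
    parser = TextExtractor(drop_boilerplate)
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

Calling `visible_text(page)` keeps only the article text, while `visible_text(page, drop_boilerplate=False)` also keeps the text inside `nav`, `footer`, `aside`, and `form`. Script and style content is skipped in both cases.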

Input

{
  "documents": ["<html><body><h1>Guide</h1><p>Hello <strong>world</strong></p></body></html>"],
  "drop_boilerplate": true,
  "include_links": true,
  "include_images": true
}
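
As a rough sketch of how this input could be submitted, the snippet below builds (but does not send) a request for the Apify API's synchronous run endpoint using only the standard library. The Actor ID is derived from the store URL on the assumption that the usual `username~actor-name` form applies, and `build_run_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Assumption: Actor ID taken from the store URL, with "/" written as "~".
ACTOR_ID = "shoebill-dev27~html-to-markdown"
API_URL = f"https://api.apify.com/v2/acts/{ACTOR_ID}/run-sync-get-dataset-items"


def build_run_request(documents, token, drop_boilerplate=True,
                      include_links=True, include_images=True):
    """Build a POST request carrying the Actor input shown above."""
    payload = {
        "documents": list(documents),
        "drop_boilerplate": drop_boilerplate,
        "include_links": include_links,
        "include_images": include_images,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen(request)` and decoding the response body as JSON should yield the dataset items. That step is left out here because it needs a valid API token and network access.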

Output (one dataset item per document)

{
  "title": "Guide",
  "markdown": "# Guide\n\nHello **world**",
  "text": "Guide\n\nHello world",
  "stats": {"blocks": 2, "characters": 26, "words": 3},
  "document_index": 0
}
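
Because each item carries a `document_index`, results can be matched back to their source HTML. The sketch below assumes that `document_index` is the zero-based position of the document in the input `documents` array, which the single-document example suggests but does not confirm. `pair_by_index` is a hypothetical helper.

```python
def pair_by_index(documents, items):
    """Return (source_html, item) pairs in input order.

    Assumes item["document_index"] is the zero-based position of the
    source document in `documents`. A document with no item gets None.
    """
    by_index = {item["document_index"]: item for item in items}
    return [(doc, by_index.get(i)) for i, doc in enumerate(documents)]
```

Keying on the index rather than relying on dataset order keeps the pairing stable even if items arrive out of order or a document produces no item.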

Usage

Feed it raw HTML from any crawler run, then chunk the resulting Markdown for RAG, index the plain text for search, or store the Markdown directly.
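
For the RAG case, one simple way to chunk the returned Markdown is to split it at headings. This is a minimal sketch of that idea, not the behaviour of RAG Text Chunker or any other Actor. `chunk_by_heading` is a hypothetical name.

```python
import re

# Matches an ATX heading marker ("# " through "###### ") at the start of a line.
HEADING = re.compile(r"^#{1,6} ", re.MULTILINE)


def chunk_by_heading(markdown):
    """Split Markdown into chunks that each start at a heading.

    Any text before the first heading becomes its own chunk.
    """
    starts = [match.start() for match in HEADING.finditer(markdown)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)
    bounds = starts + [len(markdown)]
    chunks = [markdown[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    return [chunk for chunk in chunks if chunk]
```

Note that this sketch does not look inside fenced code blocks, so a code comment line beginning with `# ` would be treated as a heading. A production chunker would need to track fences.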
