HTML to Markdown: clean conversion, boilerplate stripping
Pricing
from $2.00 / 1,000 results
Convert scraped HTML into clean Markdown and plain text: headings, nested lists, links, images, code blocks, blockquotes, and tables. Drops scripts, styles, and structural boilerplate (nav/footer/aside) so only content remains. Pure parsing, no LLM cost.
HTML to Markdown
Convert scraped HTML into clean Markdown and plain text: pure parsing, no LLM cost. Pairs well with crawlers upstream and with Doc Structure Extractor or RAG Text Chunker downstream.
What it does
- Headings, paragraphs, nested lists, links, images, emphasis, inline code, fenced code blocks, blockquotes, simple tables, horizontal rules.
- Always drops <script>, <style> and other non-content tags; drops structural boilerplate (nav, footer, aside, form) by default so only the article content remains.
- Extracts the page title (<title>, falling back to the first <h1>).
- Also returns a plain-text rendering and basic stats.
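The dropping and title-fallback rules above can be illustrated with a minimal sketch built on Python's standard html.parser. This is not the Actor's implementation: the class and function names are hypothetical, only <script> and <style> stand in for the "non-content tags", and the output is flat text rather than Markdown.

```python
from html.parser import HTMLParser

# Tag sets taken from the feature list above; the real Actor may drop more.
ALWAYS_DROP = {"script", "style"}
BOILERPLATE = {"nav", "footer", "aside", "form"}


class ContentExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, skipping dropped subtrees."""

    def __init__(self, drop_boilerplate=True):
        super().__init__(convert_charrefs=True)
        self.skip = ALWAYS_DROP | (BOILERPLATE if drop_boilerplate else set())
        self.depth = 0        # > 0 while inside a dropped subtree
        self.current = None   # "title" or "h1" while inside one of them
        self.title_tag = ""
        self.first_h1 = None
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.skip:
            self.depth += 1
        elif self.depth == 0 and tag in ("title", "h1"):
            self.current = tag

    def handle_endtag(self, tag):
        if tag in self.skip:
            self.depth = max(0, self.depth - 1)
        elif tag == self.current:
            self.current = None

    def handle_data(self, data):
        text = data.strip()
        if self.depth > 0 or not text:
            return
        if self.current == "title":
            self.title_tag += text   # <title> text is not body content
            return
        if self.current == "h1" and self.first_h1 is None:
            self.first_h1 = text     # simplification: first text run only
        self.chunks.append(text)


def extract(html, drop_boilerplate=True):
    """Return (title, text), preferring <title> and falling back to the first <h1>."""
    parser = ContentExtractor(drop_boilerplate)
    parser.feed(html)
    parser.close()
    title = parser.title_tag or parser.first_h1 or ""
    return title, " ".join(parser.chunks)
```

Calling extract(html, drop_boilerplate=False) keeps the nav, footer, aside, and form text while still skipping scripts and styles, which mirrors the "always" versus "by default" distinction in the list.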
Input
{"documents":["<html><body><h1>Guide</h1><p>Hello <strong>world</strong></p></body></html>"],"drop_boilerplate":true,"include_links":true,"include_images":true}
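As a rough sketch, an input payload of this shape could be assembled in Python before starting a run. The field names come from the example above; the helper name build_input is hypothetical, and the True defaults for include_links and include_images are an assumption (only drop_boilerplate is described as on by default).

```python
import json


def build_input(html_documents, drop_boilerplate=True,
                include_links=True, include_images=True):
    """Hypothetical helper: build the input object for a batch of HTML strings."""
    return {
        "documents": list(html_documents),
        "drop_boilerplate": drop_boilerplate,
        # Defaults for the two flags below are assumed, not documented.
        "include_links": include_links,
        "include_images": include_images,
    }


payload = build_input(
    ["<html><body><h1>Guide</h1><p>Hello <strong>world</strong></p></body></html>"]
)
payload_json = json.dumps(payload)  # serialized form to submit as the run input
```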
Output (one dataset item per document)
{"title":"Guide","markdown":"# Guide\n\nHello **world**","text":"Guide\n\nHello world","stats":{"blocks":2,"characters":26,"words":3},"document_index":0}
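Because each item carries a document_index, results can be matched back to whatever produced the HTML. The sketch below assumes document_index is the zero-based position of the document in the input documents array, which the example suggests but does not state outright; attach_sources and source_urls are hypothetical names.

```python
def attach_sources(items, source_urls):
    """Pair each output item with the URL of the document it came from.

    Assumes item["document_index"] indexes into the original input list.
    """
    paired = []
    for item in sorted(items, key=lambda it: it["document_index"]):
        index = item["document_index"]
        paired.append({
            "url": source_urls[index],
            "title": item["title"],
            "markdown": item["markdown"],
            "words": item["stats"]["words"],
        })
    return paired
```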
Usage
Feed it raw HTML from any crawler run, then chunk the resulting Markdown for RAG, index the plain text for search, or store the Markdown directly.
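For the RAG case, one simple way to chunk the returned Markdown is to split it at headings. The following is a minimal sketch, not the RAG Text Chunker's behavior: chunk_by_heading is a hypothetical helper that splits on ATX headings (lines starting with #) and ignores # lines inside fenced code blocks.

```python
import re

# ATX heading: one to six '#' characters, whitespace, then the heading text.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_by_heading(markdown):
    """Split Markdown into (heading, body) sections at ATX headings."""
    sections = []
    heading, body = "", []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # toggle on opening and closing fences
        match = None if in_fence else HEADING.match(line)
        if match:
            # Close the previous section if it has a heading or any content.
            if heading or any(l.strip() for l in body):
                sections.append((heading, "\n".join(body).strip()))
            heading, body = match.group(2).strip(), []
        else:
            body.append(line)
    if heading or any(l.strip() for l in body):
        sections.append((heading, "\n".join(body).strip()))
    return sections
```

Each (heading, body) pair can then be embedded or indexed on its own, with the heading kept as lightweight context for the chunk.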
