URL to Markdown (JustHTML) - Clean Markdown Extractor

Pricing

Pay per usage

Convert webpages to clean Markdown for RAG and archiving. Uses JustHTML and supports optional Cloudflare/Turnstile bypass plus CSS selector extraction.

Rating

5.0

(1)

Developer

Anass

Maintained by Community

Last modified

5 months ago

Link to Markdown (JustHTML + Cloudflare Bypass)

🔗 URL → 🧼 Clean Markdown • 🛡️ Optional bypass • 🎯 CSS selector

Convert web links into clean Markdown for RAG, archiving, content pipelines, and AI agents.

This Actor fetches a URL, optionally bypasses Cloudflare challenges using the same Camoufox-based open-source bypass approach used in this repository, and converts the resulting HTML to Markdown with JustHTML (a pure-Python HTML5 parser with built-in safe output).

Keywords

link to markdown, html to markdown, webpage to markdown, url to markdown, cloudflare bypass, turnstile, anti-bot, RAG, LLM, AI agent, markdown extractor

Why this Actor (SEO)

If you need a dependable URL → Markdown converter for RAG pipelines, you usually hit three problems:

Broken or messy HTML that produces garbage Markdown
Heavy JavaScript pages that hide the real content
Anti-bot / Cloudflare interstitials that block simple fetchers

This Actor is built to be a practical extractor for AI agents, vector databases, knowledge bases, and content archiving workflows.

Common use cases

Convert product docs pages into Markdown for RAG
Build internal knowledge base snapshots from URLs
Extract “article” content with a CSS selector (main, article, .content)
Prepare clean Markdown for embedding/search indexing

Tips for better extraction

Set selector to target the content container (article, main, .markdown-body)
Use includeHtml=true only when debugging extraction
Keep safe=true when ingesting untrusted pages into downstream systems

What you get

Markdown output per URL (optionally for a specific CSS selector like article, main, or .markdown-body)
Safe-by-default sanitization for untrusted HTML
Optional Cloudflare challenge bypass fallback when normal fetching fails
Dataset output suitable for exporting to JSON/CSV
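
The "bypass only when normal fetching fails" behavior can be pictured roughly as follows. This is a minimal sketch, not the Actor's actual code: `fetch_with_fallback`, `looks_blocked`, the marker strings, and the two fetcher callables are all hypothetical names introduced for illustration, and the real detection logic may differ.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FetchResult:
    html: str
    status_code: int
    bypassed: bool


# Hypothetical markers: real challenge pages vary, this list is illustrative only.
CHALLENGE_MARKERS = ("cf-chl", "challenge-platform", "Just a moment...")


def looks_blocked(status_code: int, html: str) -> bool:
    """Heuristic: treat 403/429/503 or a known challenge marker as a block."""
    if status_code in (403, 429, 503):
        return True
    return any(marker in html for marker in CHALLENGE_MARKERS)


def fetch_with_fallback(
    url: str,
    plain_fetch: Callable[[str], tuple[int, str]],
    bypass_fetch: Optional[Callable[[str], tuple[int, str]]] = None,
) -> FetchResult:
    """Try the plain fetcher first; call the bypass fetcher only if blocked."""
    status, html = plain_fetch(url)
    # No bypass configured (useCloudflareBypass=false) or the page looks fine.
    if bypass_fetch is None or not looks_blocked(status, html):
        return FetchResult(html=html, status_code=status, bypassed=False)
    status, html = bypass_fetch(url)
    return FetchResult(html=html, status_code=status, bypassed=True)
```

The `bypassed` flag in this sketch mirrors the `bypassed` field reported in each dataset item, so downstream code can tell which pages needed the slower path.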

Input

urls (array) or url (string)
selector (string, optional)
safe (boolean, default: true)
useCloudflareBypass (boolean, default: true)
bypassCache (boolean, default: false)
proxyUrl (string, optional)
includeHtml (boolean, default: false)
maxConcurrency (int, default: 2)
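
To make the defaults above concrete, here is one way the raw input could be normalized before use. It is a sketch under assumptions: the `ActorInput` dataclass and `parse_input` helper are hypothetical, and merging `url` into `urls` when both are given is a guess at reasonable behavior rather than something this README specifies.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ActorInput:
    urls: list[str]
    selector: Optional[str] = None
    safe: bool = True
    use_cloudflare_bypass: bool = True
    bypass_cache: bool = False
    proxy_url: Optional[str] = None
    include_html: bool = False
    max_concurrency: int = 2


def parse_input(raw: dict[str, Any]) -> ActorInput:
    """Apply the documented defaults and accept either `urls` or `url`."""
    urls = list(raw.get("urls") or [])
    single = raw.get("url")
    if single and single not in urls:
        urls.append(single)  # assumption: a lone `url` is merged into `urls`
    if not urls:
        raise ValueError("Provide either 'urls' (array) or 'url' (string)")
    return ActorInput(
        urls=urls,
        selector=raw.get("selector") or None,
        safe=bool(raw.get("safe", True)),
        use_cloudflare_bypass=bool(raw.get("useCloudflareBypass", True)),
        bypass_cache=bool(raw.get("bypassCache", False)),
        proxy_url=raw.get("proxyUrl") or None,
        include_html=bool(raw.get("includeHtml", False)),
        # Clamp to at least 1 so a bad value cannot stall the run.
        max_concurrency=max(1, int(raw.get("maxConcurrency", 2))),
    )
```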

Output (dataset items)

Each item contains:

url, finalUrl
status (success or failed)
title
markdown
statusCode, contentType
bypassed (boolean)
error (string, if failed)
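
A consumer of the dataset will typically separate successful items from failed ones. The sketch below writes each successful item's `markdown` to a file and collects failures for review. The `slugify` and `save_markdown` helpers and the file-naming scheme are illustrative assumptions; only the field names come from the list above.

```python
import re
from pathlib import Path
from typing import Any, Iterable


def slugify(url: str) -> str:
    """Turn a URL into a filesystem-friendly name (illustrative scheme)."""
    slug = re.sub(r"^https?://", "", url)
    slug = re.sub(r"[^A-Za-z0-9]+", "-", slug).strip("-").lower()
    return slug or "page"


def save_markdown(
    items: Iterable[dict[str, Any]], out_dir: Path
) -> tuple[list[Path], list[dict[str, Any]]]:
    """Write one .md file per successful item; return (written, failed)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    failed: list[dict[str, Any]] = []
    for item in items:
        if item.get("status") != "success":
            failed.append({"url": item.get("url"), "error": item.get("error")})
            continue
        # Prefer finalUrl so redirected pages are named after where they landed.
        name = slugify(item.get("finalUrl") or item["url"])
        path = out_dir / f"{name}.md"
        path.write_text(item.get("markdown") or "", encoding="utf-8")
        written.append(path)
    return written, failed
```

Two items that resolve to the same `finalUrl` would overwrite each other under this naming scheme, so a real pipeline may want to add a counter or hash.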

Example input

{
  "urls": [
    "https://github.com/EmilStenstrom/justhtml"
  ],
  "selector": ".markdown-body",
  "safe": true,
  "useCloudflareBypass": true
}
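
The same input can be sent from Python with the `apify-client` package. This is a hedged sketch: the Actor ID is inferred from the store URL, the `APIFY_TOKEN` environment variable name is an assumption about your setup, and `run_actor` and `main` are hypothetical helper names. The client is passed in as a parameter so the function can be exercised without network access.

```python
import os
from typing import Any

# Inferred from the Actor's store URL; adjust if you run your own copy.
ACTOR_ID = "macheta/justhtml-link-to-markdown"

EXAMPLE_INPUT: dict[str, Any] = {
    "urls": ["https://github.com/EmilStenstrom/justhtml"],
    "selector": ".markdown-body",
    "safe": True,
    "useCloudflareBypass": True,
}


def run_actor(client: Any, run_input: dict[str, Any]) -> list[dict[str, Any]]:
    """Start the Actor, wait for it to finish, and return its dataset items."""
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return run details")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


def main() -> None:
    # Imported here so the module loads even without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])  # assumed env var name
    for item in run_actor(client, EXAMPLE_INPUT):
        print(item.get("status"), item.get("url"), item.get("title"))
```

Calling `main()` runs the Actor with the example input and prints one line per dataset item.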

Deploy to Apify

Install Apify CLI and log in
From this Actor directory, run:

$ apify push

Then publish from the Apify Console with a title/description similar to this README for strong discoverability:

Keywords: link to markdown, html to markdown, justhtml, cloudflare bypass, turnstile, RAG

Licensing

This Actor’s code in this repository follows the repository’s license.
JustHTML is vendored and distributed under its own license (see its LICENSE file).


URL: https://apify.com/macheta/justhtml-link-to-markdown