
URL: https://apify.com/hedelka/tech-docs-scraper

Tech Docs to Markdown for RAG & LLM · Apify


Pricing

$0.50 / 1,000 pages


Tech Docs to LLM-Ready Markdown

Scrapes technical documentation sites (Docusaurus, GitBook, MkDocs, ReadTheDocs) and converts them to clean, structured Markdown for RAG pipelines, LLM training, and AI assistants. Automatically detects documentation framework and removes navigation elements.
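The listing does not say how framework detection works. One common heuristic, shown here purely as an assumption, is to read the page's `<meta name="generator">` tag, which Docusaurus and MkDocs sites typically emit. `GENERATOR_HINTS` and `detect_framework` are hypothetical names, and GitBook or ReadTheDocs pages may not carry such a tag at all, so a real detector would likely need extra DOM checks on top of this sketch.

```python
from html.parser import HTMLParser

# Hypothetical hint table: substring to look for in the generator tag -> label.
# Real sites may omit or customize this tag, so treat these as assumptions.
GENERATOR_HINTS = {
    "docusaurus": "Docusaurus",
    "mkdocs": "MkDocs",
    "sphinx": "Sphinx/ReadTheDocs",
    "gitbook": "GitBook",
}


class _GeneratorMetaParser(HTMLParser):
    """Collects the content of every <meta name="generator"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.generators: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        if (attr_map.get("name") or "").lower() == "generator":
            self.generators.append(attr_map.get("content") or "")


def detect_framework(html: str) -> str:
    """Return a framework label guessed from generator meta tags, or 'unknown'."""
    parser = _GeneratorMetaParser()
    parser.feed(html)
    parser.close()
    for content in parser.generators:
        lowered = content.lower()
        for needle, name in GENERATOR_HINTS.items():
            if needle in lowered:
                return name
    return "unknown"
```

For example, a page whose head contains `<meta name="generator" content="Docusaurus v3.1.0">` is labeled `Docusaurus`, while a page with no generator tag falls through to `unknown`.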

Rating: 0.0 (0 ratings)

Developer: Dmitry Goncharov

Maintained by Community

Actor stats

Bookmarked: 1
Total users: 25
Monthly active users: 2
Last modified: 4 months ago

Tech Docs to LLM-Ready Markdown Scraper

🚀 Convert any technical documentation site to clean, structured Markdown, ready for RAG pipelines, LLM training, and AI assistants.

Why This Actor?

While generic web scrapers dump raw HTML, this Actor is specifically designed for technical documentation:

| Feature | Generic scrapers | This Actor |
| --- | --- | --- |
| Code block preservation | ❌ Lost or broken | ✅ With language tags |
| Framework-aware extraction | ❌ One-size-fits-all | ✅ Docusaurus, GitBook, MkDocs |
| Navigation removal | ❌ Mixed with content | ✅ Clean content only |
| RAG-ready output | ❌ Needs post-processing | ✅ doc_id, section_path, chunking |
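The output schema is not documented beyond the field names `doc_id` and `section_path`. As a rough sketch, assuming `doc_id` is a stable hash of the page URL and `section_path` is the heading breadcrumb, heading-based chunking that never splits inside a fenced code block could look like this. `Chunk`, `make_doc_id`, and `chunk_markdown` are hypothetical names, not this Actor's API.

```python
import hashlib
import re
from dataclasses import dataclass

# A Markdown ATX heading: 1-6 '#' characters, a space, then the title.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


@dataclass
class Chunk:
    doc_id: str
    section_path: list[str]
    text: str


def make_doc_id(url: str) -> str:
    # Hypothetical scheme: short, stable SHA-256 prefix of the page URL.
    return hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]


def chunk_markdown(url: str, markdown: str) -> list[Chunk]:
    """Split Markdown at headings, tagging each chunk with its heading path."""
    doc_id = make_doc_id(url)
    chunks: list[Chunk] = []
    path: list[str] = []
    buffer: list[str] = []
    in_fence = False

    def flush() -> None:
        text = "\n".join(buffer).strip()
        if text:
            chunks.append(Chunk(doc_id, list(path), text))
        buffer.clear()

    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            # Toggle fence state; '#' lines inside code are never headings.
            in_fence = not in_fence
            buffer.append(line)
            continue
        match = None if in_fence else HEADING_RE.match(line)
        if match:
            flush()
            level = len(match.group(1))
            del path[level - 1:]  # drop headings at this depth or deeper
            path.append(match.group(2))
        else:
            buffer.append(line)
    flush()
    return chunks
```

Keeping the fence check ahead of the heading check is what lets a shell comment such as `# install` inside a code block stay part of the chunk instead of being mistaken for a new section.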

πŸ”„ Before / After

You might also like

Documentation Crawler for RAG

liquid_bark/docs-crawler-for-rag

Specialized crawler for developer documentation sites. Detects frameworks (Docusaurus, GitBook, ReadTheDocs, MkDocs, Sphinx), extracts clean content, and outputs semantically chunked Markdown optimized for RAG pipelines.

Docs-to-RAG Crawler

automation-lab/docs-rag-crawler

Crawl documentation sites (ReadTheDocs, GitBook, Docusaurus, Mintlify) into RAG-ready Markdown/JSON chunks with stable chunk IDs, heading breadcrumbs, word counts, and token estimates.

πŸ‘ User avatar

Stas Persiianenko

7

RAG Knowledge Loader

botflowtech/rag-knowledge-loader

Scrapes documentation sites (GitBook, ReadTheDocs, Notion public pages) and converts them into vector-ready JSON format for RAG applications.

RAG-Ready Documentation Scraper

alaricus/rag-docs-markdown-scraper

Scrape documentation to framework-optimized Markdown. Features semantic chunking for LLM, vector database, and RAG pipelines. Parse XML sitemaps easily.

Docs Markdown Rag Ready Crawler

devwithbobby/docs-markdown-rag-ready-crawler

Turn any documentation site or website into clean, structured Markdown, ready for RAG, embeddings, and AI agents.

πŸ‘ User avatar

Dev with Bobby

11

Website To Markdown

smart_api/website-to-markdown

Convert any webpage into clean, LLM-ready Markdown in seconds, perfect for AI training data, RAG pipelines, and content archiving.

LLM Markdown Crawler

sleek_waveform/llm-markdown-crawler

Crawl any website and extract clean, boilerplate-free Markdown optimized for LLMs, RAG pipelines, and AI training datasets. Uses Mozilla Readability to strip navigation and ads, then converts to clean Markdown. No browser required: fast and cheap.

πŸ‘ User avatar

Daniel Dimitrov

4

Website to Markdown Crawler for LLM & RAG

logiover/website-text-markdown-crawler

Crawl any website to clean Markdown and plain text for LLM training and RAG. HTML to Markdown, no API or login. Export website text to CSV or JSON.