URL: https://apify.com/mickeywmoore/ai-rag-feeder-v2

AI RAG Feeder: Convert Websites to Clean Markdown for LLMs · Apify


Pricing

$1.00 / 1,000 pages

Turn any website into AI-ready Markdown. Scrapes entire domains, removes ads/clutter, and formats text specifically for RAG pipelines and LLM training data.

Rating

0.0

(0)

Developer

Mickey Moore

Maintained by Community

Actor stats

0

Bookmarked

9

Total users

1

Monthly active users

4 months ago

Last modified

AI RAG Feeder V2 is a specialized scraper designed to feed data into LLM (Large Language Model) and RAG (Retrieval-Augmented Generation) pipelines. It navigates websites and converts the HTML content into clean, token-efficient Markdown.

✨ Features

  • Clean Markdown Extraction: Automatically removes ads, navbars, and footers to save tokens.
  • Recursive Crawling: Can follow links to scrape entire documentation sites.
  • Smart Formatting: Preserves headers, code blocks, and tables for better embedding quality.
  • Proxy Support: Built-in rotation to avoid IP blocking.
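
To make the first and third features concrete, here is a minimal sketch of boilerplate-free Markdown extraction using only the Python standard library. It is an illustration of the general technique, not the actor's actual implementation: the set of skipped tags, the class name `MarkdownExtractor`, and the function `html_to_markdown` are all assumptions, and it handles only headings and paragraphs.

```python
from html.parser import HTMLParser

# Assumption: these tags are treated as clutter; the actor's real rules are not documented here.
SKIP_TAGS = {"nav", "footer", "aside", "script", "style"}
HEADINGS = {"h1": "#", "h2": "##", "h3": "###"}


class MarkdownExtractor(HTMLParser):
    """Hypothetical helper: collects headings and paragraphs as Markdown blocks."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a skipped element
        self.prefix = ""     # Markdown prefix for the block being collected
        self.buffer = []     # text fragments of the block being collected
        self.blocks = []     # finished Markdown blocks

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in HEADINGS:
            self.flush()
            self.prefix = HEADINGS[tag] + " "
        elif self.skip_depth == 0 and tag == "p":
            self.flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and (tag in HEADINGS or tag == "p"):
            self.flush()

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.buffer.append(data)

    def flush(self):
        # Collapse runs of whitespace so the output stays token-efficient.
        text = " ".join("".join(self.buffer).split())
        if text:
            self.blocks.append(self.prefix + text)
        self.buffer = []
        self.prefix = ""


def html_to_markdown(html):
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()
    return "\n\n".join(parser.blocks)


sample = "<nav><a href='/'>Home</a></nav><h1>Docs</h1><p>Clean   text.</p><footer>Footer</footer>"
print(html_to_markdown(sample))
```

The navigation and footer text never reach the output, which is where the token savings come from. A production converter would also need to handle lists, tables, and code blocks.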

🚀 How to use

  1. Start URLs: Enter the list of URLs you want to scrape.
  2. Max Depth: Set how many link hops the crawler follows from each start URL (e.g., 0 scrapes only the start page, 1 also scrapes the pages it links to directly).
  3. Run: The actor will output a JSON dataset ready for vector databases.
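
The Max Depth setting can be pictured as a breadth-first traversal that stops following links once the depth limit is reached. The sketch below assumes that interpretation; the function `crawl_order` and the in-memory `LINKS` graph are hypothetical stand-ins for real page fetching, not part of the actor.

```python
from collections import deque


def crawl_order(start_urls, links, max_depth):
    """Return (url, depth) pairs reachable within max_depth link hops."""
    seen = set()
    queue = deque()
    for url in start_urls:
        if url not in seen:
            seen.add(url)
            queue.append((url, 0))

    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append((url, depth))
        if depth >= max_depth:
            continue  # depth limit reached: do not follow this page's links
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return visited


# Hypothetical link graph standing in for a small documentation site.
LINKS = {
    "https://example.com/docs": [
        "https://example.com/docs/a",
        "https://example.com/docs/b",
    ],
    "https://example.com/docs/a": ["https://example.com/docs/a/deep"],
}

print(crawl_order(["https://example.com/docs"], LINKS, max_depth=0))
print(crawl_order(["https://example.com/docs"], LINKS, max_depth=1))
```

With `max_depth=0` only the start page is returned; with `max_depth=1` the two directly linked pages are added, and the deeper page is reached only at `max_depth=2`.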

📦 Output

The results are stored in the default Apify dataset. Each item contains:

{
  "url": "https://example.com/docs",
  "title": "Documentation",
  "markdown": "# Documentation\n\nThis is the clean text...",
  "metadata": {"depth": 1}
}
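
Before loading items into a vector database, it is common to split each page into smaller chunks. The following sketch assumes items shaped like the example above and splits the `markdown` field at headings; the function names and the chunk fields (`id`, `text`, `source`) are illustrative choices, not something the actor produces.

```python
import re


def split_by_heading(markdown):
    """Split Markdown into sections, starting a new one at each heading line."""
    sections, current = [], []
    for line in markdown.splitlines():
        if re.match(r"#{1,6} ", line) and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())
    return [section for section in sections if section]


def chunk_item(item):
    """Turn one dataset item into chunk records ready for embedding."""
    return [
        {
            "id": f'{item["url"]}#chunk-{index}',  # hypothetical ID scheme
            "text": section,
            "source": item["url"],
            "title": item["title"],
            "depth": item["metadata"]["depth"],
        }
        for index, section in enumerate(split_by_heading(item["markdown"]))
    ]


item = {
    "url": "https://example.com/docs",
    "title": "Documentation",
    "markdown": "# Documentation\n\nThis is the clean text...\n\n## Setup\n\nInstall it.",
    "metadata": {"depth": 1},
}

for chunk in chunk_item(item):
    print(chunk["id"], "->", chunk["text"][:30])
```

One limitation of this simple splitter: a line starting with `# ` inside a fenced code block would be treated as a heading, so a more careful version would track code fences.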

You might also like

Web-to-Markdown Generator for AI & RAG Pipelines

profitstack/web-to-markdown-generator-for-ai-rag-pipelines

Convert any website into clean, LLM-ready Markdown with heading-based chunking for RAG and AI agents.

Website To Markdown

smart_api/website-to-markdown

Convert any webpage into clean, LLM-ready Markdown in seconds, perfect for AI training data, RAG pipelines, and content archiving.

Docs Markdown Rag Ready Crawler

devwithbobby/docs-markdown-rag-ready-crawler

Turn any documentation site or website into clean, structured markdown, ready for RAG, embeddings, and AI agents.

Web Scraper RAG Ready

traorealexy/Web-Sraper-RAG-Ready

Turn any website into clean, token-efficient Markdown ready for RAG and LLM pipelines. Removes boilerplate, handles JavaScript rendering, and outputs structured JSON for LangChain, LlamaIndex, and vector databases.

Website to Markdown Crawler for LLM & RAG

logiover/website-text-markdown-crawler

Crawl any website to clean Markdown and plain text for LLM training and RAG. HTML to Markdown, no API or login. Export website text to CSV or JSON.

AI Training Data Curator

ryanclinton/ai-training-data-curator

Crawl any website and extract clean, structured text data ready for LLM fine-tuning, RAG pipelines, and AI model training.