URL: https://apify.com/abotapi/web-scraper-for-llms

Web Scraper For Llms · Apify


Pricing

from $1.00 / 1,000 results

Web Scraper For Llms

Rating: 0.0 (0)

Developer: AbotAPI (maintained by Community)

Actor stats: 0 bookmarked, 36 total users, 10 monthly active users, last modified 15 days ago

Stealth web scraping engine built for LLMs. Converts any web page to clean markdown or HTML, ready for RAG pipelines, AI knowledge bases, and content analysis. Automatically bypasses Cloudflare and anti-bot protection using a stealth browser with undetectable fingerprints.

Quick Start

Scrape a list of URLs:

{
  "urls": ["https://example.com", "https://medium.com/"]
}
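
One way to submit this input programmatically is through Apify's REST API. The sketch below only builds the HTTP request with the Python standard library and does not send it. It assumes the synchronous `run-sync-get-dataset-items` endpoint, an actor ID written as `abotapi~web-scraper-for-llms` (derived from the store URL), and a placeholder token; `build_run_request` is a hypothetical helper, not part of the actor.

```python
import json
import urllib.request

# Assumption: actor ID derived from the store URL, with "/" written as "~"
# as the Apify REST API expects for username~actor-name identifiers.
ACTOR_ID = "abotapi~web-scraper-for-llms"
API_BASE = "https://api.apify.com/v2"


def build_run_request(token: str, urls: list[str]) -> urllib.request.Request:
    """Build (but do not send) a POST request that runs the actor synchronously."""
    endpoint = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items"
    body = json.dumps({"urls": urls}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # placeholder token, supply your own
        },
    )


request = build_run_request(
    "YOUR_APIFY_TOKEN",
    ["https://example.com", "https://medium.com/"],
)
```

Passing `request` to `urllib.request.urlopen` and decoding the response with `json.load` should yield the dataset items as a JSON array, one record per scraped URL, provided the token is valid and the run finishes within the endpoint's time limit.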

Crawl a website and scrape all discovered pages:

{
  "urls": ["https://docs.example.com"],
  "crawl": true,
  "crawlDepth": 2,
  "crawlMaxPages": 50
}
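
To make the two crawl limits concrete, here is a minimal sketch of a depth- and page-limited crawl written as a breadth-first traversal. It runs on a hypothetical in-memory link graph (`LINKS`) rather than real pages. It assumes that `crawlDepth` counts link hops from the seed and that `crawlMaxPages` caps the pages discovered per seed; whether the seed itself counts toward the cap is also an assumption, and the actor's actual traversal order is not documented on this page.

```python
from collections import deque

# Hypothetical link graph standing in for real pages: URL -> outgoing links.
LINKS = {
    "https://docs.example.com": [
        "https://docs.example.com/a",
        "https://docs.example.com/b",
    ],
    "https://docs.example.com/a": ["https://docs.example.com/a/1"],
    "https://docs.example.com/b": ["https://docs.example.com/a"],
    "https://docs.example.com/a/1": ["https://docs.example.com/a/1/x"],
}


def discover(seed: str, crawl_depth: int, crawl_max_pages: int) -> list[str]:
    """Breadth-first discovery limited by hop count and total page count."""
    seen = {seed}
    order = [seed]  # assumption: the seed counts toward the page cap
    queue = deque([(seed, 0)])
    while queue and len(order) < crawl_max_pages:
        url, depth = queue.popleft()
        if depth == crawl_depth:
            continue  # do not follow links beyond the hop limit
        for link in LINKS.get(url, []):
            if link in seen:
                continue
            if len(order) >= crawl_max_pages:
                break
            seen.add(link)
            order.append(link)
            queue.append((link, depth + 1))
    return order


pages = discover("https://docs.example.com", crawl_depth=2, crawl_max_pages=50)
```

With this toy graph, a depth of 2 reaches `/a/1` but not `/a/1/x`, which sits three hops from the seed.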

Input Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| urls | Array | required | URLs to scrape or crawl from |
| crawl | Boolean | false | Follow links to discover additional pages |
| crawlDepth | Integer | 1 | Link hops from seed URL (crawl only) |
| crawlMaxPages | Integer | 20 | Max pages to discover per seed (crawl only) |
| formats | Array | ["markdown"] | Output formats: markdown, html, or both |
| concurrency | Integer | 3 | Parallel URL processing |
| maxRetries | Integer | 2 | Retry attempts for failed URLs (scrape only) |
| timeoutMs | Integer | 30000 | Timeout per URL in milliseconds |
| onlyMainContent | Boolean | true | Strip nav/header/footer/sidebar (scrape only) |
| removeAds | Boolean | true | Remove ads and tracking elements |
| removeBase64Images | Boolean | true | Remove inline base64 images |
| includeTags | Array | - | CSS selectors to keep (scrape only) |
| excludeTags | Array | - | CSS selectors to remove (scrape only) |
| includePatterns | Array | - | Regex URL filters (include only matching) |
| excludePatterns | Array | - | Regex URL filters (skip matching) |
| waitForSelector | String | - | Wait for CSS selector before extraction (scrape only) |
| proxyConfiguration | Object | - | Apify proxy settings |
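
The parameters and defaults listed above can be assembled into a run input on the client side. The following is a minimal sketch, not the actor's own validation: `build_input`, `DEFAULTS`, `OPTIONAL`, and `ALLOWED_FORMATS` are hypothetical names, the default values are copied from the parameter list, and rejecting an empty `urls` list or an unknown key is an assumption about what a caller would want.

```python
from typing import Any

# Defaults copied from the parameter list; parameters shown with "-" have none.
DEFAULTS: dict[str, Any] = {
    "crawl": False,
    "crawlDepth": 1,
    "crawlMaxPages": 20,
    "formats": ["markdown"],
    "concurrency": 3,
    "maxRetries": 2,
    "timeoutMs": 30000,
    "onlyMainContent": True,
    "removeAds": True,
    "removeBase64Images": True,
}
OPTIONAL = {
    "includeTags", "excludeTags", "includePatterns",
    "excludePatterns", "waitForSelector", "proxyConfiguration",
}
ALLOWED_FORMATS = {"markdown", "html"}


def build_input(urls: list[str], **overrides: Any) -> dict[str, Any]:
    """Hypothetical helper: merge overrides onto the documented defaults."""
    if not urls:
        raise ValueError("urls is required and must not be empty")
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    # Copy list-valued defaults so callers cannot mutate DEFAULTS.
    run_input = {k: list(v) if isinstance(v, list) else v for k, v in DEFAULTS.items()}
    run_input.update(overrides)
    run_input["urls"] = list(urls)
    bad_formats = set(run_input["formats"]) - ALLOWED_FORMATS
    if bad_formats:
        raise ValueError(f"unsupported formats: {sorted(bad_formats)}")
    return run_input


run_input = build_input(
    ["https://docs.example.com"],
    crawl=True,
    crawlDepth=2,
    formats=["markdown", "html"],
)
```

Sending the defaults explicitly is probably redundant, since the platform presumably applies them when a field is omitted; the benefit of a helper like this is catching misspelled parameter names before a paid run starts.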

Output

{
  "url": "https://medium.com/",
  "title": "Medium: Read and write stories.",
  "description": null,
  "markdown": "## Human stories & ideas\n\nA place to read, write, and deepen your understanding...",
  "html": null,
  "metadata": {
    "title": "Medium: Read and write stories.",
    "language": "en",
    "favicon": "https://miro.medium.com/...",
    "canonical": "https://medium.com/",
    "openGraph": null,
    "twitter": null
  },
  "duration": 5725,
  "scrapedAt": "2026-02-24T03:36:28.990Z",
  "success": true,
  "error": null
}
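
Because each record carries its own `success` and `error` fields, a consumer can separate usable pages from failures before passing text downstream. The sketch below shows one way to do that. The first sample item is abbreviated from the record above; the second is a hypothetical failure (its URL and error text are invented) included only to exercise the error path, and `split_results` is an illustrative helper name.

```python
from typing import Any, Optional

# Sample items shaped like the output record above (fields abbreviated).
items: list[dict[str, Any]] = [
    {
        "url": "https://medium.com/",
        "markdown": "## Human stories & ideas",
        "success": True,
        "error": None,
    },
    {
        # Hypothetical failed record; the real error format is not documented here.
        "url": "https://example.com/missing",
        "markdown": None,
        "success": False,
        "error": "timeout",
    },
]


def split_results(
    records: list[dict[str, Any]],
) -> tuple[dict[str, str], dict[str, Optional[str]]]:
    """Return (url -> markdown) for successes and (url -> error) for failures."""
    documents: dict[str, str] = {}
    failures: dict[str, Optional[str]] = {}
    for record in records:
        if record.get("success") and record.get("markdown"):
            documents[record["url"]] = record["markdown"]
        else:
            failures[record["url"]] = record.get("error") or "no markdown returned"
    return documents, failures


documents, failures = split_results(items)
```

Note that `markdown` can be null even on a successful record if only `html` was requested in `formats`, so a pipeline that asks for HTML only would need to check the `html` field instead.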

Use Cases

  • RAG pipelines - Feed clean markdown into LLM knowledge bases
  • Content monitoring - Track changes across a set of pages
  • Research - Bulk extract articles, documentation, or product pages
  • Site migration - Crawl and export an entire site as markdown
  • Data extraction - Scrape structured content from specific CSS selectors
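
For the RAG use case, the returned markdown usually needs to be split into chunks before embedding. A minimal sketch of one common approach, splitting at heading lines, is shown below. It is not something the actor does for you; `chunk_by_heading` is a hypothetical helper, and the sample text is made up apart from the heading borrowed from the output example.

```python
import re

# Matches ATX headings ("# " through "###### ") at the start of a line.
HEADING = re.compile(r"^#{1,6} ", re.MULTILINE)


def chunk_by_heading(markdown: str) -> list[str]:
    """Split markdown into chunks that each start at a heading line."""
    starts = [match.start() for match in HEADING.finditer(markdown)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any text that precedes the first heading
    bounds = starts + [len(markdown)]
    chunks = [markdown[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    return [chunk for chunk in chunks if chunk]


sample = (
    "Intro text\n\n"
    "## Human stories & ideas\n\nA place to read.\n\n"
    "## Pricing\n\nFree."
)
chunks = chunk_by_heading(sample)
```

This simple pattern will also split on lines inside code blocks that happen to start with `# `, so pages with embedded shell or Python snippets would need a fence-aware splitter.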
