Web Scraper For Llms

Pricing

from $1.00 / 1,000 results

Stealth web scraping engine built for LLMs. Converts any web page to clean markdown or HTML.

Rating: 0.0 (0)

Developer

AbotAPI

Maintained by Community

Last modified: 15 days ago

Quick Start

Scrape a list of URLs:

{
  "urls": ["https://example.com", "https://medium.com/"]
}
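
The same input can also be submitted from code. The following is a minimal sketch, not an official client for this Actor: it assumes the Actor ID `abotapi/web-scraper-for-llms` (taken from the store page URL), the `apify-client` Python package, and a valid API token. The helper names `build_scrape_input` and `run_scrape` are hypothetical, and the empty-list and URL-scheme checks are illustrative assumptions rather than documented Actor behavior.

```python
from typing import Any

# Assumed from the store page URL; not confirmed elsewhere on the page.
ACTOR_ID = "abotapi/web-scraper-for-llms"


def build_scrape_input(urls: list[str]) -> dict[str, Any]:
    """Build the minimal scrape input: only the required `urls` array."""
    # These checks are illustrative assumptions, not documented Actor rules.
    if not urls:
        raise ValueError("`urls` is required and must not be empty")
    for url in urls:
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"not an absolute http(s) URL: {url!r}")
    return {"urls": list(urls)}


def run_scrape(token: str, urls: list[str]) -> list[dict[str, Any]]:
    """Run the Actor and return its dataset items (needs network and a token)."""
    # Imported lazily so build_scrape_input works without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=build_scrape_input(urls))
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Calling `run_scrape("<YOUR_TOKEN>", ["https://example.com"])` would start a run and collect one item per URL, assuming the run finishes successfully.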

Crawl a website and scrape all discovered pages:

{
  "urls": ["https://docs.example.com"],
  "crawl": true,
  "crawlDepth": 2,
  "crawlMaxPages": 50
}
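
To make the two crawl limits concrete, here is a rough sketch of how a depth limit and a page limit could interact in a breadth-first crawl. It is an illustration only, not the Actor's actual algorithm: the function `crawl_order`, the in-memory `links` map, and the choice to count the seed URL toward the page limit are all assumptions.

```python
from collections import deque


def crawl_order(seed, links, crawl_depth=1, crawl_max_pages=20):
    """Breadth-first discovery from `seed`, bounded by hop count and page count.

    `links` maps a URL to the URLs it links to (a stand-in for real fetching).
    Assumption: the seed itself counts toward `crawl_max_pages`.
    """
    seen = {seed}
    order = [seed]
    queue = deque([(seed, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= crawl_depth:
            continue  # following links from here would exceed the hop limit
        for nxt in links.get(url, []):
            if nxt in seen:
                continue
            if len(order) >= crawl_max_pages:
                return order  # page budget for this seed is used up
            seen.add(nxt)
            order.append(nxt)
            queue.append((nxt, depth + 1))
    return order


# Hypothetical link graph for a small documentation site.
SITE = {
    "https://docs.example.com": [
        "https://docs.example.com/a",
        "https://docs.example.com/b",
    ],
    "https://docs.example.com/a": ["https://docs.example.com/a/1"],
    "https://docs.example.com/b": ["https://docs.example.com/a/1"],
    "https://docs.example.com/a/1": ["https://docs.example.com/a/1/x"],
}
```

With `crawl_depth=2` the sketch reaches `/a/1` (two hops from the seed) but never `/a/1/x`, which is three hops away.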

Input Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `urls` | Array | required | URLs to scrape or crawl from |
| `crawl` | Boolean | `false` | Follow links to discover additional pages |
| `crawlDepth` | Integer | `1` | Link hops from seed URL (crawl only) |
| `crawlMaxPages` | Integer | `20` | Max pages to discover per seed (crawl only) |
| `formats` | Array | `["markdown"]` | Output formats: `markdown`, `html`, or both |
| `concurrency` | Integer | `3` | Parallel URL processing |
| `maxRetries` | Integer | `2` | Retry attempts for failed URLs (scrape only) |
| `timeoutMs` | Integer | `30000` | Timeout per URL in milliseconds |
| `onlyMainContent` | Boolean | `true` | Strip nav/header/footer/sidebar (scrape only) |
| `removeAds` | Boolean | `true` | Remove ads and tracking elements |
| `removeBase64Images` | Boolean | `true` | Remove inline base64 images |
| `includeTags` | Array | - | CSS selectors to keep (scrape only) |
| `excludeTags` | Array | - | CSS selectors to remove (scrape only) |
| `includePatterns` | Array | - | Regex URL filters (include only matching) |
| `excludePatterns` | Array | - | Regex URL filters (skip matching) |
| `waitForSelector` | String | - | Wait for CSS selector before extraction (scrape only) |
| `proxyConfiguration` | Object | - | Apify proxy settings |
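
The defaults and the URL pattern filters above can be mirrored client-side, for example to preview which URLs a crawl would keep before paying for a run. The sketch below is an assumption-laden illustration: `with_defaults` and `url_allowed` are hypothetical helpers, and the use of `re.search` (match anywhere in the URL) is a guess, since the page does not say how the Actor applies the patterns.

```python
import copy
import re
from typing import Any, Optional

# Defaults copied from the parameter table; parameters without a default are omitted.
DEFAULTS: dict[str, Any] = {
    "crawl": False,
    "crawlDepth": 1,
    "crawlMaxPages": 20,
    "formats": ["markdown"],
    "concurrency": 3,
    "maxRetries": 2,
    "timeoutMs": 30000,
    "onlyMainContent": True,
    "removeAds": True,
    "removeBase64Images": True,
}


def with_defaults(user_input: dict[str, Any]) -> dict[str, Any]:
    """Return the input with documented defaults filled in for missing keys."""
    if not user_input.get("urls"):
        raise ValueError("`urls` is required")
    merged = copy.deepcopy(DEFAULTS)  # deep copy so list defaults are not shared
    merged.update(user_input)
    return merged


def url_allowed(
    url: str,
    include_patterns: Optional[list[str]] = None,
    exclude_patterns: Optional[list[str]] = None,
) -> bool:
    """Apply include/exclude regex filters to one URL.

    Assumption: a pattern matches if it is found anywhere in the URL.
    """
    if include_patterns and not any(re.search(p, url) for p in include_patterns):
        return False
    if exclude_patterns and any(re.search(p, url) for p in exclude_patterns):
        return False
    return True
```

For instance, `url_allowed(url, [r"/guide/"], [r"\.pdf$"])` keeps pages under `/guide/` while skipping PDF links, under the matching assumption stated above.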

Output

{
  "url": "https://medium.com/",
  "title": "Medium: Read and write stories.",
  "description": null,
  "markdown": "## Human stories & ideas\n\nA place to read, write, and deepen your understanding...",
  "html": null,
  "metadata": {
    "title": "Medium: Read and write stories.",
    "language": "en",
    "favicon": "https://miro.medium.com/...",
    "canonical": "https://medium.com/",
    "openGraph": null,
    "twitter": null
  },
  "duration": 5725,
  "scrapedAt": "2026-02-24T03:36:28.990Z",
  "success": true,
  "error": null
}
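
A consumer of these records will usually want to separate successful pages from failed ones and keep only a few fields. The following sketch works on records shaped like the example above. The helper names are hypothetical, and the shape of a failed record (`success` false with a message in `error`) is an assumption, since only a successful record is shown.

```python
from typing import Any


def split_results(items: list[dict[str, Any]]):
    """Partition records into successes and (url, error) pairs for failures."""
    ok, failed = [], []
    for item in items:
        if item.get("success"):
            ok.append(item)
        else:
            failed.append((item.get("url"), item.get("error")))
    return ok, failed


def to_document(item: dict[str, Any]) -> dict[str, Any]:
    """Reduce one successful record to the fields a downstream index might need."""
    meta = item.get("metadata") or {}
    return {
        "source": meta.get("canonical") or item["url"],
        "title": item.get("title") or meta.get("title"),
        "language": meta.get("language"),
        # `markdown` is null when only the html format was requested.
        "text": item.get("markdown") or "",
        "scraped_at": item.get("scrapedAt"),
    }


SAMPLE = {
    "url": "https://medium.com/",
    "title": "Medium: Read and write stories.",
    "markdown": "## Human stories & ideas\n\nA place to read, write...",
    "html": None,
    "metadata": {"language": "en", "canonical": "https://medium.com/"},
    "scrapedAt": "2026-02-24T03:36:28.990Z",
    "success": True,
    "error": None,
}
```

Here `split_results([SAMPLE])` puts the record in the success list, and `to_document(SAMPLE)` keeps its canonical URL, title, language, markdown, and timestamp.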

Use Cases

RAG pipelines - Feed clean markdown into LLM knowledge bases
Content monitoring - Track changes across a set of pages
Research - Bulk extract articles, documentation, or product pages
Site migration - Crawl and export an entire site as markdown
Data extraction - Scrape structured content from specific CSS selectors
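
For the RAG use case, the markdown output typically needs to be split into chunks before indexing. One simple approach, sketched here as an assumption rather than anything the Actor provides, is to split at headings and then cap each section at a character budget. The function `chunk_markdown` and the `max_chars` parameter are hypothetical, and the sketch does not treat `#` lines inside fenced code blocks specially.

```python
import re

# A markdown ATX heading: one to six '#' characters followed by text.
HEADING = re.compile(r"^#{1,6}\s+\S")


def chunk_markdown(markdown: str, max_chars: int = 1000) -> list[str]:
    """Split markdown at headings, then cut oversized sections by length."""
    sections, current = [], []
    for line in markdown.splitlines():
        if HEADING.match(line) and current:
            # A new heading starts a new section; flush the previous one.
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())

    chunks = []
    for section in sections:
        if not section:
            continue
        for start in range(0, len(section), max_chars):
            chunks.append(section[start:start + max_chars])
    return chunks
```

Each chunk can then be embedded and stored alongside the page `url` from the output record, so that answers can cite their source.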

URL: https://apify.com/abotapi/web-scraper-for-llms