Crawl4ai

Pricing

Pay per usage

Extract page content (markdown/HTML/text), metadata, and link stats. Uses crawl4ai.

Rating

0.0

(0)

Developer

Kael Odin

Maintained by Community

Last modified

3 months ago

Website Content Extractor

Apify Actor: extract page content (markdown/HTML/text), metadata, and link stats. Uses crawl4ai.

Quick start

pip install -e ".[dev]"
crawl4ai-setup
python -m crawl4ai_actor.main

Input: startUrls (required), maxPages, maxDepth, waitUntil, waitForSelector, cssSelector, etc. Full schema: .actor/input_schema.json.

Output: dataset with url, success, content, title, content_length, links_internal_count, etc. Run summary in Storage → Key-value store (runSummary), including failedUrls for retries.
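The retry workflow mentioned above can be sketched in Python. This is a minimal illustration, not the Actor's own code: it assumes each dataset item is a dict with the `url` and `success` fields named above and that `success` is a boolean. The helper names `failed_urls` and `retry_input` and the sample items are hypothetical.

```python
def failed_urls(items):
    """Collect the URLs of dataset items whose `success` field is falsy."""
    return [item["url"] for item in items if not item.get("success")]


def retry_input(items, **options):
    """Build a new run input from failed items, or None if nothing failed.

    `options` is passed through unchanged (e.g. waitUntil="networkidle").
    """
    urls = failed_urls(items)
    if not urls:
        return None
    return {"startUrls": urls, **options}


if __name__ == "__main__":
    # Made-up items shaped like the output fields listed above.
    sample = [
        {"url": "https://example.com/", "success": True, "content_length": 1200},
        {"url": "https://example.com/slow", "success": False, "content_length": 0},
    ]
    print(retry_input(sample, waitUntil="networkidle"))
```

Reading `failedUrls` from the `runSummary` record would serve the same purpose; filtering dataset items is shown here only because it needs no knowledge of the summary's exact layout.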

Options (high level)

| Option | Purpose |
| --- | --- |
| `crawlMode` | `full` (default) \| `discover_only` (URLs and links only, no content) |
| `includeLinkUrls` | Include `links_internal` / `links_external` arrays in each item |
| `waitUntil` | `domcontentloaded` \| `load` \| `networkidle` (SPA/slow sites) |
| `pageLoadWaitSecs` | Extra delay before capture |
| `waitForSelector` | Wait for CSS selector (or `css:`/`js:` prefix) |
| `cssSelector` | Extract only this region (e.g. `main`, `.article`) |
| `virtualScrollSelector` | Infinite-scroll container to expand |

Example — SPA / slow site: { "startUrls": ["https://..."], "waitUntil": "networkidle", "pageLoadWaitSecs": 2 }
Example — discover links only: { "startUrls": ["https://..."], "crawlMode": "discover_only", "maxPages": 100 }
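The two example inputs above can also be assembled programmatically. The sketch below is a hedged illustration: `build_run_input` is a hypothetical helper, not part of the Actor, and it checks only the `crawlMode` and `waitUntil` values listed in the options table. All other options pass through unchecked, and `https://example.com` is a placeholder URL.

```python
import json

# Allowed values copied from the options table above.
WAIT_UNTIL = {"domcontentloaded", "load", "networkidle"}
CRAWL_MODES = {"full", "discover_only"}


def build_run_input(start_urls, crawl_mode="full", wait_until=None, **extra):
    """Return an input dict for the Actor; `extra` options are not validated."""
    if not start_urls:
        raise ValueError("startUrls is required")
    if crawl_mode not in CRAWL_MODES:
        raise ValueError(f"crawlMode must be one of {sorted(CRAWL_MODES)}")
    if wait_until is not None and wait_until not in WAIT_UNTIL:
        raise ValueError(f"waitUntil must be one of {sorted(WAIT_UNTIL)}")

    run_input = {"startUrls": list(start_urls), "crawlMode": crawl_mode}
    if wait_until is not None:
        run_input["waitUntil"] = wait_until
    run_input.update(extra)
    return run_input


if __name__ == "__main__":
    # SPA / slow site
    spa = build_run_input(
        ["https://example.com"], wait_until="networkidle", pageLoadWaitSecs=2
    )
    # Discover links only
    discover = build_run_input(
        ["https://example.com"], crawl_mode="discover_only", maxPages=100
    )
    print(json.dumps(spa, indent=2))
    print(json.dumps(discover, indent=2))
```

Rejecting unknown enum values before the run starts catches typos such as `networkIdle` locally; the authoritative schema remains .actor/input_schema.json.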

Run locally / Docker

$ docker build -t website-content-extractor .

Regression

$ UX_MATRIX_GROUP=core python scripts/ux_matrix.py

Reports: scripts/ux_matrix_output.json, scripts/ux_matrix_report.txt (gitignored).

URL: https://apify.com/kael_odin/crawl4ai
