Pricing
$2.50/month + usage
LLM-Ready Web Scraper
Convert web pages to clean, LLM-friendly text. Perfect for RAG pipelines, AI chatbot training, and fine-tuning datasets. Removes ads, menus, and clutter automatically.
Last modified: 5 months ago
Converts web pages to clean, LLM-friendly formats. Perfect for building AI applications.
Use Cases
- RAG Pipelines: Get chunked content ready for vector databases
- Fine-tuning Datasets: Export as JSONL for LLM training
- Knowledge Bases: Build AI chatbot training data
- Content Extraction: Clean text without ads, menus, or clutter
Features
- Automatic content extraction (removes ads, navigation, footers)
- Multiple output formats: Markdown, JSON, JSONL
- Optional chunking with overlap for RAG
- Batch URL processing
- Metadata extraction (title, description, domain)
Output Formats
Markdown
---
title: "Page Title"
url: https://example.com/page
domain: example.com
scraped_at: 2024-01-15T10:30:00Z
---
Clean page content here...
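To use the Markdown output downstream, the front matter usually needs to be separated from the body. The sketch below is one way to do that with the standard library only. It assumes the front matter holds one `key: value` pair per line between two `---` lines, as in the sample above; `parse_markdown_output` is a hypothetical helper, not part of the Actor.

```python
def parse_markdown_output(text: str) -> tuple[dict[str, str], str]:
    """Split front matter (between the first two '---' lines) from the body."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no front matter found

    metadata: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Closing delimiter: everything after it is the page content.
            body = "\n".join(lines[i + 1:]).strip()
            return metadata, body
        # Split on the first colon only, so URLs and timestamps stay intact.
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip().strip('"')
    return {}, text  # unterminated front matter: treat as plain text


sample = """---
title: "Page Title"
url: https://example.com/page
domain: example.com
scraped_at: 2024-01-15T10:30:00Z
---
Clean page content here..."""

metadata, body = parse_markdown_output(sample)
```

A full YAML parser would be the safer choice if the front matter can contain multi-line or nested values.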
JSON
{"url":"https://example.com","success":true,"content":"Clean text content...","metadata":{"title":"Page Title","description":"Meta description"},"word_count":1500}
JSONL (Fine-tuning)
{"prompt":"Content from Page Title:","completion":"Clean text content..."}
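If you already have results in the JSON format, records in the shape shown above can be rebuilt locally. This is a minimal sketch, assuming the prompt is always `Content from <title>:` as in the sample line; `to_jsonl` is a hypothetical helper and may not match what the Actor does internally.

```python
import json


def to_jsonl(results: list[dict]) -> str:
    """Turn JSON-format results into prompt/completion lines, one per page."""
    lines = []
    for result in results:
        if not result.get("success"):
            continue  # skip pages that failed to scrape
        title = result.get("metadata", {}).get("title", "")
        record = {
            "prompt": f"Content from {title}:",
            "completion": result["content"],
        }
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)


results = [
    {
        "url": "https://example.com",
        "success": True,
        "content": "Clean text content...",
        "metadata": {"title": "Page Title", "description": "Meta description"},
        "word_count": 1500,
    },
    {"url": "https://example.com/broken", "success": False},
]

jsonl_text = to_jsonl(results)
```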
With Chunks (RAG-ready)
{"chunks":[{"chunk_id":0,"text":"First chunk...","word_count":500},{"chunk_id":1,"text":"Second chunk...","word_count":500}],"chunk_count":5}
Input Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | string | - | Single URL to scrape |
| urls | array | - | Multiple URLs for batch processing |
| outputFormat | string | markdown | Output format: markdown, json, jsonl |
| includeChunks | boolean | false | Split into RAG-ready chunks |
| chunkSize | integer | 500 | Words per chunk |
| chunkOverlap | integer | 50 | Overlap between chunks |
| maxConcurrency | integer | 5 | Parallel scraping limit |
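To illustrate how `chunkSize` and `chunkOverlap` interact, here is a rough sketch of word-based chunking with overlap. It assumes the overlap is counted in words, like the chunk size, and that words are split on whitespace; the Actor's actual splitting logic is not documented here and may differ. `chunk_words` is a hypothetical name.

```python
def chunk_words(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[dict]:
    """Split text into word windows where each window repeats the tail of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    words = text.split()
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks: list[dict] = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        chunks.append({
            "chunk_id": len(chunks),
            "text": " ".join(window),
            "word_count": len(window),
        })
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


sample_text = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(sample_text, chunk_size=4, chunk_overlap=1)
```

With the defaults (500 and 50), each chunk after the first would start 450 words after the previous one under these assumptions.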
Example Input
{"urls":["https://docs.python.org/3/tutorial/","https://docs.python.org/3/library/"],"outputFormat":"json","includeChunks":true,"chunkSize":500}
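The same input can be built and submitted from Python. The sketch below validates the parameters against the table above and then starts a run with the `apify-client` package. Several parts are assumptions: the Actor ID is a placeholder because it is not given here, the token is read from an `APIFY_TOKEN` environment variable, and results are assumed to land in the run's default dataset. `build_run_input` and `run_actor` are hypothetical helpers.

```python
import os

ACTOR_ID = "<username>/<actor-name>"  # placeholder: replace with the real Actor ID
OUTPUT_FORMATS = {"markdown", "json", "jsonl"}


def build_run_input(urls, output_format="markdown", include_chunks=False,
                    chunk_size=500, chunk_overlap=50, max_concurrency=5) -> dict:
    """Build an input object using the parameter names from the table."""
    if not urls:
        raise ValueError("at least one URL is required")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"outputFormat must be one of {sorted(OUTPUT_FORMATS)}")
    return {
        "urls": list(urls),
        "outputFormat": output_format,
        "includeChunks": include_chunks,
        "chunkSize": chunk_size,
        "chunkOverlap": chunk_overlap,
        "maxConcurrency": max_concurrency,
    }


def run_actor(run_input: dict) -> list[dict]:
    """Start a run, wait for it to finish, and return the dataset items."""
    from apify_client import ApifyClient  # third-party: pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    # Assumption: the Actor stores its results in the default dataset.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


run_input = build_run_input(
    ["https://docs.python.org/3/tutorial/", "https://docs.python.org/3/library/"],
    output_format="json",
    include_chunks=True,
)
# items = run_actor(run_input)  # requires a valid token and Actor ID
```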
Pricing
Pay only for what you use. Typical cost: $0.01-0.05 per URL depending on page size.
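For budgeting, the figures above can be turned into a rough monthly range. This sketch assumes the $2.50 monthly fee is added on top of usage and that usage is approximated by the typical per-URL cost; actual billing may include other platform charges. `estimate_monthly_cost` is a hypothetical helper, and amounts are computed in cents to avoid floating-point rounding.

```python
MONTHLY_FEE_CENTS = 250     # $2.50/month
LOW_CENTS_PER_URL = 1       # $0.01 per URL
HIGH_CENTS_PER_URL = 5      # $0.05 per URL


def estimate_monthly_cost(url_count: int) -> tuple[float, float]:
    """Return a (low, high) estimate in dollars for scraping url_count URLs in a month."""
    if url_count < 0:
        raise ValueError("url_count must not be negative")
    low = MONTHLY_FEE_CENTS + url_count * LOW_CENTS_PER_URL
    high = MONTHLY_FEE_CENTS + url_count * HIGH_CENTS_PER_URL
    return low / 100, high / 100


low, high = estimate_monthly_cost(1000)
print(f"Estimated monthly cost for 1000 URLs: ${low:.2f} to ${high:.2f}")
```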
