🔥 FireScrape AI Website Content Markdown Scraper
Pricing
$30.00/month + usage
Advanced web scraper powered by Crawlee and Puppeteer. It extracts website content, converts it to Markdown, and structures it for LLM training datasets.
Overview
FireScrape is a web scraper built with Crawlee and Puppeteer. It crawls websites, extracts page content, converts it to Markdown, and structures the result, which makes it well suited to generating datasets for LLMs.
🎯 Features
- Extracts visible text or full HTML content
- Converts content to Markdown
- Captures screenshots
- Supports proxy configurations
- Follows links for deep crawling
🛠️ Input Schema
{
  "title": "FireScrape Input Schema",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "startUrls": {
      "title": "Start URLs",
      "type": "array",
      "description": "List of URLs to start crawling from.",
      "editor": "requestListSources",
      "prefill": [{ "url": "https://apify.com" }]
    },
    "maxPages": {
      "title": "Maximum Pages",
      "type": "integer",
      "description": "The maximum number of pages to crawl.",
      "default": 50,
      "minimum": 1
    },
    "proxyConfig": {
      "title": "Proxy Configuration",
      "type": "object",
      "description": "Select proxy settings.",
      "editor": "proxy",
      "default": { "useApifyProxy": true }
    },
    "screenshot": {
      "title": "Take Screenshots",
      "type": "boolean",
      "description": "Enable this to capture a screenshot of each page.",
      "default": true
    },
    "enqueue": {
      "title": "Enqueue Links",
      "type": "boolean",
      "description": "Whether to follow and enqueue new links on the page.",
      "default": true
    },
    "getText": {
      "title": "Extract Text Content",
      "type": "boolean",
      "description": "Extract only the visible text content from the page.",
      "default": false
    },
    "getHtml": {
      "title": "Extract HTML Content",
      "type": "boolean",
      "description": "Extract the full HTML content of the page.",
      "default": false
    }
  },
  "required": ["startUrls"]
}
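To illustrate how the schema's defaults and constraints combine, here is a minimal Python sketch that fills in the defaults and checks the two hard constraints (`startUrls` is required, `maxPages` has a minimum of 1). The `build_input` helper is hypothetical and not part of FireScrape. Treating an empty URL list as missing is an assumption made for this example.

```python
import copy
import json

# Defaults copied from the input schema above.
DEFAULTS = {
    "maxPages": 50,
    "proxyConfig": {"useApifyProxy": True},
    "screenshot": True,
    "enqueue": True,
    "getText": False,
    "getHtml": False,
}


def build_input(start_urls, **overrides):
    """Return a run input dict: schema defaults plus caller overrides.

    Hypothetical helper for illustration; not part of FireScrape.
    """
    # Assumption: an empty list is treated the same as a missing startUrls.
    if not start_urls:
        raise ValueError("startUrls is required")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")

    run_input = {"startUrls": [{"url": u} for u in start_urls]}
    run_input.update(copy.deepcopy(DEFAULTS))
    run_input.update(overrides)

    max_pages = run_input["maxPages"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_pages, bool) or not isinstance(max_pages, int) or max_pages < 1:
        raise ValueError("maxPages must be an integer >= 1")
    return run_input


if __name__ == "__main__":
    example = build_input(["https://apify.com"], maxPages=10, screenshot=False)
    print(json.dumps(example, indent=2))
```

The resulting dictionary can be serialized with `json.dumps` and pasted into the actor's JSON input editor.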
Output Format
Each successfully scraped page will output a structured JSON object:
{
  "url": "https://example.com",
  "title": "Example Page",
  "metadata": {
    "description": "An example page",
    "keywords": ["example", "page"]
  },
  "markdown": "# Example Page\n\nThis is an example page content...",
  "textContent": "This is an example page content...",
  "htmlContent": "<html><body><h1>Example Page</h1>...</body></html>",
  "screenshot": "data:image/png;base64,iVBORw..."
}
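A record of this shape can be post-processed with a few lines of standard-library Python. The sketch below writes the `markdown` field to a `.md` file and, when a screenshot is present, decodes it to a `.png` file. The helpers `slug_from_url` and `save_record` are hypothetical names introduced here. The code assumes the screenshot arrives as a base64 PNG data URI, as in the sample above, and that optional fields may be absent.

```python
import base64
import re
from pathlib import Path
from urllib.parse import urlparse


def slug_from_url(url):
    """Turn a URL into a filesystem-friendly name (hypothetical helper)."""
    parsed = urlparse(url)
    raw = parsed.netloc + parsed.path
    slug = re.sub(r"[^A-Za-z0-9]+", "-", raw).strip("-")
    return slug or "page"


def save_record(record, out_dir):
    """Write the Markdown and, if present, the screenshot of one record.

    Returns the list of paths that were written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    slug = slug_from_url(record["url"])

    md_path = out / f"{slug}.md"
    md_path.write_text(record.get("markdown", ""), encoding="utf-8")
    written = [md_path]

    # Assumption: screenshots are base64 PNG data URIs, as in the sample.
    prefix = "data:image/png;base64,"
    screenshot = record.get("screenshot")
    if screenshot and screenshot.startswith(prefix):
        png_path = out / f"{slug}.png"
        png_path.write_bytes(base64.b64decode(screenshot[len(prefix):]))
        written.append(png_path)
    return written
```

Calling `save_record(item, "dataset")` for each downloaded item gives one Markdown file per page, which is a convenient layout for feeding an LLM data pipeline.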
How to Run
- Deploy the actor on Apify.
- Input the desired URLs and configuration.
- Start the scraper and monitor progress.
- Download results as JSON or Markdown.
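The same steps can also be driven programmatically. The sketch below only builds the two HTTP requests involved (start a run, download the dataset items) using the Python standard library. It assumes the public Apify API v2 endpoints `/acts/{actorId}/runs` and `/datasets/{datasetId}/items`, so check them against the Apify API reference before relying on them. This page does not state the actor's ID, so the `actor_id` argument is a placeholder you must supply, and the helper names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id, token, run_input):
    """Build (but do not send) the POST request that starts an actor run."""
    url = f"{API_BASE}/acts/{quote(actor_id, safe='~')}/runs"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def build_items_request(dataset_id, token):
    """Build the GET request that downloads a run's dataset items as JSON."""
    url = f"{API_BASE}/datasets/{quote(dataset_id, safe='')}/items?format=json"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def send(request):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

A typical flow would call `send(build_run_request(...))`, wait for the run to finish, and then pass the run's dataset ID to `build_items_request`. Runs are asynchronous, so polling or a webhook is needed between the two calls.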
🔧 Customization
Feel free to extend FireScrape with additional features, such as handling dynamic content, authentication, or specialized formatting.
Bonus: n8n Workflow Integration
As a free bonus for using FireScrape, you can integrate these n8n workflows with this actor:
These workflows can help automate post-scraping actions and expand your automation capabilities.
Happy scraping! 🔥
