
URL: https://apify.com/mohamedgb00714/firescraper-ai-website-content-markdown-scraper

🔥 FireScrape AI Website Content Markdown Scraper · Apify



Pricing

$30.00/month + usage


Advanced web scraper powered by Crawlee and Puppeteer. It extracts website content, converts it to Markdown, and structures it for LLM training datasets.

Rating: 1.9 (2)

Developer: mohamed el hadi msaid

Maintained by Community

Actor stats

  • Bookmarked: 9
  • Total users: 302
  • Monthly active users: 5
  • Last modified: a year ago

Overview

FireScrape is a web scraper built with Crawlee and Puppeteer. It crawls websites, extracts their content, converts it into Markdown, and structures the data so it can be used to build datasets for LLMs.


🎯 Features

  • Extracts visible text or full HTML content
  • Converts content to Markdown
  • Captures screenshots
  • Supports proxy configurations
  • Follows links for deep crawling

🛠️ Input Schema

{
  "title": "FireScrape Input Schema",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "startUrls": {
      "title": "Start URLs",
      "type": "array",
      "description": "List of URLs to start crawling from.",
      "editor": "requestListSources",
      "prefill": [{"url": "https://apify.com"}]
    },
    "maxPages": {
      "title": "Maximum Pages",
      "type": "integer",
      "description": "The maximum number of pages to crawl.",
      "default": 50,
      "minimum": 1
    },
    "proxyConfig": {
      "title": "Proxy Configuration",
      "type": "object",
      "description": "Select proxy settings.",
      "editor": "proxy",
      "default": {"useApifyProxy": true}
    },
    "screenshot": {
      "title": "Take Screenshots",
      "type": "boolean",
      "description": "Enable this to capture a screenshot of each page.",
      "default": true
    },
    "enqueue": {
      "title": "Enqueue Links",
      "type": "boolean",
      "description": "Whether to follow and enqueue new links on the page.",
      "default": true
    },
    "getText": {
      "title": "Extract Text Content",
      "type": "boolean",
      "description": "Extract only the visible text content from the page.",
      "default": false
    },
    "getHtml": {
      "title": "Extract HTML Content",
      "type": "boolean",
      "description": "Extract the full HTML content of the page.",
      "default": false
    }
  },
  "required": ["startUrls"]
}
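
To make the schema concrete, here is a minimal Python sketch of how an input object matching it could be assembled. The helper name `build_run_input` is hypothetical and not part of FireScrape. The defaults are copied from the schema above, and the checks only mirror its `required` and `minimum` constraints (treating an empty URL list as invalid is an extra assumption).

```python
import copy
from typing import Any

# Defaults copied from the input schema above.
SCHEMA_DEFAULTS: dict[str, Any] = {
    "maxPages": 50,
    "proxyConfig": {"useApifyProxy": True},
    "screenshot": True,
    "enqueue": True,
    "getText": False,
    "getHtml": False,
}


def build_run_input(start_urls: list[str], **overrides: Any) -> dict[str, Any]:
    """Hypothetical helper: merge overrides into the schema defaults."""
    # Assumption: an empty list is treated the same as a missing startUrls.
    if not start_urls:
        raise ValueError("startUrls is required")
    unknown = set(overrides) - set(SCHEMA_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")
    run_input = {**copy.deepcopy(SCHEMA_DEFAULTS), **overrides}
    if not isinstance(run_input["maxPages"], int) or run_input["maxPages"] < 1:
        raise ValueError("maxPages must be an integer >= 1")
    # Each start URL is wrapped as {"url": ...}, as in the schema's prefill.
    run_input["startUrls"] = [{"url": url} for url in start_urls]
    return run_input
```

For example, `build_run_input(["https://apify.com"], maxPages=10, getText=True)` keeps the schema defaults for every other field and overrides only the two named ones.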

✅ Output Format

Each successfully scraped page produces one structured JSON object:

{
  "url": "https://example.com",
  "title": "Example Page",
  "metadata": {"description": "An example page", "keywords": ["example", "page"]},
  "markdown": "# Example Page\n\nThis is an example page content...",
  "textContent": "This is an example page content...",
  "htmlContent": "<html><body><h1>Example Page</h1>...</body></html>",
  "screenshot": "data:image/png;base64,iVBORw..."
}
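
As an illustration of consuming such an object, the following Python sketch decodes the `screenshot` field and reduces an item to a one-line JSON record suitable for a JSONL dataset. It assumes the screenshot is always a base64 data URI, as in the example above. The function names and the `url`/`title`/`text` record layout are hypothetical choices, not something FireScrape prescribes.

```python
import base64
import json
from typing import Any


def decode_screenshot(data_uri: str) -> bytes:
    """Decode a data URI such as 'data:image/png;base64,...' to raw bytes."""
    header, _, payload = data_uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URI")
    return base64.b64decode(payload)


def to_jsonl_record(item: dict[str, Any]) -> str:
    """Keep the text-oriented fields and serialize them as one JSON line."""
    record = {
        "url": item["url"],
        "title": item.get("title", ""),
        # The Markdown rendering is used as the main text (a design choice).
        "text": item.get("markdown", ""),
    }
    return json.dumps(record, ensure_ascii=False)
```

Writing the result of `decode_screenshot` to a `.png` file in binary mode would restore the image, and joining the `to_jsonl_record` lines with newlines gives a JSONL file.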

🚀 How to Run

  1. Deploy the actor on Apify.
  2. Input the desired URLs and configuration.
  3. Start the scraper and monitor progress.
  4. Download results as JSON or Markdown.
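
The steps above describe the Apify Console flow. A programmatic equivalent might look like the sketch below, which assumes the official `apify-client` Python package and its dict-style run result. The actor ID is taken from this page's store URL. The function name `run_and_fetch` is hypothetical, and the client is passed in as a parameter so the snippet itself has no third-party import.

```python
from typing import Any, Iterator

# Actor ID taken from the store URL of this page.
ACTOR_ID = "mohamedgb00714/firescraper-ai-website-content-markdown-scraper"


def run_and_fetch(client: Any, run_input: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Start the actor, wait for it to finish, and yield its dataset items.

    `client` is assumed to behave like apify_client.ApifyClient, for example
    ApifyClient("<YOUR_API_TOKEN>") after `pip install apify-client`.
    """
    # call() starts the run and waits for it to finish.
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("actor run did not return run details")
    # Results are stored in the run's default dataset.
    yield from client.dataset(run["defaultDatasetId"]).iterate_items()
```

Each yielded item should have the shape shown under Output Format, so it can be saved as JSON or its `markdown` field written to a `.md` file.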

🔧 Customization

Feel free to extend FireScrape with additional features, such as handling dynamic content, authentication, or specialized formatting.


🎁 Bonus: n8n Workflow Integration

As a free bonus for using FireScrape, you can integrate these n8n workflows with this actor:

These workflows can help automate post-scraping actions and expand your automation capabilities.

Happy scraping! 🚀🔥
