
URL: https://apify.com/louisdeconinck/dynamic-markdown-scraper

Dynamic Markdown Scraper · Apify


Pricing

$19.00/month + usage


Dynamic Markdown Scraper

Feed LLMs clean Markdown scraped from the web. This scraper handles dynamic, JavaScript-rendered websites and preserves their original formatting. Ideal for AI training, documentation, and content migration.


Rating: 5.0 (2)

Developer: Louis Deconinck (maintained by Community)

Actor stats

  • Bookmarked: 7
  • Total users: 128
  • Monthly active users: 2
  • Last modified: 6 months ago


A web scraper that converts hard-to-scrape web pages into clean, well-formatted Markdown. It crawls websites and transforms their HTML into Markdown while preserving the original structure and formatting, and it handles dynamic content and JavaScript-rendered pages.
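As a rough illustration of the HTML-to-Markdown step, the sketch below uses Python's standard-library `html.parser` to map a few tags (headings, paragraphs, list items, links) to Markdown and to drop `<nav>`, `<footer>`, `<script>`, and `<style>` elements. It is a hypothetical, simplified stand-in, not the Actor's own TypeScript implementation; the class name `MarkdownConverter`, the helper `html_to_markdown`, and the set of skipped tags are all assumptions.

```python
from html.parser import HTMLParser


class MarkdownConverter(HTMLParser):
    """Hypothetical, minimal HTML-to-Markdown converter (illustration only)."""

    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### ", "h4": "#### "}
    SKIP = {"nav", "footer", "script", "style"}  # assumed "unwanted" elements

    def __init__(self):
        super().__init__()
        self.out = []         # collected Markdown fragments
        self.href = None      # href of the <a> currently open, if any
        self.link_text = []   # text collected inside the open <a>
        self.skip_depth = 0   # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif self.skip_depth:
            return  # ignore everything nested inside nav/footer/etc.
        elif tag in self.HEADINGS:
            self.out.append("\n\n" + self.HEADINGS[tag])
        elif tag == "p":
            self.out.append("\n\n")
        elif tag in ("ul", "ol"):
            self.out.append("\n")
        elif tag == "li":
            self.out.append("\n- ")  # ordered lists are also rendered as bullets here
        elif tag == "a":
            self.href = dict(attrs).get("href") or ""
            self.link_text = []

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag == "a" and self.href is not None:
            self.out.append(f"[{''.join(self.link_text)}]({self.href})")
            self.href = None

    def handle_data(self, data):
        if self.skip_depth or not data.strip():
            return  # skip hidden content and whitespace-only text between tags
        if self.href is not None:
            self.link_text.append(data)
        else:
            self.out.append(data)


def html_to_markdown(html: str) -> str:
    parser = MarkdownConverter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip()
```

For example, `html_to_markdown("<h1>Title</h1><nav><a href='/x'>Menu</a></nav><p>See <a href='https://example.com'>docs</a>.</p>")` keeps the heading and the paragraph link but drops the navigation menu. Code blocks, images, and tables are deliberately left out to keep the sketch short.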

Features

  • Crawls websites and converts their content to Markdown
  • Preserves heading structure, lists, and code blocks
  • Handles dynamic content and JavaScript-rendered pages
  • Preserves images and links
  • Stays on the same domain as the start URLs while crawling
  • Filters out unwanted content (navigation, footers, etc.)
  • Configurable limit on the number of pages crawled
  • Extracts the main article content of each page
  • Built with TypeScript for better maintainability
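Same-domain crawling with a page limit can be pictured as a breadth-first traversal. In this hypothetical sketch, `links` is an in-memory map from a page URL to the hrefs found on it (standing in for fetching and rendering real pages), "same domain" is assumed to mean the same host and port as one of the start URLs, and `max_requests` plays the role of `maxRequestsPerCrawl`. The Actor's actual crawl order and domain rules may differ.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse


def crawl_order(start_urls, links, max_requests=None):
    """Hypothetical same-domain BFS; returns URLs in the order they are visited."""
    allowed_hosts = {urlparse(u).netloc for u in start_urls}
    queue = deque(start_urls)
    seen = set(start_urls)      # never enqueue the same URL twice
    visited = []
    while queue:
        if max_requests is not None and len(visited) >= max_requests:
            break               # crawl limit reached
        url = queue.popleft()
        visited.append(url)     # a real crawler would fetch and convert the page here
        for href in links.get(url, []):
            # Resolve relative links and drop #fragments so /b and /b#top match.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc in allowed_hosts and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited
```

Links to other hosts are never enqueued, and passing `max_requests=None` mirrors leaving `maxRequestsPerCrawl` unset.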

Use Cases

  • Feed website content to LLMs for further processing
  • Extract content from websites for documentation, blog posts, or technical writing
  • Scrape and convert web pages for use in static sites, blogs, or other projects
  • Automate content migration from legacy systems to modern platforms

Input Configuration

The scraper accepts the following input parameters:

  • startUrls: Array of URLs where the crawler should begin (required)
  • maxRequestsPerCrawl: Maximum number of pages to crawl (optional, defaults to unlimited)

Example input:

```json
{
  "startUrls": [
    { "url": "https://apify.com" }
  ],
  "maxRequestsPerCrawl": 100
}
```
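The input above can also be assembled programmatically. The sketch below is hypothetical: `build_input` is not part of the Actor, and the rule that `maxRequestsPerCrawl` must be a positive integer is an assumption. The description itself only states that `startUrls` is required and that omitting `maxRequestsPerCrawl` means no limit.

```python
import json


def build_input(start_urls, max_requests_per_crawl=None):
    """Hypothetical helper that builds the Actor input in the shape shown above."""
    if not start_urls:
        raise ValueError("startUrls is required and must not be empty")
    payload = {"startUrls": [{"url": url} for url in start_urls]}
    # Omitting maxRequestsPerCrawl leaves the crawl unlimited.
    if max_requests_per_crawl is not None:
        # Assumption: the limit must be a positive integer.
        if not isinstance(max_requests_per_crawl, int) or max_requests_per_crawl < 1:
            raise ValueError("maxRequestsPerCrawl must be a positive integer")
        payload["maxRequestsPerCrawl"] = max_requests_per_crawl
    return payload


if __name__ == "__main__":
    print(json.dumps(build_input(["https://apify.com"], 100), indent=2))
```

If you use the `apify-client` Python package, a payload like this is what you would pass as `run_input` when starting the Actor (for example `client.actor("louisdeconinck/dynamic-markdown-scraper").call(run_input=payload)`). Check the client documentation for the exact call signature.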

Output Format

The scraper saves the following data for each processed page:

  • url: The URL of the scraped page
  • title: Page title
  • markdown: Converted Markdown content
  • capturedAt: Timestamp of when the page was scraped
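For the static-site and content-migration use cases, each output record can be written to its own Markdown file. This is a hypothetical sketch: the URL-to-path mapping (`markdown_path`) and the YAML front-matter layout (`to_static_page`) are assumptions, not something the Actor produces. Only the `url`, `title`, and `markdown` fields come from the output format.

```python
import json
from urllib.parse import urlparse


def markdown_path(url: str) -> str:
    """Hypothetical URL-to-file mapping: /docs/intro.html -> docs/intro.md."""
    path = urlparse(url).path.strip("/") or "index"
    if path.endswith(".html"):
        path = path[: -len(".html")]
    return path + ".md"


def to_static_page(record: dict) -> tuple:
    """Turn one output record (url, title, markdown) into (path, file contents)."""
    # json.dumps yields a double-quoted, escaped string, which YAML also accepts.
    front_matter = (
        "---\n"
        f"title: {json.dumps(record['title'], ensure_ascii=False)}\n"
        f"source: {json.dumps(record['url'])}\n"
        "---\n\n"
    )
    return markdown_path(record["url"]), front_matter + record["markdown"]
```

Quoting both values through `json.dumps` avoids breaking the front matter when a page title contains colons or quotes.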

Example output:

```json
{
  "url": "https://apify.com/storage",
  "title": "Storage optimized for scraping · Apify",
  "markdown": "# Apify Storage\n\nScalable and reliable cloud data storage designed for web scraping and automation workloads.\n\n[View documentation](https://docs.apify.com/platform/storage)\n\nBenefits\n\n## Specialized storage from Apify[](https://apify.com/storage#specialized-storage-from-apify)\n\n![Enterprise_grade_reliability_performance_and_scalability_9890860f85.svg](https://cdn-cms.apify.com/Enterprise_grade_reliability_performance_and_scalability_9890860f85.svg)\n\n### Enterprise-grade reliability, performance, and scalability[](https://apify.com/storage#enterprise-grade-reliability-performance-and-scalability)\n\nStore a few records or a few hundred million, with the same low latency and high reliability. We use Amazon Web Services for the underlying data storage, giving you high availability and peace of mind.\n\n### Low-cost storage for web scraping and crawling[](https://apify.com/storage#low-cost-storage-for-web-scraping-and-crawling)\n\nApify provides low-cost storage carefully designed for the large workloads typical of web scraping and crawling operations.\n\n![Low_cost_storage_for_web_scraping_and_crawling_b313f7d95e.svg](https://cdn-cms.apify.com/Low_cost_storage_for_web_scraping_and_crawling_b313f7d95e.svg)\n\n![Easy_to_use_634e40ae76.svg](https://cdn-cms.apify.com/Easy_to_use_634e40ae76.svg)\n\n### Easy to use[](https://apify.com/storage#easy-to-use)\n\nData can be viewed on the web, giving you a quick way to review and share it with other people. The Apify [API](https://docs.apify.com/api/v2) and [SDK](https://docs.apify.com/sdk/js/) makes it easy to integrate our storage into your apps.\n\nFeatures\n\n## We’ve got you covered[](https://apify.com/storage#weve-got-you-covered)\n\n[![Dataset_78dfe4e3a4.svg](https://cdn-cms.apify.com/Dataset_78dfe4e3a4.svg)\n\n**Dataset** \nStore results from your web scraping, crawling or data processing jobs into Apify datasets and export them to various formats like JSON, CSV, XML, RSS, Excel or HTML.\n\n\n\n\n\n](https://docs.apify.com/platform/storage/dataset)[![Request_queue_9e9602319e.svg](https://cdn-cms.apify.com/Request_queue_9e9602319e.svg)\n\n**Request queue** \nMaintain a queue of URLs of web pages in order to recursively crawl websites, starting from initial URLs and adding new links as they are found while skipping duplicates.\n\n\n\n\n\n](https://docs.apify.com/platform/storage/request-queue)[![Key_value_store_bc65220b7d.svg](https://cdn-cms.apify.com/Key_value_store_bc65220b7d.svg)\n\n**Key-value store** \nStore arbitrary data records along with their MIME content type. The records are accessible under a unique name and can be written and read at a rapid rate.\n\n\n\n\n\n](https://docs.apify.com/platform/storage/key-value-store)\n\n## Ready to build your first Actor?[](https://apify.com/storage#ready-to-build-your-first-actor)\n\n[Start developing](https://apify.com/templates)",
  "capturedAt": "2025-01-23T14:01:21.956Z"
}
```
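The example output keeps heading anchors as empty links, such as `[](https://apify.com/storage#easy-to-use)`. If these are unwanted, for instance before passing the text to an LLM, a post-processing step can strip them and parse `capturedAt` into a timezone-aware datetime. This is a hypothetical sketch (`clean_record` and `EMPTY_LINK` are illustrative names): the regular expression targets only empty-text links and deliberately leaves images (`![...](...)`) alone.

```python
import re
from datetime import datetime

# An empty-text link "[](...)" that is not the tail of an image "![](...)".
EMPTY_LINK = re.compile(r"(?<!!)\[\]\([^)]*\)")


def clean_record(record: dict) -> dict:
    """Return a copy with empty anchor links removed and capturedAt parsed."""
    cleaned = dict(record)  # leave the original record untouched
    cleaned["markdown"] = EMPTY_LINK.sub("", record["markdown"])
    # fromisoformat() rejects a trailing "Z" before Python 3.11, so normalize it.
    cleaned["capturedAt"] = datetime.fromisoformat(
        record["capturedAt"].replace("Z", "+00:00")
    )
    return cleaned
```

Applied to the record above, headings such as `### Easy to use[](https://apify.com/storage#easy-to-use)` become `### Easy to use`, while the image links are unchanged.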

You might also like

Website Content to Markdown for LLM Training

easyapi/website-content-to-markdown-for-llm-training

Transform web content into clean, LLM-ready Markdown! Scrape multiple pages, extract main content, and convert to Markdown format. Perfect for AI researchers, data scientists, and LLM developers. Fast, efficient, and customizable. Supercharge your AI training data today!

Html To Markdown Converter

powerful_bachelor/html-to-markdown-converter

HTML to Markdown Converter transforms web pages into clean, portable Markdown. Simply input a URL to extract content while preserving structure, formatting, and media elements. Perfect for content repurposing, documentation, and creating readable, platform-independent text from any webpage!

๐Ÿ‘ User avatar

Powerful Bachelor

36

AI Markdown Maker

onescales/bulk-ai-markdown-maker

Convert any web page into clean, AI ready markdown format in seconds. This markdown generator is perfect for content for AI models, creating documentation, or archiving web content. It intelligently parses web content, removing ads, navigation, and other clutter. Generate Markdown Today!


File to Markdown

shahidirfan/file-to-markdown

Transform files into clean, readable Markdown instantly. Convert PDFs, documents, images, and more to structured Markdown format. Perfect for automating documentation workflows, content migration, and building knowledge bases. Ideal for developers, writers, and content teams.


Web Page to Markdown Converter Pro: Clean Content Extraction

maged120/url-to-markdown-pro

Convert any web page to clean, formatted Markdown. Advanced content extraction handles complex layouts, paywalls, and JavaScript-rendered pages, strips ads, and returns just the content.

Website To Markdown

smart_api/website-to-markdown

Convert any webpage into clean, LLM-ready Markdown in seconds. Perfect for AI training data, RAG pipelines, and content archiving.