Pricing
$5.00/month + usage
Download HTML from URLs
This actor takes a list of URLs and downloads the HTML of each page.
HTML Downloader
This Apify actor takes a list of URLs and downloads the full HTML content of each page. You can define proxy settings and, for each URL, an optional selector to wait for.
Use Cases
Download HTML content from multiple websites
Archive web pages for offline analysis
Extract raw HTML for custom parsing
Monitor website changes over time
Input Configuration
You can customize the actor using the following input fields:
{
  "requestListSources": [
    { "url": "https://apify.com" }
  ],
  "proxyConfiguration": { "useApifyProxy": true },
  "handlePageTimeoutSecs": 60,
  "maxRequestRetries": 1,
  "useChrome": false
}
Fields Explained
| Field | Type | Description |
| --- | --- | --- |
| requestListSources | array | Required. Array of URLs to download. Each item can have optional userData with waitForSelector. |
| proxyConfiguration | object | Proxy settings: choose no proxy, Apify Proxy, or custom proxy URLs. |
| handlePageTimeoutSecs | integer | Optional. Maximum time to spend processing one page (default: 60). |
| maxRequestRetries | integer | Optional. How many retries before giving up (default: 1). |
| useChrome | boolean | Optional. Use real Chrome browser instead of Chromium (default: false). |
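As an illustration of the field rules above, here is a minimal Python sketch that checks an input object and fills in the documented defaults before it is sent to the actor. The helper name `normalize_input` is hypothetical, and treating an empty `requestListSources` array as an error is an assumption; the actor may validate its input differently.

```python
from typing import Any

# Defaults as listed in the field descriptions.
DEFAULTS = {
    "handlePageTimeoutSecs": 60,
    "maxRequestRetries": 1,
    "useChrome": False,
}


def normalize_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Check the required field and fill in documented defaults (hypothetical helper)."""
    sources = raw.get("requestListSources")
    # Assumption: an empty array is rejected, since at least one URL is needed.
    if not isinstance(sources, list) or not sources:
        raise ValueError("requestListSources must be a non-empty array")
    for source in sources:
        if not isinstance(source, dict) or not isinstance(source.get("url"), str):
            raise ValueError("each source needs a string 'url'")

    # Values supplied by the caller override the defaults.
    result = {**DEFAULTS, **raw}

    for key in ("handlePageTimeoutSecs", "maxRequestRetries"):
        value = result[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{key} must be an integer")
    if not isinstance(result["useChrome"], bool):
        raise ValueError("useChrome must be a boolean")
    return result
```

Calling `normalize_input({"requestListSources": [{"url": "https://apify.com"}]})` returns the same object with the three optional fields set to their defaults.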
Output
The actor returns a dataset containing HTML content for each URL. Each record includes the original URL, final URL (after redirects), page title, and full HTML content.
Sample Output
[
  {
    "url": "https://apify.com",
    "loadedUrl": "https://apify.com/",
    "title": "Apify - Web Scraping & Data Extraction | Apify",
    "html": "<!DOCTYPE html>\n<html lang=\"en\">\n<head>\n<meta charset=\"utf-8\">\n..."
  }
]
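Records of this shape can be post-processed with a few lines of Python. The sketch below assumes the dataset was exported as a JSON array with the four fields shown in the sample (`url`, `loadedUrl`, `title`, `html`); the helper name `summarize_records` and the summary keys are illustrative only.

```python
import json


def summarize_records(dataset_json: str) -> list[dict]:
    """Return a compact summary per record (hypothetical helper)."""
    summary = []
    for record in json.loads(dataset_json):
        summary.append({
            "url": record["url"],
            "loadedUrl": record["loadedUrl"],
            "title": record.get("title", ""),
            # Size of the downloaded HTML in characters.
            "htmlLength": len(record.get("html", "")),
            # True when the final URL differs from the requested one,
            # for example after a redirect or an added trailing slash.
            "changedUrl": record["url"] != record["loadedUrl"],
        })
    return summary
```

A summary like this is a cheap way to spot redirects or suspiciously short pages before parsing the HTML itself.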
Proxy Configuration
This actor supports flexible proxy configuration:
No proxy (default)
Apify Proxy for residential IPs
Custom proxy URLs
Default proxy settings:
{"useApifyProxy":true}
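The three proxy modes listed above could be expressed as `proxyConfiguration` objects along these lines. This is a sketch under assumptions: `useApifyProxy` and `apifyProxyGroups` appear in this README, but the `proxyUrls` key for custom proxies and the use of `"useApifyProxy": false` for the no-proxy case are not confirmed here and should be checked against the actor's input schema. The function and mode names are hypothetical.

```python
from typing import Optional


def proxy_configuration(mode: str,
                        groups: Optional[list[str]] = None,
                        proxy_urls: Optional[list[str]] = None) -> dict:
    """Build a proxyConfiguration object for one of three modes (hypothetical helper)."""
    if mode == "none":
        # Assumption: disabling Apify Proxy without custom URLs means no proxy.
        return {"useApifyProxy": False}
    if mode == "apify":
        config: dict = {"useApifyProxy": True}
        if groups:
            # For example ["RESIDENTIAL"] for residential IPs.
            config["apifyProxyGroups"] = list(groups)
        return config
    if mode == "custom":
        if not proxy_urls:
            raise ValueError("custom mode needs at least one proxy URL")
        # Assumption: custom proxies are passed under the "proxyUrls" key.
        return {"useApifyProxy": False, "proxyUrls": list(proxy_urls)}
    raise ValueError(f"unknown proxy mode: {mode}")
```

For instance, `proxy_configuration("apify", groups=["RESIDENTIAL"])` yields an object that selects the residential group.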
How to Use
1. Open the actor in Apify Console
2. Click "Try actor" or create a new task
3. Add URLs to the requestListSources array
4. Configure proxy settings if needed
5. Run the actor
6. Download HTML content in JSON, CSV, or XML format
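The final download step can also be done outside the Console. As a rough sketch, the Apify API exposes dataset items at `/v2/datasets/{datasetId}/items` with a `format` query parameter; the helper below only builds that URL. The helper name and the dataset ID in the usage note are placeholders, and a non-public dataset would additionally need an API token, which is not shown.

```python
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"
# The three export formats mentioned in the steps above.
FORMATS = {"json", "csv", "xml"}


def dataset_items_url(dataset_id: str, fmt: str = "json") -> str:
    """Build the download URL for a run's dataset items (hypothetical helper)."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    query = urlencode({"format": fmt})
    return f"{API_BASE}/datasets/{dataset_id}/items?{query}"
```

With a placeholder ID, `dataset_items_url("YOUR_DATASET_ID", "csv")` gives a URL that can be fetched with any HTTP client.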
Advanced Input Example
{
  "requestListSources": [
    {
      "url": "https://example.com",
      "userData": { "waitForSelector": ".content-loaded" }
    },
    { "url": "https://another-site.com" }
  ],
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  },
  "handlePageTimeoutSecs": 120,
  "maxRequestRetries": 3,
  "useChrome": true
}
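When the URL list is long, the `requestListSources` array with per-URL `waitForSelector` values is easier to generate than to write by hand. The following sketch assumes a plain mapping from URL to an optional CSS selector; the helper name `build_sources` is hypothetical.

```python
import json
from typing import Optional


def build_sources(pages: dict[str, Optional[str]]) -> list[dict]:
    """Turn a URL -> selector mapping into requestListSources items (hypothetical helper)."""
    sources = []
    for url, selector in pages.items():
        source: dict = {"url": url}
        if selector:
            # Only attach userData when a selector was given for this URL.
            source["userData"] = {"waitForSelector": selector}
        sources.append(source)
    return sources


if __name__ == "__main__":
    run_input = {
        "requestListSources": build_sources({
            "https://example.com": ".content-loaded",
            "https://another-site.com": None,
        }),
        "handlePageTimeoutSecs": 120,
        "maxRequestRetries": 3,
    }
    print(json.dumps(run_input, indent=2))
```

URLs mapped to `None` are emitted without `userData`, matching the second entry in the example above.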
Tech Stack
Apify SDK: for actor and data handling
Crawlee: for robust crawling and scraping
Puppeteer: for browser automation and rendering dynamic content
Node.js: fast, scalable backend environment
