Pricing
from $1.00 / 1,000 results
Headless Browser HTML Scraper
Render any URL in a real headless browser and return the fully-rendered HTML, the page text, or a selected area by CSS selector. Scroll for lazy content, wait for elements, and capture screenshots. A browserless-style HTML API on Apify.
A generic, browserless-style HTML API. Give it any URL and it opens a real headless Chromium browser, renders the page with its JavaScript executed, optionally scrolls and waits, then returns the full rendered HTML or just a selected area matched by a CSS selector.
Think of it as a self-hosted browserless.io /content + /scrape on Apify.
What it does
- Renders any URL with a real browser (JavaScript executed)
- Selected area: pass a CSS selector and get every matching element's HTML, text, attributes, and position
- Scroll to bottom: trigger infinite-scroll / lazy-loaded content with real wheel events
- Wait for a selector, a load event, or a fixed delay
- Optional full-page screenshot
- Block images/media/fonts/CSS to speed up rendering and cut bandwidth
- Use it synchronously as an API (`run-sync-get-dataset-items`)
Input
| Field | Type | Description |
|---|---|---|
| `urls` | array | Required. URLs to render and scrape. |
| `selector` | string | Optional CSS selector for the "selected area". Returns each match's HTML/text/attributes/position. Empty = full page only. |
| `scrollToBottom` | boolean | Scroll down to load lazy content. Default `false`. |
| `maxScrolls` | integer | Max scroll rounds when scrolling. Default `15`. |
| `waitForSelector` | string | Wait until this selector appears (up to 30 s). |
| `waitUntil` | enum | `domcontentloaded` (default), `load`, or `networkidle`. |
| `waitMs` | integer | Extra fixed wait after load (ms). |
| `htmlMode` | enum | `full` (entire DOM, default) or `visible` (only the above-the-fold content shown on open, without scrolling, with scripts and styles stripped for shorter, cleaner HTML). |
| `blockResources` | array | Resource types to block. Default `["media","font"]`. |
| `returnFullHtml` | boolean | Include the rendered HTML (full or visible per `htmlMode`). Default `true`. |
| `returnText` | boolean | Include the page's visible text. Default `true`. |
| `includeScreenshot` | boolean | Capture a full-page screenshot and return its URL. Default `false`. |
| `proxyConfiguration` | object | Apify Proxy (datacenter) by default; use Residential for bot-protected sites. |
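
The fields and defaults in the table can be assembled programmatically. The following is a minimal Python sketch, not part of the Actor itself: `build_input` and its validation rules are hypothetical helpers, and the defaults are simply copied from the table above.

```python
import json

# Defaults copied from the input table; the helper itself is hypothetical.
DEFAULTS = {
    "scrollToBottom": False,
    "maxScrolls": 15,
    "waitUntil": "domcontentloaded",
    "htmlMode": "full",
    "blockResources": ["media", "font"],
    "returnFullHtml": True,
    "returnText": True,
    "includeScreenshot": False,
}
# Optional fields that have no default in the table.
OPTIONAL = {"selector", "waitForSelector", "waitMs", "proxyConfiguration"}
WAIT_UNTIL = {"domcontentloaded", "load", "networkidle"}
HTML_MODES = {"full", "visible"}


def build_input(urls, **overrides):
    """Return an input dict for the Actor, with basic client-side checks."""
    if not urls:
        raise ValueError("urls is required and must not be empty")
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    payload = {**DEFAULTS, **overrides}
    # Copy the list so callers never share the module-level default.
    payload["blockResources"] = list(payload["blockResources"])
    if payload["waitUntil"] not in WAIT_UNTIL:
        raise ValueError("waitUntil must be one of " + ", ".join(sorted(WAIT_UNTIL)))
    if payload["htmlMode"] not in HTML_MODES:
        raise ValueError("htmlMode must be 'full' or 'visible'")
    # The examples in this README pass each URL as an object with a "url" key.
    payload["urls"] = [{"url": u} for u in urls]
    return payload


if __name__ == "__main__":
    print(json.dumps(build_input(["https://www.example.com"], waitUntil="networkidle"), indent=2))
```

Spelling out the defaults is optional, since the Actor applies them on its own; doing so mainly makes the effective configuration visible in logs.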
Example: full HTML of a JS-rendered page
```json
{
  "urls": [{ "url": "https://www.example.com" }],
  "waitUntil": "networkidle"
}
```
Example: extract a selected area, after scrolling
```json
{
  "urls": [{ "url": "https://news.ycombinator.com" }],
  "selector": "span.titleline a",
  "scrollToBottom": true
}
```
Output
One record per URL:
```json
{
  "url": "https://www.example.com",
  "loadedUrl": "https://www.example.com/",
  "statusCode": 200,
  "title": "Example Domain",
  "html": "<!DOCTYPE html><html>...</html>",
  "text": "Example Domain\nThis domain is for use in...",
  "selectedCount": 30,
  "selectedElements": [
    {
      "text": "Some headline",
      "html": "<a href=\"...\">Some headline</a>",
      "attributes": [{ "name": "href", "value": "https://..." }],
      "width": 320,
      "height": 18,
      "top": 140,
      "left": 24
    }
  ],
  "screenshotUrl": "https://api.apify.com/v2/key-value-stores/.../records/screenshot-1",
  "scrapedAt": "2026-06-13T08:00:00.000Z"
}
```
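
Because `attributes` arrives as a list of name/value pairs rather than a plain object, a small helper makes lookups easier. The sketch below assumes the record shape shown above; `attributes_to_dict`, `extract_links`, and the sample values (including the `https://example.com/story` link) are made up for illustration.

```python
def attributes_to_dict(element):
    """Flatten [{"name": ..., "value": ...}, ...] into a plain dict."""
    return {a["name"]: a["value"] for a in element.get("attributes", [])}


def extract_links(record):
    """Return (text, href) pairs for every selected element that has an href."""
    links = []
    for element in record.get("selectedElements", []):
        href = attributes_to_dict(element).get("href")
        if href:
            links.append((element.get("text", "").strip(), href))
    return links


# Hypothetical record, trimmed to the fields this helper reads.
sample_record = {
    "url": "https://www.example.com",
    "selectedCount": 2,
    "selectedElements": [
        {
            "text": " Some headline ",
            "attributes": [{"name": "href", "value": "https://example.com/story"}],
        },
        # An element without an href is skipped.
        {"text": "No link here", "attributes": [{"name": "class", "value": "plain"}]},
    ],
}

if __name__ == "__main__":
    for text, href in extract_links(sample_record):
        print(f"{text} -> {href}")
```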
Use as an API
```bash
curl -X POST "https://api.apify.com/v2/acts/USERNAME~browserless-html-scraper/run-sync-get-dataset-items?token=TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"urls":[{"url":"https://www.example.com"}],"selector":"h1"}'
```
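
The same call can be made from Python using only the standard library. This is a sketch under assumptions: `build_request` and `run_sync` are hypothetical helper names, `USERNAME` and `TOKEN` are the same placeholders as in the curl command, and the 300-second timeout is an arbitrary choice rather than a documented limit.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2/acts"


def build_request(actor_id, token, payload):
    """Build the POST request without sending it, mirroring the curl call."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/{actor_id}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sync(actor_id, token, payload, timeout=300):
    """Send the request and return the parsed JSON response (one record per URL)."""
    request = build_request(actor_id, token, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage (performs a real network call, so it is left commented out):
# records = run_sync(
#     "USERNAME~browserless-html-scraper",
#     "TOKEN",
#     {"urls": [{"url": "https://www.example.com"}], "selector": "h1"},
# )
```

Splitting request construction from sending keeps the URL and body easy to inspect or unit-test without touching the network.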
Notes
- For bot-protected sites, switch `proxyConfiguration` to Residential.
- Blocking `image`/`stylesheet` speeds things up but can break layout-dependent lazy scrolling on some sites; keep them enabled (don't block) when using `scrollToBottom` on such pages.
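
Both notes can be applied mechanically before a run. The sketch below is a hypothetical helper, not part of the Actor. It assumes `proxyConfiguration` accepts the usual Apify proxy object with `useApifyProxy` and `apifyProxyGroups`, which this README does not spell out, so verify that shape against the Actor's input schema.

```python
# Resource types the note above warns about when scrolling for lazy content.
LAYOUT_RESOURCES = {"image", "stylesheet"}


def tune_input(payload, bot_protected=False):
    """Return a copy of the input adjusted per the notes; the original is untouched."""
    tuned = dict(payload)
    if bot_protected:
        # Assumed shape of the Apify proxy input; check the Actor's schema.
        tuned["proxyConfiguration"] = {
            "useApifyProxy": True,
            "apifyProxyGroups": ["RESIDENTIAL"],
        }
    if tuned.get("scrollToBottom") and "blockResources" in tuned:
        # Keep images and stylesheets loading so lazy scrolling can still trigger.
        tuned["blockResources"] = [
            r for r in tuned["blockResources"] if r not in LAYOUT_RESOURCES
        ]
    return tuned


if __name__ == "__main__":
    base = {
        "urls": [{"url": "https://news.ycombinator.com"}],
        "scrollToBottom": True,
        "blockResources": ["image", "media", "font", "stylesheet"],
    }
    print(tune_input(base, bot_protected=True))
```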
