Website Screenshot: Full Pages, Any Resolution, PNG, No Limits
Pricing
Pay per usage
20 runs. Website screenshots as PNG/JPG/PDF in 2 min: full-page, desktop + mobile, custom viewport, bulk URL input. Backed by 951-run Trustpilot flagship + 31-actor portfolio. For competitor visual tracking + UX research. spinov001@gmail.com · blog.spinov.online · t.me/scraping_ai
Actor stats: Rating 0.0 (0) · 0 bookmarked · 6 total users · 1 monthly active user · Last modified 2 months ago
Website Screenshot Scraper: Playwright PNG/JPEG Capture, Custom Viewport
Capture batches of webpage screenshots (full-page or viewport-only, PNG or JPEG, custom width/height) to an Apify key-value store. Zero local browser install, zero Playwright boilerplate.
Headless Chromium (via Playwright) loads each URL with the domcontentloaded wait strategy, optionally waits for a CSS selector, captures the screenshot, stores it in the run's key-value store, and pushes one dataset record per URL with the signed image URL plus capture metadata.
What you actually get (verified against src/main.js)
Output schema: one record per URL
```json
{
  "url": "https://stripe.com",
  "title": "Stripe | Financial Infrastructure for the Internet",
  "screenshotKey": "screenshot_stripe_com_1714398900000",
  "screenshotUrl": "https://api.apify.com/v2/key-value-stores/<storeId>/records/<screenshotKey>",
  "format": "png",
  "width": 1280,
  "height": 720,
  "fullPage": false,
  "fileSize": 286410,
  "scrapedAt": "2026-04-29T12:00:00.000Z"
}
```
10 fields per success record. On error, the actor pushes { url, error: "<reason>", scrapedAt } so your downstream pipeline can retry the failures selectively.
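A minimal sketch of the selective retry mentioned above, assuming only the two record shapes described here: error records carry an `error` key, success records do not. The function name `split_records` and the sample data are hypothetical, not part of the actor.

```python
def split_records(items):
    """Separate success records from error records and list the URLs to retry.

    Assumption: a record is an error record if and only if it has an "error" key.
    """
    successes = [item for item in items if "error" not in item]
    failures = [item for item in items if "error" in item]
    retry_urls = [item["url"] for item in failures]
    return successes, failures, retry_urls


# Hypothetical dataset items shaped like the records described above.
sample_items = [
    {"url": "https://stripe.com", "screenshotUrl": "https://example.invalid/a.png", "fileSize": 286410},
    {"url": "https://down.example.com", "error": "navigation timeout", "scrapedAt": "2026-04-29T12:00:05.000Z"},
]

ok, failed, retry_urls = split_records(sample_items)
print(retry_urls)
```

The `retry_urls` list can be passed back as the `urls` input of a follow-up run.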
screenshotUrl points to the file inside the Apify run's default key-value store. The store is retained per Apify's plan defaults (typically 14 days on free tier; longer on paid). For permanent retention, post-process the run via an Apify webhook to S3 / R2 / your own object store.
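Because the key ends in an epoch-millisecond timestamp (`screenshot_<domain>_<epochMs>`), a post-run script can estimate how old a capture is before copying it elsewhere. The sketch below is an approximation under stated assumptions: the 14-day window is the "typical" free-tier figure quoted above, not a guaranteed value, and the helper names and `margin_days` parameter are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def capture_time_from_key(key: str) -> datetime:
    """Parse the trailing epoch-milliseconds segment of a screenshot key."""
    epoch_ms = int(key.rsplit("_", 1)[1])
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)


def expires_soon(key: str, now: datetime, retention_days: int = 14, margin_days: int = 2) -> bool:
    """True when the capture is within margin_days of the assumed retention window."""
    age = now - capture_time_from_key(key)
    return age >= timedelta(days=retention_days - margin_days)


key = "screenshot_stripe_com_1714398900000"
print(capture_time_from_key(key).isoformat())
print(expires_soon(key, now=datetime.now(timezone.utc)))
```

Keys flagged by `expires_soon` are the ones to copy to your own bucket first.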
Input (full schema, all 8 fields exposed in UI)
| Parameter | Type | Default | Range | Description |
|---|---|---|---|---|
| urls | array | [] | required, ≥1 | List of URLs. Plain hostnames (stripe.com) get https:// prepended automatically. |
| fullPage | boolean | false | - | true for entire scroll height; false for viewport-only. |
| width | integer | 1280 | 320–3840 | Browser viewport width in pixels. |
| height | integer | 720 | 240–2160 | Browser viewport height in pixels. |
| format | string | "png" | "png" or "jpeg" | Output format. |
| quality | integer | 80 | 1–100 | JPEG quality. Ignored when format="png". |
| waitForSelector | string | "" | CSS selector | Optional CSS selector; the actor waits up to 10 s for it before capturing. Empty = skip. |
| waitTime | integer | 2000 | 0–30000 ms | Extra delay after load before capture. Note: a hardcoded 2000 ms settle ALSO runs before this, so total minimum settle = 2000 + waitTime (default total = 4000 ms). |
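A minimal client-side sketch that builds a run input and checks it against the defaults and ranges in the table above before a run is started. The helper names `normalize_url` and `build_run_input` are hypothetical; the actor itself prepends `https://` to plain hostnames, so the normalization here only mirrors that behavior for local checks.

```python
def normalize_url(url: str) -> str:
    """Prepend https:// to plain hostnames, mirroring the documented actor behavior."""
    url = url.strip()
    if url.startswith(("http://", "https://")):
        return url
    return "https://" + url


def build_run_input(urls, full_page=False, width=1280, height=720, fmt="png",
                    quality=80, wait_for_selector="", wait_time=2000):
    """Validate values against the documented ranges and return the input dict."""
    if not urls:
        raise ValueError("urls must contain at least one URL")
    if not 320 <= width <= 3840:
        raise ValueError("width must be between 320 and 3840")
    if not 240 <= height <= 2160:
        raise ValueError("height must be between 240 and 2160")
    if fmt not in ("png", "jpeg"):
        raise ValueError('format must be "png" or "jpeg"')
    if not 1 <= quality <= 100:
        raise ValueError("quality must be between 1 and 100")
    if not 0 <= wait_time <= 30000:
        raise ValueError("waitTime must be between 0 and 30000 ms")
    return {
        "urls": [normalize_url(u) for u in urls],
        "fullPage": full_page,
        "width": width,
        "height": height,
        "format": fmt,
        "quality": quality,
        "waitForSelector": wait_for_selector,
        "waitTime": wait_time,
    }


run_input = build_run_input(["stripe.com", "https://linear.app"], width=1440, height=900)
print(run_input["urls"])
```

Rejecting an empty `urls` list locally avoids paying for a run that would exit without pushing any records.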
How it works
- Launch headless Chromium (Playwright `chromium.launch({ headless: true })`).
- New browser context with the requested `width × height` viewport.
- For each URL:
  - `page.goto(url, { waitUntil: 'domcontentloaded', timeout: 45000 })`.
  - `page.waitForLoadState('load', { timeout: 15000 })`: best-effort, swallows timeout.
  - `page.waitForTimeout(2000)`: settles late-paint elements.
  - If `waitForSelector`: `page.waitForSelector(selector, { timeout: 10000 })`, swallows timeout.
  - Additional `page.waitForTimeout(waitTime)` if `waitTime > 0`.
  - `page.screenshot({ fullPage, type: format, quality? })`.
  - Save buffer to KV store as `screenshot_<domain>_<epochMs>`.
  - Push one dataset record.
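The per-URL steps above can be sketched in Python against Playwright's sync API. This is an illustrative translation, not the actor's source (which is JavaScript): the function name `capture_page` is hypothetical, timeouts are swallowed with a broad `except Exception` for brevity, and `quality` is passed only for JPEG because Playwright rejects it for PNG.

```python
def capture_page(page, url, *, full_page=False, fmt="png", quality=80,
                 wait_for_selector="", wait_time=2000):
    """Run the documented wait strategy on a Playwright sync `page` and return image bytes."""
    # 1. Navigate and return as soon as the DOM is parsed.
    page.goto(url, wait_until="domcontentloaded", timeout=45_000)

    # 2. Best-effort wait for the load event; a timeout is not fatal.
    try:
        page.wait_for_load_state("load", timeout=15_000)
    except Exception:
        pass

    # 3. Fixed settle for late-paint elements.
    page.wait_for_timeout(2000)

    # 4. Optional selector wait; proceed with the current state on timeout.
    if wait_for_selector:
        try:
            page.wait_for_selector(wait_for_selector, timeout=10_000)
        except Exception:
            pass

    # 5. Extra configurable delay.
    if wait_time > 0:
        page.wait_for_timeout(wait_time)

    # 6. Capture; quality applies to JPEG only.
    options = {"full_page": full_page, "type": fmt}
    if fmt == "jpeg":
        options["quality"] = quality
    return page.screenshot(**options)
```

With Playwright installed, a real `page` would come from `sync_playwright()`, `chromium.launch(headless=True)`, and `browser.new_context(viewport={"width": 1280, "height": 720}).new_page()`.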
Why not networkidle? Pages with persistent SSE / WebSockets / live analytics (Stripe, Linear, Vercel) never reach networkidle, so Playwright would time out on them. The actor explicitly uses domcontentloaded + a soft load wait + a fixed waitTime to handle late-paint reliably.
Honest limitations (read before bulk runs)
- Total minimum settle = 2000 ms + `waitTime`. There's a hardcoded `page.waitForTimeout(2000)` AFTER `domcontentloaded` and BEFORE the configurable `waitTime`. Default total is 2000 + 2000 = 4000 ms of fixed delay per URL on top of network time. For very fast pages this is overkill; for very slow SPA shells it may still be too short, so adjust `waitTime`.
- Single browser context, sequential URL processing. The actor opens ONE Chromium context and processes URLs in a `for` loop. 100 URLs × ~6 s wall-clock each ≈ 10 min. No parallelism.
- Per-URL errors are caught; a browser crash is not. The outer try/catch wraps only the browser launch, and each URL is handled separately. If a single URL fails (timeout, DNS, navigation error), the actor pushes `{ url, error, scrapedAt }` and CONTINUES to the next URL. However, if the browser itself crashes mid-batch, the whole run aborts (no auto-relaunch).
- Cloudflare Turnstile / hCaptcha / anti-bot walls block the actor. Standard headless Chromium fingerprint, no stealth plugins. Cloudflare will challenge or block; expect either an error record or a screenshot of the challenge page.
- No login / cookie injection. Fresh browser context per run. Pages behind auth render their pre-login state. Login-walled captures = custom build.
- No element-crop, no auto-scroll for lazy-load images, no banner-dismissal heuristics. Full-page captures of cookie-banner-heavy sites will show the banner overlay. Custom build can dismiss common banners (Cookiebot, OneTrust, Quantcast).
- No proxy. Direct browser launch on Apify worker IP. Geo-restricted pages render with the worker's region (typically US/EU).
- Screenshot retention is per-Apify-plan default: typically 14 days on free tier, longer on paid. For permanent retention, copy via webhook to your own S3 / R2 / Backblaze.
- Filename is timestamp-keyed: `screenshot_<domain>_<epochMs>`. Repeated captures of the same URL produce DIFFERENT keys (no overwrite). Useful for archival, but it means the key-value store grows unbounded, so manage retention yourself.
- `title` is the page's `document.title` AT capture time. For SPAs the title may still be the shell's default if hydration hasn't completed within the 4 s settle window.
- `waitForSelector` timeout (10 s) is silent. If the selector never appears, the actor proceeds with the current page state (caught with `.catch(() => {})`).
- `urls = []` is silently accepted. The actor exits without pushing any records (only browser launch logs).
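The settle and batch-time arithmetic above can be checked with a few lines of Python. The 2.0 s network time per URL is an assumption chosen to match the "~6 s wall-clock" figure; real pages will vary, and the function names are hypothetical.

```python
HARDCODED_SETTLE_MS = 2000  # fixed delay the actor applies before waitTime


def fixed_settle_ms(wait_time_ms: int = 2000) -> int:
    """Total fixed delay per URL: hardcoded settle plus configurable waitTime."""
    return HARDCODED_SETTLE_MS + wait_time_ms


def estimate_batch_seconds(url_count: int, network_s_per_url: float,
                           wait_time_ms: int = 2000) -> float:
    """Sequential processing: per-URL times simply add up."""
    per_url_s = network_s_per_url + fixed_settle_ms(wait_time_ms) / 1000
    return url_count * per_url_s


print(fixed_settle_ms())                      # 4000
print(estimate_batch_seconds(100, 2.0) / 60)  # 10.0 (minutes)
```

Setting `waitTime` to 0 removes only the configurable half; the 2000 ms hardcoded settle still applies to every URL.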
Who buys this actor
- Visual-regression QA engineers running nightly screenshot diffs against staging + prod to catch CSS regressions before users do.
- Competitive-intel teams archiving weekly snapshots of competitor landing pages (pricing, feature lists, hero copy) for deal-review decks.
- Content archival / journalism preserving webpage state for takedown resilience (source of truth when a page later changes or 404s).
- Link-preview / OG-image fallback services generating thumbnail cards for social feeds when the upstream page lacks proper `og:image` tags.
- Brand / trademark monitoring capturing how your logo or copy is displayed on partner, affiliate, and unauthorized resale sites.
- MCP / LLM-agent tools giving an agent the ability to "see" a webpage when DOM-only context isn't enough.
Python example: visual-regression diff
Capture the same set of paths twice (staging + prod) and flag any byte-size delta >5%:
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")
pages = ["/", "/pricing", "/docs", "/blog", "/login"]


def capture(base_url: str) -> dict[str, dict]:
    """Capture every path under base_url and index the records by path."""
    run = client.actor("knotless_cadence/website-screenshot-scraper").call(
        run_input={
            "urls": [base_url + p for p in pages],
            "fullPage": True,
            "width": 1440,
            "height": 900,
            "format": "png",
        }
    )
    items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
    return {i["url"].replace(base_url, ""): i for i in items}


staging = capture("https://staging.example.com")
prod = capture("https://www.example.com")

for path in pages:
    # Missing or error records have no fileSize, so they fall back to 0.
    s = staging.get(path, {}).get("fileSize", 0)
    p = prod.get(path, {}).get("fileSize", 0)
    if p and abs(s - p) / p > 0.05:
        # The staging record may be an error record without a screenshotUrl.
        staging_url = staging.get(path, {}).get("screenshotUrl")
        print(f"WARN {path} diff {((s - p) / p) * 100:+.1f}% staging={s}B prod={p}B")
        print(f"  {staging_url}")
        print(f"  {prod[path]['screenshotUrl']}")
```
MCP / LLM-agent integration
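```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

# Tool definition handed to the agent.
tools = [
    {
        "name": "capture_webpage",
        "description": "Take a screenshot of a webpage and return the image URL.",
        "input_schema": {
            "type": "object",
            "properties": {
                "url": {"type": "string"},
                "fullPage": {"type": "boolean", "default": False},
            },
            "required": ["url"],
        },
    }
]


def capture_webpage(url: str, full_page: bool = False) -> str:
    """Run the actor for a single URL and return the screenshot URL."""
    run = client.actor("knotless_cadence/website-screenshot-scraper").call(
        run_input={"urls": [url], "fullPage": full_page, "format": "png"}
    )
    items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
    # Error records have no screenshotUrl; raise instead of failing with a KeyError.
    if not items or "error" in items[0]:
        raise RuntimeError(f"capture failed for {url}")
    return items[0]["screenshotUrl"]
```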
Pair with Claude Vision / GPT-4o for accessibility audits, brand-compliance checks, or end-to-end QA that tests "looks right" not just "DOM matches".
Common questions
Q: Can I capture a specific element instead of the whole page?
A: Not in this actor. Workaround: capture the full page, then crop locally with Pillow / sharp using the element's bounding box from a companion DOM query. Available as a custom build (see Custom scraping below).
Q: How do I get screenshots at multiple breakpoints (320, 768, 1440 px) in one run?
A: Call the actor 3 times, each with a different width. Native multi-viewport input is on the roadmap but not implemented yet.
Q: What about pages behind a login wall?
A: Not supported in v1.0. The actor uses a fresh browser context per run with no cookie / session injection. A custom build with cookie / session-token injection is available on request.
Q: Does this bypass Cloudflare or captchas?
A: No. Standard headless Chromium fingerprint. Aggressive bot-protection (Cloudflare Turnstile, hCaptcha) will block the actor.
Q: Can I schedule this nightly?
A: Yes. Apify has native cron scheduling. Set the actor to run daily and pipe the output dataset to your webhook / Slack / S3 sync.
Q: How long do screenshots stay accessible?
A: Per Apify plan defaults: typically 14 days on free tier, longer on paid. For permanent retention, copy PNGs to your own S3 / R2 / Backblaze bucket via Apify webhook or a post-run script.
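For the multi-breakpoint question above, one way to prepare the three calls is to generate one run input per viewport. This is a sketch: the widths come from the question, while the heights and the helper name `breakpoint_inputs` are assumptions chosen to stay inside the documented 240–2160 range.

```python
# (width, height) pairs; heights are illustrative assumptions.
BREAKPOINTS = [(320, 568), (768, 1024), (1440, 900)]


def breakpoint_inputs(urls, breakpoints=BREAKPOINTS, full_page=True):
    """Return one actor input dict per viewport, all sharing the same URL list."""
    return [
        {
            "urls": list(urls),
            "width": width,
            "height": height,
            "fullPage": full_page,
            "format": "png",
        }
        for width, height in breakpoints
    ]


inputs = breakpoint_inputs(["https://www.example.com/pricing"])
print([i["width"] for i in inputs])
```

Each dict can then be passed as `run_input` to `client.actor(...).call(...)`, one run per breakpoint, as in the Python example earlier on this page.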
Visual / monitoring toolkit (companion actors)
| Tool | Purpose |
|---|---|
| Website Screenshot Scraper (this) | Capture any page visually |
| Website Uptime Checker | Monitor availability / latency |
| Broken Links Checker | Find 404s on your site |
| PageSpeed Insights Scraper | Lighthouse / Core Web Vitals |
| HTTP Headers Checker | Security-headers audit |
| Webpage Text Extractor | Clean article text from HTML |
| URL Expander | Resolve shortlink chains |
All 31 published actors are free to inspect on the Apify Store.
Custom scraping: pilot tiers
Need element-crop, multi-viewport, login-walled captures, or a different schema (visual-diff metric, OCR'd text-overlay, automatic banner dismissal)? Three tiers:
- Pilot: $97 · 1 actor, basic config, 7-day support. Good entry point, useful for a single visual-regression pipeline or a one-off competitor archival sweep.
- Standard: $297 · custom actor + Slack/email alerts on results, 30-day support. Most QA-automation and competitive-intel projects fit here.
- Premium: $797 · custom actor + dashboard + 90-day support + 1 modification round. For ongoing pipelines (daily multi-breakpoint capture, brand-monitoring rollups).
Email: spinov001@gmail.com. Drop the URL list and the schema you need; quote within 48h.
Proof of work: 31 published Apify scrapers (78 total in portfolio): Trustpilot 949 runs, Reddit 80+, Google News 43, Glassdoor 37, Email Extractor 36+. Recently delivered a paid 3-article series for a client in the proxy industry ($150).
More tips: t.me/scraping_ai · blog.spinov.online
Disclaimer
Designed for QA, archival, and competitive-research use. Respect target-site Terms of Service, applicable data-protection law (GDPR, CCPA), and capture publicly accessible pages only. Not affiliated with any of the example domains shown.
Honest disclosure: 10 output fields per success record (url, title, screenshotKey, screenshotUrl, format, width, height, fullPage, fileSize, scrapedAt). All 8 input fields now exposed in INPUT_SCHEMA (UI form). Total minimum settle = 2000 ms hardcoded + waitTime (default total = 4000 ms). Sequential processing, no parallelism. Per-URL errors push an error record and continue; browser-crash aborts run. No element-crop, no cookie / session injection, no auto-scroll for lazy-load, no Cloudflare / captcha bypass, no proxy. Wait strategy is domcontentloaded + soft load + fixed waitTime; networkidle is intentionally avoided because it hangs on SSE / WebSocket sites.
