URL: https://apify.com/fetchcraft/universal-web-snapshot

Universal Web Snapshot - HTML, Text, Markdown Capture · Apify


Pricing

$3.00 / 1,000 snapshots taken

Capture a clean HTML, plain-text, or markdown snapshot of any URL. Built for archival, change detection, and downstream LLM input. Stores rendered title, final URL, status, fetched-at timestamp. $0.003 per snapshot. Free preview run.

Rating: 0.0 (0)

Developer: Emily Ward
Maintained by Community

Actor stats

Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 11 days ago

Universal Web Snapshot

Snapshot any URL via a connector chain: static HTML first, Playwright browser if static returns thin content, Wayback Machine if both fail.

Pure scraper. No API keys. No LLM dependency. Designed to be robust against the three most common scraping failure modes:

Failure mode | Tier that handles it
Site returns HTML but it's a JS-only SPA (Notion, Linear, Figma) | Playwright browser
Site blocks datacenter IPs | Apify proxy routing
Site is down right now but has Wayback snapshots | Wayback Machine

What you get back per URL

Field | Description
input_url, final_url, status | Identity + HTTP response
tier_used | "static" / "browser" / "wayback" - tells you which connector won
archived_at | If Wayback was used, the snapshot timestamp
signals.title, og_title, meta_description, og_description, canonical_url | Standard meta fields
signals.h1[], h2[] | First N headings (typically your hero copy)
signals.json_ld[] | Any JSON-LD structured data the page exposes
text_excerpt (first 2000 chars), text_full (up to 20k), text_length | Cleaned text content
screenshot_url | Optional PNG (Playwright tier only) saved to actor KV store
tries[] | Per-tier result so you can see what was attempted
elapsed_ms | Total processing time
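
As a rough illustration of consuming one of these records, the sketch below builds a one-line summary from a result item. It assumes the `signals.*` fields are nested under a `signals` object, as the dotted names suggest; the helper name `summarize_item` and the sample values are made up for the example and are not part of the actor.

```python
from typing import Any


def summarize_item(item: dict[str, Any]) -> str:
    """Build a one-line summary of a snapshot record (hypothetical helper)."""
    # Assumption: meta fields live under a nested "signals" object.
    signals = item.get("signals") or {}
    title = signals.get("title") or signals.get("og_title") or "(no title)"
    tier = item.get("tier_used", "unknown")
    line = (
        f"[{tier}] {item.get('status')} {item.get('final_url')} "
        f"- {title} ({item.get('text_length', 0)} chars)"
    )
    # archived_at is only expected when the Wayback tier produced the result.
    if tier == "wayback" and item.get("archived_at"):
        line += f", archived {item['archived_at']}"
    return line


# Sample record with invented values, shaped after the field list above.
example = {
    "input_url": "example.com",
    "final_url": "https://example.com/",
    "status": 200,
    "tier_used": "static",
    "signals": {"title": "Example Domain", "h1": ["Example Domain"]},
    "text_length": 1256,
    "elapsed_ms": 310,
}

print(summarize_item(example))
```

Checking `tier_used` first is a cheap way to tell live captures apart from archived ones before trusting the text.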

Pricing

$0.05 per successful snapshot. Failed URLs (all three tiers exhausted) are not charged.

Use case | URLs | Cost
Quick competitor sweep | 20 | $1.00
Daily monitoring (100 URLs/day) | 3,000/mo | $150/mo
One-off content audit | 500 | $25.00
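
The figures in the table follow from multiplying the number of successful snapshots by the $0.05 rate quoted in this section. A minimal sketch of that arithmetic (the function name `estimate_cost` is illustrative only, and failed URLs are assumed to be excluded from the count before calling it):

```python
from decimal import Decimal

# Per-snapshot rate as quoted in this Pricing section.
PRICE_PER_SNAPSHOT = Decimal("0.05")


def estimate_cost(successful_snapshots: int) -> Decimal:
    """Estimate the charge in USD for a number of successful snapshots."""
    if successful_snapshots < 0:
        raise ValueError("snapshot count cannot be negative")
    return PRICE_PER_SNAPSHOT * successful_snapshots


for label, count in [("sweep", 20), ("monthly monitoring", 3000), ("audit", 500)]:
    print(f"{label}: ${estimate_cost(count)}")
```

`Decimal` is used instead of floats so the totals stay exact to the cent.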

How the connector chain works

1. Static fetch (cheap, fast) ──────────┐
   ├─ HTTP 200 + >300 chars             │  Done. tier_used="static".
   └─ Thin/blocked/missing content ─────┤
                                        │
2. Playwright browser (heavier) ────────┤
   ├─ Renders JS, waits for network     │  Done. tier_used="browser".
   └─ Browser failure / still thin ─────┤
                                        │
3. Wayback Machine (last resort) ───────┤
   ├─ Finds archived snapshot           │  Done. tier_used="wayback".
   └─ No archive available ─────────────┘  Error returned.

The buyer is only charged for successful snapshots, regardless of how many tiers were tried.
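
The fallback logic in the diagram can be sketched roughly as follows. This is not the actor's code (which lives in JavaScript); it is a small Python illustration of the same pattern, with stub fetchers standing in for the real tiers. The `Snapshot` class, the `smart_fetch` signature, and the use of text length as the only "thin content" check are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

MIN_CHARS = 300  # threshold from the diagram; how it is measured is an assumption

Fetcher = Callable[[str], str]  # takes a URL, returns cleaned text or raises


@dataclass
class Snapshot:
    url: str
    tier_used: Optional[str] = None
    text: str = ""
    tries: list = field(default_factory=list)
    error: Optional[str] = None


def smart_fetch(url: str, tiers: list[tuple[str, Fetcher]]) -> Snapshot:
    """Try each tier in order and stop at the first one with enough content."""
    snap = Snapshot(url=url)
    for name, fetch in tiers:
        try:
            text = fetch(url)
        except Exception as exc:  # a failing tier falls through to the next one
            snap.tries.append({"tier": name, "ok": False, "reason": str(exc)})
            continue
        if len(text) > MIN_CHARS:
            snap.tries.append({"tier": name, "ok": True})
            snap.tier_used, snap.text = name, text
            return snap
        snap.tries.append({"tier": name, "ok": False, "reason": "thin content"})
    snap.error = "all tiers exhausted"  # nothing succeeded, so nothing to charge
    return snap


# Stub tiers: static returns an empty SPA shell, browser returns rendered text.
def static_stub(url: str) -> str:
    return "<div id='root'></div>"


def browser_stub(url: str) -> str:
    return "Rendered page text. " * 40


def wayback_stub(url: str) -> str:
    raise RuntimeError("no archive available")


result = smart_fetch(
    "https://example.com",
    [("static", static_stub), ("browser", browser_stub), ("wayback", wayback_stub)],
)
print(result.tier_used, [t["tier"] for t in result.tries])
```

Passing the tiers in as a list keeps the orchestrator independent of any one fetch method, which matches the connector idea described in this listing.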

Use cases

  • Pricing intel: monitor competitor pricing pages even when they are JS-rendered.
  • Content audits: snapshot 500 URLs across a domain, get clean text for analysis.
  • AI training data: prepare structured input for LLM pipelines from any web source.
  • Compliance / legal: archive a copy of pages with a Wayback timestamp for evidence.
  • SEO research: extract title + meta + h1/h2 across competitor sites.
  • Investor due diligence: snapshot a company's site at a point in time.

Why this is a connector / plugin architecture

The actor's src/lib/scraping.js exposes:

  • fetchStatic(url, opts) - tier 1 implementation
  • fetchBrowser(url, opts) - tier 2 implementation
  • fetchWayback(url, opts) - tier 3 implementation
  • smartFetch(url, opts) - the orchestrator
  • cleanHtml(html), extractSignals(html), normalizeUrl(input) - shared utilities

Any future scraper actor can import the same lib. This means a Pricing Watcher v2 (or any other actor that needs robust scraping) gets the same multi-tier fetch for free. The pattern lives once.

What this actor does NOT do

  • It does not log into authenticated sites.
  • It does not download non-HTML assets (PDFs, videos, etc).
  • It does not paginate within a single URL (use a separate crawler for that).

Tags

scraping web-snapshot wayback playwright connector static-fetch headless-chrome content-extraction seo


Made by Emily Ward, Cancel Costs.

Integrations

This actor works out of the box with every Apify-supported integration:

  • API: call via Apify API or any official SDK (Python, JavaScript, PHP, .NET). Returns a clean dataset URL.
  • Schedule: set a daily, weekly, or custom cron cadence in Apify Console. Combine it with notifications to be alerted when fresh results arrive.
  • Webhooks: wire ACTOR.RUN.SUCCEEDED to Slack, Discord, Zapier, Make, n8n, Pipedream, or any HTTPS endpoint.
  • MCP: this actor is discoverable through Apify's hosted MCP server at mcp.apify.com for Claude, Cursor, Cline, Windsurf, and other MCP clients.
  • n8n / Make / Zapier: native HTTP-Request integration. Trigger the actor on schedule, pipe results to Google Sheets, Airtable, your CRM, or any database.
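
For the API route, a request to Apify's synchronous run endpoint might be put together as in the sketch below. It only builds the request and does not send it. The input field name `urls` is a guess, since this listing does not show the actor's input schema, and `YOUR_APIFY_TOKEN` is a placeholder; check the actor's Input tab before relying on either.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
# Apify API paths address an actor as "username~actor-name".
ACTOR_ID = "fetchcraft~universal-web-snapshot"


def build_run_request(urls: list[str], token: str) -> urllib.request.Request:
    """Build (but do not send) a run-sync request that returns dataset items."""
    endpoint = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items"
    # Assumption: the actor accepts a "urls" array; the real schema may differ.
    body = json.dumps({"urls": list(urls)}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


req = build_run_request(["https://example.com"], token="YOUR_APIFY_TOKEN")
print(req.get_method(), req.full_url)

# To execute the run with a real token:
# with urllib.request.urlopen(req) as resp:
#     items = json.load(resp)
```

The official Python client (`apify-client`) wraps the same endpoints and is usually the more convenient choice for longer runs.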

Try it free

Every Apify user gets $5/month in free platform credits (around 250 events at this actor's per-event price). Run preview mode first to confirm output shape before scaling.

New to Apify? Sign up to get free credits.

What's New

  • 2026-06-03: Metadata, categories, and SEO refreshed. Latest version live on Apify Store.

Last Updated

2026-06-03
