Universal Web Snapshot - HTML, Text, Markdown Capture

Pricing

$3.00 / 1,000 snapshots taken

Capture a clean HTML, plain-text, or markdown snapshot of any URL. Built for archival, change detection, and downstream LLM input. Stores rendered title, final URL, status, fetched-at timestamp. $0.003 per snapshot. Free preview run.

Developer

Emily Ward

Maintained by Community

Actor stats

Last modified

11 days ago

Universal Web Snapshot

Snapshot any URL via a connector chain: static HTML first, Playwright browser if static returns thin content, Wayback Machine if both fail.

Pure scraper. No API keys. No LLM dependency. Designed to be robust against the three most common scraping failure modes:

| Failure mode | Tier that handles it |
| --- | --- |
| Site returns HTML but it's a JS-only SPA (Notion, Linear, Figma) | Playwright browser |
| Site blocks datacenter IPs | Apify proxy routing |
| Site is down right now but has Wayback snapshots | Wayback Machine |
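
The last row depends on an archived copy existing. As a rough sketch of what such a lookup can involve, the Internet Archive offers a public availability endpoint (`https://archive.org/wayback/available`) that returns the closest snapshot for a URL. Whether this actor uses that endpoint is an assumption, and the helper names below are hypothetical; only the response shape follows the Internet Archive API.

```javascript
// Hypothetical helpers, not taken from the actor's source.

// Build the availability lookup URL for a page.
function buildAvailabilityUrl(pageUrl) {
  return 'https://archive.org/wayback/available?url=' + encodeURIComponent(pageUrl);
}

// Convert a Wayback timestamp (YYYYMMDDhhmmss) to an ISO 8601 UTC string.
function waybackTimestampToIso(ts) {
  const m = /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})$/.exec(ts);
  if (!m) return null;
  return `${m[1]}-${m[2]}-${m[3]}T${m[4]}:${m[5]}:${m[6]}Z`;
}

// Pick the closest snapshot out of an availability response, or return null.
function parseAvailability(body) {
  const closest = body && body.archived_snapshots && body.archived_snapshots.closest;
  if (!closest || !closest.available) return null;
  return {
    snapshotUrl: closest.url,
    archivedAt: waybackTimestampToIso(closest.timestamp),
  };
}
```

A `null` result here would correspond to the "no archive available" outcome, where the URL fails on every tier.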

What you get back per URL

| Field | Description |
| --- | --- |
| `input_url`, `final_url`, `status` | Identity + HTTP response |
| `tier_used` | "static" / "browser" / "wayback" - tells you which connector won |
| `archived_at` | If Wayback was used, the snapshot timestamp |
| `signals.title`, `og_title`, `meta_description`, `og_description`, `canonical_url` | Standard meta fields |
| `signals.h1[]`, `h2[]` | First N headings (typically your hero copy) |
| `signals.json_ld[]` | Any JSON-LD structured data the page exposes |
| `text_excerpt` (first 2000 chars), `text_full` (up to 20k), `text_length` | Cleaned text content |
| `screenshot_url` | Optional PNG (Playwright tier only) saved to actor KV store |
| `tries[]` | Per-tier result so you can see what was attempted |
| `elapsed_ms` | Total processing time |
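
As an illustration of working with these fields, the sketch below tallies which tier produced each record in a batch. The function name and the sample values are made up for the example; only the `tier_used` field and its three values come from the table above.

```javascript
// Count how many records each tier produced.
// `items` is assumed to be an array of output records carrying a `tier_used` field.
function countByTier(items) {
  const counts = { static: 0, browser: 0, wayback: 0 };
  for (const item of items) {
    // Records with a missing or unknown tier are ignored.
    if (item.tier_used in counts) counts[item.tier_used] += 1;
  }
  return counts;
}

// Illustrative records only; real output carries many more fields.
const sampleItems = [
  { input_url: 'https://example.com/', tier_used: 'static' },
  { input_url: 'https://example.org/app', tier_used: 'browser' },
  { input_url: 'https://example.net/', tier_used: 'static' },
];

const tierCounts = countByTier(sampleItems);
```

A high share of "browser" or "wayback" results in such a tally would suggest the batch is dominated by JS-rendered or unreachable pages.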

Pricing

$0.05 per successful snapshot. Failed URLs (all three tiers exhausted) are not charged.

| Use case | URLs | Cost |
| --- | --- | --- |
| Quick competitor sweep | 20 | $1.00 |
| Daily monitoring (100 URLs/day) | 3,000/mo | $150/mo |
| One-off content audit | 500 | $25.00 |
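
The table rows follow from multiplying successful snapshots by the $0.05 figure stated in this section. A minimal sketch of that arithmetic, with a hypothetical function name and the price held in cents to avoid floating-point drift:

```javascript
// Price per successful snapshot in cents, using the $0.05 figure from this section.
const PRICE_CENTS = 5;

// Estimate cost in dollars. Failed URLs are not charged, so only successes count.
function estimateCostUsd(successfulSnapshots, priceCents = PRICE_CENTS) {
  return (successfulSnapshots * priceCents) / 100;
}

const sweepCost = estimateCostUsd(20);     // quick competitor sweep
const monthlyCost = estimateCostUsd(3000); // 100 URLs/day over 30 days
const auditCost = estimateCostUsd(500);    // one-off content audit
```

Because failures are free, these figures are upper bounds for a batch in which every URL succeeds.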

How the connector chain works

1. Static fetch (cheap, fast)
   ├─ HTTP 200 + >300 chars          → Done. tier_used="static".
   └─ Thin/blocked/missing content   → continue to tier 2

2. Playwright browser (heavier)
   ├─ Renders JS, waits for network  → Done. tier_used="browser".
   └─ Browser failure / still thin   → continue to tier 3

3. Wayback Machine (last resort)
   ├─ Finds archived snapshot        → Done. tier_used="wayback".
   └─ No archive available           → Error returned.

The buyer is only charged for successful snapshots, regardless of how many tiers were tried.
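
The chain above can be sketched as a loop over tiers that stops at the first usable result. This is not the actor's actual code: `smartFetchSketch` and `isUsable` are hypothetical names, the tier functions are injected so the example runs without network access, and applying the same "HTTP 200 and more than 300 characters" check to every tier is a simplifying assumption.

```javascript
// Threshold taken from the diagram: more than 300 characters counts as real content.
const MIN_CHARS = 300;

// A result is usable if it returned HTTP 200 and enough text.
function isUsable(result) {
  return Boolean(result) && result.status === 200 && (result.text || '').length > MIN_CHARS;
}

// Try each [name, fetchFn] pair in order and return the first usable result.
async function smartFetchSketch(url, tiers) {
  const tries = [];
  for (const [name, fetchFn] of tiers) {
    try {
      const result = await fetchFn(url);
      const ok = isUsable(result);
      tries.push({ tier: name, ok });
      if (ok) {
        return { input_url: url, tier_used: name, text_full: result.text, tries };
      }
    } catch (err) {
      // A thrown error counts as a failed tier; move on to the next one.
      tries.push({ tier: name, ok: false, error: String((err && err.message) || err) });
    }
  }
  return { input_url: url, error: 'all tiers failed', tries };
}
```

Passing the tiers in as an ordered list keeps the fallback order explicit and lets the `tries` array record every attempt, successful or not.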

Use cases

Pricing intel: monitor competitor pricing pages even when they are JS-rendered.
Content audits: snapshot 500 URLs across a domain, get clean text for analysis.
AI training data: prepare structured input for LLM pipelines from any web source.
Compliance / legal: archive a copy of pages with a Wayback timestamp for evidence.
SEO research: extract title + meta + h1/h2 across competitor sites.
Investor due diligence: snapshot a company's site at a point in time.
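
For monitoring use cases such as pricing intel, one simple approach is to compare the cleaned text of two snapshots of the same URL. The sketch below assumes two output records with a `text_full` field; the function names are hypothetical and this is not the logic of any existing watcher actor.

```javascript
// Collapse whitespace so cosmetic reflows do not register as changes.
function normalizeText(text) {
  return (text || '').replace(/\s+/g, ' ').trim();
}

// Compare two snapshot records of the same URL by their cleaned text.
function hasTextChanged(previous, current) {
  return normalizeText(previous.text_full) !== normalizeText(current.text_full);
}

// Illustrative records with made-up content.
const before = { text_full: 'Pro plan  $10 per month' };
const reflowed = { text_full: 'Pro plan $10\nper month' };
const after = { text_full: 'Pro plan $12 per month' };

const changedByReflow = hasTextChanged(before, reflowed);
const changedByPrice = hasTextChanged(before, after);
```

Since `text_full` holds up to 20k characters, a comparison like this would not see changes beyond that limit on long pages.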

Why this is a connector / plugin architecture

The actor's src/lib/scraping.js exposes:

fetchStatic(url, opts) - tier 1 implementation
fetchBrowser(url, opts) - tier 2 implementation
fetchWayback(url, opts) - tier 3 implementation
smartFetch(url, opts) - the orchestrator
cleanHtml(html), extractSignals(html), normalizeUrl(input) - shared utilities

Any future scraper actor can import the same lib. This means a Pricing Watcher v2 (or any other actor that needs robust scraping) gets the same multi-tier fetch for free. The pattern lives once.

What this actor does NOT do

It does not log into authenticated sites.
It does not download non-HTML assets (PDFs, videos, etc.).
It does not paginate within a single URL (use a separate crawler for that).

Pairs well with

pricing-page-watcher: Snapshot competitor pricing pages, diff over time. $0.005 per check.
shopify-store-detector: Snapshot Shopify pages and extract stack. $0.03 per store.
wordpress-stack-detector: Snapshot WP pages and detect plugins. $0.02 per site.

Integrations

This actor works out of the box with every Apify-supported integration:

API: call via Apify API or any official SDK (Python, JavaScript, PHP, .NET). Returns a clean dataset URL.
Schedule: set a daily, weekly, or custom cron cadence in Apify Console. Combine with notification for fresh feeds.
Webhooks: wire ACTOR.RUN.SUCCEEDED to Slack, Discord, Zapier, Make, n8n, Pipedream, or any HTTPS endpoint.
MCP: this actor is discoverable through Apify's hosted MCP server at mcp.apify.com for Claude, Cursor, Cline, Windsurf, and other MCP clients.
n8n / Make / Zapier: native HTTP-Request integration. Trigger the actor on schedule, pipe results to Google Sheets, Airtable, your CRM, or any database.
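
As a hedged sketch of the API route, the helper below builds (but does not send) a request that starts a run through the Apify REST API. The actor ID is taken from this listing's URL. The `urls` input field is an assumption, so check the actor's input schema before relying on it. `buildRunRequest` is a hypothetical name.

```javascript
// Build a request that starts an actor run via the Apify REST API.
// The `urls` input field is assumed, not confirmed by the listing.
function buildRunRequest(token, urls) {
  return {
    url: 'https://api.apify.com/v2/acts/fetchcraft~universal-web-snapshot/runs',
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${token}`,
      },
      body: JSON.stringify({ urls }),
    },
  };
}
```

The returned `url` and `options` can be passed directly to `fetch(url, options)`. Keeping the token in the `Authorization` header rather than the query string avoids leaking it into logs.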

Try it free

Every Apify user gets $5/month in free platform credits (around 250 events at this actor's per-event price). Run preview mode first to confirm output shape before scaling.

New to Apify? Sign up to get free credits.

What's New

2026-06-03: Metadata, categories, and SEO refreshed. Latest version live on Apify Store.

Last Updated

2026-06-03

URL: https://apify.com/fetchcraft/universal-web-snapshot