Wayback Machine Archive Scraper

Pricing

$1.00 / 1,000 snapshots retrieved

Fetch historical snapshots of any webpage from the Internet Archive. Perfect for digital forensics and tracking deleted content.

Developer

Andok

Maintained by Community

Actor stats

Last modified

3 months ago

Wayback Machine Scraper for Historical Snapshots

Retrieve historical web page snapshots from the Internet Archive for compliance checks, competitive due diligence, and content recovery. Feed it a list of URLs and get back every archived snapshot with timestamps, status codes, and archive links — or optionally fetch the full HTML of the latest snapshot. Built on the official Wayback CDX API for accurate, structured results.

Features

Bulk URL processing — check snapshot history for dozens of URLs in a single run
Date range filtering — narrow results to a specific time window with from and to parameters
Deduplication — collapse identical snapshots by digest to reduce noise
Status code filtering — only return snapshots with specific HTTP status codes (default: 200)
HTML retrieval — optionally fetch the archived HTML content for the most recent snapshot
Concurrent processing — configurable parallelism for faster batch runs
Structured metadata — every snapshot includes timestamp, original URL, MIME type, and archive URL

Input

Field	Type	Required	Default	Description
`urls`	`array`	Yes	`["https://example.com"]`	List of URLs to look up in the Wayback Machine
`url`	`string`	No	—	Single URL (backwards compatible, merged with `urls`)
`from`	`string`	No	—	Start date for snapshot range (format: `YYYY` or `YYYYMMDDhhmmss`)
`to`	`string`	No	—	End date for snapshot range (format: `YYYY` or `YYYYMMDDhhmmss`)
`limit`	`integer`	No	`50`	Maximum snapshots to return per URL (1-5000)
`collapse`	`string`	No	`digest`	Collapse parameter to deduplicate snapshots (e.g. `digest`, `timestamp:8`)
`filterStatus`	`string`	No	`statuscode:200`	HTTP status filter for snapshots (e.g. `statuscode:200`)
`includeHtml`	`boolean`	No	`false`	Fetch the archived HTML content for the latest snapshot (experimental)
`timeoutSeconds`	`integer`	No	`20`	Per-request timeout in seconds (1-120)
`concurrency`	`integer`	No	`5`	Number of URLs to process in parallel (1-25)
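
The ranges and defaults in the table above can be checked before a run is started. The following is a minimal sketch of such a check, not the actor's own validation code: the function name `validate_input`, the error messages, and the way the legacy `url` field is merged into `urls` are all assumptions made for illustration.

```python
import re

# Accepted date shapes per the table: a 4-digit year or a 14-digit timestamp.
_DATE_RE = re.compile(r"\d{4}|\d{14}")

# Integer ranges copied from the input table: field -> (min, max, default).
_INT_FIELDS = {
    "limit": (1, 5000, 50),
    "timeoutSeconds": (1, 120, 20),
    "concurrency": (1, 25, 5),
}


def validate_input(raw: dict) -> dict:
    """Return a copy of `raw` with table defaults applied, or raise ValueError.

    Hypothetical helper: the actor's real validation may differ.
    """
    urls = list(raw.get("urls") or [])
    # Assumption: the legacy single `url` is appended unless already listed.
    if raw.get("url") and raw["url"] not in urls:
        urls.append(raw["url"])
    if not urls:
        raise ValueError("at least one URL is required")

    cleaned = {"urls": urls}
    for key in ("from", "to"):
        value = raw.get(key)
        if value is not None:
            if not _DATE_RE.fullmatch(str(value)):
                raise ValueError(f"{key!r} must be YYYY or YYYYMMDDhhmmss")
            cleaned[key] = str(value)

    for key, (low, high, default) in _INT_FIELDS.items():
        value = raw.get(key, default)
        if not isinstance(value, int) or not low <= value <= high:
            raise ValueError(f"{key!r} must be an integer in {low}-{high}")
        cleaned[key] = value

    cleaned["collapse"] = raw.get("collapse", "digest")
    cleaned["filterStatus"] = raw.get("filterStatus", "statuscode:200")
    cleaned["includeHtml"] = bool(raw.get("includeHtml", False))
    return cleaned
```

Running the check locally catches out-of-range values such as `"limit": 0` before any paid events are triggered.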

Input Example

{
  "urls": ["https://example.com", "https://news.ycombinator.com"],
  "from": "2023",
  "to": "2025",
  "limit": 10,
  "includeHtml": false
}
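
Since the actor is described as built on the Wayback CDX API, it can help to see roughly what query such an input corresponds to. The sketch below shows one plausible mapping from the input fields to a CDX request URL. The endpoint and the `url`, `output`, `from`, `to`, `limit`, `collapse`, and `filter` query parameters belong to the public CDX server. The function name `build_cdx_url` and the exact field mapping are assumptions, and the actor may build its requests differently.

```python
from urllib.parse import urlencode

# Public CDX endpoint of the Wayback Machine.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

# Assumed mapping from actor input fields to CDX query parameters.
_FIELD_TO_PARAM = (
    ("from", "from"),
    ("to", "to"),
    ("limit", "limit"),
    ("collapse", "collapse"),
    ("filterStatus", "filter"),
)


def build_cdx_url(target_url: str, options: dict) -> str:
    """Build a CDX query URL for one target URL (illustrative helper)."""
    params = [("url", target_url), ("output", "json")]
    for input_key, cdx_key in _FIELD_TO_PARAM:
        value = options.get(input_key)
        if value not in (None, ""):
            params.append((cdx_key, str(value)))
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_cdx_url("https://example.com", {"from": "2023", "to": "2025", "limit": 10}))
```

Opening the printed URL in a browser is a quick way to sanity-check a date range or filter before starting a run.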

Output

Each dataset item represents one input URL with its snapshot history. Key fields:

inputUrl (string) — the URL that was looked up
snapshotCount (number) — total number of matching snapshots found
snapshots (array) — list of snapshot objects with timestamp, original, statuscode, mimetype, length, and archiveUrl
latestSnapshot (object) — the most recent snapshot, or null if none found
latestHtml (string) — archived HTML content (only when includeHtml is enabled)
checkedAt (string) — ISO timestamp of when the check was performed
error (string) — error message if the lookup failed, otherwise null

Output Example

{
  "inputUrl": "https://example.com",
  "snapshotCount": 1,
  "snapshots": [
    {
      "timestamp": "20250110153022",
      "original": "https://example.com",
      "statuscode": 200,
      "mimetype": "text/html",
      "length": 1256,
      "archiveUrl": "https://web.archive.org/web/20250110153022/https://example.com"
    }
  ],
  "latestSnapshot": {
    "timestamp": "20250110153022",
    "original": "https://example.com",
    "statuscode": 200,
    "mimetype": "text/html",
    "length": 1256,
    "archiveUrl": "https://web.archive.org/web/20250110153022/https://example.com"
  },
  "latestHtml": null,
  "checkedAt": "2025-01-20T12:00:00.000Z",
  "error": null
}
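
The snapshot objects in the output example resemble rows of a CDX JSON response, where the first row is a header and each following row is one capture. As a rough illustration of how such rows could be turned into the fields shown above, here is a minimal sketch. The helper names `parse_cdx_rows` and `latest_snapshot` are hypothetical, and the actor's own parsing is not documented here, so treat this as an assumption-laden example rather than its implementation.

```python
import json


def _to_int(value):
    """Convert CDX string fields to int; placeholders such as "-" become None."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def parse_cdx_rows(body: str) -> list[dict]:
    """Turn a CDX `output=json` body into snapshot dicts (illustrative helper)."""
    rows = json.loads(body) if body.strip() else []
    if not rows:
        return []
    header, *records = rows  # first row holds the column names
    snapshots = []
    for record in records:
        fields = dict(zip(header, record))
        timestamp, original = fields["timestamp"], fields["original"]
        snapshots.append({
            "timestamp": timestamp,
            "original": original,
            "statuscode": _to_int(fields.get("statuscode")),
            "mimetype": fields.get("mimetype"),
            "length": _to_int(fields.get("length")),
            # Replay URL pattern of the Wayback Machine: /web/<timestamp>/<url>
            "archiveUrl": f"https://web.archive.org/web/{timestamp}/{original}",
        })
    return snapshots


def latest_snapshot(snapshots: list[dict]):
    """Return the newest snapshot, or None when the list is empty."""
    # 14-digit timestamps sort chronologically as plain strings.
    return max(snapshots, key=lambda s: s["timestamp"], default=None)
```

Selecting the maximum timestamp mirrors the documented behaviour of `latestSnapshot` being `null` when no snapshots are found.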

Pricing

Event	Cost
Snapshot Retrieved	$1.00 per 1,000 snapshots retrieved (pay-per-event; see actor pricing page)

Use Cases

Compliance & legal — retrieve historical versions of terms of service, privacy policies, or product pages
Competitive due diligence — review how a competitor's website evolved over time before a deal or partnership
Content recovery — recover lost or deleted web pages from the Internet Archive
SEO auditing — check when a page was last crawled and compare historical content changes
Brand monitoring — verify historical claims or track how a brand's messaging changed
Research & journalism — access archived versions of news articles or government pages

Related Actors

Actor	What it adds
Google News Scraper	Monitor current news coverage alongside historical archive lookups
Broken Links Checker	Find dead links on your site, then recover them via Wayback Machine
Sitemap Extractor	Extract all URLs from a sitemap to feed into bulk Wayback lookups

Notes

The Wayback Machine CDX API is free but may throttle under heavy load. Use the concurrency setting conservatively for large batches.
The includeHtml option is experimental and may fail for very large pages or pages with complex JavaScript rendering.
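
To make the throttling advice concrete, the sketch below shows one common way to combine a small worker pool with exponential backoff. It is not the actor's implementation: `with_retries`, `lookup_all`, and the `fetch` callable are hypothetical names, and the retry counts and delays are arbitrary example values.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def with_retries(fetch, url, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), waiting base_delay * 2**n seconds between failed attempts."""
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


def lookup_all(urls, fetch, concurrency=2, **retry_options):
    """Process URLs with at most `concurrency` lookups in flight at once."""

    def safe_lookup(url):
        # Record failures per URL instead of aborting the whole batch.
        try:
            result = with_retries(fetch, url, **retry_options)
            return {"inputUrl": url, "result": result, "error": None}
        except Exception as exc:
            return {"inputUrl": url, "result": None, "error": str(exc)}

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map keeps results in the same order as the input URLs.
        return list(pool.map(safe_lookup, urls))
```

Here `fetch` stands for any function that performs a single lookup and raises on failure. Keeping `concurrency` low and letting the backoff absorb occasional throttling is usually gentler on a free public service than raising parallelism.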

URL: https://apify.com/andok/wayback-machine-scraper