URL: https://apify.com/jungle_synthesizer/wayback-machine-bulk-lookup

Wayback Machine Bulk Lookup · Apify


Pricing: Pay per event

Wayback Machine Bulk Lookup

Look up Wayback Machine snapshots for any URL or list of URLs. Returns capture timeline, optional snapshot markdown, and live-vs-snapshot diff. Date range filtering, capture limit, bulk input. Built for OSINT, journalism, SEO link-rot recovery, and legal evidence.

Rating: 0.0 (0)

Developer: BowTiedRaccoon (Maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 21 days ago

Look up Wayback Machine (archive.org) snapshots for any URL or list of URLs. Returns the full capture timeline, optional snapshot content converted from HTML to markdown, and a live-vs-snapshot text diff. Built for OSINT analysts, journalists verifying sources, SEO teams recovering from link rot, and legal teams collecting evidence.


What this actor does

For each input URL, the actor:

  1. Queries the Wayback CDX API for the snapshot index, restricted to your date range and capped at your capture limit
  2. Optionally fetches snapshot HTML for each capture and converts it to markdown (for reading or archiving)
  3. Optionally fetches the current live URL and computes a line-level text diff against the most recent snapshot (to detect page changes)

Each output record contains the full snapshot timeline plus optional diff and content fields.
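
Step 1 can be pictured as a plain CDX query. The snippet below is a minimal sketch, not the actor's own code: it assumes the public CDX endpoint on web.archive.org and its url, from, to, limit, and output parameters, and the helper names to_cdx_date and build_cdx_url are hypothetical.

```python
from urllib.parse import urlencode

# Public CDX search endpoint (assumed to be what the actor queries).
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def to_cdx_date(iso_date: str) -> str:
    """Convert an ISO date such as 2023-01-01 to the CDX form 20230101."""
    return iso_date.replace("-", "")


def build_cdx_url(url, date_from=None, date_to=None, capture_limit=100):
    """Build a CDX query URL for one input URL (hypothetical helper)."""
    params = {"url": url, "output": "json", "limit": capture_limit}
    if date_from:
        params["from"] = to_cdx_date(date_from)
    if date_to:
        params["to"] = to_cdx_date(date_to)
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


query = build_cdx_url("https://example.com/about", "2023-01-01", "2024-12-31", 50)
print(query)
```

The printed URL can be opened in a browser to inspect the raw snapshot index before running the actor on a large list.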


Input

Field                 Type              Default  Description
urls                  array of strings  -        Required. URLs to look up in the Wayback Machine
maxItems              integer           -        Maximum total output records across all URLs
dateFrom              string            -        Earliest snapshot date to include (ISO date, e.g. 2020-01-01)
dateTo                string            -        Latest snapshot date to include (ISO date, e.g. 2024-12-31)
captureLimit          integer           100      Max snapshots per URL
fetchSnapshotContent  boolean           false    Download snapshot HTML and convert to markdown
diffWithLive          boolean           false    Compute text diff between latest snapshot and current live URL
proxyConfiguration    object            none     Optional proxy config (usually not needed for Wayback)

Example input:

{
  "urls": [
    "https://example.com/news/2024-article",
    "https://example.com/about"
  ],
  "dateFrom": "2023-01-01",
  "dateTo": "2024-12-31",
  "captureLimit": 50,
  "diffWithLive": true
}
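
The same input can be assembled and submitted from Python. This is a sketch under stated assumptions: build_input and run_lookup are hypothetical helper names, the actor ID is taken from the store URL, and run_lookup relies on the third-party apify-client package (ApifyClient, actor(...).call, dataset(...).iterate_items) and an API token you supply.

```python
from datetime import date

# Actor ID as it appears in the store URL (assumption: it is also the API ID).
ACTOR_ID = "jungle_synthesizer/wayback-machine-bulk-lookup"


def build_input(urls, date_from=None, date_to=None,
                capture_limit=100, diff_with_live=False):
    """Assemble and sanity-check an input object (hypothetical helper)."""
    if not urls:
        raise ValueError("urls is required and must not be empty")
    run_input = {
        "urls": list(urls),
        "captureLimit": capture_limit,
        "diffWithLive": diff_with_live,
    }
    parsed = {}
    for key, value in (("dateFrom", date_from), ("dateTo", date_to)):
        if value is not None:
            parsed[key] = date.fromisoformat(value)  # ValueError if malformed
            run_input[key] = value
    if len(parsed) == 2 and parsed["dateFrom"] > parsed["dateTo"]:
        raise ValueError("dateFrom must not be later than dateTo")
    return run_input


def run_lookup(token, run_input):
    """Start the actor and return its dataset items (needs apify-client)."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


example_input = build_input(
    ["https://example.com/news/2024-article", "https://example.com/about"],
    date_from="2023-01-01",
    date_to="2024-12-31",
    capture_limit=50,
    diff_with_live=True,
)
# records = run_lookup("<YOUR_APIFY_TOKEN>", example_input)
```

Validating dates locally catches a malformed or reversed range before a paid run starts.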

Output

One record per input URL.

Field          Type    Description
url            string  The input URL
snapshotCount  number  Number of snapshots found in the date range
firstCaptured  string  Earliest snapshot timestamp (ISO 8601)
lastCaptured   string  Latest snapshot timestamp (ISO 8601)
captures       array   Snapshot entries, each a JSON-encoded string with timestamp, archiveUrl, status, mimetype, and optionally contentMarkdown
diff           object  { addedLines, removedLines, changedRatio }; only present when diffWithLive=true
liveStatus     number  Current HTTP status of the live URL; only present when diffWithLive=true
finalLiveUrl   string  Final URL after redirects
status         string  success, timeout, or error
errorMsg       string  Error details on failure, null on success
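
To make the diff fields concrete, here is one way a line-level diff with the same three keys could be computed using Python's difflib. It is an illustration only: the actor's exact algorithm is not documented here, line_diff is a hypothetical name, and the changedRatio formula (changed lines divided by the total lines on both sides) is an assumption that may not match the actor's definition.

```python
import difflib


def line_diff(snapshot_text: str, live_text: str) -> dict:
    """Count added and removed lines between a snapshot and the live page."""
    old = snapshot_text.splitlines()
    new = live_text.splitlines()
    added = removed = 0
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed += i2 - i1  # lines present only in the snapshot
        if tag in ("replace", "insert"):
            added += j2 - j1    # lines present only in the live page
    total = len(old) + len(new)
    # Assumed definition of changedRatio; guard against two empty inputs.
    ratio = round((added + removed) / total, 2) if total else 0.0
    return {"addedLines": added, "removedLines": removed, "changedRatio": ratio}


result = line_diff("a\nb\nc", "a\nB\nc\nd")
print(result)
```

In this sketch, a changed line counts once as removed and once as added, which is how most line-oriented diff tools report edits.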

Example output record:

{
  "url": "https://example.com/news/2024-article",
  "snapshotCount": 14,
  "firstCaptured": "2024-03-12T08:42:00Z",
  "lastCaptured": "2026-04-29T22:11:00Z",
  "captures": [
    "{\"timestamp\":\"2026-04-29T22:11:00Z\",\"archiveUrl\":\"https://web.archive.org/web/20260429221100/https://example.com/news/2024-article\",\"status\":200,\"mimetype\":\"text/html\"}"
  ],
  "diff": {"addedLines": 12, "removedLines": 3, "changedRatio": 0.04},
  "liveStatus": 200,
  "finalLiveUrl": "https://example.com/news/2024-article",
  "status": "success",
  "errorMsg": null
}
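
Because each entry in captures is a JSON-encoded string rather than a nested object, it needs a second decoding pass before its fields can be used. A minimal sketch, using values from the example record above (decode_captures is a hypothetical helper name):

```python
import json

# Abbreviated copy of the example output record.
record = {
    "url": "https://example.com/news/2024-article",
    "captures": [
        '{"timestamp":"2026-04-29T22:11:00Z",'
        '"archiveUrl":"https://web.archive.org/web/20260429221100/'
        'https://example.com/news/2024-article",'
        '"status":200,"mimetype":"text/html"}'
    ],
}


def decode_captures(rec: dict) -> list:
    """Parse each JSON-encoded capture string into a dict."""
    return [
        json.loads(entry) if isinstance(entry, str) else entry
        for entry in rec.get("captures", [])
    ]


captures = decode_captures(record)
# Keep only captures that were archived with an HTTP 200 status.
ok_urls = [c["archiveUrl"] for c in captures if c.get("status") == 200]
print(ok_urls)
```

The isinstance check lets the helper tolerate entries that have already been decoded, for example after a round trip through another tool.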

Dataset views

The actor produces two dataset views in the Apify console:

  • Capture Timeline: url, snapshotCount, firstCaptured, lastCaptured, captures
  • Live vs Snapshot Diff: url, liveStatus, diff, lastCaptured

Rate limits and performance

The actor respects Wayback Machine's rate limits:

  • CDX API queries: ~10 requests/second (110ms minimum delay)
  • Snapshot content fetches: ~1-2 requests/second (700ms minimum delay)

For large batches with fetchSnapshotContent=true, expect longer runtimes. The default timeout is 2 hours. Start with a small captureLimit (e.g. 10) to estimate runtime before running at full scale.
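
A rough runtime estimate follows from the two minimum delays. The sketch below assumes requests are issued one at a time, that every URL returns the full captureLimit number of snapshots, and that network latency and the optional live fetch are ignored; the function name is hypothetical.

```python
CDX_DELAY_MS = 110       # minimum delay between CDX API queries
SNAPSHOT_DELAY_MS = 700  # minimum delay between snapshot content fetches
DEFAULT_TIMEOUT_S = 2 * 60 * 60  # default run timeout of 2 hours


def estimate_min_runtime_seconds(url_count: int, capture_limit: int,
                                 fetch_snapshot_content: bool) -> float:
    """Delay-only runtime estimate in seconds (assumes serial requests)."""
    total_ms = url_count * CDX_DELAY_MS
    if fetch_snapshot_content:
        total_ms += url_count * capture_limit * SNAPSHOT_DELAY_MS
    return total_ms / 1000


for limit in (10, 100):
    seconds = estimate_min_runtime_seconds(1000, limit, True)
    verdict = "fits within" if seconds <= DEFAULT_TIMEOUT_S else "exceeds"
    print(f"captureLimit={limit}: ~{seconds / 3600:.1f} h, {verdict} the default timeout")
```

Under these assumptions, 1,000 URLs with content fetching fit inside the default timeout at captureLimit 10 but not at 100, which is why a small trial run is a sensible first step.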


Use cases

  • OSINT / research: Check whether a source URL existed, when it was captured, and how its content has changed
  • Journalism: Verify archived versions of articles or government pages for fact-checking
  • SEO / link-rot recovery: Find archived versions of dead inbound links and plan redirects or outreach
  • Legal evidence: Retrieve timestamped snapshots of web pages for documentation
  • Web archiving: Bulk-check coverage for a list of URLs before deeper archiving work
