URL: https://apify.com/seemuapps/wayback-machine-snapshots-scraper

Wayback Machine Snapshots Scraper: Internet Archive History

Pricing

from $1.00 / 1,000 archived snapshots returned

List every Internet Archive snapshot of a URL, page, or whole domain. Timestamp, snapshot URL, status code, mime type, content length. No login.

Rating: 0.0 (0)

Developer: Andrew (Maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 1
  • Monthly active users: 0
  • Last modified: 12 days ago

What you get

  • Every archived capture of a URL since the page first hit the Wayback Machine
  • Direct snapshot URLs (https://web.archive.org/web/{timestamp}/{url}) that you can paste straight into a browser
  • HTTP status code, MIME type, and byte size for each capture
  • Content digest, so you can dedupe identical captures and only see when the page actually changed
  • Date-range, status-code, and MIME-type filters
  • Match modes: exact URL, prefix, hostname, or whole domain (covers subdomains)
  • Cursor-based pagination: fetch unlimited captures across multiple runs
  • Direct export to JSON, CSV, Excel, or Google Sheets
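The digest-based deduplication mentioned above can be sketched as follows. This is a minimal illustration, assuming that "identical captures" means consecutive captures sharing a digest; the actor's actual collapsing rule is not documented here, and the helper name `collapse_by_digest` is hypothetical.

```python
from typing import Iterable


def collapse_by_digest(snapshots: Iterable[dict]) -> list[dict]:
    """Keep a capture only when its digest differs from the previous kept one.

    Assumption: rows carry the "timestamp" and "digest" fields shown in the
    output format. 14-digit timestamps sort chronologically as strings.
    """
    changed = []
    last_digest = None
    for snap in sorted(snapshots, key=lambda s: s["timestamp"]):
        if snap["digest"] != last_digest:
            changed.append(snap)
            last_digest = snap["digest"]
    return changed


# Made-up rows for illustration only.
rows = [
    {"timestamp": "20230101000000", "digest": "AAA"},
    {"timestamp": "20230201000000", "digest": "AAA"},
    {"timestamp": "20230301000000", "digest": "BBB"},
    {"timestamp": "20230401000000", "digest": "AAA"},
]
change_points = [r["timestamp"] for r in collapse_by_digest(rows)]
```

Here `change_points` lists the captures at which the content differed from the capture before it, including a later revert to earlier content.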

Use cases

  • SEO and competitive intel: track when a competitor changed their pricing, copy, or layout
  • OSINT: recover deleted or modified pages, track changes over time
  • Broken-link recovery: find the most recent working snapshot of a 404'd page
  • Content audit: list every URL ever archived for a domain (subdomains included)
  • Compliance and legal: produce a timeline of what a site looked like on a given date
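As an illustration of the broken-link recovery and "as of a given date" use cases, the sketch below picks the most recent capture with status 200 from exported rows. It assumes rows shaped like the output format of this actor (`timestamp`, `statusCode`, `snapshotUrl`); the function name and the sample data are hypothetical.

```python
from typing import Optional


def latest_working_snapshot(snapshots: list[dict],
                            not_after: Optional[str] = None) -> Optional[dict]:
    """Return the newest capture with HTTP 200, optionally at or before `not_after`.

    `not_after` is a 14-digit Wayback timestamp (YYYYMMDDhhmmss); string
    comparison is enough because the format is fixed-width.
    """
    candidates = [
        s for s in snapshots
        if s["statusCode"] == 200
        and (not_after is None or s["timestamp"] <= not_after)
    ]
    return max(candidates, key=lambda s: s["timestamp"], default=None)


# Made-up rows for illustration only.
rows = [
    {"timestamp": "20220101000000", "statusCode": 200, "snapshotUrl": "u1"},
    {"timestamp": "20230101000000", "statusCode": 200, "snapshotUrl": "u2"},
    {"timestamp": "20240101000000", "statusCode": 404, "snapshotUrl": "u3"},
]
recovered = latest_working_snapshot(rows)
as_of_2022 = latest_working_snapshot(rows, not_after="20221231235959")
```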

How to use

  1. Enter a URL (e.g. example.com, https://example.com/page)
  2. Choose a Match Type:
    • Exact: only this URL
    • Prefix: this URL and everything below
    • Host: every URL on this hostname
    • Domain: every URL across the whole domain and its subdomains
  3. Optionally filter by Date from / Date to (YYYY-MM-DD), HTTP status code (e.g. 200), or MIME type (e.g. text/html)
  4. Toggle Collapse duplicate captures to dedupe by content digest (recommended)
  5. Set Max snapshots (default 1000; 0 for unlimited)
  6. Run the actor; each snapshot appears as one row in the Dataset tab
  7. To fetch more snapshots, open the Key-value store tab, copy the NEXT_PAGE_ID value, and paste it into Page ID on your next run
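For readers who want to see how these options might translate into a query against the Internet Archive's public CDX API, here is a rough sketch. The CDX parameters used (`matchType`, `from`, `to`, `filter`, `collapse`, `limit`, `showResumeKey`, `resumeKey`) are part of that API, but the mapping from this actor's fields to them is an assumption, and `build_cdx_query` is a hypothetical helper, not the actor's code.

```python
from typing import Optional
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url: str,
                    match_type: str = "exact",
                    date_from: Optional[str] = None,
                    date_to: Optional[str] = None,
                    status_code: Optional[str] = None,
                    mime_type: Optional[str] = None,
                    collapse_duplicates: bool = True,
                    max_snapshots: int = 1000,
                    resume_key: Optional[str] = None) -> str:
    """Build a CDX query URL from options resembling the actor's inputs (assumed mapping)."""
    params = [("url", url), ("matchType", match_type),
              ("output", "json"), ("showResumeKey", "true")]
    if date_from:
        params.append(("from", date_from.replace("-", "")))  # YYYY-MM-DD -> YYYYMMDD
    if date_to:
        params.append(("to", date_to.replace("-", "")))
    if status_code:
        params.append(("filter", f"statuscode:{status_code}"))
    if mime_type:
        params.append(("filter", f"mimetype:{mime_type}"))
    if collapse_duplicates:
        params.append(("collapse", "digest"))
    if max_snapshots:  # 0 means unlimited, so no limit parameter is sent
        params.append(("limit", str(max_snapshots)))
    if resume_key:
        params.append(("resumeKey", resume_key))
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


example_query = build_cdx_query("example.com", match_type="domain",
                                date_from="2023-01-01", status_code="200",
                                mime_type="text/html")
```

The function only builds the URL and makes no network request, so it can be inspected or logged before anything is fetched.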

Output format

Each dataset row holds one snapshot, so results export directly to CSV, Excel, or Google Sheets:

{
  "timestamp": "20231215120000",
  "archivedAt": "2023-12-15T12:00:00.000Z",
  "originalUrl": "http://example.com/",
  "snapshotUrl": "https://web.archive.org/web/20231215120000/http://example.com/",
  "statusCode": 200,
  "mimeType": "text/html",
  "contentLength": 1234,
  "digest": "ABC123XYZ"
}
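The relationship between `timestamp`, `archivedAt`, and `snapshotUrl` in the sample record can be reproduced with a few lines. This is a sketch that assumes the 14-digit timestamp is UTC in YYYYMMDDhhmmss form, which matches the sample above; `derive_fields` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def derive_fields(timestamp: str, original_url: str) -> dict:
    """Derive archivedAt and snapshotUrl from a 14-digit Wayback timestamp."""
    captured = datetime.strptime(timestamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return {
        # Millisecond precision is not available in the timestamp, so it is fixed at .000
        "archivedAt": captured.strftime("%Y-%m-%dT%H:%M:%S.000Z"),
        "snapshotUrl": f"https://web.archive.org/web/{timestamp}/{original_url}",
    }


derived = derive_fields("20231215120000", "http://example.com/")
```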

Pagination

Big sites can have hundreds of thousands of snapshots. The actor saves a resume cursor (the Internet Archive's CDX resume key) to the default Key-value store under NEXT_PAGE_ID.

  1. Open the Key-value store tab on the run page
  2. Copy the value of NEXT_PAGE_ID
  3. Start a new run and paste it into Page ID

When NEXT_PAGE_ID is null, all snapshots have been fetched.
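The resume loop described above can be sketched independently of any client library. In this illustration `run_once` is a hypothetical callable standing in for "start a run with this Page ID, then read the dataset rows and NEXT_PAGE_ID"; the fake pages exist only so the loop can be exercised offline.

```python
from typing import Callable, Optional

# A run takes the previous NEXT_PAGE_ID (or None) and returns (rows, next_page_id).
RunOnce = Callable[[Optional[str]], tuple[list[dict], Optional[str]]]


def fetch_all(run_once: RunOnce) -> list[dict]:
    """Repeat runs, feeding NEXT_PAGE_ID back in as Page ID, until it is null."""
    rows: list[dict] = []
    page_id: Optional[str] = None
    while True:
        batch, page_id = run_once(page_id)
        rows.extend(batch)
        if page_id is None:  # null NEXT_PAGE_ID: everything has been fetched
            return rows


# Stand-in data for illustration; a real run_once would call the actor.
_FAKE_PAGES = {
    None: ([{"timestamp": "20230101000000"}], "cursor-1"),
    "cursor-1": ([{"timestamp": "20230201000000"}], "cursor-2"),
    "cursor-2": ([{"timestamp": "20230301000000"}], None),
}


def fake_run(page_id: Optional[str]) -> tuple[list[dict], Optional[str]]:
    return _FAKE_PAGES[page_id]


all_rows = fetch_all(fake_run)
```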

Input options

| Field | Type | Description |
| --- | --- | --- |
| URL | string | URL or domain to look up (required) |
| Match Type | enum | Exact / Prefix / Host / Domain |
| Date from | string | YYYY-MM-DD (UTC), optional |
| Date to | string | YYYY-MM-DD (UTC), optional |
| HTTP status code | string | Filter to one HTTP status, e.g. 200 |
| MIME type | string | Filter by content type, e.g. text/html |
| Collapse duplicates | boolean | Dedupe by content digest; default on |
| Max snapshots | integer | Cap per run; default 1000, 0 for unlimited |
| Page ID | string | NEXT_PAGE_ID from the previous run, to resume pagination |

You might also like

Wayback Machine Scraper

gio21/wayback-machine-scraper

List Internet Archive Wayback Machine snapshots for one or more URLs. Returns timestamp, snapshot URL, HTTP status, MIME type, digest. Useful for tracking website changes over time, OSINT research, content recovery, and brand monitoring.

Wayback Machine Search

crawlerbros/wayback-machine-search

Query Internet Archive's Wayback Machine for historical snapshots of any URL or domain. Filter by date, HTTP status, MIME type, and deduplicate. Optionally fetch the archived page text. Free public CDX API, no authentication.

Wayback Machine CDX URL List Scraper

parseforge/wayback-cdx-scraper

Pull every archived URL the Internet Archive has captured for any domain or URL prefix. Get timestamps, MIME types, status codes, content digests, and direct snapshot links. Filter by date range, status, MIME, and uniqueness. Export to JSON, CSV, or Excel for SEO recovery and competitive research.

Wayback Machine Archive Scraper

andok/wayback-machine-scraper

Fetch historical snapshots of any webpage from the Internet Archive. Perfect for digital forensics and tracking deleted content.

Wayback Machine Scraper - Track Website Changes Over Time

ryanclinton/wayback-machine-search

Search the Internet Archive's Wayback Machine for historical snapshots of any website. Retrieve archived page metadata -- including timestamps, URLs, MIME types, HTTP status codes, and content hashes -- for up to 10,000 snapshots per run.

Internet Archive Search: Wayback Machine Advanced Query Tool

maged120/archive-org-advanced-search

Search the Internet Archive (archive.org) with full advanced filter support: date range, media type, language, subject, and more. Returns metadata from archived web pages, books, audio, and video.

Wayback Machine CDX Bulk Extractor

automation-lab/wayback-machine-cdx-extractor

Bulk extract archived snapshot metadata from the Wayback Machine CDX API. Get every crawled URL, timestamp, HTTP status code, MIME type, and content digest for any domain or URL pattern. Export to JSON, CSV, or Excel.