VOOZH about

URL: https://apify.com/maximedupre/wayback-machine-search

โ‡ฑ Wayback Machine Search for Archive History ยท Apify


Pricing

from $0.90 / 1,000 saved archive results

Go to Apify Store

Wayback Machine Search

Search Wayback Machine snapshots for URLs, hosts, and domains. Export archive dates, status codes, MIME types, digests, content text, version timelines, reports, and monitoring alerts.

Pricing

from $0.90 / 1,000 saved archive results

Rating

0.0

(0)

Developer

๐Ÿ‘ Maxime Duprรฉ

Maxime Duprรฉ

Maintained by Community

Actor stats

0

Bookmarked

2

Total users

1

Monthly active users

2 days ago

Last modified

Share

๐Ÿ•ฐ๏ธ Wayback Machine search for archive history

Wayback Machine Search finds historical snapshots in the Internet Archive Wayback Machine for the URLs, hosts, or domains you submit. Use it to export archive dates, original URLs, HTTP status codes, MIME types, content digests, content length, optional archived page text, version timelines, Markdown reports, and monitoring alerts.

It is built for SEO audits, OSINT research, legal evidence checks, website change tracking, link-rot recovery, content history reviews, and scheduled archive monitoring. You can start with one URL such as https://example.com/, a bare domain such as example.com, or up to 50 targets in one run. No Wayback Machine API key, cookies, login, or user proxy setup is required.

๐Ÿ”Ž What this Actor does

  • Searches the Wayback Machine CDX index for exact URLs, URL prefixes, hosts, or full domains.
  • Filters archive rows by date range, HTTP status code, and MIME type.
  • Saves raw snapshot rows with source-backed archive metadata.
  • Collapses repeated captures by content digest or by month, day, or hour.
  • Finds snapshots closest to a target date and adds distance in days.
  • Optionally fetches readable archived page text for a capped number of snapshots.
  • Emits deterministic change evidence from status, digest, length, and fetched text changes.
  • Builds version timeline rows when you want a compact history.
  • Generates a Markdown report in report mode.
  • Supports monitoring mode with alert rows for new archive rows, status changes, content changes, or removed/restored signals.

This Actor searches archive data. It does not crawl the live web, create new Wayback captures, perform visual screenshot diffs, use AI summaries, or promise complete archive coverage. Availability depends on what the Internet Archive has stored.

๐Ÿ“ฆ Data you get

Snapshot rows can include:

  • target - submitted URL or domain that produced the row
  • originalUrl - original archived URL from the Wayback capture
  • waybackTimestamp and archiveDate - source timestamp and ISO date
  • statusCode, mimeType, contentDigest, and contentLength
  • contentStatus and optional content when archived text is fetched
  • distanceFromTargetDays for closest-date evidence
  • change evidence with the previous timestamp and source-backed reason

Version rows group consecutive captures into timeline intervals. Summary rows report per-target coverage, counts, date range, discovered paths, subdomains, and emails found in fetched content. Alert rows appear in monitoring mode only when the selected mechanical alert rule is met.

You can export the dataset as JSON, CSV, Excel, XML, RSS, or HTML, or consume the rows through the Apify API, schedules, webhooks, and integrations.

๐Ÿš€ How to run it

  1. Add one or more URLs or domains in URLs or domains.
  2. Choose Archive scope: exact URL, URL prefix, same host, or same domain and subdomains.
  3. Set optional date, status, and MIME filters.
  4. Pick an output mode: raw snapshots, changed snapshots, timeline, closest snapshot to date, Markdown report, or monitoring delta.
  5. Keep Collapse snapshots by on content digest for compact results, or choose every snapshot for full raw history.
  6. Turn on Fetch archived page text only when you need readable content or phrase search evidence.
  7. Run the Actor and open the dataset or optional Markdown report.

For a small first run, use:

{
"targets":["example.com"],
"matchType":"domain",
"maxResults":10,
"statusFilter":"200",
"mimeFilter":"text/html",
"outputMode":"snapshots",
"collapseBy":"digest",
"includeContent":false
}

โš™๏ธ Input options

targets is required and accepts up to 50 URLs or domains.

matchType controls how broadly each target is searched. Use exact URL for one page, prefix for a path, host for one hostname, and domain when subdomains should be included.

maxResults limits saved snapshot, version, or alert rows per target. The maximum is 10,000.

dateFrom and dateTo accept YYYY, YYYYMM, or YYYYMMDD. statusFilter accepts a status such as 200 or 404. mimeFilter accepts a type such as text/html.

outputMode changes the result shape. Use raw snapshots for exports, timeline for version intervals, closest snapshot to date for evidence work, report for a Markdown summary, and monitoring for scheduled archive checks.

includeContent, maxContentFetch, and historyQuery control archived text fetching. Content fetching is capped so large archive searches do not fetch every historical page by accident.

๐Ÿงพ Output example

{
"recordType":"snapshot",
"target":"example.com",
"originalUrl":"https://example.com/pricing",
"waybackTimestamp":"20240510123045",
"archiveDate":"2024-05-10T12:30:45.000Z",
"statusCode":200,
"mimeType":"text/html",
"contentDigest":"M5W6TLBPLQWJXTQWJ2R5XQ7Y3YQK4K6L",
"contentLength":18432,
"contentStatus":"notRequested",
"content":null,
"distanceFromTargetDays":null,
"change":{
"changed":true,
"type":"digestChange",
"previousArchiveDate":"2024-04-01T08:15:30.000Z",
"previousWaybackTimestamp":"20240401081530",
"evidence":["Digest changed from ABC123 to M5W6TLBPLQWJXTQWJ2R5XQ7Y3YQK4K6L"]
},
"version":null,
"diff":null,
"summary":null,
"alert":null
}

๐Ÿ’ณ Pricing

This Actor uses pay-per-event pricing. You are charged for each saved successful Wayback result: snapshot, version, summary, or monitoring alert. Empty archive searches, invalid inputs, skipped content fetches, and source issues do not create dataset rows.

The planned pricing starts at $0.0018 per saved archive result on the Free tier and goes down to $0.0009 per saved archive result on higher tiers. Always check the Actor Pricing tab before starting a large run.

โš ๏ธ Limits and caveats

  • The Internet Archive may not have snapshots for every page or date.
  • A successful run can return zero rows when no matching archive data exists.
  • Archive text can be unavailable, non-HTML, capped, or skipped by the content fetch limit.
  • Monitoring compares the latest saved archive state for the same target and filters. It is based on Wayback captures, not live website polling.
  • Change labels are mechanical and source-backed. They do not claim semantic meaning such as a product, legal, or pricing change unless the returned text evidence shows it.

โ“ FAQ

โ“ Does this use the official Wayback Machine?

It reads public Internet Archive Wayback Machine data through the CDX index and archived playback pages.

๐Ÿ”‘ Do I need a Wayback Machine API key?

No. The Actor does not ask for a Wayback Machine API key, cookies, or login.

๐Ÿ“ก Can it monitor a live website?

It monitors changes in Wayback Machine archive captures. It does not poll the current live page independently.

๐Ÿ”— Why is there no archiveUrl field?

Rows keep originalUrl and waybackTimestamp. A playback URL can be reconstructed as https://web.archive.org/web/{waybackTimestamp}/{originalUrl} when you need to open the archived page.

๐Ÿ“ Changelog

  • 0.1: Initial release.

๐Ÿ†˜ Support

For issues, questions, or feature requests, file a ticket and I'll fix or implement it in less than 24h ๐Ÿซก

๐Ÿ”— Other actors

Made with โค๏ธ by Maxime Duprรฉ

You might also like

Wayback Machine Search

crawlerbros/wayback-machine-search

Query Internet Archive's Wayback Machine for historical snapshots of any URL or domain. Filter by date, HTTP status, MIME type, and deduplicate. Optionally fetch the archived page text. Free public CDX API, no authentication.

Wayback Machine Bulk Lookup

jungle_synthesizer/wayback-machine-bulk-lookup

Look up Wayback Machine snapshots for any URL or list of URLs. Returns capture timeline, optional snapshot markdown, and live-vs-snapshot diff. Date range filtering, capture limit, bulk input. Built for OSINT, journalism, SEO link-rot recovery, and legal evidence.

๐Ÿ‘ User avatar

BowTiedRaccoon

2

Wayback Machine Scraper - Track Website Changes Over Time

ryanclinton/wayback-machine-search

Search the Internet Archive's Wayback Machine for historical snapshots of any website. Retrieve archived page metadata -- including timestamps, URLs, MIME types, HTTP status codes, and content hashes -- for up to 10,000 snapshots per run.

71

Expired Domains Scraper

martin1080p/expired-domains-scraper

The Expired Domains Scraper automates finding valuable expired domains from expireddomains.com, offering filters and sorting by SEO metrics and auction details for efficient domain acquisition.

267

1.0

(4)

Influencer Brand Safety Intelligence MCP Server

ryanclinton/influencer-brand-safety-intelligence-mcp

Creator vetting and brand risk intelligence via the Model Context Protocol.

Internet Archive Search โ€” Wayback Machine Advanced Query Tool

maged120/archive-org-advanced-search

Search the Internet Archive (archive.org) with full advanced filter support โ€” date range, media type, language, subject, and more. Returns metadata from archived web pages, books, audio, and video.

SEO ZOMBIE SLAYER

actor_researcher.48/seo-zombie-slayer

SEO Zombie Slayer crawls websites to hunt dead links, SEO issues, performance problems, and security risks. Includes Lighthouse audits, competitor comparison, and a fun game mode with XP, levels, and boss battles to turn SEO audits into action

6

5.0

(1)

Automated reconnaissance actor for bug bounty hunters

wonderful_beluga/automated-reconnaissance-actor-for-bug-bounty-hunters

This Apify actor automates bug bounty recon by scraping the Wayback Machine and GitHub for legacy attack surfaces. It extracts historical URLs, public code, and deprecated files, parsing them to uncover hidden subdomains and forgotten API endpoints. The findings are saved into structured JSON files.

๐Ÿ‘ User avatar

Zaher el siddik

2