Wayback Machine Search

Pricing: from $0.90 / 1,000 saved archive results

Search Wayback Machine snapshots for URLs, hosts, and domains. Export archive dates, status codes, MIME types, digests, content text, version timelines, reports, and monitoring alerts.

Rating: 0.0 (0)

Developer: Maxime Dupré

Maintained by Community

Last modified: 2 days ago

🕰️ Wayback Machine search for archive history

Wayback Machine Search finds historical snapshots in the Internet Archive Wayback Machine for the URLs, hosts, or domains you submit. Use it to export archive dates, original URLs, HTTP status codes, MIME types, content digests, content length, optional archived page text, version timelines, Markdown reports, and monitoring alerts.

It is built for SEO audits, OSINT research, legal evidence checks, website change tracking, link-rot recovery, content history reviews, and scheduled archive monitoring. You can start with one URL such as https://example.com/, a bare domain such as example.com, or up to 50 targets in one run. No Wayback Machine API key, cookies, login, or user proxy setup is required.

🔎 What this Actor does

Searches the Wayback Machine CDX index for exact URLs, URL prefixes, hosts, or full domains.
Filters archive rows by date range, HTTP status code, and MIME type.
Saves raw snapshot rows with source-backed archive metadata.
Collapses repeated captures by content digest or by month, day, or hour.
Finds snapshots closest to a target date and adds distance in days.
Optionally fetches readable archived page text for a capped number of snapshots.
Emits deterministic change evidence from status, digest, length, and fetched text changes.
Builds version timeline rows when you want a compact history.
Generates a Markdown report in report mode.
Supports monitoring mode with alert rows for new archive rows, status changes, content changes, or removed/restored signals.
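The search, filter, and collapse options above correspond to parameters of the public Wayback CDX server. The sketch below builds such a query URL, assuming the Actor's options map directly onto the CDX matchType, from, to, filter, collapse, and limit parameters. The helper name and its arguments are hypothetical; this is not the Actor's own request code.

```python
from urllib.parse import urlencode

# Public Wayback CDX search endpoint.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

# "digest" drops adjacent captures with identical content; the timestamp
# prefixes keep one capture per month (6 digits), day (8), or hour (10).
COLLAPSE_FIELDS = {
    "digest": "digest",
    "month": "timestamp:6",
    "day": "timestamp:8",
    "hour": "timestamp:10",
}


def build_cdx_url(target, match_type="exact", date_from=None, date_to=None,
                  status=None, mime=None, collapse_by=None, limit=None):
    """Build a CDX query URL (hypothetical helper, not the Actor's code)."""
    # match_type is one of: exact, prefix, host, domain.
    params = [("url", target), ("matchType", match_type), ("output", "json")]
    if date_from:
        params.append(("from", date_from))  # YYYY, YYYYMM, or YYYYMMDD
    if date_to:
        params.append(("to", date_to))
    # CDX filters take the form field:regex; repeated filters must all match.
    if status:
        params.append(("filter", f"statuscode:{status}"))
    if mime:
        params.append(("filter", f"mimetype:{mime}"))
    if collapse_by in COLLAPSE_FIELDS:
        params.append(("collapse", COLLAPSE_FIELDS[collapse_by]))
    if limit:
        params.append(("limit", str(limit)))
    return f"{CDX_ENDPOINT}?{urlencode(params)}"
```

For example, build_cdx_url("example.com", "domain", status="200", mime="text/html", collapse_by="digest", limit=10) returns a URL you can open in a browser to preview the raw CDX rows for a domain-wide search.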

This Actor searches archive data. It does not crawl the live web, create new Wayback captures, perform visual screenshot diffs, use AI summaries, or promise complete archive coverage. Availability depends on what the Internet Archive has stored.

📦 Data you get

Snapshot rows can include:

target - submitted URL or domain that produced the row
originalUrl - original archived URL from the Wayback capture
waybackTimestamp and archiveDate - source timestamp and ISO date
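statusCode, mimeType, contentDigest, and contentLength - capture metadata from the CDX index
contentStatus and content - fetch status and, when archived text fetching is enabled, the archived page text
distanceFromTargetDays - distance from the target date, for closest-date evidence
change - change evidence with the previous timestamp and a source-backed reason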
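archiveDate is the ISO form of waybackTimestamp. Wayback timestamps are 14-digit UTC values (YYYYMMDDhhmmss), so the conversion can be sketched as follows. The function name is hypothetical, and the millisecond format is chosen only to match the output example further down.

```python
from datetime import datetime, timezone


def wayback_timestamp_to_iso(ts: str) -> str:
    """Convert a 14-digit Wayback timestamp (UTC) to an ISO 8601 string."""
    dt = datetime.strptime(ts, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    # Wayback timestamps have whole-second precision, so milliseconds are .000.
    return dt.strftime("%Y-%m-%dT%H:%M:%S.000Z")


print(wayback_timestamp_to_iso("20240510123045"))  # 2024-05-10T12:30:45.000Z
```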

Version rows group consecutive captures into timeline intervals. Summary rows report per-target coverage, counts, date range, discovered paths, subdomains, and emails found in fetched content. Alert rows appear in monitoring mode only when the selected mechanical alert rule is met.
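One plausible way to build version intervals is to sort captures by timestamp and merge neighbours that share a content digest. The sketch assumes the digest is the grouping key and uses illustrative field names (firstTimestamp, lastTimestamp, captureCount), not the Actor's actual version schema.

```python
from itertools import groupby


def build_versions(captures):
    """Group time-ordered captures with the same digest into intervals.

    Each capture is a dict with hypothetical keys "timestamp" (14 digits)
    and "digest"; the output field names are illustrative only.
    """
    # Fixed-width 14-digit timestamps sort correctly as strings.
    ordered = sorted(captures, key=lambda c: c["timestamp"])
    versions = []
    # groupby merges only adjacent items, so a digest that reappears later
    # (for example, a page reverted to older content) starts a new interval.
    for digest, group in groupby(ordered, key=lambda c: c["digest"]):
        members = list(group)
        versions.append({
            "digest": digest,
            "firstTimestamp": members[0]["timestamp"],
            "lastTimestamp": members[-1]["timestamp"],
            "captureCount": len(members),
        })
    return versions
```

With captures whose digests run A, A, B, A in time order, this yields three intervals, because the return to A after B is treated as a new version rather than merged with the first one.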

You can export the dataset as JSON, CSV, Excel, XML, RSS, or HTML, or consume the rows through the Apify API, schedules, webhooks, and integrations.

🚀 How to run it

Add one or more URLs or domains in the URLs or domains field.
Choose Archive scope: exact URL, URL prefix, same host, or same domain and subdomains.
Set optional date, status, and MIME filters.
Pick an output mode: raw snapshots, changed snapshots, timeline, closest snapshot to date, Markdown report, or monitoring delta.
Leave Collapse snapshots by set to content digest for compact results, or choose every snapshot for the full raw history.
Turn on Fetch archived page text only when you need readable content or phrase search evidence.
Run the Actor and open the dataset or optional Markdown report.

For a small first run, use:

{
  "targets": ["example.com"],
  "matchType": "domain",
  "maxResults": 10,
  "statusFilter": "200",
  "mimeFilter": "text/html",
  "outputMode": "snapshots",
  "collapseBy": "digest",
  "includeContent": false
}

⚙️ Input options

targets is required and accepts up to 50 URLs or domains.

matchType controls how broadly each target is searched. Use exact URL for one page, prefix for a path, host for one hostname, and domain when subdomains should be included.

maxResults limits saved snapshot, version, or alert rows per target. The maximum is 10,000.

dateFrom and dateTo accept YYYY, YYYYMM, or YYYYMMDD. statusFilter accepts a status such as 200 or 404. mimeFilter accepts a type such as text/html.
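The three accepted date formats differ only in precision. The sketch below validates a value and widens it to an inclusive 14-digit timestamp bound, so 2023 as dateTo covers all of December 31. This expansion is illustrative: the CDX server itself accepts partial timestamps, and the Actor may simply pass the values through. expand_date_bound is a hypothetical name.

```python
import calendar
import re
from datetime import datetime

# YYYY, YYYYMM, or YYYYMMDD, nothing else.
DATE_PATTERN = re.compile(r"\d{4}(?:\d{2}(?:\d{2})?)?")


def expand_date_bound(value: str, *, is_end: bool) -> str:
    """Widen a partial date into an inclusive 14-digit bound (hypothetical helper)."""
    if not DATE_PATTERN.fullmatch(value):
        raise ValueError(f"expected YYYY, YYYYMM, or YYYYMMDD, got {value!r}")
    year = int(value[:4])
    month = int(value[4:6]) if len(value) >= 6 else (12 if is_end else 1)
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month in {value!r}")
    if len(value) == 8:
        day = int(value[6:8])
    else:
        # An end bound runs to the last day of the month (leap years included).
        day = calendar.monthrange(year, month)[1] if is_end else 1
    datetime(year, month, day)  # raises ValueError for dates like 20240231
    clock = "235959" if is_end else "000000"
    return f"{year:04d}{month:02d}{day:02d}{clock}"
```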

outputMode changes the result shape. Use raw snapshots for exports, timeline for version intervals, closest snapshot to date for evidence work, report for a Markdown summary, and monitoring for scheduled archive checks.
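For closest snapshot to date, a natural approach is to rank captures by absolute time distance from the target. This sketch assumes distanceFromTargetDays is an unsigned number of days rounded to two decimals and takes the target as a 14-digit timestamp. Both choices are assumptions, and closest_snapshots is a hypothetical name.

```python
from datetime import datetime, timezone


def parse_wayback_timestamp(ts: str) -> datetime:
    # Wayback timestamps are 14-digit UTC values: YYYYMMDDhhmmss.
    return datetime.strptime(ts, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def closest_snapshots(timestamps, target_ts, top_n=3):
    """Rank captures by absolute distance from target_ts (hypothetical helper)."""
    target = parse_wayback_timestamp(target_ts)
    scored = sorted(
        (abs((parse_wayback_timestamp(ts) - target).total_seconds()), ts)
        for ts in timestamps
    )
    # Rank on exact seconds; convert to rounded days only for the output.
    return [
        {"waybackTimestamp": ts, "distanceFromTargetDays": round(seconds / 86400, 2)}
        for seconds, ts in scored[:top_n]
    ]
```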

includeContent, maxContentFetch, and historyQuery control archived text fetching. Content fetching is capped so large archive searches do not fetch every historical page by accident.

🧾 Output example

{
  "recordType": "snapshot",
  "target": "example.com",
  "originalUrl": "https://example.com/pricing",
  "waybackTimestamp": "20240510123045",
  "archiveDate": "2024-05-10T12:30:45.000Z",
  "statusCode": 200,
  "mimeType": "text/html",
  "contentDigest": "M5W6TLBPLQWJXTQWJ2R5XQ7Y3YQK4K6L",
  "contentLength": 18432,
  "contentStatus": "notRequested",
  "content": null,
  "distanceFromTargetDays": null,
  "change": {
    "changed": true,
    "type": "digestChange",
    "previousArchiveDate": "2024-04-01T08:15:30.000Z",
    "previousWaybackTimestamp": "20240401081530",
    "evidence": ["Digest changed from ABC123 to M5W6TLBPLQWJXTQWJ2R5XQ7Y3YQK4K6L"]
  },
  "version": null,
  "diff": null,
  "summary": null,
  "alert": null
}
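The change object in this example compares a capture with the one before it. A minimal sketch of that comparison over status, digest, and length follows. Only digestChange appears in the example, so the statusChange and lengthChange type names and their priority order are assumptions, and the fetched-text comparison the Actor also mentions is left out.

```python
def change_evidence(previous, current):
    """Compare a capture with the one before it (hypothetical helper).

    Rows use the field names from the output example above.
    """
    if previous is None:
        return None  # first capture: nothing to compare against
    checks = [
        ("statusChange", "Status", "statusCode"),
        ("digestChange", "Digest", "contentDigest"),
        ("lengthChange", "Length", "contentLength"),
    ]
    types, evidence = [], []
    for change_type, label, field in checks:
        if previous[field] != current[field]:
            types.append(change_type)
            evidence.append(f"{label} changed from {previous[field]} to {current[field]}")
    return {
        "changed": bool(evidence),
        # The first matching check wins, in the order listed above.
        "type": types[0] if types else None,
        "previousArchiveDate": previous["archiveDate"],
        "previousWaybackTimestamp": previous["waybackTimestamp"],
        "evidence": evidence,
    }
```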

💳 Pricing

This Actor uses pay-per-event pricing. You are charged for each saved successful Wayback result: snapshot, version, summary, or monitoring alert. Empty archive searches, invalid inputs, skipped content fetches, and source issues do not create dataset rows, so they are not charged.

The planned pricing starts at $0.0018 per saved archive result on the Free tier and goes down to $0.0009 per saved archive result on higher tiers. Always check the Actor Pricing tab before starting a large run.
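Because billing is per saved row, a cost estimate is rows times the per-result price. The sketch uses Decimal to avoid floating-point rounding; the two prices are the planned Free-tier and lowest-tier rates quoted above, and the actual rate for your plan is on the Pricing tab. As a rough ceiling, maxResults caps snapshot, version, or alert rows per target, so 50 targets at 10,000 rows each could save up to 500,000 such rows, with any summary rows on top. estimate_cost is a hypothetical name.

```python
from decimal import Decimal

# Planned per-result rates quoted above; check the Pricing tab for your plan.
FREE_TIER_PRICE = Decimal("0.0018")
LOWEST_TIER_PRICE = Decimal("0.0009")


def estimate_cost(saved_results: int, price_per_result: Decimal) -> Decimal:
    """Rows times price, rounded to cents."""
    return (price_per_result * saved_results).quantize(Decimal("0.01"))


print(estimate_cost(1_000, LOWEST_TIER_PRICE))      # 0.90
print(estimate_cost(10_000, FREE_TIER_PRICE))       # 18.00
print(estimate_cost(50 * 10_000, FREE_TIER_PRICE))  # 900.00
```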

⚠️ Limits and caveats

The Internet Archive may not have snapshots for every page or date.
A successful run can return zero rows when no matching archive data exists.
Archive text can be unavailable, non-HTML, capped, or skipped by the content fetch limit.
Monitoring compares new archive results with the latest saved archive state for the same target and filters. It is based on Wayback captures, not live website polling.
Change labels are mechanical and source-backed. They do not claim semantic meaning such as a product, legal, or pricing change unless the returned text evidence shows it.
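Monitoring compares each run with the state saved by earlier runs. The sketch below keeps the latest capture per original URL and diffs it against the previous state. The alert type names, the rule that 2xx and 3xx mean "present" for removed and restored, and the state layout are all assumptions for illustration, not the Actor's documented alert rules.

```python
def is_present(status):
    # Assumption: treat 2xx and 3xx captures as "page present".
    return 200 <= status < 400


def monitoring_alerts(previous_state, current_rows):
    """Return (alerts, new_state) for one monitoring run (hypothetical helper).

    previous_state maps originalUrl -> the latest row saved by an earlier run.
    Each row needs originalUrl, waybackTimestamp, statusCode, contentDigest.
    """
    # Keep only the newest capture per URL; 14-digit timestamps sort as strings.
    latest = {}
    for row in current_rows:
        url = row["originalUrl"]
        if url not in latest or row["waybackTimestamp"] > latest[url]["waybackTimestamp"]:
            latest[url] = row

    alerts = []
    for url, row in sorted(latest.items()):
        before = previous_state.get(url)
        if before is not None and row["waybackTimestamp"] <= before["waybackTimestamp"]:
            continue  # nothing newer than what the last run saw
        if before is None:
            kind = "newArchiveRow"
        elif is_present(before["statusCode"]) and not is_present(row["statusCode"]):
            kind = "removed"
        elif not is_present(before["statusCode"]) and is_present(row["statusCode"]):
            kind = "restored"
        elif before["statusCode"] != row["statusCode"]:
            kind = "statusChange"
        elif before["contentDigest"] != row["contentDigest"]:
            kind = "contentChange"
        else:
            kind = "newArchiveRow"  # newer capture, same status and content
        alerts.append({"type": kind, "originalUrl": url,
                       "waybackTimestamp": row["waybackTimestamp"]})

    # Merge so URLs missing from this run keep their previous state.
    new_state = {**previous_state, **latest}
    return alerts, new_state
```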

❓ FAQ

❓ Does this use the official Wayback Machine?

It reads public Internet Archive Wayback Machine data through the CDX index and archived playback pages.

🔑 Do I need a Wayback Machine API key?

No. The Actor does not ask for a Wayback Machine API key, cookies, or login.

📡 Can it monitor a live website?

It monitors changes in Wayback Machine archive captures. It does not poll the current live page independently.

🔗 Why is there no `archiveUrl` field?

Rows keep originalUrl and waybackTimestamp. A playback URL can be reconstructed as https://web.archive.org/web/{waybackTimestamp}/{originalUrl} when you need to open the archived page.
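Applied to a dataset row, that reconstruction is a one-line string template. The helper name is hypothetical; the field names come from the output example.

```python
def playback_url(row: dict) -> str:
    """Rebuild the Wayback playback URL for a dataset row (hypothetical helper)."""
    return f"https://web.archive.org/web/{row['waybackTimestamp']}/{row['originalUrl']}"


row = {"originalUrl": "https://example.com/pricing", "waybackTimestamp": "20240510123045"}
print(playback_url(row))  # https://web.archive.org/web/20240510123045/https://example.com/pricing
```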

📝 Changelog

0.1: Initial release.

🆘 Support

For issues, questions, or feature requests, file a ticket and I'll fix or implement it in less than 24h 🫡

🔗 Other actors

Sitemap Sniffer ↗ - Find public sitemap files and URL inventories before an archive or SEO audit.
Website URL Crawler ↗ - Crawl rendered website links and export a clean link map.
SEMrush Free Website Stats Scraper ↗ - Collect public domain traffic, authority, backlink, and referral metrics.
Ahrefs Free Website Stats Scraper ↗ - Export public Ahrefs domain rating, traffic, rank, and linking website stats.
Font Detector ↗ - Detect fonts, CSS families, and source evidence on public web pages.

Made with ❤️ by Maxime Dupré

You might also like

Wayback Machine Search (crawlerbros/wayback-machine-search) by Crawler Bros - Query Internet Archive's Wayback Machine for historical snapshots of any URL or domain. Filter by date, HTTP status, MIME type, and deduplicate. Optionally fetch the archived page text. Free public CDX API, no authentication.

Wayback Machine Bulk Lookup (jungle_synthesizer/wayback-machine-bulk-lookup) by BowTiedRaccoon - Look up Wayback Machine snapshots for any URL or list of URLs. Returns capture timeline, optional snapshot markdown, and live-vs-snapshot diff. Date range filtering, capture limit, bulk input. Built for OSINT, journalism, SEO link-rot recovery, and legal evidence.

Wayback Machine Scraper - Track Website Changes Over Time (ryanclinton/wayback-machine-search) by Ryan Clinton - Search the Internet Archive's Wayback Machine for historical snapshots of any website. Retrieve archived page metadata (timestamps, URLs, MIME types, HTTP status codes, and content hashes) for up to 10,000 snapshots per run.

Expired Domains Scraper (martin1080p/expired-domains-scraper) by Martin Fanta - The Expired Domains Scraper automates finding valuable expired domains from expireddomains.com, offering filters and sorting by SEO metrics and auction details for efficient domain acquisition. 267 users, rating 1.0 (4).

Influencer Brand Safety Intelligence MCP Server (ryanclinton/influencer-brand-safety-intelligence-mcp) by Ryan Clinton - Creator vetting and brand risk intelligence via the Model Context Protocol.

Brand Reputation Monitor — Threats & Reviews (ryanclinton/brand-reputation-monitor) by Ryan Clinton - Comprehensive brand reputation threat intelligence engine that runs 8 sub-actors in parallel and applies four proprietary scoring models — Brand Threat Assessment, Domain Impersonation Detection, Review Authenticity Analysis, and Narrative Drift Tracking — to produce a DEFCON-rated threat report...

Internet Archive Search — Wayback Machine Advanced Query Tool (maged120/archive-org-advanced-search) by Maged - Search the Internet Archive (archive.org) with full advanced filter support: date range, media type, language, subject, and more. Returns metadata from archived web pages, books, audio, and video.

SEO ZOMBIE SLAYER (actor_researcher.48/seo-zombie-slayer) by ANIRBAN ROY - SEO Zombie Slayer crawls websites to hunt dead links, SEO issues, performance problems, and security risks. Includes Lighthouse audits, competitor comparison, and a fun game mode with XP, levels, and boss battles to turn SEO audits into action. Rating 5.0 (1).

Automated reconnaissance actor for bug bounty hunters (wonderful_beluga/automated-reconnaissance-actor-for-bug-bounty-hunters) by Zaher el siddik - This Apify actor automates bug bounty recon by scraping the Wayback Machine and GitHub for legacy attack surfaces. It extracts historical URLs, public code, and deprecated files, parsing them to uncover hidden subdomains and forgotten API endpoints. The findings are saved into structured JSON files.

Internet Archive & Wayback Machine Scraper (cloud9_ai/internet-archive-scraper) by cloud9 - Search Internet Archive and check Wayback Machine snapshots. Access 800B+ archived pages, books, movies, audio. Search items, get metadata, or check URL archive history. No API key needed. For SEO, OSINT, legal, and research.

URL: https://apify.com/maximedupre/wayback-machine-search
