URL: https://apify.com/web.harvester/websites-archiver

Websites Archiver (Wayback Machine)

Pricing

$9.00/month + usage

Effortlessly archive any website with our Automated Website Archiving Tool. It leverages the power of the Wayback Machine at web.archive.org to ensure your sites are preserved for future reference.

Rating: 5.0 (1)
Developer: Web Harvester
Maintained by Community
Actor stats: 3 bookmarked, 88 total users, 2 monthly active users, last modified 6 months ago

Usage

The actor accepts an input in the following format:

{
  "startUrls": [
    {
      "url": "https://crawlee.dev"
    }
  ],
  "fastArchiveMode": true,
  "archiveErrorPages": true,
  "storeArchivedResources": false
}

Input Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| startUrls | array | required | List of URLs to archive |
| fastArchiveMode | boolean | true | When enabled, sends the archive request without waiting for full completion. Faster, but provides less detailed output. |
| archiveErrorPages | boolean | true | Whether to archive pages that return HTTP 4xx and 5xx status codes |
| storeArchivedResources | boolean | false | Whether to include the list of archived resources in the output (only available in full mode) |
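
As a rough illustration of the options above, the input object can be assembled in code before a run is started. This sketch is not part of the actor: `build_run_input` is a hypothetical helper, and the check that rejects `storeArchivedResources` in fast mode is an assumption drawn from the "only available in full mode" note, not documented actor behaviour.

```python
import json
from typing import Any


def build_run_input(
    urls: list[str],
    fast_archive_mode: bool = True,
    archive_error_pages: bool = True,
    store_archived_resources: bool = False,
) -> dict[str, Any]:
    """Build the actor input object (hypothetical helper, not part of the actor)."""
    if not urls:
        raise ValueError("startUrls is required: pass at least one URL")
    # Assumption: the resource list is only produced in full mode, so asking
    # for it in fast mode is treated here as a caller mistake.
    if fast_archive_mode and store_archived_resources:
        raise ValueError("storeArchivedResources needs fastArchiveMode=False")
    return {
        "startUrls": [{"url": url} for url in urls],
        "fastArchiveMode": fast_archive_mode,
        "archiveErrorPages": archive_error_pages,
        "storeArchivedResources": store_archived_resources,
    }


run_input = build_run_input(["https://crawlee.dev"])
print(json.dumps(run_input, indent=2))
```

With the defaults, the helper reproduces the sample input shown under Usage.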

Output

Full Archive Mode (fastArchiveMode: false)

{
  "url": "https://crawlee.dev",
  "archivedUrl": "https://web.archive.org/web/20240610223756/https://crawlee.dev/",
  "archived": true,
  "archivedAt": "2024-06-10T22:38:15.643Z",
  "archivedResourcesCount": 69,
  "archivedResources": [
    "https://crawlee.dev/",
    "https://crawlee.dev/js/custom.js",
    "https://crawlee.dev/assets/css/styles.5a93fba9.css"
  ]
}

Fast Archive Mode (fastArchiveMode: true)

{
  "url": "https://crawlee.dev",
  "archivedUrl": "https://web.archive.org/web/20240610223756/https://crawlee.dev/",
  "archived": true,
  "archivedAt": "2024-06-10T22:38:15.643Z"
}

Failed Archive

{
  "url": "https://example.com/blocked",
  "archivedUrl": null,
  "note": "This URL has been excluded from the Wayback Machine",
  "archived": false
}
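
A consumer of the output might separate successful and failed records and recover the snapshot time from `archivedUrl`. The sketch below assumes items shaped like the examples above. `split_results` and `snapshot_time` are hypothetical helper names, and the parsing relies on the 14-digit `YYYYMMDDhhmmss` segment seen in the sample Wayback Machine URLs, interpreted here as UTC (an assumption).

```python
import re
from datetime import datetime, timezone
from typing import Any, Optional

# Matches the 14-digit timestamp segment in a Wayback Machine snapshot URL.
_SNAPSHOT_RE = re.compile(r"/web/(\d{14})/")


def snapshot_time(archived_url: Optional[str]) -> Optional[datetime]:
    """Extract the snapshot timestamp from an archivedUrl, or None if absent."""
    if not archived_url:
        return None
    match = _SNAPSHOT_RE.search(archived_url)
    if match is None:
        return None
    parsed = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    return parsed.replace(tzinfo=timezone.utc)  # assumption: timestamps are UTC


def split_results(items: list[dict[str, Any]]) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Return (archived, failed) based on each item's `archived` flag."""
    archived = [item for item in items if item.get("archived")]
    failed = [item for item in items if not item.get("archived")]
    return archived, failed


# Sample items copied from the output examples above.
items = [
    {
        "url": "https://crawlee.dev",
        "archivedUrl": "https://web.archive.org/web/20240610223756/https://crawlee.dev/",
        "archived": True,
        "archivedAt": "2024-06-10T22:38:15.643Z",
    },
    {
        "url": "https://example.com/blocked",
        "archivedUrl": None,
        "note": "This URL has been excluded from the Wayback Machine",
        "archived": False,
    },
]

archived, failed = split_results(items)
for item in archived:
    print(item["url"], "->", snapshot_time(item["archivedUrl"]))
for item in failed:
    print(item["url"], "failed:", item.get("note", "no note given"))
```

Checking the `archived` flag rather than the presence of `archivedUrl` keeps the logic tied to the one field that appears in all three output shapes.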

Running the Actor

Running the actor requires an Apify account. Once you're logged in, you can start it from the Apify Console or run it programmatically through the Apify API.

For more information on how to use Apify Actors, please refer to the Apify documentation.
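
For the programmatic route, one option is Apify's synchronous run endpoint, which starts a run and returns the dataset items in a single HTTP call. The sketch below uses only the Python standard library. The actor ID `web.harvester~websites-archiver` is inferred from the Store URL, `APIFY_TOKEN` is an assumed environment variable name, and `build_request` and `run_archiver` are hypothetical helpers, so treat this as a starting point to check against the Apify API reference.

```python
import json
import os
import urllib.request
from typing import Any

# Assumption: the API actor ID is the Store path with "/" replaced by "~".
ACTOR_ID = "web.harvester~websites-archiver"
ENDPOINT = f"https://api.apify.com/v2/acts/{ACTOR_ID}/run-sync-get-dataset-items"


def build_request(token: str, run_input: dict[str, Any]) -> urllib.request.Request:
    """Prepare the POST request that starts a run with the given input."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_archiver(token: str, run_input: dict[str, Any], timeout: float = 330.0) -> list[dict[str, Any]]:
    """Send the request and return the dataset items as parsed JSON."""
    request = build_request(token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Only contacts the API when a token is present in the environment.
if __name__ == "__main__" and os.environ.get("APIFY_TOKEN"):
    results = run_archiver(
        os.environ["APIFY_TOKEN"],
        {"startUrls": [{"url": "https://crawlee.dev"}], "fastArchiveMode": True},
    )
    for item in results:
        print(item.get("url"), item.get("archived"), item.get("archivedUrl"))
```

A synchronous call suits short runs in fast mode. Long runs may outlast what a single HTTP request will wait for, in which case starting the run asynchronously and reading the dataset afterwards is likely the safer choice.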

You might also like

Wayback Machine Historical Content Scraper

happyfhantum/wayback-machine-historical-content-scraper

Compare archived website snapshots through the Wayback Machine and extract page-history change signals.

89

4.0

Wayback Machine Checker

automation-lab/wayback-machine-checker

This actor checks if URLs are archived in the Internet Archive Wayback Machine. It retrieves snapshot counts, oldest and newest archive dates, and direct links to archived versions. Uses both the Availability API and CDX API for comprehensive results.

Stas Persiianenko

41

Wayback Machine Scraper - Track Website Changes Over Time

ryanclinton/wayback-machine-search

Search the Internet Archive's Wayback Machine for historical snapshots of any website. Retrieve archived page metadata -- including timestamps, URLs, MIME types, HTTP status codes, and content hashes -- for up to 10,000 snapshots per run.

74

Wayback Machine Search

maximedupre/wayback-machine-search

Search Wayback Machine snapshots for URLs, hosts, and domains. Export archive dates, status codes, MIME types, digests, content text, version timelines, reports, and monitoring alerts.

Maxime Dupré

2

Wayback Machine Scraper

gio21/wayback-machine-scraper

List Internet Archive Wayback Machine snapshots for one or more URLs. Returns timestamp, snapshot URL, HTTP status, MIME type, digest. Useful for tracking website changes over time, OSINT research, content recovery, and brand monitoring.

Wayback Machine Search

crawlerbros/wayback-machine-search

Query Internet Archive's Wayback Machine for historical snapshots of any URL or domain. Filter by date, HTTP status, MIME type, and deduplicate. Optionally fetch the archived page text. Free public CDX API, no authentication.

Internet Archive Search - Wayback Machine Advanced Query Tool

maged120/archive-org-advanced-search

Search the Internet Archive (archive.org) with full advanced filter support: date range, media type, language, subject, and more. Returns metadata from archived web pages, books, audio, and video.

Wayback Machine Archive Scraper

andok/wayback-machine-scraper

Fetch historical snapshots of any webpage from the Internet Archive. Perfect for digital forensics and tracking deleted content.
