Internet Archive Scraper

Pricing

$2.00 / 1,000 items returned

Searches the Internet Archive (archive.org) by keyword and returns structured items (title, creator, year, downloads, subjects, item URL); filter by media type and sort by downloads or upload date.

Rating

5.0

(1)

Developer

Dami's Studio

Maintained by Community

Actor stats

Last modified: 7 days ago

What you get per item

identifier, title, creator, year, date, mediaType, downloads, subjects (array), description (first ~500 chars), publicdate, and url (https://archive.org/details/{identifier}).

Fields that can be null

title, creator, year, date, description, publicdate — null when archive.org's metadata doesn't include that field for an item.
subjects — empty array when the item has no subject tags.
downloads — 0 when not reported.
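
The null and default rules above can be sketched as a small mapping function. This is an illustrative sketch, not the actor's actual code: `first_value` and `to_row` are hypothetical names, the raw field names (`mediatype`, `subject`) are taken from the Notes section, and the handling of list-valued fields is an assumption about how archive.org may return multi-valued metadata.

```python
from __future__ import annotations

from typing import Any

DESCRIPTION_LIMIT = 500  # assumption: the README says "first ~500 chars"


def first_value(value: Any) -> Any:
    """Return a scalar for a field that may arrive as a scalar or a list."""
    if isinstance(value, list):
        return value[0] if value else None
    return value if value not in ("", None) else None


def to_row(doc: dict[str, Any]) -> dict[str, Any]:
    """Map one raw search doc to an output row (hypothetical helper)."""
    identifier = doc["identifier"]
    subjects = doc.get("subject") or []
    if isinstance(subjects, str):
        subjects = [subjects]  # a single tag becomes a one-element array
    description = first_value(doc.get("description"))
    return {
        "identifier": identifier,
        # Missing metadata stays None (null in the dataset).
        "title": first_value(doc.get("title")),
        "creator": first_value(doc.get("creator")),
        "year": first_value(doc.get("year")),
        "date": first_value(doc.get("date")),
        "mediaType": first_value(doc.get("mediatype")),
        # downloads falls back to 0 when not reported.
        "downloads": int(doc.get("downloads") or 0),
        # subjects falls back to an empty array.
        "subjects": list(subjects),
        "description": str(description)[:DESCRIPTION_LIMIT] if description else None,
        "publicdate": first_value(doc.get("publicdate")),
        "url": f"https://archive.org/details/{identifier}",
    }
```

For a doc that carries only an `identifier`, every nullable field comes back as `None`, `subjects` as `[]`, and `downloads` as `0`.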

Input

Field	Notes
`query`	Required. Keywords, e.g. `nasa apollo`, `jazz`. Supports archive.org Lucene operators, e.g. `title:(grateful dead) AND year:[1977 TO 1980]`.
`mediaType`	Restrict to one type: `texts`, `audio`, `movies`, `software`, `image`, `web`, `data`, `collection`. Empty = any.
`sort`	`downloads` (default), `date`, `publicdate`, or `relevance`.
`maxItems`	Max unique items to return (default 100). Paginates 100 per request until reached or exhausted.
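
To pre-check an input before starting a run, the table above can be expressed as a small validator. This is a sketch under stated assumptions: `normalize_input` is a hypothetical helper, and rejecting an unknown `sort` or a non-positive `maxItems` is an assumption, since the README only documents an empty `query` and an unknown `mediaType` as invalid.

```python
from __future__ import annotations

from typing import Any

ALLOWED_MEDIA_TYPES = {
    "texts", "audio", "movies", "software", "image", "web", "data", "collection",
}
ALLOWED_SORTS = {"downloads", "date", "publicdate", "relevance"}


def normalize_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Validate a run input and fill in the documented defaults."""
    query = (raw.get("query") or "").strip()
    if not query:
        raise ValueError("query is required and must be non-empty")

    media_type = (raw.get("mediaType") or "").strip()
    if media_type and media_type not in ALLOWED_MEDIA_TYPES:
        raise ValueError(f"unknown mediaType: {media_type!r}")

    sort = raw.get("sort") or "downloads"  # documented default
    if sort not in ALLOWED_SORTS:
        # Assumption: the README does not say how an unknown sort is handled.
        raise ValueError(f"unknown sort: {sort!r}")

    max_items = int(raw.get("maxItems") or 100)  # documented default
    if max_items < 1:
        # Assumption: a non-positive limit is treated as invalid here.
        raise ValueError("maxItems must be at least 1")

    return {"query": query, "mediaType": media_type, "sort": sort, "maxItems": max_items}
```

For instance, `normalize_input({"query": "jazz", "mediaType": "audio"})` fills in `sort` as `downloads` and `maxItems` as `100`.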

Output

One dataset row per item. Pricing is pay-per-result: you are charged only for genuine item rows (ok: true). Diagnostic rows are never charged. These include:

empty or invalid input (errorCode: "BAD_INPUT", returned for an empty query or an unknown mediaType),
no results for the query (errorCode: "NO_RESULTS"),
rate limits, network errors, or server errors (errorCode: "RATE_LIMITED", "NETWORK", or "SERVER_ERROR").

Results are de-duplicated by identifier.
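
When post-processing a dataset, it can help to separate charged item rows from diagnostic rows and estimate the cost. The sketch below assumes that item rows carry `ok: true` and that every other row is a diagnostic with an `errorCode`. `split_rows` and `estimate_cost` are hypothetical helpers, and the price constant is copied from the Pricing section.

```python
from __future__ import annotations

from typing import Any

PRICE_PER_1000_ITEMS = 2.00  # USD, from the Pricing section


def split_rows(rows: list[dict[str, Any]]) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Separate genuine item rows (ok is True) from diagnostic rows."""
    items = [row for row in rows if row.get("ok") is True]
    diagnostics = [row for row in rows if row.get("ok") is not True]
    return items, diagnostics


def estimate_cost(rows: list[dict[str, Any]]) -> float:
    """Estimate the charge in USD: only item rows are billed."""
    items, _ = split_rows(rows)
    return len(items) * PRICE_PER_1000_ITEMS / 1000


rows = [
    {"ok": True, "identifier": "item-a"},
    {"ok": True, "identifier": "item-b"},
    {"ok": False, "errorCode": "RATE_LIMITED"},
]
items, diagnostics = split_rows(rows)
print(len(items), len(diagnostics), estimate_cost(rows))
```

Here two of the three rows are billable, so the diagnostic row does not add to the estimate.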

Proxy

The archive.org advancedsearch API is a public, no-auth JSON endpoint with no anti-bot protection, so no proxy is required and the default configuration runs without one (saving proxy credits). Enable Apify Proxy only if you hit IP rate limits at very high volume.

Troubleshooting

Getting a BAD_INPUT row? Provide a non-empty query, and if you set mediaType make sure it's one of the allowed values.
NO_RESULTS? The query matched nothing on archive.org — broaden the keywords or remove the media-type filter.
Want fewer/more results? Adjust maxItems. The archive can return very large result sets for broad queries.

Example

{"query":"jazz","mediaType":"audio","sort":"downloads","maxItems":50}

Notes

The actor calls advancedsearch.php with output=json, requesting identifier, title, creator, year, date, mediatype, downloads, description, subject, and publicdate, then maps each doc to a clean row. Pagination uses page with 100 rows per request until your maxItems is reached or the numFound total is exhausted.
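
The request and pagination described above can be sketched as follows. This is an approximation, not the actor's source: `build_url` and `pages_needed` are hypothetical names, and both the way `mediaType` is folded into the query (`AND mediatype:...`) and the descending sort order are assumptions. The parameter names `q`, `fl[]`, `sort[]`, `rows`, `page`, and `output` are those accepted by `advancedsearch.php`.

```python
from __future__ import annotations

import math
from urllib.parse import urlencode

BASE_URL = "https://archive.org/advancedsearch.php"
FIELDS = [
    "identifier", "title", "creator", "year", "date",
    "mediatype", "downloads", "description", "subject", "publicdate",
]
ROWS_PER_PAGE = 100
# Assumption: field sorts are descending; "relevance" sends no sort parameter.
SORT_PARAMS = {
    "downloads": "downloads desc",
    "date": "date desc",
    "publicdate": "publicdate desc",
    "relevance": None,
}


def build_url(query: str, media_type: str, sort: str, page: int) -> str:
    """Build one advancedsearch.php request URL for the given page (1-based)."""
    # Assumption: the media-type filter is appended as a Lucene clause.
    q = f"({query}) AND mediatype:{media_type}" if media_type else query
    params: list[tuple[str, object]] = [("q", q)]
    params += [("fl[]", field) for field in FIELDS]
    sort_value = SORT_PARAMS.get(sort)
    if sort_value:
        params.append(("sort[]", sort_value))
    params += [("rows", ROWS_PER_PAGE), ("page", page), ("output", "json")]
    return f"{BASE_URL}?{urlencode(params)}"


def pages_needed(num_found: int, max_items: int) -> int:
    """Number of 100-row pages to fetch before maxItems or numFound is reached."""
    wanted = min(num_found, max_items)
    return math.ceil(wanted / ROWS_PER_PAGE)
```

With `maxItems` set to 50, a single page is requested even when `numFound` is in the thousands. A query matching 250 items with `maxItems` at 1000 needs three pages.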

URL: https://apify.com/dami_studio/internet-archive-scraper
