URL: https://apify.com/dami_studio/internet-archive-scraper

Internet Archive Scraper - Search Millions of Items · Apify


Pricing

$2.00 / 1,000 items returned

Internet Archive Scraper

Searches the Internet Archive (archive.org) by keyword and returns structured items (title, creator, year, downloads, subjects, item URL); filter by media type and sort by downloads or upload date.

Rating

5.0

(1)

Developer

๐Ÿ‘ Dami's Studio

Dami's Studio

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 7 days ago

Search the Internet Archive (archive.org) by keyword and get back clean, structured items: title, creator, year, downloads, subjects, description and the item URL. No API key, no login.

Built on the public advancedsearch.php JSON API. Filter by media type (texts, audio, movies, software, image, ...), sort by downloads, date, publicdate, or relevance, and paginate transparently up to your item limit.

What you get per item

identifier, title, creator, year, date, mediaType, downloads, subjects (array), description (first ~500 chars), publicdate, and url (https://archive.org/details/{identifier}).

Fields that can be null

  • title, creator, year, date, description, publicdate: null when archive.org's metadata doesn't include that field for an item.
  • subjects: empty array when the item has no subject tags.
  • downloads: 0 when not reported.
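
The row shape and the null rules above can be sketched as a small mapping function. This is an illustrative Python sketch, not the actor's source code: the helper names first_text and to_row, the hard cut at exactly 500 characters, and the choice to stringify scalar values are assumptions. Only the field names and the null / empty array / 0 defaults come from this listing.

```python
from typing import Any, Optional

DESCRIPTION_LIMIT = 500  # assumption: the listing only says "first ~500 chars"


def first_text(value: Any) -> Optional[str]:
    """Return a trimmed string for a scalar or the first list element, else None."""
    if isinstance(value, list):
        value = value[0] if value else None
    if value is None:
        return None
    text = str(value).strip()
    return text or None


def to_row(doc: dict) -> dict:
    """Map one raw search doc (hypothetical input shape) to an item row."""
    subjects = doc.get("subject") or []
    if isinstance(subjects, str):  # a single tag may arrive as a bare string
        subjects = [subjects]
    description = first_text(doc.get("description"))
    identifier = doc["identifier"]
    return {
        "identifier": identifier,
        "title": first_text(doc.get("title")),
        "creator": first_text(doc.get("creator")),
        "year": first_text(doc.get("year")),
        "date": first_text(doc.get("date")),
        "mediaType": first_text(doc.get("mediatype")),
        "downloads": int(doc.get("downloads") or 0),  # 0 when not reported
        "subjects": list(subjects),  # empty list when there are no tags
        "description": description[:DESCRIPTION_LIMIT] if description else None,
        "publicdate": first_text(doc.get("publicdate")),
        "url": f"https://archive.org/details/{identifier}",
    }
```

Consumers of the dataset can rely on the same defaults: check for None on the text fields, and treat subjects and downloads as always present.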

Input

Field | Notes
query | Required. Keywords, e.g. nasa apollo, jazz. Supports archive.org Lucene operators, e.g. title:(grateful dead) AND year:[1977 TO 1980].
mediaType | Restrict to one type: texts, audio, movies, software, image, web, data, collection. Empty = any.
sort | downloads (default), date, publicdate, or relevance.
maxItems | Max unique items to return (default 100). Paginates 100 per request until reached or exhausted.
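
To make the input rules concrete, here is a hedged Python sketch of how the four fields might be validated and defaulted before a run. The function name normalize_input, the message key, and the fallback to downloads for an unrecognized sort value are assumptions for illustration; the allowed values, the defaults, and the BAD_INPUT code are taken from this listing.

```python
ALLOWED_MEDIA_TYPES = {
    "texts", "audio", "movies", "software", "image", "web", "data", "collection",
}
ALLOWED_SORTS = {"downloads", "date", "publicdate", "relevance"}


def normalize_input(raw: dict) -> dict:
    """Return a normalized input, or a BAD_INPUT diagnostic row."""
    query = str(raw.get("query") or "").strip()
    media_type = str(raw.get("mediaType") or "").strip()

    if not query:
        return {"ok": False, "errorCode": "BAD_INPUT",
                "message": "query is required"}  # message key is hypothetical
    if media_type and media_type not in ALLOWED_MEDIA_TYPES:
        return {"ok": False, "errorCode": "BAD_INPUT",
                "message": f"unknown mediaType: {media_type}"}

    sort = raw.get("sort") or "downloads"
    if sort not in ALLOWED_SORTS:
        sort = "downloads"  # assumption: the listing does not say what happens here

    max_items = max(1, int(raw.get("maxItems") or 100))
    return {"ok": True, "query": query, "mediaType": media_type,
            "sort": sort, "maxItems": max_items}
```

Validating locally before starting a run avoids spending a run on an input that would only produce a diagnostic row.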

Output

One dataset row per item. Pricing is pay-per-result: you are only charged for genuine item rows (ok: true). Diagnostic rows are never charged. These include:

  • empty/invalid input (errorCode: "BAD_INPUT", returned for an empty query or an unknown mediaType),
  • no results for the query (NO_RESULTS),
  • rate limits or network errors (RATE_LIMITED / NETWORK / SERVER_ERROR).

Results are de-duplicated by identifier.
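
On the consuming side, the split between charged item rows and free diagnostic rows might be handled as in the sketch below. It is a minimal Python example under stated assumptions: the function names split_rows and estimated_charge are hypothetical, and the second de-duplication pass is only a defensive measure since the actor already de-duplicates by identifier. The ok flag and the $2.00 per 1,000 items price come from this listing.

```python
from decimal import Decimal

PRICE_PER_1000 = Decimal("2.00")  # from the listing: $2.00 / 1,000 items returned


def split_rows(rows):
    """Separate item rows (ok: true) from diagnostic rows, keeping unique identifiers."""
    items, diagnostics, seen = [], [], set()
    for row in rows:
        if not row.get("ok"):
            diagnostics.append(row)  # BAD_INPUT, NO_RESULTS, RATE_LIMITED, ...
            continue
        identifier = row.get("identifier")
        if identifier in seen:
            continue  # defensive: skip a repeated identifier
        seen.add(identifier)
        items.append(row)
    return items, diagnostics


def estimated_charge(items) -> Decimal:
    """Estimate the pay-per-result cost in USD for the given item rows."""
    return PRICE_PER_1000 * len(items) / 1000
```

For the example input with maxItems set to 50, a full result set would come to an estimated $0.10 under this pricing.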

Proxy

The archive.org advancedsearch API is a public, no-auth JSON endpoint with no anti-bot protection, so no proxy is required and the actor runs without one by default (saving proxy credits). Only enable Apify Proxy if you hit IP rate limits at very high volume.

Troubleshooting

  • Getting a BAD_INPUT row? Provide a non-empty query, and if you set mediaType make sure it's one of the allowed values.
  • NO_RESULTS? The query matched nothing on archive.org. Broaden the keywords or remove the media-type filter.
  • Want fewer/more results? Adjust maxItems. The archive can return very large result sets for broad queries.

Example

{"query":"jazz","mediaType":"audio","sort":"downloads","maxItems":50}

Notes

The actor calls advancedsearch.php with output=json, requesting identifier, title, creator, year, date, mediatype, downloads, description, subject, and publicdate, then maps each doc to a clean row. Pagination uses the page parameter with 100 rows per request until your maxItems is reached or the numFound total is exhausted.
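
The request described above might be assembled roughly as follows. This Python sketch is an approximation, not the actor's code: the way mediaType is folded into the query, the descending sort directions, and the helper names build_search_url and pages_to_fetch are assumptions. The endpoint, output=json, the field list, and the 100-rows-per-page pagination come from the notes above; fl[], sort[], rows, and page are the parameter names used by the public advancedsearch.php endpoint.

```python
import math
from urllib.parse import urlencode

BASE_URL = "https://archive.org/advancedsearch.php"
FIELDS = ["identifier", "title", "creator", "year", "date", "mediatype",
          "downloads", "description", "subject", "publicdate"]
ROWS_PER_PAGE = 100

# Assumption: descending order for each sort key; relevance sends no sort param.
SORT_PARAMS = {
    "downloads": "downloads desc",
    "date": "date desc",
    "publicdate": "publicdate desc",
    "relevance": None,
}


def build_search_url(query, media_type="", sort="downloads", page=1):
    """Build one advancedsearch.php request URL for the given page."""
    # Assumption: the media-type filter is ANDed onto the user's query.
    q = f"({query}) AND mediatype:{media_type}" if media_type else query
    params = [("q", q)]
    params += [("fl[]", field) for field in FIELDS]
    sort_value = SORT_PARAMS.get(sort)
    if sort_value:
        params.append(("sort[]", sort_value))
    params += [("rows", ROWS_PER_PAGE), ("page", page), ("output", "json")]
    return f"{BASE_URL}?{urlencode(params)}"


def pages_to_fetch(max_items, num_found):
    """Number of requests needed until maxItems or numFound is exhausted."""
    return math.ceil(min(max_items, num_found) / ROWS_PER_PAGE)
```

With maxItems set to 250 and a broad query, this plan issues three requests; the last page is trimmed to the remaining 50 items by the caller.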

You might also like

Internet Archive Items Scraper - archive.org Search by Query

gio21/archive-org-items-scraper

Search Internet Archive (archive.org) items: books, movies, audio, software, images, web archives, data. Returns title, creator, date, description, downloads, identifier, URLs. Free, no key. For research, content discovery, digital preservation.

Internet Archive Search Scraper

crawlerbros/internet-archive-search-scraper

Searches and retrieves items from the Internet Archive (archive.org) - 44M+ books, videos, audio, software, and web archives. Free, no API key required.

Internet Archive Search Scraper

parseforge/internet-archive-search-scraper

Search the Internet Archive's 50M+ item catalog of texts, audio, movies, software, web pages, and images. Filter by collection, media type, creator, and date. Pull identifiers, titles, descriptions, downloads, and rich metadata.

Internet Archive Search - Wayback Machine Advanced Query Tool

maged120/archive-org-advanced-search

Search the Internet Archive (archive.org) with full advanced filter support: date range, media type, language, subject, and more. Returns metadata from archived web pages, books, audio, and video.

Internet Archive Scraper

automation-lab/internet-archive-scraper

Search and extract metadata from the Internet Archive. Find books, videos, audio, software, and more from 40M+ items.

๐Ÿ‘ User avatar

Stas Persiianenko

22

Internet Archive Book Reviews Scraper

thescrapelab/internet-archive-book-reviews-scraper

Extract public Archive.org book metadata, ISBNs, ratings, and user reviews from public Internet Archive endpoints. Start from URLs, identifiers, ISBNs, creators, collections, subjects, or search queries. Output is always one dataset row per public review. No API key required.

Wayback Machine Snapshots Scraper - Internet Archive History

seemuapps/wayback-machine-snapshots-scraper

List every Internet Archive snapshot of a URL, page, or whole domain. Timestamp, snapshot URL, status code, mime type, content length. No login.