Internet Archive Search Scraper

Pricing

from $3.00 / 1,000 results

Searches and retrieves items from the Internet Archive (archive.org), which hosts 44M+ items including books, videos, audio, software, and web archives. It uses the free public APIs, so no API key is required.

Rating

0.0

(0)

Developer

Crawler Bros

Maintained by Community

Last modified

8 days ago

What does this actor do?

This actor lets you:

Search the entire Internet Archive by keyword with filters for media type, collection, language, date range, and sort order.
Fetch specific items by their unique Archive.org identifiers and get enriched metadata, including file counts and item sizes.

Data Source

All data is retrieved from the Internet Archive's public APIs:

Advanced Search API: https://archive.org/advancedsearch.php (free, no authentication required).
Metadata API: https://archive.org/metadata/{identifier} (free, no authentication required).
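To make the Data Source section concrete, here is a minimal Python sketch of a single Advanced Search request. It assumes the endpoint's commonly used query parameters (`q`, a repeated `fl[]` for each returned field, `sort[]`, `rows`, `page`, `output=json`) and that matching items sit under `response.docs` in the JSON reply. The helper names `build_search_url` and `fetch_search_page` are hypothetical and are not taken from this actor's code.

```python
import json
from typing import Optional, Sequence
from urllib.parse import urlencode
from urllib.request import urlopen

ADVANCED_SEARCH_URL = "https://archive.org/advancedsearch.php"

# Fields to request; chosen to mirror the Output table (an assumption, not the actor's own list).
DEFAULT_FIELDS = ("identifier", "title", "description", "creator", "date",
                  "mediatype", "collection", "language", "subject", "format", "downloads")


def build_search_url(q: str, rows: int = 50, page: int = 1,
                     sort: Optional[str] = None,
                     fields: Sequence[str] = DEFAULT_FIELDS) -> str:
    """Build an Advanced Search URL that asks for JSON output."""
    params = [("q", q)]
    params += [("fl[]", field) for field in fields]  # one fl[] entry per returned field
    if sort:
        params.append(("sort[]", sort))  # e.g. "downloads desc"
    params += [("rows", str(rows)), ("page", str(page)), ("output", "json")]
    return f"{ADVANCED_SEARCH_URL}?{urlencode(params)}"


def fetch_search_page(q: str, rows: int = 50, page: int = 1,
                      sort: Optional[str] = None) -> list:
    """Fetch one page of results; matching items are listed under response.docs."""
    with urlopen(build_search_url(q, rows, page, sort), timeout=30) as resp:
        payload = json.load(resp)
    return payload["response"]["docs"]
```

For example, `fetch_search_page("shakespeare AND mediatype:texts", rows=25)` would return up to 25 result documents (network access required). Building the URL separately from fetching it keeps the request logic easy to inspect and test offline.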

Input

Field	Type	Description
`mode`	Select	`search` (default) or `byIdentifiers`
`query`	String	Search keywords (e.g. "public domain books", "jazz music")
`mediaType`	Select	Filter by type: texts, audio, movies, software, image, etree, data, web, collection, account
`collection`	String	Filter by collection slug (e.g. "gutenbergbooks", "librivoxaudio", "prelinger")
`language`	String	Filter by language code (e.g. "eng", "fra", "spa")
`dateFrom`	String	Start date filter (YYYY or YYYY-MM-DD)
`dateTo`	String	End date filter (YYYY or YYYY-MM-DD)
`sortBy`	Select	Sort order: most downloaded, newest, oldest, or alphabetical
`identifiers`	Array	Specific Archive.org identifiers (for byIdentifiers mode)
`maxItems`	Integer	Max items to return (default: 50, max: 5000)
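The filters in the table above map naturally onto Advanced Search's Lucene-style field queries. The sketch below shows one plausible mapping. The `build_query` and `expand_date` helpers, and the choice to widen a bare `YYYY` to the whole year, are assumptions for illustration rather than this actor's documented behavior.

```python
from typing import Optional


def expand_date(value: str, end: bool) -> str:
    """Widen YYYY to a full day (Jan 1 or Dec 31); pass YYYY-MM-DD through unchanged."""
    if len(value) == 4:
        return f"{value}-12-31" if end else f"{value}-01-01"
    return value


def build_query(query: str = "", media_type: Optional[str] = None,
                collection: Optional[str] = None, language: Optional[str] = None,
                date_from: Optional[str] = None, date_to: Optional[str] = None) -> str:
    """Combine keywords and filters into one field query joined with AND (hypothetical mapping)."""
    clauses = []
    if query:
        clauses.append(f"({query})")  # group the free-text keywords
    if media_type:
        clauses.append(f"mediatype:{media_type}")
    if collection:
        clauses.append(f"collection:{collection}")
    if language:
        clauses.append(f"language:{language}")
    if date_from or date_to:
        start = expand_date(date_from, end=False) if date_from else "*"  # open-ended range
        stop = expand_date(date_to, end=True) if date_to else "*"
        clauses.append(f"date:[{start} TO {stop}]")
    if not clauses:
        raise ValueError("at least one search term or filter is required")
    return " AND ".join(clauses)
```

With the blues example input, `build_query("blues music", media_type="audio", date_from="1920", date_to="1960")` yields `(blues music) AND mediatype:audio AND date:[1920-01-01 TO 1960-12-31]`.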

Example Inputs

Search for classic literature texts:

```json
{
  "mode": "search",
  "query": "shakespeare",
  "mediaType": "texts",
  "language": "eng",
  "maxItems": 25
}
```

Fetch specific items by identifier:

```json
{
  "mode": "byIdentifiers",
  "identifiers": ["gutenberg-hamlet", "adventures_of_huckleberry_finn_librivox"],
  "maxItems": 10
}
```

Search for audio recordings in a date range:

```json
{
  "mode": "search",
  "query": "blues music",
  "mediaType": "audio",
  "dateFrom": "1920",
  "dateTo": "1960",
  "sortBy": "-publicdate",
  "maxItems": 100
}
```
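Before a run, an input like the ones above can be checked against the rules in the Input table: two modes, `maxItems` defaulting to 50 with a 5,000 cap, and identifiers required in `byIdentifiers` mode. The hypothetical `normalize_input` helper below is only a sketch. In particular, whether the actor clamps or rejects a `maxItems` above 5,000 is not documented; clamping is an assumption here.

```python
VALID_MODES = {"search", "byIdentifiers"}
DEFAULT_MAX_ITEMS = 50
MAX_ITEMS_LIMIT = 5000


def normalize_input(raw: dict) -> dict:
    """Apply the documented defaults and reject inputs that cannot run (hypothetical helper)."""
    cfg = dict(raw)  # copy so the caller's dict is not mutated
    cfg.setdefault("mode", "search")
    if cfg["mode"] not in VALID_MODES:
        raise ValueError(f"unknown mode: {cfg['mode']!r}")
    max_items = int(cfg.get("maxItems", DEFAULT_MAX_ITEMS))
    if max_items < 1:
        raise ValueError("maxItems must be at least 1")
    cfg["maxItems"] = min(max_items, MAX_ITEMS_LIMIT)  # assumed: clamp rather than reject
    if cfg["mode"] == "byIdentifiers" and not cfg.get("identifiers"):
        raise ValueError("byIdentifiers mode needs a non-empty identifiers list")
    return cfg
```

Passing the second example input through `normalize_input` returns it unchanged, while `{"query": "shakespeare"}` gains `"mode": "search"` and `"maxItems": 50`.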

Output

Each item in the dataset contains:

Field	Description
`identifier`	Unique Archive.org identifier
`url`	Direct URL to the item page (archive.org/details/{identifier})
`title`	Item title
`description`	Item description
`creator`	Author or creator
`date`	Creation or publication date
`mediatype`	Type of media (texts, audio, movies, etc.)
`collection`	Collection it belongs to
`language`	Language code(s)
`subject`	Subject tags (up to 10)
`format`	File format(s) (up to 5)
`downloads`	Total download count
`files_count`	Number of files in the item (byIdentifiers mode)
`item_size`	Total size in bytes (byIdentifiers mode)
`server`	Serving server hostname (byIdentifiers mode)
`scrapedAt`	ISO 8601 timestamp of when data was scraped
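Advanced Search typically returns multi-valued fields such as `subject` and `format` as a plain string when there is a single value and as a list otherwise. Here is a minimal sketch of turning one search document into the output shape above, assuming that behavior and applying the documented caps (up to 10 subjects, up to 5 formats). `to_record` and `as_list` are hypothetical names.

```python
from datetime import datetime, timezone
from typing import Any


def as_list(value: Any) -> list:
    """Normalize a field that may be missing, a single string, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def to_record(doc: dict) -> dict:
    """Map one Advanced Search document to the dataset fields listed above."""
    identifier = doc["identifier"]
    return {
        "identifier": identifier,
        "url": f"https://archive.org/details/{identifier}",
        "title": doc.get("title"),
        "description": doc.get("description"),
        "creator": doc.get("creator"),
        "date": doc.get("date"),
        "mediatype": doc.get("mediatype"),
        "collection": doc.get("collection"),
        "language": doc.get("language"),
        "subject": as_list(doc.get("subject"))[:10],  # keep at most 10 subject tags
        "format": as_list(doc.get("format"))[:5],     # keep at most 5 formats
        "downloads": doc.get("downloads"),
        # UTC timestamp such as 2026-01-15T10:30:00+00:00
        "scrapedAt": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }
```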

Example Output

```json
{
  "identifier": "gutenberg-hamlet",
  "url": "https://archive.org/details/gutenberg-hamlet",
  "title": "Hamlet",
  "description": "A classic tragedy by William Shakespeare",
  "creator": "William Shakespeare",
  "date": "1603",
  "mediatype": "texts",
  "collection": "gutenbergbooks",
  "language": "eng",
  "subject": ["drama", "tragedy", "Shakespeare"],
  "format": ["PDF", "EPUB", "Plain Text"],
  "downloads": 85432,
  "scrapedAt": "2026-01-15T10:30:00+00:00"
}
```
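In `byIdentifiers` mode, the extra fields `files_count`, `item_size`, and `server` come from the Metadata API. The sketch below assumes those keys appear at the top level of the `/metadata/{identifier}` JSON response, with `files` as a list of file entries that serves as a fallback count. `fetch_metadata` and `enrich` are hypothetical helpers, not this actor's code.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

METADATA_URL = "https://archive.org/metadata/{identifier}"


def fetch_metadata(identifier: str) -> dict:
    """Download the full Metadata API response for one item (network access required)."""
    url = METADATA_URL.format(identifier=quote(identifier, safe=""))
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def enrich(record: dict, meta: dict) -> dict:
    """Copy the byIdentifiers-only fields from a Metadata API response into a record."""
    files = meta.get("files", [])
    enriched = dict(record)
    # Prefer the reported count; otherwise count the file entries ourselves (assumed fallback).
    enriched["files_count"] = meta.get("files_count", len(files))
    enriched["item_size"] = meta.get("item_size")  # total size in bytes
    enriched["server"] = meta.get("server")        # serving host name
    return enriched
```

Keeping `enrich` separate from the network call means the mapping can be checked against a saved response before querying live items.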

Frequently Asked Questions

Is this free to use? The Internet Archive's APIs are free and require no authentication, but this actor itself is priced from $3.00 per 1,000 results.

How many items can I retrieve? Up to 5,000 items per run using the maxItems parameter.

What media types are available? Texts (books), Audio, Movies/Video, Software, Images, Live Music (etree), Data sets, Web Archives, and Collections.

Can I filter by collection? Yes. Use the `collection` field with a collection slug (e.g. "gutenbergbooks" for Project Gutenberg books, "librivoxaudio" for LibriVox audiobooks, "prelinger" for Prelinger Archives films).

Can I search in specific languages? Yes. Use ISO 639-3 language codes like "eng" (English), "fra" (French), "spa" (Spanish), or "deu" (German).

What are identifiers? Every Internet Archive item has a unique identifier (e.g. "gutenberg-hamlet"). You can find these in Archive.org URLs: archive.org/details/{identifier}.
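Because the identifier is the path segment after `/details/`, it can be pulled out of a copied item URL programmatically. The `identifier_from_url` helper below is a hypothetical sketch; it deliberately ignores anything after the identifier, such as viewer paths like `/page/n5` or query strings.

```python
from urllib.parse import urlparse


def identifier_from_url(url: str) -> str:
    """Return the {identifier} segment of an archive.org/details/{identifier} URL."""
    parsed = urlparse(url if "://" in url else f"https://{url}")  # tolerate missing scheme
    host = parsed.netloc.lower()
    parts = [p for p in parsed.path.split("/") if p]
    on_archive = host == "archive.org" or host.endswith(".archive.org")
    if not on_archive or len(parts) < 2 or parts[0] != "details":
        raise ValueError(f"not an archive.org item URL: {url}")
    return parts[1]
```

The result can go straight into the `identifiers` array for `byIdentifiers` mode.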

How does the actor handle rate limiting? It waits 0.3 s between search result pages and 0.5 s between metadata requests to stay within the API's usage guidelines.
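The throttling described above can be sketched as a paginated loop that sleeps between page requests and stops at `maxItems` or at the first short page. The page size of 100 and the injected `fetch_page(page, rows)` callable are assumptions for illustration; `paginate` is a hypothetical name. The same pattern with `delay=0.5` would fit per-item Metadata API requests.

```python
import math
import time
from typing import Callable, Iterator, List


def paginate(fetch_page: Callable[[int, int], List[dict]], max_items: int,
             rows: int = 100, delay: float = 0.3) -> Iterator[dict]:
    """Yield up to max_items documents, sleeping `delay` seconds between page requests."""
    pages = math.ceil(max_items / rows)
    yielded = 0
    for page in range(1, pages + 1):
        if page > 1:
            time.sleep(delay)  # pause between search pages, not before the first
        docs = fetch_page(page, rows)
        for doc in docs:
            if yielded == max_items:
                return
            yield doc
            yielded += 1
        if len(docs) < rows:
            return  # a short page means the result set is exhausted
```

Injecting `fetch_page` (for instance a wrapper around an Advanced Search request) keeps the throttling logic independent of the HTTP client and easy to test with a fake.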

Use Cases

Building digital library catalogs
Research on public domain content
Finding historical audio/video recordings
Locating old software for preservation research
Downloading metadata for academic research
Tracking download statistics for archive items

Internet Archive Items Scraper - archive.org Search by Query

gio21/archive-org-items-scraper

Search Internet Archive (archive.org) items: books, movies, audio, software, images, web archives, data. Returns title, creator, date, description, downloads, identifier, URLs. Free, no key. For research, content discovery, digital preservation.

Gio

Internet Archive Scraper

automation-lab/internet-archive-scraper

Search and extract metadata from the Internet Archive. Find books, videos, audio, software, and more from 40M+ items.

Stas Persiianenko

Internet Archive Scraper

fortuitous_pirate/internet-archive-scraper

Search the Internet Archive's 35+ million items: books, movies, audio, software, and web pages. Filter by media type, subject, creator, language, or date range. Free API.

Fortuitous Pirate

Internet Archive Search — Wayback Machine Advanced Query Tool

maged120/archive-org-advanced-search

Search the Internet Archive (archive.org) with full advanced filter support — date range, media type, language, subject, and more. Returns metadata from archived web pages, books, audio, and video.

Maged

Archive.org Scraper

lulzasaur/archive-org-scraper

Scrape the Internet Archive (archive.org). Search 50M+ texts, 13M+ audio, 16M+ movies, and 1.3M+ software items. Get metadata, download counts, file lists, and more via public APIs.

lulz bot

Internet Archive Search Scraper

parseforge/internet-archive-search-scraper

Search the Internet Archive's 50M+ item catalog of texts, audio, movies, software, web pages, and images. Filter by collection, media type, creator, and date. Pull identifiers, titles, descriptions, downloads, and rich metadata.

ParseForge

Internet Archive & Wayback Machine Scraper

cloud9_ai/internet-archive-scraper

Search Internet Archive and check Wayback Machine snapshots. Access 800B+ archived pages, books, movies, audio. Search items, get metadata, or check URL archive history. No API key needed. For SEO, OSINT, legal, and research.

cloud9

Internet Archive Book Reviews Scraper

thescrapelab/internet-archive-book-reviews-scraper

Extract public Archive.org book metadata, ISBNs, ratings, and user reviews from public Internet Archive endpoints. Start from URLs, identifiers, ISBNs, creators, collections, subjects, or search queries. Output is always one dataset row per public review. No API key required.

Inus Grobler
