Pricing
from $3.00 / 1,000 results
Internet Archive Search Scraper
Searches and retrieves items from the Internet Archive (archive.org) - 44M+ books, videos, audio, software, and web archives. Free, no API key required.
Search and retrieve items from the Internet Archive (archive.org), the world's largest digital library with 44M+ books, videos, audio recordings, software, and web archives. Free, no API key required.
What does this actor do?
This actor lets you:
- Search the entire Internet Archive by keyword with filters for media type, collection, language, date range, and sort order.
- Fetch specific items by their unique Archive.org identifiers, getting enriched metadata including file counts and item sizes.
Data Source
All data is retrieved from the Internet Archive public API:
- Advanced Search API: https://archive.org/advancedsearch.php (free, no authentication required)
- Metadata API: https://archive.org/metadata/{identifier} (free, no authentication required)
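As a rough illustration of how these two endpoints can be addressed directly, the sketch below builds request URLs with the Python standard library only. The query parameters (`q`, `fl[]`, `rows`, `page`, `output`) are the ones the public Advanced Search form exposes; the helper names and the default field list are assumptions made for this example, not the actor's actual code.

```python
from urllib.parse import urlencode, quote

SEARCH_ENDPOINT = "https://archive.org/advancedsearch.php"
METADATA_ENDPOINT = "https://archive.org/metadata"


def build_search_url(query, fields=("identifier", "title"), rows=50, page=1):
    """Return an Advanced Search URL that asks for JSON output.

    Hypothetical helper: the field list and defaults are illustrative.
    """
    params = [("q", query)]
    # Each requested field is sent as a repeated fl[] parameter.
    params += [("fl[]", field) for field in fields]
    params += [("rows", rows), ("page", page), ("output", "json")]
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def build_metadata_url(identifier):
    """Return the Metadata API URL for a single item identifier."""
    return f"{METADATA_ENDPOINT}/{quote(identifier, safe='')}"


print(build_search_url("shakespeare AND mediatype:texts", rows=25))
print(build_metadata_url("gutenberg-hamlet"))
```

Either URL can then be fetched with any HTTP client; no API key or authentication header is needed.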
Input
| Field | Type | Description |
|---|---|---|
| mode | Select | search (default) or byIdentifiers |
| query | String | Search keywords (e.g. "public domain books", "jazz music") |
| mediaType | Select | Filter by type: texts, audio, movies, software, image, etree, data, web, collection, account |
| collection | String | Filter by collection slug (e.g. "gutenbergbooks", "librivoxaudio", "prelinger") |
| language | String | Filter by language code (e.g. "eng", "fra", "spa") |
| dateFrom | String | Start date filter (YYYY or YYYY-MM-DD) |
| dateTo | String | End date filter (YYYY or YYYY-MM-DD) |
| sortBy | Select | Sort order: most downloaded, newest, oldest, or alphabetical |
| identifiers | Array | Specific Archive.org identifiers (for byIdentifiers mode) |
| maxItems | Integer | Max items to return (default: 50, max: 5000) |
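The defaults and limits in this table can be checked before a run is started. The following is a minimal sketch of such a pre-flight check; `normalize_input` is a hypothetical helper, and clamping `maxItems` into the range 1 to 5000 (rather than rejecting out-of-range values) is an assumption, since the actor's own behaviour here is not documented.

```python
ALLOWED_MODES = {"search", "byIdentifiers"}
ALLOWED_MEDIA_TYPES = {
    "texts", "audio", "movies", "software", "image",
    "etree", "data", "web", "collection", "account",
}
DEFAULT_MAX_ITEMS = 50
MAX_ITEMS_LIMIT = 5000


def normalize_input(raw):
    """Apply the documented defaults and limits to a raw input dict."""
    mode = raw.get("mode", "search")
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unknown mode: {mode!r}")

    media_type = raw.get("mediaType")
    if media_type is not None and media_type not in ALLOWED_MEDIA_TYPES:
        raise ValueError(f"unknown mediaType: {media_type!r}")

    if mode == "byIdentifiers" and not raw.get("identifiers"):
        raise ValueError("byIdentifiers mode needs at least one identifier")

    # Assumption: out-of-range values are clamped instead of rejected.
    max_items = int(raw.get("maxItems", DEFAULT_MAX_ITEMS))
    max_items = max(1, min(max_items, MAX_ITEMS_LIMIT))

    return {**raw, "mode": mode, "maxItems": max_items}


print(normalize_input({"query": "jazz music"}))
```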
Example Inputs
Search for classic literature texts:
{"mode":"search","query":"shakespeare","mediaType":"texts","language":"eng","maxItems":25}
Fetch specific items by identifier:
{"mode":"byIdentifiers","identifiers":["gutenberg-hamlet","adventures_of_huckleberry_finn_librivox"],"maxItems":10}
Search for audio recordings in a date range:
{"mode":"search","query":"blues music","mediaType":"audio","dateFrom":"1920","dateTo":"1960","sortBy":"-publicdate","maxItems":100}
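To make the filter fields more concrete, here is one way search inputs like those above might be folded into a single Lucene-style query string for the Advanced Search API. How the actor actually composes its query is not documented, so `build_query`, the bare-year padding, and the `*` wildcard for an open-ended date range are all assumptions of this sketch.

```python
def expand_date(value, end=False):
    """Pad a bare year (YYYY) to a full date; pass YYYY-MM-DD through."""
    if len(value) == 4:
        return f"{value}-12-31" if end else f"{value}-01-01"
    return value


def build_query(params):
    """Combine the search filters into one Lucene-style query string."""
    clauses = []
    if params.get("query"):
        clauses.append(f"({params['query']})")
    # Map input field names to the Archive.org metadata field names.
    for input_key, field in (("mediaType", "mediatype"),
                             ("collection", "collection"),
                             ("language", "language")):
        if params.get(input_key):
            clauses.append(f"{field}:{params[input_key]}")
    date_from, date_to = params.get("dateFrom"), params.get("dateTo")
    if date_from or date_to:
        start = expand_date(date_from) if date_from else "*"
        stop = expand_date(date_to, end=True) if date_to else "*"
        clauses.append(f"date:[{start} TO {stop}]")
    return " AND ".join(clauses)


print(build_query({
    "query": "blues music", "mediaType": "audio",
    "dateFrom": "1920", "dateTo": "1960",
}))
# prints: (blues music) AND mediatype:audio AND date:[1920-01-01 TO 1960-12-31]
```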
Output
Each item in the dataset contains:
| Field | Description |
|---|---|
| identifier | Unique Archive.org identifier |
| url | Direct URL to the item page (archive.org/details/{identifier}) |
| title | Item title |
| description | Item description |
| creator | Author or creator |
| date | Creation or publication date |
| mediatype | Type of media (texts, audio, movies, etc.) |
| collection | Collection it belongs to |
| language | Language code(s) |
| subject | Subject tags (up to 10) |
| format | File format(s) (up to 5) |
| downloads | Total download count |
| files_count | Number of files in the item (byIdentifiers mode) |
| item_size | Total size in bytes (byIdentifiers mode) |
| server | Serving server hostname (byIdentifiers mode) |
| scrapedAt | ISO 8601 timestamp of when data was scraped |
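The sketch below shows how a raw search document could be shaped into a record with some of the fields listed above: the `url` derived from the identifier, `subject` capped at 10 entries, `format` capped at 5, and a UTC `scrapedAt` timestamp. `to_record` and `as_list` are hypothetical helpers written for illustration, and only a subset of the fields is covered.

```python
from datetime import datetime, timezone


def as_list(value):
    """Wrap a scalar in a list; copy lists and tuples; map None to []."""
    if value is None:
        return []
    return list(value) if isinstance(value, (list, tuple)) else [value]


def to_record(doc, now=None):
    """Shape one raw search document into a subset of the output fields."""
    now = now or datetime.now(timezone.utc)
    identifier = doc["identifier"]
    return {
        "identifier": identifier,
        "url": f"https://archive.org/details/{identifier}",
        "title": doc.get("title"),
        "creator": doc.get("creator"),
        "mediatype": doc.get("mediatype"),
        "subject": as_list(doc.get("subject"))[:10],  # at most 10 tags
        "format": as_list(doc.get("format"))[:5],     # at most 5 formats
        "downloads": int(doc.get("downloads") or 0),
        "scrapedAt": now.isoformat(timespec="seconds"),
    }


print(to_record({"identifier": "gutenberg-hamlet", "title": "Hamlet"}))
```

Passing `now` explicitly keeps the function deterministic, which is convenient when comparing records in tests.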
Example Output
{"identifier":"gutenberg-hamlet","url":"https://archive.org/details/gutenberg-hamlet","title":"Hamlet","description":"A classic tragedy by William Shakespeare","creator":"William Shakespeare","date":"1603","mediatype":"texts","collection":"gutenbergbooks","language":"eng","subject":["drama","tragedy","Shakespeare"],"format":["PDF","EPUB","Plain Text"],"downloads":85432,"scrapedAt":"2026-01-15T10:30:00+00:00"}
Frequently Asked Questions
Is this free to use?
Yes. The Internet Archive provides a completely free public API with no authentication required.
How many items can I retrieve?
Up to 5,000 items per run using the maxItems parameter.
What media types are available?
Texts (books), Audio, Movies/Video, Software, Images, Live Music (etree), Data sets, Web Archives, and Collections.
Can I filter by collection?
Yes. Use the collection field with a collection slug (e.g. "gutenbergbooks" for Project Gutenberg books, "librivoxaudio" for LibriVox audiobooks, "prelinger" for Prelinger Archives films).
Can I search in specific languages?
Yes. Use ISO 639-3 language codes like "eng" (English), "fra" (French), "spa" (Spanish), "deu" (German).
What are identifiers?
Every Internet Archive item has a unique identifier (e.g. "gutenberg-hamlet"). You can find these in Archive.org URLs: archive.org/details/{identifier}.
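If you have item URLs rather than identifiers, the identifier can be pulled out of the path before passing it to byIdentifiers mode. This is a small sketch using the standard library; `identifier_from_url` is a hypothetical helper, and it only recognises the archive.org/details/{identifier} pattern described above.

```python
from urllib.parse import urlparse


def identifier_from_url(url):
    """Return the identifier from an archive.org/details/... URL, or None."""
    if "://" not in url:
        # Allow scheme-less input such as archive.org/details/some-item.
        url = "https://" + url
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host != "archive.org" and not host.endswith(".archive.org"):
        return None
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) >= 2 and parts[0] == "details":
        return parts[1]
    return None


print(identifier_from_url("https://archive.org/details/gutenberg-hamlet"))
```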
How are requests rate-limited?
The actor waits 0.3 s between search pages and 0.5 s between metadata requests to respect the API's guidelines.
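A paced pagination loop of this kind might look roughly like the sketch below. It is not the actor's implementation: `collect_items`, the page size of 100, and the injected `fetch_page` callback are assumptions, chosen so the loop can be exercised without touching the network. Only the 0.3-second pause comes from the description above.

```python
import time

SEARCH_PAGE_DELAY = 0.3  # seconds between search pages


def collect_items(fetch_page, max_items, page_size=100,
                  delay=SEARCH_PAGE_DELAY, sleep=time.sleep):
    """Pull pages via fetch_page(page, rows) until max_items or no results."""
    items = []
    page = 1
    while len(items) < max_items:
        docs = fetch_page(page, page_size)
        if not docs:
            break  # no more results
        # Take only as many documents as are still needed.
        items.extend(docs[: max_items - len(items)])
        if len(docs) < page_size or len(items) >= max_items:
            break  # last page reached or limit hit, so no trailing pause
        sleep(delay)
        page += 1
    return items


# Usage with an in-memory stand-in for the search endpoint.
fake_results = list(range(250))
fake_fetch = lambda page, rows: fake_results[(page - 1) * rows: page * rows]
print(len(collect_items(fake_fetch, 220, sleep=lambda s: None)))
```

Injecting `sleep` makes the pacing observable in tests and lets callers swap in an async-friendly or no-op pause.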
Use Cases
- Building digital library catalogs
- Research on public domain content
- Finding historical audio/video recordings
- Locating old software for preservation research
- Downloading metadata for academic research
- Tracking download statistics for archive items
