OpenLibrary Books - Metadata, ISBNs, Authors, CSV, No API Key
Pricing
Pay per usage
19 runs. OpenLibrary metadata as CSV/JSON: titles, authors, ISBNs, subjects, languages, pageCount, coverUrl, ebookAccess, ratings. Look up by query, ISBN, subject, or author. For library cataloguing, book-recommendation engines, and academic research. No API key. Backed by 951-run Trustpilot flagship + 31-actor portfolio.
OpenLibrary Book Scraper - Metadata, ISBNs, Authors, Subjects
Scrape book metadata from the free OpenLibrary API. No API key, no rate-limit token, no auth wall. Four input modes: search queries, ISBN lookups, subject browse, author works. Output JSON or CSV.
Built for: library data builds, reading-list automation, ISBN enrichment, book recommendation datasets, academic citation enrichment.
What this actor does (honest scope, verified against src/main.js)
Calls these public OpenLibrary endpoints under the hood:
| Input field | Endpoint hit | Returns |
|---|---|---|
| searchQueries | /search.json?q=…&page=N&limit=50 | 50 docs/page, paginated until maxBooksPerSource is reached or numFound <= collected |
| isbns | /isbn/{isbn}.json | One book per ISBN |
| subjects | /subjects/{slug}.json?limit=min(maxBooksPerSource, 50) | Hard-capped at 50: even if you set maxBooksPerSource=200, subject browse returns at most 50 |
| authors | /search/authors.json?limit=1 + /authors/{key}/works.json?limit=min(maxBooksPerSource, 50) | First author match only (no disambiguation), then up to 50 works |
Sets User-Agent: ApifyOpenLibraryScraper/1.0. Inserts polite delays between requests: 200ms after each work-description fetch, 300ms before each ISBN/subject/author lookup, 500ms between search-mode pages. If includeDescription=true (default), search-mode and isbn-mode fire one extra /works/{key}.json request per book to pull the description text, which is slower but richer. Subject-mode and author-mode never fetch the work-description endpoint; they read whatever description is already in the listing payload.
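To make the endpoint table and the pagination rule concrete, here is a minimal sketch of how the request URLs could be assembled. It is not taken from src/main.js: the function names, the `https://openlibrary.org` base URL, and the `q=` parameter on the author search are assumptions for illustration. Only the paths, the `limit` values, and the stop condition come from the description above.

```javascript
// Hypothetical helpers. Paths and the 50-item caps follow the table above;
// the base URL and the author-search "q" parameter are assumptions.
const BASE_URL = "https://openlibrary.org";
const PAGE_SIZE = 50;

function searchUrl(query, page) {
  return `${BASE_URL}/search.json?q=${encodeURIComponent(query)}&page=${page}&limit=${PAGE_SIZE}`;
}

function isbnUrl(isbn) {
  return `${BASE_URL}/isbn/${encodeURIComponent(isbn)}.json`;
}

function subjectUrl(slug, maxBooksPerSource) {
  // The limit never exceeds 50, whatever maxBooksPerSource says.
  const limit = Math.min(maxBooksPerSource, PAGE_SIZE);
  return `${BASE_URL}/subjects/${slug}.json?limit=${limit}`;
}

function authorSearchUrl(name) {
  // limit=1 means the first match wins, with no disambiguation.
  return `${BASE_URL}/search/authors.json?q=${encodeURIComponent(name)}&limit=1`;
}

function authorWorksUrl(authorKey, maxBooksPerSource) {
  const limit = Math.min(maxBooksPerSource, PAGE_SIZE);
  return `${BASE_URL}/authors/${authorKey}/works.json?limit=${limit}`;
}

// Search-mode keeps paging until the cap is reached or numFound <= collected.
function shouldFetchNextPage(collected, numFound, maxBooksPerSource) {
  return collected < maxBooksPerSource && collected < numFound;
}
```

With maxBooksPerSource=200, `subjectUrl("science_fiction", 200)` still requests `limit=50`, which is why subject browse tops out at 50 records.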
Input parameters
| Field | Type | Default | Description |
|---|---|---|---|
| searchQueries | array of strings | [] | Free-text search (title/keyword/phrase) |
| isbns | array of strings | [] | ISBN-10 or ISBN-13 lookups |
| subjects | array of strings | [] | Subject names, auto-lowercased with spaces replaced by underscores (e.g. "Science Fiction" → slug science_fiction). Special characters are NOT escaped beyond URL-encoding, so exotic subject names may 404. |
| authors | array of strings | [] | Author names (e.g. "Isaac Asimov"). Only the first match is taken (limit=1), so common-name authors may resolve to a different person than expected. Use the OpenLibrary author-key directly via a custom build if disambiguation matters. |
| maxBooksPerSource | integer | 50 | Cap per query/ISBN/subject/author (schema allows 1-200, but subjects and authors are server-capped at 50 regardless) |
| includeDescription | boolean | true | Fetch full description (extra API call per book in search-mode and isbn-mode only) |
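The subject slug rule in the table can be sketched as follows. This is an approximation of the described behavior (lowercase, spaces to underscores, then plain URL-encoding), not the actor's actual code; `subjectSlug` is a hypothetical name.

```javascript
// Hypothetical re-implementation of the described transform:
// lowercase, replace each space with "_", then URL-encode what is left.
function subjectSlug(name) {
  return encodeURIComponent(name.toLowerCase().replace(/ /g, "_"));
}

console.log(subjectSlug("Science Fiction"));
console.log(subjectSlug("René Magritte's books"));
```

The first call yields `science_fiction`. The second yields `ren%C3%A9_magritte's_books`: the accent is percent-encoded and the apostrophe is left alone, but nothing is normalized, which is the kind of slug that may not match any OpenLibrary subject.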
You can mix all four modes in a single run. Each output record carries a source field telling you which mode produced it (search:<query>, isbn:<n>, subject:<s>, author:<a>).
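As an illustration, a mixed-mode input and a small helper for reading the source field back out of the results might look like this. The input values are examples only, and `parseSource` and `countByMode` are hypothetical consumer-side helpers, not part of the actor.

```javascript
// Example input mixing all four modes. Field names come from the
// parameter table; the values are illustrative.
const exampleInput = {
  searchQueries: ["foundation"],
  isbns: ["9780553293357"],
  subjects: ["Science Fiction"],
  authors: ["Isaac Asimov"],
  maxBooksPerSource: 50,
  includeDescription: true,
};

// Split "mode:value" on the first colon only, since a query may contain colons.
function parseSource(source) {
  const idx = source.indexOf(":");
  if (idx === -1) return { mode: "unknown", value: source };
  return { mode: source.slice(0, idx), value: source.slice(idx + 1) };
}

// Count records per mode, e.g. to see what each mode contributed to a run.
function countByMode(records) {
  const counts = {};
  for (const record of records) {
    const { mode } = parseSource(record.source ?? "");
    counts[mode] = (counts[mode] ?? 0) + 1;
  }
  return counts;
}
```

Grouping by mode first is useful because, as the next section explains, each mode produces a different field set.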
Output schema (varies by source mode; fields differ deliberately)
Records from different modes carry different field sets. This is by design: OpenLibrary returns richer metadata for search results than for ISBN / subject / author endpoints.
search: mode (22 base fields, +description with includeDescription, +2 metadata = up to 25)
{"title":"Foundation","authors":["Isaac Asimov"],"authorKeys":["OL26320A"],"firstPublishYear":1951,"publishYears":[1951,1952,1955,1962,1974],"isbn":"9780553293357","allIsbns":["9780553293357","9780553382570","..."],"subjects":["Science fiction","Galactic empire","..."],"publishers":["Bantam Spectra","Doubleday","..."],"languages":["eng"],"pageCount":244,"editionCount":142,"coverUrl":"https://covers.openlibrary.org/b/id/9261361-L.jpg","openLibraryKey":"/works/OL46828W","openLibraryUrl":"https://openlibrary.org/works/OL46828W","ebookAccess":"borrowable","hasFulltext":true,"ratingsAverage":4.12,"ratingsCount":1284,"wantToRead":8421,"currentlyReading":412,"alreadyRead":6203,"description":"In the waning days of a future Galactic Empire...","source":"search:foundation","scrapedAt":"2026-04-29T12:00:00.000Z"}
Field caps in search-mode: allIsbns truncated to first 10, subjects truncated to first 20, publishers truncated to first 5. pageCount is number_of_pages_median (median across editions, not the specific-edition page count).
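The caps above amount to simple array slices. A rough sketch, assuming the raw search document uses the OpenLibrary field names `isbn`, `subject`, `publisher`, and `number_of_pages_median` (only the last one is named in this README; the others and the function name `capSearchFields` are assumptions):

```javascript
// Hypothetical mapping from a raw search doc to the capped output fields.
function capSearchFields(doc) {
  return {
    allIsbns: (doc.isbn ?? []).slice(0, 10),       // first 10 ISBNs
    subjects: (doc.subject ?? []).slice(0, 20),    // first 20 subjects
    publishers: (doc.publisher ?? []).slice(0, 5), // first 5 publishers
    // Median across editions, not a specific edition's page count.
    pageCount: doc.number_of_pages_median ?? null,
  };
}
```

If you need the full ISBN or publisher list for a work, the capped arrays are not enough; that data has to come from a separate lookup.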
isbn: mode (10 base fields, +description+subjects if includeDescription=true)
title, isbn, publishers (uncapped), publishDate, pageCount (specific-edition number_of_pages, NOT median), coverUrl, openLibraryKey, openLibraryUrl, source, scrapedAt. With includeDescription=true, adds description and subjects (uncapped). Description fetch is wrapped in a silent try/catch: on failure, both description and subjects are simply absent (no error field, no retry).
subject: mode (10 fields)
title, authors (array of names, a different shape than search-mode's authorKeys), coverUrl, openLibraryKey, openLibraryUrl, editionCount, firstPublishYear, subject, source, scrapedAt. No ratings, no ISBN, no description in this mode; that's an OpenLibrary /subjects/ endpoint limitation, not ours. Server hard-caps to 50 records regardless of maxBooksPerSource.
author: mode (7 base fields, +description if includeDescription=true)
title, authors (1-element array with the resolved author name), authorKey (singular, unlike search-mode's plural authorKeys), openLibraryKey, openLibraryUrl, covers (capped to first 3 cover URLs), source, scrapedAt. With includeDescription=true, adds description IF the author-works payload already contains it (no extra API call, purely best-effort). Server hard-caps to 50 works regardless of maxBooksPerSource.
Field-name asymmetry across modes: search-mode emits authorKeys (plural array) + coverUrl (single URL); author-mode emits authorKey (singular string) + covers (array of up to 3); subject-mode and isbn-mode emit no author-key field, though both list coverUrl in their field sets. If you join across modes, normalize these explicitly.
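One way to normalize before joining is to fold both shapes into a single pair of array fields. This is a consumer-side sketch; `normalizeRecord` and the output field name `coverUrls` are hypothetical choices, not something the actor emits.

```javascript
// Fold authorKey/authorKeys and coverUrl/covers into two array fields
// so records from every mode share one shape.
function normalizeRecord(record) {
  const { authorKey, authorKeys, coverUrl, covers, ...rest } = record;

  const normalizedAuthorKeys = Array.isArray(authorKeys)
    ? authorKeys
    : authorKey
      ? [authorKey]
      : [];

  const coverUrls = coverUrl
    ? [coverUrl]
    : Array.isArray(covers)
      ? covers
      : [];

  return { ...rest, authorKeys: normalizedAuthorKeys, coverUrls };
}
```

Records that carry neither author-key field simply end up with an empty `authorKeys` array, which keeps downstream code free of per-mode branching.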
Operational caveats
- ⚠️ Outer try/catch wraps the entire 4-mode for-loop (src/main.js lines 57-222). ISBN, subject, and author loops have inner try/catch so individual lookup failures don't halt their batch. BUT search-mode does NOT have inner protection: a single search-API failure (e.g. transient HTTP 500, network blip) kills the run mid-stream and skips ALL remaining search queries, ISBN lookups, subject browses, and author lookups. Run problematic queries in isolation if dropout matters.
- No retry / no proxy. Single fetch() per URL. Heavy bursts may eventually trigger OpenLibrary's polite-use ceiling (~100 req/min unofficial); the actor will surface that as a thrown HTTP error.
- Description-fetch silent-empty. When includeDescription=true and the work-page fetch fails, description is set to empty string (search-mode) or absent (isbn-mode); no error is logged per book.
- Subject slug transform is naive. Input "Science Fiction" → slug "science_fiction". Special characters beyond letters/spaces are URL-encoded but not slug-normalized; subjects like "René Magritte's books" will likely 404.
What this actor does NOT do
- No reading-progress / personal-list scraping: OpenLibrary doesn't expose individual users' lists.
- No full-text book content: only metadata + descriptions. Read free books at openlibrary.org or via Internet Archive.
- No price comparison: OpenLibrary is metadata-only, not a bookstore.
- No deduplication across modes: if you search "Foundation" and look up ISBN 9780553293357, you'll get 2 records. Dedupe by openLibraryKey post-run if needed.
- No incremental crawl / cursor state: each run starts fresh from page 1.
- No author disambiguation: first match wins.
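Post-run deduplication can be as small as the sketch below. `dedupeByKey` is a hypothetical helper that defaults to openLibraryKey. One assumption to verify against your own output: isbn-mode reads an edition endpoint, so its openLibraryKey may differ from the work key that search-mode reports for the same book, in which case a different key function (for example one based on ISBN) would be needed.

```javascript
// Keep the first record seen for each key; records without a key are all kept.
function dedupeByKey(records, keyOf = (record) => record.openLibraryKey) {
  const seen = new Set();
  const unique = [];
  for (const record of records) {
    const key = keyOf(record);
    if (key != null) {
      if (seen.has(key)) continue;
      seen.add(key);
    }
    unique.push(record);
  }
  return unique;
}

// Usage: dedupe on ISBN instead of the default key.
// const byIsbn = dedupeByKey(records, (record) => record.isbn);
```

Because the first occurrence wins, sort the records so the richest mode (usually search) comes first before deduplicating.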
When this stops being enough
If you need book full-text, use Internet Archive. If you need real-time bookstore prices, write a separate Amazon/Bookshop scraper. If you need annotated bibliographies, look at Goodreads (no public API since 2020, harder).
Custom builds: pilot tiers
This actor runs on Apify's standard compute. If you need a custom variant (search-mode-only with retry+backoff, ISBN-bulk with deduplication, subject browse paginated past the 50-cap via a search workaround, author-key direct lookup, hourly cron, Slack alerts on new releases), there are three tiers:
- Pilot, $97 · 1 actor, basic config, 7-day support. Good for one-off "top 200 books in subject X" via search + subject hybrid.
- Standard, $297 · custom actor + Slack/email alerts on results, 30-day support. Most reading-list / catalog-enrichment projects fit here.
- Premium, $797 · custom actor + dashboard + 90-day support + 1 modification round. For ongoing pipelines (weekly new-release feed, ISBN-stream enrichment, author-tracking dashboards).
Email: spinov001@gmail.com. Drop the input shape and the schema you need; quote within 48h.
Proof of work: 31 published Apify scrapers (78 total in portfolio), including Trustpilot 949 runs, Reddit 80+, Google News 43, Glassdoor 37, Email Extractor 36+. Recently delivered a paid 3-article series for a client in the proxy industry ($150).
More tips: t.me/scraping_ai · blog.spinov.online
Related scrapers
| Actor | Data | Category |
|---|---|---|
| OpenLibrary (this) | Book metadata + ISBN/subject/author | Bibliographic |
| Wikipedia Scraper | Article + sections + references | Encyclopedic |
| arXiv Paper Scraper | Academic preprints | Research |
| Google Books style (request a custom build via email) | - | - |
All 31 published actors free to inspect on Apify Store.
Disclaimer
Scrapes the publicly accessible OpenLibrary API endpoints. Respects polite delays (200-500ms between requests). Not affiliated with the Internet Archive or OpenLibrary.
Honest disclosure: search-mode 22 base fields (up to 25 with description + 2 metadata fields), isbn-mode 10 base, subject-mode 10 fields, author-mode 7 base. Subject and author endpoints server-capped at 50 records regardless of maxBooksPerSource. Outer try/catch: a single search-API failure halts the entire run. Single-attempt fetch, no retry/no proxy. Author-mode uses limit=1 with no disambiguation: first match wins.
