
URL: https://apify.com/knotless_cadence/openlibrary-book-scraper

OpenLibrary Book Scraper - ISBN, Authors, Book Metadata API · Apify



OpenLibrary Books - Metadata, ISBNs, Authors, CSV, No API Key

Pricing

Pay per usage

19 runs. OpenLibrary metadata as CSV/JSON: titles, authors, ISBNs, subjects, languages, pageCount, coverUrl, ebookAccess, ratings. By query/ISBN/subject/author. For library cataloguing + book-rec engines + academic research. No API key. Backed by 951-run Trustpilot flagship + 31-actor portfolio.

Rating

0.0

(0)

Developer

Alex

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 5
Monthly active users: 1
Last modified: 2 months ago

OpenLibrary Book Scraper - Metadata, ISBNs, Authors, Subjects

Scrape book metadata from the free OpenLibrary API. No API key, no rate-limit token, no auth wall. Four input modes: search queries, ISBN lookups, subject browse, author works. Output JSON or CSV.

Built for: library data builds, reading-list automation, ISBN enrichment, book recommendation datasets, academic citation enrichment.


What this actor does (honest scope, verified against src/main.js)

Calls these public OpenLibrary endpoints under the hood:

Input field | Endpoint hit | Returns
searchQueries | /search.json?q=…&page=N&limit=50 | 50 docs per page, paginated until maxBooksPerSource is reached or numFound <= collected
isbns | /isbn/{isbn}.json | One book per ISBN
subjects | /subjects/{slug}.json?limit=min(maxBooksPerSource, 50) | Hard-capped at 50: even if you set maxBooksPerSource=200, subject browse returns at most 50
authors | /search/authors.json?limit=1, then /authors/{key}/works.json?limit=min(maxBooksPerSource, 50) | First author match only (no disambiguation), then up to 50 works
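The endpoint table above can be sketched in Python as URL builders plus the search-mode paging rule. This is not the actor's src/main.js. Three things are assumptions: the helper names, passing the author name as q on /search/authors.json, and an extra empty-page guard. fetch_page is a hypothetical callable that returns parsed JSON, so the paging logic can be tried without network access.

```python
from typing import Any, Callable, Dict, List
from urllib.parse import quote, urlencode

BASE = "https://openlibrary.org"
PAGE_SIZE = 50     # search-mode page size from the table
LISTING_CAP = 50   # subject/author requests use limit=min(maxBooksPerSource, 50)


def search_url(query: str, page: int) -> str:
    return f"{BASE}/search.json?" + urlencode({"q": query, "page": page, "limit": PAGE_SIZE})


def isbn_url(isbn: str) -> str:
    return f"{BASE}/isbn/{quote(isbn)}.json"


def subject_url(slug: str, max_books: int) -> str:
    return f"{BASE}/subjects/{quote(slug)}.json?limit={min(max_books, LISTING_CAP)}"


def author_search_url(name: str) -> str:
    # Assumption: the name goes in q; limit=1 keeps only the first match.
    return f"{BASE}/search/authors.json?" + urlencode({"q": name, "limit": 1})


def author_works_url(author_key: str, max_books: int) -> str:
    return f"{BASE}/authors/{author_key}/works.json?limit={min(max_books, LISTING_CAP)}"


def collect_search(query: str, max_books: int,
                   fetch_page: Callable[[str], Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Page through search.json until max_books is reached or numFound <= collected."""
    collected: List[Dict[str, Any]] = []
    page = 1
    while len(collected) < max_books:
        data = fetch_page(search_url(query, page))
        docs = data.get("docs", [])
        collected.extend(docs)
        # The empty-page check is an extra safety guard added in this sketch.
        if not docs or data.get("numFound", 0) <= len(collected):
            break
        page += 1
    return collected[:max_books]
```

Because fetch_page is injected, you can swap in a real HTTP call or a stub for testing.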

The actor sets User-Agent: ApifyOpenLibraryScraper/1.0 and inserts polite delays between requests: 200ms after each work-description fetch, 300ms before each ISBN/subject/author lookup, and 500ms between search-mode pages. If includeDescription=true (the default), search-mode and isbn-mode fire one extra /works/{key}.json request per book to pull the description text. That is slower, but richer. Subject-mode and author-mode never call the work-description endpoint; they read whatever description is already in the listing payload.
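The request hygiene described above (fixed User-Agent, a fixed delay, one attempt per URL) might look like this with Python's standard library. It is an illustrative sketch, not the actor's code. The constant and function names are hypothetical; only the header value and the delay lengths come from the text.

```python
import json
import time
import urllib.request
from typing import Any, Dict

USER_AGENT = "ApifyOpenLibraryScraper/1.0"

# Delays from the paragraph above, in seconds (constant names are hypothetical).
DELAY_AFTER_DESCRIPTION = 0.2
DELAY_BEFORE_LOOKUP = 0.3
DELAY_BETWEEN_SEARCH_PAGES = 0.5


def build_request(url: str) -> urllib.request.Request:
    """Attach the documented User-Agent header to a request."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def polite_get_json(url: str, delay_before: float = 0.0, timeout: float = 30.0) -> Dict[str, Any]:
    """Sleep, then make a single attempt (no retry, no proxy) and decode the JSON body."""
    if delay_before > 0:
        time.sleep(delay_before)
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return json.load(resp)
```

For example, an ISBN lookup would be polite_get_json("https://openlibrary.org/isbn/9780553293357.json", delay_before=DELAY_BEFORE_LOOKUP).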


Input parameters

Field | Type | Default | Description
searchQueries | array of strings | [] | Free-text search (title/keyword/phrase)
isbns | array of strings | [] | ISBN-10 or ISBN-13 lookups
subjects | array of strings | [] | Subject names, auto-lowercased with spaces replaced by underscores (e.g. "Science Fiction" → slug science_fiction). Special characters are NOT escaped beyond URL-encoding, so exotic subject names may 404.
authors | array of strings | [] | Author names (e.g. "Isaac Asimov"). Only the first match is taken (limit=1), so common-name authors may resolve to a different person than expected. If disambiguation matters, use the OpenLibrary author key directly via a custom build.
maxBooksPerSource | integer | 50 | Cap per query/ISBN/subject/author (schema allows 1-200, but subjects and authors are server-capped at 50 regardless)
includeDescription | boolean | true | Fetch the full description (one extra API call per book, in search-mode and isbn-mode only)
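The subject slug rule in the table can be approximated in a few lines. This sketch makes three assumptions. First, it treats each single space as one underscore. Second, it uses Python's urllib.parse.quote, whose escaping may differ in detail from the actor's JavaScript encoding (apostrophes, for instance). Third, the helper names are hypothetical.

```python
from urllib.parse import quote


def subject_slug(name: str) -> str:
    """Naive transform: lowercase, then each space becomes an underscore."""
    return name.lower().replace(" ", "_")


def subject_path(name: str) -> str:
    """URL-encode the slug; accents and apostrophes are encoded, not normalized."""
    return "/subjects/" + quote(subject_slug(name)) + ".json"
```

"Science Fiction" maps cleanly to /subjects/science_fiction.json. "René Magritte's books" becomes a percent-encoded path that OpenLibrary is unlikely to recognize as a subject.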

You can mix all four modes in a single run. Each output record carries a source field telling you which mode produced it (search:<query>, isbn:<n>, subject:<s>, author:<a>).
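Here is a hedged example of a mixed-mode input, using only the documented field names, plus two hypothetical helpers. effective_cap shows how many records one source can yield given the 50-record cap on subjects and authors. parse_source splits the source field back into mode and value.

```python
import json
from typing import Tuple

run_input = {
    "searchQueries": ["foundation"],
    "isbns": ["9780553293357"],
    "subjects": ["Science Fiction"],
    "authors": ["Isaac Asimov"],
    "maxBooksPerSource": 100,
    "includeDescription": True,
}
payload = json.dumps(run_input)  # JSON body to submit as the actor's input

CAPPED_MODES = {"subject", "author"}


def effective_cap(mode: str, max_books: int) -> int:
    """Max records per source; isbn mode yields one record per ISBN regardless."""
    if not 1 <= max_books <= 200:
        raise ValueError("maxBooksPerSource must be between 1 and 200")
    if mode == "isbn":
        return 1
    return min(max_books, 50) if mode in CAPPED_MODES else max_books


def parse_source(source: str) -> Tuple[str, str]:
    """Split 'search:foundation' into ('search', 'foundation') at the first colon."""
    mode, _, value = source.partition(":")
    return mode, value
```

Splitting at the first colon keeps any colons inside a search query intact.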


Output schema (varies by source mode; fields differ deliberately)

Records from different modes carry different field sets. This is by design: OpenLibrary returns richer metadata for search results than for its ISBN, subject, and author endpoints.

search: mode (22 base fields, +description with includeDescription, +2 metadata = up to 25)

{
  "title": "Foundation",
  "authors": ["Isaac Asimov"],
  "authorKeys": ["OL26320A"],
  "firstPublishYear": 1951,
  "publishYears": [1951, 1952, 1955, 1962, 1974],
  "isbn": "9780553293357",
  "allIsbns": ["9780553293357", "9780553382570", "..."],
  "subjects": ["Science fiction", "Galactic empire", "..."],
  "publishers": ["Bantam Spectra", "Doubleday", "..."],
  "languages": ["eng"],
  "pageCount": 244,
  "editionCount": 142,
  "coverUrl": "https://covers.openlibrary.org/b/id/9261361-L.jpg",
  "openLibraryKey": "/works/OL46828W",
  "openLibraryUrl": "https://openlibrary.org/works/OL46828W",
  "ebookAccess": "borrowable",
  "hasFulltext": true,
  "ratingsAverage": 4.12,
  "ratingsCount": 1284,
  "wantToRead": 8421,
  "currentlyReading": 412,
  "alreadyRead": 6203,
  "description": "In the waning days of a future Galactic Empire...",
  "source": "search:foundation",
  "scrapedAt": "2026-04-29T12:00:00.000Z"
}

Field caps in search-mode: allIsbns truncated to first 10, subjects truncated to first 20, publishers truncated to first 5. pageCount is number_of_pages_median (median across editions, not the specific-edition page count).
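The search-mode field caps can be illustrated with a mapper that turns one search.json doc into a subset of the output fields above. Several things here are assumptions, not taken from the actor's code. The raw keys follow OpenLibrary's search doc naming (author_name, author_key, number_of_pages_median, cover_i, and so on). "isbn" is taken to be the first entry of the ISBN list. The cover URL pattern is copied from the example record.

```python
from typing import Any, Dict, Optional

COVER_TEMPLATE = "https://covers.openlibrary.org/b/id/{}-L.jpg"


def cover_url(cover_id: Optional[int]) -> Optional[str]:
    return COVER_TEMPLATE.format(cover_id) if cover_id else None


def map_search_doc(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Map one search.json doc to a subset of the search-mode output fields."""
    isbns = doc.get("isbn", [])
    key = doc.get("key")
    return {
        "title": doc.get("title"),
        "authors": doc.get("author_name", []),
        "authorKeys": doc.get("author_key", []),
        "firstPublishYear": doc.get("first_publish_year"),
        "isbn": isbns[0] if isbns else None,             # assumption: first ISBN
        "allIsbns": isbns[:10],                           # documented cap: 10
        "subjects": doc.get("subject", [])[:20],          # documented cap: 20
        "publishers": doc.get("publisher", [])[:5],       # documented cap: 5
        "pageCount": doc.get("number_of_pages_median"),   # median across editions
        "editionCount": doc.get("edition_count"),
        "coverUrl": cover_url(doc.get("cover_i")),
        "openLibraryKey": key,
        "openLibraryUrl": f"https://openlibrary.org{key}" if key else None,
        "ratingsAverage": doc.get("ratings_average"),
        "ratingsCount": doc.get("ratings_count"),
    }
```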

isbn: mode (10 base fields, +description+subjects if includeDescription=true)

title, isbn, publishers (uncapped), publishDate, pageCount (specific-edition number_of_pages, NOT median), coverUrl, openLibraryKey, openLibraryUrl, source, scrapedAt. With includeDescription=true, adds description and subjects (uncapped). The description fetch is wrapped in a silent try/catch: on failure, both description and subjects are simply absent (no error field, no retry).
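The silent-failure behaviour can be sketched as an enrichment step that adds nothing when the work fetch raises. Three points are assumptions. The work key would come from the edition's works list. fetch_work is a hypothetical callable. A work that loads but has no description still contributes its subjects. The sketch also handles OpenLibrary's habit of storing a description either as a plain string or as an object with a value field.

```python
from typing import Any, Callable, Dict, Optional


def description_text(raw: Any) -> Optional[str]:
    """Return the text whether raw is a plain string or {"type": ..., "value": ...}."""
    if isinstance(raw, dict):
        return raw.get("value")
    return raw if isinstance(raw, str) else None


def enrich_isbn_record(record: Dict[str, Any], work_key: str,
                       fetch_work: Callable[[str], Dict[str, Any]]) -> Dict[str, Any]:
    """Add description + subjects from the work; on any failure, add nothing."""
    out = dict(record)
    try:
        work = fetch_work(work_key)
    except Exception:
        return out  # silent: no error field, no retry
    text = description_text(work.get("description"))
    if text is not None:
        out["description"] = text
    out["subjects"] = work.get("subjects", [])
    return out
```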

subject: mode (10 fields)

title, authors (array of names, with no authorKeys counterpart as in search-mode), coverUrl, openLibraryKey, openLibraryUrl, editionCount, firstPublishYear, subject, source, scrapedAt. No ratings, no ISBN, and no description in this mode: that's a limitation of OpenLibrary's /subjects/ endpoint, not ours. Server hard-caps to 50 records regardless of maxBooksPerSource.

author: mode (7 base fields, +description if includeDescription=true)

title, authors (1-element array with the resolved author name), authorKey (singular, unlike search-mode's plural authorKeys), openLibraryKey, openLibraryUrl, covers (capped to the first 3 cover URLs), source, scrapedAt. With includeDescription=true, adds description IF the author-works payload already contains it (no extra API call; purely best-effort). Server hard-caps to 50 works regardless of maxBooksPerSource.
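Author mode can be sketched as "take the first author hit, then map up to 50 work entries". Several details are assumptions: the response shapes (author search docs with key and name; a works listing under entries with title, key, numeric covers and an optional description), and the -L cover size, borrowed from the search-mode example. The source and scrapedAt fields are left out for brevity.

```python
from typing import Any, Dict, List, Optional

COVER_TEMPLATE = "https://covers.openlibrary.org/b/id/{}-L.jpg"


def first_author(search_resp: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """limit=1 semantics: whatever OpenLibrary ranks first wins; no disambiguation."""
    docs = search_resp.get("docs", [])
    return docs[0] if docs else None


def map_author_works(author: Dict[str, Any], works_resp: Dict[str, Any],
                     include_description: bool = True) -> List[Dict[str, Any]]:
    records = []
    for entry in works_resp.get("entries", [])[:50]:  # 50-work cap
        key = entry.get("key")
        rec = {
            "title": entry.get("title"),
            "authors": [author.get("name")],
            "authorKey": author.get("key"),
            "openLibraryKey": key,
            "openLibraryUrl": f"https://openlibrary.org{key}" if key else None,
            "covers": [COVER_TEMPLATE.format(c) for c in entry.get("covers", [])[:3]],
        }
        raw = entry.get("description")
        if include_description and raw is not None:
            # Best effort: only what the listing already carries; no extra request.
            rec["description"] = raw.get("value") if isinstance(raw, dict) else raw
        records.append(rec)
    return records
```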

Field-name asymmetry across modes: search-mode emits authorKeys (plural array) + coverUrl (single URL); author-mode emits authorKey (singular string) + covers (array of up to 3); subject-mode and isbn-mode emit no author-key field at all and use the single coverUrl. If you join across modes, normalize these explicitly.
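One way to do that normalization is a post-run pass that gives every record the same two list fields. This is a sketch. The target names authorKeys and coverUrls are a hypothetical common shape chosen for this example, not fields the actor emits. Records without authors (isbn mode) get an empty list.

```python
from typing import Any, Dict


def normalize_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Give every mode the same author-key and cover fields (hypothetical shape)."""
    out = dict(record)
    if "authorKeys" in out:          # search mode: plural array
        keys = list(out.pop("authorKeys") or [])
    elif "authorKey" in out:         # author mode: singular string
        value = out.pop("authorKey")
        keys = [value] if value else []
    else:                            # subject / isbn mode: no author key
        keys = []
    if "covers" in out:              # author mode: up to 3 URLs
        covers = list(out.pop("covers") or [])
    else:                            # search / subject / isbn mode: single coverUrl
        url = out.get("coverUrl")
        covers = [url] if url else []
    out.pop("coverUrl", None)
    out["authorKeys"] = keys
    out["coverUrls"] = covers
    out.setdefault("authors", [])
    return out
```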


Operational caveats

  • ⚠️ Outer try/catch wraps the entire 4-mode for-loop (src/main.js lines 57-222). ISBN, subject, and author loops have inner try/catch, so individual lookup failures don't halt their batch. BUT search-mode does NOT have inner protection: a single search-API failure (e.g. a transient HTTP 500 or network blip) kills the run mid-stream and skips ALL remaining search queries, ISBN lookups, subject browses, and author lookups. Run problematic queries in isolation if dropout matters.
  • No retry / no proxy. Single fetch() per URL. Heavy bursts may eventually trigger OpenLibrary's polite-use ceiling (~100 req/min, unofficial); the actor surfaces that as a thrown HTTP error.
  • Description fetch fails silently. When includeDescription=true and the work-page fetch fails, description is set to an empty string (search-mode) or left absent (isbn-mode); no error is logged per book.
  • Subject slug transform is naive. Input "Science Fiction" → slug "science_fiction". Special characters beyond letters/spaces are URL-encoded but not slug-normalized; subjects like "René Magritte's books" will likely 404.
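The actor itself has no retry and no per-query isolation in search mode. If you drive it (or OpenLibrary directly) from your own script, a wrapper along these lines is one option. It is a hypothetical sketch with these policy choices: 3 attempts, exponential backoff from 1s, and retries only on HTTP 429/5xx or network-level OSErrors, never on a 404. Each query runs on its own, so one failure is recorded instead of ending the batch.

```python
import time
import urllib.error
from typing import Any, Callable, Dict, Iterable, Tuple, TypeVar

T = TypeVar("T")


def is_retryable(exc: BaseException) -> bool:
    """Retry 429 and 5xx responses plus network errors; do not retry other 4xx."""
    if isinstance(exc, urllib.error.HTTPError):
        return exc.code == 429 or exc.code >= 500
    # URLError, socket timeouts and connection resets are all OSError subclasses.
    return isinstance(exc, OSError)


def with_retry(fn: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
               sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, sleeping 1s, 2s, 4s... between retryable failures."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            if attempt == attempts - 1 or not is_retryable(exc):
                raise
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("attempts must be >= 1")


def run_isolated(items: Iterable[str],
                 run_one: Callable[[str], Any]) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Run each query separately so one failure cannot skip the rest."""
    results: Dict[str, Any] = {}
    failures: Dict[str, str] = {}
    for item in items:
        try:
            results[item] = run_one(item)
        except Exception as exc:
            failures[item] = repr(exc)
    return results, failures
```

The sleep parameter is injectable so that tests or dry runs don't actually wait.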

What this actor does NOT do

  • No reading-progress / personal-list scraping: this actor does not fetch individual users' lists or reading logs.
  • No full-text book content: only metadata + descriptions. Read free books at openlibrary.org or via Internet Archive.
  • No price comparison: OpenLibrary is metadata-only, not a bookstore.
  • No deduplication across modes: if you search "Foundation" and look up ISBN 9780553293357, you'll get 2 records. Dedupe by openLibraryKey post-run if needed.
  • No incremental crawl / cursor state: each run starts fresh from page 1.
  • No author disambiguation: first match wins.
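A post-run dedupe by openLibraryKey can be as small as this sketch (the function name is hypothetical). It keeps the first record seen per key and passes keyless records through. It only merges records whose keys are identical strings; if two modes point at different OpenLibrary objects (for example an edition versus its work), they will not collapse. Sort the records first if you prefer one mode's version to win.

```python
from typing import Any, Dict, Iterable, List


def dedupe_by_key(records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the first record per openLibraryKey; records without a key are kept as-is."""
    seen = set()
    out: List[Dict[str, Any]] = []
    for rec in records:
        key = rec.get("openLibraryKey")
        if key is not None:
            if key in seen:
                continue
            seen.add(key)
        out.append(rec)
    return out
```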

When this stops being enough

If you need book full-text → use Internet Archive. If you need real-time bookstore prices → write a separate Amazon/Bookshop scraper. If you need annotated bibliographies → look at Goodreads (no public API since 2020, harder).


Custom builds β€” pilot tiers

This actor runs on Apify's standard compute. If you need a custom variant (search-mode-only with retry+backoff, ISBN-bulk with deduplication, subject browse paginated past the 50-cap via a search workaround, author-key direct lookup, hourly cron, Slack alerts on new releases), there are three tiers:

  • Pilot: $97 · 1 actor, basic config, 7-day support. Good for a one-off "top 200 books in subject X" via a search + subject hybrid.
  • Standard: $297 · custom actor + Slack/email alerts on results, 30-day support. Most reading-list / catalog-enrichment projects fit here.
  • Premium: $797 · custom actor + dashboard + 90-day support + 1 modification round. For ongoing pipelines (weekly new-release feed, ISBN-stream enrichment, author-tracking dashboards).

Email: spinov001@gmail.com. Send the input shape and the schema you need; quote within 48h.

Proof of work: 31 published Apify scrapers (78 total in portfolio): Trustpilot 949 runs, Reddit 80+, Google News 43, Glassdoor 37, Email Extractor 36+. Recently delivered a paid 3-article series for a client in the proxy industry ($150).

More tips: t.me/scraping_ai · blog.spinov.online


Related scrapers

Source | Actor | Data
OpenLibrary (this) | Book metadata + ISBN/subject/author | Bibliographic
Wikipedia Scraper | Article + sections + references | Encyclopedic
arXiv Paper Scraper | Academic preprints | Research
[Google Books style: request a custom build via email] | - | -

All 31 published actors free to inspect on Apify Store.


Disclaimer

Scrapes the publicly accessible OpenLibrary API endpoints. Respects polite delays (200-500ms between requests). Not affiliated with the Internet Archive or OpenLibrary.

Honest disclosure: search-mode has 22 base fields (up to 25 with description + 2 metadata fields), isbn-mode 10 base, subject-mode 10 fields, author-mode 7 base. Subject and author endpoints are server-capped at 50 records regardless of maxBooksPerSource. Outer try/catch: a single search-API failure halts the entire run. Single-attempt fetch, no retry, no proxy. Author-mode uses limit=1, so there is no disambiguation: first match wins.

You might also like

Open Library Book Search

gentle_cloud/open-library-book-search

Search and extract book data from Open Library (openlibrary.org) β€” titles, authors, publishers, ISBNs, ratings, reading stats, cover images, and more. Free API, no key required.

Open Library Scraper - Book Metadata in Bulk

devilscrapes/openlibrary-books-scraper

Search the Open Library API (the Internet Archive's open book catalogue) and export structured book metadata β€” title, authors, ISBNs, subjects, publish year, cover URL, edition count, OpenLibrary ID β€” to JSON or CSV. We handle pagination and retries across 30M+ works.

📚 Open Library Intelligence - 20M+ Books & Covers

benthepythondev/openlibrary-book-intelligence

Search and extract book data from Open Library's database of 20+ million books. Get titles, authors, publishers, publication dates, ISBNs, covers, subjects, and edition info. Search by title, author, ISBN, or subject. Free alternative to Google Books API.

Open Library ISBN Book Metadata Scraper

jungle_synthesizer/openlibrary-isbn-book-metadata-scraper

Bulk-enrich ISBNs with full Open Library metadata: title, authors, publishers, subjects, ratings, reading-status counts, and cross-reference identifiers (Goodreads, LibraryThing, LCCN, OCLC, Wikidata). Accepts up to thousands of ISBNs in a single run.
