URL: https://apify.com/parseforge/project-gutenberg-books-scraper

Project Gutenberg Scraper (75,000+ free books) · Apify


Pricing

from $13.00 / 1,000 result items

Project Gutenberg Books Scraper

Search 75,000+ free public-domain books from Project Gutenberg. Returns title, author with birth/death years, cover image, plain-text and EPUB download URLs, Kindle and HTML formats, subjects, bookshelves, language, copyright status, summaries and download counts. Filter by author or language.

Rating: 0.0 (0)

Developer: ParseForge (maintained by Community)

Actor stats: 0 bookmarked · 4 total users · 2 monthly active users · last modified a month ago

Project Gutenberg Books Scraper

Search 75,000+ free public-domain books from Project Gutenberg.

Last updated: 2026-05-06 · 28 fields per record · 75,000+ books · public-domain catalog · plain-text, EPUB, Kindle, HTML, PDF download URLs

The Project Gutenberg Books Scraper searches the Project Gutenberg catalog and returns structured records for any free public-domain ebook. Output includes title, author with birth/death years, cover image, plain-text and EPUB download URLs, Kindle and HTML formats, subjects, bookshelves, language, copyright status, summaries, and download counts.

Project Gutenberg has been digitizing public-domain texts since 1971 and now hosts 75,000+ books across 60+ languages. Filters run server-side, so a single run can isolate every Shakespeare play, all 19th-century French novels, or the most-downloaded books of all time.

Target audience: Researchers, NLP/ML teams, librarians, educators, content creators, ebook app developers

Primary use cases: Building text corpora, NLP training datasets, public-domain ebook libraries, literary research, citation generation

What the Project Gutenberg Books Scraper does

Five filtering workflows in a single run:

  • Free-text search. Match by title, author, or general keywords.
  • Author filter. Restrict to one author across all their works.
  • Topic filter. Filter by subject (history, philosophy, science, fiction).
  • Language filter. ISO 639 language codes (en, fr, de, es, zh, ja).
  • Author year filter. Filter authors by birth/death year for period studies.

Why it matters: clean, server-side filtering removes the parser-and-pagination work from your team and keeps your dataset fresh on every run.



Input

Input | Type | Default | Behavior
maxItems | integer | 10 | Records to return. Free plan caps at 10, paid plan up to 1,000,000.
query | string | "shakespeare" | Free-text keyword search.
language | string | "" | ISO 639 language code.
topic | string | "" | Subject filter.
authorYearStart | integer | null | Author born after this year.
authorYearEnd | integer | null | Author died before this year.
copyrightStatus | string | "" | `true`=copyrighted, `false`=public domain, empty=any.

Example: every Shakespeare work.

{
  "maxItems": 100,
  "query": "shakespeare"
}

Example: 19th-century French novels.

{
  "maxItems": 200,
  "language": "fr",
  "authorYearStart": 1800,
  "authorYearEnd": 1900
}
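
The input objects above can also be assembled in code. The following is a minimal sketch, not part of the Actor: `build_run_input` is a hypothetical helper, and its validation rules are assumptions read off the input table. Whether the Actor falls back to the table's defaults (for example `query` = "shakespeare") when a field is omitted has not been verified here.

```python
from typing import Any, Optional


def build_run_input(
    query: str = "",
    language: str = "",
    topic: str = "",
    author_year_start: Optional[int] = None,
    author_year_end: Optional[int] = None,
    copyright_status: str = "",
    max_items: int = 10,
) -> dict[str, Any]:
    """Build an input object using the field names from the input table."""
    if max_items < 1:
        raise ValueError("maxItems must be a positive integer")
    if copyright_status not in ("", "true", "false"):
        raise ValueError('copyrightStatus must be "true", "false", or empty')
    if (
        author_year_start is not None
        and author_year_end is not None
        and author_year_start > author_year_end
    ):
        raise ValueError("authorYearStart must not exceed authorYearEnd")

    run_input: dict[str, Any] = {"maxItems": max_items}
    # Only include filters that were actually set, mirroring the examples above.
    optional = {
        "query": query,
        "language": language.strip().lower(),
        "topic": topic,
        "authorYearStart": author_year_start,
        "authorYearEnd": author_year_end,
        "copyrightStatus": copyright_status,
    }
    for key, value in optional.items():
        if value not in ("", None):
            run_input[key] = value
    return run_input


# Reproduces the "19th-century French novels" example.
french_19th = build_run_input(
    language="fr", author_year_start=1800, author_year_end=1900, max_items=200
)
```

Validating locally catches an inverted year range or a mistyped `copyrightStatus` before a run is started and billed.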

Output

Each record contains 28 fields. Download the dataset as CSV, Excel, JSON, or XML.

Schema

Field | Type | Example
coverUrl | string | null
gutenbergId | string | "100"
title | string | "The Complete Works of William Shakespeare"
authorsText | string | "Shakespeare, William"
authors | array | [ { name, birthYear, deathYear } ]
subjects | array | ["Drama","English drama"]
bookshelves | array | ["Plays"]
languages | array | ["en"]
copyright | boolean | false
downloadCount | number | 45230
plainTextUrl | string | "https://www.gutenberg.org/files/100/100-0.txt"
epubUrl | string | "https://www.gutenberg.org/ebooks/100.epub3.images"
kindleUrl | string | "https://www.gutenberg.org/ebooks/100.kf8.images"
htmlUrl | string | "https://www.gutenberg.org/files/100/100-h/100-h.htm"
gutenbergUrl | string | "https://www.gutenberg.org/ebooks/100"
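
For downstream code it can help to load records into typed objects. The sketch below covers only some of the fields from the schema table. The `Author` and `Book` classes and the `parse_book` function are hypothetical names, and the sample record is assembled from the example values above plus Shakespeare's well-known birth and death years; it is not actual Actor output.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Author:
    name: str
    birth_year: Optional[int] = None
    death_year: Optional[int] = None

    def label(self) -> str:
        """Render the name with a lifespan; '?' marks a missing year."""
        if self.birth_year is None and self.death_year is None:
            return self.name
        born = "?" if self.birth_year is None else str(self.birth_year)
        died = "?" if self.death_year is None else str(self.death_year)
        return f"{self.name} ({born}-{died})"


@dataclass
class Book:
    gutenberg_id: str
    title: str
    authors: list[Author] = field(default_factory=list)
    languages: list[str] = field(default_factory=list)
    download_count: int = 0
    plain_text_url: Optional[str] = None


def parse_book(record: dict[str, Any]) -> Book:
    """Convert one raw dataset record (camelCase keys) into a Book."""
    authors = [
        Author(a.get("name", ""), a.get("birthYear"), a.get("deathYear"))
        for a in record.get("authors") or []
    ]
    return Book(
        gutenberg_id=str(record["gutenbergId"]),
        title=record.get("title", ""),
        authors=authors,
        languages=list(record.get("languages") or []),
        download_count=int(record.get("downloadCount") or 0),
        plain_text_url=record.get("plainTextUrl"),
    )


# Illustrative record built from the schema examples; not real output.
sample = {
    "gutenbergId": "100",
    "title": "The Complete Works of William Shakespeare",
    "authors": [{"name": "Shakespeare, William", "birthYear": 1564, "deathYear": 1616}],
    "languages": ["en"],
    "downloadCount": 45230,
    "plainTextUrl": "https://www.gutenberg.org/files/100/100-0.txt",
}
book = parse_book(sample)
```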


Why choose this Actor

  • 75,000+ books. Every public-domain text Project Gutenberg has digitized since 1971.
  • 60+ languages. English dominates, but you can find French, German, Spanish, Chinese, and more.
  • Multi-format URLs. Plain-text, EPUB, Kindle, HTML, and PDF when available.
  • Download counts. Filter and rank by reader popularity.
  • Public domain. Use commercially without restrictions in most jurisdictions.
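
Ranking by popularity and keeping only public-domain titles can be done on the downloaded records with a few lines of code. This is a sketch using the `copyright` and `downloadCount` fields from the schema; `top_public_domain` is a hypothetical helper, and the records are made-up values for illustration.

```python
from typing import Any


def top_public_domain(records: list[dict[str, Any]], limit: int = 10) -> list[dict[str, Any]]:
    """Return up to `limit` records whose copyright flag is False, most downloaded first."""
    # `is False` deliberately excludes records where the flag is missing or null.
    public = [r for r in records if r.get("copyright") is False]
    public.sort(key=lambda r: r.get("downloadCount") or 0, reverse=True)
    return public[:limit]


# Made-up records for illustration only.
records = [
    {"gutenbergId": "1", "copyright": False, "downloadCount": 5},
    {"gutenbergId": "2", "copyright": True, "downloadCount": 100},
    {"gutenbergId": "3", "copyright": False, "downloadCount": 50},
    {"gutenbergId": "4", "copyright": None, "downloadCount": 70},
]
ranked_ids = [r["gutenbergId"] for r in top_public_domain(records)]
```

Treating an unknown copyright flag as "not public domain" is the conservative choice for republishing workflows.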

How it compares to alternatives

Approach | Cost | Coverage | Refresh | Filters | Setup
This Actor | $5 free credit | 75,000+ books | Live per run | query, author, lang, topic, year | 2 min
Manual Gutenberg browsing | Free | Manual | Live | Web filters only | Manual
Standard Ebooks | Free | Curated subset | Slow | Limited | Account
Internet Archive Texts | Free | Massive | Variable | Bulk only | ETL

Pick this Actor when you want broad coverage, server-side filtering, and no pipeline maintenance.


How to use

  1. Sign up. Create a free account with $5 credit (takes 2 minutes).
  2. Open the Actor. Go to the Project Gutenberg Books Scraper page on the Apify Store.
  3. Set input. Pick your filters and maxItems.
  4. Run it. Click Start and let the Actor collect your data.
  5. Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.

Total time from signup to downloaded dataset: 3-5 minutes. No coding required.
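
If you download JSON and later want a trimmed, flat CSV with only a few columns, a short script can do the conversion. The sketch below is one possible approach, not the platform's own CSV export: the column selection and the `"; "` separator for array fields are assumptions, and the sample record reuses the example values from the schema table.

```python
import csv
import io
import json
from typing import Any

# Columns to keep, using field names from the schema table.
COLUMNS = ["gutenbergId", "title", "authorsText", "languages", "downloadCount", "plainTextUrl"]


def flatten(record: dict[str, Any]) -> dict[str, str]:
    """Keep only COLUMNS and join array values into one cell."""
    row = {}
    for col in COLUMNS:
        value = record.get(col)
        if isinstance(value, list):
            value = "; ".join(str(v) for v in value)
        row[col] = "" if value is None else str(value)
    return row


def records_to_csv(records: list[dict[str, Any]]) -> str:
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(flatten(record))
    return buffer.getvalue()


# Stand-in for the contents of a downloaded JSON export.
SAMPLE_JSON = """[
  {
    "gutenbergId": "100",
    "title": "The Complete Works of William Shakespeare",
    "authorsText": "Shakespeare, William",
    "languages": ["en"],
    "downloadCount": 45230,
    "plainTextUrl": "https://www.gutenberg.org/files/100/100-0.txt"
  }
]"""
csv_text = records_to_csv(json.loads(SAMPLE_JSON))
```

The `csv` module quotes the author cell automatically because it contains a comma.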


Business use cases

NLP & ML

  • Build training corpora for language models
  • Authorship-attribution datasets
  • Style-transfer corpora
  • Multilingual training data

Libraries & Education

  • Build classroom ebook collections
  • Curriculum-aligned reading lists
  • Free supplementary materials for K-12
  • Library catalog enrichment

Content & Publishing

  • Republish public-domain works
  • Generate audiobook scripts
  • Create curated newsletters
  • Build literary discovery apps

Research & Academia

  • Citation generation
  • Distant-reading studies
  • Genre evolution analysis
  • Translation corpora

Automating Project Gutenberg Books Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

  • Node.js. Install the apify-client NPM package.
  • Python. Use the apify-client PyPI package.
  • See the Apify API documentation for full details.

The Apify Schedules feature lets you trigger this Actor on any cron interval. Hourly, daily, or weekly refreshes keep downstream databases in sync automatically.
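
A programmatic run with the Python client might look like the sketch below. It assumes the `apify-client` package is installed, that the Actor ID matches the Store URL path (`parseforge/project-gutenberg-books-scraper`), and that the API token is stored in an environment variable named `APIFY_TOKEN`. The function names are hypothetical.

```python
import os
from typing import Any, Iterator

# Assumed Actor ID, taken from the Store URL of this listing.
ACTOR_ID = "parseforge/project-gutenberg-books-scraper"


def run_scraper(run_input: dict[str, Any], token: str) -> Iterator[dict[str, Any]]:
    """Start one Actor run, wait for it to finish, and yield its dataset items."""
    # Imported lazily so this module loads even where apify-client is not installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return a result")
    yield from client.dataset(run["defaultDatasetId"]).iterate_items()


def main() -> None:
    """Print the ID and title of the first ten Shakespeare matches."""
    token = os.environ["APIFY_TOKEN"]  # assumed environment variable name
    for book in run_scraper({"query": "shakespeare", "maxItems": 10}, token):
        print(book.get("gutenbergId"), book.get("title"))
```

Call `main()` from a script or scheduler entry point; each call starts a billable run.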


Beyond business use cases

Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

Research and academia

  • Reproducible literary corpora
  • Versioned text snapshots
  • Computational linguistics studies
  • Course material with primary sources

Personal and creative

  • Personal ebook collections
  • Indie reading-app side projects
  • Newsletter on classic literature
  • Hobbyist literary databases

Non-profit and civic

  • Library digitization projects
  • Reading-list contributions
  • Cultural-preservation outreach
  • Multilingual literacy programs

Experimentation

  • Train tokenizers on diverse text
  • Test text-mining pipelines
  • Prototype text-recommendation engines
  • Build literary-analysis dashboards


Frequently Asked Questions

How does it work?

Provide a query, author, language, or topic filter. The Actor queries the Project Gutenberg catalog and emits one record per book.

Can I download the actual book contents?

The Actor returns metadata and direct download URLs for plain-text, EPUB, Kindle, HTML, and PDF formats. Use those URLs to fetch the actual contents.
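
Fetching the contents from a record could be sketched as follows, using only the standard library. The preference order across formats, the `User-Agent` string, and the function names are assumptions for illustration; only URL fields shown in the schema table are used.

```python
import urllib.request
from typing import Any, Optional

# URL fields from the schema table, in an assumed order of preference for text processing.
PREFERRED_KEYS = ("plainTextUrl", "htmlUrl", "epubUrl", "kindleUrl")


def pick_download_url(
    record: dict[str, Any], preferred: tuple[str, ...] = PREFERRED_KEYS
) -> Optional[str]:
    """Return the first non-empty download URL in preference order, or None."""
    for key in preferred:
        url = record.get(key)
        if url:
            return url
    return None


def fetch_text(url: str, timeout: float = 30.0) -> str:
    """Download a text resource and decode it as UTF-8, replacing undecodable bytes."""
    request = urllib.request.Request(url, headers={"User-Agent": "gutenberg-example/0.1"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

When downloading many books, add a pause between requests so the source site is not overloaded.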

Is everything truly public domain?

Yes for the vast majority. The copyright field flags the rare exceptions still under copyright in some jurisdictions.

How many fields per record?

28, including title, authors with birth/death years, cover, all download URLs, subjects, bookshelves, language, and download counts.

Can I schedule runs?

Yes. New books and translations are added regularly. Schedule weekly to capture additions.
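
To see what a scheduled run added, compare it against the previous run by `gutenbergId`. A minimal sketch, assuming both runs were saved as lists of records; `new_books` is a hypothetical helper and the records are made up.

```python
from typing import Any


def new_books(
    previous: list[dict[str, Any]], current: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Return records in `current` whose gutenbergId does not appear in `previous`."""
    seen = {str(r["gutenbergId"]) for r in previous}
    return [r for r in current if str(r["gutenbergId"]) not in seen]


# Made-up snapshots of two consecutive runs.
last_week = [{"gutenbergId": "100"}, {"gutenbergId": "1342"}]
this_week = [{"gutenbergId": "100"}, {"gutenbergId": "1342"}, {"gutenbergId": "76001"}]
added = new_books(last_week, this_week)
```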

Which languages are supported?

60+, with strongest coverage in English, French, German, Spanish, Italian, Dutch, Portuguese, and Chinese.

Does it include author biographies?

No, but it returns author birth/death years for period research.
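
For period research, the birth and death years can be filtered further on the client side. The sketch below keeps a book if any author's lifespan overlaps a year range. It assumes the `authors` array carries `birthYear` and `deathYear` keys as in the schema table, and it skips authors with a missing year; `author_alive_between` is a hypothetical helper.

```python
from typing import Any


def author_alive_between(record: dict[str, Any], start: int, end: int) -> bool:
    """True if any author with both years recorded was alive at some point in [start, end]."""
    for author in record.get("authors") or []:
        born, died = author.get("birthYear"), author.get("deathYear")
        if born is None or died is None:
            continue  # incomplete lifespan: cannot decide, so skip this author
        if born <= end and died >= start:
            return True
    return False


# Illustrative record using Shakespeare's well-known years.
record = {"authors": [{"name": "Shakespeare, William", "birthYear": 1564, "deathYear": 1616}]}
in_17th_century = author_alive_between(record, 1601, 1700)
in_18th_century = author_alive_between(record, 1701, 1800)
```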

Do I need a paid Apify plan?

No. The free plan covers preview runs. A paid plan unlocks higher item counts and scheduling.

What if a run fails?

Apify retries transient errors. Partial datasets are preserved.

Can I generate audiobooks from this?

Yes. Pull plain-text URLs and pipe through any text-to-speech engine.
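
Before sending text to a speech engine, it usually helps to drop the license header and footer and to split the body into manageable chunks. The sketch below assumes the plain-text file uses the common `*** START OF THE PROJECT GUTENBERG EBOOK ... ***` and `*** END OF THE PROJECT GUTENBERG EBOOK ...` marker lines. Files with other marker wording are returned unstripped. The chunk size and function names are arbitrary choices for illustration.

```python
import re

# Assumed marker formats; some files may use different wording.
START_RE = re.compile(r"\*\*\*\s*START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*?\*\*\*")
END_RE = re.compile(r"\*\*\*\s*END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK")


def strip_license(text: str) -> str:
    """Return the text between the START and END marker lines, if they are present."""
    start = START_RE.search(text)
    if start:
        text = text[start.end():]
    end = END_RE.search(text)
    if end:
        text = text[:end.start()]
    return text.strip()


def chunk_paragraphs(text: str, max_chars: int = 1000) -> list[str]:
    """Group whole paragraphs into chunks of at most max_chars where possible.

    A single paragraph longer than max_chars is kept intact as its own chunk.
    """
    chunks: list[str] = []
    current = ""
    for paragraph in re.split(r"\n\s*\n", text):
        paragraph = " ".join(paragraph.split())  # collapse hard line wraps
        if not paragraph:
            continue
        candidate = f"{current} {paragraph}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


# Tiny synthetic file for illustration.
raw = (
    "Header\n*** START OF THE PROJECT GUTENBERG EBOOK SAMPLE ***\n\n"
    "First paragraph.\n\nSecond\nparagraph.\n\n"
    "*** END OF THE PROJECT GUTENBERG EBOOK SAMPLE ***\nLicense"
)
chunks = chunk_paragraphs(strip_license(raw), max_chars=20)
```

Each chunk can then be passed to the text-to-speech engine of your choice.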


Integrate with any app

Project Gutenberg Books Scraper connects to any cloud service via Apify integrations:

  • Make - Automate multi-step workflows
  • Zapier - Connect with 5,000+ apps
  • Slack - Get run notifications in your channels
  • Airbyte - Pipe data into your warehouse
  • GitHub - Trigger runs from commits and releases
  • Google Drive - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes.
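
On the receiving side, a webhook handler mainly needs the dataset ID of the finished run. The sketch below assumes Apify's default webhook payload, with a top-level `eventType` and a `resource` object holding the run, including `defaultDatasetId`. If you customize the payload template, adjust the keys accordingly. `dataset_id_from_webhook` and the sample payload values are hypothetical.

```python
import json
from typing import Optional


def dataset_id_from_webhook(body: bytes) -> Optional[str]:
    """Extract the run's dataset ID from a webhook body, or None if the run did not succeed."""
    payload = json.loads(body)
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    resource = payload.get("resource") or {}
    return resource.get("defaultDatasetId")


# Hypothetical request body shaped like the assumed default payload.
example_body = json.dumps(
    {
        "eventType": "ACTOR.RUN.SUCCEEDED",
        "resource": {"id": "run-123", "defaultDatasetId": "dataset-abc"},
    }
).encode("utf-8")
dataset_id = dataset_id_from_webhook(example_body)
```

The returned ID can be handed to the dataset download step of your pipeline.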



Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.


Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by Project Gutenberg, the Gutendex project, or any contributing volunteers. All trademarks mentioned are the property of their respective owners. Only publicly available open data is collected.
