Project Gutenberg Books Scraper

Pricing

from $13.00 / 1,000 result items

Search 75,000+ free public-domain books from Project Gutenberg. Returns title, author with birth/death years, cover image, plain-text and EPUB download URLs, Kindle and HTML formats, subjects, bookshelves, language, copyright status, summaries and download counts. Filter by author or language.

Developer

ParseForge

Maintained by Community

📚 Project Gutenberg Books Scraper

🚀 Search 75,000+ free public-domain books from Project Gutenberg.

🕒 Last updated: 2026-05-06 · 📊 28 fields per record · 75,000+ books · public-domain catalog · plain-text, EPUB, Kindle, HTML, PDF download URLs

The Project Gutenberg Books Scraper searches the Project Gutenberg catalog and returns structured records for any free public-domain ebook. Output includes title, author with birth/death years, cover image, plain-text and EPUB download URLs, Kindle and HTML formats, subjects, bookshelves, language, copyright status, summaries, and download counts.

Project Gutenberg has been digitizing public-domain texts since 1971 and now hosts 75,000+ books across 60+ languages. Filters run server-side, so a single run can isolate every Shakespeare play, all 19th-century French novels, or the most-downloaded books of all time.

🎯 Target Audience	💡 Primary Use Cases
Researchers, NLP/ML teams, librarians, educators, content creators, ebook app developers	Building text corpora, NLP training datasets, public-domain ebook libraries, literary research, citation generation

📋 What the Project Gutenberg Books Scraper does

Five filtering workflows in a single run:

🔍 Free-text search. Match by title, author, or general keywords (`query`).
👤 Author filter. Restrict results to one author by passing the author's name in `query`.
🏷️ Topic filter. Filter by subject, such as history, philosophy, science, or fiction (`topic`).
🌐 Language filter. ISO 639 language codes such as en, fr, de, es, zh, ja (`language`).
📅 Author year filter. Filter authors by birth/death year for period studies (`authorYearStart`, `authorYearEnd`).

💡 Why it matters: clean, server-side filtering removes the parser-and-pagination work from your team and keeps your dataset fresh on every run.

🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded dataset.

⚙️ Input

Input	Type	Default	Behavior
maxItems	integer	10	Records to return. Free plan caps at 10, paid plan up to 1,000,000.
query	string	"shakespeare"	Free-text keyword search.
language	string	""	ISO 639 language code.
topic	string	""	Subject filter.
authorYearStart	integer	null	Author born after this year.
authorYearEnd	integer	null	Author died before this year.
copyrightStatus	string	""	`true`=copyrighted, `false`=public domain, empty=any.

Example: every Shakespeare work.

{
  "maxItems": 100,
  "query": "shakespeare"
}

Example: 19th-century French novels.

{
  "maxItems": 200,
  "language": "fr",
  "authorYearStart": 1800,
  "authorYearEnd": 1900
}
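
The same input objects can be assembled in code before a run. The sketch below is a minimal helper that uses only the field names from the Input table; the function name `build_run_input` and its validation rules are illustrative assumptions, not part of the Actor.

```python
from typing import Any, Dict, Optional


def build_run_input(
    max_items: int = 10,
    query: str = "",
    language: str = "",
    topic: str = "",
    author_year_start: Optional[int] = None,
    author_year_end: Optional[int] = None,
    copyright_status: str = "",
) -> Dict[str, Any]:
    """Assemble an input object keyed by the field names in the Input table."""
    if max_items < 1:
        raise ValueError("maxItems must be at least 1")
    if copyright_status not in ("", "true", "false"):
        raise ValueError("copyrightStatus must be 'true', 'false', or empty")
    if (
        author_year_start is not None
        and author_year_end is not None
        and author_year_start > author_year_end
    ):
        raise ValueError("authorYearStart must not exceed authorYearEnd")

    optional = {
        "query": query,
        "language": language,
        "topic": topic,
        "authorYearStart": author_year_start,
        "authorYearEnd": author_year_end,
        "copyrightStatus": copyright_status,
    }
    run_input: Dict[str, Any] = {"maxItems": max_items}
    # Leave out empty values; the Actor then applies whatever defaults it defines.
    run_input.update({k: v for k, v in optional.items() if v not in ("", None)})
    return run_input


french_novels = build_run_input(
    max_items=200, language="fr", author_year_start=1800, author_year_end=1900
)
```

Here `french_novels` reproduces the second example above. Omitted fields fall back to the Actor's own defaults, so pass `query` explicitly if the default keyword is not wanted.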

📊 Output

Each record contains 28 fields; the schema table below lists the main ones. Download the dataset as CSV, Excel, JSON, or XML.
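
For scripted exports, the four formats map onto the `format` query parameter of the Apify dataset items endpoint. The helper below is a small sketch; the function name and the label mapping are assumptions for illustration, and the dataset ID must come from your own run.

```python
from urllib.parse import urlencode

# Labels used on this page mapped to the API's format values.
EXPORT_FORMATS = {"CSV": "csv", "Excel": "xlsx", "JSON": "json", "XML": "xml"}


def dataset_export_url(dataset_id: str, label: str = "JSON") -> str:
    """Build a download URL for a run's dataset in the chosen format."""
    if label not in EXPORT_FORMATS:
        raise ValueError(f"unsupported format: {label}")
    query = urlencode({"format": EXPORT_FORMATS[label]})
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{query}"
```

Depending on your account settings, the request may also need your API token.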

🧾 Schema

Field	Type	Example
🖼️ `coverUrl`	string or null	`null`
🆔 `gutenbergId`	string	`"100"`
📛 `title`	string	`"The Complete Works of William Shakespeare"`
👤 `authorsText`	string	`"Shakespeare, William"`
👤 `authors`	array	`[ { name, birthYear, deathYear } ]`
🏷️ `subjects`	array	`["Drama","English drama"]`
📁 `bookshelves`	array	`["Plays"]`
🌐 `languages`	array	`["en"]`
📋 `copyright`	boolean	`false`
📥 `downloadCount`	number	`45230`
📄 `plainTextUrl`	string	`"https://www.gutenberg.org/files/100/100-0.txt"`
📕 `epubUrl`	string	`"https://www.gutenberg.org/ebooks/100.epub3.images"`
📖 `kindleUrl`	string	`"https://www.gutenberg.org/ebooks/100.kf8.images"`
🌐 `htmlUrl`	string	`"https://www.gutenberg.org/files/100/100-h/100-h.htm"`
🔗 `gutenbergUrl`	string	`"https://www.gutenberg.org/ebooks/100"`
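
As a rough illustration of consuming these fields, the sketch below picks the first available download URL in a preferred order and formats the `authors` array. The helper names and the preference order are assumptions; the sample record is built from the example values in the table, with Shakespeare's birth and death years filled in.

```python
from typing import Any, Dict, Optional, Sequence

# Assumed preference order; reorder to suit your pipeline.
FORMAT_PREFERENCE = ("plainTextUrl", "epubUrl", "htmlUrl", "kindleUrl")


def pick_download_url(
    record: Dict[str, Any], preference: Sequence[str] = FORMAT_PREFERENCE
) -> Optional[str]:
    """Return the first non-empty download URL, or None if the record has none."""
    for field in preference:
        url = record.get(field)
        if url:
            return url
    return None


def format_authors(record: Dict[str, Any]) -> str:
    """Render authors as 'Name (birth-death)', using '?' for unknown years."""
    parts = []
    for author in record.get("authors") or []:
        born = author.get("birthYear")
        died = author.get("deathYear")
        span = f"{born if born is not None else '?'}-{died if died is not None else '?'}"
        parts.append(f"{author.get('name', 'Unknown')} ({span})")
    return "; ".join(parts)


sample = {
    "gutenbergId": "100",
    "title": "The Complete Works of William Shakespeare",
    "authors": [{"name": "Shakespeare, William", "birthYear": 1564, "deathYear": 1616}],
    "plainTextUrl": "https://www.gutenberg.org/files/100/100-0.txt",
    "epubUrl": "https://www.gutenberg.org/ebooks/100.epub3.images",
}
```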

✨ Why choose this Actor

	Capability
📚	75,000+ books. Every public-domain text Project Gutenberg has digitized since 1971.
🌐	60+ languages. English dominates, but you can find French, German, Spanish, Chinese, and more.
📄	Multi-format URLs. Plain-text, EPUB, Kindle, HTML, and PDF when available.
📥	Download counts. Filter and rank by reader popularity.
⚖️	Public domain. Most titles are public domain in the United States. Check the `copyright` field and your local law before commercial reuse.
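
Ranking by popularity and screening on copyright can both be done client-side once records are downloaded. The sketch below assumes records shaped like the schema above (`copyright` as a boolean, `downloadCount` as a number); the function name and the sample values are made up for illustration.

```python
from typing import Any, Dict, List


def top_public_domain(records: List[Dict[str, Any]], limit: int = 10) -> List[Dict[str, Any]]:
    """Keep records flagged copyright == False, most-downloaded first."""
    public = [r for r in records if r.get("copyright") is False]
    ranked = sorted(public, key=lambda r: r.get("downloadCount") or 0, reverse=True)
    return ranked[:limit]


# Illustrative records only; the counts are not real figures.
books = [
    {"gutenbergId": "1", "copyright": False, "downloadCount": 120},
    {"gutenbergId": "2", "copyright": True, "downloadCount": 900},
    {"gutenbergId": "3", "copyright": False, "downloadCount": 450},
    {"gutenbergId": "4", "copyright": None, "downloadCount": 300},
]
ranked_ids = [b["gutenbergId"] for b in top_public_domain(books)]
```

Records whose `copyright` value is missing or null are excluded here on purpose, so that only titles explicitly flagged as public domain pass through.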

📈 How it compares to alternatives

Approach	Cost	Coverage	Refresh	Filters	Setup
⭐ This Actor	From $13.00 / 1,000 results ($5 free credit to start)	75,000+ books	Live per run	query, author, lang, topic, year	⚡ 2 min
Manual Gutenberg browsing	Free	Manual	Live	Web filters only	🕒 Manual
Standard Ebooks	Free	Curated subset	Slow	Limited	🐢 Account
Internet Archive Texts	Free	Massive	Variable	Bulk only	🐢 ETL

Pick this Actor when you want broad coverage, server-side filtering, and no pipeline maintenance.

🚀 How to use

📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
🌐 Open the Actor. Go to the Project Gutenberg Books Scraper page on the Apify Store.
🎯 Set input. Pick your filters and maxItems.
🚀 Run it. Click Start and let the Actor collect your data.
📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.

💼 Business use cases

🤖 NLP & ML

Build training corpora for language models
Authorship-attribution datasets
Style-transfer corpora
Multilingual training data

📚 Libraries & Education

Build classroom ebook collections
Curriculum-aligned reading lists
Free supplementary materials for K-12
Library catalog enrichment

📰 Content & Publishing

Republish public-domain works
Generate audiobook scripts
Create curated newsletters
Build literary discovery apps

🔬 Research & Academia

Citation generation
Distant-reading studies
Genre evolution analysis
Translation corpora

🔌 Automating Project Gutenberg Books Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

🟢 Node.js. Install the apify-client NPM package.
🐍 Python. Use the apify-client PyPI package.
📚 See the Apify API documentation for full details.

The Apify Schedules feature lets you trigger this Actor on any cron interval. Hourly, daily, or weekly refreshes keep downstream databases in sync automatically.
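
A minimal Python sketch of a programmatic run follows. It assumes the `apify-client` package is installed, that the Actor ID matches this page's store URL, and that an API token is available in an `APIFY_TOKEN` environment variable; the function name `run_scraper` is illustrative.

```python
import os
from typing import Any, Dict, Iterator, Optional

# Assumed Actor ID, derived from the store URL of this page.
ACTOR_ID = "parseforge/project-gutenberg-books-scraper"


def run_scraper(run_input: Dict[str, Any], token: Optional[str] = None) -> Iterator[Dict[str, Any]]:
    """Start the Actor, wait for it to finish, and yield dataset items."""
    # Imported lazily so this module loads even without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token or os.environ["APIFY_TOKEN"])
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return a result")
    yield from client.dataset(run["defaultDatasetId"]).iterate_items()


# Usage (needs a valid token and network access):
# for book in run_scraper({"maxItems": 100, "query": "shakespeare"}):
#     print(book["gutenbergId"], book["title"])
```

The function is a generator, so nothing is sent to Apify until you start iterating over the result.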

🌟 Beyond business use cases

Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

🎓 Research and academia

Reproducible literary corpora
Versioned text snapshots
Computational linguistics studies
Course material with primary sources

🎨 Personal and creative

Personal ebook collections
Indie reading-app side projects
Newsletter on classic literature
Hobbyist literary databases

🤝 Non-profit and civic

Library digitization projects
Reading-list contributions
Cultural-preservation outreach
Multilingual literacy programs

🧪 Experimentation

Train tokenizers on diverse text
Test text-mining pipelines
Prototype text-recommendation engines
Build literary-analysis dashboards

❓ Frequently Asked Questions

🧩 How does it work?

Provide a query, author, language, or topic filter. The Actor queries the Project Gutenberg catalog and emits one record per book.

📥 Can I download the actual book contents?

The Actor returns metadata and direct download URLs for plain-text, EPUB, Kindle, HTML, and PDF formats. Use those URLs to fetch the actual contents.
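
Fetching the contents can be done with the Python standard library alone. The sketch below saves the plain-text file for one record; the output directory, file naming scheme, and User-Agent string are assumptions chosen for illustration.

```python
import urllib.request
from pathlib import Path
from typing import Any, Dict, Optional


def target_path(record: Dict[str, Any], out_dir: str = "books") -> Path:
    """Local file name derived from the record's gutenbergId."""
    return Path(out_dir) / f"pg{record['gutenbergId']}.txt"


def download_plain_text(
    record: Dict[str, Any], out_dir: str = "books", timeout: float = 30.0
) -> Optional[Path]:
    """Download plainTextUrl to disk; return None if the record has no such URL."""
    url = record.get("plainTextUrl")
    if not url:
        return None
    path = target_path(record, out_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    request = urllib.request.Request(
        url, headers={"User-Agent": "gutenberg-corpus-example/0.1"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        path.write_bytes(response.read())
    return path
```

When downloading many books, add a pause between requests to stay considerate of Project Gutenberg's servers.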

⚖️ Is everything truly public domain?

Yes for the vast majority. The copyright field flags the rare exceptions still under copyright in some jurisdictions.

📊 How many fields per record?

28, including title, authors with birth/death years, cover, all download URLs, subjects, bookshelves, language, and download counts.

🔁 Can I schedule runs?

Yes. New books and translations are added regularly. Schedule weekly to capture additions.

🌐 Which languages are supported?

60+, with strongest coverage in English, French, German, Spanish, Italian, Dutch, Portuguese, and Chinese.

👤 Does it include author biographies?

No, but it returns author birth/death years for period research.

💳 Do I need a paid Apify plan?

No. The free plan covers preview runs. A paid plan unlocks higher item counts and scheduling.

🆘 What if a run fails?

Apify retries transient errors. Partial datasets are preserved.

🎙️ Can I generate audiobooks from this?

Yes. Pull plain-text URLs and pipe through any text-to-speech engine.
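
Project Gutenberg plain-text files typically wrap the book in license text delimited by `*** START OF ...` and `*** END OF ...` marker lines, which you will probably want to drop before sending text to a speech engine. The sketch below assumes those marker lines are present in the usual form; files without them are returned unchanged apart from trimmed whitespace.

```python
import re

# Marker lines as commonly found in Project Gutenberg plain-text files.
START_MARKER = re.compile(
    r"^\*\*\* ?START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE
)
END_MARKER = re.compile(
    r"^\*\*\* ?END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE
)


def strip_gutenberg_boilerplate(text: str) -> str:
    """Return only the text between the START and END marker lines."""
    start = START_MARKER.search(text)
    end = END_MARKER.search(text)
    begin = start.end() if start else 0
    finish = end.start() if end else len(text)
    return text[begin:finish].strip()


raw = (
    "License header\n"
    "*** START OF THE PROJECT GUTENBERG EBOOK HAMLET ***\n\n"
    "To be, or not to be.\n\n"
    "*** END OF THE PROJECT GUTENBERG EBOOK HAMLET ***\n"
    "License footer\n"
)
body = strip_gutenberg_boilerplate(raw)
```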

🔌 Integrate with any app

Project Gutenberg Books Scraper connects to any cloud service via Apify integrations:

Make - Automate multi-step workflows
Zapier - Connect with 5,000+ apps
Slack - Get run notifications in your channels
Airbyte - Pipe data into your warehouse
GitHub - Trigger runs from commits and releases
Google Drive - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes.
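
On the receiving side, a webhook handler mainly needs the dataset ID of the finished run. The sketch below assumes Apify's default webhook payload, with an `eventType` string and a `resource` object describing the run; if you configure a custom payload template, adjust the keys accordingly. The function name is illustrative.

```python
from typing import Any, Dict, Optional


def dataset_id_from_webhook(payload: Dict[str, Any]) -> Optional[str]:
    """Return the run's default dataset ID for successful runs, else None."""
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    resource = payload.get("resource") or {}
    return resource.get("defaultDatasetId")


# Illustrative payload with placeholder IDs.
example_payload = {
    "eventType": "ACTOR.RUN.SUCCEEDED",
    "resource": {"id": "run-123", "defaultDatasetId": "dataset-456"},
}
```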

🔗 Recommended Actors

📖 Open Library Books - 30M+ books and editions
🌐 Wikidata Entity Search - 100M+ open knowledge-graph entities
🎨 Openverse Media - 800M+ openly licensed images and audio
🎓 arXiv Scraper - Academic preprints
🎬 TVMaze TV Shows - TV show metadata

💡 Pro Tip: browse the complete ParseForge collection for more reference-data scrapers.

🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.

⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by Project Gutenberg, the Gutendex project, or any contributing volunteers. All trademarks mentioned are the property of their respective owners. Only publicly available open data is collected.

URL: https://apify.com/parseforge/project-gutenberg-books-scraper
