Free eBook Scraper

Pricing

Pay per usage

Explore and Download Free eBooks - Find and download a wide selection of free eBooks from Project Gutenberg. Search by keywords and language preferences. Discover literary gems in multiple formats.

Rating

5.0

(7)

Developer

epctex

Maintained by Community

Actor stats

Total users: 292

Last modified: 19 hours ago

Actor - Gutenberg.org Scraper

Gutenberg.org Scraper is an Apify actor for extracting ebook data from Gutenberg.org. It lets you search by keyword and pick a language. It is built on top of the Apify SDK, and you can run it both on the Apify platform and locally.

Gutenberg.org Scraper Input Parameters

The input of this scraper is a JSON object that tells it which Gutenberg pages to visit, either as a list of start URLs or as a search keyword and/or language selection. Possible fields are:

Field	Type	Description
search	String	(optional) Keyword to search for on Gutenberg
language	Array	(optional) List of languages offered by Gutenberg; use it to fetch all ebooks in a given language
startUrls	Array	(optional) List of Gutenberg URLs. Provide only "search" or "browse" URLs
maxItems	Integer	(optional) Maximum number of items the output will contain
extendOutputFunction	String	Function that takes a jQuery handle ($) as its argument and returns data that will be merged with the default output. See Extend output function below
proxyConfig	Object	Proxy configuration

This solution requires proxy servers: you can use either your own proxies or Apify Proxy.
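As a rough illustration of the two proxy options, here is a minimal Python sketch that builds a proxyConfig value. It assumes the actor accepts the usual Apify proxy input shape (useApifyProxy, apifyProxyGroups, proxyUrls); those field names and the helper proxy_config are assumptions, not taken from this actor's documentation, and the URL check is a hypothetical sanity test.

```python
from urllib.parse import urlparse


def proxy_config(own_proxy_urls=None, apify_groups=None):
    """Build a proxyConfig value (hypothetical helper; field names assumed)."""
    if own_proxy_urls:
        # Your own proxies: require scheme, host and port in each URL (assumed rule).
        for url in own_proxy_urls:
            parsed = urlparse(url)
            if parsed.scheme not in ("http", "https") or not parsed.hostname or parsed.port is None:
                raise ValueError(f"malformed proxy URL: {url}")
        return {"useApifyProxy": False, "proxyUrls": list(own_proxy_urls)}
    # Otherwise fall back to Apify Proxy, optionally restricted to proxy groups.
    config = {"useApifyProxy": True}
    if apify_groups:
        config["apifyProxyGroups"] = list(apify_groups)
    return config
```

Calling proxy_config() with no arguments yields {"useApifyProxy": true} in the input, which matches the example input below.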

Gutenberg Scraper Input example

{
"proxyConfig":{"useApifyProxy":true},
"startUrls":[
{"url":"https://www.gutenberg.org/browse/recent/last7"},
{"url":"https://www.gutenberg.org/browse/titles/h"}
]
}
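If you generate inputs programmatically, a small helper can assemble the object and reject start URLs that are not "browse" or "search" pages. This is a minimal sketch: build_input is hypothetical, the path prefixes /browse/ and /ebooks/search are assumed patterns for Gutenberg listing pages, and the format expected for language values is not specified here, so they are passed through unchanged.

```python
import json
from urllib.parse import urlparse


def build_input(start_urls=(), search=None, languages=(), max_items=None, proxy=None):
    """Assemble an input object for the actor (hypothetical helper, not part of the actor)."""
    payload = {"proxyConfig": proxy or {"useApifyProxy": True}}
    for url in start_urls:
        parsed = urlparse(url)
        # Only "browse" and "search" pages should be used (assumed URL patterns).
        is_gutenberg = parsed.hostname in ("www.gutenberg.org", "gutenberg.org")
        is_listing = parsed.path.startswith(("/browse/", "/ebooks/search"))
        if not (is_gutenberg and is_listing):
            raise ValueError(f"expected a Gutenberg browse or search URL, got {url}")
    if start_urls:
        payload["startUrls"] = [{"url": u} for u in start_urls]
    if search:
        payload["search"] = search
    if languages:
        payload["language"] = list(languages)
    if max_items is not None:
        payload["maxItems"] = max_items
    return payload


if __name__ == "__main__":
    # Reproduces the example input above.
    print(json.dumps(build_input([
        "https://www.gutenberg.org/browse/recent/last7",
        "https://www.gutenberg.org/browse/titles/h",
    ]), indent=2))
```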

Gutenberg Ebook Output

Each ebook item in the output has the following structure:

{
"author":"United States. National Park Service",
"title":"Cumberland Island: Junior Ranger Program Activity Guide for Ages 5-7",
"language":"English",
"htmlURL":"https://www.gutenberg.org/files/61452/61452-h/61452-h.htm",
"epubURL":"https://www.gutenberg.org/ebooks/61452.epub.images?session_id=24e44a13d40847bb8d8b13a9216689880a3221cf",
"kindleURL":"https://www.gutenberg.org/ebooks/61452.kindle.images?session_id=24e44a13d40847bb8d8b13a9216689880a3221cf",
"plainTextURL":"https://www.gutenberg.org/files/61452/61452-0.txt"
}
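When consuming these items, you will usually want one download link per format. The sketch below maps a format name to the corresponding field and drops the session_id query parameter from the EPUB and Kindle links. The helper name, the format keys, and the assumption that the links work without session_id are illustrative, not documented behavior.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical format names mapped to the output fields shown above.
FORMAT_FIELDS = {"html": "htmlURL", "epub": "epubURL", "kindle": "kindleURL", "text": "plainTextURL"}


def download_url(item, fmt):
    """Return the item's link for `fmt` without the session_id parameter, or None if absent."""
    url = item.get(FORMAT_FIELDS[fmt])
    if not url:
        return None
    parts = urlsplit(url)
    # Keep every query parameter except session_id (assumed to be session-specific).
    query = [(key, value) for key, value in parse_qsl(parts.query) if key != "session_id"]
    return urlunsplit(parts._replace(query=urlencode(query)))
```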

Extend output function

You can use this function to update the default output of this actor. The function receives a jQuery handle $ as its argument, so you can choose which data from the page you want to scrape. The object it returns is merged with the default output.

The return value of this function has to be an object!

You can return fields to achieve three different things:

Add a new field - Return an object with a field that is not in the default output
Change a field - Return an existing field with a new value
Remove a field - Return an existing field with the value undefined
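The three rules above can be modeled as a shallow merge. The actor itself does this in JavaScript; the Python sketch below only illustrates the semantics, under the assumption that the merge is shallow and that a sentinel object can stand in for JavaScript's undefined. The names merge_output, UNDEFINED, and the subjects field are hypothetical.

```python
# Stand-in for JavaScript's `undefined` (an assumption of this model).
UNDEFINED = object()


def merge_output(default, extension):
    """Model of how the extendOutputFunction result is combined with the default item."""
    if not isinstance(extension, dict):
        raise TypeError("extendOutputFunction must return an object")
    # New keys are added and existing keys are overwritten.
    merged = {**default, **extension}
    # Keys set to `undefined` disappear, as they do when a JS object is serialized to JSON.
    return {key: value for key, value in merged.items() if value is not UNDEFINED}
```

For example, returning a changed title, language set to undefined, and a new subjects field yields an item with the new title, no language, and the extra field.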

Compute Unit Consumption

The actor is optimized to run fast and scrape as many items as possible, so it prioritizes all detail-page requests. If the actor is not blocked often, it scrapes ~250 items in about 3 minutes using 0.0235 compute units.
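To budget a larger run, you can extrapolate linearly from that figure. This is a back-of-the-envelope sketch that assumes consumption scales linearly with item count and that blocking stays rare; real runs can differ. The function name estimate is hypothetical.

```python
# Figures from the paragraph above: ~250 items in ~3 minutes for 0.0235 CU.
ITEMS_PER_SAMPLE = 250
CU_PER_SAMPLE = 0.0235
MINUTES_PER_SAMPLE = 3


def estimate(items):
    """Linear extrapolation of compute units and run time (assumes little blocking)."""
    ratio = items / ITEMS_PER_SAMPLE
    return {"compute_units": ratio * CU_PER_SAMPLE, "minutes": ratio * MINUTES_PER_SAMPLE}
```

Under this assumption, 10,000 items come to roughly 0.94 compute units and 120 minutes.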

During the Run

During the run, the actor outputs messages letting you know what is going on. Each message contains a short label indicating which page from the provided list is currently being processed. When items are loaded from a page, you should see a message reporting the loaded item count and the total item count for that page.

If you provide incorrect input, the actor immediately stops with a failed state and outputs an explanation of what is wrong.

Gutenberg Export

During the run, the actor stores results in a dataset. Each ebook is stored as a separate item in the dataset.

You can manage the results in any language (Python, PHP, Node.js/npm). See the FAQ or our API reference to learn more about getting results from this Gutenberg actor.
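For example, in Python you can download a run's dataset through the Apify API's dataset items endpoint using only the standard library. This is a minimal sketch: the dataset ID and API token are placeholders you supply, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(dataset_id, token, fmt="json", limit=None, offset=0):
    """Build the URL of the dataset items endpoint (placeholder ID and token)."""
    params = {"format": fmt, "offset": offset, "token": token}
    if limit is not None:
        params["limit"] = limit
    return f"{API_BASE}/datasets/{dataset_id}/items?{urlencode(params)}"


def fetch_items(dataset_id, token, limit=None):
    """Download dataset items as a list of dicts (requires network access)."""
    with urlopen(dataset_items_url(dataset_id, token, limit=limit)) as response:
        return json.load(response)
```

Passing fmt="csv" or another export format to dataset_items_url returns the same data in that format instead of JSON.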

Contact

Visit epctex.com to see all available products. If you are looking for a custom integration, reach out to us through the chat box on epctex.com. Need support? Contact devops@epctex.com.

You might also like

Gutenberg Books Scraper

fortuitous_pirate/gutenberg-books-scraper

Scrape book metadata from Project Gutenberg: 70,000+ free public domain ebooks. Search by title, author, topic, or language. Returns authors, subjects, formats, and download links.

Fortuitous Pirate

Project Gutenberg Scraper

lulzasaur/gutenberg-scraper

Scrape Project Gutenberg (gutenberg.org). Search 70K+ free public domain ebooks. Extract titles, authors, subjects, download formats (EPUB, Kindle, TXT, HTML), and full metadata.

lulz bot

Project Gutenberg Books Scraper | 70K+ Free eBooks

parseforge/gutendex-project-gutenberg-books-scraper

Export 70,000+ public-domain books from Project Gutenberg via the Gutendex API. Search by keyword, language, topic, or author lifespan, or fetch by book ID. Pull titles, authors, subjects, languages, download links, and full-text formats. Download as CSV, Excel, JSON, or XML.

ParseForge

Project Gutenberg Research Scraper

happyfhantum/project-gutenberg-research-scraper

Exhaustively searches Project Gutenberg's 70,000+ free ebooks using multi-page pagination and smart filtering. Perfect for academic research, finding complete author works, or discovering books on specialized topics. Gets all results, not just the first page.

Kelsey Todd

Project Gutenberg Books Scraper

parseforge/project-gutenberg-books-scraper

Search 75,000+ free public-domain books from Project Gutenberg. Returns title, author with birth/death years, cover image, plain-text and EPUB download URLs, Kindle and HTML formats, subjects, bookshelves, language, copyright status, summaries and download counts. Filter by author or language.

ParseForge

Cloudbeds Ebooks Spider

getdataforme/cloudbeds-ebooks-spider

This Apify Actor automates scraping of Cloudbeds ebooks and reports, extracting titles, subtitles, images, share links, PDF downloads, and content summaries....

GetDataForMe

📨 Free Email Domain Scraper

scrapio/free-email-domain-scraper

📨 Free Email Domain Scraper extracts email domains from any website with ease. Fast, accurate, and free—perfect for lead generation, B2B outreach, and market research. 🚀 Try free today!

Scrapio

📨 Free Email Domain Scraper

simpleapi/free-email-domain-scraper

📧 Free Email Domain Scraper finds and extracts email addresses from any website—fast and easy! 🌐 Great for B2B lead gen, outreach, and research. 💼 Download free-email-domain-scraper and start building targeted lists today!

SimpleAPI

Project Gutenberg Books Scraper

gio21/gutenberg-books-scraper

Scrape public-domain books from Project Gutenberg via the Gutendex API. Filter by topic, author, language, search query. Returns title, authors, languages, copyright, download_count, formats (EPUB, MOBI, TXT, HTML), subjects, bookshelves. Pay per book returned.

Gio

Free Google Hotels Scraper — Search + Prices

s-r/free-google-hotels-scraper

Related articles

Top 8 free or low-cost proxies for web scraping

URL: https://apify.com/epctex/gutenberg-scraper
