LibriVox Audiobooks Scraper

Pricing: from $10.00 / 1,000 result items

Pull free public domain audiobooks from LibriVox: title, author, narrator, language, runtime, chapter count, genre, copyright year, description, RSS feed, and MP3 download URLs. Export to JSON, CSV, or Excel for educators, podcasters, language learners, and audio content libraries.

Rating: 0.0 (0 ratings)

Developer: ParseForge

Maintained by Community

Last modified: a month ago

📚 LibriVox Audiobooks Scraper

🚀 Export the world's largest public-domain audiobook library as structured data. Browse 20,000+ free audiobooks from LibriVox, filter by title, author, language, or genre, and (with extended mode) pull every section's reader credit, runtime, and direct audio URL. No login and no manual catalog scraping.

🕒 Last updated: 2026-05-23 · 📊 23 fields per record · 📚 20,000+ audiobooks · 🎤 100k+ volunteer readings · 🌍 multilingual catalog

The LibriVox Audiobooks Scraper queries the LibriVox catalog and returns 23 structured fields per audiobook, including the title, author list, primary author, language, copyright year, runtime in human-readable and seconds form, description, genres, translators, plus direct links to the LibriVox page, RSS podcast feed, ZIP download, Internet Archive page, and original Project Gutenberg text source. Optional extended mode adds the full sections list with per-section reader credits, individual playtimes, and per-file audio URLs.

The catalog includes classic literature read aloud (Project Gutenberg titles), original LibriVox productions, multilingual works, and short-story collections. Most readings are in English, but the project covers dozens of other languages, including French, German, Spanish, Italian, Dutch, Portuguese, Latin, Japanese, and Mandarin. This Actor turns up to 1,000 catalog records into clean CSV, Excel, JSON, or XML in under five minutes.

🎯 Target Audience	💡 Primary Use Cases
Audiobook app developers, podcast networks, education and EdTech teams, accessibility specialists, public libraries, audio content curators	Stock audiobook apps with free content, classroom listening assignments, accessibility libraries for visually impaired users, podcast feed generation, language-learning audio decks

📋 What the LibriVox Audiobooks Scraper does

Four combinable filters plus one output option in a single run:

🔎 Title substring search. Case-insensitive title match, e.g. `pride` or `monte cristo`.
✍️ Author substring search. Match by author last name or full name.
🌐 Language filter. Filter to a specific LibriVox language (English, French, German, Spanish, Japanese, etc.).
🎭 Genre filter. Pick one of 28 LibriVox genres (Romance, Crime & Mystery, Philosophy, Poetry, and more).
📑 Extended mode toggle. When enabled, each record carries the full sections list with reader credits, runtimes, and audio URLs.

Each record includes the LibriVox ID, title, full author list, primary author, language, copyright year, section count, total runtime (human-readable and seconds), full description (plain text and HTML), genres, translators when applicable, and the canonical LibriVox page, RSS podcast feed, ZIP archive URL, Project Gutenberg source link, Internet Archive page, and any other reference URLs.

💡 Why it matters: LibriVox is the canonical free audiobook archive, but its catalog page is paginated and its section structure is nested under each book. Building your own crawler means walking thousands of pages and threading section lookups. This Actor returns everything as flat structured rows ready for a database or content platform.
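To illustrate "ready for a database", here is a minimal sketch that loads downloaded records into SQLite with Python's standard library. It assumes the JSON export has already been parsed into dicts that use the field names from the schema in the Output section; the `audiobooks` table name and the choice of columns are hypothetical, and array fields such as `genres` are stored as JSON text.

```python
import json
import sqlite3
from typing import Any, Iterable


def load_records(conn: sqlite3.Connection, records: Iterable[dict[str, Any]]) -> int:
    """Upsert audiobook records into a hypothetical `audiobooks` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS audiobooks ("
        "librivoxId TEXT PRIMARY KEY, title TEXT, primaryAuthor TEXT, "
        "language TEXT, copyrightYear TEXT, totalTimeSeconds INTEGER, "
        "genres TEXT, urlRss TEXT)"
    )
    rows = [
        (
            r.get("librivoxId"),
            r.get("title"),
            r.get("primaryAuthor"),
            r.get("language"),
            r.get("copyrightYear"),
            r.get("totalTimeSeconds"),
            json.dumps(r.get("genres") or []),  # arrays kept as JSON text
            r.get("urlRss"),
        )
        for r in records
    ]
    # INSERT OR REPLACE overwrites an existing row with the same librivoxId.
    conn.executemany(
        "INSERT OR REPLACE INTO audiobooks VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)
```

Keying the table on `librivoxId` with `INSERT OR REPLACE` means a scheduled refresh updates existing rows instead of duplicating them.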

🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded audiobook catalog.

⚙️ Input

Input	Type	Default	Behavior
maxItems	integer	10	Audiobooks to return. Free plan caps at 10, paid plan at 1,000,000.
title	string	""	Case-insensitive title substring.
author	string	""	Author last-name substring.
language	string	""	Language name as used by LibriVox.
genre	string	""	One of 28 LibriVox genres.
extended	boolean	true	When true, include sections list with reader credits and audio URLs.

Example: 50 Jane Austen audiobooks in English.

```json
{
  "maxItems": 50,
  "author": "austen",
  "language": "English",
  "extended": true
}
```

Example: 20 French-language poetry audiobooks with full section list.

```json
{
  "maxItems": 20,
  "language": "French",
  "genre": "Poetry",
  "extended": true
}
```

⚠️ Good to Know: LibriVox sections are individual chapters or tracks read by volunteer narrators. When extended is true, the per-section reader credit, length, and audio file URL are exposed for every track in the book. Expect 10-100 sections per long novel.
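If you need one row per track (for a CSV of chapter MP3s, for example), a minimal Python sketch like the one below can flatten the nested `sections` array. It assumes each section uses the keys shown in the schema example (`chapter`, `reader`, `playtime`, `audioUrl`); the `position` column is a hypothetical addition that records track order.

```python
import csv
import io
from typing import Any

TRACK_FIELDS = ["librivoxId", "title", "position", "chapter", "reader", "playtime", "audioUrl"]


def flatten_sections(record: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one row per section; empty when extended mode was off."""
    rows = []
    for position, section in enumerate(record.get("sections") or [], start=1):
        rows.append({
            "librivoxId": record.get("librivoxId"),
            "title": record.get("title"),
            "position": position,  # hypothetical 1-based track order
            "chapter": section.get("chapter"),
            "reader": section.get("reader"),
            "playtime": section.get("playtime"),
            "audioUrl": section.get("audioUrl"),
        })
    return rows


def tracks_to_csv(records: list[dict[str, Any]]) -> str:
    """Serialize the flattened tracks of many records to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=TRACK_FIELDS)
    writer.writeheader()
    for record in records:
        writer.writerows(flatten_sections(record))
    return buffer.getvalue()
```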

📊 Output

Each audiobook record contains 23 fields. Download the dataset as CSV, Excel, JSON, or XML.

🧾 Schema

Field	Type	Example
🆔 `librivoxId`	string	`"100"`
📚 `title`	string	`"Pride and Prejudice"`
✍️ `authors`	array	`[{"first_name":"Jane","last_name":"Austen"}]`
👤 `primaryAuthor`	string	`"Jane Austen"`
🌐 `language`	string	`"English"`
📅 `copyrightYear`	string	`"1813"`
🔢 `numSections`	number	`61`
⏱️ `totalTime`	string	`"11:35:46"`
⏲️ `totalTimeSeconds`	number	`41746`
📝 `description`	string	`"Pride and Prejudice is the second novel by Jane Austen..."`
📄 `descriptionHtml`	string	`"<p>Pride and Prejudice is the second novel..."`
🎭 `genres`	array	`["General Fiction", "Romance"]`
🌍 `translators`	array	`[]`
🔗 `urlLibrivox`	string	`"https://librivox.org/pride-and-prejudice-by-jane-austen/"`
📻 `urlRss`	string	`"https://librivox.org/rss/100"`
📦 `urlZipFile`	string	`"https://www.archive.org/.../pride_and_prejudice_64kb_mp3.zip"`
🌐 `urlProject`	string | null	`"https://www.gutenberg.org/ebooks/1342"`
🔗 `urlOther`	string | null	`null`
📚 `urlInternetArchive`	string	`"https://archive.org/details/pride_and_prejudice_0809_librivox"`
📖 `urlTextSource`	string | null	`"https://www.gutenberg.org/files/1342/1342-h/1342-h.htm"`
🔢 `sectionsCount`	number	`61`
🎤 `sections`	array	`[{"chapter":"Chapter 1","reader":"Karen Savage","playtime":"00:14:23","audioUrl":"..."}]`
🕒 `scrapedAt`	ISO 8601	`"2026-05-23T00:00:00.000Z"`
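The two runtime fields carry the same value in different forms: for the example record, 11 × 3600 + 35 × 60 + 46 = 41,746 seconds. Below is a small sketch for converting between them and for summing per-section `playtime` strings in extended mode, for comparison with the book total. It assumes every value uses the `HH:MM:SS` layout shown above; the function names are illustrative.

```python
def hms_to_seconds(value: str) -> int:
    """Convert an "HH:MM:SS" runtime string to whole seconds."""
    parts = value.strip().split(":")
    if len(parts) != 3:
        raise ValueError(f"expected HH:MM:SS, got {value!r}")
    hours, minutes, seconds = (int(p) for p in parts)
    return hours * 3600 + minutes * 60 + seconds


def seconds_to_hms(total: int) -> str:
    """Inverse of hms_to_seconds, zero-padded like totalTime."""
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def section_seconds(sections: list[dict]) -> int:
    """Sum per-section playtimes, skipping sections without one."""
    return sum(hms_to_seconds(s["playtime"]) for s in sections if s.get("playtime"))


print(hms_to_seconds("11:35:46"))  # 41746
print(seconds_to_hms(41746))       # 11:35:46
```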

✨ Why choose this Actor

	Capability
📚	20,000+ audiobook catalog. Every public-domain title LibriVox has produced.
🎯	Multi-dimensional filters. Title, author, language, and genre combine in a single run.
🎤	Per-section reader credits. Extended mode exposes chapter-level narrator, runtime, and audio URL.
📻	RSS podcast feeds included. Drop straight into a podcast player.
⚡	Fast. 10 audiobooks in under 5 seconds, 1,000 in under 5 minutes.
🔁	Always fresh. Live catalog reads on every run.
🚫	No authentication. Public archive, no key required.

📊 LibriVox is the canonical free audiobook library and a foundation for any audio content product targeting public-domain works.

📈 How it compares to alternatives

Approach	Cost	Coverage	Refresh	Filters	Setup
⭐ LibriVox Audiobooks Scraper (this Actor)	$5 free credit, then pay-per-use	20,000+ audiobooks	Live per run	title, author, language, genre	⚡ 2 min
Commercial audiobook libraries	$14.95+/month	Curated paid catalog	Daily	Limited	🐢 Days
Custom site scraper	Engineering time only	Full	Cron-driven	Hand-built	⏳ Weeks
Per-book browsing	Free	One book at a time	Manual	UI only	🕒 Painful

Pick this Actor when you want a clean, filterable feed of the entire LibriVox catalog with zero parser maintenance.

🚀 How to use

📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
🌐 Open the Actor. Go to the LibriVox Audiobooks Scraper page on the Apify Store.
🎯 Set input. Add optional title, author, language, or genre filters. Leave extended on (the default) for per-section data, or turn it off for book-level records only.
🚀 Run it. Click Start and let the Actor collect catalog records.
📥 Download. Grab your results from the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.

💼 Business use cases

🎧 Audiobook Apps

Stock a freemium audiobook app with public-domain titles
Build a kids' audiobook section from children's fiction
Generate themed audio libraries (romance, mystery, classics)
Add multilingual content packs to existing apps

🎙️ Podcast Networks

Spin up "classic literature" podcast feeds from RSS URLs
Curate themed reading series for syndication
Source content for an audio newsletter or daily-listen app
Build chaptered podcast versions of long novels
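Each record's `urlRss` points to a podcast-style feed. Here is a minimal sketch with Python's standard library that lists episode titles and MP3 enclosure URLs, assuming the feed follows plain RSS 2.0 with one `<enclosure>` per `<item>`. `SAMPLE_FEED` is a made-up stand-in for a real feed so the parser can be tried offline.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Union

# Made-up stand-in for a real feed such as a record's urlRss.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Sample audiobook</title>
<item><title>01 - Chapter 1</title>
<enclosure url="https://example.org/ch01.mp3" length="1000" type="audio/mpeg"/></item>
<item><title>02 - Chapter 2</title>
<enclosure url="https://example.org/ch02.mp3" length="2000" type="audio/mpeg"/></item>
</channel></rss>"""


def episodes_from_rss(xml_data: Union[str, bytes]) -> list[tuple[str, str]]:
    """Return (episode title, enclosure URL) pairs in feed order."""
    root = ET.fromstring(xml_data)
    episodes = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None or not enclosure.get("url"):
            continue  # skip items without an audio file
        title = (item.findtext("title") or "").strip()
        episodes.append((title, enclosure.get("url")))
    return episodes


def fetch_feed(url: str) -> bytes:
    """Download a feed as raw bytes so the XML declaration decides the encoding."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()
```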

🎓 Education & Accessibility

Stock classroom listening assignments for literature classes
Build accessibility libraries for visually impaired users
Augment ESL programs with audio for graded readers
Provide reading-along audio for early literacy programs
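For classroom or graded-reader use, a quick post-filter on a downloaded dataset can pick short listens. This sketch assumes records carry `language`, `genres`, and `totalTimeSeconds` as in the schema; the three-hour default and the function name are illustrative choices, not Actor inputs.

```python
from typing import Any, Optional


def short_listens(
    records: list[dict[str, Any]],
    language: str = "English",
    max_hours: float = 3.0,
    genre: Optional[str] = None,
) -> list[dict[str, Any]]:
    """Pick audiobooks in one language that run at most max_hours, shortest first."""
    limit = max_hours * 3600
    picks = []
    for record in records:
        seconds = record.get("totalTimeSeconds")
        if not isinstance(seconds, (int, float)):
            continue  # runtime missing: cannot judge length
        if record.get("language") != language or seconds > limit:
            continue
        if genre is not None and genre not in (record.get("genres") or []):
            continue
        picks.append(record)
    return sorted(picks, key=lambda r: r["totalTimeSeconds"])
```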

📚 Library Apps & Catalogs

Add a free audiobook collection to a digital library catalog
Source ISBN-less audiobook records for cataloging projects
Build a "listen-along" companion to a Gutenberg etext app
Curate themed reading lists with audio companions

🔌 Automating LibriVox Audiobooks Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

🟢 Node.js. Install the apify-client NPM package.
🐍 Python. Use the apify-client PyPI package.
📚 See the Apify API documentation for full details.

The Apify Schedules feature lets you trigger this Actor on any cron interval. Weekly refreshes keep a downstream audiobook catalog topped up with the latest LibriVox publications.
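A minimal Python sketch using the `apify-client` package: it starts a run with the input fields documented above and reads the resulting dataset. The Actor ID is taken from this page's URL; reading the token from an `APIFY_TOKEN` environment variable and the helper names are assumptions. It is written against the dict-returning `ApifyClient.actor().call()` and `dataset().iterate_items()` calls, so check them against the client version you install.

```python
import os
from typing import Any

ACTOR_ID = "parseforge/librivox-audiobooks-scraper"  # from this page's URL


def build_run_input(
    max_items: int = 50,
    title: str = "",
    author: str = "",
    language: str = "",
    genre: str = "",
    extended: bool = True,
) -> dict[str, Any]:
    """Assemble run input using the field names from the Input table."""
    if max_items < 1:
        raise ValueError("max_items must be at least 1")
    run_input: dict[str, Any] = {"maxItems": max_items, "extended": extended}
    for key, value in (("title", title), ("author", author), ("language", language), ("genre", genre)):
        if value:  # leave unused filters out instead of sending empty strings
            run_input[key] = value
    return run_input


def run_scraper(run_input: dict[str, Any]) -> list[dict[str, Any]]:
    """Start a run, wait for it to finish, and return all dataset items."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])  # assumed env var name
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("the run did not return run details")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

With `APIFY_TOKEN` set, `run_scraper(build_run_input(author="austen", language="English"))` returns the records as a list of dicts.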

🌟 Beyond business use cases

Public-domain audiobooks power more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

🎓 Research and academia

Reception studies of classic literature
Audiobook narration and voice-acting research
Reproducible corpora citing exact dataset pulls
Cross-language comparative literature with audio

🎨 Personal and creative

Curated bedtime story collections for parents
Mood-based reading lists for hobbyist apps
Visualization dashboards of reader hours by language
Themed playlists for road trips or long walks

🤝 Non-profit and civic

Free audio libraries for under-resourced schools
Audio access for visually impaired community members
Senior-living center listening programs
Language-revitalization audio packs for minority tongues

🧪 Experimentation

Train automatic speech recognition on volunteer narration
Build alignment datasets pairing audio with Gutenberg text
Prototype voice-cloning research with diverse readers
Test podcast-publishing pipelines with real RSS feeds

❓ Frequently Asked Questions

🧩 How does it work?

Pick optional title, author, language, or genre filters and choose whether to include per-section detail. Click Start and the Actor returns clean rows with audio links, RSS feeds, ZIP archives, and reader credits.

📏 How complete is the metadata?

LibriVox metadata is curated by volunteer catalogers. Most fields are populated, with the occasional gap for very old additions. The description, genres, and section reader credits are typically complete.
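To see where gaps fall in your own pull, a short completeness check helps. This sketch counts records whose selected fields are missing, null, blank, or empty arrays; the default field list is an arbitrary example drawn from the schema.

```python
from collections import Counter
from typing import Any, Iterable

DEFAULT_FIELDS = ("primaryAuthor", "copyrightYear", "description", "genres", "urlRss", "urlZipFile")


def is_empty(value: Any) -> bool:
    """Treat None, blank strings, and empty lists/dicts as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, dict)):
        return not value
    return False


def missing_field_report(
    records: Iterable[dict[str, Any]], fields: tuple[str, ...] = DEFAULT_FIELDS
) -> dict[str, int]:
    """Count, per field, how many records leave it empty (fields with no gaps are omitted)."""
    counts: Counter = Counter()
    for record in records:
        for field in fields:
            if is_empty(record.get(field)):
                counts[field] += 1
    return {field: counts[field] for field in fields if counts[field]}
```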

🔁 How often is the catalog refreshed?

LibriVox publishes new audiobooks weekly. Every Actor run hits the live catalog, so new releases appear in your dataset right away.

🌐 Which languages are supported?

Most audiobooks are in English, but LibriVox covers dozens of other languages including French, German, Spanish, Italian, Dutch, Portuguese, Latin, Japanese, Mandarin, and more. Use the language input to filter.

🎤 Do I get per-chapter audio URLs?

Yes, when extended is true. Each section carries the chapter name, reader credit, playtime, and a direct MP3 URL hosted by the Internet Archive.

⏰ Can I schedule regular runs?

Yes. Use Apify Schedules to trigger this Actor on any cron interval (weekly is recommended for new releases).

⚖️ Is this content legal to use?

Yes. LibriVox audiobooks are public domain in the United States. Source texts are also typically Project Gutenberg public-domain works. Always verify the copyright status in your jurisdiction.

💼 Can I use these audiobooks commercially?

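Yes. LibriVox dedicates its recordings to the public domain, so you can use, remix, and resell them. LibriVox only checks that the underlying texts are public domain in the United States, so confirm their status in your jurisdiction before commercial use elsewhere. Attribution to the volunteer readers is a nice courtesy.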
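Since attribution is optional but appreciated, here is a sketch that builds a credit line from the per-section reader names (extended mode only). The wording of the line and the `max_names` cutoff are hypothetical choices; collaborative recordings can have many readers, so the list is truncated.

```python
from typing import Any


def reader_credits(record: dict[str, Any]) -> list[str]:
    """Unique reader names in first-appearance order (needs extended mode)."""
    readers: list[str] = []
    for section in record.get("sections") or []:
        name = (section.get("reader") or "").strip()
        if name and name not in readers:
            readers.append(name)
    return readers


def attribution_line(record: dict[str, Any], max_names: int = 5) -> str:
    """Build a short, hypothetical credit line for show notes or an app screen."""
    title = record.get("title") or "Untitled"
    readers = reader_credits(record)
    if not readers:
        return f"{title}: a LibriVox recording (public domain)."
    names = ", ".join(readers[:max_names])
    extra = len(readers) - max_names
    if extra > 0:
        names += f" and {extra} other reader{'s' if extra > 1 else ''}"
    return f"{title}, read by {names} for LibriVox (public domain)."
```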

💳 Do I need a paid Apify plan to use this Actor?

No. The free plan covers testing and small runs (10 records per run). A paid plan unlocks the higher cap, scheduling, and concurrency.

🔁 What happens if a run fails or gets interrupted?

Apify retries transient errors automatically. If a run still fails, inspect the log, fix the input, and restart. Partial datasets are preserved.

🆘 What if I need help?

Our support team is here. Use Apify platform messaging or the contact form mentioned at the end of this page.

🔌 Integrate with any app

LibriVox Audiobooks Scraper connects to any cloud service via Apify integrations:

Make - Automate multi-step workflows
Zapier - Connect with 5,000+ apps
Slack - Get run notifications in your channels
Airbyte - Pipe audiobook data into your warehouse
GitHub - Trigger runs from commits and releases
Google Drive - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes. Push fresh audiobook records into your catalog or alert your content team in Slack.
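Here is a minimal sketch of the receiving side of a webhook, assuming Apify's default payload template, which includes `eventType` and a `resource` object describing the run (including `defaultDatasetId`). Only the JSON handling is shown; plugging it into a web framework and verifying the request origin are left out.

```python
import json
from typing import Optional, Union


def dataset_id_from_webhook(body: Union[bytes, str]) -> Optional[str]:
    """Return the finished run's dataset ID, or None for other events."""
    payload = json.loads(body)
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None  # failed, aborted, or timed-out runs: nothing to import
    resource = payload.get("resource") or {}
    return resource.get("defaultDatasetId")
```

The returned ID can then be used to read the new records through the Apify API or client and push them into your catalog.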

🔗 Recommended Actors

🏛️ Library of Congress Scraper - 170M+ digitized cultural records
🗣️ Tatoeba Sentence Corpus Scraper - 12M+ multilingual example sentences
🌐 MyMemory Translation Scraper - Bulk text translation across 70+ language codes
📰 ArXiv Scraper - Academic preprints with metadata
🎨 Met Museum Scraper - Open-access artworks from The Met

💡 Pro Tip: browse the complete ParseForge collection for more reference-data scrapers.

🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.

⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by the LibriVox project. All trademarks mentioned are the property of their respective owners. Only publicly available catalog data is collected. LibriVox audiobooks are dedicated to the public domain.

Amiibo Scraper

fortuitous_pirate/amiibo-scraper

Fetch free public domain audiobooks from LibriVox. Search by title, author, genre, or language. Returns metadata including authors, genres, download links, and total duration. No API key required.

Fortuitous Pirate

Spotify Audiobooks Search and Scraper 📚

apiharvest/spotify-audiobooks-search-and-scraper

📚 Scrape Spotify audiobooks with full chapter listings, narrator names, publisher, star ratings, accessibility data, pricing & content descriptions. Turn on Fetch Details for paginated chapters and similar audiobook recommendations. Complete audiobook metadata from Spotify.

APIHarvest

Youtube Video and MP3 Downloader

kingscraper/youtube-video-and-mp3-downloader

🎥📥 YouTube Video & MP3 Downloader 🎵⚡📹 Extract videos, 🎧 MP3 audio, 📝 subtitles & 🖼️ thumbnails from YouTube. 🚀 Batch download 100+ videos with 📊 complete metadata, 📈 statistics & 📺 channel details. ✨ Perfect for 👨‍💻 content creators, 🎙️ podcasters & 🔬 researchers!

King Scraper

Duolingo Language Data Scraper | Course Vocabulary Export

parseforge/duolingo-language-data-scraper

Export Duolingo language course skills, lexemes and translations. Specify source and target language codes to pull the vocabulary set learners encounter. Useful for linguistics research, language app builders and translation tooling. CSV, Excel, JSON or XML.

ParseForge

YouTube Mp3/Audio Downloader

codenest/youtube-mp3-audio-downloader

Easily and quickly extract high-quality MP3/audio from YouTube videos & Shorts! 🎵 Get multiple formats, bitrates, and full metadata. Perfect for podcasters 🎙️, musicians 🎶, educators 📚, and content creators. Batch download audio with crystal-clear quality! 🚀

CodeNest

Project Gutenberg Books Scraper

parseforge/project-gutenberg-books-scraper

Search 75,000+ free public-domain books from Project Gutenberg. Returns title, author with birth/death years, cover image, plain-text and EPUB download URLs, Kindle and HTML formats, subjects, bookshelves, language, copyright status, summaries and download counts. Filter by author or language.

ParseForge

iTunes & Apple Store Search Scraper

parseforge/itunes-apple-search-scraper

Search the iTunes and App Store catalog for apps, music, podcasts, movies, TV shows, audiobooks, and ebooks. Capture title, artist, price, genre, ratings, release date, artwork, preview URL, and store ID. Export to JSON, CSV, or Excel for market research, ASO, and content analytics.

ParseForge

RSS Feed Scraper & RSS to JSON Converter

xtech/feed-extractor

Scrape and parse RSS, Atom, JSON Feed (and podcast RSS) URLs into clean, structured JSON. Outputs one dataset row per feed entry/item for easy export to CSV/JSON and automations.

Xtech

Rss Feed Scraper

technicaldost/rss-feed-scraper

Technical Dost Solutions

GitHub Trending Repos Scraper

parseforge/github-trending-scraper

Pull GitHub trending repositories with stars, forks, language, description, contributors, license, topics, and full repo metadata. Choose daily, weekly, or monthly windows and filter by programming language or spoken language. Export to JSON, CSV, or Excel for developer intelligence and tech trends.

ParseForge

URL: https://apify.com/parseforge/librivox-audiobooks-scraper