
URL: https://apify.com/parseforge/librivox-audiobooks-scraper

LibriVox Audiobooks Scraper · Apify


Pricing: from $10.00 / 1,000 result items

LibriVox Audiobooks Scraper

Pull free public domain audiobooks from LibriVox: title, author, narrator, language, runtime, chapter count, genre, copyright year, description, RSS feed, and MP3 download URLs. Export to JSON, CSV, or Excel for educators, podcasters, language learners, and audio content libraries.


Rating: 0.0 (0 reviews)

Developer: ParseForge (maintained by Community)

Actor stats: 0 bookmarked · 2 total users · 1 monthly active user · last modified a month ago


📚 LibriVox Audiobooks Scraper

🚀 Export the world's largest public-domain audiobook library in seconds. Browse 20,000+ free audiobooks from LibriVox, filter by title, author, language, or genre, and (in extended mode) pull every section's reader credit, runtime, and direct audio URL. No login, no manual catalog scrape.

🕒 Last updated: 2026-05-23 · 📊 23 fields per record · 📚 20,000+ audiobooks · 🎤 100k+ volunteer readings · 🌍 multilingual catalog

The LibriVox Audiobooks Scraper queries the LibriVox catalog and returns 23 structured fields per audiobook, including the title, author list, primary author, language, copyright year, runtime in human-readable and seconds form, description, genres, translators, plus direct links to the LibriVox page, RSS podcast feed, ZIP download, Internet Archive page, and original Project Gutenberg text source. Optional extended mode adds the full sections list with per-section reader credits, individual playtimes, and per-file audio URLs.

The catalog includes classic literature read aloud (Project Gutenberg titles), original LibriVox productions, multilingual works, and short-story collections. Most readings are in English, but the project covers dozens of other languages including French, German, Spanish, Italian, Dutch, Portuguese, Latin, Japanese, and Mandarin. This Actor turns the catalog into clean CSV, Excel, JSON, or XML in under five minutes.

🎯 Target audience: audiobook app developers, podcast networks, education and EdTech teams, accessibility specialists, public libraries, and audio content curators.
💡 Primary use cases: stocking audiobook apps with free content, classroom listening assignments, accessibility libraries for visually impaired users, podcast feed generation, and language-learning audio decks.

📋 What the LibriVox Audiobooks Scraper does

Five controls, combinable in a single run:

  • 🔎 Title substring search. Case-insensitive title match, such as "pride" or "monte cristo".
  • ✏️ Author substring search. Match by author last name or full name.
  • 🌍 Language filter. Restrict results to one LibriVox language (English, French, German, Spanish, Japanese, etc.).
  • 🎭 Genre filter. Pick one of 28 LibriVox genres (Romance, Crime & Mystery, Philosophy, Poetry, and more).
  • 📑 Extended mode toggle. When enabled, each record carries the full sections list with reader credits, runtimes, and audio URLs.
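The four filters behave like case-insensitive matches combined with AND. The sketch below applies that logic to records you have already downloaded. It is an illustration under that assumption, not the Actor's internal code. `matches_filters` is a hypothetical helper. Language and genre are assumed to be exact matches, and title and author substring matches. Field names follow the output schema.

```python
def matches_filters(record, title="", author="", language="", genre=""):
    """Return True if a record passes every non-empty filter (hypothetical helper)."""
    # Title: case-insensitive substring match, e.g. "pride".
    if title and title.lower() not in record.get("title", "").lower():
        return False
    # Author: substring match against "first last" for any listed author.
    if author:
        names = [
            f'{a.get("first_name", "")} {a.get("last_name", "")}'.lower()
            for a in record.get("authors", [])
        ]
        if not any(author.lower() in name for name in names):
            return False
    # Language: assumed to be an exact, case-insensitive match on the language name.
    if language and record.get("language", "").lower() != language.lower():
        return False
    # Genre: assumed to pass if any of the record's genres equals the filter.
    if genre and genre.lower() not in (g.lower() for g in record.get("genres", [])):
        return False
    return True
```

For example, `matches_filters(book, author="austen", language="English")` keeps the Pride and Prejudice record from the schema below.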

Each record includes the LibriVox ID, title, full author list, primary author, language, copyright year, section count, total runtime (human-readable and seconds), full description (plain text and HTML), genres, translators when applicable, and the canonical LibriVox page, RSS podcast feed, ZIP archive URL, Project Gutenberg source link, Internet Archive page, and any other reference URLs.

💡 Why it matters: LibriVox is the canonical free audiobook archive, but its catalog page is paginated and its section structure is nested under each book. Building your own crawler means walking thousands of pages and threading section lookups. This Actor returns one structured record per audiobook, ready for a database or content platform.
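Each record is one row per audiobook, but in extended mode the `sections` field is a nested array. Loading chapters into a relational table therefore needs one more flattening step on your side. Here is a minimal sketch of that step. It assumes the section keys shown in the schema example (`chapter`, `reader`, `playtime`, `audioUrl`). `flatten_sections` and `rows_to_csv` are hypothetical helper names, and `sectionNumber` is a derived column that the Actor does not output.

```python
import csv
import io


def flatten_sections(record):
    """Turn one audiobook record into one row per section (hypothetical helper)."""
    rows = []
    for index, section in enumerate(record.get("sections", []), start=1):
        rows.append({
            "librivoxId": record["librivoxId"],
            "title": record["title"],
            "primaryAuthor": record.get("primaryAuthor", ""),
            "sectionNumber": index,  # derived: 1-based position in the sections list
            "chapter": section.get("chapter", ""),
            "reader": section.get("reader", ""),
            "playtime": section.get("playtime", ""),
            "audioUrl": section.get("audioUrl", ""),
        })
    return rows


def rows_to_csv(rows):
    """Serialize flat rows to CSV text, using the first row's keys as the header."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```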


🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded audiobook catalog.


โš™๏ธ Input

InputTypeDefaultBehavior
maxItemsinteger10Audiobooks to return. Free plan caps at 10, paid plan at 1,000,000.
titlestring""Case-insensitive title substring.
authorstring""Author last-name substring.
languagestring""Language name as used by LibriVox.
genrestring""One of 28 LibriVox genres.
extendedbooleantrueWhen true, include sections list with reader credits and audio URLs.

Example: 50 Jane Austen audiobooks in English.

{
  "maxItems": 50,
  "author": "austen",
  "language": "English",
  "extended": true
}

Example: 20 French-language poetry audiobooks with full section list.

{
  "maxItems": 20,
  "language": "French",
  "genre": "Poetry",
  "extended": true
}
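If you generate these inputs in code, a small builder can apply the documented defaults and plan caps before the run starts. The sketch below is one way to do that. `build_run_input` is a hypothetical helper. Clamping to the cap is a client-side assumption: whether the Actor itself clamps or rejects larger values is not documented here.

```python
FREE_PLAN_MAX_ITEMS = 10          # per the input table above
PAID_PLAN_MAX_ITEMS = 1_000_000


def build_run_input(max_items=10, title="", author="", language="", genre="",
                    extended=True, paid_plan=False):
    """Build an input dict using the documented field names (hypothetical helper)."""
    if max_items < 1:
        raise ValueError("maxItems must be a positive integer")
    cap = PAID_PLAN_MAX_ITEMS if paid_plan else FREE_PLAN_MAX_ITEMS
    run_input = {"maxItems": min(max_items, cap), "extended": extended}
    # Only send the optional string filters that were actually set.
    for key, value in (("title", title), ("author", author),
                       ("language", language), ("genre", genre)):
        if value:
            run_input[key] = value
    return run_input
```

`build_run_input(50, author="austen", language="English", paid_plan=True)` reproduces the first JSON example above.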

โš ๏ธ Good to Know: LibriVox sections are individual chapters or tracks read by volunteer narrators. When extended is true the per-section reader credit, length, and audio file URL are exposed for every track in the book. Expect 10-100 sections per long novel.


📊 Output

Each audiobook record contains 23 fields. Download the dataset as CSV, Excel, JSON, or XML.

🧾 Schema

| Field | Type | Example |
|---|---|---|
| 🆔 librivoxId | string | "100" |
| 📚 title | string | "Pride and Prejudice" |
| ✏️ authors | array | [{"first_name":"Jane","last_name":"Austen"}] |
| 👤 primaryAuthor | string | "Jane Austen" |
| 🌍 language | string | "English" |
| 📅 copyrightYear | string | "1813" |
| 🔢 numSections | number | 61 |
| ⏱️ totalTime | string | "11:35:46" |
| ⏲️ totalTimeSeconds | number | 41746 |
| 📝 description | string | "Pride and Prejudice is the second novel by Jane Austen..." |
| 📄 descriptionHtml | string | "<p>Pride and Prejudice is the second novel..." |
| 🎭 genres | array | ["General Fiction", "Romance"] |
| 🌍 translators | array | [] |
| 🔗 urlLibrivox | string | "https://librivox.org/pride-and-prejudice-by-jane-austen/" |
| 📻 urlRss | string | "https://librivox.org/rss/100" |
| 📦 urlZipFile | string | "https://www.archive.org/.../pride_and_prejudice_64kb_mp3.zip" |
| 🌐 urlProject | string or null | "https://www.gutenberg.org/ebooks/1342" |
| 🔗 urlOther | string or null | null |
| 📚 urlInternetArchive | string | "https://archive.org/details/pride_and_prejudice_0809_librivox" |
| 📖 urlTextSource | string or null | "https://www.gutenberg.org/files/1342/1342-h/1342-h.htm" |
| 🔢 sectionsCount | number | 61 |
| 🎤 sections | array | [{"chapter":"Chapter 1","reader":"Karen Savage","playtime":"00:14:23","audioUrl":"..."}] |
| 🕒 scrapedAt | ISO 8601 | "2026-05-23T00:00:00.000Z" |
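The sample `totalTime` and `totalTimeSeconds` values agree: 11 × 3600 + 35 × 60 + 46 = 41,746 seconds. The sketch below shows that conversion. It also sums per-section `playtime` values, which you could use as a sanity check against `totalTimeSeconds`. Whether the sums always match exactly is not guaranteed here. Both helper names are hypothetical.

```python
def hms_to_seconds(value):
    """Convert an "HH:MM:SS" (or "MM:SS") string, such as totalTime or a playtime, to seconds."""
    seconds = 0
    for part in value.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def sections_runtime(record):
    """Sum the per-section playtimes of an extended record, in seconds."""
    return sum(hms_to_seconds(s["playtime"]) for s in record.get("sections", []))
```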



✨ Why choose this Actor

Capability
📚 20,000+ audiobook catalog. Every public-domain title LibriVox has produced.
🎯 Multi-dimensional filters. Title, author, language, and genre combine in a single run.
🎤 Per-section reader credits. Extended mode exposes chapter-level narrator, runtime, and audio URL.
📻 RSS podcast feeds included. Drop straight into a podcast player.
⚡ Fast. 10 audiobooks in under 5 seconds, 1,000 in under 5 minutes.
🔁 Always fresh. Live catalog reads on every run.
🚫 No authentication. Public archive, no key required.

📊 LibriVox is the canonical free audiobook library and a foundation for any audio content product targeting public-domain works.


📈 How it compares to alternatives

| Approach | Cost | Coverage | Refresh | Filters | Setup |
|---|---|---|---|---|---|
| ⭐ LibriVox Audiobooks Scraper (this Actor) | $5 free credit, then pay-per-use | 20,000+ audiobooks | Live per run | Title, author, language, genre | ⚡ 2 min |
| Commercial audiobook libraries | $14.95+/month | Curated paid catalog | Daily | Limited | 🐢 Days |
| Custom site scraper | Free, plus engineering time | Full | Cron driven | Hand built | ⏳ Weeks |
| Per-book browsing | Free | One book at a time | Manual | UI only | 🕒 Painful |

Pick this Actor when you want a clean, filterable feed of the entire LibriVox catalog with zero parser maintenance.
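A quick budget check follows from the listed price. This sketch assumes the "from $10.00 / 1,000 result items" rate is the only charge. The word "from" suggests the rate may vary, and any other platform usage is not modeled. At that rate, the $5 free credit covers about 500 records, and a full 20,000-record catalog pull costs about $200.

```python
import math

PRICE_PER_1000_ITEMS = 10.00   # listed "from $10.00 / 1,000 result items"
FREE_CREDIT = 5.00             # listed "$5 free credit"


def estimate_cost(items):
    """Estimated result-item charge in USD at the listed rate (other usage ignored)."""
    return round(items * PRICE_PER_1000_ITEMS / 1000, 2)


def items_covered_by_credit(credit=FREE_CREDIT):
    """How many result items a given credit buys at the listed rate."""
    return math.floor(credit * 1000 / PRICE_PER_1000_ITEMS)
```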


🚀 How to use

  1. 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
  2. 🌐 Open the Actor. Go to the LibriVox Audiobooks Scraper page on the Apify Store.
  3. 🎯 Set input. Add optional title, author, language, or genre filters, and enable extended mode if you want per-section data.
  4. 🚀 Run it. Click Start and let the Actor collect catalog records.
  5. 📥 Download. Grab your results from the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.


💼 Business use cases

🎧 Audiobook Apps

  • Stock a freemium audiobook app with public-domain titles
  • Build a kids' audiobook section from children's fiction
  • Generate themed audio libraries (romance, mystery, classics)
  • Add multilingual content packs to existing apps

๐ŸŽ™๏ธ Podcast Networks

  • Spin up "classic literature" podcast feeds from RSS URLs
  • Curate themed reading series for syndication
  • Source content for an audio newsletter or daily-listen app
  • Build chaptered podcast versions of long novels
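Each record's `urlRss` points at a podcast feed. Turning a feed into an episode list is mostly a matter of reading `<item>` titles and `<enclosure>` URLs. The sketch below parses a feed you have already fetched. It assumes a standard RSS 2.0 item/enclosure layout and has not been checked against actual LibriVox feeds. `episode_urls` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def episode_urls(rss_xml):
    """Return (title, enclosure URL) pairs from an RSS 2.0 podcast document.

    For feeds from untrusted sources, consider a hardened parser such as defusedxml.
    """
    root = ET.fromstring(rss_xml)
    episodes = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is not None and enclosure.get("url"):
            episodes.append((item.findtext("title", default=""), enclosure.get("url")))
    return episodes
```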

🎓 Education & Accessibility

  • Stock classroom listening assignments for literature classes
  • Build accessibility libraries for visually impaired users
  • Augment ESL programs with audio for graded readers
  • Provide reading-along audio for early literacy programs

📚 Library Apps & Catalogs

  • Add a free audiobook collection to a digital library catalog
  • Source ISBN-less audiobook records for cataloging projects
  • Build a "listen-along" companion to a Gutenberg etext app
  • Curate themed reading lists with audio companions

🔌 Automating LibriVox Audiobooks Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

  • 🟢 Node.js. Install the apify-client NPM package.
  • 🐍 Python. Use the apify-client PyPI package.
  • 📚 See the Apify API documentation for full details.

The Apify Schedules feature lets you trigger this Actor on any cron interval. Weekly refreshes keep a downstream audiobook catalog topped up with the latest LibriVox publications.
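With the Python `apify-client` package, a run-and-fetch step can be short. In this sketch, `client.actor(...).call(run_input=...)` starts the Actor and waits for it to finish. `client.dataset(...).iterate_items()` then reads the results. The Actor ID is taken from the Store URL at the top of this page. `run_and_fetch` is a hypothetical helper, and the token placeholder is yours to fill in. The client is passed in as an argument, so the function can be exercised without network access.

```python
ACTOR_ID = "parseforge/librivox-audiobooks-scraper"  # from the Store URL


def run_and_fetch(client, run_input):
    """Start the Actor, wait for it to finish, and return its dataset items."""
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return run details")
    dataset = client.dataset(run["defaultDatasetId"])
    return list(dataset.iterate_items())


# Usage (requires `pip install apify-client` and an Apify API token):
#   from apify_client import ApifyClient
#   client = ApifyClient("<YOUR_APIFY_TOKEN>")
#   items = run_and_fetch(client, {"maxItems": 10, "author": "austen", "extended": False})
```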


🌟 Beyond business use cases

Public-domain audiobooks power more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

🎓 Research and academia

  • Reception studies of classic literature
  • Audiobook narration and voice-acting research
  • Reproducible corpora citing exact dataset pulls
  • Cross-language comparative literature with audio

🎨 Personal and creative

  • Curated bedtime story collections for parents
  • Mood-based reading lists for hobbyist apps
  • Visualization dashboards of reader hours by language
  • Themed playlists for road trips or long walks
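A "reader hours by language" dashboard reduces to a group-by over `language` and `totalTimeSeconds`. This sketch approximates reader hours by total recording runtime, which is an assumption. `hours_by_language` is a hypothetical helper.

```python
from collections import defaultdict


def hours_by_language(records):
    """Total listening hours per language, computed from totalTimeSeconds."""
    totals = defaultdict(int)
    for record in records:
        totals[record.get("language") or "Unknown"] += record.get("totalTimeSeconds", 0)
    # Convert seconds to hours, rounded to one decimal place, sorted by language name.
    return {lang: round(seconds / 3600, 1) for lang, seconds in sorted(totals.items())}
```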

๐Ÿค Non-profit and civic

  • Free audio libraries for under-resourced schools
  • Audio access for visually impaired community members
  • Senior-living center listening programs
  • Language-revitalization audio packs for minority tongues

🧪 Experimentation

  • Train automatic speech recognition on volunteer narration
  • Build alignment datasets pairing audio with Gutenberg text
  • Prototype voice-cloning research with diverse readers
  • Test podcast-publishing pipelines with real RSS feeds
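As a first step toward an audio/text alignment dataset, you can pair each book's Project Gutenberg ebook number with its per-section audio URLs. The regular expression below only covers the two URL shapes in the schema examples (`/ebooks/1342` and `/files/1342/...`). Other Gutenberg URL layouts are not handled. The alignment itself is out of scope, and the helper names are hypothetical.

```python
import re

# Matches the /ebooks/<id> and /files/<id>/... shapes seen in the schema examples.
GUTENBERG_ID = re.compile(r"gutenberg\.org/(?:ebooks|files)/(\d+)")


def gutenberg_id(record):
    """Return the Project Gutenberg ebook number for a record, or None."""
    for field in ("urlTextSource", "urlProject"):
        match = GUTENBERG_ID.search(record.get(field) or "")
        if match:
            return int(match.group(1))
    return None


def alignment_pairs(records):
    """Pair each Gutenberg ebook number with the record's per-section audio URLs."""
    pairs = []
    for record in records:
        ebook = gutenberg_id(record)
        audio = [s["audioUrl"] for s in record.get("sections", []) if s.get("audioUrl")]
        if ebook is not None and audio:
            pairs.append((ebook, audio))
    return pairs
```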



โ“ Frequently Asked Questions

๐Ÿงฉ How does it work?

Pick optional title, author, language, or genre filters and choose whether to include per-section detail. Click Start and the Actor returns clean rows with audio links, RSS feeds, ZIP archives, and reader credits.

๐Ÿ“ How complete is the metadata?

LibriVox metadata is curated by volunteer catalogers. Most fields are populated, with the occasional gap for very old additions. The description, genres, and section reader credits are typically complete.

๐Ÿ” How often is the catalog refreshed?

LibriVox publishes new audiobooks weekly. Every Actor run hits the live catalog, so new releases appear in your dataset right away.
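Each run reads the live catalog, so a scheduled pipeline only needs to keep the records it has not seen before. This sketch keys on `librivoxId`, assuming the ID is stable across runs. It deliberately ignores updates to records you already store. `merge_new_records` is a hypothetical helper.

```python
def merge_new_records(existing, fresh):
    """Add records whose librivoxId is not yet in the local catalog.

    `existing` maps librivoxId -> record and is updated in place;
    the return value lists only the newly added records.
    """
    added = []
    for record in fresh:
        key = record["librivoxId"]
        if key not in existing:
            existing[key] = record
            added.append(record)
    return added
```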

๐ŸŒ Which languages are supported?

Most audiobooks are in English, but LibriVox covers dozens of other languages including French, German, Spanish, Italian, Dutch, Portuguese, Latin, Japanese, Mandarin, and more. Use the language input to filter.

🎤 Do I get per-chapter audio URLs?

Yes, when extended is true. Each section carries the chapter name, reader credit, playtime, and a direct MP3 URL hosted by the Internet Archive.
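With per-section MP3 URLs in hand, you can build a playlist for any player that accepts extended M3U. This sketch uses the section keys from the schema example. It writes -1 (unknown duration) when a playtime is missing or not in "HH:MM:SS" form. `to_m3u` is a hypothetical helper.

```python
def to_m3u(record):
    """Build an extended M3U playlist from a record's sections."""
    lines = ["#EXTM3U"]
    for section in record.get("sections", []):
        url = section.get("audioUrl")
        if not url:
            continue
        # Duration in whole seconds from "HH:MM:SS"; -1 marks an unknown length in M3U.
        try:
            h, m, s = (int(p) for p in section.get("playtime", "").split(":"))
            duration = h * 3600 + m * 60 + s
        except ValueError:
            duration = -1
        label = f'{section.get("chapter", "")} - read by {section.get("reader", "unknown")}'
        lines.append(f"#EXTINF:{duration},{label}")
        lines.append(url)
    return "\n".join(lines) + "\n"
```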

โฐ Can I schedule regular runs?

Yes. Use Apify Schedules to trigger this Actor on any cron interval (weekly is recommended for new releases).

โš–๏ธ Is this content legal to use?

Yes. LibriVox audiobooks are public domain in the United States. Source texts are also typically Project Gutenberg public-domain works. Always verify the copyright status in your jurisdiction.

💼 Can I use these audiobooks commercially?

Yes. LibriVox recordings are dedicated to the public domain worldwide. You can use, remix, and resell them freely. Attribution to the volunteer readers is a nice courtesy.

💳 Do I need a paid Apify plan to use this Actor?

No. The free plan covers testing and small runs (10 records per run). A paid plan unlocks the higher cap, scheduling, and concurrency.

๐Ÿ” What happens if a run fails or gets interrupted?

Apify retries transient errors automatically. If a run still fails, inspect the log, fix the input, and restart. Partial datasets are preserved.

🆘 What if I need help?

Our support team is here. Use Apify platform messaging or the contact form linked at the end of this page.


🔌 Integrate with any app

LibriVox Audiobooks Scraper connects to any cloud service via Apify integrations:

  • Make - Automate multi-step workflows
  • Zapier - Connect with 5,000+ apps
  • Slack - Get run notifications in your channels
  • Airbyte - Pipe audiobook data into your warehouse
  • GitHub - Trigger runs from commits and releases
  • Google Drive - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes. Push fresh audiobook records into your catalog or alert your content team in Slack.
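A webhook receiver typically checks the event type and then looks up the finished run's dataset. This sketch assumes Apify's default webhook payload, where `eventType` names the event (for example `ACTOR.RUN.SUCCEEDED`) and `resource` holds the run object with `defaultDatasetId`. Verify these names against your webhook's payload template before relying on them. `dataset_id_from_webhook` is a hypothetical helper, and the HTTP server around it is left out.

```python
def dataset_id_from_webhook(payload):
    """Return the dataset ID of a successful run, or None for any other event.

    Assumes the default payload shape: {"eventType": ..., "resource": {run object}}.
    """
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    return (payload.get("resource") or {}).get("defaultDatasetId")
```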


🔗 Recommended Actors

💡 Pro Tip: browse the complete ParseForge collection for more reference-data scrapers.


🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.


⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by the LibriVox project. All trademarks mentioned are the property of their respective owners. Only publicly available catalog data is collected. LibriVox audiobooks are dedicated to the public domain.

You might also like

Spotify Audiobooks Search and Scraper ๐Ÿ“š

apiharvest/spotify-audiobooks-search-and-scraper

📚 Scrape Spotify audiobooks with full chapter listings, narrator names, publisher, star ratings, accessibility data, pricing & content descriptions. Turn on Fetch Details for paginated chapters and similar audiobook recommendations. Complete audiobook metadata from Spotify.

Youtube Video and MP3 Downloader

kingscraper/youtube-video-and-mp3-downloader

🎥📥 YouTube Video & MP3 Downloader 🎵⚡📹 Extract videos, 🎧 MP3 audio, 📝 subtitles & 🖼️ thumbnails from YouTube. 🚀 Batch download 100+ videos with 📊 complete metadata, 📈 statistics & 📺 channel details. ✨ Perfect for 👨‍💻 content creators, 🎙️ podcasters & 🔬 researchers!

Duolingo Language Data Scraper | Course Vocabulary Export

parseforge/duolingo-language-data-scraper

Export Duolingo language course skills, lexemes and translations. Specify source and target language codes to pull the vocabulary set learners encounter. Useful for linguistics research, language app builders and translation tooling. CSV, Excel, JSON or XML.

YouTube Mp3/Audio Downloader

codenest/youtube-mp3-audio-downloader

Easily and fast extract high-quality MP3/audio from YouTube videos & Shorts! 🎵 Get multiple formats, bitrates, and full metadata. Perfect for podcasters 🎙️, musicians 🎶, educators 📚, and content creators. Batch download audio with crystal-clear quality! 🚀 YouTube Mp3/Audio Downloader.


Project Gutenberg Books Scraper

parseforge/project-gutenberg-books-scraper

Search 75,000+ free public-domain books from Project Gutenberg. Returns title, author with birth/death years, cover image, plain-text and EPUB download URLs, Kindle and HTML formats, subjects, bookshelves, language, copyright status, summaries and download counts. Filter by author or language.

iTunes & Apple Store Search Scraper

parseforge/itunes-apple-search-scraper

Search the iTunes and App Store catalog for apps, music, podcasts, movies, TV shows, audiobooks, and ebooks. Capture title, artist, price, genre, ratings, release date, artwork, preview URL, and store ID. Export to JSON, CSV, or Excel for market research, ASO, and content analytics.

RSS Feed Scraper & RSS to JSON Converter

xtech/feed-extractor

Scrape and parse RSS, Atom, JSON Feed (and podcast RSS) URLs into clean, structured JSON. Outputs one dataset row per feed entry/item for easy export to CSV/JSON and automations.

GitHub Trending Repos Scraper

parseforge/github-trending-scraper

Pull GitHub trending repositories with stars, forks, language, description, contributors, license, topics, and full repo metadata. Choose daily, weekly, or monthly windows and filter by programming language or spoken language. Export to JSON, CSV, or Excel for developer intelligence and tech trends.