URL: https://apify.com/scrapersdelight/npr-transcript-scraper

NPR Transcript Scraper: Fresh Air, Morning Edition & ATC · Apify


Pricing: Pay per event

Scrape full NPR transcripts from Fresh Air, Morning Edition, All Things Considered & Weekend Edition. Speaker-labeled paragraphs, full text, date, author & audio URL per story, plus a new-transcript monitor with alerts. No login or API key. About $8 per 1,000 transcripts.

Rating: 0.0 (0)

Developer: Scrapers Delight (Maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user

Last modified: 8 days ago

📻 NPR Transcript Scraper: Fresh Air, Morning Edition & All Things Considered

Get the full, speaker-labeled transcript of NPR's broadcast stories, with no login and no AI transcription. NPR publishes a complete transcript page for nearly every segment of Fresh Air, Morning Edition, All Things Considered and Weekend Edition, and this actor reads it: clean paragraphs, full text, speakers, date, author, section and the MP3 audio URL. Scrape one story, the latest broadcast days of any program, or page through the archive back to 1991.

Because the transcript is already published, there's no speech-to-text compute, so runs are fast and cheap.


What does it do?

For each story (by program archive crawl or direct URL) it returns:

  • 📝 Full transcript (plain text + paragraphs[]), straight from npr.org/transcripts
  • 🗣️ Speakers: the inline labels NPR prints (LEILA FADEL, HOST; JON HAMILTON, BYLINE; guests)
  • 📅 Broadcast date, published date, author & section
  • 🎧 Audio URL + duration (the segment MP3)
  • 🚩 Honest flagging: stories whose transcript isn't published yet come back with hasTranscript: false; nothing is synthesized

No ASR, no API key, and no invented timestamps (NPR publishes none, so output is paragraphs, never SRT).


What data does it extract?

For every story: storyId, title, storyUrl, transcriptUrl, program, episodeDate, publishedAt, author, section, speakers[], paragraphs[], paragraph_count, text, audioUrl, audioDuration, hasTranscript, is_new (monitor), scraped_at.


Who is it for?

  • ✍️ Journalists, researchers & students quoting and searching radio coverage.
  • 🤖 AI / RAG builders: dense, professionally produced news + interview transcripts, ideal retrieval/training data.
  • 📰 Newsletter writers & media monitors tracking what NPR said about a topic, daily.
  • 🎙️ Podcast/radio analysts studying program rundowns and guests.

How to use it (step by step)

  1. Click Try for free.
  2. Pick programs (fresh-air, morning-edition, all-things-considered, weekend-edition-saturday, weekend-edition-sunday), or paste direct story/transcript URLs.
  3. Set Max broadcast days and Max transcripts to size the run (each archive page = 5 broadcast days; the archive reaches back to 1991).
  4. Click Start, then open the Dataset tab to view or export the results.
  5. (Optional) enable monitorMode + a Schedule to get only NEW transcripts each run, with Slack/webhook/email alerts.
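
To size a run before starting it, the "5 broadcast days per archive page" figure from step 3 can be turned into a quick page count. This is only a rough sketch: the function name is made up for illustration, and it assumes every archive page is full.

```python
import math

# From step 3 above: each archive page covers 5 broadcast days.
BROADCAST_DAYS_PER_ARCHIVE_PAGE = 5


def archive_pages_needed(max_episodes: int) -> int:
    """Return how many archive pages cover `max_episodes` broadcast days.

    Hypothetical helper. It does not model maxEpisodes = 0, which the
    actor treats as "no day cap" rather than "zero days".
    """
    if max_episodes < 0:
        raise ValueError("max_episodes must be non-negative")
    return math.ceil(max_episodes / BROADCAST_DAYS_PER_ARCHIVE_PAGE)


print(archive_pages_needed(5))   # one page
print(archive_pages_needed(12))  # three pages: 5 + 5 + 2 days
```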

Quick start

{"programs":["fresh-air"],"maxEpisodes":5,"maxStories":10}

Direct story lookup

{"storyUrls":["https://www.npr.org/transcripts/nx-s1-5849937","963319470"]}

Input

| Field | What it does |
| --- | --- |
| programs | NPR program slugs to crawl (fresh-air, morning-edition, …) |
| storyUrls | Direct transcript/story URLs or bare story ids |
| maxEpisodes | Recent broadcast days per program (0 = no day cap) |
| maxStories | Hard cap on transcripts fetched per run (0 = unlimited) |
| oldestDate | Optional YYYY-MM-DD floor for archive pagination |
| includeMissingTranscripts | Also output stories with no transcript yet, flagged |
| monitorMode, alertOnNewTranscript | Recurring new-transcript watcher + alerts |
| webhookUrl, slackWebhookUrl, emailRecipients | Alert channels |
| proxyConfiguration, requestConcurrency | Proxy + parallelism |
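
If you generate inputs from a script, a small builder can catch typos before a run is started. The sketch below uses only the field names listed above; the function name `build_input` and its validation rules are illustrative assumptions, not part of the actor.

```python
import json
import re
from datetime import date

# Program slugs this README lists as verified.
VERIFIED_PROGRAMS = {
    "fresh-air", "morning-edition", "all-things-considered",
    "weekend-edition-saturday", "weekend-edition-sunday",
}


def build_input(programs, max_episodes=5, max_stories=10, oldest_date=None):
    """Build an actor input dict (hypothetical helper, not the actor's code)."""
    unverified = [p for p in programs if p not in VERIFIED_PROGRAMS]
    if unverified:
        # Other npr.org/programs/{slug} values are accepted, so only warn.
        print(f"note: unverified program slugs: {unverified}")
    if max_episodes < 0 or max_stories < 0:
        raise ValueError("caps must be >= 0 (0 means no cap)")
    run_input = {
        "programs": list(programs),
        "maxEpisodes": max_episodes,
        "maxStories": max_stories,
    }
    if oldest_date is not None:
        if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", oldest_date):
            raise ValueError("oldestDate must be YYYY-MM-DD")
        date.fromisoformat(oldest_date)  # rejects impossible dates like 2026-02-30
        run_input["oldestDate"] = oldest_date
    return run_input


# Reproduces the quick-start input shown earlier.
print(json.dumps(build_input(["fresh-air"]), separators=(",", ":")))
```

The resulting dict can be pasted into the actor's JSON input editor or passed as the run input through the Apify API.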

Output example

{
"storyId":"nx-s1-5849937",
"title":"Socioeconomic factors are becoming 'biologically embedded' in children's brains",
"storyUrl":"https://www.npr.org/2026/06/11/nx-s1-5849937/child-brain-development-stress-sleep-neighborhood-economics",
"transcriptUrl":"https://www.npr.org/transcripts/nx-s1-5849937",
"program":"morning-edition",
"episodeDate":"2026-06-11",
"publishedAt":"2026-06-11",
"author":"Jon Hamilton",
"section":"Science",
"speakers":["LEILA FADEL","JON HAMILTON"],
"paragraphs":["LEILA FADEL, HOST:","New research suggests that the neighborhood a child lives in leaves a lasting imprint on their brain…","…"],
"paragraph_count":18,
"text":"LEILA FADEL, HOST:\n\nNew research suggests…",
"audioUrl":"https://ondemand.npr.org/anon.npr-mp3/npr/me/2026/06/…mp3",
"audioDuration":228,
"hasTranscript":true
}

Export to JSON, CSV, Excel, HTML, or RSS, or fetch via the Apify API.
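
Because speaker labels arrive as their own entries in paragraphs[], downstream code often wants them folded into (speaker, text) turns. The following is a minimal sketch, assuming labels look like "NAME, ROLE:" or "NAME:" in capitals and sit alone in a paragraph, as in the example above; the regex and function name are illustrative, and the sample text is placeholder content rather than a real transcript.

```python
import re

# Assumed label shape: "NAME, ROLE:" or "NAME:" in capitals, alone in a paragraph.
LABEL_RE = re.compile(r"([A-Z][A-Z .'\-]+?)(?:,\s*[A-Z \-]+)?:")


def speaker_turns(paragraphs):
    """Group paragraphs into (speaker, text) tuples.

    Text that appears before any label is attributed to None.
    """
    turns = []
    speaker = None
    for para in paragraphs:
        match = LABEL_RE.fullmatch(para.strip())
        if match:
            speaker = match.group(1)  # name only, without the role
            continue
        turns.append((speaker, para))
    return turns


# Placeholder paragraphs, not a real transcript.
sample = [
    "LEILA FADEL, HOST:",
    "First paragraph from the host.",
    "Second paragraph from the host.",
    "JON HAMILTON, BYLINE:",
    "A reply from the reporter.",
]
for who, text in speaker_turns(sample):
    print(f"{who}: {text}")
```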


Monitor mode: new-transcript alerts per program

Run on a Schedule (e.g. every 6 hours) with monitorMode: true. The actor remembers every transcript it has seen (in a named, persistent store) and outputs and alerts on only the new ones. NPR posts a story's transcript a few hours after broadcast, so a story whose transcript isn't up yet is simply picked up on a later run, never faked.

{"programs":["morning-edition","all-things-considered"],"maxEpisodes":3,"monitorMode":true,"slackWebhookUrl":"https://hooks.slack.com/…"}
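
The "remember what was seen, emit only what is new" behaviour can be pictured with a small seen-set. This is a sketch of the general idea, not the actor's actual implementation: the JSON file stands in for the named persistent store, and all function names are hypothetical.

```python
import json
from pathlib import Path


def load_seen(path):
    """Load previously seen story ids from a JSON file (empty set if absent)."""
    p = Path(path)
    return set(json.loads(p.read_text())) if p.exists() else set()


def select_new(stories, seen):
    """Return unseen stories that have a transcript, and mark them as seen."""
    fresh = []
    for story in stories:
        if story["storyId"] in seen:
            continue
        if story.get("hasTranscript"):
            seen.add(story["storyId"])
            fresh.append(story)
        # Stories without a transcript stay unseen, so a later run retries them.
    return fresh


def save_seen(path, seen):
    """Persist the seen ids so the next scheduled run can skip them."""
    Path(path).write_text(json.dumps(sorted(seen)))
```

Keying the set on story id rather than on program is what keeps a story that appears in two program archives from being reported twice.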

How much does it cost?

Pay-per-event, and with no transcription compute, it's cheap:

| Event | What it covers | Price |
| --- | --- | --- |
| lot-scraped | Each story returned | $0.004 / story |
| lot-detail-enriched | Each transcript page fetched | $0.004 / story |
| monitor-run-completed | Each scheduled watch run | $0.05 / run |
| new-lot-detected | Each new transcript found | $0.02 / transcript |
| alert-delivered | Each Slack/email/webhook push | $0.005 / alert |

That's about $8 per 1,000 full transcripts ($0.004 for the story plus $0.004 for its transcript page).
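
To budget a run, multiply each event's price by how many times you expect it to fire. The sketch below copies the prices from the table above; the event counts in the second example are hypothetical, and which events are billed together in monitor mode is an assumption to check against your own usage report.

```python
from decimal import Decimal

# Prices copied from the table above (USD per event).
PRICES = {
    "lot-scraped": Decimal("0.004"),
    "lot-detail-enriched": Decimal("0.004"),
    "monitor-run-completed": Decimal("0.05"),
    "new-lot-detected": Decimal("0.02"),
    "alert-delivered": Decimal("0.005"),
}


def estimate_cost(event_counts):
    """Sum price * count over a {event name: count} mapping."""
    return sum((PRICES[name] * n for name, n in event_counts.items()), Decimal("0"))


# 1,000 full transcripts: one story event plus one transcript-page event each.
print(estimate_cost({"lot-scraped": 1000, "lot-detail-enriched": 1000}))  # 8.000

# Hypothetical monitoring day: 4 scheduled runs, 20 new transcripts, 20 alerts.
print(estimate_cost({
    "monitor-run-completed": 4,
    "new-lot-detected": 20,
    "alert-delivered": 20,
}))
```

Decimal is used instead of float so that sub-cent prices add up without rounding noise.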


Is it legal to scrape these transcripts?

This actor reads publicly published transcript pages on npr.org (NPR even runs a text-only site, text.npr.org). The content is NPR's and is copyrighted. Scraping public pages is generally legal, but you are responsible for your use: review NPR's terms of use and permissions policy, and don't republish transcripts you aren't licensed to republish.


FAQ

Is there a Whisper/ASR step? No. NPR publishes the transcript; this actor reads it. Fast and cheap.

Which programs work? Any show with an npr.org program archive: fresh-air, morning-edition, all-things-considered, weekend-edition-saturday, weekend-edition-sunday are verified; any npr.org/programs/{slug} is accepted.

Do I get timestamps? No. NPR's published transcripts contain none, so the actor outputs paragraphs + full text (never fabricated SRT). You do get the segment MP3 URL and its duration.

Do I get speaker labels? Yes. NPR prints inline labels (TERRY GROSS, HOST:); they're kept in the paragraphs and collected into speakers[].

A story came back hasTranscript: false. Why? Same-day stories gain their transcript a few hours after broadcast, and a few segments (music interludes etc.) never get one. The actor flags these honestly instead of guessing. In monitor mode they're re-checked next run.

How far back can I go? The program archives paginate to 1991. Use maxEpisodes: 0 + oldestDate (and a generous maxStories) for deep backfills.

Both story-id formats? Yes. New nx-s1-… ids and legacy numeric ids (e.g. 963319470) both resolve.
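
If you collect story references from mixed sources, you may want to normalize them before passing them in as storyUrls. Here is a minimal sketch, assuming the id is always a whole path segment and that legacy numeric ids have at least five digits (so date segments such as 2026 are skipped); both regexes and both function names are illustrative and are not the actor's own resolver.

```python
import re
from urllib.parse import urlparse

NEW_ID_RE = re.compile(r"nx-s1-\d+")
LEGACY_ID_RE = re.compile(r"\d{5,}")  # assumption: legacy ids have 5+ digits


def story_id(ref):
    """Extract a story id from a bare id or an npr.org URL; None if not found."""
    path = urlparse(ref).path if "://" in ref else ref
    for segment in path.strip("/").split("/"):
        if NEW_ID_RE.fullmatch(segment) or LEGACY_ID_RE.fullmatch(segment):
            return segment
    return None


def transcript_url(ref):
    """Build a transcript URL in the form shown in the output example."""
    sid = story_id(ref)
    return f"https://www.npr.org/transcripts/{sid}" if sid else None


print(transcript_url("963319470"))
print(story_id("https://www.npr.org/transcripts/nx-s1-5849937"))
```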

How do I monitor several programs? List them all in programs. State is tracked per story id, so there's no cross-program double counting.

How do I export? JSON, CSV, Excel, HTML, or RSS from the Dataset tab, or via the Apify API.

Does it need a proxy or login? No login, no API key. Apify's datacenter proxy (default) is plenty; no anti-bot measures were observed.


Feedback

Want episode-rundown mode (every segment of a broadcast day), topic filtering, or another NPR show verified? Open an issue on the actor.
