NPR Transcript Scraper: Fresh Air, Morning Edition & ATC
Pricing
Pay per event
Scrape full NPR transcripts from Fresh Air, Morning Edition, All Things Considered & Weekend Edition. Speaker-labeled paragraphs, full text, date, author & audio URL per story, plus a new-transcript monitor with alerts. No login or API key. About $8 per 1,000 transcripts.
NPR Transcript Scraper: Fresh Air, Morning Edition & All Things Considered
Get the full, speaker-labeled transcript of NPR's broadcast stories, with no login and no AI transcription. NPR publishes a complete transcript page for nearly every segment of Fresh Air, Morning Edition, All Things Considered and Weekend Edition, and this actor reads it: clean paragraphs, full text, speakers, date, author, section and the MP3 audio URL. Scrape one story, the latest broadcast days of any program, or page the archive back to 1991.
Because the transcript is already published, there's no speech-to-text compute, so it's fast and cheap.
What does it do?
For each story (by program archive crawl or direct URL) it returns:
- Full transcript (plain text + paragraphs[]), straight from npr.org/transcripts
- Speakers: the inline labels NPR prints (LEILA FADEL, HOST; JON HAMILTON, BYLINE; guests)
- Broadcast date, published date, author & section
- Audio URL + duration (the segment MP3)
- Honest flagging: stories whose transcript isn't published yet come back hasTranscript: false; nothing is synthesized
No ASR, no API key, no invented timestamps (NPR publishes none, so output is paragraphs, never SRT).
What data does it extract?
For every story: storyId, title, storyUrl, transcriptUrl, program, episodeDate, publishedAt, author, section, speakers[], paragraphs[], paragraph_count, text, audioUrl, audioDuration, hasTranscript, is_new (monitor), scraped_at.
Who is it for?
- Journalists, researchers & students quoting and searching radio coverage.
- AI / RAG builders: dense, professionally produced news + interview transcripts, ideal retrieval/training data.
- Newsletter writers & media monitors tracking what NPR said about a topic, daily.
- Podcast/radio analysts studying program rundowns and guests.
How to use it (step by step)
- Click Try for free.
- Pick programs (fresh-air, morning-edition, all-things-considered, weekend-edition-saturday, weekend-edition-sunday), or paste direct story/transcript URLs.
- Set Max broadcast days and Max transcripts to size the run (each archive page = 5 broadcast days; archive reaches back to 1991).
- Click Start, then open the Dataset tab to view or export.
- (Optional) enable monitorMode + a Schedule to get only NEW transcripts each run, with Slack/webhook/email alerts.
Quick start
{"programs":["fresh-air"],"maxEpisodes":5,"maxStories":10}
Direct story lookup
{"storyUrls":["https://www.npr.org/transcripts/nx-s1-5849937","963319470"]}
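The same inputs can be sent programmatically. The following is a minimal sketch, assuming Apify's synchronous run endpoint (`run-sync-get-dataset-items`) and bearer-token authentication; the actor id `username~npr-transcript-scraper` is a placeholder, since this page does not state the actor's real id, and `run_actor` is a hypothetical helper name.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"

# Quick-start input copied from this page.
QUICK_START = {"programs": ["fresh-air"], "maxEpisodes": 5, "maxStories": 10}


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST that runs an actor and returns its dataset items."""
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_actor(actor_id: str, token: str, run_input: dict, timeout: float = 300) -> list:
    """Send the request and decode the JSON array of dataset items."""
    request = build_run_request(actor_id, token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage (placeholder actor id; supply your own token):
# items = run_actor("username~npr-transcript-scraper", "<APIFY_TOKEN>", QUICK_START)
# for item in items:
#     print(item["title"], item["hasTranscript"])
```

A synchronous call is only a reasonable fit for small runs like the quick start; for deep archive crawls, starting the run and reading the dataset afterwards is likely the safer pattern.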
Input
| Field | What it does |
|---|---|
| programs | NPR program slugs to crawl (fresh-air, morning-edition, …) |
| storyUrls | direct transcript/story URLs or bare story ids |
| maxEpisodes | recent broadcast days per program (0 = no day cap) |
| maxStories | hard cap on transcripts fetched per run (0 = unlimited) |
| oldestDate | optional YYYY-MM-DD floor for archive pagination |
| includeMissingTranscripts | also output stories with no transcript yet, flagged |
| monitorMode, alertOnNewTranscript | recurring new-transcript watcher + alerts |
| webhookUrl, slackWebhookUrl, emailRecipients | alert channels |
| proxyConfiguration, requestConcurrency | proxy + parallelism |
Output example
{"storyId":"nx-s1-5849937","title":"Socioeconomic factors are becoming 'biologically embedded' in children's brains","storyUrl":"https://www.npr.org/2026/06/11/nx-s1-5849937/child-brain-development-stress-sleep-neighborhood-economics","transcriptUrl":"https://www.npr.org/transcripts/nx-s1-5849937","program":"morning-edition","episodeDate":"2026-06-11","publishedAt":"2026-06-11","author":"Jon Hamilton","section":"Science","speakers":["LEILA FADEL","JON HAMILTON"],"paragraphs":["LEILA FADEL, HOST:","New research suggests that the neighborhood a child lives in leaves a lasting imprint on their brain…","…"],"paragraph_count":18,"text":"LEILA FADEL, HOST:\n\nNew research suggests…","audioUrl":"https://ondemand.npr.org/anon.npr-mp3/npr/me/2026/06/…mp3","audioDuration":228,"hasTranscript":true}
Export to JSON, CSV, Excel, HTML, or RSS, or fetch via the Apify API.
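Because speaker labels stay inline in paragraphs[], downstream code usually has to regroup paragraphs into speaker turns. The sketch below is one heuristic way to do that, assuming labels look like the examples on this page (an all-caps name, an optional all-caps role after a comma, then a colon). The regex and the `speaker_turns` helper are illustrative and not part of the actor.

```python
import re

# Heuristic label pattern: "LEILA FADEL, HOST:" or "HAMILTON: text...".
# This is an assumption based on the examples shown on this page.
LABEL = re.compile(
    r"^(?P<name>[A-Z][A-Z .'\-]*[A-Z])"
    r"(?:, (?P<role>[A-Z][A-Z /\-]*[A-Z]))?"
    r":\s*(?P<rest>.*)$",
    re.DOTALL,
)


def speaker_turns(paragraphs):
    """Group transcript paragraphs into turns of {speaker, role, text[]}."""
    turns = []
    current = None
    for paragraph in paragraphs:
        match = LABEL.match(paragraph)
        if match:
            # A label starts a new turn; text after the colon belongs to it.
            current = {"speaker": match["name"], "role": match["role"], "text": []}
            turns.append(current)
            if match["rest"]:
                current["text"].append(match["rest"])
        elif current is not None:
            current["text"].append(paragraph)
        else:
            # Text before any label (e.g. an intro) gets an unattributed turn.
            current = {"speaker": None, "role": None, "text": [paragraph]}
            turns.append(current)
    return turns
```

Applied to the paragraphs in the output example above, the first turn would be attributed to LEILA FADEL with role HOST. Short all-caps prefixes that are not speakers would also match, so treat the result as a starting point.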
Monitor mode: new-transcript alerts per program
Run on a Schedule (e.g. every 6 hours) with monitorMode: true: the actor remembers every transcript it has seen (in a named, persistent store) and outputs/alerts only the new ones. NPR posts a story's transcript a few hours after broadcast, so stories whose transcript isn't out yet are simply picked up on a later run, never faked.
{"programs":["morning-edition","all-things-considered"],"maxEpisodes":3,"monitorMode":true,"slackWebhookUrl":"https://hooks.slack.com/…"}
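The "remember what was seen, emit only what is new" behaviour can be illustrated as follows. This is a sketch of the idea rather than the actor's actual code: a local JSON file stands in for the named persistent store, and `filter_new`, `load_seen` and `save_seen` are hypothetical helper names.

```python
import json
from pathlib import Path


def load_seen(path: Path) -> set:
    """Load previously seen story ids, or an empty set on the first run."""
    return set(json.loads(path.read_text())) if path.exists() else set()


def save_seen(path: Path, seen: set) -> None:
    """Persist seen story ids for the next scheduled run."""
    path.write_text(json.dumps(sorted(seen)))


def filter_new(stories, seen: set):
    """Return only stories with a transcript that were not seen before.

    Stories without a transcript are skipped and NOT marked as seen,
    so a later run can pick them up once the transcript is published.
    """
    fresh = []
    for story in stories:
        if not story.get("hasTranscript", False):
            continue
        story_id = story["storyId"]
        if story_id in seen:
            continue
        seen.add(story_id)
        fresh.append({**story, "is_new": True})
    return fresh
```

Keying the state on storyId alone is what keeps a story that appears in two program archives from being reported twice.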
How much does it cost?
Pay-per-event, and with no transcription compute, it's cheap:
| Event | What it covers | Price |
|---|---|---|
| lot-scraped | each story returned | $0.004 / story |
| lot-detail-enriched | each transcript page fetched | $0.004 / story |
| monitor-run-completed | each scheduled watch run | $0.05 / run |
| new-lot-detected | each new transcript found | $0.02 / transcript |
| alert-delivered | each Slack/email/webhook push | $0.005 / alert |
That's about $8 per 1,000 full transcripts ($0.004 lot-scraped + $0.004 lot-detail-enriched per story), before any monitor or alert events.
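To budget a run, the event prices from the table can be combined in a small calculator. This sketch assumes every returned story triggers both lot-scraped and lot-detail-enriched, which is what the $8 per 1,000 figure implies; `estimate_cost` is a hypothetical helper, and actual billing is determined by Apify.

```python
from decimal import Decimal

# Event prices copied from the pricing table above (USD).
PRICES = {
    "lot-scraped": Decimal("0.004"),
    "lot-detail-enriched": Decimal("0.004"),
    "monitor-run-completed": Decimal("0.05"),
    "new-lot-detected": Decimal("0.02"),
    "alert-delivered": Decimal("0.005"),
}


def estimate_cost(stories, monitor_runs=0, new_transcripts=0, alerts=0):
    """Estimate the cost in USD from event counts."""
    counts = {
        "lot-scraped": stories,
        "lot-detail-enriched": stories,  # assumption: one page fetch per story
        "monitor-run-completed": monitor_runs,
        "new-lot-detected": new_transcripts,
        "alert-delivered": alerts,
    }
    return sum(PRICES[event] * count for event, count in counts.items())


print(estimate_cost(stories=1000))  # plain scrape of 1,000 stories
print(estimate_cost(stories=10, monitor_runs=1, new_transcripts=10, alerts=1))
```

Decimal is used instead of float so that sums of small per-event prices stay exact.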
Is it legal to scrape these transcripts?
This actor reads publicly published transcript pages on npr.org (NPR even runs a text-only site, text.npr.org). The content is NPR's and is copyrighted. Scraping public pages is generally legal, but you are responsible for your use: review NPR's terms of use and permissions policy, and don't republish transcripts you're not licensed to republish.
FAQ
Is there a Whisper/ASR step?
No. NPR publishes the transcript; this actor reads it. Fast and cheap.
Which programs work?
Any show with an npr.org program archive: fresh-air, morning-edition, all-things-considered, weekend-edition-saturday, weekend-edition-sunday are verified; any npr.org/programs/{slug} is accepted.
Do I get timestamps?
No. NPR's published transcripts contain none, so the actor outputs paragraphs + full text (never fabricated SRT). You DO get the segment MP3 URL and its duration.
Do I get speaker labels?
Yes. NPR prints inline labels (TERRY GROSS, HOST:); they're kept in the paragraphs and collected into speakers[].
A story came back hasTranscript: false. Why?
Same-day stories gain their transcript a few hours after broadcast, and a few segments (music interludes etc.) never get one. The actor flags these honestly instead of guessing. In monitor mode they're re-checked next run.
How far back can I go?
The program archives paginate to 1991. Use maxEpisodes: 0 + oldestDate (and a generous maxStories) for deep backfills.
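A deep-backfill input can be assembled and sanity-checked before starting a long run. The sketch below only uses the input fields documented on this page; `backfill_input` is a hypothetical helper, and the strict date check is an assumption about what the actor expects for oldestDate.

```python
import re
from datetime import date


def backfill_input(programs, oldest_date: str, max_stories: int) -> dict:
    """Build an actor input for a deep archive backfill.

    Raises ValueError if oldest_date is not a valid YYYY-MM-DD date
    or max_stories is negative.
    """
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", oldest_date):
        raise ValueError(f"oldestDate must be YYYY-MM-DD, got {oldest_date!r}")
    date.fromisoformat(oldest_date)  # rejects impossible dates such as 2020-13-01
    if max_stories < 0:
        raise ValueError("maxStories must be 0 (unlimited) or positive")
    return {
        "programs": list(programs),
        "maxEpisodes": 0,           # 0 = no broadcast-day cap
        "oldestDate": oldest_date,  # floor for archive pagination
        "maxStories": max_stories,  # hard cap on transcripts per run
    }
```

For example, `backfill_input(["fresh-air"], "2020-01-01", 5000)` yields an input that pages the Fresh Air archive back to the start of 2020 while capping the run at 5,000 transcripts.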
Both story-id formats?
Yes. New nx-s1-… ids and legacy numeric ids (e.g. 963319470) both resolve.
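When preparing storyUrls, it can help to normalize mixed URLs and bare ids up front. This is a rough sketch, not the actor's resolver: it assumes new ids look like nx-s1- followed by digits and that legacy ids are path segments of six or more digits, and `story_id` and `transcript_url` are hypothetical helpers.

```python
import re
from urllib.parse import urlparse

NEW_ID = re.compile(r"nx-s1-\d+")  # assumption: digits follow the nx-s1- prefix
LEGACY_ID = re.compile(r"\d{6,}")  # assumption: longer than year/month/day segments


def story_id(value: str):
    """Extract a story id from a story URL, transcript URL, or bare id."""
    value = value.strip()
    path = urlparse(value).path if "://" in value else value
    for segment in path.strip("/").split("/"):
        if NEW_ID.fullmatch(segment) or LEGACY_ID.fullmatch(segment):
            return segment
    return None


def transcript_url(sid: str) -> str:
    """Build the transcript page URL in the npr.org/transcripts/{id} form."""
    return f"https://www.npr.org/transcripts/{sid}"
```

Date segments such as 2026/06/11 are shorter than six digits, which is why they are not mistaken for legacy ids.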
How do I monitor several programs?
List them all in programs. State is tracked per story id, so there's no cross-program double counting.
How do I export?
JSON, CSV, Excel, HTML, or RSS from the Dataset tab, or via the Apify API.
Does it need a proxy or login?
No login, no API key. Apify's datacenter proxy (default) is plenty; no anti-bot measures were observed.
Feedback
Want episode-rundown mode (every segment of a broadcast day), topic filtering, or another NPR show verified? Open an issue on the actor.
