MIT OpenCourseWare Transcript Scraper: Lectures to Text
Pricing
from $1.00 / 1,000 records returned
Extract MIT OpenCourseWare video-lecture transcripts with no login and no ASR. Give it a course (it crawls every lecture) or specific lecture URLs and get full transcript text, timestamped segments, and SRT/VTT, plus course and lecture titles. Creative-Commons content. $2 per 1,000 lectures.
MIT OpenCourseWare Lecture Transcript Scraper
Pull MIT OpenCourseWare video-lecture transcripts with no login and no AI transcription. MIT OCW publishes a transcript for every lecture, and this actor reads it: full text, timestamped segments, and SRT/VTT, plus course and lecture titles. Give it a course (it crawls every lecture) or specific lecture URLs.
It reads OCW's own captions, so there's no speech-to-text compute, which keeps it fast and cheap. (MIT OCW is free, Creative-Commons educational content.)
What does it do?
For each lecture (from a course crawl or direct URLs) it returns:
- Full transcript (plain text): always included
- Timestamped segments: {start, end, text}
- SRT / VTT subtitles
- Course title + lecture title
No ASR, no API key: it reads the published .vtt caption track.
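To illustrate how a published .vtt caption track maps onto {start, end, text} segments, here is a minimal sketch. It is not the actor's own code: the function names, the sample captions, and the choice to express start and end as seconds are assumptions made for illustration.

```python
import re

# Matches a WebVTT cue timing line such as "00:00:01.000 --> 00:00:04.500".
# The hours part is optional in WebVTT, so "00:01.000" is accepted too.
CUE_TIMING = re.compile(
    r"^\s*((?:\d+:)?\d{2}:\d{2}\.\d{3})\s+-->\s+((?:\d+:)?\d{2}:\d{2}\.\d{3})"
)


def to_seconds(stamp: str) -> float:
    """Convert "HH:MM:SS.mmm" or "MM:SS.mmm" to seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_vtt(vtt_text: str) -> list[dict]:
    """Turn WebVTT text into a list of {start, end, text} dicts."""
    segments = []
    current = None  # the cue currently collecting text lines, if any
    for line in vtt_text.splitlines():
        match = CUE_TIMING.match(line)
        if match:
            current = {
                "start": to_seconds(match.group(1)),
                "end": to_seconds(match.group(2)),
                "text": "",
            }
            segments.append(current)
        elif not line.strip():
            current = None  # a blank line ends the cue
        elif current is not None:
            # Multi-line cues are joined with single spaces.
            current["text"] = (current["text"] + " " + line.strip()).strip()
    return segments


# Made-up sample captions, used only to exercise the parser.
SAMPLE_VTT = """WEBVTT

00:00:00.500 --> 00:00:03.000
Welcome to the first
lecture of the course.

00:00:03.000 --> 00:00:05.250
Today we start with variables.
"""

segments = parse_vtt(SAMPLE_VTT)
transcript = " ".join(segment["text"] for segment in segments)
```

Joining the segment texts, as in the last line, gives a plain-text transcript; the header line and any cue identifiers are skipped because they appear outside a timed cue.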
What data does it extract?
For every lecture: url, course_title, lecture_title, transcript, segments[], srt, vtt, segment_count, is_new (monitor), scraped_at.
Who is it for?
- Learners & educators turning lectures into searchable notes and study guides.
- AI / RAG builders: rigorous, structured lecture content is excellent training/retrieval data.
- Localization / accessibility workflows.
How to use it (step by step)
- Click Try for free.
- Paste a course URL (https://ocw.mit.edu/courses/{slug}/) or specific lecture URLs.
- (Optional) add srt/vtt/segments formats.
- Click Start, then open the Dataset tab to view/export.
- (Optional) set monitorMode + a Schedule to capture lectures as courses update.
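The page does not describe how monitor mode decides that a lecture is new. One plausible approach, sketched below purely as an assumption, is to remember the lecture URLs seen on earlier runs and set is_new for any URL not yet in that set. The function name and the in-memory set are hypothetical; a real watcher would persist the set between scheduled runs.

```python
def mark_new(records: list[dict], seen_urls: set[str]) -> list[dict]:
    """Return copies of the records with is_new set; updates seen_urls in place."""
    marked = []
    for record in records:
        url = record["url"]
        marked.append({**record, "is_new": url not in seen_urls})
        seen_urls.add(url)
    return marked


# Hypothetical lecture URLs for illustration only.
seen = {"https://ocw.mit.edu/courses/example-course/resources/lecture-1/"}
run = mark_new(
    [
        {"url": "https://ocw.mit.edu/courses/example-course/resources/lecture-1/"},
        {"url": "https://ocw.mit.edu/courses/example-course/resources/lecture-2/"},
    ],
    seen,
)
new_lectures = [record["url"] for record in run if record["is_new"]]
```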
Quick start
{"courseUrls":["https://ocw.mit.edu/courses/6-0001-introduction-to-computer-science-and-programming-in-python-fall-2016/"],"transcriptFormats":["txt","srt"]}
Input
| Field | What it does |
|---|---|
| courseUrls | OCW course URLs (crawls each course's lectures) |
| lectureUrls | specific lecture resource URLs |
| transcriptFormats | txt · segments · srt · vtt |
| maxLectures | hard cap per run (0 = all) |
| monitorMode, alertOnNewLecture | recurring watcher + alerts |
| webhookUrl, slackWebhookUrl, emailRecipients | alert channels |
| proxyConfiguration, requestConcurrency | proxy + parallelism |
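If you assemble the input programmatically, a small helper can catch mistakes before a run starts. The sketch below uses the field names from the table; the helper itself, its validation rules, and the assumed value types (lists of URL strings, an integer cap) are illustrative assumptions, not the actor's published input schema.

```python
import json

# Format names as listed in the input table.
ALLOWED_FORMATS = ("txt", "segments", "srt", "vtt")


def build_input(course_urls=(), lecture_urls=(), formats=("txt",), max_lectures=0):
    """Build an input dict for the actor, rejecting obviously invalid values."""
    if not course_urls and not lecture_urls:
        raise ValueError("provide at least one course URL or lecture URL")
    unknown = [name for name in formats if name not in ALLOWED_FORMATS]
    if unknown:
        raise ValueError(f"unknown transcript formats: {unknown}")
    if max_lectures < 0:
        raise ValueError("max_lectures must be 0 (all) or a positive number")

    run_input = {
        "transcriptFormats": list(formats),
        "maxLectures": max_lectures,
    }
    if course_urls:
        run_input["courseUrls"] = list(course_urls)
    if lecture_urls:
        run_input["lectureUrls"] = list(lecture_urls)
    return run_input


run_input = build_input(
    course_urls=[
        "https://ocw.mit.edu/courses/6-0001-introduction-to-computer-science-and-programming-in-python-fall-2016/"
    ],
    formats=("txt", "srt"),
    max_lectures=5,
)
input_json = json.dumps(run_input, indent=2)  # ready to paste into the input editor
```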
Output
Each lecture is one dataset record (fields above). Export to JSON, CSV, Excel, HTML, or RSS, or fetch via the Apify API.
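As a sketch of fetching records through the Apify API, the snippet below prepares a request to the run-sync-get-dataset-items endpoint, which starts a run and returns its dataset items in one call. The actor ID shown is a placeholder (this page does not state the real one), and the helper names are hypothetical. Long crawls may exceed the synchronous endpoint's time limit, in which case starting the run asynchronously and reading the dataset afterwards is the safer route.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST that runs the actor and returns its items."""
    url = (
        f"{API_BASE}/acts/{urllib.parse.quote(actor_id, safe='~')}"
        "/run-sync-get-dataset-items?format=json"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def fetch_items(request: urllib.request.Request) -> list[dict]:
    """Send the request and decode the JSON array of lecture records."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder actor ID and token; replace both with your own values.
request = build_run_request(
    "your-username~mit-ocw-transcript-scraper",
    "YOUR_APIFY_TOKEN",
    {
        "lectureUrls": ["https://ocw.mit.edu/courses/example-course/resources/lecture-1/"],
        "transcriptFormats": ["txt"],
    },
)
```

Calling fetch_items(request) sends the request; each returned item should then carry the fields listed earlier, such as url, lecture_title, and transcript.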
How much does it cost?
Pay-per-event, and with no transcription compute, it's cheap:
| Event | What it covers | Suggested price |
|---|---|---|
| lot-scraped | each lecture returned | ~$0.003 / lecture |
| lot-detail-enriched | each transcript fetched | ~$0.003 / lecture |
| monitor-run-completed | each scheduled watch run | ~$0.05 / run |
| new-lot-detected | each new lecture | ~$0.02 / lecture |
| alert-delivered | each Slack/email/webhook push | ~$0.005 / alert |
(Final per-event prices are set on the actor's pricing page.)
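For a rough budget, the suggested prices can be multiplied by expected event counts. The sketch below assumes that every returned lecture triggers both lot-scraped and lot-detail-enriched; the page does not confirm this, and the actual prices are whatever the pricing page lists.

```python
from decimal import Decimal

# Suggested per-event prices from the table above, in USD.
SUGGESTED_PRICES = {
    "lot-scraped": Decimal("0.003"),
    "lot-detail-enriched": Decimal("0.003"),
    "monitor-run-completed": Decimal("0.05"),
    "new-lot-detected": Decimal("0.02"),
    "alert-delivered": Decimal("0.005"),
}


def estimate_cost(event_counts: dict[str, int]) -> Decimal:
    """Sum price * count over the given events (Decimal avoids float rounding)."""
    return sum(
        (SUGGESTED_PRICES[event] * count for event, count in event_counts.items()),
        Decimal("0"),
    )


# One-off crawl of a 100-lecture course.
one_off = estimate_cost({"lot-scraped": 100, "lot-detail-enriched": 100})

# One scheduled watch run that finds 2 new lectures and sends 2 alerts.
watch_run = estimate_cost(
    {
        "monitor-run-completed": 1,
        "lot-scraped": 2,
        "lot-detail-enriched": 2,
        "new-lot-detected": 2,
        "alert-delivered": 2,
    }
)
```

Under these assumptions the one-off crawl comes to 100 × ($0.003 + $0.003) = $0.60, and the watch run to $0.05 + 2 × ($0.003 + $0.003 + $0.02) + 2 × $0.005 = $0.112.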
Is it legal to scrape OCW transcripts?
MIT OpenCourseWare is published free to the public under a Creative Commons BY-NC-SA license. This actor reads those public transcripts. You must comply with the CC BY-NC-SA terms (attribute MIT OCW, non-commercial use, share-alike) and review OCW's site terms. You are responsible for your use.
FAQ
Does it crawl a whole course?
Yes. Give it a course URL and it finds every video lecture and fetches its published transcript.
Is there a Whisper/ASR step?
No. It reads OCW's .vtt captions, so it's fast and cheap.
Can I get subtitles?
Yes. Add srt and/or vtt to transcriptFormats.
How do I export?
Export JSON, CSV, Excel, HTML, or RSS from the Dataset tab, or fetch the data via the Apify API.
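If a run requested only segments, SRT subtitles can still be derived locally after export. The sketch below assumes each segment is a dict with start and end in seconds plus a text string; the actor's actual segment format may differ (for example, timestamps as strings), so adjust accordingly.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, "HH:MM:SS,mmm"."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        timing = f"{srt_timestamp(segment['start'])} --> {srt_timestamp(segment['end'])}"
        blocks.append(f"{index}\n{timing}\n{segment['text']}")
    return "\n\n".join(blocks) + "\n"


# Made-up segments for illustration.
srt_text = segments_to_srt(
    [
        {"start": 0.5, "end": 3.0, "text": "Welcome to the first lecture."},
        {"start": 3661.5, "end": 3663.0, "text": "That wraps up today."},
    ]
)
```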
Feedback
Want PDF-notes extraction or per-department crawling? Open an issue on the actor.
