URL: https://apify.com/scrapersdelight/mit-ocw-transcript-scraper

MIT OCW Subtitle Downloader: Lecture Transcripts to Text · Apify


๐Ÿ‘ MIT OpenCourseWare Transcript Scraper โ€” Lectures to Text avatar

MIT OpenCourseWare Transcript Scraper: Lectures to Text

Pricing

from $1.00 / 1,000 records returned

Extract MIT OpenCourseWare video-lecture transcripts with no login and no ASR. Give it a course (it crawls every lecture) or specific lecture URLs, and it returns the full transcript text, timestamped segments, and SRT/VTT subtitles, plus course and lecture titles. Creative Commons content. $2 per 1,000 lectures.

Rating: 0.0 (0)

Developer: Scrapers Delight (maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 14 days ago

🎓 MIT OpenCourseWare Lecture Transcript Scraper

Pull MIT OpenCourseWare video-lecture transcripts with no login and no AI transcription. MIT OCW publishes a transcript for every lecture, and this actor reads it: full text, timestamped segments, and SRT/VTT, plus course and lecture titles. Give it a course (it crawls every lecture) or specific lecture URLs.

It reads OCW's own captions, so there is no speech-to-text compute, which keeps runs fast and cheap. (MIT OCW is free, Creative Commons educational content.)


What does it do?

For each lecture (from a course crawl or direct URLs) it returns:

  • ๐Ÿ“ Full transcript (plain text) โ€” always included
  • โฒ๏ธ Timestamped segments โ€” {start, end, text}
  • ๐ŸŽฌ SRT / VTT subtitles
  • ๐Ÿท๏ธ Course title + lecture title

No ASR, no API key: it reads the published .vtt caption track.
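To make the "reads the .vtt caption track" idea concrete, here is a minimal Python sketch of turning WebVTT text into {start, end, text} segments and a plain transcript. It is not the actor's own parser. The function names, the sample captions, and the choice to express start and end in seconds are all assumptions made for illustration.

```python
import re

# Matches "HH:MM:SS.mmm" or the shorter "MM:SS.mmm" WebVTT timestamp form.
_TS = re.compile(r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})")


def to_seconds(stamp: str) -> float:
    """Convert a WebVTT timestamp to seconds (unit is an assumption)."""
    match = _TS.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"bad timestamp: {stamp!r}")
    hours, minutes, seconds, millis = match.groups()
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_vtt(vtt: str) -> list[dict]:
    """Parse cue blocks into {start, end, text} segments (illustrative only)."""
    segments = []
    for block in re.split(r"\n\s*\n", vtt.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        # Locate the timing line; the WEBVTT header block has none and is skipped.
        timing = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing is None:
            continue
        start, _, rest = lines[timing].partition("-->")
        end = rest.split()[0]  # drop trailing cue settings such as "align:start"
        text = " ".join(line.strip() for line in lines[timing + 1:] if line.strip())
        segments.append({"start": to_seconds(start), "end": to_seconds(end), "text": text})
    return segments


# Hypothetical caption text, not taken from a real lecture.
sample = """WEBVTT

00:00:01.000 --> 00:00:04.500
Welcome to the course.

00:00:04.500 --> 00:00:07.250
Today we cover
basic syntax.
"""
segments = parse_vtt(sample)
transcript = " ".join(s["text"] for s in segments)
print(transcript)
```

Multi-line cues are joined with a space here; a real caption file may also contain inline tags or NOTE blocks that this sketch does not handle.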


What data does it extract?

For every lecture: url, course_title, lecture_title, transcript, segments[], srt, vtt, segment_count, is_new (monitor), scraped_at.
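The segments[] and srt fields carry the same cues in two shapes. As a rough illustration of how they relate, the Python sketch below renders a list of segments as SRT text. It assumes start and end are seconds stored as numbers, which the listing does not state, and the sample record is hypothetical.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Number each cue from 1 and separate cues with a blank line."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_time(seg['start'])} --> {srt_time(seg['end'])}"
        cues.append(f"{index}\n{timing}\n{seg['text']}")
    return "\n\n".join(cues) + "\n"


# Hypothetical record using the field names listed above.
record = {
    "lecture_title": "Example lecture",
    "segments": [
        {"start": 1.0, "end": 4.5, "text": "Welcome to the course."},
        {"start": 3661.25, "end": 3663.0, "text": "That is all for today."},
    ],
}
print(segments_to_srt(record["segments"]))
```

If a record already includes the srt field, there is no need to rebuild it; a helper like this is mainly useful after editing or filtering segments.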


Who is it for?

  • 🎓 Learners & educators turning lectures into searchable notes and study guides.
  • 🤖 AI / RAG builders: rigorous, structured lecture content is excellent training/retrieval data.
  • 🌍 Localization / accessibility workflows.

How to use it (step by step)

  1. Click Try for free.
  2. Paste a course URL (https://ocw.mit.edu/courses/{slug}/) or specific lecture URLs.
  3. (Optional) Add the srt, vtt, or segments formats.
  4. Click Start, then open the Dataset tab to view or export the results.
  5. (Optional) Set monitorMode and a Schedule to capture new lectures as courses update.

Quick start

{
  "courseUrls": [
    "https://ocw.mit.edu/courses/6-0001-introduction-to-computer-science-and-programming-in-python-fall-2016/"
  ],
  "transcriptFormats": ["txt", "srt"]
}

Input

| Field | What it does |
| --- | --- |
| courseUrls | OCW course URLs (crawls each course's lectures) |
| lectureUrls | Specific lecture resource URLs |
| transcriptFormats | txt · segments · srt · vtt |
| maxLectures | Hard cap per run (0 = all) |
| monitorMode, alertOnNewLecture | Recurring watcher + alerts |
| webhookUrl, slackWebhookUrl, emailRecipients | Alert channels |
| proxyConfiguration, requestConcurrency | Proxy + parallelism |
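The input fields can also be assembled and checked in code before a run. The Python sketch below builds a run input from the fields in the table and, optionally, starts the actor through the apify-client package. Treat the details as assumptions: the actor ID is inferred from the page URL, the rule that at least one URL list is required is a guess at sensible validation, and build_run_input and run_actor are hypothetical helper names.

```python
ALLOWED_FORMATS = {"txt", "segments", "srt", "vtt"}
# Assumed actor ID, inferred from the store page URL.
ACTOR_ID = "scrapersdelight/mit-ocw-transcript-scraper"


def build_run_input(course_urls=(), lecture_urls=(), formats=("txt",), max_lectures=0) -> dict:
    """Assemble an input object using the field names from the Input table."""
    unknown = set(formats) - ALLOWED_FORMATS
    if unknown:
        raise ValueError(f"unsupported transcript formats: {sorted(unknown)}")
    if not course_urls and not lecture_urls:
        # Assumption: a run needs at least one course or lecture URL.
        raise ValueError("provide courseUrls or lectureUrls")
    if max_lectures < 0:
        raise ValueError("maxLectures must be 0 (all) or a positive cap")
    run_input = {"transcriptFormats": list(formats), "maxLectures": max_lectures}
    if course_urls:
        run_input["courseUrls"] = list(course_urls)
    if lecture_urls:
        run_input["lectureUrls"] = list(lecture_urls)
    return run_input


def run_actor(token: str, run_input: dict) -> list[dict]:
    """Start the actor and return its dataset records (needs network access)."""
    from apify_client import ApifyClient  # third-party: pip install apify-client

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("actor run did not return run details")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


example_input = build_run_input(
    course_urls=[
        "https://ocw.mit.edu/courses/6-0001-introduction-to-computer-science-and-programming-in-python-fall-2016/"
    ],
    formats=["txt", "srt"],
)
print(example_input)
```

Calling run_actor requires an Apify API token and incurs the per-event charges described on the pricing page.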

Output

Each lecture is one dataset record (fields above). Export to JSON, CSV, Excel, HTML, or RSS, or fetch via the Apify API.
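For the API route, Apify exposes dataset items at a REST endpoint that takes a format query parameter. The Python sketch below only builds such a URL; the helper name and the example dataset ID are hypothetical, and private datasets also need an API token (for example in an Authorization header), which is left out here.

```python
from urllib.parse import urlencode

# Export formats named in the Output section; "xlsx" is the Excel format.
EXPORT_FORMATS = {"json", "csv", "xlsx", "html", "rss"}


def dataset_items_url(dataset_id: str, fmt: str = "json", fields=None) -> str:
    """Build a dataset-items URL, optionally limited to selected fields."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported export format: {fmt}")
    params = {"format": fmt}
    if fields:
        params["fields"] = ",".join(fields)  # comma-separated field names
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{urlencode(params)}"


# "abc123" is a placeholder dataset ID.
print(dataset_items_url("abc123", "csv", ["url", "lecture_title"]))
```

Limiting fields to url, lecture_title, and transcript keeps CSV and Excel exports readable, since segments[] is a nested list.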


How much does it cost?

Pay-per-event, and with no transcription compute, it's cheap:

| Event | What it covers | Suggested price |
| --- | --- | --- |
| lot-scraped | each lecture returned | ~$0.003 / lecture |
| lot-detail-enriched | each transcript fetched | ~$0.003 / lecture |
| monitor-run-completed | each scheduled watch run | ~$0.05 / run |
| new-lot-detected | each new lecture | ~$0.02 / lecture |
| alert-delivered | each Slack/email/webhook push | ~$0.005 / alert |

(Final per-event prices are set on the actor's pricing page.)
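As a worked example of the pay-per-event arithmetic, the Python sketch below totals a run using the suggested prices from the table. It assumes each lecture triggers one lot-scraped and one lot-detail-enriched event, which the table implies but does not state, and the helper name is hypothetical. Actual charges follow the actor's pricing page and may differ from these figures.

```python
from decimal import Decimal

# Suggested per-event prices copied from the table above (USD).
SUGGESTED_PRICES = {
    "lot-scraped": Decimal("0.003"),
    "lot-detail-enriched": Decimal("0.003"),
    "monitor-run-completed": Decimal("0.05"),
    "new-lot-detected": Decimal("0.02"),
    "alert-delivered": Decimal("0.005"),
}


def estimate_cost(lectures: int, monitor_runs: int = 0, new_lectures: int = 0, alerts: int = 0) -> Decimal:
    """Estimate a run's cost, assuming one scrape and one enrich event per lecture."""
    counts = {
        "lot-scraped": lectures,
        "lot-detail-enriched": lectures,
        "monitor-run-completed": monitor_runs,
        "new-lot-detected": new_lectures,
        "alert-delivered": alerts,
    }
    return sum((SUGGESTED_PRICES[event] * n for event, n in counts.items()), Decimal("0"))


print(estimate_cost(1000))                                          # one-off crawl
print(estimate_cost(30, monitor_runs=1, new_lectures=2, alerts=2))  # one watch run
```

Under these assumptions, 1,000 lectures come to $6.00, and a scheduled watch run over a 30-lecture course with two new lectures and two alerts comes to $0.28.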


Is it legal to scrape OCW transcripts?

MIT OpenCourseWare is published free to the public under a Creative Commons BY-NC-SA license. This actor reads those public transcripts. You must comply with the CC BY-NC-SA terms (attribute MIT OCW, use the material non-commercially, and share derivatives under the same license) and review OCW's site terms. You are responsible for your use.


FAQ

Does it crawl a whole course? Yes. Give it a course URL and it finds every video lecture and fetches its transcript.

Is there a Whisper/ASR step? No. It reads OCW's .vtt captions, so it's fast and cheap.

Can I get subtitles? Yes. Add srt and/or vtt to transcriptFormats.

How do I export? JSON, CSV, Excel, HTML, or RSS from the Dataset tab, or via the Apify API.


Feedback

Want PDF-notes extraction or per-department crawling? Open an issue on the actor.
