Pricing: $5.99/month + usage
YouTube Video Transcript Scraper
A YouTube Video Transcript Scraper that pulls clean, accurate captions from YouTube videos, useful for creators, researchers, and AI workflows. Fast, reliable, and built to save you time.
Actor stats: 0 bookmarked, 4 total users, 0 monthly active users. Last modified 6 months ago.
Accurate, timestamped transcripts for full-length YouTube videos, with chapters, subtitles, and multi-language support.
Overview
This Actor extracts clean, timestamped transcripts from full-length YouTube videos (standard `youtube.com/watch?v=` URLs and `youtu.be` short links). It is designed for longer content: it works with multiple caption tracks, preserves chapters, and produces export-ready captions (SRT/VTT) alongside structured JSON suitable for analytics and LLM pipelines.
💡 Why Full-Video Focus?
- Full videos often contain chapters, multiple speakers, and long dialogues, so transcripts must preserve timing and structure.
- Uses official captions when available and falls back to ASR (automatic speech recognition) when they are not.
- Produces SRT/VTT, plain text, and structured JSON for downstream processing.
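One way to keep chapter structure alongside timing is to bucket each timestamped segment under the chapter it starts in. A minimal sketch, assuming the `video.chapters` and `Transcript.with_timestamps` shapes from the Outputs example further down; `group_by_chapter` is a hypothetical helper, not part of the Actor, and the third sample segment is made up for illustration.

```python
from bisect import bisect_right


def group_by_chapter(chapters, segments):
    """Return {chapter title: joined text} with segments grouped by chapter.

    chapters: [{"title": str, "start": seconds}, ...] sorted by start time
    segments: [{"text": str, "start": seconds, "end": seconds}, ...]
    """
    if not chapters:
        # No chapter markers: keep everything in a single bucket
        return {"Full video": " ".join(seg["text"] for seg in segments)}
    starts = [ch["start"] for ch in chapters]
    grouped = {ch["title"]: [] for ch in chapters}
    for seg in segments:
        # Last chapter whose start time is <= the segment's start time
        idx = max(bisect_right(starts, seg["start"]) - 1, 0)
        grouped[chapters[idx]["title"]].append(seg["text"])
    return {title: " ".join(texts) for title, texts in grouped.items()}


chapters = [{"title": "Intro", "start": 0}, {"title": "Main topic", "start": 60}]
segments = [
    {"text": "Hello and welcome to the show.", "start": 0.2, "end": 4.5},
    {"text": "Today we'll talk about...", "start": 5.0, "end": 9.3},
    {"text": "Let's dive in.", "start": 61.0, "end": 63.0},  # illustrative
]
print(group_by_chapter(chapters, segments))
```

Chapters are keyed by title here, so two chapters with the same title would merge; key by index instead if that matters for your videos.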
🔧 Key Features
- Multi-format URL normalization (`/watch?v=`, `youtu.be`).
- Prefers official caption tracks; falls back to ASR extraction when captions are missing.
- Preserves chapters and video metadata (title, duration, thumbnails).
- Exports JSON, plain text, SRT, and VTT.
- Optional speaker diarization and language detection.
- Configurable chunking for very long videos, with resume/retry support.
- Proxy-compatible and suited to large-scale jobs.
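The URL normalization listed above can be approximated with the standard library, which is also handy for deduplicating input lists before a run. A sketch under assumptions: only the `youtube.com/watch?v=` and `youtu.be` forms are handled (other shapes, such as Shorts or embed URLs, return `None`), and `extract_video_id` / `normalize_url` are hypothetical helpers rather than the Actor's own code.

```python
from urllib.parse import urlparse, parse_qs

# Hosts accepted by this sketch (assumption: anything else is unsupported)
WATCH_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}
SHORT_HOSTS = {"youtu.be", "www.youtu.be"}


def extract_video_id(url):
    """Return the video ID from a watch or youtu.be URL, or None if unrecognized."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host in SHORT_HOSTS:
        # youtu.be/<id>?t=... -> the first path segment is the ID
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None
    if host in WATCH_HOSTS and parsed.path == "/watch":
        # youtube.com/watch?v=<id>&t=... -> the "v" query parameter
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    return None


def normalize_url(url):
    """Map a supported URL form to one canonical watch URL."""
    video_id = extract_video_id(url)
    if video_id is None:
        raise ValueError(f"Unsupported YouTube URL: {url}")
    return f"https://www.youtube.com/watch?v={video_id}"


for u in ["https://www.youtube.com/watch?v=abcd1234", "https://youtu.be/abcd1234?t=30"]:
    print(normalize_url(u))
```

Both sample URLs map to the same canonical watch URL, so a set of normalized URLs removes duplicates such as the two entries in the example input.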
⚡ Quick Start: Console
- Open the Actor in Apify Console.
- Paste one or more YouTube video URLs into the input (watch links and youtu.be links are accepted).
- Click Run; results appear in the Dataset and Files (SRT/VTT) tabs.
⚙️ Quick Start: CLI & Python
CLI
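```shell
# --input-file reads the Actor input from a JSON file (--input expects an inline JSON string)
apify call neuro-scraper/youtube-transcript-fetcher --input-file ./videos_input.json
```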
Python (apify-client)
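```python
from apify_client import ApifyClient

# Replace <APIFY_TOKEN> with your Apify API token
client = ApifyClient('<APIFY_TOKEN>')

# Start the Actor and wait for the run to finish
run = client.actor('neuro-scraper/youtube-transcript-fetcher').call(
    run_input={
        "startUrls": [{"url": "https://www.youtube.com/watch?v=EXAMPLE"}],
        "workers": 3,
        "exportFormats": ["json", "srt", "vtt"],
    }
)

# list_items() returns a ListPage object; the dataset rows are in its .items attribute
for item in client.dataset(run['defaultDatasetId']).list_items().items:
    # Print the first 400 characters of each transcript
    print(item['Transcript']['plain_text'][:400])
```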
Inputs (video-focused)
| Name | Type | Required | Default | Example | Notes |
|---|---|---|---|---|---|
| `startUrls` | array | Yes | `[]` | `[{"url":"https://www.youtube.com/watch?v=abcd1234"}]` | List of YouTube video URLs |
| `workers` | integer | Optional | `5` | `10` | Max concurrent fetches |
| `exportFormats` | array | Optional | `["json"]` | `["json","srt","vtt"]` | Output formats to generate |
| `speakerDiarization` | boolean | Optional | `false` | `true` | Enable speaker detection (best-effort) |
| `language` | string | Optional | `null` | `"en"` | Force output language (ISO language code) |
| `proxyConfiguration` | object | Optional | `{}` | `{"useApifyProxy": true}` | Proxy settings |
Example input (Console JSON):
```json
{
  "startUrls": [
    {"url": "https://www.youtube.com/watch?v=abcd1234"},
    {"url": "https://youtu.be/abcd1234"}
  ],
  "workers": 5,
  "exportFormats": ["json", "srt", "vtt"],
  "speakerDiarization": true,
  "proxyConfiguration": {"useApifyProxy": true}
}
```
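When building input programmatically, merging user values over the documented defaults and running a few type checks catches mistakes before a run starts. A minimal sketch: `DEFAULTS` mirrors the Inputs table, while `build_run_input` and its checks are hypothetical; the Actor's own input schema may be stricter or looser.

```python
import copy

# Defaults copied from the Inputs table
DEFAULTS = {
    "startUrls": [],
    "workers": 5,
    "exportFormats": ["json"],
    "speakerDiarization": False,
    "language": None,
    "proxyConfiguration": {},
}


def build_run_input(user_input):
    """Merge user input over DEFAULTS and apply basic local checks (hypothetical)."""
    run_input = copy.deepcopy(DEFAULTS)  # never mutate the shared defaults
    run_input.update(user_input)

    urls = run_input["startUrls"]
    if not urls or not all(isinstance(u, dict) and isinstance(u.get("url"), str) for u in urls):
        raise ValueError('startUrls must be a non-empty list of {"url": "..."} objects')
    workers = run_input["workers"]
    # bool is a subclass of int in Python, so rule it out explicitly
    if isinstance(workers, bool) or not isinstance(workers, int) or workers < 1:
        raise ValueError("workers must be a positive integer")
    formats = run_input["exportFormats"]
    if not isinstance(formats, list) or not all(isinstance(f, str) for f in formats):
        raise ValueError("exportFormats must be a list of strings")
    if not isinstance(run_input["speakerDiarization"], bool):
        raise ValueError("speakerDiarization must be a boolean")
    if run_input["language"] is not None and not isinstance(run_input["language"], str):
        raise ValueError('language must be a string such as "en", or null')
    return run_input


run_input = build_run_input({"startUrls": [{"url": "https://youtu.be/abcd1234"}], "workers": 10})
print(run_input)
```

The resulting dict can be passed as `run_input` to `client.actor(...).call()` in the Python quick start.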
Outputs
Each Dataset item contains rich metadata and multiple transcript representations. Example:
```json
{
  "inputUrl": "https://www.youtube.com/watch?v=abcd1234",
  "fetchedAt": "2025-11-04T10:00:00Z",
  "success": true,
  "video": {
    "title": "Example Video",
    "duration": 3720,
    "chapters": [
      {"title": "Intro", "start": 0},
      {"title": "Main topic", "start": 60}
    ]
  },
  "Transcript": {
    "plain_text": "Full transcript text...",
    "with_timestamps": [
      {"text": "Hello and welcome to the show.", "start": 0.2, "end": 4.5},
      {"text": "Today we'll talk about...", "start": 5.0, "end": 9.3}
    ],
    "speaker_segments": [
      {"speaker": "Speaker 1", "start": 0.2, "end": 4.5, "text": "Hello and welcome to the show."}
    ]
  },
  "files": {
    "srt": "runs/<runId>/files/abcd1234.srt",
    "vtt": "runs/<runId>/files/abcd1234.vtt"
  }
}
```
Note: SRT/VTT files are attached to the run and can be downloaded from the Files tab.
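If you need to regenerate captions locally (for example after editing or filtering segments), the `with_timestamps` array maps directly onto SRT cues. A minimal sketch, assuming the segment shape shown in the example above; `to_srt` and `srt_timestamp` are hypothetical helpers, not part of the Actor, and the Actor's own SRT files may differ in details such as line wrapping.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render [{"text", "start", "end"}, ...] as numbered SRT cues."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        # Cue = index line, time range line, text line
        cues.append(f"{i}\n{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n{seg['text']}\n")
    # A blank line separates consecutive cues
    return "\n".join(cues)


segments = [
    {"text": "Hello and welcome to the show.", "start": 0.2, "end": 4.5},
    {"text": "Today we'll talk about...", "start": 5.0, "end": 9.3},
]
print(to_srt(segments))
```

WebVTT output differs mainly in its leading `WEBVTT` line and in using a period instead of a comma before the milliseconds.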
Environment Variables
- `APIFY_TOKEN`: required for authentication.
- `HTTP_PROXY`, `HTTPS_PROXY`: optional custom proxies.
- `APIFY_PROXY_PASSWORD`: used with Apify Proxy.

Store credentials as secrets, never in plaintext.
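A small startup check makes a missing token fail fast instead of surfacing later as an authentication error. A minimal sketch using only the standard library; `load_settings` is a hypothetical helper, and accepting a plain dict instead of `os.environ` is only there to make it easy to test.

```python
import os


def load_settings(env=None):
    """Read the environment variables listed above; fail early if the token is missing."""
    env = os.environ if env is None else env
    token = env.get("APIFY_TOKEN")
    if not token:
        raise RuntimeError("APIFY_TOKEN is not set; add it as a secret or environment variable")
    return {
        "token": token,
        # Optional settings are None when unset
        "http_proxy": env.get("HTTP_PROXY"),
        "https_proxy": env.get("HTTPS_PROXY"),
        "apify_proxy_password": env.get("APIFY_PROXY_PASSWORD"),
    }
```

The returned token can then be passed to `ApifyClient(...)` instead of hard-coding it as in the Python quick start.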
▶️ How to Run (short checklist)
- Open Apify Console → Actors → YouTube Transcript Fetcher.
- Provide video URLs (watch or youtu.be), set desired export formats, and toggle options.
- Run and inspect Dataset and Files tabs for JSON/SRT/VTT outputs.
Logs & Troubleshooting
- No transcript available: the video may lack captions and have audio too poor for the ASR fallback.
- Partial transcripts: long videos may be chunked; check the run log for retry or chunk status.
- Timeouts / failures: lower `workers` or increase timeouts; enable a proxy if the video is region-restricted.
Monitor real-time logs in the Console Run Log panel for detailed error messages.
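When only some videos fail (for example due to timeouts), rerunning just those videos with lower concurrency follows the advice above. A minimal sketch, assuming Dataset items carry the `inputUrl` and `success` fields shown under Outputs; `plan_retry` is a hypothetical helper, and halving `workers` is an arbitrary heuristic rather than documented Actor behavior.

```python
def plan_retry(items, previous_input, min_workers=1):
    """Build a follow-up run input that re-queues only the failed videos.

    items: Dataset items with "inputUrl" and "success" fields.
    previous_input: the run input that produced those items.
    Returns None when every video succeeded.
    """
    failed = [item["inputUrl"] for item in items if not item.get("success")]
    if not failed:
        return None  # nothing to retry
    retry_input = dict(previous_input)  # shallow copy; startUrls is replaced below
    retry_input["startUrls"] = [{"url": url} for url in failed]
    # Hypothetical heuristic: halve concurrency (default 5) but keep at least min_workers
    retry_input["workers"] = max(min_workers, previous_input.get("workers", 5) // 2)
    return retry_input
```

The result can be passed as `run_input` to a new run; repeat until it returns `None` or the remaining failures are videos with no captions and unusable audio.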
⏱ Scheduling & Webhooks
- Schedule daily or weekly runs for channel-level ingestion.
- Use Webhooks to push transcript files or Dataset updates to downstream systems (storage, search index, or ML pipelines).
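A downstream receiver typically only needs the finished run's dataset ID to pull new transcripts. A minimal sketch, assuming Apify's default webhook payload template (a top-level `eventType` plus a `resource` object holding the run, including `defaultDatasetId`) and a webhook registered for the `ACTOR.RUN.SUCCEEDED` event; `dataset_id_from_webhook` is a hypothetical helper, and the HTTP server around it is left out.

```python
import json


def dataset_id_from_webhook(body):
    """Return the run's default dataset ID for a succeeded run, else None.

    Assumes the default payload shape: {"eventType": ..., "resource": {run object}}.
    """
    payload = json.loads(body)
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None  # ignore failed, aborted, or timed-out runs here
    return payload.get("resource", {}).get("defaultDatasetId")
```

With the ID in hand, the receiver can read items the same way as the Python quick start (`client.dataset(dataset_id)`) and forward them to storage, a search index, or an ML pipeline.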
Changelog
- 1.0.0 (2025-11-04): Initial release with full-video support.
Notes & TODO
- TODO: Add example of chapter-aware summarization pipeline.
- TODO: Improve speaker diarization accuracy with optional external ASR.
Final note
This README is for researchers, media teams, and engineers who need robust, exportable transcripts from full-length YouTube videos for analytics, captioning, and training-data generation.
