
URL: https://apify.com/app.tanalytics/youtube-transcript-api---ai-training-data

YouTube Transcript Extractor for AI Training Data · Apify


๐Ÿ‘ YouTube Transcript API - AI Training Data avatar

YouTube Transcript API - AI Training Data

Pricing

from $0.01 / youtube transcript extraction



Extract YouTube video transcripts optimized for AI and machine learning workflows. Features chunking for LLM context limits, SRT/VTT formats, and music symbol removal. Perfect for building training datasets, content analysis, and subtitle generation.

Rating: 0.0 (0)

Developer: Tan Analytics

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 3
  • Monthly active users: 0
  • Last modified: 3 months ago

YouTube Transcript Extractor - AI Training Data

Extract YouTube video transcripts optimized for AI and machine learning workflows.


Why This Actor?

Feature           | Free Tools | This Actor
------------------|------------|----------------------------
AI chunking       | ❌         | ✅ Split by token limit
Token counting    | ❌         | ✅ Estimated tokens
Clean transcripts | ❌         | ✅ Remove ♪ [music]
SRT/VTT formats   | ❌         | ✅ All included
Video metadata    | ❌         | ✅ Title, author, thumbnail
Affordable        | Limited    | $0.01 per video

Use Cases

AI Training Data

Build high-quality training datasets from YouTube videos. Chunked transcripts fit any LLM context window.

Content Analysis

Analyze video content. Get word counts, token estimates, and structured metadata.

Subtitle Generation

Export transcripts in SRT or VTT format for video editing, captions, or accessibility.

Academic Research

Extract lectures, interviews, and documentaries. Clean transcripts ready for analysis.


Features

🎯 AI-Optimized Output

  • Smart chunking - Split transcripts to fit your LLM's context window
  • Token estimation - Get an approximate token count for each transcript
  • Clean mode - Remove music symbols (♪), [applause], [laughter] for cleaner training data
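
As an illustration of what clean mode does, here is a minimal sketch in Python. The listing does not publish the actor's actual rules, so the regular expression below (the ♪ symbol plus the bracketed cues [music], [applause], and [laughter]) and the function name `clean_transcript` are assumptions for this example only.

```python
import re

# Assumed noise patterns: the music-note symbol and a few bracketed sound cues.
# The actor's real cleaning rules are not documented on this page.
_NOISE = re.compile(r"♪|\[(?:music|applause|laughter)\]", re.IGNORECASE)


def clean_transcript(text: str) -> str:
    """Remove noise markers, then collapse the whitespace they leave behind."""
    without_noise = _NOISE.sub(" ", text)
    return re.sub(r"\s+", " ", without_noise).strip()


print(clean_transcript("♪ We're no strangers to love ♪"))
# We're no strangers to love
```

Matching is case-insensitive, so auto-caption variants such as [Music] or [Applause] are removed as well.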

📄 Multiple Formats

  • Plain text - Raw transcript
  • SRT subtitles - For video editors
  • VTT subtitles - Web-compatible
  • Timestamps - Optional [MM:SS] markers
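
The three timed formats differ mainly in how a time offset is written: SRT uses a comma before the milliseconds, VTT uses a period, and the inline markers use [MM:SS]. The sketch below shows one way to format them in Python; the helper names are hypothetical and not part of the actor.

```python
from typing import Tuple


def _split(seconds: float) -> Tuple[int, int, int, int]:
    """Break an offset in seconds into (hours, minutes, seconds, milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return hours, minutes, secs, millis


def srt_time(seconds: float) -> str:
    h, m, s, ms = _split(seconds)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"  # SRT: comma before milliseconds


def vtt_time(seconds: float) -> str:
    h, m, s, ms = _split(seconds)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"  # VTT: period before milliseconds


def mmss_marker(seconds: float) -> str:
    total = int(seconds)
    return f"[{total // 60:02d}:{total % 60:02d}]"


def srt_cue(index: int, start: float, end: float, text: str) -> str:
    """Build one numbered SRT cue."""
    return f"{index}\n{srt_time(start)} --> {srt_time(end)}\n{text}"


print(srt_time(1.36))       # 00:00:01,360
print(vtt_time(3.04))       # 00:00:03.040
print(mmss_marker(211.32))  # [03:31]
```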

📊 Metadata Enrichment

  • Video title and author
  • Thumbnail URL
  • Duration (formatted and raw)
  • Word count and character count
  • Detected language

🔒 Reliability

  • Automatic proxy fallback (Direct → Datacenter → Residential)
  • YouTube Shorts support
  • Multi-language transcripts
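
The proxy fallback is described only as an ordered chain, so the following Python sketch shows the general pattern rather than the actor's implementation: try each route in order and return the first success. The fetcher functions are stubs, and every name here is hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical fetcher type: takes a video ID and returns transcript text.
Fetcher = Callable[[str], str]


class TranscriptFetchError(Exception):
    """Raised when every route in the chain has failed."""


def fetch_with_fallback(
    video_id: str, routes: Sequence[Tuple[str, Fetcher]]
) -> Tuple[str, str]:
    """Try each (name, fetcher) pair in order; return the first that succeeds."""
    failures: List[str] = []
    for name, fetch in routes:
        try:
            return name, fetch(video_id)
        except Exception as exc:  # any failure moves on to the next route
            failures.append(f"{name}: {exc}")
    raise TranscriptFetchError("; ".join(failures))


# Stub fetchers standing in for real network calls.
def direct(video_id: str) -> str:
    raise ConnectionError("blocked")


def datacenter(video_id: str) -> str:
    return f"transcript for {video_id}"


def residential(video_id: str) -> str:
    return f"transcript for {video_id}"


route, text = fetch_with_fallback(
    "dQw4w9WgXcQ",
    [("direct", direct), ("datacenter", datacenter), ("residential", residential)],
)
print(route)  # datacenter
```

Ordering the routes from cheapest to most expensive means the costlier residential proxies are used only when the other routes fail.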

Pricing

$0.01 per transcript extraction

Videos | Cost
-------|-------
10     | $0.10
100    | $1.00
1,000  | $10.00

No monthly commitment. Pay only for what you use.
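
The table is flat per-transcript pricing, so the cost of a job is the video count times $0.01. A small Python sketch of that arithmetic, using `Decimal` to avoid floating-point rounding on currency (the function name is an assumption for this example):

```python
from decimal import Decimal

PRICE_PER_TRANSCRIPT = Decimal("0.01")  # USD, taken from the pricing table


def extraction_cost(video_count: int) -> Decimal:
    """Return the total cost in USD for a number of transcript extractions."""
    if video_count < 0:
        raise ValueError("video_count must not be negative")
    return PRICE_PER_TRANSCRIPT * video_count


for n in (10, 100, 1000):
    print(f"{n:>5,} videos -> ${extraction_cost(n):.2f}")
```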


Quick Start

Input

{
  "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "language": "en",
  "chunkSize": 2000,
  "cleanTranscript": true,
  "outputFormat": "text"
}

Output

{
  "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "videoId": "dQw4w9WgXcQ",
  "transcript": "♪ We're no strangers to love ♪",
  "transcriptClean": "We're no strangers to love",
  "chunks": [
    {
      "id": 0,
      "text": "We're no strangers to love...",
      "start": 1.36,
      "end": 110.0,
      "wordCount": 230
    }
  ],
  "metadata": {
    "title": "Rick Astley - Never Gonna Give You Up",
    "author": "Rick Astley",
    "thumbnailUrl": "https://img.youtube.com/vi/dQw4w9WgXcQ/maxresdefault.jpg",
    "duration": 211.32,
    "durationFormatted": "03:31",
    "wordCount": 367,
    "estimatedTokens": 488,
    "language": "en"
  },
  "transcriptSRT": "1\n00:00:01,360 --> 00:00:03,040\n♪ We're no strangers to love ♪",
  "transcriptVTT": "WEBVTT\n\n00:00:01.360 --> 00:00:03.040\n♪ We're no strangers to love ♪"
}
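
For training-data workflows, one common next step is to flatten each output item into JSON Lines, one record per chunk. The Python sketch below assumes the output shape shown above; the record field names (`video_id`, `chunk_id`, and so on) are choices made for this example, not something the actor emits.

```python
import json


def chunks_to_jsonl(item: dict) -> str:
    """Turn one actor output item into JSON Lines, one record per chunk."""
    meta = item.get("metadata", {})
    lines = []
    for chunk in item.get("chunks", []):
        record = {
            "video_id": item["videoId"],
            "chunk_id": chunk["id"],
            "title": meta.get("title"),
            "start": chunk["start"],
            "end": chunk["end"],
            "text": chunk["text"],
        }
        # ensure_ascii=False keeps non-English transcripts readable.
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)


# Abbreviated copy of the sample output above.
sample = {
    "videoId": "dQw4w9WgXcQ",
    "chunks": [
        {"id": 0, "text": "We're no strangers to love...", "start": 1.36, "end": 110.0}
    ],
    "metadata": {"title": "Rick Astley - Never Gonna Give You Up"},
}
print(chunks_to_jsonl(sample))
```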

Input Parameters

Parameter         | Type    | Default  | Description
------------------|---------|----------|---------------------------------
videoUrl          | string  | required | YouTube video URL
language          | string  | "en"     | Preferred transcript language
chunkSize         | integer | 2000     | Max chars per chunk (0 = off)
cleanTranscript   | boolean | false    | Remove music symbols and filler
includeMetadata   | boolean | true     | Include video metadata
outputFormat      | string  | "text"   | Format: text, srt, or vtt
includeTimestamps | boolean | true     | Add [MM:SS] timestamps
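
The parameters above can be assembled and sanity-checked before a run. The Python sketch below builds the input with the documented defaults. The `run_actor` helper uses the `apify-client` package (`pip install apify-client`), and the actor ID is inferred from this listing's URL rather than confirmed, so treat both as assumptions.

```python
from typing import List

# Assumption: actor ID taken from the listing URL path.
ACTOR_ID = "app.tanalytics/youtube-transcript-api---ai-training-data"
ALLOWED_FORMATS = {"text", "srt", "vtt"}


def build_run_input(
    video_url: str,
    *,
    language: str = "en",
    chunk_size: int = 2000,
    clean_transcript: bool = False,
    include_metadata: bool = True,
    output_format: str = "text",
    include_timestamps: bool = True,
) -> dict:
    """Build the actor input, applying the defaults from the parameter table."""
    if not video_url:
        raise ValueError("videoUrl is required")
    if chunk_size < 0:
        raise ValueError("chunkSize must be 0 (off) or a positive integer")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"outputFormat must be one of {sorted(ALLOWED_FORMATS)}")
    return {
        "videoUrl": video_url,
        "language": language,
        "chunkSize": chunk_size,
        "cleanTranscript": clean_transcript,
        "includeMetadata": include_metadata,
        "outputFormat": output_format,
        "includeTimestamps": include_timestamps,
    }


def run_actor(token: str, run_input: dict) -> List[dict]:
    """Start the actor and return its dataset items (needs network and a token)."""
    from apify_client import ApifyClient  # imported lazily; third-party package

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


run_input = build_run_input(
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ", clean_transcript=True
)
print(run_input["chunkSize"])  # 2000
```

Validating locally catches a mistyped `outputFormat` before a paid run starts.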

Supported URLs

  • https://www.youtube.com/watch?v=VIDEO_ID
  • https://youtu.be/VIDEO_ID
  • https://www.youtube.com/shorts/VIDEO_ID
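
To show how these three URL shapes map to a single video ID, here is a small Python sketch using only the standard library. It covers just the forms listed above; how the actor itself parses URLs is not documented, and the function name is hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID for watch, youtu.be, and shorts URLs, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    parts = [p for p in parsed.path.split("/") if p]

    if host == "youtu.be" and parts:
        return parts[0]
    if host == "youtube.com":
        if parsed.path == "/watch":
            return parse_qs(parsed.query).get("v", [None])[0]
        if len(parts) == 2 and parts[0] == "shorts":
            return parts[1]
    return None


for u in (
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://youtu.be/dQw4w9WgXcQ",
    "https://www.youtube.com/shorts/dQw4w9WgXcQ",
):
    print(extract_video_id(u))  # dQw4w9WgXcQ each time
```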

FAQ

Q: What if a video has no transcript?
A: The actor will return an error for that video.

Q: Can I extract transcripts in other languages?
A: Yes. Set language to the ISO code (e.g., "es" for Spanish).

Q: What's the maximum chunk size?
A: Default is 2000 characters (~500 tokens). Set to 0 to disable chunking.

Q: How accurate is token estimation?
A: We use ~1.33 tokens per word as a rough estimate.
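
The last two answers can be made concrete with a short Python sketch. It splits text on word boundaries under a character limit and applies the 1.33 tokens-per-word estimate. The actor's real chunker is not documented (it may split on sentences or caption segments), so `chunk_text` is an assumed, simplified stand-in.

```python
from typing import List

TOKENS_PER_WORD = 1.33  # rough ratio quoted in the FAQ


def chunk_text(text: str, chunk_size: int = 2000) -> List[str]:
    """Split on word boundaries so chunks stay within chunk_size characters.

    A chunk_size of 0 disables chunking. A single word longer than
    chunk_size is kept whole rather than cut mid-word.
    """
    if chunk_size <= 0:
        return [text] if text else []
    chunks: List[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= chunk_size or not current:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


def estimate_tokens(word_count: int) -> int:
    """Estimate tokens from a word count using the FAQ's ratio."""
    return round(word_count * TOKENS_PER_WORD)


print(chunk_text("a b c d", chunk_size=3))  # ['a b', 'c d']
print(estimate_tokens(367))                 # 488
```

The second result matches the sample output earlier on this page, where 367 words are reported as 488 estimated tokens.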


Support

Open an issue on GitHub, or contact the developer about enterprise pricing for large volumes.


