
URL: https://apify.com/ethereal_wool/youtube-transcript-scraper

YouTube Transcript Scraper & API · Apify


๐Ÿ‘ ๐ŸŽฅ YouTube Transcript Scraper avatar

🎥 YouTube Transcript Scraper

Pricing: $3.00 / 1,000 results


Extract YouTube transcript and caption data for any public video. Scrape by video URL or ID. Export to JSON, CSV & Excel, use the API, schedule runs and integrate. No code required.


Rating: 0.0 (0)

Developer: Jackie Chen (maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 7 days ago


YouTube Transcript & Subtitle Scraper

๐Ÿ‘ youtube-transcript-scraper

Scrape YouTube transcripts and subtitles for any public video. Give it one or more video IDs or URLs and get every available caption track with its language, whether it is auto-generated (ASR) or human-authored, whether it is translatable, and its downloadable timedtext URL, together with the video's title, channel, length and view count.

Unofficial. This Actor is not affiliated with, authorized, or endorsed by YouTube or Google LLC. It is an independent tool that retrieves publicly available data via a third-party API. Use it in compliance with YouTube's Terms of Service and all applicable laws; you are responsible for how you use the retrieved data.

What it does

  • Caption discovery: for each video, lists all caption tracks YouTube exposes (e.g. English, Spanish, auto-generated English), with the language code, a human-readable name, the kind (asr = auto-generated), isTranslatable, and the transcriptUrl (a YouTube timedtext URL).
  • Video metadata: every item also carries the parent video's videoTitle, channel, channelId, lengthSeconds, viewCount and shortDescription.
  • Filtering: keep only certain languages, or only auto-generated tracks.
  • Transcript text (best effort): optionally tries to download and flatten the caption file into plain text. See the note below.

Input

| Field | Type | Default | Description |
|---|---|---|---|
| videoIds | string[] | ["dQw4w9WgXcQ"] | Video IDs or full watch / youtu.be / shorts URLs. |
| languageCodes | string[] | [] | Keep only tracks whose language code matches (e.g. en, es). Empty = all. |
| autoGeneratedOnly | boolean | false | Keep only ASR (auto-generated) tracks. |
| fetchTranscriptText | boolean | false | Attempt to download the transcript text (best effort, see note). |
| maxItems | integer | 50 | Max total caption tracks across all videos. |
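To make the interaction of the three filter fields concrete, here is a minimal Python sketch, not the Actor's own code. `filter_tracks` is a hypothetical helper, the track dicts use the output field names `languageCode` and `kind`, and exact, case-sensitive matching of language codes is an assumption: the table does not say whether, for example, `en` also matches a regional variant.

```python
from typing import Any, Dict, List, Optional


def filter_tracks(
    tracks: List[Dict[str, Any]],
    language_codes: Optional[List[str]] = None,
    auto_generated_only: bool = False,
    max_items: int = 50,
) -> List[Dict[str, Any]]:
    """Hypothetical helper mirroring languageCodes, autoGeneratedOnly and maxItems."""
    wanted = set(language_codes or [])  # empty list means "keep every language"
    kept: List[Dict[str, Any]] = []
    for track in tracks:  # tracks from all videos, in order
        if len(kept) >= max_items:
            break  # maxItems is a total across all videos, not a per-video limit
        if wanted and track.get("languageCode") not in wanted:
            continue  # assumption: exact, case-sensitive code match
        if auto_generated_only and track.get("kind") != "asr":
            continue  # "asr" marks auto-generated tracks
        kept.append(track)
    return kept
```

Because maxItems counts caption tracks rather than videos, a few videos with many languages can exhaust the cap before later videos are reached.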

Example input

{
  "videoIds": ["dQw4w9WgXcQ", "https://www.youtube.com/watch?v=jNQXAC9IVRw"],
  "languageCodes": ["en"],
  "autoGeneratedOnly": false,
  "fetchTranscriptText": true,
  "maxItems": 100
}
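The Actor accepts bare IDs as well as watch, youtu.be and shorts URLs, and de-duplicates IDs within a run. A minimal Python sketch of that normalization follows, assuming video IDs are 11 characters from `[A-Za-z0-9_-]`; `extract_video_id` and `normalize_video_ids` are hypothetical helpers, not part of the Actor.

```python
import re
from typing import Iterable, List, Optional
from urllib.parse import parse_qs, urlparse

# Assumption: YouTube video IDs are 11 characters from [A-Za-z0-9_-].
VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(value: str) -> Optional[str]:
    """Return the video ID in a bare ID or a watch / youtu.be / shorts URL, else None."""
    value = value.strip()
    if VIDEO_ID.fullmatch(value):
        return value  # already a bare ID
    parsed = urlparse(value)
    host = parsed.netloc.lower()
    if host in ("youtu.be", "www.youtu.be"):
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path.startswith("/shorts/"):
            candidate = parsed.path[len("/shorts/"):].split("/")[0]
        else:
            candidate = parse_qs(parsed.query).get("v", [""])[0]  # watch?v=ID
    else:
        return None  # not a YouTube host
    return candidate if VIDEO_ID.fullmatch(candidate) else None


def normalize_video_ids(values: Iterable[str]) -> List[str]:
    """Extract IDs and drop duplicates, keeping first-seen order."""
    seen = set()
    ids: List[str] = []
    for value in values:
        video_id = extract_video_id(value)
        if video_id and video_id not in seen:
            seen.add(video_id)
            ids.append(video_id)
    return ids
```

With the example input above, both entries survive; adding `https://youtu.be/dQw4w9WgXcQ` to the list would collapse into the first ID.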

Output

One dataset item per caption track:

{
  "videoId": "dQw4w9WgXcQ",
  "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "videoTitle": "Rick Astley - Never Gonna Give You Up (Official Video) (4K Remaster)",
  "channel": "Rick Astley",
  "channelId": "UCuAXFkgsw1L7xaCfnd5JJOw",
  "lengthSeconds": 213,
  "viewCount": 1779355962,
  "languageCode": "en",
  "language": "English",
  "kind": "asr",
  "isAutoGenerated": true,
  "isTranslatable": true,
  "vssId": ".en",
  "transcriptUrl": "https://www.youtube.com/api/timedtext?v=dQw4w9WgXcQ&...",
  "source": "video:dQw4w9WgXcQ"
}
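Because the dataset holds one item per caption track, a video with both a human-authored and an auto-generated English track appears twice. If you want a single track per video, a small post-processing step can choose one. Preferring human-authored tracks is a design choice of this sketch, not something the Actor does, and `best_track_per_video` is a hypothetical name; it relies only on the `videoId`, `languageCode` and `isAutoGenerated` fields shown above.

```python
from typing import Any, Dict, List


def best_track_per_video(
    items: List[Dict[str, Any]], language: str = "en"
) -> Dict[str, Dict[str, Any]]:
    """Pick one track per videoId in `language`, preferring human-authored over ASR."""
    best: Dict[str, Dict[str, Any]] = {}
    for item in items:
        if item.get("languageCode") != language:
            continue
        current = best.get(item["videoId"])
        # Keep the first match, but let a human-authored track replace an auto-generated one.
        if current is None or (current.get("isAutoGenerated") and not item.get("isAutoGenerated")):
            best[item["videoId"]] = item
    return best
```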

Notes

  • Transcript text is best-effort. YouTube signs each timedtext URL against the IP that requested it, so a server-side download frequently returns an error. When fetchTranscriptText is enabled the Actor still tries, but transcriptText may come back empty. The transcriptUrl is always provided so you can fetch the caption file yourself (append &fmt=json3, &fmt=srv3, or &fmt=vtt) from the appropriate IP.
  • Data is fetched live. YouTube or the upstream edge occasionally rate-limits requests, so the Actor retries transient blocks with exponential backoff.
  • Video IDs are de-duplicated within a run.
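If you fetch the caption file yourself, as the first note suggests, the work looks roughly like this in Python: append `&fmt=json3` to `transcriptUrl`, retry transient failures with exponential backoff, and flatten the caption events into text. This is a sketch under assumptions. The json3 layout (`events` carrying `tStartMs`, `dDurationMs` and `segs[].utf8`) is an undocumented YouTube format observed in practice, the retry policy is a generic illustration rather than the Actor's own, and all function names are hypothetical. Because the URL is signed against the requesting IP, the download can still fail.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Any, Dict, List


def backoff_delays(retries: int, base: float = 1.0, cap: float = 30.0) -> List[float]:
    """Exponential backoff schedule in seconds: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def fetch_json3(transcript_url: str, retries: int = 4) -> Dict[str, Any]:
    """Download a caption file as json3, retrying 429/5xx responses and network errors."""
    url = transcript_url + "&fmt=json3"  # assumes transcriptUrl has no fmt parameter yet
    delays = backoff_delays(retries)
    for attempt in range(retries + 1):
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                return json.loads(resp.read().decode("utf-8"))
        except urllib.error.HTTPError as err:
            # 429 and 5xx look transient; any other status is treated as final.
            if attempt == retries or not (err.code == 429 or err.code >= 500):
                raise
        except urllib.error.URLError:
            if attempt == retries:
                raise
        time.sleep(delays[attempt])
    raise RuntimeError("unreachable")


def flatten_json3(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn json3 events into {start, duration, text} segments, times in seconds."""
    segments: List[Dict[str, Any]] = []
    for event in payload.get("events", []):
        text = "".join(seg.get("utf8", "") for seg in event.get("segs", []))
        text = text.replace("\n", " ").strip()
        if not text:
            continue  # skip window-setup events and bare line breaks
        segments.append({
            "start": event.get("tStartMs", 0) / 1000,
            "duration": event.get("dDurationMs", 0) / 1000,
            "text": text,
        })
    return segments


def plain_text(segments: List[Dict[str, Any]]) -> str:
    """Join segment texts into one LLM-ready string."""
    return " ".join(segment["text"] for segment in segments)
```

Retrying only 429 and 5xx responses keeps the sketch from repeatedly hitting an endpoint that has already refused a signed URL for good.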

Quick start

  1. Open the Actor and press Run; the default input works out of the box.
  2. Set videoIds to the videos you want (bare IDs or URLs), add languageCodes or autoGeneratedOnly if needed, and set maxItems to cap spend.
  3. Grab results from the Dataset tab as JSON / CSV / Excel, or pull them from your own code via the Apify API or MCP.

No proxies to configure, no cookies to paste, no login: the Actor handles everything server-side.

Why developers pick this transcript scraper

Transcript actors are the picks-and-shovels of the AI boom, and most charge $10 per 1,000 videos or quietly fail on half their runs. This Actor fetches YouTube caption tracks via a direct HTTP API at $3 per 1,000 results (one result per caption track), each with a timedtext URL and, when fetchTranscriptText succeeds, a plain-text transcript field. It's built for piping into LLMs: no HTML to clean, no browser.

What people build with it

  • RAG knowledge bases: index transcripts of conference talks, tutorials and reviews so your assistant can cite video content like documents.
  • Content repurposing: turn long-form videos into newsletters, blog posts and social threads with one LLM step on top of the transcript.
  • Competitor channel analysis: what topics, hooks and phrases do the top channels in your niche actually use? Transcripts answer that at scale.
  • Compliance & moderation: audit what's being said in sponsored or branded videos without watching hours of footage.
  • Subtitle workflows: the timestamped caption files behind each transcriptUrl drop straight into translation and dubbing pipelines.
  • Research corpora: build searchable text datasets from playlists or whole channels.

Tips for better results

  • Works with standard watch URLs, Shorts URLs, youtu.be links, or bare video IDs.
  • Combine with YouTube Search or YouTube Channel Videos to discover videos first, then fetch their transcripts in bulk: a two-actor pipeline that turns any topic into a text corpus.
  • Caption files carry a start time and duration for every line, so you can deep-link to the exact second a phrase is spoken (youtu.be/ID?t=123).
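A minimal Python sketch of the deep-link tip, assuming you already have caption segments as dicts with `start` (in seconds) and `text`, for example from parsing a caption file yourself; `deep_link` and `link_to_phrase` are hypothetical helpers. Rounding the start down means playback begins at or just before the phrase.

```python
from typing import Any, Dict, Iterable, Optional
from urllib.parse import urlencode


def deep_link(video_id: str, start_seconds: float) -> str:
    """youtu.be link that starts playback at `start_seconds`, rounded down to a whole second."""
    return f"https://youtu.be/{video_id}?{urlencode({'t': int(start_seconds)})}"


def link_to_phrase(
    video_id: str, segments: Iterable[Dict[str, Any]], phrase: str
) -> Optional[str]:
    """Deep link to the first segment whose text contains `phrase` (case-insensitive)."""
    needle = phrase.lower()
    for segment in segments:
        if needle in segment["text"].lower():
            return deep_link(video_id, segment["start"])
    return None
```

A phrase that straddles two caption lines will not be found by this simple substring search; joining neighbouring segments first is one way around that.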

Why this Actor

  • Direct API, no headless browser: fast, stable runs with nothing to babysit.
  • No login, no cookies: we never touch your accounts, so there's no ban risk.
  • Fresh, real-time data: every run reads the source live, not a stale cache.
  • Pay per result: you're billed only for the rows actually delivered.
  • Structured JSON: export to CSV, Excel, or JSON, or pull straight from the API / MCP.

Use cases

  • Build clean text corpora for LLM fine-tuning and RAG.
  • Repurpose long video into blogs, summaries, and clips.
  • Make video searchable and translatable at scale.
  • Feed transcripts into topic modeling and keyword research.

FAQ

Do I need an account, cookies, or to log in anywhere? No. The Actor talks to a fast, direct HTTP API server-side; you just provide inputs and run it.

How am I billed? Pay-per-result: a fixed price per row returned, with no separate platform/compute charge. Caps like maxItems keep spend predictable.
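Since billing is per result and each result is one caption track, the worst-case spend for a run is maxItems × $3.00 / 1,000. A short Python sketch of that arithmetic, assuming the $3.00 list price shown on this page still applies when you run the Actor; `max_cost_usd` is a hypothetical helper. It uses `Decimal` because binary floats cannot represent amounts such as 0.30 exactly.

```python
from decimal import Decimal

# Price shown on this page; assumption: it still applies when you run the Actor.
PRICE_PER_1000_RESULTS = Decimal("3.00")


def max_cost_usd(max_items: int, price_per_1000: Decimal = PRICE_PER_1000_RESULTS) -> Decimal:
    """Upper bound on spend: every one of maxItems caption tracks is delivered and billed."""
    return Decimal(max_items) * price_per_1000 / 1000
```

For the default maxItems of 50 the cap is $0.15; at 100 it is $0.30. Actual spend is lower whenever fewer caption tracks are found.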

Can I run it on a schedule or call it from my app? Yes: use Apify Schedules, the REST API, the JavaScript / Python clients, or the MCP server. See the API tab.
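A minimal sketch of calling the Actor from Python with the `apify-client` package. The Actor ID is taken from the Store URL; the `APIFY_TOKEN` environment variable name and the helper names are assumptions, and the code targets the 1.x client, where `call()` waits for the run to finish and returns its details as a dict (check the version you install). The client import sits inside `run_actor` so the input builder works without the package installed.

```python
import os
from typing import Any, Dict, List, Optional

# Actor ID as it appears in the Store URL (apify.com/ethereal_wool/youtube-transcript-scraper).
ACTOR_ID = "ethereal_wool/youtube-transcript-scraper"


def build_run_input(
    video_ids: List[str],
    language_codes: Optional[List[str]] = None,
    max_items: int = 50,
) -> Dict[str, Any]:
    """Assemble an input object using the field names from the Input table."""
    return {
        "videoIds": video_ids,
        "languageCodes": language_codes or [],
        "autoGeneratedOnly": False,
        "fetchTranscriptText": False,
        "maxItems": max_items,
    }


def run_actor(run_input: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Start a run, wait for it to finish, and return its dataset items."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])  # env variable name is an assumption
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return run details")
    return client.dataset(run["defaultDatasetId"]).list_items().items
```

For example, `run_actor(build_run_input(["dQw4w9WgXcQ"], ["en"]))` returns the dataset items, one per caption track, in the shape shown under Output.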

Is this affiliated with YouTube? No. It's an independent tool that collects publicly available data. Use it in line with the platform's terms and applicable law.

More YouTube scrapers by us

Browse the full fleet: https://apify.com/ethereal_wool
