Pricing
from $3.00 / 1,000 results
Twitter / X Video Transcript Scraper
Extract transcripts from Twitter/X video posts. Returns timestamped segments using native Twitter captions (WebVTT), with automatic Whisper AI fallback for uncaptioned videos.
Extract full, timestamped transcripts from Twitter/X video posts. The actor uses native Twitter captions (WebVTT) when they are available and automatically falls back to Whisper AI speech-to-text for uncaptioned videos.
Features
- Native captions first: intercepts Twitter's built-in WebVTT subtitle tracks for the fastest, most accurate results
- Whisper AI fallback: uses faster-whisper to transcribe audio when no native captions are available
- Timestamped segments: every output row includes `startTime`, `endTime`, and `text` for precise video navigation
- Full transcript: each row also carries the complete joined transcript for easy search
- Flexible method control: choose `auto` (native → Whisper), native only, or Whisper only
- Multi-language support: native captions in any language; optional language hint for Whisper
- Anti-detection: Playwright Firefox with stealth fingerprinting, randomised viewports/user-agents, and human-like delays
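
To illustrate how WebVTT cues map onto the timestamped segments described above, here is a minimal sketch. It assumes a simple caption file with standard `HH:MM:SS.mmm --> HH:MM:SS.mmm` timing lines; the function name `parse_webvtt` and the sample captions are hypothetical and are not taken from the actor's source.

```python
import re

# Matches cue timing lines such as "00:00:03.440 --> 00:00:07.100".
# The hours component is optional in WebVTT.
TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


def _seconds(hours, minutes, secs, millis):
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(secs) + int(millis) / 1000


def parse_webvtt(vtt_text):
    """Return a list of {startTime, endTime, text} dicts, one per cue."""
    segments = []
    normalized = vtt_text.replace("\r\n", "\n").strip()
    # Cue blocks are separated by blank lines; blocks without a timing line
    # (such as the WEBVTT header) are skipped.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                parts = match.groups()
                text = " ".join(ln.strip() for ln in lines[i + 1:] if ln.strip())
                segments.append({
                    "startTime": _seconds(*parts[:4]),
                    "endTime": _seconds(*parts[4:]),
                    "text": text,
                })
                break
    return segments


# Hypothetical caption file used only for illustration.
SAMPLE = """WEBVTT

00:00:00.000 --> 00:00:03.440
We made a remarkable discovery this week

00:00:03.440 --> 00:00:07.100
that changes our understanding
of the solar system.
"""

segments = parse_webvtt(SAMPLE)
full_transcript = " ".join(seg["text"] for seg in segments)
print(len(segments), full_transcript)
```

Joining the cue texts with a single space is one plausible way to build the full transcript; the actor may join segments differently.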
Input
| Field | Type | Required | Description |
|---|---|---|---|
| `postUrls` | string[] | Yes | Twitter/X video post URLs (twitter.com or x.com both accepted) |
| `cookies` | string | Yes | Twitter/X session cookies as JSON (`auth_token` and `ct0` required) |
| `transcriptionMethod` | select | No | `auto` (default), `native`, or `whisper` |
| `whisperModel` | select | No | `tiny`, `base` (default), `small`, `medium`, `large-v2` |
| `language` | string | No | ISO 639-1 hint for Whisper (e.g. `en`, `es`, `fr`) |
| `proxyConfiguration` | object | No | Apify proxy settings |
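
As an illustration of the input fields above, the following sketch assembles and sanity-checks an input object before a run. The helper `build_run_input` is hypothetical, the cookie values are placeholders, and the `name`/`value` cookie shape is an assumption about the exported JSON. The field names and allowed values come from the table.

```python
import json

ALLOWED_METHODS = {"auto", "native", "whisper"}
ALLOWED_MODELS = {"tiny", "base", "small", "medium", "large-v2"}


def build_run_input(post_urls, cookies_json, method="auto", model="base", language=None):
    """Assemble an input object using the field names from the Input table."""
    if not post_urls:
        raise ValueError("postUrls must contain at least one URL")
    for url in post_urls:
        if "twitter.com/" not in url and "x.com/" not in url:
            raise ValueError(f"not a Twitter/X URL: {url}")
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unknown transcriptionMethod: {method}")
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unknown whisperModel: {model}")
    # The cookies field is a JSON string; fail early if it does not parse.
    json.loads(cookies_json)

    run_input = {
        "postUrls": list(post_urls),
        "cookies": cookies_json,
        "transcriptionMethod": method,
        "whisperModel": model,
    }
    if language:
        run_input["language"] = language  # optional ISO 639-1 hint for Whisper
    return run_input


example = build_run_input(
    ["https://x.com/NASA/status/1858131747319566780"],
    '[{"name": "auth_token", "value": "PLACEHOLDER"}, {"name": "ct0", "value": "PLACEHOLDER"}]',
    language="en",
)
print(json.dumps(example, indent=2))
```

The resulting JSON can be pasted into the actor's input editor or passed as the run input through whichever Apify client you use.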
How to get Twitter cookies
- Log in to x.com in your browser
- Open DevTools → Application → Cookies → `https://x.com`
- Copy the `auth_token` and `ct0` cookie values
- Export all cookies as JSON (e.g. using the EditThisCookie browser extension)
- Paste the JSON array into the `cookies` input field

Cookies expire periodically; re-export them if you see `expired_cookies` errors.
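
Before pasting an export into the `cookies` field, it can help to confirm that both required cookies are present. The sketch below assumes the export is a JSON array of objects with `name` and `value` keys, which is a common browser-extension export shape but is not confirmed by this README. The helper name `missing_cookies` is hypothetical.

```python
import json

REQUIRED_COOKIES = ("auth_token", "ct0")


def missing_cookies(cookies_json):
    """Return the required cookie names that are absent or empty in the export."""
    cookies = json.loads(cookies_json)
    present = {
        c.get("name")
        for c in cookies
        if isinstance(c, dict) and c.get("value")
    }
    return [name for name in REQUIRED_COOKIES if name not in present]


# Placeholder export that is missing the ct0 cookie.
exported = json.dumps([
    {"name": "auth_token", "value": "PLACEHOLDER", "domain": ".x.com"},
    {"name": "guest_id", "value": "PLACEHOLDER", "domain": ".x.com"},
])
print(missing_cookies(exported))  # ['ct0']
```

An empty list means both `auth_token` and `ct0` were found; it does not prove the session is still valid.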
Output
Each dataset row represents one transcript segment. Tweet metadata is repeated on every row for easy filtering.
| Field | Type | Description |
|---|---|---|
| `tweetUrl` | string | Canonical x.com/…/status/… URL |
| `tweetId` | string | Numeric tweet ID |
| `authorUsername` | string | Twitter handle (without @) |
| `authorName` | string | Display name |
| `tweetText` | string | Tweet caption / body text |
| `publishedAt` | string | ISO 8601 publish timestamp |
| `language` | string | ISO 639-1 language code |
| `transcriptMethod` | string | `native` or `whisper` |
| `transcriptAvailable` | boolean | `false` for tweets with no extractable transcript |
| `segmentIndex` | integer | 0-based position within the transcript |
| `startTime` | float | Segment start time in seconds |
| `endTime` | float | Segment end time in seconds |
| `text` | string | Segment transcript text |
| `fullTranscript` | string | All segments joined into one string |
| `scrapedAt` | string | ISO 8601 scrape timestamp |
Sample output record
{"tweetUrl":"https://x.com/NASA/status/1858131747319566780","tweetId":"1858131747319566780","authorUsername":"NASA","authorName":"NASA","tweetText":"Watch our latest discovery announcement…","publishedAt":"2024-11-17T18:30:00.000Z","language":"en","transcriptMethod":"native","transcriptAvailable":true,"segmentIndex":0,"startTime":0.0,"endTime":3.44,"text":"We made a remarkable discovery this week","fullTranscript":"We made a remarkable discovery this week that changes our understanding of the solar system.","scrapedAt":"2025-01-15T10:22:33.456Z"}
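
Because each row is one segment, downstream code usually needs to regroup rows by tweet. The sketch below groups rows by `tweetId`, orders them by `segmentIndex`, and renders an SRT subtitle document per tweet. The helper names are hypothetical. The first sample row mirrors the record above, while the second row (including its `endTime` of 7.1) is made up for illustration.

```python
from collections import defaultdict


def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def rows_to_srt(rows):
    """Group dataset rows by tweetId and render one SRT document per tweet."""
    by_tweet = defaultdict(list)
    for row in rows:
        # Rows with transcriptAvailable == false carry no segment fields.
        if row.get("transcriptAvailable"):
            by_tweet[row["tweetId"]].append(row)

    documents = {}
    for tweet_id, segs in by_tweet.items():
        segs.sort(key=lambda r: r["segmentIndex"])
        cues = [
            f"{r['segmentIndex'] + 1}\n"
            f"{srt_timestamp(r['startTime'])} --> {srt_timestamp(r['endTime'])}\n"
            f"{r['text']}"
            for r in segs
        ]
        documents[tweet_id] = "\n\n".join(cues) + "\n"
    return documents


# Rows deliberately out of order to show the sort by segmentIndex.
rows = [
    {"tweetId": "1858131747319566780", "transcriptAvailable": True,
     "segmentIndex": 1, "startTime": 3.44, "endTime": 7.1,
     "text": "that changes our understanding of the solar system."},
    {"tweetId": "1858131747319566780", "transcriptAvailable": True,
     "segmentIndex": 0, "startTime": 0.0, "endTime": 3.44,
     "text": "We made a remarkable discovery this week"},
]
srt_docs = rows_to_srt(rows)
print(srt_docs["1858131747319566780"])
```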
Transcription Methods
| Method | When to use | Speed | Accuracy |
|---|---|---|---|
| `auto` | Default: tries native captions first, then falls back to Whisper | Fast when native captions are available | High |
| `native` | You only want videos with Twitter captions | Fastest | Highest (verbatim) |
| `whisper` | All videos, including those without captions | Slower | High (model-dependent) |
Whisper Model Selection
| Model | Size | Speed | Use case |
|---|---|---|---|
| `tiny` | 32 MB | Fastest | Quick drafts, high-volume runs |
| `base` | 74 MB | Fast | Default: good balance |
| `small` | 244 MB | Medium | Better accuracy for accented speech |
| `medium` | 769 MB | Slow | High accuracy |
| `large-v2` | 1550 MB | Slowest | Best quality, multiple languages |
Memory Requirements for Long Videos (Whisper)
The actor automatically splits long audio into 10-minute chunks, so there is no video length limit. However, Whisper keeps the model and current chunk in RAM simultaneously:
| Video length | Recommended memory |
|---|---|
| Up to ~30 minutes | 2048 MB (default) |
| 30 minutes to 2 hours | 4096 MB |
| 2 hours+ | 8192 MB |
To set memory in the Apify UI: open your actor run → Input → Options → Memory. Native-caption runs have no meaningful memory requirement regardless of video length.
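
The memory table and the 10-minute chunking can be expressed as a small lookup. This is a sketch for planning Whisper runs and is not part of the actor. The function names are hypothetical, and the exact boundaries (30 and 120 minutes) are an assumed reading of the approximate ranges in the table.

```python
import math

CHUNK_MINUTES = 10  # chunk length stated in this section


def whisper_chunk_count(video_minutes):
    """Number of 10-minute chunks a video of the given length is split into."""
    return max(1, math.ceil(video_minutes / CHUNK_MINUTES))


def recommended_memory_mb(video_minutes):
    """Recommended run memory for Whisper transcription, per the table above."""
    if video_minutes <= 30:
        return 2048
    if video_minutes <= 120:
        return 4096
    return 8192


for minutes in (25, 95, 150):
    print(minutes, whisper_chunk_count(minutes), recommended_memory_mb(minutes))
```

For example, a 95-minute video is split into 10 chunks and falls into the 4096 MB tier.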
Limitations
- Cookies required: Twitter restricts video access to authenticated sessions
- Native caption availability: not all Twitter videos have auto-generated captions; use the `whisper` method for full coverage
- Rate limits: Twitter may throttle rapid scraping; the actor applies human-like delays between requests
- Proxy recommended: for high-volume runs, use Apify residential proxy to avoid IP bans
FAQ
Q: Why do I need cookies?
Twitter requires authentication to serve video pages and caption tracks. Without cookies the actor cannot access video content.
Q: What if a video has no captions and I set `transcriptionMethod` to `native`?
The actor outputs a single row per tweet with `transcriptAvailable: false` and no segment fields. Set `transcriptionMethod` to `auto` or `whisper` to transcribe those videos with Whisper AI.
Q: Can I scrape multiple videos at once?
Yes. Add multiple URLs to `postUrls`. The actor processes them sequentially, with delays to avoid rate limiting.
Q: Does this work with Twitter Spaces audio?
No. Twitter Spaces use a different streaming format. This actor targets video posts only.
Q: How do I filter by language?
All output rows include a `language` field. Use Apify's dataset filtering to select rows by language code.
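
Rows can also be filtered locally after download. The sketch below assumes the dataset items are already loaded as a list of dicts with the output fields documented above. The helper names and sample rows are hypothetical.

```python
def rows_in_language(rows, code):
    """Keep transcript segments whose language field matches the given code."""
    return [
        r for r in rows
        if r.get("transcriptAvailable") and r.get("language") == code
    ]


def untranscribed_urls(rows):
    """List tweet URLs that produced no transcript, in first-seen order."""
    urls = []
    for r in rows:
        if not r.get("transcriptAvailable") and r["tweetUrl"] not in urls:
            urls.append(r["tweetUrl"])
    return urls


# Made-up rows for illustration; real rows carry all documented fields.
rows = [
    {"tweetUrl": "https://x.com/a/status/1", "language": "en",
     "transcriptAvailable": True, "segmentIndex": 0, "text": "Hello"},
    {"tweetUrl": "https://x.com/b/status/2", "language": "es",
     "transcriptAvailable": True, "segmentIndex": 0, "text": "Hola"},
    {"tweetUrl": "https://x.com/c/status/3", "language": "en",
     "transcriptAvailable": False},
]
english = rows_in_language(rows, "en")
retry = untranscribed_urls(rows)
print(len(english), retry)
```

The URLs returned by `untranscribed_urls` are candidates for a second run with `transcriptionMethod` set to `whisper`.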
