Video & Audio Transcriber — Word-Level + SRT/VTT

Pricing

from $20.00 / 1,000 transcribed minutes

Transcribe any video or audio URL into accurate text with word-level and segment timestamps, plus ready-to-use SRT, VTT, and TXT files. Auto-detects language. For captions, subtitles, search & repurposing. Bring your own OpenAI API key.

Rating

5.0

(1)

Developer

Dami's Studio

Maintained by Community

Actor stats

Last modified: 3 days ago

Video & Audio Transcriber

Give it a public video or audio URL and it returns accurate text with segment and word-level timestamps, plus ready-to-use SRT, VTT, and TXT files. It detects the spoken language automatically. Built for people who need captions, searchable transcripts, or source text to repurpose into clips, articles, or show notes.

How it works

The actor downloads your media, extracts the audio track with ffmpeg, and sends it to OpenAI's Whisper API using your own API key. The timestamps and subtitle files come straight from the model's segment and word data, so timing lines up with the actual speech.
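The extract-then-transcribe flow described above can be sketched roughly as follows. This is a minimal illustration, not the actor's actual code: the function names `build_ffmpeg_cmd`, `extract_audio`, and `transcribe` are hypothetical, and the ffmpeg settings (mono, 16 kHz, 64 kbit/s MP3) are assumptions chosen to keep uploads small. The transcription call assumes the official `openai` Python package is installed.

```python
import subprocess
from typing import List, Optional


def build_ffmpeg_cmd(src: str, dst: str) -> List[str]:
    """Build an ffmpeg command that drops the video stream and keeps compact audio.

    The encoding settings are illustrative assumptions, not the actor's real ones.
    """
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-i", src,               # input media (video or audio)
        "-vn",                   # discard any video stream
        "-ac", "1",              # downmix to mono
        "-ar", "16000",          # resample to 16 kHz
        "-c:a", "libmp3lame",    # encode as MP3
        "-b:a", "64k",           # 64 kbit/s audio bitrate
        dst,
    ]


def extract_audio(src: str, dst: str) -> None:
    """Run ffmpeg; raises CalledProcessError if extraction fails."""
    subprocess.run(build_ffmpeg_cmd(src, dst), check=True)


def transcribe(audio_path: str, api_key: str,
               model: str = "whisper-1", base_url: Optional[str] = None):
    """Send the audio to a Whisper endpoint and request segment and word timestamps."""
    from openai import OpenAI  # third-party; imported lazily so the helpers above work without it

    client = OpenAI(api_key=api_key, base_url=base_url)
    with open(audio_path, "rb") as f:
        return client.audio.transcriptions.create(
            model=model,
            file=f,
            response_format="verbose_json",              # needed to get timestamps back
            timestamp_granularities=["segment", "word"],
        )
```

Calling `extract_audio("talk.mp4", "talk.mp3")` followed by `transcribe("talk.mp3", api_key)` would mirror the two steps; the file names are placeholders.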

Input

Field	Required	Notes
`mediaUrl`	yes	Public URL to a video or audio file (mp4, mov, mp3, wav, m4a, webm, and similar).
`language`	no	ISO code of the spoken language, or `auto` to detect it. Defaults to `auto`.
`wordTimestamps`	no	Return per-word start/end times. Useful for karaoke-style captions. On by default.
`outputFormats`	no	Which files to generate: any of `srt`, `vtt`, `txt`. Defaults to `srt` and `vtt`.
`openaiApiKey`	yes	Your OpenAI (Whisper) key. Kept private and used only for this run.

There are two advanced fields if you need them: `model` (defaults to `whisper-1`) and `baseUrl` for an OpenAI-compatible endpoint.

Output

One dataset record per run. It includes the detected language, the full text, segments with start/end times, and words when word timestamps are enabled, along with `wordCount`, `segmentCount`, and `durationSeconds`. Each requested subtitle file is saved to the key-value store and referenced by `srtKey`/`srtUrl`, `vttKey`/`vttUrl`, and `txtKey`/`txtUrl`.

Example

{
  "mediaUrl": "https://example.com/podcast.mp3",
  "language": "auto",
  "wordTimestamps": true,
  "outputFormats": ["srt", "vtt", "txt"],
  "openaiApiKey": "sk-..."
}
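When building this input programmatically, it can help to apply the documented defaults and reject bad values before starting a run. The following is a sketch based only on the field table above; `normalize_input` is a hypothetical helper, and the actor's own validation may differ.

```python
from typing import Any, Dict

ALLOWED_FORMATS = {"srt", "vtt", "txt"}


def normalize_input(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Check required fields and fill in the defaults listed in the input table."""
    for field in ("mediaUrl", "openaiApiKey"):
        if not raw.get(field):
            raise ValueError(f"missing required field: {field}")

    formats = raw.get("outputFormats", ["srt", "vtt"])
    unknown = set(formats) - ALLOWED_FORMATS
    if unknown:
        raise ValueError(f"unsupported output formats: {sorted(unknown)}")

    result = {
        "mediaUrl": raw["mediaUrl"],
        "openaiApiKey": raw["openaiApiKey"],
        "language": raw.get("language", "auto"),
        "wordTimestamps": raw.get("wordTimestamps", True),
        "outputFormats": list(formats),
        "model": raw.get("model", "whisper-1"),
    }
    # baseUrl is optional and has no documented default, so pass it through only if set.
    if raw.get("baseUrl"):
        result["baseUrl"] = raw["baseUrl"]
    return result
```

The returned dict can then be passed as the run input through whichever Apify client or API you use.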

Pricing

$0.04 per minute of audio, pay per result, no subscription. You bring your own OpenAI key, so Whisper usage is billed by OpenAI separately.
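As a rough way to budget a run, the actor fee can be estimated from the media duration. This sketch assumes billing is proportional to the exact duration at the per-minute rate quoted above; whether the platform rounds up to whole minutes is not stated here, and OpenAI's separate Whisper charge is not included. `estimate_actor_cost` is a hypothetical helper.

```python
def estimate_actor_cost(duration_seconds: float, rate_per_minute: float = 0.04) -> float:
    """Estimate the actor fee in USD for a file of the given length.

    Assumes proportional per-minute billing with no rounding to whole minutes.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return round(duration_seconds / 60 * rate_per_minute, 2)


# A 90-minute recording at $0.04 per minute:
print(estimate_actor_cost(90 * 60))  # 3.6
```

Pass a different `rate_per_minute` if your plan or tier is billed at another rate.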

Notes

The `mediaUrl` has to be directly downloadable. Pages that require login or stream the media behind a player won't work, so point it at the raw file. Long files take longer and cost more, since billing is per minute of audio.
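One quick way to tell a raw file from a player page before starting a run is to look at the `Content-Type` the server returns. The sketch below uses only the standard library; `is_media_content_type` and `probe_url` are hypothetical helpers, and treating `application/octet-stream` as acceptable is an assumption, since some file hosts serve media that way. Some servers also reject `HEAD` requests, so a failure here is a hint rather than proof.

```python
import urllib.request
from typing import Optional


def is_media_content_type(content_type: Optional[str]) -> bool:
    """Return True if the Content-Type looks like a raw audio or video file."""
    if not content_type:
        return False
    main = content_type.split(";")[0].strip().lower()  # drop parameters such as charset
    return main.startswith(("audio/", "video/")) or main == "application/octet-stream"


def probe_url(url: str, timeout: float = 10.0) -> bool:
    """Send a HEAD request and check whether the response looks like media."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return is_media_content_type(response.headers.get("Content-Type"))
```

A URL that answers with `text/html` is almost certainly a web page wrapping a player, not the file itself.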

URL: https://apify.com/dami_studio/video-audio-transcriber