AI Audio to Text Transcriber

Pricing

Pay per event

Transcribe audio files to text using OpenAI Whisper. Accepts public audio URLs (MP3, MP4, M4A, WAV, WEBM, OGG, FLAC) and returns full transcripts with language, duration, and timed segments. BYO OpenAI key required.

Developer

BowTiedRaccoon

Maintained by Community

Last modified: 24 days ago

What it does

Accepts a list of public audio file URLs (MP3, MP4, M4A, WAV, WEBM, OGG, FLAC)
Downloads each file to temporary storage (max 25 MB per file — OpenAI limit)
Transcribes via OpenAI Whisper (whisper-1) with verbose_json output
Returns the full text transcript, detected language, audio duration, and segment-level timestamps
Processes up to 3 files concurrently for faster batch runs
Saves one dataset record per file, including error records for files that fail
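The batch behaviour described above (at most 3 files in flight, one record per file, failures kept as error records) can be sketched as follows. This is a minimal illustration, not the actor's actual source: `process_all` and the `transcribe` callable are hypothetical names, and the download and Whisper call are assumed to live inside `transcribe`.

```python
import asyncio
from datetime import datetime, timezone

MAX_CONCURRENCY = 3  # the description states up to 3 files are processed at once


async def process_all(urls, transcribe):
    """Run `transcribe(url)` for every URL, at most MAX_CONCURRENCY at a time.

    `transcribe` is a hypothetical async callable that returns a dict of
    transcript fields. Any exception it raises becomes an error record.
    """
    semaphore = asyncio.Semaphore(MAX_CONCURRENCY)

    async def process_one(url):
        async with semaphore:
            record = {
                "sourceUrl": url,
                "transcribedAt": datetime.now(timezone.utc).isoformat(),
            }
            try:
                record.update(await transcribe(url))
                record.update(status="success", errorMsg=None)
            except Exception as exc:
                # Keep the row and record the failure instead of dropping it.
                record.update(status="error", errorMsg=str(exc))
            return record

    # gather() preserves input order, so records line up with the URL list.
    return await asyncio.gather(*(process_one(url) for url in urls))
```

A semaphore caps the number of simultaneous downloads and API calls while still letting a long file overlap with shorter ones.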

Use cases

Podcast indexing and search
Meeting recording notes
Compliance and call-center transcription
Generating training data for NLP models
Subtitles and captions for video content
Multilingual content analysis

Input

Field	Type	Required	Description
`audioUrls`	Array	Yes	Public audio file URLs to transcribe
`openaiApiKey`	String	Yes	Your OpenAI API key (`sk-...`). Not stored.
`language`	String	No	ISO 639-1 hint (e.g. `en`, `es`, `ja`). Omit for auto-detect.
`maxItems`	Integer	No	Maximum files to transcribe per run. Default: 15.

Supported audio formats: MP3, MP4, M4A, WAV, WEBM, OGG, FLAC
Max file size: 25 MB (OpenAI Whisper hard limit)

Example input

{
  "audioUrls": [
    "https://example.com/podcast-episode-1.mp3",
    "https://example.com/meeting-recording.wav"
  ],
  "openaiApiKey": "sk-...",
  "language": "en",
  "maxItems": 10
}
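Before starting a run, it can help to check an input object against the field table locally. The sketch below is only an illustration: `prepare_input` is a hypothetical helper, and it assumes that URLs beyond `maxItems` are skipped in list order, which the page does not state explicitly.

```python
DEFAULT_MAX_ITEMS = 15  # default stated in the input table


def prepare_input(raw):
    """Validate a run-input dict and return the URLs that would be processed."""
    urls = raw.get("audioUrls")
    if not isinstance(urls, list) or not urls:
        raise ValueError("audioUrls must be a non-empty list of URLs")

    key = raw.get("openaiApiKey")
    if not isinstance(key, str) or not key:
        raise ValueError("openaiApiKey is required")

    language = raw.get("language")
    if language is not None:
        # ISO 639-1 codes are two letters, e.g. "en", "es", "ja".
        if not isinstance(language, str) or len(language) != 2:
            raise ValueError("language should be a two-letter ISO 639-1 code")

    max_items = raw.get("maxItems", DEFAULT_MAX_ITEMS)
    # Assumption: files beyond maxItems are skipped, keeping the first ones.
    return {"audioUrls": urls[:max_items], "language": language}
```

For the example input above, this would return both URLs with `language` set to `"en"`.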

Output

One dataset record per audio file.

Field	Type	Description
`sourceUrl`	String	Original audio file URL
`transcript`	String	Full verbatim transcription text
`language`	String	Detected language (e.g. `english`, `spanish`)
`durationSeconds`	Number	Audio duration in seconds
`segments`	String	JSON-encoded string holding an array of timed segments `[{start, end, text}]`; parse it before use
`model`	String	Whisper model used (`whisper-1`)
`transcribedAt`	String	ISO timestamp
`status`	String	`success` or `error`
`errorMsg`	String	Error description on failure, `null` on success

Example output record

{
  "sourceUrl": "https://example.com/podcast-ep1.mp3",
  "transcript": "Welcome to today's episode. Today we're discussing the future of AI...",
  "language": "english",
  "durationSeconds": 1823.4,
  "segments": "[{\"start\":0.0,\"end\":3.2,\"text\":\"Welcome to today's episode.\"}]",
  "model": "whisper-1",
  "transcribedAt": "2026-05-26T12:00:00Z",
  "status": "success",
  "errorMsg": null
}
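Because `segments` arrives as a JSON string rather than a nested array, it has to be decoded before the timestamps can be used. As one possible use, the sketch below turns a record into SRT caption text. The helper names `srt_timestamp` and `record_to_srt` are hypothetical, and the code assumes each segment carries `start`, `end`, and `text` as shown in the example record.

```python
import json


def srt_timestamp(seconds):
    """Format a number of seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def record_to_srt(record):
    """Build SRT caption text from one successful dataset record."""
    segments = json.loads(record["segments"])  # the field is a JSON string
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # SRT separates cues with a blank line.
    return "\n".join(blocks)
```

Records with `status` set to `error` should be filtered out first, since their `segments` value may not contain valid JSON.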

Requirements

OpenAI API key — Bring your own key at https://platform.openai.com/api-keys. Whisper pricing is approximately $0.006 per minute of audio (billed by OpenAI to your account).
Public audio URLs — Files must be publicly accessible without authentication.

Pricing

This actor charges $0.10 per start + $0.001 per file processed (including error records). OpenAI Whisper API costs are separate and billed directly to your OpenAI account.
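The two charges can be combined into a rough per-run estimate. The sketch below uses only the figures quoted on this page ($0.10 per start, $0.001 per file, roughly $0.006 per audio minute for Whisper). `estimate_cost` is a hypothetical helper, and the OpenAI portion is an approximation, since OpenAI's actual billing granularity is not described here.

```python
ACTOR_START_FEE = 0.10      # per run, from the pricing section
ACTOR_PER_FILE_FEE = 0.001  # per file processed, including error records
WHISPER_PER_MINUTE = 0.006  # approximate OpenAI rate quoted above


def estimate_cost(durations_seconds):
    """Estimate the cost in USD of one run, given each file's length in seconds."""
    files = len(durations_seconds)
    actor = ACTOR_START_FEE + ACTOR_PER_FILE_FEE * files
    openai = WHISPER_PER_MINUTE * sum(durations_seconds) / 60
    return {
        "actor": round(actor, 4),
        "openai": round(openai, 4),
        "total": round(actor + openai, 4),
    }


# Ten 30-minute files: $0.11 to the actor plus about $1.80 to OpenAI.
print(estimate_cost([1800] * 10))
```

At these rates the OpenAI charge dominates for anything longer than short clips, so total audio duration is the number to watch.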

Error handling

Files that fail to download or transcribe are not dropped. Instead, the actor saves an error record to the dataset with `status: "error"` and a descriptive `errorMsg`, so the dataset has one row per processed URL for easy reconciliation.

Common errors:

HTTP 401 — Invalid API key
HTTP 429 — OpenAI rate limit exceeded (retry with fewer files or lower concurrency)
File exceeds 25 MB limit — Source file too large for Whisper API
Download timed out — URL not reachable within 60 seconds
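Since every processed URL yields a row, reconciling a run comes down to matching dataset records against the input list. The sketch below is one way to do that. `reconcile` is a hypothetical helper, and treating messages containing "429" or "timed out" as retryable is an assumption based on the error list above, not documented behaviour.

```python
RETRYABLE_HINTS = ("429", "timed out")  # assumption: these failures are transient


def reconcile(input_urls, records):
    """Split input URLs into succeeded, failed, retryable, and missing."""
    by_url = {record["sourceUrl"]: record for record in records}

    succeeded = [u for u in input_urls if by_url.get(u, {}).get("status") == "success"]
    failed = {
        u: by_url[u].get("errorMsg")
        for u in input_urls
        if by_url.get(u, {}).get("status") == "error"
    }
    # URLs with no record at all, for example those cut off by maxItems.
    missing = [u for u in input_urls if u not in by_url]
    retryable = [
        u for u, msg in failed.items()
        if any(hint in (msg or "").lower() for hint in RETRYABLE_HINTS)
    ]
    return {"succeeded": succeeded, "failed": failed, "retryable": retryable, "missing": missing}
```

The `retryable` list can be passed back as `audioUrls` in a follow-up run, while failures such as an invalid key or an oversized file need a fix on the caller's side first.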

URL: https://apify.com/jungle_synthesizer/ai-audio-to-text-transcriber