Speech-to-Text Converter

Pricing

from $100.00 / 1,000 results

Speech-to-Text Converter is an Apify Actor for transcribing audio and video files to text. It runs serverlessly on Apify and lets you choose between several transcription engines.

Rating: 0.0 (0)

Developer: Jamshaid Arif

Maintained by Community

Last modified: a month ago

🎙️ Speech-to-Text Converter — Apify Actor

Serverless multi-engine speech-to-text transcription on Apify.

🛠️ Engines

| Engine | Cost | Internet | Best For |
| --- | --- | --- | --- |
| Whisper Local | Free | No | Long files, best accuracy, 45+ languages |
| Google Speech | Free | Yes | Quick short clips, real-time results |
| Whisper API | Paid | Yes | Fast cloud processing, large files |

📁 Supported Formats

Audio: WAV, MP3, FLAC, OGG, M4A, AAC, WMA, OPUS
Video: MP4, MKV, AVI, MOV, WEBM, FLV (audio auto-extracted)
Output: TXT, SRT subtitles, WebVTT, JSON
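
The SRT and WebVTT outputs differ mainly in their header and in the decimal mark used in timestamps. The sketch below renders timestamped segments in either format. The segment shape used here (`start` and `end` in seconds, plus `text`) is an assumption made for illustration, not the Actor's documented schema, and the function names are hypothetical.

```python
def format_timestamp(seconds: float, decimal_mark: str) -> str:
    """Format a time offset in seconds as HH:MM:SS<mark>mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_mark}{ms:03d}"


def to_srt(segments) -> str:
    """SRT: numbered cues, comma before the milliseconds."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"], ",")
        end = format_timestamp(seg["end"], ",")
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def to_vtt(segments) -> str:
    """WebVTT: 'WEBVTT' header, dot before the milliseconds."""
    blocks = ["WEBVTT\n"]
    for seg in segments:
        start = format_timestamp(seg["start"], ".")
        end = format_timestamp(seg["end"], ".")
        blocks.append(f"{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)
```

Both functions take the same list of segments, so one transcription result could be written out in either subtitle format.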

⚡ Quick Start

Via Apify Console

1. Select an Engine (Whisper Local recommended)
2. Paste the Input File URL (direct download link)
3. Choose Language and Output Format
4. Click Start

Via API

curl -X POST "https://api.apify.com/v2/acts/<ACTOR_ID>/runs?token=<TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{
    "engine": "whisper_local",
    "input_file_url": "https://example.com/audio.mp3",
    "language": "en",
    "whisper_model": "small",
    "output_format": "srt"
  }'
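
The same request can be prepared with Python's standard library. This is a minimal sketch that mirrors the curl call above: it builds the POST request but does not send it. `build_run_request` is a hypothetical helper name, and the actor ID and token are placeholders you would supply yourself.

```python
import json
import urllib.parse
import urllib.request


def build_run_request(actor_id: str, token: str, actor_input: dict) -> urllib.request.Request:
    """Build a POST request that starts an Actor run with a JSON input body."""
    query = urllib.parse.urlencode({"token": token})
    url = f"https://api.apify.com/v2/acts/{actor_id}/runs?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same input as the curl example.
example_input = {
    "engine": "whisper_local",
    "input_file_url": "https://example.com/audio.mp3",
    "language": "en",
    "whisper_model": "small",
    "output_format": "srt",
}

# To actually start a run, pass the request to urllib.request.urlopen:
#   request = build_run_request("<ACTOR_ID>", "<TOKEN>", example_input)
#   with urllib.request.urlopen(request) as response:
#       run = json.load(response)
```

Keeping request construction separate from sending makes it easy to inspect the URL and body before spending any credits.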

📥 Input

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `engine` | string | `whisper_local` | `whisper_local`, `google`, or `whisper_api` |
| `input_file_url` | string | — | Direct URL to media file |
| `language` | string | `en` | Language code or empty for auto-detect |
| `whisper_model` | string | `small` | `tiny`, `base`, `small`, `medium`, `large` |
| `openai_api_key` | string | — | OpenAI key (whisper_api only) |
| `output_format` | string | `txt` | `txt`, `srt`, `vtt`, `json` |
| `max_file_size_mb` | int | `500` | Max file size limit |
| `google_chunk_seconds` | int | `55` | Chunk size for Google engine |
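
Before starting a run, it can help to check an input object against the table above on the client side. The sketch below applies the listed defaults and rejects values outside the listed options. Two points are assumptions rather than documented behavior: that `input_file_url` is mandatory (it has no default), and that `openai_api_key` must be present whenever `engine` is `whisper_api`. `normalize_input` is a hypothetical helper, not part of the Actor.

```python
ENGINES = {"whisper_local", "google", "whisper_api"}
WHISPER_MODELS = {"tiny", "base", "small", "medium", "large"}
OUTPUT_FORMATS = {"txt", "srt", "vtt", "json"}

# Defaults copied from the input table.
DEFAULTS = {
    "engine": "whisper_local",
    "language": "en",
    "whisper_model": "small",
    "output_format": "txt",
    "max_file_size_mb": 500,
    "google_chunk_seconds": 55,
}


def normalize_input(raw: dict) -> dict:
    """Fill in defaults and raise ValueError for values outside the listed options."""
    merged = {**DEFAULTS, **raw}

    # Assumption: a media URL is required because the table lists no default.
    if not merged.get("input_file_url"):
        raise ValueError("input_file_url is required")

    choices = (
        ("engine", ENGINES),
        ("whisper_model", WHISPER_MODELS),
        ("output_format", OUTPUT_FORMATS),
    )
    for key, allowed in choices:
        if merged[key] not in allowed:
            raise ValueError(f"{key} must be one of {sorted(allowed)}")

    # Assumption: the OpenAI key is mandatory for the whisper_api engine.
    if merged["engine"] == "whisper_api" and not merged.get("openai_api_key"):
        raise ValueError("openai_api_key is required for whisper_api")

    return merged
```

The Actor itself may validate differently; this only catches obvious mistakes before a run is started.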

📤 Output

Dataset contains metadata:

{
  "status": "success",
  "engine": "whisper_local (small)",
  "language_detected": "en",
  "input_file": "interview.mp3",
  "output_url": "https://api.apify.com/.../interview_transcription.srt",
  "duration_audio_sec": 342.5,
  "processing_time_sec": 28.3,
  "character_count": 4521,
  "segment_count": 87,
  "text_preview": "First 500 characters of the transcription...",
  "message": "Transcribed with Whisper 'small' on CPU."
}
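
The numeric fields in this record are enough to derive a few rough throughput figures. The sketch below is illustrative only: `speed_summary` and the names of the derived values are hypothetical, and the sample numbers are the ones from the example record above.

```python
def speed_summary(record: dict) -> dict:
    """Derive simple throughput figures from a metadata record."""
    audio = record["duration_audio_sec"]
    elapsed = record["processing_time_sec"]
    return {
        # Seconds of audio transcribed per second of processing.
        "speed_vs_realtime": audio / elapsed,
        # Transcript characters per minute of audio.
        "chars_per_audio_minute": record["character_count"] / (audio / 60),
        # Average length of one segment in seconds.
        "avg_segment_sec": audio / record["segment_count"],
    }


sample = {
    "duration_audio_sec": 342.5,
    "processing_time_sec": 28.3,
    "character_count": 4521,
    "segment_count": 87,
}
summary = speed_summary(sample)
```

For the sample record, the run processed audio roughly 12 times faster than real time.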

Key-Value Store contains:

interview_transcription.txt/srt/vtt/json — the output file (downloadable via output_url)
FULL_RESULT — complete JSON with full text + all segments
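
To read `FULL_RESULT` programmatically, one option is the Apify API's key-value store record endpoint. The sketch below only builds the record URL, assuming the standard v2 endpoint layout. `record_url` is a hypothetical helper, and the store ID is a placeholder that would normally be taken from the run's default key-value store.

```python
import urllib.parse
from typing import Optional


def record_url(store_id: str, key: str, token: Optional[str] = None) -> str:
    """Build the URL of one record in an Apify key-value store."""
    store = urllib.parse.quote(store_id, safe="")
    record = urllib.parse.quote(key, safe="")
    base = f"https://api.apify.com/v2/key-value-stores/{store}/records/{record}"
    if token:
        # Assumption: the token is passed as a query parameter, as in the run request.
        return f"{base}?{urllib.parse.urlencode({'token': token})}"
    return base


full_result_url = record_url("<STORE_ID>", "FULL_RESULT")
```

Fetching that URL with any HTTP client should return the stored JSON. Its exact structure is not documented here beyond "full text + all segments".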

🌍 Languages (45+)

English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, Hindi, Russian, Turkish, Urdu, Dutch, Polish, Swedish, Danish, Finnish, Norwegian, Greek, Czech, Romanian, Hungarian, Thai, Vietnamese, Indonesian, Malay, Ukrainian, Bulgarian, Croatian, Slovak, Slovenian, Serbian, Hebrew, Bengali, Tamil, Telugu, Malayalam, Kannada, Marathi, Gujarati, Punjabi, Swahili, Afrikaans, Filipino.

URL: https://apify.com/moving_beacon-owner1/my-actor-72
