Transcribe Interview to Text — for Journalists & Researchers

Pricing

from $0.15 / 1,000 seconds of video processed

Transcribe interviews and recorded conversations to text. Speaker labels for interviewer and guest, word-level timestamps, SRT/VTT. Try free.

Rating

0.0

(0)

Developer

SIÁN OÜ

Maintained by Community

Actor stats

Last modified: 3 days ago

How to transcribe an interview in 4 steps

1. Upload your interview recordings: drop .m4a, .mp3, .wav, .mp4, or any common format into the Upload Interview Recordings field. Bulk uploads are supported.
2. Pick your options: auto-detect the language or choose one of the 99+ supported languages, toggle speaker diarization to separate the interviewer from each guest, and optionally translate non-English interviews to English.
3. Run the actor: recordings are processed 10 at a time in parallel on the paid tier, so an entire project's interviews can be transcribed in one run.
4. Download results: every recording lands in the dataset with the transcript, segment-level and word-level timestamps, speaker labels, and ready-to-use SRT/VTT subtitle strings.

Supported formats: M4A, MP3, WAV, FLAC, AAC, OPUS, OGG, MP4, MOV, WebM. Max 1 GB per file on the paid tier.
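Before uploading a batch, it can help to check each file against the format and size limits stated on this page. The following is a minimal sketch, not part of the actor: the function name `check_recording` and the tier keys are hypothetical, the extension list is only the one printed above (the actor may accept more), and it assumes 1 MB means 1024 × 1024 bytes, which the page does not specify.

```python
from pathlib import Path

# Extensions taken from the "Supported formats" line on this page.
SUPPORTED_EXTENSIONS = {
    ".m4a", ".mp3", ".wav", ".flac", ".aac",
    ".opus", ".ogg", ".mp4", ".mov", ".webm",
}

# Per-file limits quoted on this page (50 MB free, 1 GB paid).
# Assumption: 1 MB = 1024 * 1024 bytes.
MAX_BYTES = {"free": 50 * 1024 * 1024, "paid": 1024 * 1024 * 1024}


def check_recording(filename: str, size_bytes: int, tier: str = "paid") -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or '(none)'}")
    if size_bytes > MAX_BYTES[tier]:
        problems.append(f"file exceeds the {tier}-tier size limit")
    return problems
```

Calling `check_recording("interview.m4a", 12_600_000)` returns an empty list, while a 60 MB file checked with `tier="free"` reports a size problem.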

Example output — interview transcript with speaker labels

{
  "transcript": "Interviewer: Tell me about the first time you realized... Guest: Honestly, it was when my mentor pulled me aside and said...",
  "detected_language": "en",
  "duration": 1432.7,
  "segments": [
    {
      "id": 0,
      "text": "Tell me about the first time you realized you wanted to do this.",
      "start": 0.42,
      "end": 4.18,
      "speaker": "SPEAKER_00",
      "language": "en",
      "words": [
        {"word": "Tell", "start": 0.42, "end": 0.61, "speaker": "SPEAKER_00"},
        {"word": "me", "start": 0.61, "end": 0.74, "speaker": "SPEAKER_00"}
      ]
    },
    {
      "id": 1,
      "text": "Honestly, it was when my mentor pulled me aside.",
      "start": 4.86,
      "end": 8.94,
      "speaker": "SPEAKER_01",
      "language": "en",
      "words": []
    }
  ],
  "srt": "1\n00:00:00,420 --> 00:00:04,180\nTell me about the first time you realized you wanted to do this.\n\n2\n00:00:04,860 --> 00:00:08,940\nHonestly, it was when my mentor pulled me aside.",
  "vtt": "WEBVTT\n\n00:00:00.420 --> 00:00:04.180\nTell me about the first time you realized you wanted to do this.\n\n00:00:04.860 --> 00:00:08.940\nHonestly, it was when my mentor pulled me aside.",
  "speakers": ["SPEAKER_00", "SPEAKER_01"],
  "languages": ["en"],
  "fileSizeMB": 12.6,
  "success": true
}

Every result includes the full transcript, segment-level timestamps, word-level timestamps, language detection, recording duration in seconds, file size, ready-to-use srt and vtt subtitle strings, and (when speaker diarization is enabled) speaker labels per segment and per word.
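To illustrate how the segment fields fit together, here is a small sketch that turns one dataset item into a timestamped, speaker-attributed quote log. The helper names are hypothetical, and the fallback label `UNKNOWN` is an assumption: the page does not say whether the `speaker` key is missing or empty when diarization is off, so the code tolerates both.

```python
def format_timestamp(seconds: float) -> str:
    """Render a position in seconds as HH:MM:SS (fractions are dropped)."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def attributed_quotes(item: dict) -> list[str]:
    """Build one '[time] speaker: text' line per segment of a dataset item."""
    lines = []
    for segment in item.get("segments", []):
        # Assumption: 'speaker' may be absent or empty without diarization.
        speaker = segment.get("speaker") or "UNKNOWN"
        stamp = format_timestamp(segment["start"])
        lines.append(f"[{stamp}] {speaker}: {segment['text']}")
    return lines


# Trimmed copy of the example output shown above.
sample_item = {
    "segments": [
        {"text": "Tell me about the first time you realized you wanted to do this.",
         "start": 0.42, "end": 4.18, "speaker": "SPEAKER_00"},
        {"text": "Honestly, it was when my mentor pulled me aside.",
         "start": 4.86, "end": 8.94, "speaker": "SPEAKER_01"},
    ]
}

for line in attributed_quotes(sample_item):
    print(line)
```

For the sample item this prints one line per segment, starting with `[00:00:00] SPEAKER_00: Tell me about the first time you realized you wanted to do this.`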

Built for journalists, qualitative researchers, market researchers

📰 Journalists — turn phone-recorded interviews into clean transcripts ready for quote pulling
🧪 Qualitative researchers — preserve participant voices with speaker separation for thematic analysis
📊 Market researchers — bulk-transcribe focus group and 1:1 interview tapes
📚 Oral historians — searchable, time-stamped archives of long-form interviews
🎙️ Podcasters publishing interview shows — transcripts for show notes, blog repurposing, and SEO

Speaker diarization (interviewer / guest separation)

Toggle the Speaker Diarization input and the actor labels every segment and every word with the speaker it came from. Labels are assigned in order of first speaking turn (SPEAKER_00, SPEAKER_01, SPEAKER_02, and so on), so SPEAKER_00 is the interviewer when the interviewer speaks first. This makes it straightforward to extract clean quote attributions for journalism or to code qualitative data for research. Powered by pyannote-audio. Charged per audio second; only billed when enabled.

Translate foreign-language interviews to English

Toggle Translate to English and the actor returns the transcript translated into English while preserving timing — perfect for conducting interviews in your subject's native language and publishing in English. Combine with Speaker Diarization to get clean, attributed quotes in both directions. Charged separately when enabled.

SRT / VTT subtitle export

Every transcription returns ready-to-use srt and vtt subtitle strings. Save the field value as a .srt or .vtt file and:

Publish a video version of the interview with subtitles for YouTube, Vimeo, or your CMS
Add HTML5 <track> accessibility captions to embedded video
Build a searchable interview archive with timestamps

Set Timestamp Granularities to word for cue precision down to individual words.
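Saving the subtitle strings takes only a few lines. This is a minimal sketch that assumes `item` is one dataset record shaped like the example output; the function name `save_subtitles` and the file stem are hypothetical.

```python
from pathlib import Path


def save_subtitles(item: dict, stem: str, out_dir: str = ".") -> list[Path]:
    """Write the 'srt' and 'vtt' fields of a dataset item to <stem>.srt / <stem>.vtt."""
    written = []
    for field in ("srt", "vtt"):
        text = item.get(field)
        if not text:
            continue  # Skip a format the item does not carry.
        path = Path(out_dir) / f"{stem}.{field}"
        # Subtitle files are plain UTF-8 text; end with a newline for tidy files.
        path.write_text(text + "\n", encoding="utf-8")
        written.append(path)
    return written
```

For example, `save_subtitles(items[0], "interview-with-source")` would create `interview-with-source.srt` and `interview-with-source.vtt` in the current directory.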

Why interviewers choose this transcriber

✅ Interviewer ↔ guest separation out of the box via pyannote-audio diarization — clean attributed quotes ready for publication
⏱️ Word-level timestamps for every word — find any quote in a 90-minute interview in seconds
🌐 Translate non-English interviews to English in the same run — perfect for international journalism and cross-cultural research
🎬 SRT and VTT subtitles included for video versions of interviews
🌍 99+ languages — automatic detection, no manual selection
🇪🇺 EU-region processing for GDPR-aligned research workflows
💰 Pay per audio second — no per-minute Rev.com markups, no Otter subscription
🚀 10× parallel on the paid tier — an entire research project's worth of interviews done in one run
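The word-level timestamps mentioned above make quote lookup a simple scan over the `words` arrays. The sketch below shows one possible approach; `find_phrase` is a hypothetical helper, and the normalization step assumes word tokens may carry surrounding whitespace or punctuation, which the page does not document.

```python
import string


def normalize(token: str) -> str:
    """Lowercase a token and strip surrounding punctuation and spaces."""
    return token.strip(string.punctuation + " ").lower()


def find_phrase(item: dict, phrase: str) -> list[tuple[float, float]]:
    """Return (start, end) times in seconds for each occurrence of the phrase."""
    target = [t for t in (normalize(p) for p in phrase.split()) if t]
    if not target:
        return []
    # Flatten the per-segment word lists into one sequence.
    words = [w for seg in item.get("segments", []) for w in seg.get("words", [])]
    tokens = [normalize(w["word"]) for w in words]
    hits = []
    for i in range(len(tokens) - len(target) + 1):
        if tokens[i:i + len(target)] == target:
            hits.append((words[i]["start"], words[i + len(target) - 1]["end"]))
    return hits


# Word entries copied from the example output on this page.
sample_item = {
    "segments": [
        {"words": [
            {"word": "Tell", "start": 0.42, "end": 0.61},
            {"word": "me", "start": 0.61, "end": 0.74},
        ]},
        {"words": []},
    ]
}

print(find_phrase(sample_item, "tell me"))
```

The returned start time can be pasted into an audio player to jump to the quote and verify it against the recording.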

Use cases

📰 Investigative journalists transcribing source interviews and pulling attributed quotes for stories
🎙️ Long-form podcasters generating publication-ready transcripts of every guest interview
🧪 Qualitative researchers coding participant transcripts in NVivo, Atlas.ti, or MAXQDA
📊 Market research firms transcribing focus groups and customer 1:1 sessions for thematic analysis
📚 Oral history projects preserving long-form recorded interviews with timestamped speaker tracks
🎓 Academic researchers conducting qualitative fieldwork in foreign-language contexts (transcribe + translate in one pass)
✍️ Authors and biographers working from hours of recorded conversations for book material
🎬 Documentary filmmakers preparing rough-cut transcripts of interview tape for editing

Pricing & tiers

Pay only for the audio seconds you actually transcribe. No subscriptions, no minimums.

FREE tier	PAID tier
Perfect for testing and small jobs	Built for production volume
Up to 5 interviews per run	Unlimited interviews per run
50 MB max per file	1 GB max per file
200 MB / 20 minutes monthly	Unlimited monthly volume
3 concurrent files	10 concurrent files (10× parallel)
No credit card required	$0.0005 per audio second

Optional add-ons (only billed when enabled):

Feature	Price
Speaker diarization	$0.0001 per audio second
Translate to English	$0.0003 per audio second
EU-region processing	$0.0007 per audio second (replaces base $0.0005)

A 60-minute interview with diarization on the paid tier costs approximately $2.16 ($1.80 transcription + $0.36 diarization). Compare to Rev.com's $1.50/min ($90 for the same interview).
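The arithmetic behind that figure can be reproduced with a short estimator. This is a sketch under stated assumptions: the rates are the ones listed in the tables above, add-ons are assumed to be additive per audio second, and no rounding per file or minimum charge is modeled. The function name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

# Per-audio-second rates from the pricing tables above.
BASE_RATE = Decimal("0.0005")
EU_RATE = Decimal("0.0007")          # Replaces the base rate when enabled.
DIARIZATION_RATE = Decimal("0.0001")
TRANSLATION_RATE = Decimal("0.0003")


def estimate_cost(duration_seconds: float, diarization: bool = False,
                  translation: bool = False, eu_region: bool = False) -> Decimal:
    """Estimate the paid-tier cost in USD, rounded to whole cents."""
    rate = EU_RATE if eu_region else BASE_RATE
    if diarization:
        rate += DIARIZATION_RATE
    if translation:
        rate += TRANSLATION_RATE
    # Decimal avoids binary floating-point drift in currency arithmetic.
    return (Decimal(str(duration_seconds)) * rate).quantize(Decimal("0.01"))


# 60-minute interview with diarization: 3600 s * $0.0006/s
print(estimate_cost(3600, diarization=True))
```

With the same assumptions, a 60-minute interview without add-ons comes to $1.80, and enabling EU-region processing plus diarization comes to $2.88.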

Integration examples

JavaScript / Node.js

import { ApifyClient } from 'apify-client';

// Authenticate with your Apify API token
const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

// Start the actor and wait for the run to finish
const run = await client.actor('sian.agency/transcribe-interview-to-text').call({
    audioFiles: ['https://example.com/interview-with-source.m4a'],
    speakerDiarization: true,
    translateToEnglish: false,
});

// Each processed recording is one item in the run's default dataset
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items[0].transcript);
console.log(items[0].srt);

Python

from apify_client import ApifyClient

# Authenticate with your Apify API token
client = ApifyClient('YOUR_APIFY_TOKEN')

# Start the actor and wait for the run to finish
run = client.actor('sian.agency/transcribe-interview-to-text').call(run_input={
    'audioFiles': ['https://example.com/interview-with-source.m4a'],
    'speakerDiarization': True,
    'translateToEnglish': False,
})

# Each processed recording is one item in the run's default dataset
items = client.dataset(run['defaultDatasetId']).list_items().items
print(items[0]['transcript'])
print(items[0]['vtt'])

cURL

curl -X POST 'https://api.apify.com/v2/acts/sian.agency~transcribe-interview-to-text/run-sync-get-dataset-items?token=YOUR_APIFY_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{
    "audioFiles": ["https://example.com/interview.m4a"],
    "speakerDiarization": true
  }'

n8n / Zapier / Make

Connect this actor to a "new file in research-recordings folder" trigger (Dropbox, Google Drive, OneDrive). The dataset record returned per item includes transcript, segments[].words[], srt, and vtt. Send them to Notion (research database), Airtable (interview log), MAXQDA/NVivo (qualitative coding), or Google Docs (story drafts).

FAQ

How accurate is interview transcription? Powered by an industrial speech-to-text pipeline tuned for natural conversation. Accuracy is typically 95–99% on clean studio or quiet-room interviews, lower on phone-recorded or noisy field interviews. Word-level timestamps are returned even when accuracy is imperfect, so you can verify and correct quote attributions quickly.

What audio and video formats are supported? M4A, MP3, WAV, FLAC, AAC, OPUS, OGG, MP4, MOV, WebM. Max 50 MB per file on the free tier, 1 GB on the paid tier.

Can I transcribe foreign-language interviews? Yes — auto-detection across 99+ languages including Spanish, French, German, Mandarin, Japanese, Portuguese, Arabic, Hindi, Russian, and many more. Toggle Translate to English to receive an English transcript alongside the timestamped original.

Is speaker diarization included? Yes, opt-in via the Speaker Diarization toggle. Each segment and word gets labeled SPEAKER_00 (interviewer), SPEAKER_01 (first guest), etc. Powered by pyannote-audio. Billed at $0.0001 per audio second only when enabled.

How does pricing work? Pay-per-audio-second. The free tier covers small jobs and testing without a credit card. The paid tier is $0.0005 per second of audio. A 1-hour interview with diarization is approximately $2.16 — versus Rev.com at ~$90 for the same length.

Can I integrate this into my qualitative research workflow? Yes. The actor exposes a standard Apify run/dataset API. The dataset record includes transcript, segments[].words[], srt, and vtt ready to feed into NVivo, MAXQDA, Atlas.ti, Dovetail, or any qualitative analysis tool that accepts plain text or VTT.
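For tools that import plain text, one option is to merge consecutive segments from the same speaker into turns before export. The following is a sketch, not a documented export feature of the actor: the helper names are hypothetical, and segments without a `speaker` value are labeled `UNKNOWN`.

```python
def to_turns(segments: list[dict]) -> list[tuple[str, str]]:
    """Merge consecutive segments by the same speaker into (speaker, text) turns."""
    turns: list[tuple[str, str]] = []
    for segment in segments:
        speaker = segment.get("speaker") or "UNKNOWN"
        text = segment["text"].strip()
        if turns and turns[-1][0] == speaker:
            # Same speaker is still talking: extend the current turn.
            turns[-1] = (speaker, turns[-1][1] + " " + text)
        else:
            turns.append((speaker, text))
    return turns


def to_plain_text(segments: list[dict]) -> str:
    """Render turns as 'SPEAKER: text' paragraphs separated by blank lines."""
    return "\n\n".join(f"{speaker}: {text}" for speaker, text in to_turns(segments))


# Hypothetical three-segment excerpt for illustration.
sample_segments = [
    {"speaker": "SPEAKER_00", "text": "Tell me about the first time."},
    {"speaker": "SPEAKER_01", "text": "Honestly, it was my mentor."},
    {"speaker": "SPEAKER_01", "text": "She pulled me aside."},
]

print(to_plain_text(sample_segments))
```

Writing the result to a `.txt` file gives one paragraph per speaking turn, which keeps each participant's contribution intact for coding.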

What if my interview is multi-speaker (panel, focus group)? Speaker diarization handles up to ~6 distinct speakers reliably. Each speaker is labeled SPEAKER_00 through SPEAKER_N in temporal order of first speaking turn.

How long does a transcription take? A 60-minute interview takes 1–3 minutes on the paid tier. Bulk batches of 10 interviews complete in 5–10 minutes (parallelized).

Legal disclaimer

Use this actor only on interviews you have rights to transcribe — your own recordings with subject consent, properly licensed media, or material covered by journalistic source agreements. Some jurisdictions require subject consent for recording; you are responsible for compliance with applicable laws and IRB requirements for academic research. The actor does not retain audio or transcripts beyond the run's lifetime. EU-region processing is available via the EU Processing toggle for GDPR-aligned workflows. SIÁN Agency provides this actor as-is.

Support

Join the Telegram support group, email apify@sian-agency.online, or open an issue on the SIÁN Agency Apify Store page.

More from SIÁN Agency

Platform-specific scrapers + transcribers:

Browse the full SIÁN Agency Apify Store for all available actors.

URL: https://apify.com/sian.agency/transcribe-interview-to-text