
URL: https://apify.com/sian.agency/transcribe-interview-to-text

Transcribe Interview to Text - for Journalists & Researchers · Apify


๐Ÿ‘ Transcribe Interview to Text โ€” for Journalists & Researchers avatar

Transcribe Interview to Text โ€” for Journalists & Researchers

Pricing: from $0.15 per 1,000 seconds of video processed


Transcribe interviews and recorded conversations to text. Speaker labels for interviewer and guest, word-level timestamps, SRT/VTT. Try free.


Rating: 0.0 (0 ratings)

Developer: SIÁN OÜ

Maintained by Community

Actor stats
Bookmarked: 1
Total users: 2
Monthly active users: 1
Last modified: 3 days ago


๐Ÿ‘ SIรN Agency Store
๐Ÿ‘ Telegram Support
๐Ÿ‘ Instagram AI Transcript Extractor
๐Ÿ‘ Best TikTok AI Transcript Extractor
๐Ÿ‘ YouTube Shorts AI Transcript Extractor
๐Ÿ‘ Facebook AI Transcript Extractor

Transcribe interviews and recorded conversations to text. Built for journalists, qualitative researchers, market researchers, and anyone with hours of interview tape. Speaker labels for interviewer and guest, word-level timestamps for precise quote extraction, SRT/VTT subtitles, 99+ languages.


How to transcribe an interview in 4 steps

  1. Upload your interview recordings: drop .m4a, .mp3, .wav, .mp4, or any other common format into the Upload Interview Recordings field. Bulk uploads are supported.
  2. Pick your options: auto-detect the language or pick from 99+, toggle speaker diarization to separate the interviewer from each guest, and optionally translate non-English interviews to English.
  3. Run the actor: recordings are processed 10 at a time in parallel on the paid tier, so an entire project's interviews can be transcribed in one run.
  4. Download the results: every recording lands in the dataset with the transcript, segment- and word-level timestamps, speaker labels, and ready-to-use SRT/VTT subtitle strings.

Supported formats: M4A, MP3, WAV, FLAC, AAC, OPUS, OGG, MP4, MOV, WebM. Max 1 GB per file on the paid tier.
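Before starting a large run, it can help to pre-check a batch against the limits stated on this page: the supported formats, 50 MB per file and 5 interviews per run on the free tier, and 1 GB per file on the paid tier. The sketch below is a hypothetical helper (`check_batch` is not part of the actor). It assumes that checking the file extension is a reasonable proxy for format and treats 1 GB as 1024 MB. It does not check the free tier's monthly cap.

```python
from pathlib import Path

# Formats listed on this page. Checking by file extension is an assumption:
# the actor itself may inspect the audio rather than the filename.
SUPPORTED_EXTENSIONS = {".m4a", ".mp3", ".wav", ".flac", ".aac",
                        ".opus", ".ogg", ".mp4", ".mov", ".webm"}

# Per-file and per-run limits from the pricing table; 1 GB is taken as 1024 MB (assumption).
MAX_FILE_MB = {"free": 50, "paid": 1024}
MAX_FILES_PER_RUN = {"free": 5, "paid": None}  # None means unlimited


def check_batch(files, tier="paid"):
    """Return a list of problems for (filename, size_in_mb) pairs; an empty list means the batch looks OK."""
    problems = []
    limit = MAX_FILES_PER_RUN[tier]
    if limit is not None and len(files) > limit:
        problems.append(f"{len(files)} files exceeds the {tier} tier limit of {limit} per run")
    for name, size_mb in files:
        ext = Path(name).suffix.lower()
        if ext not in SUPPORTED_EXTENSIONS:
            problems.append(f"{name}: unsupported format {ext or '(no extension)'}")
        if size_mb > MAX_FILE_MB[tier]:
            problems.append(f"{name}: {size_mb} MB exceeds {MAX_FILE_MB[tier]} MB on the {tier} tier")
    return problems
```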


Example output โ€” interview transcript with speaker labels

{
  "transcript": "Interviewer: Tell me about the first time you realized... Guest: Honestly, it was when my mentor pulled me aside and said...",
  "detected_language": "en",
  "duration": 1432.7,
  "segments": [
    {
      "id": 0,
      "text": "Tell me about the first time you realized you wanted to do this.",
      "start": 0.42,
      "end": 4.18,
      "speaker": "SPEAKER_00",
      "language": "en",
      "words": [
        {"word": "Tell", "start": 0.42, "end": 0.61, "speaker": "SPEAKER_00"},
        {"word": "me", "start": 0.61, "end": 0.74, "speaker": "SPEAKER_00"}
      ]
    },
    {
      "id": 1,
      "text": "Honestly, it was when my mentor pulled me aside.",
      "start": 4.86,
      "end": 8.94,
      "speaker": "SPEAKER_01",
      "language": "en",
      "words": []
    }
  ],
  "srt": "1\n00:00:00,420 --> 00:00:04,180\nTell me about the first time you realized you wanted to do this.\n\n2\n00:00:04,860 --> 00:00:08,940\nHonestly, it was when my mentor pulled me aside.",
  "vtt": "WEBVTT\n\n00:00:00.420 --> 00:00:04.180\nTell me about the first time you realized you wanted to do this.\n\n00:00:04.860 --> 00:00:08.940\nHonestly, it was when my mentor pulled me aside.",
  "speakers": ["SPEAKER_00", "SPEAKER_01"],
  "languages": ["en"],
  "fileSizeMB": 12.6,
  "success": true
}

Every result includes the full transcript, segment-level timestamps, word-level timestamps, language detection, recording duration in seconds, file size, ready-to-use srt and vtt subtitle strings, and (when speaker diarization is enabled) speaker labels per segment and per word.
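To turn a dataset record into an attributed transcript, you can walk `segments` and merge consecutive segments from the same speaker into one turn. The sketch below uses a trimmed copy of the example record above. `speaker_turns` is a hypothetical helper, not part of the actor. The role mapping is also an assumption: it labels SPEAKER_00 as the interviewer, which only holds if the interviewer speaks first.

```python
import json

# Trimmed copy of the example dataset record shown above (word lists omitted).
record = json.loads("""
{
  "segments": [
    {"id": 0, "text": "Tell me about the first time you realized you wanted to do this.",
     "start": 0.42, "end": 4.18, "speaker": "SPEAKER_00"},
    {"id": 1, "text": "Honestly, it was when my mentor pulled me aside.",
     "start": 4.86, "end": 8.94, "speaker": "SPEAKER_01"}
  ]
}
""")


def speaker_turns(record, roles=None):
    """Merge consecutive segments from the same speaker into (label, start, end, text) turns."""
    roles = roles or {}
    turns = []
    for seg in record["segments"]:
        raw = seg.get("speaker")  # missing when speaker diarization was off
        label = roles.get(raw, raw or "UNKNOWN")
        if turns and turns[-1][0] == label:
            _, start, _, text = turns[-1]
            turns[-1] = (label, start, seg["end"], text + " " + seg["text"])
        else:
            turns.append((label, seg["start"], seg["end"], seg["text"]))
    return turns


# Hypothetical role mapping: assumes the interviewer speaks first.
for label, start, end, text in speaker_turns(record, {"SPEAKER_00": "Interviewer", "SPEAKER_01": "Guest"}):
    print(f"[{start:7.2f}s-{end:7.2f}s] {label}: {text}")
```

Merging consecutive turns keeps a guest's long answer together even when the transcriber splits it into several segments. This is usually what you want when pulling a quote.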


Built for journalists, qualitative researchers, market researchers

  • 📰 Journalists: turn phone-recorded interviews into clean transcripts ready for quote pulling
  • 🧪 Qualitative researchers: preserve participant voices with speaker separation for thematic analysis
  • 📊 Market researchers: bulk-transcribe focus group and 1:1 interview tapes
  • 📚 Oral historians: build searchable, time-stamped archives of long-form interviews
  • 🎙️ Podcasters publishing interview shows: transcripts for show notes, blog repurposing, and SEO

Speaker diarization (interviewer / guest separation)

Toggle the Speaker Diarization input, and the actor automatically labels every segment and every word with the speaker it came from: SPEAKER_00, SPEAKER_01, SPEAKER_02, and so on, numbered in the order in which each voice first speaks. In a typical interview, the interviewer opens, so SPEAKER_00 is the interviewer and SPEAKER_01 is the first guest. This makes it straightforward to extract clean quote attributions for journalism or to code qualitative data for research. Powered by pyannote-audio. Charged per audio second and only billed when enabled.
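Because each entry in `segments[].words[]` carries its own `start`, `end`, and `speaker`, you can locate a quote and its attribution directly. The sketch below is a hypothetical `find_quote` helper. It matches a phrase against consecutive words, ignoring case and surrounding punctuation, and returns the speaker and time range of the first match. It assumes word entries look like the example output above and that a quote does not need fuzzy matching.

```python
import re


def _norm(token):
    # Compare case-insensitively and strip surrounding punctuation/whitespace (e.g. "me," -> "me").
    return re.sub(r"^\W+|\W+$", "", token).lower()


def find_quote(segments, phrase):
    """Return (speaker, start, end) for the first exact word-sequence match of `phrase`, or None."""
    target = [_norm(w) for w in phrase.split()]
    if not target:
        return None
    words = [w for seg in segments for w in seg.get("words", [])]
    n = len(target)
    for i in range(len(words) - n + 1):
        window = words[i:i + n]
        if [_norm(w["word"]) for w in window] == target:
            return window[0].get("speaker"), window[0]["start"], window[-1]["end"]
    return None
```

The returned start time lets you jump straight to the quote in the recording and confirm that the wording and attribution are correct before publishing.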


Translate foreign-language interviews to English

Toggle Translate to English, and the actor returns the transcript translated into English while preserving timing. This is perfect for conducting interviews in your subject's native language and publishing in English. Combine it with Speaker Diarization to get clean, attributed quotes in both directions. Charged separately when enabled.


SRT / VTT subtitle export

Every transcription returns ready-to-use srt and vtt subtitle strings. Save the field value as a .srt or .vtt file and:

  • Publish a video version of the interview with subtitles for YouTube, Vimeo, or your CMS
  • Add HTML5 <track> accessibility captions to embedded video
  • Build a searchable interview archive with timestamps

Set Timestamp Granularities to word for cue precision down to individual words.


Why interviewers choose this transcriber

  • ✅ Interviewer ↔ guest separation out of the box via pyannote-audio diarization: clean attributed quotes ready for publication
  • ⏱️ Word-level timestamps for every word: find any quote in a 90-minute interview in seconds
  • 🌍 Translate non-English interviews to English in the same run: perfect for international journalism and cross-cultural research
  • 🎬 SRT and VTT subtitles included for video versions of interviews
  • 🌐 99+ languages with automatic detection, so no manual selection is needed
  • 🇪🇺 EU-region processing for GDPR-aligned research workflows
  • 💰 Pay per audio second: no per-minute Rev.com markups, no Otter subscription
  • 🚀 10× parallel on the paid tier: an entire research project's interviews done in one run

Use cases

  • 📰 Investigative journalists transcribing source interviews and pulling attributed quotes for stories
  • 🎙️ Long-form podcasters generating publication-ready transcripts of every guest interview
  • 🧪 Qualitative researchers coding participant transcripts in NVivo, Atlas.ti, or MAXQDA
  • 📊 Market research firms transcribing focus groups and customer 1:1 sessions for thematic analysis
  • 📚 Oral history projects preserving long-form recorded interviews with timestamped speaker tracks
  • 🎓 Academic researchers conducting qualitative fieldwork in foreign-language contexts (transcribe + translate in one pass)
  • ✍️ Authors and biographers working from hours of recorded conversations for book material
  • 🎬 Documentary filmmakers preparing rough-cut transcripts of interview tape for editing

Pricing & tiers

Pay only for the audio seconds you actually transcribe. No subscriptions, no minimums.

| | FREE tier | PAID tier |
|---|---|---|
| Best for | Testing and small jobs | Production volume |
| Interviews per run | Up to 5 | Unlimited |
| Max file size | 50 MB | 1 GB |
| Monthly volume | 200 MB / 20 minutes | Unlimited |
| Concurrent files | 3 | 10 (10× parallel) |
| Price | No credit card required | $0.0005 per audio second |

Optional add-ons (only billed when enabled):

| Feature | Price |
|---|---|
| Speaker diarization | $0.0001 per audio second |
| Translate to English | $0.0003 per audio second |
| EU-region processing | $0.0007 per audio second (replaces the base $0.0005) |

A 60-minute interview with diarization on the paid tier costs approximately $2.16 ($1.80 transcription + $0.36 diarization). Compare to Rev.com's $1.50/min ($90 for the same interview).
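The arithmetic above generalizes to any recording length and combination of add-ons: add up the per-second rates, then multiply by the duration. The sketch below is a hypothetical paid-tier estimator using the rates from the tables on this page. It uses `Decimal` to avoid floating-point drift in currency, and it ignores any charges not listed here.

```python
from decimal import Decimal

# Paid-tier per-second rates from the pricing tables above (USD).
BASE = Decimal("0.0005")
EU_BASE = Decimal("0.0007")      # replaces BASE when EU-region processing is on
DIARIZATION = Decimal("0.0001")
TRANSLATION = Decimal("0.0003")


def estimate_cost(seconds, diarization=False, translate=False, eu=False):
    """Rough paid-tier estimate in USD, rounded to cents."""
    rate = EU_BASE if eu else BASE
    if diarization:
        rate += DIARIZATION
    if translate:
        rate += TRANSLATION
    return (Decimal(str(seconds)) * rate).quantize(Decimal("0.01"))


# 60-minute interview with diarization: 3600 s x $0.0006 = $2.16
print(estimate_cost(3600, diarization=True))
```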


Integration examples

JavaScript / Node.js

// Run as an ES module (e.g. a .mjs file) so top-level await works
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

// Start the actor and wait for the run to finish
const run = await client.actor('sian.agency/transcribe-interview-to-text').call({
    audioFiles: ['https://example.com/interview-with-source.m4a'],
    speakerDiarization: true,
    translateToEnglish: false,
});

// Each processed recording is one item in the run's default dataset
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items[0].transcript);
console.log(items[0].srt);

Python

from apify_client import ApifyClient

client = ApifyClient('YOUR_APIFY_TOKEN')

# Start the actor and block until the run finishes
run = client.actor('sian.agency/transcribe-interview-to-text').call(run_input={
    'audioFiles': ['https://example.com/interview-with-source.m4a'],
    'speakerDiarization': True,
    'translateToEnglish': False,
})

# Each processed recording is one item in the run's default dataset
items = client.dataset(run['defaultDatasetId']).list_items().items
print(items[0]['transcript'])
print(items[0]['vtt'])

cURL

curl -X POST 'https://api.apify.com/v2/acts/sian.agency~transcribe-interview-to-text/run-sync-get-dataset-items?token=YOUR_APIFY_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{
    "audioFiles": ["https://example.com/interview.m4a"],
    "speakerDiarization": true
  }'
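If you cannot install `apify-client`, the same synchronous call can be made with the Python standard library. The sketch below builds the request that the cURL example sends, using the same URL and JSON body. `build_sync_request` is a hypothetical helper. Synchronous endpoints only wait a limited time, so for long batches the client examples above are probably a better fit.

```python
import json
import urllib.parse
import urllib.request

ACTOR = "sian.agency~transcribe-interview-to-text"


def build_sync_request(token, audio_urls, speaker_diarization=True):
    """Build (but do not send) the POST request used in the cURL example."""
    query = urllib.parse.urlencode({"token": token})
    url = f"https://api.apify.com/v2/acts/{ACTOR}/run-sync-get-dataset-items?{query}"
    body = json.dumps({"audioFiles": audio_urls,
                       "speakerDiarization": speaker_diarization}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST",
                                  headers={"Content-Type": "application/json"})


# Sending it is a network call; the response body is the list of dataset items:
# with urllib.request.urlopen(build_sync_request("YOUR_APIFY_TOKEN", [...])) as resp:
#     items = json.load(resp)
```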

n8n / Zapier / Make

Wire this actor to a "new file in research-recordings folder" trigger (Dropbox, Google Drive, OneDrive). Each dataset item includes transcript, segments[].words[], srt, and vtt. Send them to Notion (research database), Airtable (interview log), MAXQDA/NVivo (qualitative coding), or Google Docs (story drafts).
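Many qualitative tools import plain text or subtitle files. The sketch below is a hypothetical `export_record` step that writes one dataset item's `transcript`, `srt`, and `vtt` fields to files named after the interview. It skips empty fields and items whose `success` flag is false. The output naming scheme is an assumption, not something the actor prescribes.

```python
from pathlib import Path


def export_record(record, out_dir, stem):
    """Write transcript/SRT/VTT fields of one dataset item to files; returns the paths written."""
    if record.get("success") is False:
        return []  # skip failed items rather than writing empty files
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for field, suffix in (("transcript", ".txt"), ("srt", ".srt"), ("vtt", ".vtt")):
        value = record.get(field)
        if value:  # skip missing or empty fields
            path = out / f"{stem}{suffix}"
            path.write_text(value, encoding="utf-8")
            written.append(path)
    return written
```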


FAQ

How accurate is interview transcription? The actor runs an industrial speech-to-text pipeline tuned for natural conversation. Accuracy is typically 95–99% on clean studio or quiet-room interviews and lower on phone-recorded or noisy field interviews. Word-level timestamps are returned even when accuracy is imperfect, so you can verify and correct quote attributions quickly.
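This page does not say how its accuracy figures are measured. One common way to check accuracy on your own tape is word error rate (WER): hand-correct a few minutes of transcript, then count the word substitutions, deletions, and insertions needed to turn the reference into the machine output. Accuracy is then roughly 1 − WER. The sketch below is a plain word-level edit-distance WER. It only lowercases the text; it does not strip punctuation.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words seen so far and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,              # deletion
                         cur[j - 1] + 1,           # insertion
                         prev[j - 1] + (r != h))   # substitution, or match at no cost
        prev = cur
    return prev[-1] / len(ref)
```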

What audio and video formats are supported? M4A, MP3, WAV, FLAC, AAC, OPUS, OGG, MP4, MOV, WebM. Max 50 MB per file on the free tier, 1 GB on the paid tier.

Can I transcribe foreign-language interviews? Yes. Auto-detection covers 99+ languages, including Spanish, French, German, Mandarin, Japanese, Portuguese, Arabic, Hindi, Russian, and many more. Toggle Translate to English to receive an English transcript alongside the timestamped original.

Is speaker diarization included? Yes, opt-in via the Speaker Diarization toggle. Each segment and word is labeled SPEAKER_00 (usually the interviewer, who speaks first), SPEAKER_01 (usually the first guest), and so on. Powered by pyannote-audio. Billed at $0.0001 per audio second only when enabled.

How does pricing work? Pay per audio second. The free tier covers small jobs and testing without a credit card. The paid tier is $0.0005 per second of audio. A 1-hour interview with diarization costs approximately $2.16, versus about $90 on Rev.com for the same length.

Can I integrate this into my qualitative research workflow? Yes. The actor exposes a standard Apify run/dataset API. The dataset record includes transcript, segments[].words[], srt, and vtt ready to feed into NVivo, MAXQDA, Atlas.ti, Dovetail, or any qualitative analysis tool that accepts plain text or VTT.

What if my interview is multi-speaker (panel, focus group)? Speaker diarization handles up to ~6 distinct speakers reliably. Each speaker is labeled SPEAKER_00 through SPEAKER_N in temporal order of first speaking turn.
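For panels and focus groups, a quick per-speaker summary helps you check whether diarization found the number of participants you expect, since this page says it is reliable up to about 6. The hypothetical `speaker_summary` helper below lists speakers in order of their first turn and totals each speaker's segment time in seconds. It skips segments without a `speaker` field.

```python
def speaker_summary(segments):
    """Return (speakers in order of first turn, {speaker: total seconds of speech})."""
    order, talk_time = [], {}
    for seg in sorted(segments, key=lambda s: s["start"]):
        spk = seg.get("speaker")
        if spk is None:
            continue  # no label (e.g. diarization was off)
        if spk not in talk_time:
            order.append(spk)
            talk_time[spk] = 0.0
        talk_time[spk] += seg["end"] - seg["start"]
    return order, talk_time
```

If `len(order)` is well above the number of people in the room, the recording may need manual review of the speaker labels.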

How long does a transcription take? A 60-minute interview takes 1–3 minutes on the paid tier. Bulk batches of 10 interviews complete in 5–10 minutes (parallelized).


Legal disclaimer

Use this actor only on interviews you have rights to transcribe: your own recordings with subject consent, properly licensed media, or material covered by journalistic source agreements. Some jurisdictions require subject consent for recording; you are responsible for compliance with applicable laws and IRB requirements for academic research. The actor does not retain audio or transcripts beyond the run's lifetime. EU-region processing is available via the EU Processing toggle for GDPR-aligned workflows. SIÁN Agency provides this actor as-is.


Support

๐Ÿ‘ Telegram Support
๐Ÿ‘ Email
๐Ÿ‘ SIรN Agency

Join the Telegram support group, email apify@sian-agency.online, or open an issue on the SIÁN Agency Apify Store page.


More from SIÁN Agency

Platform-specific scrapers + transcribers:

Browse the full SIÁN Agency Apify Store for all available actors.


You might also like

Transcribe Voice Memo to Text - Speaker Labels & Timestamps

sian.agency/transcribe-voice-memo-to-text

Transcribe iPhone and Android voice memos to text. Speaker labels, word-level timestamps, SRT/VTT. Bulk upload, 99+ languages. Try free.

๐Ÿ‘ User avatar

SIรN Oรœ

4

Transcribe Video to Text & Audio to Text - 99+ Languages

sian.agency/INCREDIBLY-FAST-audio-transcriber

Transcribe video to text and audio to text in bulk on Apify. 99+ languages, word-level timestamps, speaker diarization, SRT/VTT export. Try free.

๐Ÿ‘ User avatar

SIรN Oรœ

92

5.0

Transcribe Podcast to Text - Show Notes, SRT & Timestamps

sian.agency/transcribe-podcast-to-text

Transcribe podcast episodes to text in bulk. Speaker labels for hosts and guests, word-level timestamps, SRT/VTT for show notes. 99+ languages.

๐Ÿ‘ User avatar

SIรN Oรœ

16

Transcribe Zoom Meeting to Text - Bulk Meeting Transcription

sian.agency/transcribe-zoom-meeting-to-text

Transcribe Zoom recordings to text in bulk. Speaker labels for host and participants, word-level timestamps, SRT/VTT export. 99+ languages. Try free.

๐Ÿ‘ User avatar

SIรN Oรœ

5

Video & Audio Transcriber - Word-Level + SRT/VTT

dami_studio/video-audio-transcriber

Transcribe any video or audio URL into accurate text with word-level and segment timestamps, plus ready-to-use SRT, VTT, and TXT files. Auto-detects language. For captions, subtitles, search & repurposing. Bring your own OpenAI API key.

3

5.0

YouTube Video Transcribe

entertained_rattlesnake/youtube-video-transcribe

Transcribe YouTube videos by extracting subtitles and metadata, and push the results directly to the Apify Dataset.

๐Ÿ‘ User avatar

Entertained Rattlesnake

4

Kick VOD Transcription - Stream to Text, SRT & VTT

scrapersdelight/kick-transcript-scraper

Transcribe Kick.com VODs (which have no captions) with AI speech-to-text: searchable transcript in TXT, SRT & VTT plus VOD metadata, by channel or VOD URL. No login or API key. Schedule it to transcribe new VODs automatically. $0.012 per audio minute.

๐Ÿ‘ User avatar

Scrapers Delight

7

Instagram Youtube Transcripts With Speaker Labels Full Account

transcriptdl/instagram-youtube-transcripts-with-speaker-labels-full-account

Verified 99.4% Success. BULK generate transcripts with speaker diarization from Instagram Reels & YouTube videos. Automatically identifies speakers, outputs SRT/VTT subtitles, timestamps & full text. Perfect for podcasts, interviews & meetings. Bulk processing supported.

๐Ÿ‘ User avatar

Transcript Downloader

4

$0.15/min REAL YouTube Transcriber & Subtitles (JSON/SRT/VTT)

practicaltools/apify-youtube-transcribe

Download and transcribe YouTube videos into text and subtitle files – quickly, locally, and without external APIs. This Apify actor uses Faster-Whisper to generate transcripts and captions. It saves results in TXT, JSON, SRT, and VTT formats, plus provides a summary in the Dataset.

๐Ÿ‘ User avatar

Practical Tools

68

5.0

Video To Text

truefetch/video-to-text

Transcribe videos from 1,000+ platforms to text โ€” auto language detection, timestamps, subtitle file download, and translation to 100+ languages. No file uploads. $0.30 per video.

251

4.9