Agora Voice Agent with AssemblyAI Universal-3 Pro Streaming

Build a real-time transcription bot that joins Agora channels, captures participant audio as PCM frames, and streams it to AssemblyAI Universal-3 Pro Streaming — with 307ms P50 latency and support for 99+ languages.

Architecture

Browser/Mobile clients
 │ WebRTC (Agora SDK)
 ▼
 Agora Channel
 │ server subscribes as bot user
 ▼
 Python Server Bot
 (agora-python-server-sdk)
 │ PcmAudioFrame per participant
 │ sample_rate=16000, pcm_s16le
 ▼
 AssemblyAI Universal-3 Pro Streaming
 wss://streaming.assemblyai.com/v3/ws
 │ Turn events with transcript
 ▼
 Your application logic
 (drive LLM, store transcript, trigger webhook)

Why Agora + AssemblyAI?

Metric	AssemblyAI Universal-3 Pro	Agora Built-in STT
P50 latency	307ms	~600–900ms
Word Error Rate	8.9%	~14–18%
Speaker diarization	✅ Real-time	❌
LLM Gateway	✅ 20+ models	❌
Languages	99+	Limited
Audio formats	PCM, μ-law, Opus	PCM only

Prerequisites

Python 3.9+
Agora account — App ID and App Certificate
AssemblyAI API key

Quick Start

git clone https://github.com/kelseyefoster/voice-agent-agora-universal-3-pro
cd voice-agent-agora-universal-3-pro

pip install -r requirements.txt
cp .env.example .env
# Fill in AGORA_APP_ID, AGORA_APP_CERT, ASSEMBLYAI_API_KEY

python bot.py --channel my-channel

Environment Setup

AGORA_APP_ID=your_agora_app_id
AGORA_APP_CERT=your_agora_certificate
AGORA_CHANNEL=my-channel
AGORA_BOT_UID=9999
ASSEMBLYAI_API_KEY=your_assemblyai_api_key

Obtain Agora credentials from the Agora Console and your AssemblyAI API key from the AssemblyAI dashboard.
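The variables above can be read and validated once at startup, so a missing key fails immediately instead of surfacing later as an authentication error. This is a minimal sketch, assuming the values are already present in the process environment. BotConfig and load_config are hypothetical names and are not taken from the repository.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Variables the bot cannot run without.
REQUIRED = ("AGORA_APP_ID", "AGORA_APP_CERT", "ASSEMBLYAI_API_KEY")


@dataclass(frozen=True)
class BotConfig:
    agora_app_id: str
    agora_app_cert: str
    assemblyai_api_key: str
    channel: str
    bot_uid: int


def load_config(env: Optional[Mapping[str, str]] = None) -> BotConfig:
    """Build a BotConfig from environment variables, failing fast on gaps."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return BotConfig(
        agora_app_id=env["AGORA_APP_ID"],
        agora_app_cert=env["AGORA_APP_CERT"],
        assemblyai_api_key=env["ASSEMBLYAI_API_KEY"],
        # Defaults mirror the example values in the listing above.
        channel=env.get("AGORA_CHANNEL", "my-channel"),
        bot_uid=int(env.get("AGORA_BOT_UID", "9999")),
    )
```

Passing a plain dict instead of os.environ keeps the function easy to test.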

Core Integration

For each participant, the bot runs two tasks concurrently over a single WebSocket: one pulls audio frames from Agora and forwards them to AssemblyAI, and the other handles the transcript events that come back.

import asyncio
import json
import os
import websockets
from agora.rtc.agora_service import AgoraService, AgoraServiceConfig
from agora.rtc.rtc_connection import RTCConnConfig
from agora.rtc.agora_base import (
    ClientRoleType,
    ChannelProfileType,
    AudioScenarioType,
)

SAMPLE_RATE = 16000
CHANNELS = 1
AAI_WS_URL = (
    "wss://streaming.assemblyai.com/v3/ws"
    f"?sample_rate={SAMPLE_RATE}"
    "&speech_model=u3-rt-pro"
    "&format_turns=true"
)

async def stream_participant(agora_channel, uid: int, api_key: str):
    headers = {"Authorization": api_key}
    async with websockets.connect(AAI_WS_URL, additional_headers=headers) as ws:
        begin = json.loads(await ws.recv())
        print(f"[uid={uid}] AAI session: {begin['id']}")

        async def send_audio():
            async for frame in agora_channel.get_audio_frames(uid):
                await ws.send(frame.data)

        async def recv_transcripts():
            async for message in ws:
                event = json.loads(message)
                if event["type"] == "Turn" and event.get("end_of_turn"):
                    print(f"[uid={uid}] {event['transcript']}")

        await asyncio.gather(send_audio(), recv_transcripts())
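stream_participant handles a single uid, so something has to start it when a participant joins and cancel it when they leave. The sketch below is one way to do that with asyncio tasks. ParticipantStreams and its on_join/on_leave methods are hypothetical names. Connecting them to the Agora SDK's join and leave callbacks is not shown and depends on the SDK version. on_join must be called from the thread running the event loop.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


class ParticipantStreams:
    """Tracks one streaming task per participant uid (hypothetical helper)."""

    def __init__(self, stream_fn: Callable[[int], Awaitable[None]]):
        # stream_fn is expected to wrap stream_participant for a given uid.
        self._stream_fn = stream_fn
        self._tasks: Dict[int, asyncio.Task] = {}

    def on_join(self, uid: int) -> None:
        # Ignore duplicate join notifications for an already-active uid.
        if uid not in self._tasks:
            self._tasks[uid] = asyncio.create_task(self._stream_fn(uid))

    async def on_leave(self, uid: int) -> None:
        task = self._tasks.pop(uid, None)
        if task is None:
            return
        task.cancel()
        # Wait for the cancellation to complete so the WebSocket gets closed.
        await asyncio.gather(task, return_exceptions=True)

    def active_uids(self) -> List[int]:
        return sorted(self._tasks)
```

A typical construction would be ParticipantStreams(lambda uid: stream_participant(agora_channel, uid, api_key)), which gives every participant an independent AssemblyAI session.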

Audio Format

Configure Agora to output 16 kHz mono before subscribing — this eliminates resampling and matches AssemblyAI's preferred format:

agora_channel.set_playback_audio_frame_before_mixing_parameters(
    num_of_channels=1,
    sample_rate=16000,
)
agora_channel.subscribe_all_audio()

Each PcmAudioFrame contains 160 samples (10 ms, 320 bytes) of 16-bit little-endian PCM. The example above forwards each frame's raw bytes to AssemblyAI unchanged, with no resampling or re-encoding.
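At 16 kHz mono with 16-bit samples, a 10 ms frame is 160 samples × 2 bytes = 320 bytes. AssemblyAI's streaming documentation specifies an accepted duration range for each audio message (50 ms to 1000 ms when this was written; confirm against the current docs), so it may be necessary to group several frames before sending. This is a minimal sketch of such a buffer. FrameBuffer and the 50 ms target are assumptions, not part of either SDK.

```python
SAMPLE_RATE = 16000      # Hz, matches the Agora playback configuration
BYTES_PER_SAMPLE = 2     # 16-bit PCM
CHANNELS = 1


def bytes_for_ms(ms: int) -> int:
    """Number of PCM bytes that represent `ms` milliseconds of audio."""
    return SAMPLE_RATE * ms // 1000 * BYTES_PER_SAMPLE * CHANNELS


class FrameBuffer:
    """Accumulates small PCM frames and releases fixed-size chunks."""

    def __init__(self, chunk_ms: int = 50):
        self._chunk_bytes = bytes_for_ms(chunk_ms)
        self._pending = bytearray()

    def push(self, frame: bytes) -> list:
        """Add one frame; return zero or more complete chunks ready to send."""
        self._pending.extend(frame)
        chunks = []
        while len(self._pending) >= self._chunk_bytes:
            chunks.append(bytes(self._pending[: self._chunk_bytes]))
            del self._pending[: self._chunk_bytes]
        return chunks

    def flush(self) -> bytes:
        """Return whatever is left, for example just before terminating."""
        rest = bytes(self._pending)
        self._pending.clear()
        return rest
```

Inside send_audio, each frame would go through buf.push(frame.data) and every returned chunk would be passed to ws.send. Larger chunks mean fewer WebSocket messages at the cost of a few tens of milliseconds of added latency.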

Handling Transcripts

Turn events arrive as speech is recognized; those with end_of_turn set mark a natural speech boundary. Route these completed turns to your LLM, database, or webhook:

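async def recv_transcripts(ws, uid: int):
    async for message in ws:
        event = json.loads(message)
        # Only act on completed turns; earlier Turn events are still in progress.
        if event["type"] == "Turn" and event.get("end_of_turn"):
            transcript = event["transcript"]
            print(f"[uid={uid}] {transcript}")
            # send_to_llm is a placeholder for your own coroutine.
            await send_to_llm(uid, transcript)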
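Separating message parsing from the WebSocket loop makes the handler easy to unit-test. The sketch below relies only on the fields used above (type, end_of_turn, transcript). final_transcript is a hypothetical helper name. Skipping empty transcripts is a design choice here rather than documented behavior.

```python
import json
from typing import Optional, Union


def final_transcript(message: Union[str, bytes]) -> Optional[str]:
    """Return the transcript of a completed turn, or None for anything else."""
    try:
        event = json.loads(message)
    except ValueError:
        return None  # not JSON; ignore rather than crash the receive loop
    if not isinstance(event, dict):
        return None
    if event.get("type") != "Turn" or not event.get("end_of_turn"):
        return None
    text = (event.get("transcript") or "").strip()
    return text or None
```

The receive loop then reduces to calling final_transcript(message) and forwarding any non-None result.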

Terminating Cleanly

Send a Terminate message to flush the final turn, then keep reading until the Termination event confirms the session has ended:

async def close_stream(ws):
    await ws.send(json.dumps({"type": "Terminate"}))
    async for message in ws:
        event = json.loads(message)
        if event["type"] == "Termination":
            print(f"Audio processed: {event['audio_duration_seconds']}s")
            break
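If the Termination event never arrives, the loop above can wait far longer than a shutdown path should. One option is to bound the wait with asyncio.wait_for. This sketch assumes ws behaves like the websockets connection used earlier (an async send plus async iteration). close_stream_with_timeout and the 5-second default are hypothetical.

```python
import asyncio
import json
from typing import Optional


async def close_stream_with_timeout(ws, timeout: float = 5.0) -> Optional[float]:
    """Send Terminate and wait up to `timeout` seconds for Termination.

    Returns the reported audio duration in seconds, or None on timeout
    or if the socket closes without a Termination event.
    """
    await ws.send(json.dumps({"type": "Terminate"}))

    async def wait_for_termination() -> Optional[float]:
        async for message in ws:
            event = json.loads(message)
            if event.get("type") == "Termination":
                return event.get("audio_duration_seconds")
        return None  # iteration ended: the connection closed first

    try:
        return await asyncio.wait_for(wait_for_termination(), timeout)
    except asyncio.TimeoutError:
        return None
```

Any final Turn events that arrive before Termination are skipped here. A real bot would likely hand them to its transcript handler rather than drop them.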

Production Token Generation

pip install agora-token-builder

import os
import time

from agora_token_builder import RtcTokenBuilder, Role_Subscriber

def generate_bot_token(app_id: str, app_cert: str, channel: str, uid: int) -> str:
    # The token is valid for one hour from now.
    expire = int(time.time()) + 3600
    return RtcTokenBuilder.buildTokenWithUid(
        app_id, app_cert, channel, uid, Role_Subscriber, expire
    )

# channel, bot_uid, and connection come from the bot's own setup code.
token = generate_bot_token(
    os.environ["AGORA_APP_ID"],
    os.environ["AGORA_APP_CERT"],
    channel,
    bot_uid,
)
connection.connect(token, channel, str(bot_uid))
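The token above expires after an hour, so a long-running bot needs a fresh one before then. The helper below only computes how long to wait before renewing. How the new token is handed to the connection depends on the Agora SDK version and is not shown. seconds_until_renewal and the 5-minute margin are hypothetical choices.

```python
import time
from typing import Optional


def seconds_until_renewal(
    expire_ts: int, margin: int = 300, now: Optional[float] = None
) -> float:
    """Seconds to wait before renewing a token that expires at `expire_ts`.

    `margin` is how long before expiry the renewal should happen.
    Returns 0 if the renewal point has already passed.
    """
    current = time.time() if now is None else now
    return max(0.0, expire_ts - margin - current)
```

To use this, generate_bot_token would need to return the expiry timestamp along with the token, so the caller can sleep for the computed interval and then generate a replacement.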

URL: https://dev.to/martschweiger/agora-voice-agent-with-assemblyai-universal-3-pro-streaming-40ki