.@DecagonAI cut voice agent cost per turn nearly 6x with Together AI.
They moved from closed models to fine-tuned open models, while keeping latency low enough for real-time voice:
β <400ms p95 model latency per turn
β custom speculators and prompt caching
β optimized serving on NVIDIA Blackwell
β weekly, sometimes daily model deployment velocity
This is the closed-to-open shift: more control, better tokenomics, and production performance without being locked into proprietary APIs.
