Voozh

.@DecagonAI cut voice agent cost per turn nearly 6x with Together AI. They moved from closed models to fine-tuned open models, while keeping latency low enough for real-time voice: → <400ms p95 model latency per turn → custom speculators and prompt caching → optimized serving on NVIDIA Blackwell → weekly, sometimes daily model deployment velocity This is the closed-to-open shift: more control, better tokenomics, and production performance without being locked into proprietary APIs.

👁 user avatar

Together AI

@togethercompute

Jun 16

👁 Article cover image

Article

How Decagon Engineered Sub-Second Voice AI with Together AI

The challenge with running voice agents Voice latency is audible. Decagon’s leadership frames voice as the most demanding surface because latency is immediately perceptible. Long pauses create awkward...

5:29 PM · Jun 16, 20264.6KViews

URL: https://x.com/togethercompute/status/2066936299836039645

Post

Post