URL: https://www.together.ai/customers

Customer Stories | Together AI


Customer stories

The teams shipping AI at production scale

Together AI is the end-to-end platform trusted for reliability, leading price economics, and research-backed performance. Hear from the teams building on the AI Native Cloud.

How Cursor partnered with Together AI to deliver real-time, low-latency inference at scale

Inference, GPU clusters, RESEARCH • Enterprise

How Decagon Engineered Sub-Second Voice AI with Together AI

Inference, GPU clusters, RESEARCH

6x

Cost reduction per turn vs. GPT-5 mini

All customer stories

<500ms

time-to-first-token

How Deep Cogito trained and deployed frontier reasoning models on Together AI

GPU Clusters
Inference

2x

faster inference

How Yutori runs browser-use AI agents at production scale on Together AI's inference platform

Inference

90ms

Model latency

How Cartesia Runs Real-Time Voice AI on Together AI's GPU Infrastructure

GPU Clusters
Training
Inference

How XY.AI Labs Built Customer-Specific EOB Parsers with Serverless Fine-Tuning

Fine-Tuning

6×

cost per turn

How Decagon Engineered Sub-Second Voice AI with Together AI

Inference
Fine-Tuning

72 GPUs

GB200 NVL72 topology

Learn how Cursor partnered with Together AI to deliver real-time, low-latency inference at scale

Inference

~3 months

time saved

How Scaled Cognition Trains APT-1 on Together AI GPU Clusters

GPU Clusters
Training

5-10×

vs. competitors

How Runware Scales Generative Video & Image APIs with Together AI's Flexible GPU Infrastructure

GPU Clusters
Inference

7×

training cost

Together AI's Instant Clusters Enable Latent Health to Build Clinical AI That Outperforms GPT-4

GPU Clusters

2 seconds

response time

How The Washington Post Achieved AI Independence with Reliable Inference

Inference

3x

training frequency

How Slingshot AI Accelerated Mental Health AI with Fine-tuning at Together AI

Fine-Tuning

10×

faster launch

How HeroUI Chat launched 10x faster with Together Code Sandbox

Code Sandbox

60%

cost savings

How Hedra Scales Viral AI Video Generation with 60% Cost Savings

Inference
GPU Clusters
Training

95%

faster TTFT

From AWS to Together Dedicated Endpoints: Arcee AI's journey to greater inference flexibility

Inference

3 months

faster launch

How LegionEdge Built a Real-Time AI Prototyping Platform with Together Code Sandbox

Code Sandbox

50%

cloud savings

Building World-Class Thai Language Models with Purpose-Built AI Infrastructure

Training
GPU Clusters
Inference

92%

vs. OpenAI

When Standard Inference Frameworks Failed, Together AI Enabled 5x Performance Breakthrough

Inference

2x

CSAT score

How Zomato built an AI customer support bot that doubled customer satisfaction and scaled to over 1,000 messages per minute

Inference

Scaling AI Companions: How Dippy AI Reached Over 4 Million Tokens/Minute with Together Dedicated Endpoints

Inference

Scale your infrastructure with Together

"Together AI offers optimized performance at scale, and at a lower cost than closed-source providers – all while maintaining strict privacy standards."

Vineet Khosla

CTO, The Washington Post

  • ~33%

    Cost savings

  • 2x

    Latency reduction

"We've been thoroughly impressed with Together. They delivered a 2x reduction in latency and cut our costs by approximately a third."

Caiming Xiong

VP, Salesforce AI Research

"Together GPU Clusters provided a combination of amazing training performance, expert support, and the ability to scale to meet our rapid growth to help us serve our growing community of AI creators."

"Together AI provides the performance and reliability we need for real-time, high-quality image and video generation at scale. We value that Together AI is much more than an infrastructure provider: they're a true innovation partner, enabling us to push creative boundaries without compromise."

"Together AI's infrastructure has the capacity to soak up our viral moments without breaking a sweat. During major traffic surges, Dedicated Container Inference scales seamlessly while maintaining performance. And because we trained on Together's Accelerated Compute, deploying to production was frictionless: one platform, zero artifact transfers, no deployment headaches."

Terrance Wang

Founding ML Engineer, Hedra

Why Together AI?

workload at scale

72 GPUs

NVIDIA Blackwell on Together Managed Clusters

Lower latency

95×

With Together Inference