Customer stories

The teams shipping AI at production scale

Together AI is the end-to-end platform trusted for reliability, leading price economics, and research-backed performance. Hear from the teams building on the AI Native Cloud.

How Cursor partnered with Together AI to deliver real-time, low-latency inference at scale

Inference, GPU Clusters, Research • Enterprise

How Decagon Engineered Sub-Second Voice AI with Together AI

Inference, GPU Clusters, Research

Cost reduction per turn vs. gpt-5 mini

Vercept

11x

Faster Inference

All customer stories

<500ms

time-to-first-token

How Deep Cogito trained and deployed frontier reasoning models on Together AI

GPU Clusters

Inference

faster inference

How Yutori runs browser-use AI agents at production scale on Together AI’s inference platform

Inference

90ms

Model latency

How Cartesia Runs Real-Time Voice AI on Together AI’s GPU Infrastructure

GPU Clusters

Training

Inference

87%

EOB accuracy

How XY.AI Labs Built Customer-Specific EOB Parsers with Serverless Fine-Tuning

Fine-Tuning

6×

cost per turn

How Decagon Engineered Sub-Second Voice AI with Together AI

Inference

Fine-Tuning

72 GPUs

GB200 NVL72 topology

Learn how Cursor partnered with Together AI to deliver real-time, low-latency inference at scale

Inference

~3 months

time saved

How Scaled Cognition Trains APT-1 on Together AI GPU Clusters

GPU Clusters

Training

5-10×

vs. competitors

How Runware Scales Generative Video & Image APIs with Together AI's Flexible GPU Infrastructure

GPU Clusters

Inference

7×

training cost

Together AI’s Instant Clusters Enable Latent Health to Build Clinical AI That Outperforms GPT-4

GPU Clusters

2 seconds

response time

How The Washington Post Achieved AI Independence with Reliable Inference

Inference

training frequency

How Slingshot AI Accelerated Mental Health AI with Fine-tuning at Together AI

Fine-Tuning

10×

faster launch

How HeroUI Chat launched 10x faster with Together Code Sandbox

Code Sandbox

60%

cost savings

How Hedra Scales Viral AI Video Generation with 60% Cost Savings

Inference

GPU Clusters

Training

95%

faster TTFT

From AWS to Together Dedicated Endpoints: Arcee AI's journey to greater inference flexibility

Inference

3 months

faster launch

How LegionEdge Built a Real-Time AI Prototyping Platform with Together Code Sandbox

Code Sandbox

50%

cloud savings

Building World-Class Thai Language Models with Purpose-Built AI Infrastructure

Training

GPU Clusters

Inference

92%

vs. OpenAI

When Standard Inference Frameworks Failed, Together AI Enabled 5x Performance Breakthrough

Inference

CSAT score

How Zomato built an AI customer support bot that doubled customer satisfaction and scaled to over 1,000 messages per minute

Inference

0.4s

median TTFT

Scaling AI Companions: How Dippy AI Reached Over 4 Million Tokens/Minute with Together Dedicated Endpoints

Inference

Scale your infrastructure with Together

"Together AI offers optimized performance at scale, and at a lower cost than closed-source providers – all while maintaining strict privacy standards."

Vineet Khosla

CTO, The Washington Post

~33%

Cost savings

2x

Latency reduction

"We’ve been thoroughly impressed with Together. They delivered a 2x reduction in latency and cut our costs by approximately a third."

Caiming Xiong

VP, Salesforce AI Research

"Together GPU Clusters provided a combination of amazing training performance, expert support, and the ability to scale to meet our rapid growth to help us serve our growing community of AI creators."

Demi Guo

CEO, Pika

“Together AI provides the performance and reliability we need for real-time, high-quality image and video generation at scale. We value that Together AI is much more than an infrastructure provider — they're a true innovation partner, enabling us to push creative boundaries without compromise.”

Victor Perez

Co-Founder, Krea

“Together AI’s infrastructure has the capacity to soak up our viral moments without breaking a sweat. During major traffic surges, Dedicated Container Inference scales seamlessly while maintaining performance. And because we trained on Together’s Accelerated Compute, deploying to production was frictionless—one platform, zero artifact transfers, no deployment headaches.”

Terrance Wang

Founding ML Engineer, Hedra