URL: https://www.together.ai/fine-tuning

Fine-Tuning | Together AI

Fine-Tuning

Fine-tune open-source models for real production use

Improve accuracy, reduce hallucinations, and control behavior — without managing training infrastructure.

Why fine-tune models with Together AI?

Build models that are faster, more accurate, and fully yours

Reliable infrastructure at any scale

Multi-node orchestration that eliminates job failures. Fine-tune 100B+ models (DeepSeek-V3, Qwen3-235B) that break other platforms, with the reliability to experiment rapidly.

Research-driven performance gains

ML systems research built into every job. Train with 2-4x longer contexts at no extra cost, advanced DPO variants from SOTA recipes, and continuous optimizations that make your runs faster over time.

Universal model compatibility

Fine-tune any open-source model from Hugging Face Hub. No vendor lock-in, no format conversions — seamless integration with your existing workflows.

Fine-tune leading models

Explore top-performing models across text, image, video, code, and voice.

- DeepSeek V4 Pro (Chat, new)
- Gemma-4-31B-it-Pearl (Chat)
- Qwen3.7-Max (Chat, new)
- NVIDIA Nemotron 3 Ultra (Chat, new)
- MiniMax M3 (Chat, new)
- Kimi K2.7 Code (Chat, new)
- Qwen3.7-Plus (Chat, new)
- GLM-5.2 (Chat, new)
- gpt-oss-120B (Chat)
- LFM2 24B A2B (Chat, new)
- Qwen3.5-397B-A17B (Chat)
- MiniMax M2.5 (Chat)
- GLM-5 (Chat)
- Qwen3-Coder-Next (Chat)
- Kimi K2.5 (Chat)
- Wan 2.6 Image (Image)
- GPT Image 1.5 (Image)
- Qwen3.5 9B (Chat, new)
- GLM-5.1 (Chat, new)
- Gemma 4 31B (Chat, new)

Have your own model?

Deploy custom containers on Together’s managed GPU infrastructure with automatic scaling, job queues, and built-in observability.

Fine-tuning options

Choose how fine-tuned models are trained and hosted based on dataset size, cost, and control.

LoRA fine-tuning
Lightweight fine-tuning for fast iteration and lower cost.
Best for:
- Small to medium datasets
- Fast training & deployment
- Easy to update or roll back

Full fine-tuning
Train the entire model for maximum control and quality.
Best for:
- Large or complex datasets
- Deeper behavior changes
- Dedicated infrastructure
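
To make the trade-off between the two options concrete, here is a minimal sketch of the trainable-parameter arithmetic behind LoRA, which freezes a weight matrix and trains two low-rank factors instead. The layer shape and rank are illustrative assumptions, not Together AI defaults.

```python
def full_trainable_params(d_out: int, d_in: int) -> int:
    """Full fine-tuning updates every entry of a d_out x d_in weight matrix."""
    return d_out * d_in


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """LoRA trains B (d_out x rank) and A (rank x d_in); the base weight stays frozen."""
    return rank * (d_out + d_in)


# Hypothetical 4096 x 4096 projection layer with rank-8 adapters.
full = full_trainable_params(4096, 4096)
lora = lora_trainable_params(4096, 4096, rank=8)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")
```

The small adapter is what makes LoRA runs quick to train, swap, or roll back, while full fine-tuning touches every weight and so needs more memory and compute.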

Everything you need to fine-tune at scale

Fine-tune any open-source model on your data. Deploy securely onto scalable infrastructure.

- Large frontier model support
  100B+ param models, multi-GPU training, faster training
  Fine-tune large open-source models like Kimi-K2 and GLM-4.7 for tool use, reasoning, and agentic tasks. Drive advanced model behavior through a single API without managing underlying training infrastructure.
- Vision fine-tuning
  LoRA & full fine-tuning; PNG, JPEG, WEBP; deploy instantly
  Train vision models directly on raw image data without format changes or special preprocessing. Include images alongside text to fine-tune Llama-4, Qwen3-VL, and Gemma-3 via standard APIs.
- Tool-calling training
  Use existing agent logs, native function calling
  Train models for precise tool execution by integrating function definitions and tool calls directly into datasets. Process existing agent logs as-is to improve accuracy without manually restructuring data.
- Cost estimation
  See cost estimates, no surprises
  Estimate training costs before launching any job, directly from the UI or CLI. Evaluate resource requirements upfront to eliminate budget surprises.
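
As a rough illustration of the arithmetic a pre-launch estimate involves, the sketch below assumes cost scales linearly with the total number of training tokens processed. The function name, the billing model, and the price are hypothetical placeholders; the actual estimate comes from the UI or CLI.

```python
def estimate_training_cost(tokens_per_epoch: int, epochs: int,
                           price_per_million_tokens: float) -> float:
    """Return an estimated cost, assuming billing is linear in tokens processed."""
    if tokens_per_epoch < 0 or epochs < 0 or price_per_million_tokens < 0:
        raise ValueError("inputs must be non-negative")
    total_tokens = tokens_per_epoch * epochs
    return total_tokens / 1_000_000 * price_per_million_tokens


# Placeholder numbers: 2M tokens per epoch, 3 epochs, 0.50 per million tokens.
cost = estimate_training_cost(2_000_000, 3, 0.50)
print(f"estimated cost: {cost:.2f}")
```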
{
  "model": "zai-org/GLM-5",
  "messages": [
    { "role": "user", "content": "What is the best GPU provider?" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "web_search",
        "description": "Search the web for real-time information",
        "parameters": {
          "type": "object",
          "properties": {
            "query": { "type": "string", "description": "The search query" }
          },
          "required": ["query"]
        }
      }
    }
  ]
}
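
The request above declares a web_search tool. Here is a minimal sketch of how such a definition might sit next to a recorded tool call in one JSONL training record, with a small sanity check before upload. The record layout (OpenAI-style `messages` plus `tools`, with `tool_calls` on the assistant turn), the helper name `check_tool_calls`, and the sample query are assumptions for illustration; the docs define the exact dataset schema.

```python
import json

web_search_tool = {
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web for real-time information",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The search query"}
            },
            "required": ["query"],
        },
    },
}

# One training example: the user turn, then the assistant's tool call (assumed layout).
record = {
    "messages": [
        {"role": "user", "content": "What is the best GPU provider?"},
        {
            "role": "assistant",
            "tool_calls": [
                {
                    "type": "function",
                    "function": {
                        "name": "web_search",
                        "arguments": json.dumps({"query": "best GPU provider"}),
                    },
                }
            ],
        },
    ],
    "tools": [web_search_tool],
}


def check_tool_calls(rec: dict) -> list[str]:
    """Return a list of problems: undeclared tools or missing required arguments."""
    declared = {t["function"]["name"]: t["function"] for t in rec.get("tools", [])}
    problems = []
    for msg in rec["messages"]:
        for call in msg.get("tool_calls", []):
            name = call["function"]["name"]
            if name not in declared:
                problems.append(f"undeclared tool: {name}")
                continue
            args = json.loads(call["function"]["arguments"])
            for key in declared[name]["parameters"].get("required", []):
                if key not in args:
                    problems.append(f"{name}: missing argument {key}")
    return problems


jsonl_line = json.dumps(record)  # one line of a JSONL dataset file
print(check_tool_calls(json.loads(jsonl_line)))
```

A check like this is useful when reusing agent logs, since a call to a tool that was never declared in the same record is an easy mistake to miss.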

Powered by leading research

Our fine-tuning infrastructure is built on research and optimized for scale, efficiency, and production performance.

UPipe vs other SOTA Approaches
82.5% less memory
Long-context training hits a memory wall at the attention layer. UPipe processes attention heads in smaller chunks, cutting peak activation memory by up to 82.5% and enabling 5M-token context lengths on a single 8×H100 node.
[Chart: Throughput (TPS); series: UPipe, FPDT, ALST]
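
The idea of processing attention heads in chunks can be illustrated with a toy example. Each head's attention is independent of the others, so computing a few heads at a time gives the same result as computing them all at once, while only the current chunk's intermediates need to be live. This is a pure-Python sketch of that general principle, not the UPipe implementation; the function names and the chunk size are assumptions.

```python
import math


def attend_one_head(q, k, v):
    """Scaled dot-product attention for one head; q, k, v are lists of vectors."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(weights)
        out.append([
            sum(w * vj[d] for w, vj in zip(weights, v)) / total
            for d in range(len(v[0]))
        ])
    return out


def attend_all_heads(heads):
    """Baseline: compute every head in one pass."""
    return [attend_one_head(q, k, v) for q, k, v in heads]


def attend_in_chunks(heads, chunk_size):
    """Compute heads chunk_size at a time. In a real kernel, only the current
    chunk's score matrices would need to be held in memory."""
    outputs = []
    for start in range(0, len(heads), chunk_size):
        chunk = heads[start:start + chunk_size]
        outputs.extend(attend_one_head(q, k, v) for q, k, v in chunk)
    return outputs


# Four toy heads, each with 3 positions and head dimension 2.
heads = [
    (
        [[0.1 * (h + i + d) for d in range(2)] for i in range(3)],  # queries
        [[0.2 * (h - i + d) for d in range(2)] for i in range(3)],  # keys
        [[float(h * i + d) for d in range(2)] for i in range(3)],   # values
    )
    for h in range(4)
]
assert attend_in_chunks(heads, chunk_size=2) == attend_all_heads(heads)
```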
FFT Optimizer results
25% less memory
Fine-tuning large models is memory-hungry. Our FFT-based optimizer replaces expensive SVD projections with fast Fourier transforms, reducing optimizer memory by up to 25% with no loss in training quality.
[Chart: Context parallelism approaches on long-context training; series: Together AI (DCT), Baseline (LD)]
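
The page does not spell out the algorithm, so the following is only a generic sketch of the underlying idea: low-rank optimizers project gradients into a small subspace, and a fixed Fourier-type basis can stand in for one computed by SVD. Here a DCT-II basis is built directly as a matrix for readability, where a real implementation would use a fast transform. The function names, the choice of DCT-II, and the rank are assumptions.

```python
import math


def dct_basis(m: int, rank: int):
    """First `rank` rows of the orthonormal DCT-II matrix of size m."""
    rows = []
    for k in range(rank):
        norm = math.sqrt((1.0 if k == 0 else 2.0) / m)
        rows.append([norm * math.cos(math.pi * (2 * i + 1) * k / (2 * m))
                     for i in range(m)])
    return rows


def project(basis, grad):
    """Map an m x n gradient to rank x n coordinates (the part an optimizer would store)."""
    n = len(grad[0])
    return [[sum(row[i] * grad[i][j] for i in range(len(grad))) for j in range(n)]
            for row in basis]


def project_back(basis, low):
    """Map rank x n coordinates back to an m x n update."""
    m, n = len(basis[0]), len(low[0])
    return [[sum(basis[k][i] * low[k][j] for k in range(len(basis)))
             for j in range(n)] for i in range(m)]


m, n, rank = 8, 3, 2
basis = dct_basis(m, rank)
# Toy gradient whose columns are constant, so it lies entirely in the k = 0 direction.
grad = [[float(j + 1) for j in range(n)] for _ in range(m)]
low = project(basis, grad)            # rank x n numbers instead of m x n
restored = project_back(basis, low)
max_err = max(abs(restored[i][j] - grad[i][j]) for i in range(m) for j in range(n))
print(f"stored {rank * n} values instead of {m * n}; max error {max_err:.2e}")
```

Because the basis is fixed, nothing has to be decomposed during training; the trade-off is that a fixed basis is not tailored to each gradient the way an SVD basis is.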

Advanced model shaping capabilities

For teams pushing models beyond standard fine-tuning

Speculative decoding
Accelerate inference with custom speculative decoding by training lightweight draft models that predict multiple tokens ahead for the main model to verify.
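
A toy sketch of the greedy form of speculative decoding may help here: a draft model proposes several tokens, the target model checks them, and the output matches what the target alone would have produced. Both "models" below are hypothetical stand-in functions, and the toy draft calls the target only to simulate a draft that is usually right.

```python
VOCAB = 50


def target_next(tokens):
    """Toy stand-in for the large model's greedy next-token choice."""
    return (31 * sum(tokens) + len(tokens)) % VOCAB


def draft_next(tokens):
    """Toy stand-in for a small draft model that agrees with the target most of the time."""
    guess = target_next(tokens)
    return (guess + 1) % VOCAB if len(tokens) % 4 == 3 else guess


def greedy_decode(prompt, n_new):
    """Reference: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(target_next(tokens))
    return tokens


def speculative_decode(prompt, n_new, k=4):
    tokens = list(prompt)
    limit = len(prompt) + n_new
    while len(tokens) < limit:
        # 1. The draft model proposes k tokens autoregressively.
        context, proposal = list(tokens), []
        for _ in range(k):
            proposal.append(draft_next(context))
            context.append(proposal[-1])
        # 2. The target verifies the proposal (one batched pass in a real system).
        for guess in proposal:
            correct = target_next(tokens)
            tokens.append(correct)  # the target's token is always the one kept
            if guess != correct:
                break  # first mismatch: discard the rest of the proposal
        else:
            tokens.append(target_next(tokens))  # all accepted: one bonus token
    return tokens[:limit]


prompt = [1, 2, 3]
assert speculative_decode(prompt, 12) == greedy_decode(prompt, 12)
```

The speedup in practice comes from step 2: verifying k proposed tokens costs roughly one target forward pass, so every accepted token is one fewer sequential call to the large model.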
Quantization
Apply FP8 and NVFP4 quantization to push the limits of model efficiency, maximizing hardware utilization with minimal quality loss.
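
To show what 4-bit floating-point quantization does to values, here is a simulated ("fake") quantizer that snaps each block of numbers to the eight magnitudes an E2M1 4-bit float can represent, using one shared scale per block. The block size of 16 and the function names are assumptions, and real formats also store the block scales in reduced precision, which this sketch skips.

```python
import math

# Magnitudes representable by a 4-bit E2M1 float (sign handled separately).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def fake_quantize_block(block):
    """Quantize one block to the E2M1 grid with a shared scale, then dequantize."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return [0.0] * len(block)
    scale = amax / E2M1_GRID[-1]  # map the largest magnitude onto 6.0
    out = []
    for x in block:
        nearest = min(E2M1_GRID, key=lambda g: abs(abs(x) / scale - g))
        out.append(math.copysign(nearest * scale, x))
    return out


def fake_quantize(values, block_size=16):
    """Apply block-wise fake quantization across a flat list of values."""
    result = []
    for start in range(0, len(values), block_size):
        result.extend(fake_quantize_block(values[start:start + block_size]))
    return result


weights = [6.0, 3.0, -1.5, 0.0, 2.6]
print(fake_quantize(weights))
```

Per-block scaling keeps one outlier from crushing the resolution of every other value in the tensor, which is one reason block formats lose little quality.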
Reinforcement learning
Leverage PyTorch-based reinforcement learning to shape model policies for reasoning, tool use, and long-horizon agentic behavior.

Production-grade security and data privacy

We take security and compliance seriously, with strict data privacy controls to keep your information protected. Your data and models remain fully under your ownership, safeguarded by robust security measures.

SOC 2 Type II. HIPAA-aligned options available. Encryption in transit and at rest. Deploy storage in regions matching your data residency requirements—North America, Europe, or Asia/Middle East based on your compliance needs.

NVIDIA preferred partner
AICPA SOC 2 Type II

Customers running inference in production

XY.AI Labs: 2-3x cost savings, 13% better accuracy

"Together AI does for fine-tuning and inference what Vercel does for LLM-based apps—it removes the infrastructure layer so we can focus on our product. We fine‑tune and deploy customer‑specific models through simple API calls. That lets our existing team move from weekly to daily iteration, cut costs by 2–3×, and improve accuracy from 77% to 87%."

Lamara De Brouwer

Co-Founder & CTO, XY.AI Labs

"The technical challenge was running our multi-stage pipeline reliably at the conversation lengths our therapy models require," explains Daniel Cahn. "Together's platform eliminated the context length constraints and job failures we hit elsewhere, letting us experiment rapidly."

Daniel Cahn

Co-founder & CEO, Slingshot AI

"After thoroughly evaluating multiple LLM infrastructure providers, we’re thrilled to be partnering with Together AI for fine-tuning. The new ability to resume from a checkpoint combined with LoRA serving has enabled our customers to deeply tune our foundation model, ShieldLlama, for their enterprise’s precise risk posture. The level of accuracy would never be possible with vanilla open source or prompt engineering."

Alex Chung

Founder, Protege AI
