VOOZH about

URL: https://deepinfra.com/moonshotai/Kimi-K2.6

⇱ moonshotai/Kimi-K2.6 - Demo - DeepInfra


We use essential cookies to make our site work. With your consent, we may also use non-essential cookies to improve user experience and analyze website traffic…

DeepInfra raises $107M Series B to scale the inference cloud β€” read the announcement

Kimi-K2.6

$0.75

in

$3.50

out

$0.15

cached

/ 1M tokens

TierInputOutputCached input
Priority (1.5Γ—)Learn More
$1.125$5.25$0.225

per 1M tokens

Kimi K2.6 is an open-source, native multimodal agentic model that advances practical capabilities in long-horizon coding, coding-driven design, proactive autonomous execution, and swarm-based task orchestration.

Deploy Private Endpoint
Supports Priority Tier
Public
fp4
262,144
JSON
Function
Multimodal
0.00s

Settings

Model Information

πŸ“°  Tech Blog

1. Model Introduction

Kimi K2.6 is an open-source, native multimodal agentic model that advances practical capabilities in long-horizon coding, coding-driven design, proactive autonomous execution, and swarm-based task orchestration.

Key Features

  • Long-Horizon Coding: K2.6 achieves significant improvements on complex, end-to-end coding tasks, generalizing robustly across programming languages (Rust, Go, Python) and domains spanning front-end, DevOps, and performance optimization.
  • Coding-Driven Design: K2.6 is capable of transforming simple prompts and visual inputs into production-ready interfaces and lightweight full-stack workflows, generating structured layouts, interactive elements, and rich animations with deliberate aesthetic precision.
  • Elevated Agent Swarm: Scaling horizontally to 300 sub-agents executing 4,000 coordinated steps, K2.6 can dynamically decompose tasks into parallel, domain-specialized subtasks, delivering end-to-end outputs from documents to websites to spreadsheets in a single autonomous run.
  • Proactive & Open Orchestration: For autonomous tasks, K2.6 demonstrates strong performance in powering persistent, 24/7 background agents that proactively manage schedules, execute code, and orchestrate cross-platform operations without human oversight.

2. Model Summary

ArchitectureMixture-of-Experts (MoE)
Total Parameters1T
Activated Parameters32B
Number of Layers (Dense layer included)61
Number of Dense Layers1
Attention Hidden Dimension7168
MoE Hidden Dimension (per Expert)2048
Number of Attention Heads64
Number of Experts384
Selected Experts per Token8
Number of Shared Experts1
Vocabulary Size160K
Context Length256K
Attention MechanismMLA
Activation FunctionSwiGLU
Vision EncoderMoonViT
Parameters of Vision Encoder400M

3. Evaluation Results

BenchmarkKimi K2.6GPT-5.4
(xhigh)
Claude Opus 4.6
(max effort)
Gemini 3.1 Pro
(thinking high)
Kimi K2.5
Agentic
HLE-Full
(w/ tools)
54.052.153.051.450.2
BrowseComp83.282.783.785.974.9
BrowseComp
(Agent Swarm)
86.378.4
DeepSearchQA
(f1-score)
92.578.691.381.989.0
DeepSearchQA
(accuracy)
83.063.780.660.277.1
WideSearch
(item-f1)
80.8---72.7
Toolathlon50.054.647.248.827.8
MCPMark55.962.5*56.7*55.9*29.5
Claw Eval (pass^3)62.360.370.457.852.3
Claw Eval (pass@3)80.978.482.482.975.4
APEX-Agents27.933.333.032.011.5
OSWorld-Verified73.175.072.7-63.3
Coding
Terminal-Bench 2.0
(Terminus-2)
66.765.4*65.468.550.8
SWE-Bench Pro58.657.753.454.250.7
SWE-Bench Multilingual76.7-77.876.9*73.0
SWE-Bench Verified80.2-80.880.676.8
SciCode52.256.651.958.948.7
OJBench (python)60.6-60.370.754.7
LiveCodeBench (v6)89.6-88.891.785.0
Reasoning & Knowledge
HLE-Full34.739.840.044.430.1
AIME 202696.499.296.798.395.8
HMMT 2026 (Feb)92.797.796.294.787.1
IMO-AnswerBench86.091.475.391.0*81.8
GPQA-Diamond90.592.891.394.387.6
Vision
MMMU-Pro79.481.273.983.0*78.5
MMMU-Pro (w/ python)80.182.177.385.3*77.7
CharXiv (RQ)80.482.8*69.180.2*77.5
CharXiv (RQ) (w/ python)86.790.0*84.789.9*78.7
MathVision87.492.0*71.2*89.8*84.2
MathVision (w/ python)93.296.1*84.6*95.7*85.0
BabyVision39.849.714.851.636.5
BabyVision (w/ python)68.580.2*38.4*68.3*40.5
V* (w/ python)96.998.4*86.4*96.9*86.9

4. Native INT4 Quantization

Kimi-K2.6 adopts the same native int4 quantization method as Kimi-K2-Thinking.

πŸ‘ Built With Love in Palo Alto

Β© 2026 DeepInfra. All rights reserved.