![]() |
VOOZH | about |
On January 27, 2026, Chinese AI company Moonshot AI released Kimi K2.5 — and the tech world took notice.
Within days, independent evaluations confirmed what the company claimed: Kimi K2.5 performs on par with the best AI models from OpenAI, Google, and Anthropic on many key benchmarks. But unlike ChatGPT, Claude, or Gemini, Kimi K2.5 is open source. That means developers can download it, modify it, and build with it freely.
Kimi K2.5 has arrived! 🥝
— Kimi.ai (@Kimi_Moonshot) January 27, 2026
Here are 2 things to know: Aesthetic Coding x Agent Swarm. pic.twitter.com/cqfrlaTrZF
This matters because, until now, the most capable AI models have been locked behind expensive APIs controlled by a handful of American companies. Kimi K2.5 represents a shift — proof that open-source models can compete at the frontier of AI capability.
According to Artificial Analysis, an independent AI benchmarking firm, “Moonshot’s Kimi K2.5 is the new leading open weights model, now closer than ever to the frontier — with only OpenAI, Anthropic, and Google models ahead.”
Let’s break down what makes this model significant, how it compares to the competition, and what it means for you.
Moonshot AI (月之暗面, meaning “Dark Side of the Moon” in Chinese) is a Beijing-based artificial intelligence company founded in March 2023. The company was started by Yang Zhilin, a former researcher at Google and Meta AI, along with co-founders Zhou Xinyu and Wu Yuxin — all schoolmates from Tsinghua University, China’s top engineering school.
Despite being less than three years old, Moonshot has become one of China’s “AI Tigers” — a group of startups leading the country’s charge in artificial intelligence. The company has attracted backing from some of the biggest names in Chinese tech:
Funding History:
As of late January 2026, reports suggest Moonshot is already being valued at $4.8 billion in fresh funding discussions. The company now holds more than 10 billion yuan (approximately $1.4 billion) in cash reserves.
Unlike some competitors rushing to IPO, Moonshot’s founder Yang Zhilin stated in an internal letter that the company is “in no rush for an IPO in the short term,” preferring to focus on model development.
Here's a short video from our founder, Zhilin Yang.
— Kimi.ai (@Kimi_Moonshot) January 27, 2026
(It's his first time speaking on camera like this, and he really wanted to share Kimi K2.5 with you!) pic.twitter.com/2uDSOjCjly
For those unfamiliar with AI terminology, here’s a plain-language explanation of what Kimi K2.5 actually is.
Kimi K2.5 uses what’s called a Mixture-of-Experts (MoE) architecture. Think of it like a company with many specialized departments. When you ask a question, instead of routing it through every department, the system identifies which specialists are best suited to help and sends your request only to them.
The numbers:
This design makes Kimi K2.5 extremely efficient. It has the knowledge of a trillion-parameter model but only uses the computational power of a 32-billion-parameter model for any given task.
One of Kimi K2.5’s standout features is its native multimodal capability. This means the model can understand text, images, and videos — not as an afterthought, but as a core part of its design.
Many AI models add vision capabilities later, bolting image understanding onto a text-based system. Kimi K2.5 was trained from the ground up on approximately 15 trillion mixed visual and text tokens, making its understanding of images and videos more natural and integrated.
The vision system is powered by MoonViT, Moonshot’s proprietary vision encoder with 400 million parameters. This component translates visual information into a format the AI can understand and reason about.
Supported formats:
Kimi K2.5 supports a 256,000-token context window. In practical terms, this means the model can process and remember roughly 150,000-200,000 words of text in a single conversation — enough to analyze entire books, lengthy legal documents, or hours of conversation history.
The model offers multiple ways to interact with it:
Kimi K2.5 isn’t just another chatbot that can “see” images. Its visual understanding is deeply integrated with its reasoning capabilities.
Image Understanding:
Video Understanding:
According to VentureBeat, “This is the first time that the leading open weights model has supported image input, removing a critical barrier to the adoption of open weights models compared to proprietary models from the frontier labs.”
Previously, if you wanted an AI that could see and reason about images at a high level, your only options were expensive proprietary models from OpenAI, Google, or Anthropic. Kimi K2.5 changes that equation.
On the MMMU Pro visual reasoning benchmark, Kimi K2.5 scores 78.5% — slightly behind Google’s Gemini 3 Pro (81%) but competitive with GPT-5.2.
On VideoMMMU, which measures video understanding, Kimi K2.5 achieves 86.6%, slightly ahead of GPT-5.2 and just behind Gemini 3 Pro.
This is perhaps the most innovative feature of Kimi K2.5, and the hardest to explain in simple terms.
In AI, an “agent” is a system that can take actions autonomously — not just answer questions, but actually do things. An agent might browse the web, write files, run code, or interact with software tools.
Most AI assistants today are single agents. You give them a task, they work on it step by step, and return a result.
Kimi K2.5’s Agent Swarm takes a different approach. When given a complex task, instead of working through it sequentially, the model can:
Think of it like a project manager who can instantly hire and coordinate a team of specialists, rather than doing everything themselves.
Moonshot developed a training method called Parallel-Agent Reinforcement Learning (PARL). The model learns not just how to complete tasks, but how to effectively break them down and delegate.
The system creates specialized roles dynamically — “AI Researcher,” “Physics Researcher,” “Fact Checker” — based on what each task requires. These roles aren’t predefined; the model learns to create whatever specialists it needs.
Moonshot demonstrated the system with a task: identify the top three YouTube creators in 100 different niches.
A single agent would need to research each niche sequentially — a time-consuming process. Kimi K2.5’s Agent Swarm created 100 sub-agents that researched all niches simultaneously, compiled the results, and delivered a structured table.
According to Moonshot’s benchmarks:
Agent Swarm is currently in beta and available to paying users. Free users can experiment with it using provided credits.
Kimi K2.5 positions itself as a particularly strong coding assistant, with several unique capabilities.
Like other frontier models, Kimi K2.5 can generate code from plain-English descriptions. You describe what you want, and it writes the code.
Here’s where it gets interesting. Because Kimi K2.5 understands images and videos natively, you can:
Moonshot calls this “coding with vision” — the ability to communicate what you want through visual examples rather than precise technical specifications.
According to TechCrunch, this enables “a new class of vibe coding experiences” where “interfaces, layouts, and interactions that are difficult to describe precisely in language can be communicated through screenshots or screen recordings.”
Kimi K2.5 can also debug visually. It can:
This happens without human intervention — the model catches and fixes its own visual mistakes.
To leverage these capabilities, Moonshot released Kimi Code — a command-line coding assistant similar to Anthropic’s Claude Code or GitHub Copilot.
Features:
| Benchmark | Kimi K2.5 | Claude Opus 4.5 | GPT-5.2 | Gemini 3 Pro |
|---|---|---|---|---|
| SWE-Bench Verified | 76.8% | 80.9% | 80.0% | 74.2% |
| SWE-Bench Multilingual | 73.0% | 77.5% | 71.8% | 69.3% |
| LiveCodeBench v6 | 85.0% | 83.2% | 84.1% | 81.7% |
Kimi K2.5 doesn’t beat Claude or GPT on pure software engineering benchmarks, but it’s competitive — and significantly cheaper.
Let’s look at how Kimi K2.5 performs against the leading AI models across various categories.
| Benchmark | Kimi K2.5 | Claude Opus 4.5 | GPT-5.2 | Gemini 3 Pro |
|---|---|---|---|---|
| Humanity’s Last Exam (w/ tools) | 50.2% | 49.8% | 49.1% | 47.3% |
| GPQA Diamond | 87.6% | 89.1% | 92.4% | 88.3% |
| AIME 2025 (Math) | 96.1% | 97.8% | 100% | 95.2% |
On “Humanity’s Last Exam” — a challenging test designed by experts — Kimi K2.5 actually edges out both GPT-5.2 and Claude Opus 4.5. However, on pure mathematical reasoning (AIME) and general knowledge (GPQA), the American models maintain an edge.
This is where Kimi K2.5 shines.
| Benchmark | Kimi K2.5 | Claude Opus 4.5 | GPT-5.2 | Gemini 3 Pro |
|---|---|---|---|---|
| BrowseComp | 74.9% | 71.2% | 65.8% | 59.2% |
| DeepSearchQA | 77.1% | 76.1% | 74.3% | 72.8% |
| GDPval-AA (Artificial Analysis) | 1309 Elo | 1342 Elo | 1328 Elo | 1287 Elo |
On web browsing and search tasks, Kimi K2.5 outperforms all competitors. On Artificial Analysis’s agentic benchmark, it trails only Claude and GPT — impressive for an open-source model.
| Benchmark | Kimi K2.5 | Claude Opus 4.5 | GPT-5.2 | Gemini 3 Pro |
|---|---|---|---|---|
| MMMU Pro | 78.5% | 77.2% | 78.1% | 81.0% |
| VideoMMMU | 86.6% | 84.1% | 85.9% | 87.2% |
Kimi K2.5 is competitive on visual benchmarks, matching or slightly exceeding GPT-5.2 and Claude Opus 4.5, though Gemini 3 Pro maintains a slight lead.
One notable strength: Kimi K2.5 has a comparatively low hallucination rate of 64% — meaning when the model doesn’t know something, it’s more likely to admit uncertainty rather than make up an answer. This is reduced from Kimi K2 Thinking’s 74% hallucination rate.
This is where Kimi K2.5 becomes particularly attractive.
| Model | Input (Cache Miss) | Input (Cache Hit) | Output |
|---|---|---|---|
| Kimi K2.5 | $0.60/M tokens | $0.10/M tokens | $3.00/M tokens |
| Claude Opus 4.5 | $15.00/M tokens | $1.50/M tokens | $75.00/M tokens |
| GPT-5.2 | $10.00/M tokens | $1.00/M tokens | $30.00/M tokens |
| DeepSeek V3.2 | $0.27/M tokens | $0.07/M tokens | $1.10/M tokens |
For a typical request generating 5,000 output tokens:
| Model | Cost Per Request |
|---|---|
| DeepSeek V3.2 | $0.0095 |
| Kimi K2.5 | $0.0138 |
| GPT-5.2 | $0.0190 |
| Claude Opus 4.5 | $0.0210 |
Kimi K2.5 costs roughly 27-35% less than GPT-5.2 and Claude Opus 4.5 for similar tasks.
According to CNBC, Kimi K2.5’s training cost was approximately $4.6 million — slightly less than DeepSeek V3’s $5.6 million training cost.
For context, training costs for American frontier models are estimated in the hundreds of millions of dollars. The efficiency of Chinese labs in producing competitive models at a fraction of the cost has become a major talking point in the AI industry.
The easiest way to try Kimi K2.5 is through Moonshot’s consumer products:
Four modes are available:
Developers can access Kimi K2.5 through Moonshot’s API, which is fully compatible with OpenAI’s API format. If you’ve built applications using OpenAI’s API, switching to Kimi requires minimal code changes.
Base URL: https://api.moonshot.ai/v1
As an open-source model, Kimi K2.5 weights are available on:
The model is released in native INT4 precision, making it approximately 595GB — large, but manageable for enterprise deployments.
Don’t want to deal with API keys, code, or technical setup?
Fello AI gives you access to the world’s best AI models — all in one app. Chat with ChatGPT, Claude, Gemini, and more without switching between apps or managing multiple subscriptions. Kimi K2.5 is coming to Fello AI the first week in February! You’ll be able to:
Whether you’re curious about the hype or looking for a cheaper alternative to ChatGPT Plus, Fello AI makes it easy to explore.
No model is perfect. Here’s an honest assessment of where Kimi K2.5 falls short.
On competition-level math problems, Kimi K2.5 lags behind. On AIME 2025, it scores 96.1% compared to GPT-5.2’s perfect 100%. If you need an AI for math olympiad-level problems, GPT-5.2 remains the better choice.
While competitive, Kimi K2.5 doesn’t beat Claude Opus 4.5 on software engineering benchmarks (76.8% vs 80.9% on SWE-Bench Verified). Reviews note occasional “logic errors in generated code — syntactically correct but functionally broken.”
Like other multimodal models, Kimi K2.5 can miss exact design specifications. Testing revealed that “exact border radii, specific color values, or subtle spacing adjustments may require iterative refinement.”
Some users have reported unstable tool calls, particularly when not using the Thinking mode. The model performs best when chain-of-thought reasoning is enabled.
Video input currently works reliably only through Moonshot’s official API. Third-party deployments via Ollama, vLLM, or SGLang may not support video processing.
The Agent Swarm feature — arguably the model’s most innovative capability — remains in beta with limited free access.
As one technical review noted, the model excels as “a tireless and outstanding workhorse” but struggles with tasks requiring VP-level strategic thinking. “It’s like telling an intern to write a report with strategic height; they still can’t produce a VP-level report.”
Kimi K2.5 represents a milestone: the first open-source model to seriously compete with frontier closed models across multiple dimensions — reasoning, coding, vision, and agentic tasks.
Previously, open-source models like Llama trailed proprietary models by significant margins. Kimi K2.5 closes that gap dramatically.
Chinese AI labs — Moonshot, DeepSeek, Alibaba (Qwen), and others — are consistently producing models that match American capabilities at a fraction of the cost. This pattern has major implications:
Kimi K2.5’s Agent Swarm points to where AI is heading: systems that don’t just answer questions, but coordinate complex workflows across multiple parallel processes.
VentureBeat suggests this architecture “suggests a future where the primary constraint on an engineering team is no longer the number of hands on keyboards, but the ability of its leaders to choreograph a swarm.”
Kimi K2.5 is not perfect. It doesn’t beat Claude Opus 4.5 at coding. It doesn’t match GPT-5.2 at math. It’s not the cheapest option (that’s still DeepSeek).
But it offers something no other single model does: frontier-level performance across reasoning, vision, and agentic tasks in an open-source package at a reasonable price.
For many users and businesses, that combination is more valuable than being best-in-class at any single task.
The AI landscape continues to shift rapidly. Two years ago, Moonshot didn’t exist. Today, it’s producing models that compete with the best that Silicon Valley has to offer.
Stay ahead with expert AI insights trusted by top tech professionals!
Join thousands of AI fans & professionals benefiting from exclusive tips and insights from industry leaders.