Pricing: The Cost Argument
As of June 3, M3 is live on OpenRouter at $0.60 per million input tokens and $2.40 per million output tokens. A 50% promotional discount applied at launch reduced effective rates to approximately $0.30 input / $1.20 output per million tokens — though promotional pricing rarely persists.
| Model | Input ($/M tokens) | Output ($/M tokens) | Max Context |
|---|---|---|---|
| MiniMax M3 | $0.60 ($0.30 promo) | $2.40 ($1.20 promo) | 1M tokens |
| Gemini 3.5 Flash | $1.50 | $9.00 | 128K tokens |
| Claude Sonnet 4.6 | ~$3.00 | ~$15.00 | 200K tokens |
| GPT-5.5 | significantly higher | significantly higher | 256K tokens |
At $0.60 input, M3 undercuts Gemini 3.5 Flash by 60% on input tokens. For document-heavy workflows — contract review, codebase analysis, RAG with large retrieved contexts — where input tokens dominate cost, the economics are compelling if quality holds. The 1M context window amplifies the savings: instead of chunking and re-querying (which multiplies API calls), a single M3 call can process what would have required 5–10 calls at shorter-context pricing, eliminating the retrieval overhead entirely.
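To make the comparison concrete, the list prices from the table can be plugged into a small per-request calculator. This is a rough sketch, not a billing tool: `PRICES` and `request_cost` are illustrative names, the 100K-in / 2K-out workload is an assumed example, and promotional rates are ignored.

```python
# List prices in USD per million tokens, copied from the table above.
PRICES = {
    "MiniMax M3": {"input": 0.60, "output": 2.40},
    "Gemini 3.5 Flash": {"input": 1.50, "output": 9.00},
    "Claude Sonnet 4.6": {"input": 3.00, "output": 15.00},  # approximate in the table
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at list price."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical document-heavy request: 100K tokens in, 2K tokens out.
    for model in PRICES:
        print(f"{model}: ${request_cost(model, 100_000, 2_000):.4f}")
```

For that assumed request, M3 comes to about $0.065 against about $0.168 for Gemini 3.5 Flash. The input side accounts for most of the bill in both cases, which is why input pricing matters most for document-heavy work.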
Developer Guide: API Access Today
M3 is available immediately via OpenRouter. The endpoint follows the standard OpenAI chat completions spec, so migrating an existing OpenAI-client integration usually comes down to changing the base URL, the API key, and the model name:
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY"
)

response = client.chat.completions.create(
    model="minimax/minimax-m3",
    messages=[
        {
            "role": "system",
            "content": "You are an expert software engineer. Review code for bugs, performance issues, and design problems."
        },
        {
            "role": "user",
            "content": "Review this Python class for thread safety issues and memory leaks:\n\nclass DataProcessor:\n    def __init__(self):\n        self.cache = {}\n\n    def process(self, key, data):\n        if key in self.cache:\n            return self.cache[key]\n        result = expensive_operation(data)\n        self.cache[key] = result\n        return result"
        }
    ],
    max_tokens=2048
)

print(response.choices[0].message.content)
For direct access, MiniMax’s platform at api.minimax.chat exposes the full multimodal capability — text, image, and video inputs. The OpenRouter integration handles text only at launch. If your workflow requires analyzing image frames or video alongside code or documents, use the MiniMax API directly.
For the 1M context path, pass the full content in a single message. No chunking, no summarization layers, no retrieval pipeline needed:
with open("codebase_dump.txt", "r", encoding="utf-8") as f:
    full_codebase = f.read()

# Single call, full codebase in context (reuses `client` from the previous example)
response = client.chat.completions.create(
    model="minimax/minimax-m3",
    messages=[
        {
            "role": "user",
            "content": (
                "Here is the complete codebase:\n\n"
                + full_codebase
                + "\n\nTrace the data flow from POST /api/checkout "
                "through to the payment processing module. Identify "
                "race conditions or input validation gaps."
            )
        }
    ],
    max_tokens=4096
)
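Before sending a file that large, it is worth checking that it plausibly fits. The sketch below uses the common rule of thumb of about four characters per token, which is an assumption and not M3's actual tokenizer; `estimate_tokens` and `fits_in_context` are hypothetical helper names, and the 1M limit is taken from the pricing table.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (heuristic, not a real tokenizer)."""
    return int(len(text) / chars_per_token)


def fits_in_context(
    text: str,
    context_limit: int = 1_000_000,  # M3's advertised window
    reserved_output: int = 4096,     # leave room for max_tokens
) -> bool:
    """Return True if the prompt plus reserved output likely fits in the window."""
    return estimate_tokens(text) + reserved_output <= context_limit


if __name__ == "__main__":
    sample = "def handler(request):\n    return process(request)\n" * 1000
    print(estimate_tokens(sample), fits_in_context(sample))
```

Treat a result near the limit as a warning, not a guarantee: code and non-English text often tokenize less efficiently than the heuristic assumes.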
What 1M Context Actually Unlocks
Long context windows get announced constantly. The M3 version is more interesting than most because MSA makes serving 1M tokens economically viable for the provider, which means MiniMax can price it comparably to standard-context inference instead of adding a long-context surcharge.
Full-codebase review. 1M tokens is roughly 25,000–40,000 lines of code, assuming about 25–40 tokens per line; the exact figure depends on language verbosity and the tokenizer. A mid-size production application fits in a single call. Trace a bug across the full dependency graph, audit all authentication paths, or generate comprehensive documentation without chunking and the context fragmentation it introduces.
Complete contract analysis. A 500-page legal agreement is roughly 250,000 words. Send the whole document, ask M3 to identify all indemnification clauses, flag every defined term that appears inconsistently, or summarize obligations by party. Previous 200K-context models required chunking with retrieval layers that introduced relevance errors on cross-section references.
Agent session persistence. In multi-step agentic workflows, context accumulates with every tool call. At 1M tokens, an agent maintains 20–30x more interaction history than it would in a 32K–50K-token window (and roughly 4–8x more than in the 128K–256K windows in the table above) before needing to compress or summarize. That difference matters in tasks with long planning horizons: a 15-step research workflow that stays coherent versus one that forgets step 3 by step 8.
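The compress-or-summarize step that a larger window postpones can be sketched as a simple history trimmer. This is a minimal illustration under stated assumptions: `trim_history` and `rough_token_count` are hypothetical helpers, token counts use a four-characters-per-token heuristic, and a real agent would more likely summarize dropped turns than discard them.

```python
from typing import Callable

Message = dict[str, str]


def rough_token_count(message: Message) -> int:
    """Heuristic token count: about four characters per token, plus one."""
    return len(message["content"]) // 4 + 1


def trim_history(
    messages: list[Message],
    budget: int,
    count: Callable[[Message], int] = rough_token_count,
) -> list[Message]:
    """Keep system messages plus the most recent turns that fit in the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    remaining = budget - sum(count(m) for m in system)

    kept: list[Message] = []
    for message in reversed(rest):  # walk from newest to oldest
        cost = count(message)
        if cost > remaining:
            break  # everything older than this is dropped
        kept.append(message)
        remaining -= cost
    return system + kept[::-1]  # restore chronological order
```

With a 1M-token budget the loop rarely hits the `break`, which is the practical meaning of the longer horizon: early steps stay in the prompt verbatim.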
Multi-source video analysis. The native video input at MiniMax’s direct API allows simultaneous analysis of multiple video segments in a single call — useful for content moderation pipelines, multi-camera production review, or surveillance workflows where temporal context across clips matters.
Where M3 Is Not the Right Choice
At launch, M3 has specific gaps worth knowing before you build anything on it.
No independent benchmark verification yet. If your application requires provable accuracy thresholds — medical diagnosis support, legal compliance screening, financial risk scoring — don’t deploy on vendor numbers. Wait for community evaluation after the weights drop June 10–11.
Multimodal requires the direct API. Because OpenRouter’s integration is text-only at launch, image and video input has to go through the MiniMax API directly, which adds integration complexity if you already route through a provider. For text-only workloads this is a non-issue.
Short-context tasks see no architectural advantage. MSA is optimized for long-context efficiency. For tasks under 10K tokens, M3 performs like a standard frontier model — competitive, but without the 15x speed multiplier. Gemini 3.5 Flash or Claude Haiku 4.5 may deliver better value at very short contexts given their established optimization for that regime.
Enterprise SLAs not yet published. For teams needing contractual uptime guarantees, data processing agreements (DPAs), or dedicated infrastructure, MiniMax’s enterprise support tier details were not available at launch. The OpenRouter path provides availability SLAs through OpenRouter’s own infrastructure guarantees, not MiniMax’s directly.
Open Weights: Why June 10 Matters More Than the Launch
MiniMax committed to releasing model weights and a full technical report around June 10–11 on Hugging Face and GitHub. Three things happen when weights drop that don’t happen on API launch day.
The ML community benchmarks independently. Within 48 hours of a major model weights release, LMSys, EleutherAI, and independent researchers typically publish their own evaluations. This is when vendor claims get confirmed, corrected, or revised significantly. MiniMax M2 held up reasonably well under independent evaluation. M3’s claims are larger, in a more competitive environment, and the community appetite for scrutinizing SWE-Bench methodology is higher than ever.
Self-hosted deployment becomes available. For teams with data sovereignty requirements or on-premise constraints, open weights eliminate the API pricing conversation. A model that costs $0.60 per million input tokens via API costs only compute when self-hosted on your own hardware. For high-volume applications, such as processing thousands of documents per day, self-hosting frontier-class weights is typically 5–15x cheaper than API pricing at scale.
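Whether self-hosting wins depends on volume, so a break-even estimate is a useful first check. The sketch below is illustrative only: the GPU count and hourly rate are made-up placeholders (M3's hardware requirements are not known until the weights and report ship), the helper names are hypothetical, and it ignores output tokens, engineering time, and throughput limits.

```python
def self_host_cost_per_day(
    gpu_count: int, gpu_hourly_usd: float, hours: float = 24.0
) -> float:
    """Daily compute cost for a dedicated self-hosted deployment."""
    return gpu_count * gpu_hourly_usd * hours


def break_even_input_tokens(
    daily_self_host_usd: float, input_price_per_million: float = 0.60
) -> float:
    """Input tokens per day at which API spend equals the self-hosting cost."""
    return daily_self_host_usd / input_price_per_million * 1_000_000


if __name__ == "__main__":
    # Placeholder assumption: 8 GPUs at $2.50 per GPU-hour.
    daily = self_host_cost_per_day(gpu_count=8, gpu_hourly_usd=2.50)
    tokens = break_even_input_tokens(daily)
    print(f"${daily:.2f}/day, break-even at {tokens / 1e6:.0f}M input tokens/day")
```

Below the break-even volume the API is cheaper; above it, the gap grows with every additional token, provided the hardware can actually sustain that throughput.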
Fine-tuning becomes viable. A frontier-capable base model that can be adapted on private datasets is more valuable for specialized deployments than an API-only model at any price. Legal document analysis, domain-specific code review, proprietary knowledge integration — these workflows improve meaningfully with fine-tuning, and the base model capability determines the ceiling.
The Honest Take
MiniMax has delivered before. M2 was independently validated post-weights-release and performed close to announced numbers. M3 is a larger claim — surpassing GPT-5.5 and Gemini 3.1 Pro on SWE-Bench Pro is not a minor upgrade story — in an environment where benchmark methodology scrutiny is at an all-time high.
The VentureBeat headline framing — “eclipsing GPT-5.5 and Gemini 3.1 Pro on key benchmark performance for just 5–10% of the cost” — is the kind of claim that attracts attention and skepticism in roughly equal measure. Frontier models from OpenAI and Google have dedicated evaluation infrastructure and months of post-launch hardening. An open-weight model matching them on day one at a fraction of the cost would be a structural shift, not a typical product launch.
The answer arrives June 10–11. Until then: access the API via OpenRouter today, build your own evaluation suite against your task distribution, and make the deployment decision on your data rather than the vendor’s. If M3 is as capable as claimed, you’ll know from your results before the community verdict lands. If it isn’t, you’ll have saved yourself a premature architecture decision.
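A task-specific evaluation suite does not need to be elaborate to be useful. Here is one minimal shape it could take, assuming each case is a prompt paired with a checker function; `run_eval` and the `ask` callable are hypothetical names, and `ask` is where a wrapper around the `client.chat.completions.create` call shown earlier would be plugged in.

```python
from typing import Callable

# A case is (prompt, checker); the checker decides if an answer is acceptable.
Case = tuple[str, Callable[[str], bool]]


def run_eval(cases: list[Case], ask: Callable[[str], str]) -> float:
    """Return the fraction of cases whose checker accepts the model's answer."""
    if not cases:
        return 0.0
    passed = 0
    for prompt, check in cases:
        try:
            answer = ask(prompt)
        except Exception:
            continue  # an API error counts as a failed case
        if check(answer):
            passed += 1
    return passed / len(cases)


if __name__ == "__main__":
    # Stand-in model for a dry run; replace with a real API wrapper.
    def fake_ask(prompt: str) -> str:
        return "4" if "2 + 2" in prompt else "unsure"

    cases: list[Case] = [
        ("What is 2 + 2? Reply with the number only.", lambda a: a.strip() == "4"),
        ("Name the capital of France.", lambda a: "paris" in a.lower()),
    ]
    print(f"pass rate: {run_eval(cases, fake_ask):.0%}")
```

Running the same cases against M3 and your current model gives a like-for-like pass rate on your own task distribution, which is the comparison the vendor numbers cannot provide.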