Total Parameters
397B
Context Length
262K
Modality
Multimodal
Architecture
Mixture of Experts (MoE)
License
Apache 2.0
Release Date
24 Feb 2026
Knowledge Cutoff
-
Attention
Attention Structure
Grouped-Query Attention
Attention Heads
32
Key-Value Heads
2
Attention Head Dimension
256
Position Embedding
RoPE
RoPE Theta
10,000,000
Sliding Window Attention
No
Sliding Window Size
-
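Here is a rough sketch of what these attention settings imply, using only the Python standard library. There are 32 query heads and 2 key-value heads, so each KV head is shared by 16 query heads, and the cache stores K and V for just 2 heads per layer. The byte count assumes a bf16 cache (2 bytes per value) and covers one full-attention layer. This page does not say how many of the 60 layers use full attention in the hybrid design, so no whole-model total is given. The RoPE part applies the standard rotary formula with θ = 10,000,000 over the full 256-dimension head and pairs adjacent dimensions. Both of those choices are assumptions: some implementations rotate only a fraction of each head, or pair dimension i with i + d/2.

```python
import math

# Values from the Attention section of this spec.
NUM_Q_HEADS = 32
NUM_KV_HEADS = 2
HEAD_DIM = 256
ROPE_THETA = 10_000_000.0


def kv_head_for_query_head(q_head: int) -> int:
    """Grouped-query attention: map a query head to the KV head it shares."""
    group_size = NUM_Q_HEADS // NUM_KV_HEADS  # 16 query heads per KV head
    return q_head // group_size


def kv_cache_bytes_per_token(bytes_per_value: int = 2) -> int:
    """K plus V cache for one token in ONE full-attention layer.

    Assumption: bf16 cache, i.e. 2 bytes per value.
    """
    return 2 * NUM_KV_HEADS * HEAD_DIM * bytes_per_value


def rope_inv_freq(rotary_dim: int = HEAD_DIM, theta: float = ROPE_THETA) -> list[float]:
    """Standard RoPE inverse frequencies theta^(-2i/d), one per rotated pair."""
    return [theta ** (-2 * i / rotary_dim) for i in range(rotary_dim // 2)]


def apply_rope(x: list[float], position: int, theta: float = ROPE_THETA) -> list[float]:
    """Rotate each pair (x[2i], x[2i+1]) by the angle position * inv_freq[i].

    Assumption: the whole vector is rotated and adjacent dimensions are paired.
    """
    out = list(x)
    for i, f in enumerate(rope_inv_freq(len(x), theta)):
        c, s = math.cos(position * f), math.sin(position * f)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out


print(kv_cache_bytes_per_token())                    # 2048 bytes per token per attention layer
print(kv_cache_bytes_per_token() * 262_144 / 2**20)  # 512.0 MiB for a 262,144-token context
```

Reading 262K as 262,144 tokens, a full context needs 512 MiB of K/V cache per full-attention layer. Without GQA, all 32 heads would be cached, which is 16 times as much. A large θ lowers the slowest rotation frequencies, so positions stay distinguishable over longer spans. This is a common ingredient in long-context models.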
Normalization
RMS Normalization
Activation Function
SwiGLU
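The two components named above can be sketched as follows. This is a minimal pure-Python illustration of the standard RMSNorm and SwiGLU formulas, not Qwen's code. The epsilon value and the toy weight matrices are illustrative assumptions. In this model, the gate and up projections would map the 4,096-wide hidden state to a 1,024-wide intermediate, and the down projection would map it back.

```python
import math


def rms_norm(x: list[float], weight: list[float], eps: float = 1e-6) -> list[float]:
    """RMSNorm: divide by sqrt(mean(x^2) + eps), then scale by a learned per-channel weight.

    Assumption: eps = 1e-6 is illustrative; the spec does not give it.
    """
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def silu(z: float) -> float:
    """SiLU (swish): z * sigmoid(z), written to avoid overflow for large |z|."""
    if z >= 0:
        return z / (1.0 + math.exp(-z))
    e = math.exp(z)
    return z * e / (1.0 + e)


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    """Multiply a row-major matrix by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def swiglu_ffn(x: list[float], w_gate: list[list[float]], w_up: list[list[float]],
               w_down: list[list[float]]) -> list[float]:
    """SwiGLU feed-forward: W_down (SiLU(W_gate x) * (W_up x)), elementwise product in the middle."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    return matvec(w_down, [g * u for g, u in zip(gate, up)])
```

RMSNorm differs from LayerNorm in that it skips the mean subtraction, so each token needs one fewer reduction.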
Dimensions
Hidden Dimension Size
4,096
Number of Layers
60
FFN Intermediate Size (Dense)
1,024
Multi-Token Prediction Heads
1
Tokenizer
Vocabulary Size
248,320
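The vocabulary size and the 4,096-wide hidden dimension fix the size of the token embedding table. The helper below is a simple arithmetic sketch. This page does not say whether the input embedding and the output head share weights (the hypothetical `tied` flag), so both cases are shown.

```python
VOCAB_SIZE = 248_320   # from the Tokenizer section
HIDDEN_SIZE = 4_096    # from the Dimensions section


def embedding_params(vocab: int = VOCAB_SIZE, hidden: int = HIDDEN_SIZE, tied: bool = False) -> int:
    """Parameters in the token embedding table plus the output head.

    Assumption: tied=True means the output head reuses the embedding matrix;
    the spec does not say which case applies.
    """
    table = vocab * hidden
    return table if tied else 2 * table


print(embedding_params(tied=True))   # 1017118720
print(embedding_params(tied=False))  # 2034237440
```

That comes to about 1.02B parameters for one 248,320 × 4,096 table, or about 2.03B if the output head is a separate matrix.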
Mixture of Experts
Active Parameters per Token
17.0B
Number of Experts
512
Active Experts
11
Shared Experts
-
FFN Intermediate Size (per Expert)
1,024
Dense Layers Before MoE
-
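A sparse MoE layer sends each token to only a small subset of its experts. The sketch below shows generic top-k routing:

1. Score all experts.
2. Keep the k highest-scoring ones.
3. Renormalize their weights with a softmax over the selected logits.
4. Sum the chosen experts' outputs, weighted.

It illustrates the technique and is not Qwen3.5's router. This page does not say whether the 11 active experts include a shared expert, how the router logits are computed, or whether load-balancing terms apply.

```python
import math
from typing import Callable

# An expert maps a hidden vector to a hidden vector of the same size.
Expert = Callable[[list[float]], list[float]]


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits: list[float], k: int) -> list[tuple[int, float]]:
    """Select the k highest-scoring experts and renormalize their weights to sum to 1."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x: list[float], router_logits: list[float], experts: list[Expert], k: int) -> list[float]:
    """Weighted sum of the selected experts' outputs; unselected experts are never evaluated."""
    out = [0.0] * len(x)
    for idx, weight in route_top_k(router_logits, k):
        y = experts[idx](x)
        out = [o + weight * v for o, v in zip(out, y)]
    return out


# Toy usage (hypothetical): 512 experts that each scale the input by their index.
experts = [lambda v, s=s: [s * t for t in v] for s in range(512)]
logits = [math.sin(i) for i in range(512)]
print(len(route_top_k(logits, 11)))  # 11
```

A softmax over only the selected logits gives the same weights as a full softmax renormalized over the top k. The two common descriptions of top-k routing are therefore equivalent. With k = 11 out of 512 experts, about 2% of the experts run for each token.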
Qwen3.5-397B-A17B is Alibaba Cloud's largest and most capable multimodal foundation model, released in February 2026. It uses a Mixture-of-Experts architecture with 512 experts: of its 397B total parameters, 17B are activated for each token. It is reported to achieve state-of-the-art scores on MMLU-Pro (87.8%), GPQA Diamond (88.4%), SWE-bench Verified (80.0%), and Terminal-Bench 2.0 (54.0%). It has unified vision-language capabilities and a 262K-token context window that can be extended up to 1M tokens. It performs strongly on coding-agent, general-agent, multimodal-reasoning, and multilingual tasks across 201 languages.
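The gap between total and activated parameters matters for deployment. Every expert has to be held in memory, but only the activated subset does work for a given token. The back-of-envelope sketch below counts weight storage only (no KV cache, activations, or quantization scale overhead). For compute, it uses the common estimate of about 2 FLOPs per activated parameter per token in a forward pass and ignores the attention cost that grows with context length.

```python
TOTAL_PARAMS = 397e9   # all experts must be resident in memory
ACTIVE_PARAMS = 17e9   # parameters used to process one token


def weight_memory_gb(params: float, bits_per_param: float) -> float:
    """Weights-only storage in GB (1e9 bytes).

    Assumption: ignores KV cache, activations, and quantization scale/zero-point overhead.
    """
    return params * bits_per_param / 8 / 1e9


def forward_flops_per_token(active_params: float) -> float:
    """Rule of thumb: ~2 FLOPs (one multiply-add) per activated parameter per token.

    Ignores the attention cost that grows with context length.
    """
    return 2 * active_params


for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_memory_gb(TOTAL_PARAMS, bits):.1f} GB")
print(f"forward pass: {forward_flops_per_token(ACTIVE_PARAMS) / 1e9:.0f} GFLOPs per token")
```

Under these assumptions, the weights take about 794 GB in bf16, 397 GB at 8 bits, and 198.5 GB at 4 bits. A forward pass costs on the order of 34 GFLOPs per token, roughly what a dense model of about 17B parameters would need.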
Qwen3.5 is Alibaba Cloud's latest-generation foundation model family, released in February 2026. It combines a unified vision-language foundation for multimodal learning, an efficient hybrid architecture (Gated Delta Networks with sparse Mixture-of-Experts), reinforcement learning scaled across million-agent environments, and coverage of 201 languages. The weights are openly available under the Apache 2.0 license.
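In a hybrid stack, Gated Delta Network layers sit alongside attention layers. Instead of a growing KV cache, they keep a fixed-size recurrent state that is updated with the gated delta rule. Below is a minimal sequential sketch of that rule as published for Gated DeltaNet. Qwen3.5's exact parameterization is not given here, so the scalar gates `alpha` (decay) and `beta` (write strength) are taken as plain inputs. In practice they are computed from each token, and keys are usually L2-normalized. Production kernels also use a chunked parallel form rather than this per-token loop.

```python
def gated_delta_step(S: list[list[float]], q: list[float], k: list[float], v: list[float],
                     alpha: float, beta: float) -> tuple[list[list[float]], list[float]]:
    """One step of the gated delta rule.

    S is a d_v x d_k state matrix (list of rows). The update is
        S_t = alpha * S_{t-1} - beta * (alpha * S_{t-1} k - v) k^T,
    which equals alpha * S_{t-1} (I - beta k k^T) + beta v k^T.
    Returns the new state and the read-out o_t = S_t q.
    """
    # Prediction for key k from the decayed state: alpha * S k.
    pred = [alpha * sum(s * kj for s, kj in zip(row, k)) for row in S]
    # Error between that prediction and the value being written.
    err = [p - vi for p, vi in zip(pred, v)]
    # Decay the state, then apply the rank-1 delta correction along k.
    S_new = [
        [alpha * s - beta * e * kj for s, kj in zip(row, k)]
        for row, e in zip(S, err)
    ]
    out = [sum(s * qj for s, qj in zip(row, q)) for row in S_new]
    return S_new, out


# Toy usage with a unit-norm key: write a value, then overwrite it.
S = [[0.0, 0.0], [0.0, 0.0]]
key = [1.0, 0.0]
S, out = gated_delta_step(S, key, key, [3.0, 4.0], alpha=1.0, beta=1.0)
print(out)  # [3.0, 4.0]
S, out = gated_delta_step(S, key, key, [5.0, 6.0], alpha=1.0, beta=1.0)
print(out)  # [5.0, 6.0]
```

With `alpha = 1` and `beta = 1`, writing a new value under an existing unit-norm key replaces the old value instead of adding to it. This is the "delta" behavior. Setting `alpha < 1` makes the whole state decay. Because the state has a fixed size, memory does not grow with sequence length the way a KV cache does.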
| Category | Benchmark | Score | Rank |
|---|---|---|---|
| StackUnseen | ProLLM Stack Unseen | 0.763 | 14 |
| General Text | Text Arena | 1445 | 33 |
| Web Development | WebDev Arena | 1395 | 38 |
Overall Rank
#42
Coding Rank
#40
Total Score
B (66 / 100)
©2025 ApX Machine Learning