Total Parameters
35B
Context Length
262K
Modality
Multimodal
Architecture
Mixture of Experts (MoE)
License
Apache 2.0
Release Date
15 Apr 2026
Knowledge Cutoff
-
Attention
Attention Structure
Grouped-Query Attention
Attention Heads
16
Key-Value Heads
2
Attention Head Dimension
256
Position Embedding
RoPE
RoPE Theta
10,000,000
Sliding Window Attention
No
Sliding Window Size
-
Normalization
RMS Normalization
Activation Function
SwiGLU
Dimensions
Hidden Dimension Size
2,048
Number of Layers
40
FFN Intermediate Size (Dense)
512
Multi-Token Prediction Heads
1
Tokenizer
Vocabulary Size
248,320
Mixture of Experts
Active Parameters per Token
3.0B
Number of Experts
256
Active Experts
9
Shared Experts
1
FFN Intermediate Size (per Expert)
512
Dense Layers Before MoE
-
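The expert dimensions listed above allow a rough parameter count. The sketch below is only an estimate under stated assumptions: every one of the 40 layers carries an MoE feed-forward block, and each SwiGLU expert consists of three bias-free projection matrices (gate, up, down). Neither assumption is confirmed by the spec, and the helper name `swiglu_expert_params` is made up for illustration.

```python
# Back-of-the-envelope expert parameter count from the spec values above.
HIDDEN = 2048          # Hidden Dimension Size
EXPERT_FFN = 512       # FFN Intermediate Size (per Expert)
NUM_LAYERS = 40        # assumed: every layer has an MoE block
NUM_EXPERTS = 256
ACTIVE_EXPERTS = 9     # experts applied to each token


def swiglu_expert_params(hidden: int, intermediate: int) -> int:
    """Weights in one SwiGLU expert: gate, up and down projections (no biases assumed)."""
    return 3 * hidden * intermediate


per_expert = swiglu_expert_params(HIDDEN, EXPERT_FFN)
total_expert_params = per_expert * NUM_EXPERTS * NUM_LAYERS
active_expert_params = per_expert * ACTIVE_EXPERTS * NUM_LAYERS

print(f"per expert: {per_expert:,}")                                      # 3,145,728
print(f"all experts: {total_expert_params / 1e9:.1f}B")                   # 32.2B
print(f"active experts per token: {active_expert_params / 1e9:.2f}B")     # 1.13B
```

Under these assumptions the experts alone hold roughly 32B weights, which is in line with a 35B total once attention, embedding, and other non-expert weights are added. The experts applied to a single token account for only about 1.1B of the roughly 3B parameters active per token.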
Qwen3.6-35B-A3B is Alibaba's open-source hybrid MoE model with 35B total parameters, of which only about 3B are active per token. Its architecture combines Gated DeltaNet linear attention with standard Gated Attention and a sparse MoE feed-forward network (256 experts, with 8 routed and 1 shared expert active per token). It is aimed at agentic coding, where it is described as rivaling much larger dense models: it scores 73.4% on SWE-bench Verified, 51.5% on Terminal-Bench 2.0, and 92.6% on AIME 2026. The model is natively multimodal (text, image, video), supports a 262K-token context window natively (up to 1M with YaRN), includes thinking preservation for agentic tasks, and is trained with Multi-Token Prediction. It is available through the Alibaba Cloud Model Studio API as qwen3.6-flash and was released on April 15, 2026 under the Apache 2.0 license.
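To make the "8 routed + 1 shared" expert selection concrete, here is a minimal pure-Python sketch of top-k routing with an always-on shared expert. It is an illustration, not Qwen's implementation. The function names `route` and `moe_output`, the scalar toy experts, and the choice to softmax over only the selected logits are all assumptions, and the source does not say how router weights are normalized.

```python
import math
import random

NUM_EXPERTS = 256  # routed experts per MoE layer, from the spec
TOP_K = 8          # routed experts active per token


def route(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts.

    Weights are a softmax over the selected logits only (an assumed convention).
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:top_k]
    max_logit = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - max_logit) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_output(x, router_logits, routed_experts, shared_expert):
    """Add the weighted routed-expert outputs to the shared expert's output."""
    out = shared_expert(x)
    for idx, weight in route(router_logits):
        out += weight * routed_experts[idx](x)
    return out


# Toy demo with scalar "experts": expert i multiplies its input by i.
random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
experts = [lambda x, s=i: s * x for i in range(NUM_EXPERTS)]
selected = route(logits)
y = moe_output(1.0, logits, experts, shared_expert=lambda x: x)
print(len(selected), round(sum(w for _, w in selected), 6))
```

Only 9 of the 257 expert functions run for a given token, which is why the active parameter count stays far below the total.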
Qwen 3.6 is Alibaba's latest generation of hybrid sparse Mixture-of-Experts (MoE) models featuring a novel architecture that combines Gated DeltaNet linear attention layers with standard Gated Attention layers and MoE feed-forward networks. The family delivers substantial improvements in agentic coding, multimodal perception, and reasoning, with native support for thinking and non-thinking modes, thinking preservation across turns, and a 262K native context window.
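The grouped-query attention settings on this page (2 key-value heads, head dimension 256) determine how much key-value cache a full-attention layer needs at the 262K context length. The sketch below estimates this per layer. It assumes that "262K" means 262,144 tokens and that the cache uses a 16-bit dtype. The function name `kv_cache_bytes_per_layer` is made up for illustration.

```python
KV_HEADS = 2               # Key-Value Heads
HEAD_DIM = 256             # Attention Head Dimension
CONTEXT_TOKENS = 262_144   # assumes "262K" means 2**18 tokens
BYTES_PER_VALUE = 2        # assumes a 16-bit cache dtype such as bf16


def kv_cache_bytes_per_layer(tokens, kv_heads=KV_HEADS, head_dim=HEAD_DIM,
                             bytes_per_value=BYTES_PER_VALUE):
    """Bytes needed to cache keys plus values in one full-attention layer."""
    return 2 * kv_heads * head_dim * bytes_per_value * tokens


per_token = kv_cache_bytes_per_layer(1)
per_layer = kv_cache_bytes_per_layer(CONTEXT_TOKENS)
print(per_token)           # 2048 bytes per token per attention layer
print(per_layer / 2**30)   # 0.5 GiB per attention layer at full context
```

Because the 16 query heads share 2 key-value heads, this is one eighth of what an equivalent multi-head layer would need. The page does not state how many of the 40 layers use full attention rather than Gated DeltaNet, so the total cache is this per-layer figure multiplied by that unknown count.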
Rank
#43
| Category | Benchmark | Score | Rank |
|---|---|---|---|
| Reasoning | LiveBench Reasoning | 0.76 | 23 |
Overall Rank
#43
Coding Rank
-
Total Score
B+
70 / 100
©2025 ApX Machine Learning