URL: https://apxml.com/models/qwen36-35b-a3b

Qwen3.6 35B A3B: Specifications and GPU VRAM Requirements


Qwen3.6 35B A3B

Total Parameters

35B (3B active per token)

Context Length

262K

Modality

Multimodal

Architecture

Mixture of Experts (MoE)

License

Apache 2.0

Release Date

15 Apr 2026

Knowledge Cutoff

-

Technical Specifications

Attention

Attention Structure

Grouped-Query Attention

Attention Heads

16

Key-Value Heads

2

Attention Head Dimension

256

Position Embedding

RoPE

RoPE Theta

10,000,000

Sliding Window Attention

No

Sliding Window Size

-

Normalization

RMS Normalization

Activation Function

SwiGLU

Dimensions

Hidden Dimension Size

2,048

Number of Layers

40

FFN Intermediate Size (Dense)

512

Multi-Token Prediction Heads

1
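
The attention and dimension figures above are enough for a rough KV cache estimate. The sketch below assumes all 40 layers use standard grouped-query attention and that keys and values are stored in 16-bit precision. Both are assumptions: the model mixes in Gated DeltaNet linear attention layers, which typically keep a fixed-size state rather than a per-token cache, and this page does not say how many layers are of each type. Treat the result as an upper bound. The function name and the `attention_layers` parameter are illustrative, not part of any published tool.

```python
# Values copied from the specification list above.
NUM_LAYERS = 40
NUM_KV_HEADS = 2
HEAD_DIM = 256


def kv_cache_bytes(tokens: int,
                   bytes_per_value: int = 2,
                   attention_layers: int = NUM_LAYERS) -> int:
    """Upper-bound KV cache size in bytes for one sequence.

    Assumes every counted layer stores one key and one value vector
    per KV head per token (standard grouped-query attention).
    """
    # Factor 2 covers the separate key and value tensors.
    per_token = 2 * attention_layers * NUM_KV_HEADS * HEAD_DIM * bytes_per_value
    return per_token * tokens


if __name__ == "__main__":
    for tokens in (1_024, 131_072, 262_144):
        gib = kv_cache_bytes(tokens) / 2**30
        print(f"{tokens:>7} tokens: {gib:.2f} GiB")
```

Under these assumptions the cache costs 81,920 bytes (80 KiB) per token, which comes to 20 GiB at the full 262,144-token context. Passing a smaller `attention_layers` value models the case where only some layers keep a per-token cache.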

Tokenizer

Vocabulary Size

248,320

Mixture of Experts

Active Parameters (per Token)

3.0B

Number of Experts

256

Active Experts

9 (8 routed + 1 shared)

Shared Experts

1

FFN Intermediate Size (per Expert)

512

Dense Layers Before MoE

-
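
As a sanity check, the expert figures above can be turned into a back-of-the-envelope parameter count. The sketch assumes each expert is a bias-free SwiGLU feed-forward block with three weight matrices (gate, up, down), that every one of the 40 layers carries a full set of 256 experts, and that a shared expert has the same size as a routed one. None of these details is confirmed on this page, so the totals are illustrative only.

```python
# Values copied from the specification list above.
HIDDEN = 2_048
EXPERT_FFN = 512
NUM_LAYERS = 40
NUM_EXPERTS = 256
ACTIVE_EXPERTS = 9  # experts used per token


def swiglu_params(hidden: int, intermediate: int) -> int:
    """Weights in one SwiGLU block: gate, up, and down projections.

    Assumption: no bias terms.
    """
    return 3 * hidden * intermediate


per_expert = swiglu_params(HIDDEN, EXPERT_FFN)
total_expert_params = per_expert * NUM_EXPERTS * NUM_LAYERS
active_expert_params = per_expert * ACTIVE_EXPERTS * NUM_LAYERS

if __name__ == "__main__":
    print(f"per expert:     {per_expert:,}")
    print(f"all experts:    {total_expert_params / 1e9:.2f}B")
    print(f"active experts: {active_expert_params / 1e9:.2f}B")
```

With these assumptions each expert holds 3,145,728 weights, all experts together hold roughly 32.2B, and the experts active for one token hold roughly 1.13B. The gap to the quoted 35B total and 3B active figures would then sit in the attention layers, embeddings, and other non-expert weights.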

Qwen3.6 35B A3B

Qwen3.6-35B-A3B is Alibaba's open-source hybrid MoE model with 35B total parameters, of which only 3B are active per token. Its architecture combines Gated DeltaNet linear attention with standard Gated Attention and a sparse MoE (256 experts; 8 routed plus 1 shared active per token), and it delivers agentic coding performance that rivals much larger dense models. It scores 73.4% on SWE-bench Verified, 51.5% on Terminal-Bench 2.0, and 92.6% on AIME 2026. The model is natively multimodal (text, image, video), supports a 262K-token context natively (up to 1M with YaRN), includes thinking preservation for agentic tasks, and is trained with Multi-Token Prediction. It is available through the Alibaba Cloud Model Studio API as qwen3.6-flash and was released on April 15, 2026 under the Apache 2.0 license.

About Qwen 3.6

Qwen 3.6 is Alibaba's latest generation of hybrid sparse Mixture-of-Experts (MoE) models featuring a novel architecture that combines Gated DeltaNet linear attention layers with standard Gated Attention layers and MoE feed-forward networks. The family delivers substantial improvements in agentic coding, multimodal perception, and reasoning, with native support for thinking and non-thinking modes, thinking preservation across turns, and a 262K native context window.


Evaluation Benchmarks

Rank

#43

Benchmark    Score    Rank

-    0.76    23

Rankings

Overall Rank

#43

Coding Rank

-

Model Integrity

Total Score

B+

70 / 100

GPU Requirements
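
A first-order estimate of the memory needed for the weights alone follows from the 35B total parameter count and the bits stored per parameter. The sketch below is plain arithmetic under two assumptions: every parameter is stored at the same bit width, and there is no overhead for quantization scales, activations, or the KV cache. Real quantization formats add some overhead, so these are lower bounds. The function name is illustrative.

```python
TOTAL_PARAMS = 35e9  # total parameter count quoted on this page


def weight_gb(bits_per_param: float, params: float = TOTAL_PARAMS) -> float:
    """Weight memory in GB (10^9 bytes) at a uniform bit width."""
    return params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    for label, bits in (("16-bit", 16), ("8-bit", 8), ("4-bit", 4)):
        print(f"{label:>6}: {weight_gb(bits):.1f} GB")
```

This gives about 70 GB at 16-bit, 35 GB at 8-bit, and 17.5 GB at 4-bit. Although only 3B parameters are active per token, all experts normally need to be resident in memory (unless some are offloaded), so the total count rather than the active count drives the weight footprint.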
