Qwen3.6 35B A3B

Active Parameters

3B (35B total)

Context Length

262K

Modality

Multimodal

Architecture

Mixture of Experts (MoE)

License

Apache 2.0

Release Date

15 Apr 2026

Knowledge Cutoff

Technical Specifications

Attention

Attention Structure

Grouped-Query Attention

Attention Heads

Key-Value Heads

Attention Head Dimension

256

Position Embedding

RoPE

RoPE Theta

10,000,000

Sliding Window Attention

Sliding Window Size

Normalization

RMS Normalization

Activation Function

SwiGLU
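To make the RoPE entries above concrete, here is a minimal sketch of how a head dimension of 256 and a theta of 10,000,000 translate into rotation frequencies. It assumes the whole head dimension is rotated and uses the textbook RoPE formula; the model's actual rotary fraction and feature-pairing convention are not stated on this page, and the function names are illustrative only.

```python
import math

HEAD_DIM = 256            # "Attention Head Dimension" from the spec sheet
ROPE_THETA = 10_000_000   # "RoPE Theta" from the spec sheet


def rope_inv_freq(head_dim=HEAD_DIM, theta=ROPE_THETA):
    """Inverse frequencies theta**(-2i/d) for i = 0 .. d/2 - 1.

    Assumption: every dimension of the head is rotated.
    """
    return [theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


def rotate_pair(x0, x1, position, inv_freq):
    """Rotate one feature pair by the angle position * inv_freq."""
    angle = position * inv_freq
    return (
        x0 * math.cos(angle) - x1 * math.sin(angle),
        x0 * math.sin(angle) + x1 * math.cos(angle),
    )


inv_freq = rope_inv_freq()
# Wavelength (in token positions) of the slowest-rotating pair.
longest_wavelength = 2 * math.pi / inv_freq[-1]
print(len(inv_freq), longest_wavelength)
```

A larger theta stretches the slowest wavelengths, which is one common reason long-context models use values far above the original 10,000.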

Dimensions

Hidden Dimension Size

2,048

Number of Layers

FFN Intermediate Size (Dense)

512

Multi-Token Prediction Heads

Tokenizer

Vocabulary Size

248,320
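As a rough check on how much of the parameter budget the vocabulary takes, the sketch below multiplies the listed vocabulary size by the listed hidden dimension. It assumes a single embedding matrix of shape vocabulary × hidden; whether the input and output embeddings are tied is not stated on this page.

```python
VOCAB_SIZE = 248_320     # "Vocabulary Size" from the spec sheet
HIDDEN_SIZE = 2_048      # "Hidden Dimension Size" from the spec sheet
TOTAL_PARAMS = 35e9      # total parameter count quoted in the description


def embedding_params(vocab_size, hidden_size):
    """Parameters in one vocab x hidden embedding matrix."""
    return vocab_size * hidden_size


params = embedding_params(VOCAB_SIZE, HIDDEN_SIZE)
share_of_total = params / TOTAL_PARAMS
print(f"{params:,} embedding parameters ({share_of_total:.2%} of total)")
```

Under these assumptions one embedding matrix holds about 0.5B parameters, a small slice of the 35B total.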

Mixture of Experts

Total Expert Parameters

3.0B

Number of Experts

256

Active Experts

8 routed + 1 shared

Shared Experts

1

FFN Intermediate Size (per Expert)

512

Dense Layers Before MoE

Qwen3.6 35B A3B

Qwen3.6-35B-A3B is Alibaba's open-source hybrid MoE model with 35B total parameters, of which only 3B are active per token. Its architecture combines Gated DeltaNet linear attention with standard Gated Attention and a sparse MoE (256 experts, with 8 routed and 1 shared expert active per token). It delivers agentic coding performance that rivals much larger dense models, scoring 73.4% on SWE-bench Verified, 51.5% on Terminal-Bench 2.0, and 92.6% on AIME 2026. The model is natively multimodal (text, image, video), supports a 262K-token context natively (up to 1M with YaRN), includes thinking preservation for agentic tasks, and is trained with Multi-Token Prediction. It is available through the Alibaba Cloud Model Studio API as qwen3.6-flash and was released on April 15, 2026 under Apache 2.0.
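The "8 routed + 1 shared" expert selection described above can be illustrated with a generic top-k router. This is a minimal sketch of standard softmax top-k routing, not the model's actual router: the renormalization step, the random logits, and the name `route_token` are assumptions made for illustration.

```python
import math
import random

NUM_EXPERTS = 256       # routed experts, from the spec sheet
ROUTED_PER_TOKEN = 8    # routed experts active per token
SHARED_EXPERTS = 1      # shared expert that processes every token


def route_token(router_logits, k=ROUTED_PER_TOKEN):
    """Return [(expert_index, weight)] for the k highest-scoring experts."""
    # Softmax over all expert logits (subtract the max for stability).
    peak = max(router_logits)
    exps = [math.exp(x - peak) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the k most probable experts.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Assumption: renormalize so the selected weights sum to 1.
    mass = sum(probs[i] for i in top)
    return [(i, probs[i] / mass) for i in top]


rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in router output
selected = route_token(logits)

# The shared expert bypasses the router, so 9 expert FFNs run per token.
experts_run = len(selected) + SHARED_EXPERTS
print(experts_run, "of", NUM_EXPERTS + SHARED_EXPERTS, "expert FFNs run for this token")
```

Because only 9 of 257 expert networks run per token, compute per token tracks the 3B active parameters rather than the 35B total.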

About Qwen 3.6

Qwen 3.6 is Alibaba's latest generation of hybrid sparse Mixture-of-Experts (MoE) models featuring a novel architecture that combines Gated DeltaNet linear attention layers with standard Gated Attention layers and MoE feed-forward networks. The family delivers substantial improvements in agentic coding, multimodal perception, and reasoning, with native support for thinking and non-thinking modes, thinking preservation across turns, and a 262K native context window.
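The relationship between the 262K native window and the 1M figure quoted for YaRN comes down to a context scaling factor. The sketch below works through that arithmetic. It assumes "262K" means 262,144 tokens (2^18) and "1M" means a round one million tokens; neither exact value is confirmed on this page, and `yarn_factor` is an illustrative helper, not a library call.

```python
import math

NATIVE_CONTEXT = 262_144    # assumption: "262K" read as 2**18 tokens
TARGET_CONTEXT = 1_000_000  # assumption: "1M" read as one million tokens


def yarn_factor(target_tokens, native_tokens):
    """Ratio by which the native window must be stretched."""
    return target_tokens / native_tokens


factor = yarn_factor(TARGET_CONTEXT, NATIVE_CONTEXT)
rounded_factor = math.ceil(factor)               # next whole-number factor
extended_window = NATIVE_CONTEXT * rounded_factor

print(f"exact factor {factor:.2f}, rounded up to {rounded_factor}")
print(f"window at rounded factor: {extended_window:,} tokens")
```

Under these assumptions a factor of 4 is enough to cover the 1M target.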

Evaluation Benchmarks

Rank

#43

Category	Benchmark	Score	Rank
Reasoning	LiveBench Reasoning	0.76	23

Rankings

Overall Rank

#43

Coding Rank

Model Integrity

Total Score

B+

70 / 100

GPU Requirements
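A first-order estimate of weight memory follows directly from the 35B total parameter count. The sketch below covers weights only, under assumed bit widths per parameter; it ignores the KV cache, activations, and runtime overhead, so real VRAM needs will be higher. The quantization labels and `weight_gb` are illustrative names, not taken from this page.

```python
TOTAL_PARAMS = 35e9  # total parameters; every expert must be resident, not just the 3B active

# Assumed storage width per parameter for a few common formats.
BITS_PER_PARAM = {"fp16": 16, "int8": 8, "int4": 4}


def weight_gb(num_params, bits_per_param):
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


estimates = {name: weight_gb(TOTAL_PARAMS, bits) for name, bits in BITS_PER_PARAM.items()}
for name, gb in estimates.items():
    print(f"{name}: about {gb:.1f} GB for weights alone")
```

Although only 3B parameters are active per token, the router can pick any expert, so the estimate uses the full 35B unless experts are offloaded.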

URL: https://apxml.com/models/qwen36-35b-a3b