Qwen3.5-397B-A17B

Total Parameters

397B

Active Parameters

17B

Context Length

262K

Modality

Multimodal

Architecture

Mixture of Experts (MoE)

License

Apache 2.0

Release Date

24 Feb 2026

Knowledge Cutoff

Technical Specifications

Attention

Attention Structure

Grouped-Query Attention

Attention Heads

Key-Value Heads

Attention Head Dimension

256

Position Embedding

RoPE

RoPE Theta

10,000,000
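
The head dimension (256) and RoPE theta (10,000,000) listed above fix the rotary frequency schedule. The following is a minimal sketch of the standard RoPE inverse-frequency formula. It assumes the rotation covers the full head dimension, which this page does not confirm; the model may rotate only part of each head or use a multimodal RoPE variant. The function name is illustrative.

```python
import math

HEAD_DIM = 256            # "Attention Head Dimension" from the spec sheet
ROPE_THETA = 10_000_000.0  # "RoPE Theta" from the spec sheet


def rope_inverse_frequencies(head_dim: int, theta: float) -> list[float]:
    """Standard RoPE schedule: one frequency per pair of dimensions.

    Assumption: the whole head dimension is rotated.
    """
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]


inv_freq = rope_inverse_frequencies(HEAD_DIM, ROPE_THETA)

# The slowest-rotating pair sets the longest positional wavelength (in tokens).
longest_wavelength = 2 * math.pi / inv_freq[-1]
print(f"{len(inv_freq)} frequency pairs, longest wavelength ~{longest_wavelength:.3e} tokens")
```

A larger theta stretches the longest wavelength, which is the usual motivation for raising it in long-context models.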

Sliding Window Attention

Sliding Window Size

Normalization

RMS Normalization

Activation Function

SwiGLU

Dimensions

Hidden Dimension Size

4,096

Number of Layers

FFN Intermediate Size (Dense)

1,024

Multi-Token Prediction Heads

Tokenizer

Vocabulary Size

248,320
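
As a rough size check, the vocabulary size and hidden dimension listed above give the token-embedding table size. This sketch assumes the embedding width equals the hidden dimension (4,096). The page does not say whether the output projection shares these weights, so only the input table is counted.

```python
VOCAB_SIZE = 248_320  # "Vocabulary Size" from the spec sheet
HIDDEN_SIZE = 4_096   # "Hidden Dimension Size" from the spec sheet

# One row of HIDDEN_SIZE weights per vocabulary entry (assumed layout).
embedding_params = VOCAB_SIZE * HIDDEN_SIZE
print(f"Embedding table: {embedding_params:,} parameters")
```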

Mixture of Experts

Activated Parameters

17.0B

Number of Experts

512

Active Experts

Shared Experts

FFN Intermediate Size (per Expert)

1,024
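
The per-expert figures above allow a back-of-the-envelope parameter count for the routed experts in one MoE layer. This is a sketch under stated assumptions: each expert is a SwiGLU feed-forward block with three bias-free weight matrices (gate, up, down), and router weights and shared experts are ignored. The layer count is not listed on this page, so no model-wide total is computed.

```python
HIDDEN_SIZE = 4_096               # "Hidden Dimension Size"
EXPERT_INTERMEDIATE_SIZE = 1_024  # "FFN Intermediate Size (per Expert)"
NUM_EXPERTS = 512                 # "Number of Experts"


def swiglu_ffn_params(hidden: int, intermediate: int) -> int:
    """Gate, up, and down projections, no biases (assumed)."""
    return 3 * hidden * intermediate


params_per_expert = swiglu_ffn_params(HIDDEN_SIZE, EXPERT_INTERMEDIATE_SIZE)
routed_params_per_layer = NUM_EXPERTS * params_per_expert

print(f"Per expert:        {params_per_expert:,}")
print(f"Per layer (all):   {routed_params_per_layer:,}")
```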

Dense Layers Before MoE

Qwen3.5-397B-A17B

Qwen3.5-397B-A17B is Alibaba Cloud's largest and most capable multimodal foundation model, released in February 2026. It has 397B total parameters, of which 17B are activated per token through a Mixture-of-Experts architecture with 512 experts. It achieves state-of-the-art scores on MMLU-Pro (87.8%), GPQA Diamond (88.4%), SWE-bench Verified (80.0%), and Terminal-Bench 2.0 (54.0%). It features unified vision-language capabilities and extended context up to 1M tokens, and it excels in coding agents, general agents, multimodal reasoning, and multilingual understanding across 201 languages.

About Qwen 3.5

Qwen 3.5 is Alibaba Cloud's latest-generation foundation model family, released in February 2026. It integrates advances in multimodal learning (a unified vision-language foundation), efficient hybrid architecture (Gated Delta Networks combined with sparse Mixture-of-Experts), scalable reinforcement learning across million-agent environments, and global linguistic coverage spanning 201 languages. The models are available under the Apache 2.0 license with open weights.

Evaluation Benchmarks

Rank

#42

Category	Benchmark	Score	Rank
StackUnseen	ProLLM Stack Unseen	0.763	14
General Text	Text Arena	1445	33
Web Development	WebDev Arena	1395	38

Rankings

Overall Rank

#42

Coding Rank

#40

Model Integrity

Total Score

66 / 100

GPU Requirements
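
A minimal sketch of a lower bound on the memory needed for the weights alone at a few common precisions. It assumes the nominal 397B total parameter count and that all experts are kept resident in GPU memory, even though only 17B parameters are active per token. KV cache, activations, and framework overhead are excluded, so real requirements will be higher. The function name and the precision labels are illustrative.

```python
TOTAL_PARAMS = 397_000_000_000  # nominal 397B total parameters


def weight_memory_gb(num_params: int, bits_per_param: float) -> float:
    """Weights-only footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gb(TOTAL_PARAMS, bits):.1f} GB")
```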


Resources

Official Documentation Download Weights


URL: https://apxml.com/models/qwen35-397b-a17b
