Qwen3 Coder 480B A35B

Active Parameters

480B

Context Length

262K

Modality

Text

Architecture

Mixture of Experts (MoE)

License

Apache 2.0

Release Date

22 Jul 2025

Knowledge Cutoff

Dec 2024

Technical Specifications

Attention

Attention Structure

Multi-Head Attention

Attention Heads

Key-Value Heads

Attention Head Dimension

128

Position Embedding

Absolute Position Embedding

RoPE Theta

10,000,000

Sliding Window Attention

Sliding Window Size

Normalization

RMS Normalization

Activation Function

SwigLU

Dimensions

Hidden Dimension Size

6,144

Number of Layers

FFN Intermediate Size (Dense)

2,560

Multi-Token Prediction Heads

Tokenizer

Vocabulary Size

151,936

Mixture of Experts

Total Expert Parameters

35.0B

Number of Experts

160

Active Experts

Shared Experts

FFN Intermediate Size (per Expert)

2,560

Dense Layers Before MoE

Architecture Diagram

Qwen3 Coder 480B A35B

Qwen3 Coder 480B A35B is Alibaba's advanced agentic artificial intelligence model, specifically engineered for high-performance software development and autonomous coding workflows. As a specialized variant of the Qwen 3 family, it is designed to manage complex multi-turn programming tasks, including comprehensive repository analysis, cross-file reasoning, and automated pull request generation. The model serves as the primary engine for autonomous software engineering, enabling deep integration with developer tools and terminal-based agents like Qwen Code.

Architecturally, the model utilizes a sparse Mixture-of-Experts (MoE) decoder-only transformer framework. It comprises a total of 480 billion parameters, while maintaining computational efficiency by activating only 35 billion parameters per inference query. This configuration employs 160 total experts, with 8 active experts selected via a gating mechanism for each token. The underlying structure features 62 transformer layers and incorporates Grouped Query Attention (GQA) with 96 query heads and 8 key-value heads to optimize memory bandwidth and inference speed. It utilizes Rotary Position Embeddings (RoPE) and is optimized for long-horizon context through techniques such as YaRN, supporting a native context window of 262,144 tokens that can be extended up to one million.

The model is trained on a massive dataset of 7.5 trillion tokens, with a 70% concentration on source code and technical content across multiple programming languages including Python, JavaScript, C++, and Rust. Its post-training phase leverages long-horizon reinforcement learning, specifically Agent RL and Code RL, to improve multi-step planning and interaction with external tools such as browsers and CLI environments. This specialization allows the model to function as a sophisticated coding agent capable of executing complex engineering tasks and managing entire codebases with high precision.

About Qwen 3

The Alibaba Qwen 3 model family comprises dense and Mixture-of-Experts (MoE) architectures, with parameter counts from 0.6B to 235B. Key innovations include a hybrid reasoning system, offering 'thinking' and 'non-thinking' modes for adaptive processing, and support for extensive context windows, enhancing efficiency and scalability.

Other Qwen 3 Models

Evaluation Benchmarks

Rank

#91

Benchmark	Score	Rank
General Text Text Arena	1388	60
Web Development WebDev Arena	1282	83

Rankings

Overall Rank

#91

Coding Rank

#92

Model Integrity

Total Score

68 / 100

GPU Requirements

Full Calculator

Choose the quantization method for model weights

Context Size: 1,024 tokens

128k

256k

VRAM Required:

Recommended GPUs

Resources

Official Documentation Read the Paper Download Weights Source Code

About Contact Compute Efficiency Content Integrity Terms of Use Privacy Policy

URL: https://apxml.com/models/qwen3-coder-480b-a35b

⇱

Qwen3 Coder 480B A35B

Technical Specifications

Architecture Diagram

Qwen3 Coder 480B A35B

About Qwen 3

Other Qwen 3 Models

Evaluation Benchmarks

Rankings

Model Integrity

GPU Requirements

VRAM Required:

Recommended GPUs

Resources