Total Parameters
480B
Context Length
262K
Modality
Text
Architecture
Mixture of Experts (MoE)
License
Apache 2.0
Release Date
22 Jul 2025
Knowledge Cutoff
Dec 2024
Attention
Attention Structure
Grouped-Query Attention (GQA)
Attention Heads
96
Key-Value Heads
8
Attention Head Dimension
128
Position Embedding
Rotary Position Embedding (RoPE)
RoPE Theta
10,000,000
Sliding Window Attention
No
Sliding Window Size
-
Normalization
RMS Normalization
Activation Function
SwiGLU
Dimensions
Hidden Dimension Size
6,144
Number of Layers
62
FFN Intermediate Size (Dense)
2,560
Multi-Token Prediction Heads
-
Tokenizer
Vocabulary Size
151,936
Mixture of Experts
Active Parameters
35.0B
Number of Experts
160
Active Experts
8
Shared Experts
-
FFN Intermediate Size (per Expert)
2,560
Dense Layers Before MoE
-
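The spec-sheet values above can be cross-checked with a little arithmetic. The sketch below is a rough estimate, not an official count, and it rests on assumptions this page does not state: every one of the 62 layers is an MoE layer, each expert is a SwiGLU FFN with three weight matrices (gate, up, down) of width 2,560, the input embedding and output head are untied, and biases and normalization weights are small enough to ignore. Under those assumptions it lands at roughly 480B total and 35B active parameters, and it also sizes the BF16 key-value cache implied by 8 KV heads of dimension 128.

```python
# Rough sizing for Qwen3 Coder 480B A35B from the spec-sheet values.
# Assumptions (not stated on this page): all 62 layers are MoE layers, each
# expert is a SwiGLU FFN with gate/up/down matrices, the input embedding and
# output head are untied, and biases and norm weights are ignored.

HIDDEN = 6_144
LAYERS = 62
Q_HEADS = 96
KV_HEADS = 8
HEAD_DIM = 128
VOCAB = 151_936
NUM_EXPERTS = 160
ACTIVE_EXPERTS = 8
EXPERT_FFN = 2_560


def attention_params() -> int:
    """Q and O projections span all 96 query heads; K and V only the 8 KV heads (GQA)."""
    q_and_o = 2 * HIDDEN * Q_HEADS * HEAD_DIM
    k_and_v = 2 * HIDDEN * KV_HEADS * HEAD_DIM
    return q_and_o + k_and_v


def expert_params() -> int:
    """One SwiGLU expert: gate and up (hidden -> ffn) plus down (ffn -> hidden)."""
    return 3 * HIDDEN * EXPERT_FFN


def model_params(experts_counted: int) -> int:
    """Parameter count when `experts_counted` experts per layer are included."""
    router = HIDDEN * NUM_EXPERTS  # one gating logit per expert
    per_layer = attention_params() + router + experts_counted * expert_params()
    embeddings = 2 * VOCAB * HIDDEN  # input embedding + output head (assumed untied)
    return LAYERS * per_layer + embeddings


def kv_cache_bytes(tokens: int, bytes_per_value: int = 2) -> int:
    """Keys and values for every layer and KV head; 2 bytes per value assumes BF16."""
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_value * tokens


if __name__ == "__main__":
    print(f"total params : {model_params(NUM_EXPERTS) / 1e9:.1f}B")    # 480.2B
    print(f"active params: {model_params(ACTIVE_EXPERTS) / 1e9:.1f}B")  # 35.5B
    print(f"KV cache, 262,144 tokens: {kv_cache_bytes(262_144) / 2**30:.0f} GiB")  # 62 GiB
```

With only 8 KV heads, a full 262,144-token cache comes to about 62 GiB in this estimate; caching all 96 heads, as plain multi-head attention would, needs 12 times as much.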
Qwen3 Coder 480B A35B is Alibaba's agentic coding model, built for software development and autonomous coding workflows. A specialized member of the Qwen 3 family, it targets complex multi-turn programming tasks such as repository-scale analysis, cross-file reasoning, and automated pull request generation. It is designed to be driven by developer tools and terminal-based coding agents such as Qwen Code.
Architecturally, the model is a sparse Mixture-of-Experts (MoE) decoder-only transformer. It has 480 billion parameters in total but activates only about 35 billion per token, so per-token compute scales with the active parameters even though all 480 billion must be held in memory. Each MoE layer contains 160 experts, and a learned gating network (router) selects 8 of them for every token. The network has 62 transformer layers and uses Grouped Query Attention (GQA), in which 96 query heads share 8 key-value heads; this shrinks the key-value cache and the memory bandwidth needed during decoding. Positions are encoded with Rotary Position Embeddings (RoPE), and the native context window of 262,144 tokens can be extended to about one million tokens with context-extension techniques such as YaRN.
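The gating step described above can be sketched as a softmax router followed by top-k selection. This is a minimal illustration assuming a common design (softmax over all 160 router logits, keep the 8 largest, renormalize their weights to sum to 1); the exact routing details and any load-balancing losses used in Qwen3 Coder are not specified on this page, and the names `route_token` and `moe_forward` are hypothetical. Only the selected experts run, which is why per-token compute tracks the active rather than the total parameter count.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

NUM_EXPERTS = 160  # experts per MoE layer (from the spec sheet)
TOP_K = 8          # experts activated per token (from the spec sheet)

Expert = Callable[[List[float]], List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits: Sequence[float], top_k: int = TOP_K) -> List[Tuple[int, float]]:
    """Select the top_k experts for one token.

    Returns (expert_index, gate_weight) pairs, highest weight first, with the
    weights renormalized over the selected experts so they sum to 1 (assumed).
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    kept = sum(probs[i] for i in top)
    return [(i, probs[i] / kept) for i in top]


def moe_forward(x: List[float], router_logits: Sequence[float],
                experts: Sequence[Expert], top_k: int = TOP_K) -> List[float]:
    """Gate-weighted sum of the selected experts' outputs; unselected experts never run."""
    out = [0.0] * len(x)
    for idx, weight in route_token(router_logits, top_k):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


if __name__ == "__main__":
    random.seed(0)
    # Stand-in router output; in a real model these logits come from a linear
    # projection of the token's hidden state.
    logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    # Hypothetical toy experts: expert k scales its input by (k + 1).
    experts = [lambda v, k=k: [(k + 1) * t for t in v] for k in range(NUM_EXPERTS)]
    print(route_token(logits))
    print(moe_forward([1.0, -0.5], logits, experts))
```

Renormalizing over the kept experts keeps the output on the same scale regardless of how much probability mass the router placed on the discarded 152 experts.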
The model was pretrained on 7.5 trillion tokens, 70% of which are source code and technical content spanning many programming languages, including Python, JavaScript, C++, and Rust. Post-training uses long-horizon reinforcement learning, namely Code RL and Agent RL, to strengthen multi-step planning and the use of external tools such as browsers and command-line environments. This specialization lets the model act as a coding agent that can carry out multi-step engineering tasks across large codebases.
The Alibaba Qwen 3 model family includes both dense and Mixture-of-Experts (MoE) models; the initial release ranged from 0.6B to 235B parameters. Its main additions are a hybrid reasoning system with switchable 'thinking' and 'non-thinking' modes and support for long context windows.
Overall Rank
#91
| Category | Benchmark | Score | Rank |
|---|---|---|---|
| General Text | Text Arena | 1388 | 60 |
| Web Development | WebDev Arena | 1282 | 83 |
Coding Rank
#92
Total Score
68 / 100 (B)
©2025 ApX Machine Learning