Qwen3 Next 80B A3B

Total Parameters

80B

Active Parameters

3B

Context Length

262K

Modality

Reasoning

Architecture

Mixture of Experts (MoE)

License

Apache-2.0

Release Date

1 Feb 2026

Knowledge Cutoff

Jun 2025

Technical Specifications

Attention

Attention Structure

Hybrid (Gated DeltaNet + Gated Attention)

Attention Heads

Key-Value Heads

Attention Head Dimension

256

Position Embedding

Rotary Position Embedding (RoPE)

RoPE Theta

10,000,000

Sliding Window Attention

Sliding Window Size

Normalization

RMS Normalization

Activation Function

SwiGLU

Dimensions

Hidden Dimension Size

2,048

Number of Layers

48

FFN Intermediate Size (Dense)

512

Multi-Token Prediction Heads

Tokenizer

Vocabulary Size

151,936

Mixture of Experts

Total Expert Parameters

79.0B

Number of Experts

512

Active Experts

Shared Experts

FFN Intermediate Size (per Expert)

512

Dense Layers Before MoE

Architecture Diagram

Qwen3 Next 80B A3B

Qwen3-Next-80B-A3B is a high-capacity sparse Mixture-of-Experts (MoE) foundation model developed by Alibaba's Qwen team. It belongs to the next-generation Qwen3-Next series, which is designed to address the computational demands of long-context sequence modeling and large-scale parameter efficiency. The model uses a hybrid attention mechanism that combines Gated DeltaNet, a linear-attention layer with a fixed-size recurrent state, with Gated Attention. This lets it maintain high performance across long token sequences while reducing the share of computation that grows quadratically with sequence length in standard Transformer self-attention.
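The recurrence behind Gated DeltaNet can be illustrated with a short pure-Python sketch of the gated delta rule for a single head, following the formulation in the Gated DeltaNet paper: the state S is decayed by a gate alpha, the value currently stored under key k is partially erased with strength beta, and the new value is written in. This is not Qwen's implementation; it omits query/key normalization, output gating, and the chunked parallel form used for training, and the function names are hypothetical. What it shows is that memory per head stays a fixed d_v x d_k matrix however many tokens are processed, unlike a softmax-attention KV cache.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def gated_delta_step(
    state: Matrix, q: Vector, k: Vector, v: Vector, alpha: float, beta: float
) -> Tuple[Matrix, Vector]:
    """One gated delta rule step for a single head (illustrative sketch).

    state is a d_v x d_k matrix S. The update is
        S_t = alpha * (S_{t-1} - beta * (S_{t-1} k) k^T) + beta * v k^T
    and the output is o_t = S_t q.
    """
    d_v, d_k = len(state), len(state[0])
    # Value the memory currently associates with key k, i.e. S k.
    recalled = [sum(state[i][j] * k[j] for j in range(d_k)) for i in range(d_v)]
    # Decay the old state, erase the recalled value, and write the new one.
    new_state = [
        [
            alpha * (state[i][j] - beta * recalled[i] * k[j]) + beta * v[i] * k[j]
            for j in range(d_k)
        ]
        for i in range(d_v)
    ]
    # Read out with the query.
    out = [sum(new_state[i][j] * q[j] for j in range(d_k)) for i in range(d_v)]
    return new_state, out


def run_sequence(tokens, d_k: int, d_v: int):
    """Process (q, k, v, alpha, beta) tuples; the state never grows with length."""
    state = [[0.0] * d_k for _ in range(d_v)]
    outputs = []
    for q, k, v, alpha, beta in tokens:
        state, o = gated_delta_step(state, q, k, v, alpha, beta)
        outputs.append(o)
    return state, outputs


if __name__ == "__main__":
    tokens = [
        ([1.0, 0.0], [1.0, 0.0], [2.0, 3.0], 1.0, 1.0),  # write [2, 3] under key e1
        ([1.0, 0.0], [1.0, 0.0], [5.0, 7.0], 1.0, 1.0),  # overwrite the same key
    ]
    state, outs = run_sequence(tokens, d_k=2, d_v=2)
    print(outs)  # [[2.0, 3.0], [5.0, 7.0]]
```

With beta = 1 the second write fully replaces the first value stored under the same key, which is the "delta" behavior; alpha below 1 would additionally fade older memories.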

The technical architecture employs a high-sparsity MoE layout consisting of 48 layers with a hidden dimension of 2048. While the model contains 80 billion total parameters, its gating mechanism activates only approximately 3 billion parameters per token during inference. This sparse activation strategy, combined with a total of 512 experts and a multi-token prediction (MTP) objective, facilitates improved throughput and reduced FLOPs per token. The model also incorporates stability-focused architectural refinements, such as zero-centered and weight-decayed layer normalization, to ensure robust convergence during both pre-training on 15 trillion tokens and subsequent reinforcement learning stages.
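Sparse activation comes from a per-token router that selects a few experts out of many. The following is a generic softmax top-k routing sketch over 512 experts, together with the active-parameter fraction implied by the figures above (about 3B of 80B). The value TOP_K = 10 and the renormalization of the selected weights are assumptions for illustration, not details taken from this page, and the helper names are hypothetical.

```python
import math
from typing import List, Tuple


def route_token(router_logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Pick the top-k experts for one token; return (expert_id, weight) pairs."""
    m = max(router_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Assumed design choice: renormalize so the selected weights sum to 1.
    z = sum(probs[i] for i in top)
    return [(i, probs[i] / z) for i in top]


def active_fraction(active_params: float, total_params: float) -> float:
    """Share of parameters used per token."""
    return active_params / total_params


if __name__ == "__main__":
    NUM_EXPERTS = 512  # from the spec table
    TOP_K = 10  # hypothetical value for illustration only
    logits = [math.sin(i) for i in range(NUM_EXPERTS)]  # deterministic toy logits
    chosen = route_token(logits, TOP_K)
    print([expert for expert, _ in chosen])
    print(f"{active_fraction(3e9, 80e9):.2%}")  # 3.75%
```

Only the selected experts' FFNs run for that token, which is why per-token FLOPs track the ~3B active parameters rather than the 80B total.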

Optimized for complex reasoning and agentic workflows, Qwen3-Next-80B-A3B is capable of processing a native context window of 262,144 tokens, which can be extended to over 1 million tokens using specialized scaling techniques like YaRN. Its primary use cases include multi-step logical analysis, mathematical proofs, and code synthesis. By separating the 'Thinking' variant, which outputs structured reasoning traces, from the standard 'Instruct' variant, the model provides specialized paths for either high-efficiency general-purpose interaction or intensive, transparent problem-solving tasks.
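One common way to request YaRN scaling is through the `rope_scaling` entry of a Hugging Face `transformers`-style `config.json`. This sketch assumes that convention (`rope_type`, `factor`, `original_max_position_embeddings`) and a factor of 4.0, chosen only because 4 × 262,144 = 1,048,576 tokens is just over 1 million; confirm the supported keys and recommended factor in the official model documentation before relying on them. The helper names are hypothetical.

```python
import json

NATIVE_CONTEXT = 262_144  # native window stated in the description


def add_yarn_scaling(config: dict, factor: float) -> dict:
    """Return a copy of a config dict with an (assumed) YaRN rope_scaling entry."""
    updated = dict(config)  # leave the caller's dict untouched
    updated["rope_scaling"] = {
        "rope_type": "yarn",
        "factor": factor,
        "original_max_position_embeddings": NATIVE_CONTEXT,
    }
    return updated


def extended_context(factor: float) -> int:
    """Approximate usable window after scaling: native length times the factor."""
    return int(NATIVE_CONTEXT * factor)


if __name__ == "__main__":
    cfg = add_yarn_scaling({"max_position_embeddings": NATIVE_CONTEXT}, factor=4.0)
    print(json.dumps(cfg["rope_scaling"]))
    print(extended_context(4.0))  # 1048576
```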

About Qwen 3

The Alibaba Qwen 3 model family comprises dense and Mixture-of-Experts (MoE) architectures, with parameter counts from 0.6B to 235B. Key innovations include a hybrid reasoning system, offering 'thinking' and 'non-thinking' modes for adaptive processing, and support for extensive context windows, enhancing efficiency and scalability.
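Applications that consume thinking-mode output usually need to separate the reasoning trace from the final answer. This sketch assumes the trace is delimited by Qwen3-style `<think>` and `</think>` tags and also tolerates output where the opening tag is missing (for example, if a chat template places it in the prompt); check the tokenizer's chat template for the exact format. `split_reasoning` is a hypothetical helper.

```python
from typing import Tuple

OPEN_TAG = "<think>"
CLOSE_TAG = "</think>"


def split_reasoning(text: str) -> Tuple[str, str]:
    """Split model output into (reasoning, answer).

    Assumes the reasoning ends at the first CLOSE_TAG; the opening tag is optional.
    If no closing tag is present, the whole text is treated as the answer.
    """
    end = text.find(CLOSE_TAG)
    if end == -1:
        return "", text.strip()
    reasoning = text[:end].strip()
    if reasoning.startswith(OPEN_TAG):
        reasoning = reasoning[len(OPEN_TAG):].strip()
    answer = text[end + len(CLOSE_TAG):].strip()
    return reasoning, answer


if __name__ == "__main__":
    print(split_reasoning("<think>2 + 2 = 4</think>\n\nThe answer is 4."))
    # ('2 + 2 = 4', 'The answer is 4.')
    print(split_reasoning("Checked both cases.</think>Yes."))
    # ('Checked both cases.', 'Yes.')
```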

Evaluation Benchmarks

Rank

#132

Category	Benchmark	Score	Rank
Mathematics	LiveBench Mathematics	0.74	31
Graduate-Level QA	GPQA	0.772	33
Web Development	WebDev Arena	1402	35
Data Analysis	LiveBench Data Analysis	0.50	36
Coding	LiveBench Coding	0.68	41
Reasoning	LiveBench Reasoning	0.58	42
General Text	Text Arena	1402	51
Agentic Coding	LiveBench Agentic	0.10	53
Professional Knowledge	MMLU Pro	0.83	56

Rankings

Overall Rank

#132

Coding Rank

#77

Model Integrity

Total Score

B+

72 / 100

GPU Requirements
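A first-order estimate of the VRAM needed for the weights is total parameters times bytes per parameter. An MoE model needs all 80B parameters resident (unless experts are offloaded), even though only about 3B are active per token. This sketch counts weights only; it ignores the KV cache of the attention layers, the Gated DeltaNet state, activations, quantization scale overhead, and framework overhead. The bytes-per-parameter table is an assumption covering common formats.

```python
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(total_params: float, dtype: str) -> float:
    """Weight memory in decimal gigabytes (1 GB = 1e9 bytes).

    Uses *total* parameters: every expert must be loaded, not just the active ones.
    """
    return total_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in ("bf16", "int8", "int4"):
        print(f"{dtype}: {weight_memory_gb(80e9, dtype):.0f} GB")
    # bf16: 160 GB
    # int8: 80 GB
    # int4: 40 GB
```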



URL: https://apxml.com/models/qwen3-next-80b-a3b
