Qwen3-4B

Parameters

4B

Context Length

32K (32,768 tokens)

Modality

Text

Architecture

Dense

License

Apache 2.0

Release Date

29 Apr 2025

Knowledge Cutoff

Mar 2025

Technical Specifications

Attention

Attention Structure

Grouped-Query Attention

Attention Heads

32

Key-Value Heads

8

Attention Head Dimension

128

Position Embedding

RoPE

RoPE Theta

1,000,000

Sliding Window Attention

Sliding Window Size

Normalization

RMS Normalization

Activation Function

Swish (SiLU), used in a SwiGLU feed-forward block

Dimensions

Hidden Dimension Size

2,560

Number of Layers

36

FFN Intermediate Size (Dense)

9,728

Multi-Token Prediction Heads

Tokenizer

Vocabulary Size

151,936
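The normalization and activation entries above can be illustrated with a minimal pure-Python sketch. It is not the model's implementation: the helper names (`rms_norm`, `swish`, `matvec`, `swiglu_ffn`), the `eps` value, and the toy weights are assumptions chosen for readability, and a real model would use tensor libraries and learned weights.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """Divide x by its root mean square, then scale by a learned weight.
    eps is an assumed small constant for numerical stability."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]

def swish(v):
    """Swish with beta = 1 (also called SiLU): v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))

def matvec(matrix, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]

def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward block: down(swish(gate(x)) * up(x))."""
    gate = [swish(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)

# Toy example: hidden size 2, intermediate size 3 (hypothetical weights).
x = rms_norm([3.0, 4.0], weight=[1.0, 1.0])
w_gate = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_up = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_down = [[1.0, 1.0, 1.0], [0.0, 0.0, 1.0]]
print(swiglu_ffn(x, w_gate, w_up, w_down))
```

In the real model the gate and up projections map the hidden dimension to the FFN intermediate size (9,728) and the down projection maps back.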

Qwen3-4B

Qwen3-4B is a 4-billion-parameter dense causal language model developed by Alibaba as part of the third generation of the Qwen series. Its main innovation is a unified architecture that supports two modes of operation and can switch between 'thinking' and 'non-thinking' states on demand. In thinking mode, the model works through multi-step, chain-of-thought style reasoning before answering, which suits complex mathematical problems and intricate code generation. Non-thinking mode skips that reasoning phase and answers directly, giving lower latency for general conversation and other tasks where speed matters more than depth of reasoning.
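When thinking mode is active, downstream code usually needs to separate the reasoning from the final answer. The sketch below assumes the model wraps its reasoning in `<think>` and `</think>` tags, as described in the Qwen3 model card; the function name `split_thinking` is hypothetical, and the mode itself is selected elsewhere (the model card describes an `enable_thinking` option on the chat template).

```python
def split_thinking(text: str) -> tuple[str, str]:
    """Split a raw completion into (reasoning, answer).

    Assumes reasoning, if present, ends with a closing </think> tag.
    Returns an empty reasoning string when no such tag is found,
    which is the expected shape of a non-thinking response.
    """
    open_tag, close_tag = "<think>", "</think>"
    head, sep, tail = text.partition(close_tag)
    if not sep:
        # No closing tag: treat the whole completion as the answer.
        return "", text.strip()
    # Drop the opening tag if the completion included it.
    reasoning = head.replace(open_tag, "", 1).strip()
    return reasoning, tail.strip()

reasoning, answer = split_thinking("<think>\n2 + 2 = 4\n</think>\n\nThe answer is 4.")
print(repr(reasoning))
print(repr(answer))
```

Partitioning on the closing tag rather than the opening one keeps the helper usable when a chat template has already emitted the opening tag as part of the prompt.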

The model is a 36-layer transformer with 4.0 billion total parameters. It uses Grouped-Query Attention (GQA) with 32 query heads and 8 key-value heads, so each key-value head is shared by four query heads and the key-value cache is a quarter of the size that 32 full key-value heads would require. Positions are encoded with Rotary Position Embeddings (RoPE). The model is natively trained on a 32,768-token context window, which can be extended to 131,072 tokens with YaRN scaling. Pre-training follows a three-stage pipeline over 36 trillion tokens in 119 languages, with an emphasis on high-quality STEM, coding, and multilingual data.
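The effect of the 8 key-value heads on inference memory can be estimated with simple arithmetic. The following is a rough sketch, assuming 2 bytes per cached value (FP16 or BF16) and ignoring framework overhead; `kv_cache_bytes` is a hypothetical helper, not part of any library.

```python
def kv_cache_bytes(tokens, layers=36, kv_heads=8, head_dim=128, bytes_per_value=2):
    """Approximate key-value cache size for one sequence.

    The leading 2 counts one key tensor plus one value tensor per layer.
    bytes_per_value=2 assumes FP16/BF16 cache entries.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens

GIB = 1024 ** 3
NATIVE_CONTEXT = 32_768
EXTENDED_CONTEXT = 131_072

gqa = kv_cache_bytes(NATIVE_CONTEXT)                 # 8 key-value heads (GQA)
mha = kv_cache_bytes(NATIVE_CONTEXT, kv_heads=32)    # hypothetical full multi-head cache
yarn_factor = EXTENDED_CONTEXT / NATIVE_CONTEXT      # context extension ratio

print(f"GQA cache at native context: {gqa / GIB:.2f} GiB")
print(f"Full multi-head cache:       {mha / GIB:.2f} GiB")
print(f"YaRN extension ratio:        {yarn_factor:.1f}")
```

Under these assumptions the cache grows linearly with context length, so extending from the native window to 131,072 tokens multiplies the per-sequence cache by the same ratio.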

Qwen3-4B targets deployments that need strong reasoning in a compact parameter footprint. Because thinking mode is built in, the same model can serve as a reasoning engine for complex instruction following and agentic workflows without a separate specialized model. SwiGLU activations and RMSNorm contribute to stable training, and the smaller variants, including the 4B model, tie the input and output embeddings so that a single vocabulary matrix is stored instead of two, which reduces memory usage. The model is suited to cross-lingual tasks, tool-based interactions, and structured output generation.
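The saving from tied embeddings can be checked by counting parameters from the listed dimensions. This is a back-of-the-envelope sketch with several assumptions that do not come from this page: a hidden size of 2,560 (taken from the published Qwen3-4B configuration), no bias terms in the linear layers, and per-head RMSNorm weights on queries and keys. The function name `dense_param_count` is hypothetical.

```python
def dense_param_count(hidden, layers, q_heads, kv_heads, head_dim, ffn, vocab, tied=True):
    """Estimate parameters of a bias-free GQA + SwiGLU transformer."""
    # Query, key, and value projections, plus the output projection.
    attn = hidden * head_dim * (q_heads + 2 * kv_heads) + q_heads * head_dim * hidden
    # Gate, up, and down projections of the SwiGLU feed-forward block.
    mlp = 3 * hidden * ffn
    # Two RMSNorm weights per layer, plus assumed query/key norms over head_dim.
    norms = 2 * hidden + 2 * head_dim
    embed = vocab * hidden
    # With tied embeddings the output head reuses the embedding matrix.
    lm_head = 0 if tied else vocab * hidden
    final_norm = hidden
    return layers * (attn + mlp + norms) + embed + lm_head + final_norm

dims = dict(hidden=2560, layers=36, q_heads=32, kv_heads=8,
            head_dim=128, ffn=9728, vocab=151_936)

tied = dense_param_count(**dims, tied=True)
untied = dense_param_count(**dims, tied=False)
print(f"tied:   {tied / 1e9:.2f}B parameters")
print(f"untied: {untied / 1e9:.2f}B parameters")
print(f"saved:  {(untied - tied) / 1e6:.0f}M parameters")
```

With these assumed dimensions the tied total lands close to the 4.0 billion figure quoted above, and tying removes one full vocabulary-by-hidden matrix.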

About Qwen 3

The Alibaba Qwen 3 model family comprises dense and Mixture-of-Experts (MoE) architectures, with parameter counts from 0.6B to 235B. Key innovations include a hybrid reasoning system, offering 'thinking' and 'non-thinking' modes for adaptive processing, and support for extensive context windows, enhancing efficiency and scalability.

Evaluation Benchmarks

Rank

#67

Benchmark	Score	Rank
General Knowledge MMLU	0.815	20

Rankings

Overall Rank

#67

Coding Rank

Model Integrity

Total Score

B+

76 / 100

GPU Requirements
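As a rough guide, the memory needed for the weights alone scales with the number of bits stored per parameter. The sketch below assumes 4.0 billion parameters and idealized bit widths; real quantization formats add scales and metadata, and the key-value cache and activations come on top, so actual VRAM use will be higher. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Memory for model weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

NUM_PARAMS = 4.0e9  # total parameters quoted for Qwen3-4B

# Idealized bit widths; real formats carry extra per-block overhead.
for name, bits in [("FP16/BF16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{name:>9}: {weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```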



URL: https://apxml.com/models/qwen3-4b
