Qwen3-0.6B

Parameters

600M

Context Length

32K (32,768 tokens)

Modality

Text

Architecture

Dense

License

Apache 2.0

Release Date

29 Apr 2025

Knowledge Cutoff

Technical Specifications

Attention

Attention Structure

Grouped-Query Attention

Attention Heads

Key-Value Heads

Attention Head Dimension

128

Position Embedding

RoPE

RoPE Theta

1,000,000

Sliding Window Attention

Sliding Window Size

Normalization

RMSNorm (pre-normalization)

Activation Function

SwiGLU (Swish-gated linear unit)

Dimensions

Hidden Dimension Size

1,024

Number of Layers

FFN Intermediate Size (Dense)

3,072

Multi-Token Prediction Heads

Tokenizer

Vocabulary Size

151,936
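
The dimensions listed above can be tied to the RMSNorm and SwiGLU components named in the model description with a small pure-Python sketch. This is an illustration under stated assumptions, not the model's actual implementation: the `eps` value, the bias-free weight matrices, and all function names are assumptions made here for clarity.

```python
import math

HIDDEN = 1024    # hidden dimension size from the specification list
FFN = 3072       # FFN intermediate size from the specification list
VOCAB = 151_936  # vocabulary size from the specification list


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide x by its root mean square (no mean subtraction), then scale.

    eps=1e-6 is an assumed value for numerical stability.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [w * v * scale for v, w in zip(x, weight)]


def silu(v):
    """Swish / SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def matvec(matrix, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: down(silu(gate(x)) * up(x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


def swiglu_weight_count(hidden=HIDDEN, ffn=FFN):
    """Weights in one SwiGLU block, assuming three bias-free matrices."""
    return 3 * hidden * ffn


def embedding_weight_count(vocab=VOCAB, hidden=HIDDEN):
    """Weights in the token embedding table."""
    return vocab * hidden
```

Under these assumptions, one feed-forward block holds 3 × 1,024 × 3,072 = 9,437,184 weights, and the embedding table holds 151,936 × 1,024 = 155,582,464 weights, which is a large share of the 600M total.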

Qwen3-0.6B

Qwen3-0.6B is a foundation large language model developed by Alibaba Cloud and one of the dense variants in the Qwen3 model family. It is built for natural language understanding and generation, and its compact parameter count targets deployments where compute is the primary constraint. Despite its size, it is intended for tasks such as logical reasoning, mathematical problem-solving, code synthesis, creative writing, and natural dialogue.

The Qwen3 series introduces a hybrid reasoning system that combines a 'thinking' mode for complex, multi-step reasoning with a 'non-thinking' mode for rapid, context-driven responses in a single model. The mode can be switched dynamically through user queries or chat templates, which lets latency be traded against answer quality according to task complexity. The Qwen3 dense models, including Qwen3-0.6B, build on refinements from earlier iterations: Grouped Query Attention (GQA), SwiGLU activation, Rotary Positional Embeddings (RoPE), and RMSNorm with pre-normalization.
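
When the model runs in thinking mode, downstream code usually needs to separate the reasoning trace from the final answer. The sketch below assumes the reasoning is wrapped in `<think>...</think>` tags, as in Qwen3's published chat template; this page does not specify the delimiters, so treat the tag names and the helper `split_thinking` as assumptions.

```python
import re

# Assumed delimiter format for the reasoning trace; not stated on this page.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(generated: str):
    """Return (reasoning, answer) from a generated string.

    If no think block is present (non-thinking mode), reasoning is empty
    and the whole string is treated as the answer.
    """
    match = THINK_BLOCK.search(generated)
    if match is None:
        return "", generated.strip()
    reasoning = match.group(1).strip()
    answer = generated[match.end():].strip()
    return reasoning, answer


if __name__ == "__main__":
    sample = "<think>\n2 + 2 = 4\n</think>\n\nThe answer is 4."
    reasoning, answer = split_thinking(sample)
    print(repr(reasoning))
    print(repr(answer))
```

Keeping the split in one helper means the same calling code can handle both modes: in non-thinking mode the reasoning part simply comes back empty.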

Qwen3-0.6B was trained on a corpus of approximately 36 trillion tokens covering 119 languages and dialects. This multilingual coverage supports international applications such as translation and cross-lingual information retrieval. Pretraining has three stages: a first stage for general language competence, a second stage focused on knowledge-intensive data (e.g., STEM, coding, reasoning), and a third stage that improves long-context comprehension by extending training sequence lengths up to 32,768 tokens. The model also supports agent use, integrating with external tools for automation and complex workflow orchestration.
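
The 32,768-token sequence length can be related to the RoPE settings in the specification list (head dimension 128, theta 1,000,000) with a little arithmetic. The sketch below uses the standard RoPE frequency formula, where pair `i` rotates at `theta ** (-2 * i / head_dim)` radians per position. It is a rough illustration only, not a description of how the context window was extended, and the function names are hypothetical.

```python
import math

HEAD_DIM = 128          # attention head dimension from the specification list
ROPE_THETA = 1_000_000  # RoPE theta from the specification list
CONTEXT = 32_768        # longest training sequence length stated above


def rope_inv_freq(head_dim=HEAD_DIM, theta=ROPE_THETA):
    """Rotation rate (radians per position) for each of the head_dim / 2 pairs."""
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def rope_wavelengths(head_dim=HEAD_DIM, theta=ROPE_THETA):
    """Positions needed for each pair to complete one full rotation."""
    return [2 * math.pi / f for f in rope_inv_freq(head_dim, theta)]


def rotate_pair(x0, x1, position, inv_freq):
    """Apply the RoPE rotation to one 2-D pair at a given position.

    Which two dimensions form a pair varies between implementations.
    """
    angle = position * inv_freq
    c, s = math.cos(angle), math.sin(angle)
    return x0 * c - x1 * s, x0 * s + x1 * c


if __name__ == "__main__":
    wavelengths = rope_wavelengths()
    longer = sum(1 for w in wavelengths if w > CONTEXT)
    print(f"{len(wavelengths)} pairs, {longer} with wavelength > {CONTEXT} positions")
    print(f"longest wavelength is about {wavelengths[-1]:,.0f} positions")
```

With these values the slowest-rotating pairs have wavelengths far longer than 32,768 positions, so they never complete a full rotation inside the context window.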

About Qwen 3

The Alibaba Qwen 3 model family comprises dense and Mixture-of-Experts (MoE) architectures, with parameter counts from 0.6B to 235B. Key innovations include a hybrid reasoning system with 'thinking' and 'non-thinking' modes for adaptive processing, and support for long context windows.

Evaluation Benchmarks

No evaluation benchmarks for Qwen3-0.6B available.

Model Integrity

Total Score

B+

73 / 100

URL: https://apxml.com/models/qwen3-0-6b
