Qwen3.5-0.8B-MTP-GGUF

Qwen3.5-0.8B, from Alibaba's Qwen team, is the smallest model in the Qwen3.5 family: a compact, dense, multimodal language model with 0.8B parameters. It has 24 layers, a hidden dimension of 1024, a 248K-token vocabulary covering 201 languages, multi-token prediction, and a 262K-token native context window (extensible to 1M+ tokens via YaRN), and it handles text and image understanding in one model. It is described as building on the family's hybrid Gated DeltaNet + sparse MoE architecture, although the 0.8B model itself is dense.

Designed under the "More Size, Less Waste" philosophy, it scores 10.5 on the Artificial Analysis Intelligence Index. That ranks #383 overall, which is strong for a sub-1B model. Its reported time-to-first-token is 0.00 s (ranked #12 globally). It needs about 1.6 GB of VRAM in BF16 or about 0.5 GB with 4-bit quantization, which makes it a fit for Raspberry Pi boards, mobile phones, and embedded IoT devices.

The model is Apache 2.0-licensed and supported by Ollama, vLLM, and llama.cpp. It is suited to lightweight OCR, document parsing, multilingual chatbots, visual QA, and basic coding tasks, and it serves as an accessible entry point for on-device multimodal AI with no cloud dependency.
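The BF16 and 4-bit memory figures quoted above follow from simple arithmetic on the parameter count. The sketch below is a rough estimate only: the function name `weight_memory_gb` is hypothetical, it uses decimal gigabytes, and it counts weights alone (no KV cache, activations, or runtime overhead).

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal GB: params * bits / 8 bytes."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: a nominal 0.8e9 parameters, as stated in the model description.
bf16_gb = weight_memory_gb(0.8e9, 16)  # 16 bits per weight
q4_gb = weight_memory_gb(0.8e9, 4)     # idealized 4 bits per weight

print(f"BF16: {bf16_gb:.2f} GB, 4-bit: {q4_gb:.2f} GB")
```

The idealized 4-bit figure is a lower bound. The 4-bit GGUF files in the Model Files table are somewhat larger (513 to 542 MB), which is consistent with quantized formats storing scales and keeping some tensors at higher precision.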

A Multi-Token Prediction (MTP) GGUF is a GGUF model file that carries the model's MTP heads alongside the main weights, so that speculative decoding can run from a single file and accelerate local inference. Traditional speculative decoding requires a separate, smaller "draft" model. An MTP GGUF instead includes additional output heads within the main model architecture that predict multiple future tokens in a single forward pass.
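The draft-and-verify idea behind speculative decoding can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not how llama.cpp or any other runtime implements MTP: `draft_next` stands in for the cheap MTP heads, `verify_next` stands in for the main model, acceptance is greedy (exact token match), and every name here is hypothetical. A real runtime would verify all drafted positions in one batched forward pass; the sketch checks them one by one for readability.

```python
from typing import Callable, List

Token = int
NextFn = Callable[[List[Token]], Token]


def speculative_step(context: List[Token], draft_next: NextFn,
                     verify_next: NextFn, num_draft: int = 3) -> List[Token]:
    """Run one draft-and-verify round and return the tokens accepted."""
    # 1) Draft: cheaply propose num_draft future tokens.
    drafted: List[Token] = []
    for _ in range(num_draft):
        drafted.append(draft_next(context + drafted))

    # 2) Verify: keep drafted tokens while they match the main model.
    accepted: List[Token] = []
    for tok in drafted:
        target = verify_next(context + accepted)
        if tok != target:
            accepted.append(target)  # first mismatch: take the main model's token
            return accepted
        accepted.append(tok)

    # Every draft matched, so the main model also contributes one extra token.
    accepted.append(verify_next(context + accepted))
    return accepted


def generate(prompt: List[Token], max_new_tokens: int, draft_next: NextFn,
             verify_next: NextFn, num_draft: int = 3) -> List[Token]:
    """Repeat speculative rounds until max_new_tokens have been produced."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        out.extend(speculative_step(out, draft_next, verify_next, num_draft))
    return out[: len(prompt) + max_new_tokens]


# Toy stand-ins (assumptions for illustration only).
def toy_main(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10  # "main model": count upward modulo 10


def toy_draft(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10  # wrong only after a 4


print(generate([0], 8, toy_draft, toy_main))
```

Because every drafted token is checked against the main model, the output is identical to plain greedy decoding with the main model alone. The speedup comes only from accepting several tokens per round when the drafts are correct.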

Model Files

File Name	Quant Type	File Size	File Link
Qwen3.5-0.8B.BF16.gguf	BF16	1.56 GB	Download
Qwen3.5-0.8B.F16.gguf	F16	1.56 GB	Download
Qwen3.5-0.8B.Q2_K.gguf	Q2_K	430 MB	Download
Qwen3.5-0.8B.Q3_K_L.gguf	Q3_K_L	502 MB	Download
Qwen3.5-0.8B.Q3_K_M.gguf	Q3_K_M	476 MB	Download
Qwen3.5-0.8B.Q3_K_S.gguf	Q3_K_S	444 MB	Download
Qwen3.5-0.8B.Q4_0.gguf	Q4_0	513 MB	Download
Qwen3.5-0.8B.Q4_K_M.gguf	Q4_K_M	542 MB	Download
Qwen3.5-0.8B.Q4_K_S.gguf	Q4_K_S	517 MB	Download
Qwen3.5-0.8B.Q5_0.gguf	Q5_0	578 MB	Download
Qwen3.5-0.8B.Q5_K_M.gguf	Q5_K_M	593 MB	Download
Qwen3.5-0.8B.Q5_K_S.gguf	Q5_K_S	578 MB	Download
Qwen3.5-0.8B.Q6_K.gguf	Q6_K	647 MB	Download
Qwen3.5-0.8B.Q8_0.gguf	Q8_0	834 MB	Download
Qwen3.5-0.8B.mmproj-bf16.gguf	mmproj-bf16	207 MB	Download
Qwen3.5-0.8B.mmproj-f16.gguf	mmproj-f16	207 MB	Download
Qwen3.5-0.8B.mmproj-q8_0.gguf	mmproj-q8_0	116 MB	Download
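When choosing among these files for a constrained device, one simple approach is to pick the largest quant that fits a size budget. The sketch below hard-codes the sizes from the table above; the helper name `largest_quant_within` is hypothetical, 1.56 GB is taken as 1560 MB (an assumption of decimal units), and the mmproj files are left out because they are separate add-on files rather than alternative quants of the main model. File size is only a proxy for runtime memory, which will be higher.

```python
from typing import Dict, Optional

# File sizes in MB, copied from the Model Files table (main model files only).
QUANT_SIZES_MB: Dict[str, int] = {
    "BF16": 1560, "F16": 1560,
    "Q2_K": 430,
    "Q3_K_L": 502, "Q3_K_M": 476, "Q3_K_S": 444,
    "Q4_0": 513, "Q4_K_M": 542, "Q4_K_S": 517,
    "Q5_0": 578, "Q5_K_M": 593, "Q5_K_S": 578,
    "Q6_K": 647,
    "Q8_0": 834,
}


def largest_quant_within(budget_mb: int,
                         sizes: Dict[str, int] = QUANT_SIZES_MB) -> Optional[str]:
    """Return the quant type with the largest file that fits budget_mb, or None."""
    fitting = {quant: size for quant, size in sizes.items() if size <= budget_mb}
    if not fitting:
        return None
    # On equal sizes, max() keeps the first entry in dictionary order.
    return max(fitting, key=fitting.get)


for budget in (400, 550, 600, 1000):
    print(budget, "MB ->", largest_quant_within(budget))
```

Larger files usually mean higher fidelity, but as the Quants Usage note says, size does not necessarily track quality, so treat the result as a starting point.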

Quants Usage

(The table above is ordered by quant name rather than by size, and size does not necessarily track quality. IQ-quants are often preferable over similar-sized non-IQ quants, though none are included here.)

Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):

[Figure: quant type comparison graph by ikawrakow (image.png)]

Downloads last month: 1,940

Format: GGUF

Model size: 0.8B params

Architecture: qwen35

Hardware compatibility: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit, 16-bit

Model tree for prithivMLmods/Qwen3.5-0.8B-MTP-GGUF

Base model: Qwen/Qwen3.5-0.8B-Base

Finetuned: Qwen/Qwen3.5-0.8B

Quantized (154): this model

Collection including prithivMLmods/Qwen3.5-0.8B-MTP-GGUF

Collection of Qwen 3.5/3.6 MoE | MTP Featuring GGUF (5 items)

URL: https://huggingface.co/prithivMLmods/Qwen3.5-0.8B-MTP-GGUF
