Qwen3.6-35B-A3B-MTP-GGUF

Qwen3.6-35B-A3B, from Alibaba's Qwen team, is an open-weight sparse Mixture-of-Experts (MoE) multimodal model with 35B total parameters, of which only about 3B are active per token. It combines Gated DeltaNet linear attention layers with standard gated attention layers for efficient inference at a fraction of the compute cost, and it supports a native context of 262K tokens (extensible to 1M via YaRN) across text, image, and video inputs. Released under Apache 2.0, it scores 73.4% on SWE-Bench Verified for agentic coding and is well suited to frontend workflows and repository-level reasoning. Its hybrid thinking modes add "thinking preservation", which retains reasoning context across multi-turn conversations during iterative development. The model supports native tool (function) calling, structured output, and vision input. GGUF quantizations run locally in roughly 21-24 GB of VRAM, with vLLM, Ollama, and LM Studio support. The model also processes about 38.6B tokens per week on OpenRouter, which makes it a low-cost option for production coding agents and multimodal workflows.

Multi-Token Prediction (MTP) GGUF files are GGUF model files that carry the model's multi-token prediction heads alongside the main weights, so speculative decoding can run from a single file and significantly accelerate local inference. Traditional speculative decoding requires a separate, smaller "draft" model. MTP GGUF files instead include additional output heads within the main model architecture that predict multiple future tokens in a single forward pass; the main model then verifies those predictions and keeps only the tokens it agrees with.
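The draft-then-verify idea behind speculative decoding can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `target_next` and `mtp_draft` are hypothetical stand-ins for the main model and its MTP heads, tokens are plain integers, and no real GGUF runtime is involved. The point it shows is that the output matches plain greedy decoding while needing fewer decoding steps.

```python
from typing import List, Tuple

VOCAB = 50  # toy vocabulary size (assumption)


def target_next(context: List[int]) -> int:
    """Hypothetical stand-in for the main model's greedy next-token choice."""
    return (sum(context) * 31 + len(context)) % VOCAB


def mtp_draft(context: List[int], k: int) -> List[int]:
    """Hypothetical stand-in for MTP heads: guess k future tokens at once.

    Some guesses are deliberately wrong so that rejection is exercised.
    """
    guesses: List[int] = []
    ctx = list(context)
    for _ in range(k):
        tok = target_next(ctx)
        if len(ctx) % 4 == 3:  # inject a wrong guess at some positions
            tok = (tok + 1) % VOCAB
        guesses.append(tok)
        ctx.append(tok)
    return guesses


def speculative_step(context: List[int], k: int) -> List[int]:
    """Verify drafted tokens; keep the agreeing prefix plus one main-model token."""
    accepted: List[int] = []
    ctx = list(context)
    for guess in mtp_draft(context, k):
        # A real engine would score all drafted positions in one batched pass.
        truth = target_next(ctx)
        if guess != truth:
            accepted.append(truth)  # replace the first wrong guess and stop
            return accepted
        accepted.append(guess)
        ctx.append(guess)
    accepted.append(target_next(ctx))  # all guesses accepted: one extra token
    return accepted


def generate_speculative(prompt: List[int], n_new: int, k: int = 3) -> Tuple[List[int], int]:
    """Generate n_new tokens; return the sequence and the number of steps used."""
    out = list(prompt)
    steps = 0
    while len(out) - len(prompt) < n_new:
        out.extend(speculative_step(out, k))
        steps += 1
    return out[: len(prompt) + n_new], steps


def generate_greedy(prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one main-model call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(target_next(out))
    return out


if __name__ == "__main__":
    tokens, steps = generate_speculative([1, 2, 3], 20)
    print(tokens == generate_greedy([1, 2, 3], 20), steps)
```

Each step always yields at least one token from the main model, so a wrong draft costs speed but never changes the output. How much real speedup MTP gives depends on how often the heads' guesses are accepted, which this toy does not model.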

Model Files

File Name	Quant Type	File Size	File Link
Qwen3.6-35B-A3B.BF16.gguf	BF16	71.1 GB	Download
Qwen3.6-35B-A3B.F16.gguf	F16	71.1 GB	Download
Qwen3.6-35B-A3B.Q2_K.gguf	Q2_K	13.2 GB	Download
Qwen3.6-35B-A3B.Q3_K_L.gguf	Q3_K_L	18.6 GB	Download
Qwen3.6-35B-A3B.Q3_K_M.gguf	Q3_K_M	17.2 GB	Download
Qwen3.6-35B-A3B.Q3_K_S.gguf	Q3_K_S	15.5 GB	Download
Qwen3.6-35B-A3B.Q4_0.gguf	Q4_0	20.2 GB	Download
Qwen3.6-35B-A3B.Q4_K_M.gguf	Q4_K_M	21.7 GB	Download
Qwen3.6-35B-A3B.Q4_K_S.gguf	Q4_K_S	20.4 GB	Download
Qwen3.6-35B-A3B.Q5_0.gguf	Q5_0	24.6 GB	Download
Qwen3.6-35B-A3B.Q5_K_M.gguf	Q5_K_M	25.3 GB	Download
Qwen3.6-35B-A3B.Q5_K_S.gguf	Q5_K_S	24.6 GB	Download
Qwen3.6-35B-A3B.Q6_K.gguf	Q6_K	29.2 GB	Download
Qwen3.6-35B-A3B.Q8_0.gguf	Q8_0	37.8 GB	Download
Qwen3.6-35B-A3B.mmproj-bf16.gguf	mmproj-bf16	903 MB	Download
Qwen3.6-35B-A3B.mmproj-f16.gguf	mmproj-f16	903 MB	Download
Qwen3.6-35B-A3B.mmproj-q8_0.gguf	mmproj-q8_0	614 MB	Download
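As a rough aid for choosing among the files above, the sketch below picks the largest listed quant that fits a memory budget and estimates bits per weight. The file sizes are copied from the table; the 2 GB overhead for context and runtime buffers, the 35B parameter count, and the reading of "GB" as 10^9 bytes are assumptions, and the helper names are hypothetical.

```python
from typing import Optional

# File sizes in GB, copied from the Model Files table.
QUANT_SIZES_GB = {
    "Q2_K": 13.2, "Q3_K_S": 15.5, "Q3_K_M": 17.2, "Q3_K_L": 18.6,
    "Q4_0": 20.2, "Q4_K_S": 20.4, "Q4_K_M": 21.7, "Q5_0": 24.6,
    "Q5_K_S": 24.6, "Q5_K_M": 25.3, "Q6_K": 29.2, "Q8_0": 37.8,
    "F16": 71.1, "BF16": 71.1,
}

TOTAL_PARAMS = 35e9  # assumption: total parameter count from the description


def largest_quant_that_fits(budget_gb: float, overhead_gb: float = 2.0) -> Optional[str]:
    """Return the largest listed quant whose file fits in the budget.

    overhead_gb is a placeholder allowance for KV cache and runtime buffers
    (assumption); real overhead depends on context length and backend.
    """
    usable = budget_gb - overhead_gb
    fitting = [(size, name) for name, size in QUANT_SIZES_GB.items() if size <= usable]
    if not fitting:
        return None
    return max(fitting)[1]  # largest size wins; ties broken by name


def bits_per_weight(quant: str) -> float:
    """Approximate average bits per parameter, treating 1 GB as 10**9 bytes."""
    return QUANT_SIZES_GB[quant] * 1e9 * 8 / TOTAL_PARAMS


if __name__ == "__main__":
    choice = largest_quant_that_fits(24.0)
    print(choice, round(bits_per_weight(choice), 2))
```

File size is only a lower bound on memory use, so the result is a starting point to test rather than a guarantee that a given quant will load. Vision input additionally needs one of the mmproj files.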

Quants Usage

(Sorted by size, not necessarily by quality. IQ-quants are often preferable to similarly sized non-IQ quants.)

Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):

[Image: image.png, quant type comparison graph by ikawrakow]

GGUF

Model size: 36B params

Architecture: qwen35moe

Hardware compatibility: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit, 16-bit

Model tree for prithivMLmods/Qwen3.6-35B-A3B-MTP-GGUF

Base model: Qwen/Qwen3.6-35B-A3B

Quantized (490): this model

Collections including prithivMLmods/Qwen3.6-35B-A3B-MTP-GGUF

Collection of Qwen 3.5/3.6 MTP Featuring GGUF • 5 items • Updated 2 days ago • 2

Collection of Qwen 3.5/3.6 MoE | MTP Featuring GGUF • 5 items • Updated 2 days ago • 4

URL: https://huggingface.co/prithivMLmods/Qwen3.6-35B-A3B-MTP-GGUF
