Qwen3.6-35B-A3B-MTP-GGUF
Qwen3.6-35B-A3B, from Alibaba's Qwen team, is an open-weight sparse Mixture-of-Experts (MoE) multimodal model with 35B total parameters, of which only 3B are active per token. Its sparse design, combined with Gated DeltaNet linear attention interleaved with standard gated attention layers, keeps inference compute at a fraction of what a dense model of the same size would need. It supports a native context of 262K tokens (extensible to 1M via YaRN) across text, image, and video inputs.
Released under Apache 2.0, it delivers flagship-level agentic coding performance (73.4% on SWE-Bench Verified), excels at frontend workflows and repository-level reasoning, and introduces hybrid thinking modes with "thinking preservation", which retains reasoning context across multi-turn conversations to streamline iterative development. The model supports native tool (function) calling, structured output, and vision input. With GGUF quantization it runs locally in roughly 21-24 GB of VRAM, and it is supported by vLLM, Ollama, and LM Studio. Usage on OpenRouter stands at 38.6B tokens per week, which makes it a practical, low-cost choice for production coding agents and multimodal workflows.
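To put "a fraction of the compute" in numbers, here is a back-of-the-envelope estimate. It assumes the common rule of thumb that a forward pass costs about 2 FLOPs per active parameter per token, and it ignores attention, routing, and other overheads, so treat it as an approximation rather than a measured figure:

```latex
\begin{aligned}
C_{\text{MoE}}   &\approx 2N_{\text{active}} = 2 \times 3\times10^{9}  = 6\times10^{9}\ \text{FLOPs/token} \\
C_{\text{dense}} &\approx 2N_{\text{total}}  = 2 \times 35\times10^{9} = 7\times10^{10}\ \text{FLOPs/token} \\
\frac{C_{\text{MoE}}}{C_{\text{dense}}} &= \frac{3}{35} \approx 0.086
\end{aligned}
```

Compute scales with the active parameters, but memory does not: different tokens are routed to different experts, so all 35B parameters still have to be loaded (in VRAM or system RAM). This is why the file sizes in the Model Files table track the total parameter count, not the 3B active.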
Multi-Token Prediction (MTP) GGUF files keep the model's MTP layers in the GGUF weights, so inference engines that support them can use speculative decoding built into the model itself to speed up local inference. Unlike traditional speculative decoding, which requires a separate, smaller "draft" model, MTP adds extra output heads inside the main model architecture that predict multiple future tokens in a single forward pass. The main model then verifies these draft tokens, and every accepted token saves a separate decoding step.
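The draft-then-verify loop behind this can be sketched in a few lines of Python. This is a minimal greedy sketch under stated assumptions: `draft` is a hypothetical stand-in for the MTP heads and `main` for the full model, and neither corresponds to a real llama.cpp or GGUF API. Real engines also score the whole draft in one batched forward pass instead of calling the model once per position.

```python
from typing import Callable, List, Tuple

Token = int
# Hypothetical stand-ins, not a real engine API:
#   main(prefix)     -> the main model's greedy next token for `prefix`
#   draft(prefix, k) -> k cheap guesses for the next k tokens (the role MTP heads play)
NextToken = Callable[[List[Token]], Token]
Draft = Callable[[List[Token], int], List[Token]]


def greedy_decode(main: NextToken, prompt: List[Token], n_new: int) -> List[Token]:
    """Reference: plain greedy decoding, one main-model step per token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(main(out))
    return out


def speculative_decode(main: NextToken, draft: Draft, prompt: List[Token],
                       n_new: int, k: int = 3) -> Tuple[List[Token], int]:
    """Greedy draft-then-verify decoding; returns (tokens, verification rounds).

    A real engine verifies all k drafted positions in one batched forward pass;
    this sketch calls `main` once per position for readability.
    """
    out = list(prompt)
    target = len(prompt) + n_new
    rounds = 0
    while len(out) < target:
        rounds += 1
        for tok in draft(out, k):
            if len(out) >= target:
                break
            expected = main(out)
            if tok != expected:
                out.append(expected)  # first mismatch: keep the main model's token, end round
                break
            out.append(tok)           # draft token verified and accepted
        else:
            if len(out) < target:
                out.append(main(out))  # all drafts accepted: the verify pass yields a bonus token
    return out, rounds


def toy_main(prefix: List[Token]) -> Token:
    """Toy deterministic 'model' over a 17-token vocabulary."""
    return (prefix[-1] * 3 + 1) % 17


def toy_draft(prefix: List[Token], k: int) -> List[Token]:
    """Imperfect drafter: same rule as toy_main, but wrong whenever the previous token is 4."""
    guesses, last = [], prefix[-1]
    for _ in range(k):
        nxt = (last * 3 + 1) % 17
        if last == 4:
            nxt = (nxt + 1) % 17
        guesses.append(nxt)
        last = nxt
    return guesses


tokens, rounds = speculative_decode(toy_main, toy_draft, [2], n_new=16, k=3)
assert tokens == greedy_decode(toy_main, [2], 16)
print(rounds)  # 5 verification rounds instead of 16 single-token steps
```

Because every kept token is one the main model would have chosen anyway, the output is identical to plain greedy decoding; the speed-up depends entirely on how often the draft is right.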
Model Files
| File Name | Quant Type | File Size |
|---|---|---|
| Qwen3.6-35B-A3B.BF16.gguf | BF16 | 71.1 GB |
| Qwen3.6-35B-A3B.F16.gguf | F16 | 71.1 GB |
| Qwen3.6-35B-A3B.Q2_K.gguf | Q2_K | 13.2 GB |
| Qwen3.6-35B-A3B.Q3_K_L.gguf | Q3_K_L | 18.6 GB |
| Qwen3.6-35B-A3B.Q3_K_M.gguf | Q3_K_M | 17.2 GB |
| Qwen3.6-35B-A3B.Q3_K_S.gguf | Q3_K_S | 15.5 GB |
| Qwen3.6-35B-A3B.Q4_0.gguf | Q4_0 | 20.2 GB |
| Qwen3.6-35B-A3B.Q4_K_M.gguf | Q4_K_M | 21.7 GB |
| Qwen3.6-35B-A3B.Q4_K_S.gguf | Q4_K_S | 20.4 GB |
| Qwen3.6-35B-A3B.Q5_0.gguf | Q5_0 | 24.6 GB |
| Qwen3.6-35B-A3B.Q5_K_M.gguf | Q5_K_M | 25.3 GB |
| Qwen3.6-35B-A3B.Q5_K_S.gguf | Q5_K_S | 24.6 GB |
| Qwen3.6-35B-A3B.Q6_K.gguf | Q6_K | 29.2 GB |
| Qwen3.6-35B-A3B.Q8_0.gguf | Q8_0 | 37.8 GB |
| Qwen3.6-35B-A3B.mmproj-bf16.gguf | mmproj-bf16 | 903 MB |
| Qwen3.6-35B-A3B.mmproj-f16.gguf | mmproj-f16 | 903 MB |
| Qwen3.6-35B-A3B.mmproj-q8_0.gguf | mmproj-q8_0 | 614 MB |
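The file sizes above can be turned into a quick selection aid. The Python sketch below computes the effective bits per weight each file implies and picks the largest quant that fits a given memory budget. The sizes are copied from the table (assumed to be decimal GB), while `pick_quant`, `bits_per_weight`, and the default 2 GB `headroom_gb` for KV cache and runtime buffers are hypothetical helpers and values, not part of any tool; actual memory use depends on context length and the inference engine.

```python
from typing import Dict, Optional

# File sizes in GB, copied from the Model Files table (assumed decimal GB, 1e9 bytes).
QUANT_SIZES_GB: Dict[str, float] = {
    "BF16": 71.1, "F16": 71.1, "Q8_0": 37.8, "Q6_K": 29.2,
    "Q5_K_M": 25.3, "Q5_K_S": 24.6, "Q5_0": 24.6,
    "Q4_K_M": 21.7, "Q4_K_S": 20.4, "Q4_0": 20.2,
    "Q3_K_L": 18.6, "Q3_K_M": 17.2, "Q3_K_S": 15.5, "Q2_K": 13.2,
}
MMPROJ_F16_GB = 0.903  # vision projector; only needed for image/video input
TOTAL_PARAMS = 35e9    # "35B total parameters" from the model description (rounded)


def bits_per_weight(quant: str) -> float:
    """Effective bits per weight implied by the file size (includes metadata overhead)."""
    return QUANT_SIZES_GB[quant] * 1e9 * 8 / TOTAL_PARAMS


def pick_quant(budget_gb: float, headroom_gb: float = 2.0,
               with_vision: bool = False) -> Optional[str]:
    """Return the largest quant whose file fits in budget_gb after reserving headroom.

    headroom_gb is a hypothetical allowance for the KV cache and runtime buffers.
    With with_vision=True, the mmproj-f16 file is assumed to share the same budget.
    """
    available = budget_gb - headroom_gb - (MMPROJ_F16_GB if with_vision else 0.0)
    fitting = {q: s for q, s in QUANT_SIZES_GB.items() if s <= available}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


print(pick_quant(24))                        # Q4_K_M
print(pick_quant(24, with_vision=True))      # Q4_K_S
print(round(bits_per_weight("Q4_K_M"), 2))   # 4.96
```

BF16 works out slightly above 16 bits per weight by this calculation, likely because 35B is a rounded figure and some tensors are stored at higher precision, so the bits-per-weight values are approximate.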
Quants Usage
(The table is ordered by quant name, not by size or quality. Where available, IQ quants are often preferable to similarly sized non-IQ quants.)
ikawrakow has published a handy graph comparing some lower-quality quant types (lower is better).
Model tree for prithivMLmods/Qwen3.6-35B-A3B-MTP-GGUF
- Base model: Qwen/Qwen3.6-35B-A3B