
URL: https://huggingface.co/prithivMLmods/Qwen3.6-35B-A3B-MTP-GGUF



Qwen3.6-35B-A3B-MTP-GGUF

Qwen3.6-35B-A3B, from Alibaba's Qwen team, is an open-weight sparse Mixture-of-Experts (MoE) multimodal model with 35B total parameters, of which only about 3B are active per token. It combines Gated DeltaNet linear attention with standard gated attention layers to reduce inference compute, and it supports a native context of 262K tokens (extensible to 1M via YaRN) across text, image, and video inputs. Released under Apache 2.0, it scores 73.4% on SWE-Bench Verified for agentic coding, performs well on frontend workflows and repository-level reasoning, and introduces hybrid thinking modes with "thinking preservation", which retains reasoning context across multi-turn conversations during iterative development. The model supports native tool (function) calling, structured output, and vision input. GGUF quantizations run locally in roughly 21-24 GB of VRAM, with support in vLLM, Ollama, and LM Studio, and the model processes about 38.6B tokens per week on OpenRouter, which makes it a low-cost option for production coding agents and multimodal workflows.

Multi-Token Prediction (MTP) GGUF files package the model's MTP output heads together with the main weights, so that speculative decoding can run without a separate, smaller "draft" model. The additional heads, which are part of the main model architecture, predict multiple future tokens in a single forward pass; the main model then verifies those guesses, which can significantly accelerate local inference.
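The draft-then-verify idea behind MTP-style speculative decoding can be illustrated with a toy example. This is only a minimal sketch under stated assumptions: `main_next` and `draft_next_k` are hypothetical stand-ins for the main model and its MTP heads, not any runtime's actual API, and the "model" is a deterministic arithmetic function rather than a neural network. The point it shows is that greedy output is unchanged while fewer verification rounds are needed than tokens produced.

```python
from typing import List, Tuple

Token = int
VOCAB = 50  # toy vocabulary size (assumption for illustration)


def main_next(context: List[Token]) -> Token:
    """Hypothetical stand-in for the main model's greedy next token."""
    return (sum(context) * 31 + len(context)) % VOCAB


def draft_next_k(context: List[Token], k: int) -> List[Token]:
    """Hypothetical stand-in for MTP heads: guess k future tokens cheaply.

    The guesses are deliberately wrong at some positions so that the
    verification step below has something to reject.
    """
    ctx = list(context)
    guesses: List[Token] = []
    for _ in range(k):
        tok = main_next(ctx)
        if len(ctx) % 4 == 3:  # inject an occasional draft error
            tok = (tok + 1) % VOCAB
        guesses.append(tok)
        ctx.append(tok)
    return guesses


def greedy_decode(prompt: List[Token], n: int) -> List[Token]:
    """Baseline: one main-model call per generated token."""
    out = list(prompt)
    for _ in range(n):
        out.append(main_next(out))
    return out[len(prompt):]


def speculative_decode(prompt: List[Token], n: int, k: int = 3) -> Tuple[List[Token], int]:
    """Draft k tokens, keep the longest verified prefix, then add one main-model token.

    Returns the generated tokens and the number of verification rounds.
    In a real runtime each round would be a single batched forward pass.
    """
    out = list(prompt)
    target_len = len(prompt) + n
    rounds = 0
    while len(out) < target_len:
        rounds += 1
        accepted: List[Token] = []
        for tok in draft_next_k(out, k):
            expected = main_next(out + accepted)
            if tok != expected:
                accepted.append(expected)  # replace the first wrong guess
                break
            accepted.append(tok)
        else:
            # Every guess was correct: the verifier also yields one bonus token.
            accepted.append(main_next(out + accepted))
        out.extend(accepted)
    return out[len(prompt):target_len], rounds


if __name__ == "__main__":
    tokens, rounds = speculative_decode([1, 2, 3], 20)
    print(tokens == greedy_decode([1, 2, 3], 20), rounds)
```

Because every accepted token is checked against the main model's own prediction, the result matches plain greedy decoding exactly; any speed-up depends on how often the draft guesses are right.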

Model Files

| File Name | Quant Type | File Size | File Link |
|---|---|---|---|
| Qwen3.6-35B-A3B.BF16.gguf | BF16 | 71.1 GB | Download |
| Qwen3.6-35B-A3B.F16.gguf | F16 | 71.1 GB | Download |
| Qwen3.6-35B-A3B.Q2_K.gguf | Q2_K | 13.2 GB | Download |
| Qwen3.6-35B-A3B.Q3_K_L.gguf | Q3_K_L | 18.6 GB | Download |
| Qwen3.6-35B-A3B.Q3_K_M.gguf | Q3_K_M | 17.2 GB | Download |
| Qwen3.6-35B-A3B.Q3_K_S.gguf | Q3_K_S | 15.5 GB | Download |
| Qwen3.6-35B-A3B.Q4_0.gguf | Q4_0 | 20.2 GB | Download |
| Qwen3.6-35B-A3B.Q4_K_M.gguf | Q4_K_M | 21.7 GB | Download |
| Qwen3.6-35B-A3B.Q4_K_S.gguf | Q4_K_S | 20.4 GB | Download |
| Qwen3.6-35B-A3B.Q5_0.gguf | Q5_0 | 24.6 GB | Download |
| Qwen3.6-35B-A3B.Q5_K_M.gguf | Q5_K_M | 25.3 GB | Download |
| Qwen3.6-35B-A3B.Q5_K_S.gguf | Q5_K_S | 24.6 GB | Download |
| Qwen3.6-35B-A3B.Q6_K.gguf | Q6_K | 29.2 GB | Download |
| Qwen3.6-35B-A3B.Q8_0.gguf | Q8_0 | 37.8 GB | Download |
| Qwen3.6-35B-A3B.mmproj-bf16.gguf | mmproj-bf16 | 903 MB | Download |
| Qwen3.6-35B-A3B.mmproj-f16.gguf | mmproj-f16 | 903 MB | Download |
| Qwen3.6-35B-A3B.mmproj-q8_0.gguf | mmproj-q8_0 | 614 MB | Download |
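The file sizes listed above can be turned into a rough bits-per-weight figure, which makes the quant types easier to compare. The sketch below assumes about 36e9 stored parameters (the description rounds the total to 35B), treats 1 GB as 10^9 bytes, and ignores metadata and any tensors kept at higher precision, so the results are approximations only.

```python
# Approximate bits per weight for a few of the listed files.
# Assumptions (not stated by the model card): 36e9 stored parameters,
# 1 GB = 1e9 bytes, file overhead ignored.
PARAM_COUNT = 36e9

FILE_SIZES_GB = {
    "Q2_K": 13.2,
    "Q4_K_M": 21.7,
    "Q6_K": 29.2,
    "Q8_0": 37.8,
    "BF16": 71.1,
}


def bits_per_weight(size_gb: float, params: float = PARAM_COUNT) -> float:
    """Convert a file size in GB into average bits stored per parameter."""
    return size_gb * 1e9 * 8 / params


if __name__ == "__main__":
    for quant, size_gb in FILE_SIZES_GB.items():
        print(f"{quant:8s} {size_gb:6.1f} GB  ~{bits_per_weight(size_gb):.2f} bits/weight")
```

Under these assumptions Q4_K_M works out to a little under 5 bits per weight and BF16 to just under 16, which is consistent with the nominal bit widths of those formats.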

Quants Usage

(Files are listed by name, not by size or quality. IQ-quants are often preferable to similar-sized non-IQ quants.)

Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):

[Figure: image.png]
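One way to choose among the listed files is to take the largest one that fits a memory budget. The helper below is a hypothetical sketch, not part of any tool: the sizes are copied from the file table, and the 3 GB default headroom for the KV cache, the vision projector, and runtime buffers is an assumption that should be tuned for the context length and hardware in use.

```python
from typing import Dict, Optional

# File sizes in GB, copied from the model file table (mmproj files excluded).
QUANT_SIZES_GB: Dict[str, float] = {
    "Q2_K": 13.2,
    "Q3_K_S": 15.5,
    "Q3_K_M": 17.2,
    "Q3_K_L": 18.6,
    "Q4_0": 20.2,
    "Q4_K_S": 20.4,
    "Q4_K_M": 21.7,
    "Q5_0": 24.6,
    "Q5_K_S": 24.6,
    "Q5_K_M": 25.3,
    "Q6_K": 29.2,
    "Q8_0": 37.8,
    "F16": 71.1,
    "BF16": 71.1,
}


def largest_fitting_quant(
    budget_gb: float,
    headroom_gb: float = 3.0,  # assumed allowance for KV cache and buffers
    sizes: Dict[str, float] = QUANT_SIZES_GB,
) -> Optional[str]:
    """Return the largest quant whose file fits in budget_gb minus headroom_gb.

    Returns None when nothing fits. File size is only a lower bound on
    memory use, and a larger file does not guarantee better quality.
    """
    usable = budget_gb - headroom_gb
    fitting = {name: size for name, size in sizes.items() if size <= usable}
    if not fitting:
        return None
    return max(fitting, key=lambda name: fitting[name])


if __name__ == "__main__":
    for budget in (16.0, 24.0, 48.0):
        print(budget, "GB ->", largest_fitting_quant(budget))
```

With a 24 GB budget and the default headroom, this picks Q4_K_S; lowering `headroom_gb` to 2 would admit Q4_K_M instead.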

Downloads last month: 3,411
Format: GGUF
Model size: 36B params
Architecture: qwen35moe