URL: https://huggingface.co/mlx-community/Qwen3.5-4B-MLX-4bit


Qwen3.5-4B-MLX-4bit

This is a 4-bit quantized MLX version of Qwen/Qwen3.5-4B for Apple Silicon.

Model Details

  • Original Model: Qwen/Qwen3.5-4B
  • Quantization: 4-bit (5.347 bits per weight on average)
  • Group Size: 64
  • Format: MLX SafeTensors
  • Framework: mlx-vlm
  • Disk Size: ~2.9 GB
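
The gap between the nominal 4 bits and the reported 5.347 bits per weight can be illustrated with a little arithmetic. The sketch below is not taken from the conversion tool. It assumes that each group of 64 weights also stores one 16-bit scale and one 16-bit bias, and that any remaining layers stay at 16-bit precision. The helper names are hypothetical.

```python
# Sketch: where a figure like "5.347 bits per weight" can come from.
# Assumption (not stated in the card): every quantized group stores one
# 16-bit scale and one 16-bit bias next to its packed 4-bit weights.

def quantized_bits_per_weight(bits: int, group_size: int, overhead_bits: int = 32) -> float:
    """Bits per weight of a group-quantized tensor, including per-group overhead."""
    return bits + overhead_bits / group_size


def unquantized_fraction(avg_bpw: float, quant_bpw: float, full_bpw: float = 16.0) -> float:
    """Share of weights that would have to stay at full_bpw to reach avg_bpw."""
    return (avg_bpw - quant_bpw) / (full_bpw - quant_bpw)


quant_bpw = quantized_bits_per_weight(bits=4, group_size=64)
fraction = unquantized_fraction(avg_bpw=5.347, quant_bpw=quant_bpw)

print(f"quantized layers: {quant_bpw} bits per weight")
print(f"implied share of weights kept at 16-bit: {fraction:.1%}")
```

Under these assumptions, the quantized layers alone cost more than 4 bits per weight, and the rest of the average would come from layers left unquantized. The real split depends on which layers the converter skips.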

Conversion Details

This model was converted using mlx-vlm from the pc/fix-qwen35-predicate branch, which includes fixes for Qwen3.5 model support (proper handling of MoE gate layers, shared_expert_gate, and A_log casting).

Conversion command:

python3 -m mlx_vlm convert \
 --hf-path "Qwen/Qwen3.5-4B" \
 --mlx-path "./Qwen3.5-4B-MLX-4bit" \
 -q --q-bits 4 --q-group-size 64
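
To make the -q --q-bits 4 --q-group-size 64 flags more concrete, here is a minimal pure-Python sketch of group-wise affine quantization. It is an illustration of the general technique, not the MLX kernel: the function names are hypothetical, and the rounding and storage details in MLX may differ.

```python
# Sketch of group-wise affine quantization for a single group of weights.
# Hypothetical helpers for illustration only; this is not mlx or mlx-vlm code.

def quantize_group(weights, bits=4):
    """Map one group of floats to integer codes plus a shared scale and bias."""
    levels = (1 << bits) - 1                 # 15 steps for 4-bit codes 0..15
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels or 1.0        # fall back to 1.0 for a constant group
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate floats from the codes, scale, and bias."""
    return [c * scale + bias for c in codes]


# One group of 64 synthetic weights spread over roughly [-0.5, 0.5).
group = [((i * 37) % 64) / 64 - 0.5 for i in range(64)]

codes, scale, bias = quantize_group(group, bits=4)
restored = dequantize_group(codes, scale, bias)
max_error = max(abs(a - b) for a, b in zip(group, restored))

print(f"scale={scale:.6f} bias={bias:.6f} max_error={max_error:.6f}")
```

A smaller group size gives each scale and bias fewer weights to cover, which tends to reduce reconstruction error at the cost of more per-group overhead.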

Important Note

A better-optimized conversion may be available from @Prince (@Blaizzy) in the MLX VLM community. Check the mlx-community organization for updated versions once official Qwen3.5 support is merged into the main mlx-vlm branch.

Usage

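from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "mlx-community/Qwen3.5-4B-MLX-4bit"
model, processor = load(model_path)
config = load_config(model_path)

# Wrap the prompt in the model's chat template so the image placeholder is
# inserted (this follows the pattern in the mlx-vlm README).
prompt = apply_chat_template(processor, config, "Describe this image.", num_images=1)

output = generate(
    model,
    processor,
    prompt=prompt,
    image="path/to/image.jpg",
    max_tokens=512,
)
print(output)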

CLI:

python3 -m mlx_vlm.generate \
 --model mlx-community/Qwen3.5-4B-MLX-4bit \
 --image path/to/image.jpg \
 --prompt "Describe this image."

License

This model inherits the Apache 2.0 license from the original Qwen model.
