URL: https://huggingface.co/ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-MLX-4bit

Qwen3.6-27B-Omnimerge-v4 — MLX 4-bit

4-bit MLX quantization of ManniX-ITA/Qwen3.6-27B-Omnimerge-v4 for native Apple Silicon inference via mlx-lm.

The base model is a same-base DARE-TIES (Omnimerge_v2 method) merge of Qwen/Qwen3.6-27B with three Qwen3.6 fine-tunes (rico03, Esper3.1, kai-os Opus-Reasoning-anchor) plus an MLP-passthrough surgery that fixes Qwen3.6's reasoning-tag-emission fragility. Benchmark numbers and the full method writeup live on the base model card.

Quantization

  • Type: MLX 4-bit (-q --q-bits 4)
  • Group size: 64
  • Effective bits/weight: 4.501 (per the mlx_lm.convert quantizer)
  • Shape on disk: 3 safetensors shards, ~15 GB total
  • Build env: mlx==0.30.0 + mlx-cuda==0.30.0 + mlx-lm==0.30.7 on Linux + RTX 3090 (CUDA backend used only for the conversion step; end users run the native Apple Silicon MLX runtime, which has no CUDA dependency).
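The effective bits/weight and on-disk size figures above can be sanity-checked with a little arithmetic. The sketch below assumes group-wise affine quantization in which each group of 64 weights carries one 16-bit scale and one 16-bit bias; the helper names are hypothetical and are not part of mlx-lm. The small gap between 4.5 and the reported 4.501 is presumably due to tensors that are kept unquantized, which this estimate ignores.

```python
def bits_per_weight(q_bits: int, group_size: int, meta_bits: int = 32) -> float:
    """Estimate bits per weight for group-wise affine quantization.

    meta_bits is the per-group overhead; 32 assumes one 16-bit scale
    plus one 16-bit bias per group (an assumption, not read from the files).
    """
    return q_bits + meta_bits / group_size


def size_gb(n_params: float, bpw: float) -> float:
    """Approximate weight size in decimal gigabytes."""
    return n_params * bpw / 8 / 1e9


if __name__ == "__main__":
    # 4-bit weights, group size 64, as listed on this card
    print(bits_per_weight(4, 64))
    # 27B parameters at the reported 4.501 bits/weight
    print(round(size_gb(27e9, 4.501), 2))
```

Under these assumptions the estimate lands at 4.5 bits/weight and roughly 15.2 GB, in line with the ~15 GB reported for the three shards.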

Conversion recipe: omnimergekit/scripts/mlx_convert.sh — the canonical OmniMergeKit MLX-conversion runner. See MLX_CONVERT.md for the full pin rationale and disk-budget notes.

Usage

Tested with mlx-lm >= 0.30.7 on macOS (M1/M2/M3/M4):

pip install -U mlx-lm

from mlx_lm import load, generate

model, tokenizer = load("ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-MLX-4bit")
prompt = "Write a Rust function that returns the n-th Fibonacci number iteratively."
print(generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True))

The base model emits Qwen3.6 reasoning tags (<think>...</think>); strip them in post-processing or use the chat template that wraps them appropriately.
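One way to do the post-processing mentioned above is a small regex helper. This is a minimal sketch, not something shipped with the model or with mlx-lm: the function name strip_reasoning is hypothetical, and the handling of unbalanced tags (a lone closing tag when the prompt already opened the block, or an unclosed block when generation hits max_tokens) is an assumption about how the output may look.

```python
import re

# Matches a complete reasoning block plus any whitespace that follows it.
_THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def strip_reasoning(text: str) -> str:
    """Return the text with <think>...</think> reasoning removed."""
    # Drop every complete <think>...</think> block.
    cleaned = _THINK_BLOCK.sub("", text)
    # Only a closing tag left (the prompt may have opened the block):
    # keep what comes after it.
    if "</think>" in cleaned:
        cleaned = cleaned.split("</think>", 1)[1]
    # Unclosed block (e.g. generation was cut off): drop it entirely.
    if "<think>" in cleaned:
        cleaned = cleaned.split("<think>", 1)[0]
    return cleaned.strip()


if __name__ == "__main__":
    raw = "<think>Use a loop with two accumulators.</think>\nfn fib(n: u64) -> u64 { todo!() }"
    print(strip_reasoning(raw))
```

The helper can be applied directly to the string returned by generate(...).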

Memory & speed

Empirically (M-series, 32 GB+ recommended):

  • Resident memory: ~16-17 GB
  • Speed: comparable to other Qwen3 27B 4-bit MLX builds; depends on chip generation
  • Context length: inherits the base model's 256k context window (RAM permitting)

The vision tower is not included in the MLX export — this is a text-only build. For multimodal use, prefer the GGUF release with bartowski/Qwen_Qwen3.6-27B-GGUF's mmproj, or run BF16 via transformers.

License

Apache 2.0 — inherits from Qwen3.6 base. See the base model card for the full attribution list (Qwen team, rico03, ValiantLabs, kai-os, mergekit community).
