Qwen3.6-27B-Omnimerge-v4 — MLX 4-bit

4-bit MLX quantization of ManniX-ITA/Qwen3.6-27B-Omnimerge-v4 for native Apple Silicon inference via mlx-lm.

The base model is a same-base DARE-TIES (Omnimerge_v2 method) merge of Qwen/Qwen3.6-27B with three Qwen3.6 fine-tunes (rico03, Esper3.1, kai-os Opus-Reasoning-anchor) plus an MLP-passthrough surgery that fixes Qwen3.6's reasoning-tag-emission fragility. Benchmark numbers and the full method writeup live on the base model card.

Quantization

Type: MLX 4-bit (-q --q-bits 4)
Group size: 64
Effective bits/weight: 4.501 (per the mlx_lm.convert quantizer)
Shape on disk: 3 safetensors shards, ~15 GB total
Build env: mlx==0.30.0 + mlx-cuda==0.30.0 + mlx-lm==0.30.7 on Linux + RTX 3090 (CUDA backend used only for the conversion step; end users run the native Apple Silicon MLX runtime, which has no CUDA dependency).

Conversion recipe: omnimergekit/scripts/mlx_convert.sh — the canonical OmniMergeKit MLX-conversion runner. See MLX_CONVERT.md for the full pin rationale and disk-budget notes.

Usage

Tested with mlx-lm >= 0.30.7 on macOS (M1/M2/M3/M4):

pip install -U mlx-lm

from mlx_lm import load, generate

model, tokenizer = load("ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-MLX-4bit")
prompt = "Write a Rust function that returns the n-th Fibonacci number iteratively."
print(generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True))

The base model emits Qwen3.6 reasoning tags (<think>...</think>); strip them in post-processing or use the chat template that wraps them appropriately.

Memory & speed

Empirically (M-series, 32 GB+ recommended):

Resident memory: ~16-17 GB
Speed: comparable to other Qwen3 27B 4-bit MLX builds; depends on chip generation
Context length: inherits the base model's 256k context window (RAM permitting)

The vision tower is not included in the MLX export — this is a text-only build. For multimodal use, prefer the GGUF release with bartowski/Qwen_Qwen3.6-27B-GGUF's mmproj, or run BF16 via transformers.

Base merge: ManniX-ITA/Qwen3.6-27B-Omnimerge-v4
GGUF release: ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF
Ollama tags: mannix/omnimerge-v4
Methodology + scripts: mann1x/omnimergekit

License

Apache 2.0 — inherits from Qwen3.6 base. See the base model card for the full attribution list (Qwen team, rico03, ValiantLabs, kai-os, mergekit community).

Downloads last month: 128

Safetensors

Model size

27B params

Tensor type

BF16

U32

MLX

Hardware compatibility

4-bit

Model tree for ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-MLX-4bit

Base model

ManniX-ITA/Qwen3.6-27B-Omnimerge-v4

Quantized

(6)

this model

URL: https://huggingface.co/ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-MLX-4bit