Qwen3.6-27B-Omnimerge-v4 — MLX 4-bit
4-bit MLX quantization of ManniX-ITA/Qwen3.6-27B-Omnimerge-v4 for native Apple Silicon inference via mlx-lm.
The base model is a same-base DARE-TIES (Omnimerge_v2 method) merge of Qwen/Qwen3.6-27B with three Qwen3.6 fine-tunes (rico03, Esper3.1, kai-os Opus-Reasoning-anchor) plus an MLP-passthrough surgery that fixes Qwen3.6's reasoning-tag-emission fragility. Benchmark numbers and the full method writeup live on the base model card.
Quantization
- Type: MLX 4-bit (-q --q-bits 4)
- Group size: 64
- Effective bits/weight: 4.501 (per the mlx_lm.convert quantizer)
- Shape on disk: 3 safetensors shards, ~15 GB total
- Build env: mlx==0.30.0 + mlx-cuda==0.30.0 + mlx-lm==0.30.7 on Linux + RTX 3090 (CUDA backend used only for the conversion step; end users run the native Apple Silicon MLX runtime, which has no CUDA dependency).
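The effective bits/weight and on-disk size listed above can be sanity-checked with a little arithmetic. The sketch below assumes that MLX's grouped affine quantization stores one 16-bit scale and one 16-bit bias per group, and it uses a nominal 27e9 parameter count; both are assumptions, not values taken from the conversion log. Under them, 4-bit weights with group size 64 cost 4 + 32/64 = 4.5 bits each, close to the reported 4.501 (the small remainder would plausibly come from tensors left unquantized).

```python
def effective_bits(q_bits: int, group_size: int,
                   scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Bits per weight once per-group scale and bias overhead is included.

    Assumption: one scale and one bias per group, each stored in 16 bits.
    """
    return q_bits + (scale_bits + bias_bits) / group_size


# Settings from this card: 4-bit weights, group size 64.
bpw = effective_bits(q_bits=4, group_size=64)

# Rough on-disk size using the reported 4.501 bits/weight.
NOMINAL_PARAMS = 27e9  # assumption: nominal "27B", not an exact count
size_gb = NOMINAL_PARAMS * 4.501 / 8 / 1e9

print(f"theoretical bits/weight: {bpw}")
print(f"estimated size: {size_gb:.1f} GB")
```

The estimate lands near the ~15 GB total reported for the three shards.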
Conversion recipe: omnimergekit/scripts/mlx_convert.sh — the canonical OmniMergeKit MLX-conversion runner. See MLX_CONVERT.md for the full pin rationale and disk-budget notes.
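For readers who want to reproduce a similar quantization without the OmniMergeKit script, the settings above map onto the standard mlx_lm.convert command-line flags. The sketch below only assembles and prints that command; it is not the contents of mlx_convert.sh, and the output directory name is a hypothetical placeholder.

```python
import shlex


def build_convert_command(hf_path: str, mlx_path: str,
                          q_bits: int = 4, group_size: int = 64) -> list[str]:
    """Assemble an mlx_lm.convert invocation for a quantized export."""
    return [
        "mlx_lm.convert",
        "--hf-path", hf_path,      # source Hugging Face repo or local path
        "--mlx-path", mlx_path,    # output directory for the MLX weights
        "-q",                      # enable quantization
        "--q-bits", str(q_bits),
        "--q-group-size", str(group_size),
    ]


cmd = build_convert_command(
    "ManniX-ITA/Qwen3.6-27B-Omnimerge-v4",
    "Qwen3.6-27B-Omnimerge-v4-MLX-4bit",  # hypothetical output directory
)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to subprocess.run. The version pins and disk-budget notes in MLX_CONVERT.md still apply.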
Usage
Tested with mlx-lm >= 0.30.7 on macOS (M1/M2/M3/M4):
pip install -U mlx-lm
from mlx_lm import load, generate
model, tokenizer = load("ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-MLX-4bit")
prompt = "Write a Rust function that returns the n-th Fibonacci number iteratively."
print(generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True))
The base model emits Qwen3.6 reasoning tags (<think>...</think>); strip them in post-processing or use the chat template that wraps them appropriately.
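One way to do that post-processing is a small regex helper. This is a minimal sketch, not part of mlx-lm: strip_reasoning is a hypothetical name, and it assumes the tags appear literally as <think> and </think> in the decoded text. It also handles the case where a chat template has already emitted the opening tag, so only the closing tag shows up in the output.

```python
import re

# Matches a complete reasoning block, including newlines inside it.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_reasoning(text: str) -> str:
    """Return the model output with reasoning blocks removed."""
    cleaned = _THINK_BLOCK.sub("", text)
    # If the opening tag was prefilled by the template, only "</think>"
    # remains; keep whatever follows the last closing tag.
    if "</think>" in cleaned:
        cleaned = cleaned.rsplit("</think>", 1)[-1]
    return cleaned.strip()


raw = "<think>\nUse a loop with two accumulators.\n</think>\n\nfn fib(n: u64) -> u64 { todo!() }"
print(strip_reasoning(raw))
```

Apply it to the string returned by generate before showing the answer to the user.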
Memory & speed
Empirically (M-series, 32 GB+ recommended):
- Resident memory: ~16-17 GB
- Speed: comparable to other Qwen3 27B 4-bit MLX builds; depends on chip generation
- Context length: inherits the base model's 256k context window (RAM permitting)
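To get a feel for what "RAM permitting" means at long context, the key/value cache can be estimated separately from the ~16-17 GB of weights. The sketch below uses the textbook formula for a standard full-attention transformer. Every architecture value in it is a hypothetical placeholder, not this model's real configuration; read the actual numbers from the model's config.json, and note that models with linear-attention or hybrid layers need less than this formula suggests.

```python
def kv_cache_bytes(seq_len: int, num_layers: int, num_kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """KV cache size for standard attention: keys and values, per layer, per token."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value * seq_len


# HYPOTHETICAL architecture values, for illustration only.
HYPOTHETICAL = dict(num_layers=64, num_kv_heads=8, head_dim=128)

for tokens in (8_192, 32_768, 262_144):
    gib = kv_cache_bytes(tokens, **HYPOTHETICAL) / 2**30
    print(f"{tokens:>7} tokens -> {gib:.1f} GiB of KV cache (fp16)")
```

The cache grows linearly with context, so under these placeholder values the full 256k window would need far more memory than the weights themselves.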
The vision tower is not included in the MLX export — this is a text-only build. For multimodal use, prefer the GGUF release with bartowski/Qwen_Qwen3.6-27B-GGUF's mmproj, or run BF16 via transformers.
Related
- Base merge: ManniX-ITA/Qwen3.6-27B-Omnimerge-v4
- GGUF release: ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF
- Ollama tags: mannix/omnimerge-v4
- Methodology + scripts: mann1x/omnimergekit
License
Apache 2.0 — inherits from Qwen3.6 base. See the base model card for the full attribution list (Qwen team, rico03, ValiantLabs, kai-os, mergekit community).