Nex-N2-mini, OptiQ 4-bit MLX
This is nex-agi/Nex-N2-mini converted to MLX format and quantized with oMLX's oQ4 mixed-precision scheme (4-bit base, sensitivity-driven bit allocation, MSE-optimal clipping). The result is about 19 GB on disk, roughly 4.7 bits per weight effective, with sensitive tensors such as the linear-attention projections kept at higher precision.
Nex-N2-mini is an agentic model built around what its authors call Agentic Thinking: it interleaves reasoning, tool use, and environment feedback rather than treating them as separate stages. The architecture is a hybrid MoE (qwen3_5_moe): 40 layers alternating linear attention with full attention every fourth layer, 256 experts with 8 active per token, and a 262k-token context window.
The original checkpoint includes a vision tower. MLX text inference does not use it, so the vision weights were dropped during conversion; this copy is text-only. Expect around 21 GB of memory in use during inference.
Usage
With mlx-lm, either directly:
mlx_lm.generate --model Nex-N2-mini-OptiQ-4bit --prompt "Hello"
or as an OpenAI-compatible server:
mlx_lm.server --model Nex-N2-mini-OptiQ-4bit
It also works out of the box with oMLX.
Tool calling works without any extra configuration. The chat template uses the
Qwen3-Coder XML style, which mlx-lm and oMLX both detect automatically, so servers
return proper structured tool_calls, and thinking ends up in the reasoning field
instead of leaking into the response content. Tested end to end with
Swival as the harness, including multi-step tasks that
exercise file edits, search, and shell commands while the model is thinking.
An 8-bit companion is available at jedisct1/Nex-N2-mini-mlx-8bit.
- Downloads last month
- 121
4-bit
Model tree for jedisct1/Nex-N2-mini-mlx-OptiQ-4bit
Base model
nex-agi/Nex-N2-mini