Nex-N2-mini, 8-bit MLX
This is nex-agi/Nex-N2-mini converted to MLX format and quantized to 8 bits (group size 64) with mlx-lm 0.31.3.
Nex-N2-mini is an agentic model built around what its authors call Agentic Thinking: it interleaves reasoning, tool use, and environment feedback rather than treating them as separate stages. The architecture is a hybrid MoE (qwen3_5_moe): 40 layers alternating linear attention with full attention every fourth layer, 256 experts with 8 active per token, and a 262k-token context window.
The original checkpoint includes a vision tower. MLX text inference does not use it, so the vision weights were dropped during conversion; this copy is text-only. Expect roughly 37 GB of memory in use during inference.
Usage
With mlx-lm, either directly:
mlx_lm.generate --model jedisct1/Nex-N2-mini-mlx-8bit --prompt "Hello"
or as an OpenAI-compatible server:
mlx_lm.server --model jedisct1/Nex-N2-mini-mlx-8bit
It also works out of the box with oMLX.
Tool calling works without any extra configuration. The chat template uses the
Qwen3-Coder XML style, which mlx-lm and oMLX both detect automatically, so servers
return proper structured tool_calls, and thinking ends up in the reasoning field
instead of leaking into the response content. Tested end to end with
Swival as the harness, including multi-step tasks that
exercise file edits, search, and shell commands while the model is thinking.
- Downloads last month
- 83
8-bit
Model tree for jedisct1/Nex-N2-mini-mlx-8bit
Base model
nex-agi/Nex-N2-mini