Nex-N2-mini, 8-bit MLX

This is nex-agi/Nex-N2-mini converted to MLX format and quantized to 8 bits (group size 64) with mlx-lm 0.31.3.

Nex-N2-mini is an agentic model built around what its authors call Agentic Thinking: it interleaves reasoning, tool use, and environment feedback rather than treating them as separate stages. The architecture is a hybrid MoE (qwen3_5_moe): 40 layers alternating linear attention with full attention every fourth layer, 256 experts with 8 active per token, and a 262k-token context window.

The original checkpoint includes a vision tower. MLX text inference does not use it, so the vision weights were dropped during conversion; this copy is text-only. Expect roughly 37 GB of memory in use during inference.

Usage

With mlx-lm, either directly:

mlx_lm.generate --model jedisct1/Nex-N2-mini-mlx-8bit --prompt "Hello"

or as an OpenAI-compatible server:

mlx_lm.server --model jedisct1/Nex-N2-mini-mlx-8bit

It also works out of the box with oMLX.

Tool calling works without any extra configuration. The chat template uses the Qwen3-Coder XML style, which mlx-lm and oMLX both detect automatically, so servers return proper structured tool_calls, and thinking ends up in the reasoning field instead of leaking into the response content. Tested end to end with Swival as the harness, including multi-step tasks that exercise file edits, search, and shell commands while the model is thinking.

Downloads last month: 83

Safetensors

Model size

35B params

Tensor type

BF16

U32

MLX

Hardware compatibility

8-bit

Model tree for jedisct1/Nex-N2-mini-mlx-8bit

Base model

nex-agi/Nex-N2-mini

Quantized

(51)

this model

Collection including jedisct1/Nex-N2-mini-mlx-8bit

4 items • Updated 9 days ago

URL: https://huggingface.co/jedisct1/Nex-N2-mini-mlx-8bit

⇱ jedisct1/Nex-N2-mini-mlx-8bit · Hugging Face

Nex-N2-mini, 8-bit MLX

Usage

Model tree for jedisct1/Nex-N2-mini-mlx-8bit

Collection including jedisct1/Nex-N2-mini-mlx-8bit