MiniCPM5-1B-Agent

A tiny coding agent for CPU: a full fine-tune (large dataset capacity) of openbmb/MiniCPM5-1B (RL+OPD checkpoint, 4 iterations or ~6 days of training), specialized to reason in <think>, call a small tool set (bash/read/write/edit/glob/grep), and work through a run -> read output -> debug -> patch -> verify loop. The whole loop runs on a free CPU.
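Since the model reasons inside <think> before acting, a client usually has to separate that reasoning from the part it should act on. The following is a minimal sketch, assuming the reasoning arrives as one closed <think>...</think> block; the function name split_think is hypothetical and not taken from the repository.

```python
import re

# Assumption: reasoning is wrapped in a single closed <think>...</think> block.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_think(response: str) -> tuple[str, str]:
    """Return (reasoning, remainder); reasoning is '' when no closed block exists."""
    match = THINK_RE.search(response)
    if match is None:
        return "", response.strip()
    reasoning = match.group(1).strip()
    # Everything outside the <think> block is what the caller should act on.
    remainder = (response[:match.start()] + response[match.end():]).strip()
    return reasoning, remainder


reasoning, action = split_think(
    "<think>The test fails on line 3.</think>\nrun the tests again"
)
```

An unclosed <think> (a trace that never stops reasoning) falls through to the "no block" branch, so the caller can treat it as a miss rather than executing partial text.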

Reproduce

The training scripts are in code/ (see code/README.md). This is the recipe + code, not a one-command runner: it also needs the 26 source HF datasets (listed below), the abliterated openbmb/MiniCPM5-1B base, a CUDA PyTorch env (torch cu128 + liger-kernel), and llama.cpp for the GGUF step. The final v4 data this produces is already bundled at dataset/. Full fine-tunes fit under ~18 GB VRAM. The pipeline:

# 1) BUILD DATA -> train_v4.jsonl (45,762 rows). Keeps the proven v2 backbone WHOLE (42,224 rows) + ~3,538
# CURATED rows: served-vocab gate, drop non-terminating / explore-only / over-long traces, solution-aware
# MinHash dedup. Converters: code/data/converters/*.py; canonical render + assistant-span mask: code/data/schema.py
python code/data/build_v4.py
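The MinHash dedup mentioned in step 1 can be illustrated with a small standard-library sketch. This is not the code in build_v4.py: the shingle size, the 64 hash functions, the 0.8 threshold, and every function name are assumptions, and the "solution-aware" part of the real filter is not modelled here.

```python
import hashlib


def shingles(text: str, k: int = 5) -> set[str]:
    """Character k-shingles of whitespace-normalized text."""
    norm = " ".join(text.split())
    if len(norm) <= k:
        return {norm}
    return {norm[i:i + k] for i in range(len(norm) - k + 1)}


def minhash(text: str, num_perm: int = 64) -> list[int]:
    """For each seeded hash function, keep the minimum hash over all shingles."""
    items = shingles(text)
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(4, "little")
        signature.append(min(
            int.from_bytes(hashlib.sha1(salt + s.encode("utf-8")).digest()[:8], "little")
            for s in items
        ))
    return signature


def similarity(a: list[int], b: list[int]) -> float:
    """Fraction of matching signature slots, an estimate of Jaccard similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def dedup(rows: list[str], threshold: float = 0.8) -> list[str]:
    """Greedy dedup: keep a row only if it is not near any row already kept."""
    kept, signatures = [], []
    for row in rows:
        sig = minhash(row)
        if all(similarity(sig, other) < threshold for other in signatures):
            kept.append(row)
            signatures.append(sig)
    return kept


rows = [
    "fix the failing unit test in utils.py",
    "fix the failing unit test in utils.py",
    "write a glob pattern for all markdown files",
]
kept = dedup(rows)
```

The pairwise comparison is quadratic in the number of kept rows; at tens of thousands of rows a banded LSH index is the usual replacement for the inner loop.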

# 2) SFT - full fine-tune the abliterated base on the v4 mix (1 epoch; Liger fused CE + mem-efficient SDPA)
python code/train/sft.py --model <abliterated-base> \
 --train_file dataset/train_v4.jsonl --out outputs/sft_v4 \
 --epochs 1 --bsz 1 --accum 24 --lr 1e-5 --max_len 24576 --train_cap 24576
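The "assistant-span mask" used for SFT can be pictured as label masking: only assistant tokens contribute to the loss. Below is a minimal sketch of that idea, assuming already-tokenized segments tagged with a role; the segment layout and the name build_labels are hypothetical, and the actual logic lives in code/data/schema.py.

```python
# -100 is the default ignore_index of PyTorch's cross-entropy loss.
IGNORE_INDEX = -100


def build_labels(segments: list[tuple[str, list[int]]]) -> tuple[list[int], list[int]]:
    """segments: (role, token_ids) pairs in conversation order.

    Returns (input_ids, labels) where every non-assistant token is masked out.
    """
    input_ids, labels = [], []
    for role, ids in segments:
        input_ids.extend(ids)
        if role == "assistant":
            labels.extend(ids)                         # trained on
        else:
            labels.extend([IGNORE_INDEX] * len(ids))   # ignored by the loss
    return input_ids, labels


segments = [
    ("system", [1, 2]),
    ("user", [3]),
    ("assistant", [4, 5]),
    ("tool", [6]),
    ("assistant", [7]),
]
input_ids, labels = build_labels(segments)
```

Masking tool outputs as well as user turns keeps the model from being trained to predict text it will never have to generate.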

# 3) BUILD DPO PAIRS - ON-POLICY: run the SFT model over the training prompts, capture its OWN behaviour.
# chosen = a VALID <function> tool call (the model's own correct format, else the gold call);
# rejected = its real miss (rambles in <think> / answers in prose with no tool call). ~649 pairs.
python code/data/build_prefs_onpolicy_gpu.py --model outputs/sft_v4 \
 --src dataset/train_v4.jsonl --out dataset/dpo_onpolicy_v4.jsonl
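The pairing rule in step 3 can be sketched as follows. This is an illustration only: the regex is a hypothetical stand-in for the real tool-call validator, the prompt/chosen/rejected keys are a common convention rather than the confirmed output schema, and build_pair is not a function from the repository.

```python
import re
from typing import Optional

# Hypothetical validity check; the real tool-call grammar is not shown in this card.
TOOL_CALL_RE = re.compile(r"<function\b[^>]*>.*?</function>", re.DOTALL)


def is_valid_call(text: str) -> bool:
    return TOOL_CALL_RE.search(text) is not None


def build_pair(prompt: str, samples: list[str], gold: str) -> Optional[dict]:
    """Build one preference pair from the SFT model's own samples for a prompt.

    rejected = a real miss (no tool call); chosen = the model's own valid call
    if it produced one, otherwise the gold call. Returns None if it never missed.
    """
    misses = [s for s in samples if not is_valid_call(s)]
    if not misses:
        return None
    hits = [s for s in samples if is_valid_call(s)]
    chosen = hits[0] if hits else gold
    return {"prompt": prompt, "chosen": chosen, "rejected": misses[0]}


gold_call = "<function=bash>pwd</function>"
pair = build_pair(
    "List the files in the repo.",
    ["<think>hmm</think> I think the files are fine.", "<function=bash>ls</function>"],
    gold_call,
)
```

Prompts where every sample is already a valid call yield no pair, which is consistent with the pair count (~649) being far smaller than the training set.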

# 4) DPO - full fine-tune (custom completion-only loop; fits 32 GB), reference = the SFT-v4 model
python code/train/dpo.py --model outputs/sft_v4 \
 --data dataset/dpo_onpolicy_v4.jsonl --out outputs/dpo_v4 \
 --beta 0.1 --lr 1e-6 --epochs 3 --accum 8
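For reference, the standard DPO objective for a single pair, with log-probabilities summed over completion tokens only, can be written in a few lines. This is a plain-Python sketch of the textbook loss with beta = 0.1 as in the command above, not the loop in code/train/dpo.py; the function names and the mask layout are assumptions.

```python
import math


def completion_logprob(token_logprobs: list[float], completion_mask: list[int]) -> float:
    """Sum log-probs over completion tokens (mask 1), skipping prompt tokens (mask 0)."""
    return sum(lp for lp, m in zip(token_logprobs, completion_mask) if m)


def dpo_loss(pi_chosen: float, pi_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """-log sigmoid(beta * [(pi_c - ref_c) - (pi_r - ref_r)]) for one pair."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # softplus(-margin) equals -log(sigmoid(margin)); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# At initialization the policy equals the reference, so the loss is log(2).
initial = dpo_loss(-5.0, -7.0, -5.0, -7.0)
# A policy that raises the chosen log-prob and lowers the rejected one does better.
improved = dpo_loss(-4.0, -8.0, -5.0, -7.0)
```

Because the reference is the SFT-v4 model itself, training starts at a loss of log(2) and only the relative shift away from the SFT behaviour is rewarded.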

# 5) GGUF for CPU serving (f16 + Q8_0) - using llama.cpp (github.com/ggerganov/llama.cpp)
python llama.cpp/convert_hf_to_gguf.py outputs/dpo_v4 --outfile dpo_v4-f16.gguf --outtype f16
llama-quantize dpo_v4-f16.gguf dpo_v4-Q8_0.gguf Q8_0
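A rough size estimate for the two GGUF files follows from the bits stored per weight: f16 uses 16 bits, while Q8_0 stores blocks of 32 int8 weights plus one fp16 scale, which is 34 bytes per 32 weights or 8.5 bits per weight. The sketch below assumes a nominal 1,000,000,000 parameters (the exact count is not given here) and ignores file metadata and any tensors kept at higher precision.

```python
# Bits stored per weight for the two output types used above.
BITS_PER_WEIGHT = {
    "f16": 16.0,
    "Q8_0": 8.5,  # (32 int8 weights + one fp16 scale) = 34 bytes per 32 weights
}


def gguf_bytes(n_params: int, outtype: str) -> float:
    """Approximate tensor payload in bytes for a given parameter count."""
    return n_params * BITS_PER_WEIGHT[outtype] / 8


NOMINAL_PARAMS = 1_000_000_000  # assumption: "1B" taken literally
f16_gb = gguf_bytes(NOMINAL_PARAMS, "f16") / 1e9
q8_gb = gguf_bytes(NOMINAL_PARAMS, "Q8_0") / 1e9
```

Under these assumptions the Q8_0 file is a little over half the size of the f16 file, which is the main reason to serve it on memory-constrained CPU hosts.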

Credits / inspiration (repos & tools)

opencode and claw-code (open coding-agent frameworks); smallcode (small-LLM agent patterns); DataClaw (Claude Code agent traces); TeichAI (distilled agent-trace datasets + their Datagen tool); Unsloth.


GGUF

Model size: 1B params
Architecture: llama
Hardware compatibility: 8-bit, 16-bit

Model tree for Luminia/MiniCPM5-1B-Agent-GGUF

Base model: openbmb/MiniCPM5-1B
Quantized (40): this model

URL: https://huggingface.co/Luminia/MiniCPM5-1B-Agent-GGUF
