LFM2.5-230M Fable-5 GGUF

Fine-tuned GGUF release of LiquidAI/LFM2.5-230M on Glint-Research/Fable-5-traces.

Files

lfm2.5-230m-fable-5-f16.gguf — highest quality, largest file
lfm2.5-230m-fable-5-q8_0.gguf — high quality, smaller
lfm2.5-230m-fable-5-q4_k_m.gguf — best default for local inference

Training

Base model: LiquidAI/LFM2.5-230M
Dataset: Glint-Research/Fable-5-traces
File used: fable5_cot_merged.jsonl
Method: PEFT LoRA SFT
Max sequence length: 4096
Epochs: 1
LoRA rank: 32
LoRA alpha: 64
LoRA dropout: 0.05
Precision: FP16 base model, FP32 LoRA trainable weights
Hardware: Google Colab T4
Format: Chat template system/user/assistant, preserving Fable context -> completion

Final training loss samples

step 555: 1.7037
step 560: 1.5968
step 565: 1.6435
step 570: 1.6109
step 575: 1.6589
step 580: 1.6439

Evaluation

We evaluated AKMESSI/lfm2.5-230m-fable-5:F16 against the original base model, LiquidAI/LFM2.5-230M-GGUF:BF16, using local llama.cpp server inference.

These are not official leaderboard submissions. They are lightweight local evaluations intended to compare the fine-tuned model against the base model under the same prompts, decoding settings, and hardware setup.

Summary

The Fable-5 fine-tune improves repository-context code continuation on RepoBench-C-lite Python, while mostly preserving the base model's generic function-calling behavior on BFCL-lite Simple.

Benchmark	Result
RepoBench-C-lite Python	Fine-tuned model outperforms base model
BFCL-lite Simple	Fine-tuned model mostly preserves base function-calling ability
CodeXGLUE Line Completion Python	Neutral / unchanged
CRUXEval-lite	Not a good fit for this trace-style model

RepoBench-C-lite Python

RepoBench-C-style next-line code completion was used to evaluate repository-context code continuation. We sampled 100 examples each from python_if, python_cff, and python_cfr, for 300 total examples.

Model	Examples	Exact Match	Prefix Match	Edit Similarity
`LiquidAI/LFM2.5-230M-GGUF:BF16`	300	10.33%	10.67%	46.85%
`AKMESSI/lfm2.5-230m-fable-5:F16`	300	14.67%	15.33%	50.17%

Compared with the base model, the Fable-5 fine-tune improved:

Exact match by +4.33 percentage points
Prefix match by +4.67 percentage points
Edit similarity by +3.32 points

Breakdown by config:

Config	Base Exact	Fable Exact	Base Edit Sim	Fable Edit Sim
`python_if`	21.00%	27.00%	55.14%	57.31%
`python_cff`	3.00%	5.00%	37.45%	38.10%
`python_cfr`	7.00%	12.00%	47.96%	55.10%

BFCL-lite Simple

We also ran a local BFCL-lite Simple function-calling evaluation over 400 examples as a generic tool-calling control.

Model	Examples	Parse-valid JSON	Function-name Match	Argument Recall	Rough Score
`LiquidAI/LFM2.5-230M-GGUF:BF16`	400	97.75%	97.50%	71.60%	88.44%
`AKMESSI/lfm2.5-230m-fable-5:F16`	400	98.25%	95.00%	67.70%	85.44%

The fine-tuned model preserves most of the base model's generic function-calling behavior, but does not improve BFCL-style API-schema-to-JSON calling. This is expected because the training data consists of coding-agent traces rather than clean function-calling examples.

CodeXGLUE Line Completion Python

We ran a 1,000-example local CodeXGLUE line-completion evaluation as a general code-completion control.

Model	Examples	Exact Match	Prefix Match	Edit Similarity
`LiquidAI/LFM2.5-230M-GGUF:BF16`	1000	23.60%	0.00%	23.60%
`AKMESSI/lfm2.5-230m-fable-5:F16`	1000	23.50%	0.00%	23.50%

This result is effectively neutral. The Fable-5 fine-tune does not materially change general line-completion performance on this setup.

CRUXEval-lite

We also tried a 200-example CRUXEval-lite run for Python execution reasoning.

Model	Task O Accuracy	Task I Accuracy	Overall Accuracy
`LiquidAI/LFM2.5-230M-GGUF:BF16`	8.50%	4.00%	6.25%
`AKMESSI/lfm2.5-230m-fable-5:F16`	0.00%	0.00%	0.00%

This benchmark was not a good fit for the fine-tuned model. The Fable-5 model often entered explanation or trace-style response mode instead of returning only the exact literal Python value expected by CRUXEval.

Interpretation

The Fable-5 fine-tune appears to shift the base model toward coding-agent and repository-context continuation behavior.

It improves RepoBench-C-lite Python next-line completion, while mostly preserving generic function-calling ability on BFCL-lite Simple. The main regression is in exact BFCL-style argument filling, which is not the main target of the Fable-5 trace dataset.

The model is best understood as a tiny coding-agent trace model, not a general-purpose reasoning model or a benchmark-specialized function-calling model.

Evaluation Caveats

These are local lightweight evaluations, not official leaderboard submissions.
Results were produced with llama.cpp server inference.
Scores may vary with prompting, decoding settings, quantization level, and benchmark harness details.
BFCL-lite and RepoBench-C-lite use simplified local scoring scripts rather than official leaderboard infrastructure.
Only the F16 model was benchmarked here; quantized GGUF variants may differ slightly.

Usage

Recommended local file:

lfm2.5-230m-fable-5-q4_k_m.gguf

Caveats

This model is trained on coding-agent trace telemetry. It may emit tool-call-like actions, shell commands, file paths, or long reasoning-style continuations. Review outputs before executing commands.

The dataset contains coding-agent traces and should not be treated as a clean benchmark or a safety-filtered assistant dataset.

License notes

Base model: LiquidAI LFM Open License v1.0
Dataset: AGPL-3.0
This repo preserves upstream license notices. Check compatibility before commercial or closed-source use.

Downloads last month: 107

GGUF

Model size

0.2B params

Architecture

lfm2

Hardware compatibility

4-bit

8-bit

16-bit

Model tree for AKMESSI/lfm2.5-230m-fable-5

Base model

LiquidAI/LFM2.5-230M-Base

Finetuned

LiquidAI/LFM2.5-230M

Adapter

(1)

this model

URL: https://huggingface.co/AKMESSI/lfm2.5-230m-fable-5

⇱ AKMESSI/lfm2.5-230m-fable-5 · Hugging Face