LFM2.5-230M Fable-5 GGUF
Fine-tuned GGUF release of LiquidAI/LFM2.5-230M on Glint-Research/Fable-5-traces.
Files
lfm2.5-230m-fable-5-f16.gguf— highest quality, largest filelfm2.5-230m-fable-5-q8_0.gguf— high quality, smallerlfm2.5-230m-fable-5-q4_k_m.gguf— best default for local inference
Training
- Base model:
LiquidAI/LFM2.5-230M - Dataset:
Glint-Research/Fable-5-traces - File used:
fable5_cot_merged.jsonl - Method: PEFT LoRA SFT
- Max sequence length: 4096
- Epochs: 1
- LoRA rank: 32
- LoRA alpha: 64
- LoRA dropout: 0.05
- Precision: FP16 base model, FP32 LoRA trainable weights
- Hardware: Google Colab T4
- Format: Chat template system/user/assistant, preserving Fable
context -> completion
Final training loss samples
- step 555: 1.7037
- step 560: 1.5968
- step 565: 1.6435
- step 570: 1.6109
- step 575: 1.6589
- step 580: 1.6439
Evaluation
We evaluated AKMESSI/lfm2.5-230m-fable-5:F16 against the original base model, LiquidAI/LFM2.5-230M-GGUF:BF16, using local llama.cpp server inference.
These are not official leaderboard submissions. They are lightweight local evaluations intended to compare the fine-tuned model against the base model under the same prompts, decoding settings, and hardware setup.
Summary
The Fable-5 fine-tune improves repository-context code continuation on RepoBench-C-lite Python, while mostly preserving the base model's generic function-calling behavior on BFCL-lite Simple.
| Benchmark | Result |
|---|---|
| RepoBench-C-lite Python | Fine-tuned model outperforms base model |
| BFCL-lite Simple | Fine-tuned model mostly preserves base function-calling ability |
| CodeXGLUE Line Completion Python | Neutral / unchanged |
| CRUXEval-lite | Not a good fit for this trace-style model |
RepoBench-C-lite Python
RepoBench-C-style next-line code completion was used to evaluate repository-context code continuation. We sampled 100 examples each from python_if, python_cff, and python_cfr, for 300 total examples.
| Model | Examples | Exact Match | Prefix Match | Edit Similarity |
|---|---|---|---|---|
LiquidAI/LFM2.5-230M-GGUF:BF16 |
300 | 10.33% | 10.67% | 46.85% |
AKMESSI/lfm2.5-230m-fable-5:F16 |
300 | 14.67% | 15.33% | 50.17% |
Compared with the base model, the Fable-5 fine-tune improved:
- Exact match by +4.33 percentage points
- Prefix match by +4.67 percentage points
- Edit similarity by +3.32 points
Breakdown by config:
| Config | Base Exact | Fable Exact | Base Edit Sim | Fable Edit Sim |
|---|---|---|---|---|
python_if |
21.00% | 27.00% | 55.14% | 57.31% |
python_cff |
3.00% | 5.00% | 37.45% | 38.10% |
python_cfr |
7.00% | 12.00% | 47.96% | 55.10% |
BFCL-lite Simple
We also ran a local BFCL-lite Simple function-calling evaluation over 400 examples as a generic tool-calling control.
| Model | Examples | Parse-valid JSON | Function-name Match | Argument Recall | Rough Score |
|---|---|---|---|---|---|
LiquidAI/LFM2.5-230M-GGUF:BF16 |
400 | 97.75% | 97.50% | 71.60% | 88.44% |
AKMESSI/lfm2.5-230m-fable-5:F16 |
400 | 98.25% | 95.00% | 67.70% | 85.44% |
The fine-tuned model preserves most of the base model's generic function-calling behavior, but does not improve BFCL-style API-schema-to-JSON calling. This is expected because the training data consists of coding-agent traces rather than clean function-calling examples.
CodeXGLUE Line Completion Python
We ran a 1,000-example local CodeXGLUE line-completion evaluation as a general code-completion control.
| Model | Examples | Exact Match | Prefix Match | Edit Similarity |
|---|---|---|---|---|
LiquidAI/LFM2.5-230M-GGUF:BF16 |
1000 | 23.60% | 0.00% | 23.60% |
AKMESSI/lfm2.5-230m-fable-5:F16 |
1000 | 23.50% | 0.00% | 23.50% |
This result is effectively neutral. The Fable-5 fine-tune does not materially change general line-completion performance on this setup.
CRUXEval-lite
We also tried a 200-example CRUXEval-lite run for Python execution reasoning.
| Model | Task O Accuracy | Task I Accuracy | Overall Accuracy |
|---|---|---|---|
LiquidAI/LFM2.5-230M-GGUF:BF16 |
8.50% | 4.00% | 6.25% |
AKMESSI/lfm2.5-230m-fable-5:F16 |
0.00% | 0.00% | 0.00% |
This benchmark was not a good fit for the fine-tuned model. The Fable-5 model often entered explanation or trace-style response mode instead of returning only the exact literal Python value expected by CRUXEval.
Interpretation
The Fable-5 fine-tune appears to shift the base model toward coding-agent and repository-context continuation behavior.
It improves RepoBench-C-lite Python next-line completion, while mostly preserving generic function-calling ability on BFCL-lite Simple. The main regression is in exact BFCL-style argument filling, which is not the main target of the Fable-5 trace dataset.
The model is best understood as a tiny coding-agent trace model, not a general-purpose reasoning model or a benchmark-specialized function-calling model.
Evaluation Caveats
- These are local lightweight evaluations, not official leaderboard submissions.
- Results were produced with llama.cpp server inference.
- Scores may vary with prompting, decoding settings, quantization level, and benchmark harness details.
- BFCL-lite and RepoBench-C-lite use simplified local scoring scripts rather than official leaderboard infrastructure.
- Only the F16 model was benchmarked here; quantized GGUF variants may differ slightly.
Usage
Recommended local file:
lfm2.5-230m-fable-5-q4_k_m.gguf
Caveats
This model is trained on coding-agent trace telemetry. It may emit tool-call-like actions, shell commands, file paths, or long reasoning-style continuations. Review outputs before executing commands.
The dataset contains coding-agent traces and should not be treated as a clean benchmark or a safety-filtered assistant dataset.
License notes
- Base model: LiquidAI LFM Open License v1.0
- Dataset: AGPL-3.0
- This repo preserves upstream license notices. Check compatibility before commercial or closed-source use.
- Downloads last month
- 107
4-bit
8-bit
16-bit
