Try LFM • Docs • LEAP • Discord

LFM2.5-8B-A1B

⚠️ Important: The tokenizer was updated after the original release to fix tool-calling issues in llama.cpp. If you downloaded LFM2.5-8B-A1B before commit feb5e04, please re-download the tokenizer files. The GGUF files have also been re-converted with the updated tokenizer.

LFM2.5 is a new family of hybrid models designed for on-device deployment. It builds on the LFM2 architecture with extended pre-training and reinforcement learning.

On-device personal assistant: Designed to power real-life applications that chain tool calls and follow complex instructions directly on the device.
Compressed performance: Competitive with much larger dense and MoE models on instruction following and agentic tasks.
Unmatched throughput: Fastest in its size class on both CPU and GPU inference, with day-one support for llama.cpp, MLX, vLLM, and SGLang.

Find more information about LFM2.5-8B-A1B in our blog post.

👁 image

*AA-Omniscience Index (higher is better) rewards correct answers and penalizes hallucinations. Scores range from -100 to 100. See more results on Artificial Analysis.
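
As a rough sketch of how the index relates to the accuracy and non-hallucination columns reported in the Performance section, the snippet below assumes the index is the percentage of correct answers minus the percentage of incorrect (hallucinated) answers, with declined answers scoring zero. The formula and the function name `omniscience_index` are assumptions rather than something stated in this card, although they reproduce the reported indices to within rounding.

```python
def omniscience_index(accuracy: float, non_hallucination_rate: float) -> float:
    """Approximate AA-Omniscience Index from two percentages (0-100).

    Assumption: every question that is not answered correctly is either
    answered incorrectly (a hallucination) or declined, and the index is
    correct% minus incorrect%.
    """
    not_correct = 100.0 - accuracy
    hallucination_rate = 100.0 - non_hallucination_rate
    incorrect = not_correct * hallucination_rate / 100.0
    return accuracy - incorrect


# Accuracy and non-hallucination rate reported for LFM2.5-8B-A1B.
index = omniscience_index(8.67, 63.47)
print(round(index, 1))
```

Under this reading, a model can raise its index either by answering more questions correctly or by declining instead of guessing.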

🗒️ Model Details

Model	Parameters	Description
LFM2.5-8B-A1B-Base	8.3B total / 1.5B active	Pre-trained base model for fine-tuning
LFM2.5-8B-A1B	8.3B total / 1.5B active	Reasoning-tuned general-purpose model

LFM2.5-8B-A1B is a general-purpose text-only model with the following features:

Total parameters: 8.3B
Active parameters: 1.5B
Number of layers: 24 (18 double-gated LIV conv + 6 GQA)
Training budget: 38 trillion tokens
Context length: 128,000
Vocabulary size: 128,000
Languages: English, Arabic, Chinese, French, German, Italian, Japanese, Korean, Portuguese, Spanish
Generation parameters: We recommend the following settings:
- temperature: 0.2
- top_k: 80
- repetition_penalty: 1.05
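
To make the three settings concrete, here is a minimal pure-Python sketch of how they conventionally shape next-token sampling. The function name `sample_next_token` is hypothetical, and the order of operations (repetition penalty, then temperature, then top-k) is an assumption; each inference backend has its own implementation.

```python
import math
import random


def sample_next_token(logits, previous_tokens, temperature=0.2, top_k=80,
                      repetition_penalty=1.05, rng=random):
    """Sample one token id from raw logits (a plain list of floats)."""
    scores = list(logits)
    # Repetition penalty: make tokens that were already generated less likely.
    for token in set(previous_tokens):
        if scores[token] > 0:
            scores[token] /= repetition_penalty
        else:
            scores[token] *= repetition_penalty
    # Temperature: values below 1 sharpen the distribution.
    scores = [s / temperature for s in scores]
    # Top-k: keep only the k highest-scoring token ids.
    kept = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:top_k]
    # Softmax weights over the kept tokens (shifted by the max for stability).
    top = scores[kept[0]]
    weights = [math.exp(scores[i] - top) for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


# Toy vocabulary of four tokens; token 0 has already been generated once.
logits = [2.0, 1.0, 0.5, -1.0]
token = sample_next_token(logits, previous_tokens=[0], rng=random.Random(0))
print(token)
```

A temperature as low as 0.2 concentrates most of the probability on the top few candidates, so top_k mainly acts as a safety net against rare tokens.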

Model	Description
LFM2.5-8B-A1B	Original model checkpoint in native format. Best for fine-tuning or inference with Transformers, vLLM, and SGLang.
LFM2.5-8B-A1B-GGUF	Quantized format for llama.cpp and compatible tools. Optimized for edge inference and local deployment.
LFM2.5-8B-A1B-ONNX	ONNX Runtime format for cross-platform deployment.
LFM2.5-8B-A1B-MLX	MLX format for Apple Silicon. Optimized for fast inference on Mac devices.

We recommend using LFM2.5-8B-A1B for agentic workflows, tool use, structured outputs, multilingual assistants, and on-device personal-assistant applications. It is not the best fit for heavy programming or knowledge-intensive question answering without retrieval.

Chat Template

LFM2.5 uses a ChatML-like format. See the Chat Template documentation for details. Example:

<|startoftext|><|im_start|>system
You are a helpful assistant trained by Liquid AI.<|im_end|>
<|im_start|>user
What is C. elegans?<|im_end|>
<|im_start|>assistant

Because LFM2.5-8B-A1B is a reasoning model, assistant turns contain an explicit chain of thought before the final answer. You can use tokenizer.apply_chat_template() to format your messages automatically.
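
For illustration only, the layout of the example prompt can be reproduced with a few lines of Python. The helper `format_chat` is hypothetical and its exact whitespace is inferred from the example; the real template may handle more cases (tools, reasoning traces), so tokenizer.apply_chat_template() remains the reliable path.

```python
def format_chat(messages, add_generation_prompt=True):
    """Render messages in the ChatML-like layout of the example (illustrative only)."""
    parts = ["<|startoftext|>"]
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant trained by Liquid AI."},
    {"role": "user", "content": "What is C. elegans?"},
]
prompt = format_chat(messages)
print(prompt)
```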

Tool Use

LFM2.5 supports function calling in four steps:

Function definition: Provide the list of tools as a JSON object in the system prompt, or pass them to tokenizer.apply_chat_template() with tools=....
Function call: By default, LFM2.5 writes Pythonic function calls (a Python list between the <|tool_call_start|> and <|tool_call_end|> special tokens) as the assistant answer. You can override this behavior by asking the model, in the system prompt, to output JSON function calls.
Function execution: Execute the call and return the result with the tool role.
Final answer: LFM2.5 interprets the tool output and returns a plain-text answer addressing the original prompt.

See the Tool Use documentation for the full guide. Example:

<|startoftext|><|im_start|>system
List of tools: [{"name": "get_candidate_status", "description": "Retrieves the current status of a candidate in the recruitment process", "parameters": {"type": "object", "properties": {"candidate_id": {"type": "string", "description": "Unique identifier for the candidate"}}, "required": ["candidate_id"]}}]<|im_end|>
<|im_start|>user
What is the current status of candidate ID 12345?<|im_end|>
<|im_start|>assistant
<|tool_call_start|>[get_candidate_status(candidate_id="12345")]<|tool_call_end|>Checking the current status of candidate ID 12345.<|im_end|>
<|im_start|>tool
[{"candidate_id": "12345", "status": "Interview Scheduled", "position": "Clinical Research Associate", "date": "2023-11-20"}]<|im_end|>
<|im_start|>assistant
The candidate with ID 12345 is currently in the "Interview Scheduled" stage for the position of Clinical Research Associate, with an interview date set for 2023-11-20.<|im_end|>
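
The function-execution step in the exchange above can be sketched as follows. This is a minimal example, assuming the default Pythonic call format with keyword arguments only; `parse_tool_calls`, `TOOLS`, and the stub `get_candidate_status` are hypothetical names introduced for illustration, and the JSON-list shape of the tool message simply mirrors the example.

```python
import ast
import json

TOOL_CALL_START = "<|tool_call_start|>"
TOOL_CALL_END = "<|tool_call_end|>"


def parse_tool_calls(assistant_text):
    """Extract (name, kwargs) pairs from a Pythonic tool-call list."""
    start = assistant_text.find(TOOL_CALL_START)
    end = assistant_text.find(TOOL_CALL_END)
    if start == -1 or end == -1 or end < start:
        return []
    payload = assistant_text[start + len(TOOL_CALL_START):end]
    tree = ast.parse(payload.strip(), mode="eval")
    if not isinstance(tree.body, ast.List):
        raise ValueError("expected a list of function calls")
    calls = []
    for node in tree.body.elts:
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            raise ValueError("expected simple calls such as name(arg=value)")
        # literal_eval accepts only literals, so no model-written code is run.
        # Positional arguments are ignored in this sketch.
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
        calls.append((node.func.id, kwargs))
    return calls


def get_candidate_status(candidate_id):
    # Hypothetical stand-in for a real lookup; returns the data from the example.
    return {
        "candidate_id": candidate_id,
        "status": "Interview Scheduled",
        "position": "Clinical Research Associate",
        "date": "2023-11-20",
    }


# Only functions registered here can be called by the model.
TOOLS = {"get_candidate_status": get_candidate_status}

assistant_text = (
    '<|tool_call_start|>[get_candidate_status(candidate_id="12345")]'
    "<|tool_call_end|>Checking the current status of candidate ID 12345."
)

tool_messages = []
for name, kwargs in parse_tool_calls(assistant_text):
    result = TOOLS[name](**kwargs)
    tool_messages.append({"role": "tool", "content": json.dumps([result])})
print(tool_messages)
```

Parsing with `ast` rather than `eval` keeps the model's output as data: only names present in the registry are ever invoked, and arguments must be plain literals.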

🏃 Inference

LFM2.5-8B-A1B is supported by many inference frameworks. See the Inference documentation for the full list.

Name	Description	Docs	Notebook
Transformers	Simple inference with direct access to model internals.	Link	Colab link
vLLM	High-throughput production deployments with GPU.	Link	Colab link
llama.cpp	Cross-platform inference with CPU offloading.	Link	Colab link
MLX	Apple's machine learning framework optimized for Apple Silicon.	Link	—
LM Studio	Desktop application for running LLMs locally.	Link	—

Quick start with Transformers (compatible with transformers>=5.0.0):

from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_id = "LiquidAI/LFM2.5-8B-A1B"

# Load the weights in bfloat16 and let device_map place them on the available device(s).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    dtype="bfloat16",
    # attn_implementation="flash_attention_2",  # <- uncomment on compatible GPU
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Stream decoded text to stdout as it is generated, without echoing the prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

prompt = "What is C. elegans?"

# Apply the chat template and tokenize; return_dict=True makes the ["input_ids"] lookup explicit.
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
    return_dict=True,
)["input_ids"].to(model.device)

# Sample with the recommended generation parameters.
output = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.2,
    top_k=80,
    repetition_penalty=1.05,
    max_new_tokens=8192,
    streamer=streamer,
)

🔧 Fine-Tuning

We recommend fine-tuning LFM2.5 for your specific use case to achieve the best results.

Name	Description	Docs	Notebook
CPT (Unsloth)	Continued Pre-Training using Unsloth for text completion.	Link	Colab link
CPT (Unsloth)	Continued Pre-Training using Unsloth for translation.	Link	Colab link
SFT (Unsloth)	Supervised Fine-Tuning with LoRA using Unsloth.	Link	Colab link
SFT (TRL)	Supervised Fine-Tuning with LoRA using TRL.	Link	Colab link
DPO (TRL)	Direct Preference Optimization with LoRA using TRL.	Link	Colab link
GRPO (Unsloth)	GRPO with LoRA using Unsloth.	Link	Colab link
GRPO (TRL)	GRPO with LoRA using TRL.	Link	Colab link

📊 Performance

Improvements over LFM2-8B-A1B

Thanks to reasoning, scaled-up pre-training, and large-scale RL, LFM2.5-8B-A1B improves over its predecessor across the board:

Benchmark	LFM2-8B-A1B	LFM2.5-8B-A1B	Δ
AA-Omniscience Index	-78.42	-24.70	+53.72
AA-Omniscience Accuracy	7.33	8.67	+1.34
AA-Omniscience Non-Hallucination Rate	7.46	63.47	+56.01
IFEval	79.44	91.84	+12.40
IFBench	26.00	56.47	+30.47
Multi-IF	58.54	79.93	+21.39
MATH500	74.80	88.76	+13.96
AIME25	20.00	42.53	+22.53
BFCLv3	45.07	64.36	+19.29
BFCLv4	25.52	48.50	+22.98
Tau² Telecom	13.60	88.07	+74.47
Tau² Retail	7.02	39.82	+32.80

Knowledge and instruction following

Model	Parameters	AA-Omni. Index	AA-Omni. Accuracy	AA-Omni. Non-Halluc.	IFEval	IFBench	Multi-IF
LFM2.5-8B-A1B	8B/A1B	-24.70	8.67	63.47	91.84	56.47	79.93
Granite-4.0-H-Tiny	7B/A1B	-75.50	9.37	6.38	82.23	21.28	59.00
Qwen3.5-4B	4B	-51.53	17.20	16.99	87.80	50.38	67.43
Qwen3-30B-A3B-Thinking-2507	30.5B/3.3B	-51.31	18.80	13.87	90.82	51.11	79.04
Gemma-4-E2B-IT	5.1B	-72	7.00	15.05	82.93	33.53	69.70
Gemma-4-E4B-IT	8B	-50.67	8.10	36.06	87.74	39.48	77.58
Gemma-4-26B-A4B-IT	26B/4B	-62.07	14.37	10.75	91.40	47.25	82.06
gpt-oss-20b	21B/3.6B	-49.17	14.57	24.50	86.73	58.65	76.64

Math and agentic workflows

Model	Parameters	MATH500	AIME25	AIME26	BFCLv3	BFCLv4	Tau² Telecom	Tau² Retail
LFM2.5-8B-A1B	8B/A1B	88.76	42.53	50.00	64.79	49.73	88.07	39.82
Granite-4.0-H-Tiny	7B/A1B	59.20	4.93	3.33	56.89	28.52	16.67	18.42
Qwen3.5-4B	4B	80.76	54.28	58.33	71.06	54.01	87.72	71.93
Qwen3-30B-A3B-Thinking-2507	30.5B/3.3B	86.48	71.67	66.67	73.39	50.53	21.93	56.14
Gemma-4-E2B-IT	5.1B	64.00	26	30	56.44	31.91	22.37	18.95
Gemma-4-E4B-IT	8B	65.00	34.33	40.67	57.31	33.92	26.75	42.11

CPU Inference

👁 image

GPU Inference

LFM2.5-8B-A1B is the fastest model in its size class, reaching 18.5K output tokens per second at high concurrency on a single H100, or roughly 1.6B tokens per day.
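
The daily figure follows from the per-second rate. As a quick check, assuming the peak rate is sustained for a full 24 hours (an idealization) and using the rounded 18.5K value:

```python
SECONDS_PER_DAY = 24 * 60 * 60

tokens_per_second = 18_500  # rounded "18.5K output tokens per second" from the text
tokens_per_day = tokens_per_second * SECONDS_PER_DAY
print(f"{tokens_per_day / 1e9:.2f}B tokens per day")
```

With the rounded rate this comes to about 1.6 billion tokens; the exact daily total depends on the unrounded throughput and on how long high concurrency can be maintained.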

👁 image

📬 Contact

Got questions or want to connect? Join our Discord community.
If you are interested in custom solutions with edge deployment, please contact our sales team.

Citation

@article{liquidAI20268BA1B,
 author = {Liquid AI},
 title = {LFM2.5-8B-A1B: Personal Assistant On Your Laptop},
 journal = {Liquid AI Blog},
 year = {2026},
 note = {www.liquid.ai/blog/lfm2-5-8b-a1b},
}

@article{liquidai2025lfm2,
 title = {LFM2 Technical Report},
 author = {Liquid AI},
 journal = {arXiv preprint arXiv:2511.23404},
 year = {2025}
}


URL: https://huggingface.co/LiquidAI/LFM2.5-8B-A1B
