LFM2.5-8B-A1B-Base
LFM2.5 is a new family of hybrid models designed for on-device deployment. It builds on the LFM2 architecture with extended pre-training and reinforcement learning.
- On-device personal assistant: Designed to power real-world applications that chain tool calls and follow complex instructions across a wide range of devices.
- Compressed performance: Competitive with much larger dense and MoE models on instruction following and agentic tasks.
- Unmatched throughput: Fastest in its size class on both CPU and GPU inference, with day-one support for llama.cpp, MLX, vLLM, and SGLang.
Find more information about LFM2.5-8B-A1B in our blog post.
*AA-Omniscience Index (higher is better) rewards correct answers and penalizes hallucinations. Scores range from -100 to 100. See more results on Artificial Analysis.
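To show how a score bounded by -100 and 100 can reward correct answers while penalizing hallucinations, the sketch below assumes a simple symmetric scheme: +1 per correct answer, -1 per incorrect (hallucinated) answer, 0 per abstention, normalized by the number of questions and scaled by 100. The function name and the exact weighting are assumptions for illustration only; Artificial Analysis documents the official methodology.

```python
def omniscience_index(correct: int, incorrect: int, abstained: int) -> float:
    """Hypothetical index: +1 per correct answer, -1 per incorrect answer,
    0 per abstention, scaled to the range [-100, 100]."""
    total = correct + incorrect + abstained
    if total == 0:
        raise ValueError("need at least one question")
    return 100.0 * (correct - incorrect) / total


if __name__ == "__main__":
    # A model that abstains instead of guessing loses nothing for that question.
    print(omniscience_index(correct=60, incorrect=20, abstained=20))
```

Under this assumed scheme, answering every question wrongly gives -100, answering every question correctly gives 100, and abstaining on everything gives 0.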
Model Details
| Model | Parameters | Description |
|---|---|---|
| LFM2.5-8B-A1B-Base | 8.3B total / 1.5B active | Pre-trained base model for fine-tuning |
| LFM2.5-8B-A1B | 8.3B total / 1.5B active | Reasoning-tuned general-purpose model |
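The gap between total and active parameters is what makes the model cheap to run per token but not cheap to store: every expert must be resident in memory, while only about 1.5B parameters take part in each forward step. The back-of-the-envelope sketch below uses the two numbers from the table; the ~2 FLOPs per active parameter rule and the bytes-per-parameter figures are common approximations, not measured values, and the weight estimates ignore KV cache, activations, and quantization-scale overhead.

```python
# Rough sizing numbers derived from the parameter counts in the table.
TOTAL_PARAMS = 8.3e9   # all experts plus shared weights
ACTIVE_PARAMS = 1.5e9  # parameters used for each token


def active_fraction(total: float, active: float) -> float:
    """Share of the weights that participate in one forward step."""
    return active / total


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone (no KV cache, activations or runtime overhead)."""
    return params * bytes_per_param / 1e9


def forward_flops_per_token(active: float) -> float:
    """Approximate forward-pass cost: ~2 FLOPs (multiply + add) per active parameter."""
    return 2.0 * active


if __name__ == "__main__":
    print(f"active fraction: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.1%}")
    print(f"bf16 weights:  {weight_memory_gb(TOTAL_PARAMS, 2):.1f} GB")
    print(f"~4-bit weights: {weight_memory_gb(TOTAL_PARAMS, 0.5):.1f} GB")
    print(f"forward FLOPs per token: {forward_flops_per_token(ACTIVE_PARAMS):.1e}")
```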
LFM2.5-8B-A1B is a general-purpose text-only model with the following features:
- Total parameters: 8.3B
- Active parameters: 1.5B
- Number of layers: 24 (18 double-gated LIV conv + 6 GQA)
- Training budget: 38 trillion tokens
- Context length: 131,072
- Vocabulary size: 128,000
- Languages: English, Arabic, Chinese, French, German, Japanese, Korean, Portuguese, Spanish
- Generation parameters: We recommend the following settings:
  - temperature: 0.2
  - top_k: 80
  - repetition_penalty: 1.05
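The three recommended settings each reshape the next-token logits before sampling. The stdlib-only sketch below illustrates one common ordering (repetition penalty, then temperature, then top-k, then softmax), with a CTRL-style penalty that divides positive logits and multiplies negative ones for tokens already in the context. It is an illustration of the technique under those assumptions, not the code path of Transformers, vLLM, llama.cpp, or any other runtime, and the function names are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence


def apply_repetition_penalty(logits: List[float], prev_ids: Sequence[int], penalty: float) -> List[float]:
    """CTRL-style penalty: shrink positive logits and push negative logits further
    down for every token id that already appeared in the context."""
    out = list(logits)
    for tok in set(prev_ids):
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


def sampling_distribution(
    logits: List[float],
    prev_ids: Sequence[int],
    temperature: float = 0.2,
    top_k: int = 80,
    repetition_penalty: float = 1.05,
) -> Dict[int, float]:
    """Return {token_id: probability} over the top_k candidates."""
    if temperature <= 0:
        raise ValueError("temperature must be positive when sampling")
    scores = apply_repetition_penalty(logits, prev_ids, repetition_penalty)
    scores = [s / temperature for s in scores]  # low temperature sharpens the distribution
    k = min(top_k, len(scores))
    top_ids = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    m = max(scores[i] for i in top_ids)  # subtract the max for numerical stability
    exps = {i: math.exp(scores[i] - m) for i in top_ids}
    z = sum(exps.values())
    return {i: e / z for i, e in exps.items()}


def sample_next(logits: List[float], prev_ids: Sequence[int], rng: random.Random, **kwargs) -> int:
    """Draw one token id from the filtered, renormalized distribution."""
    dist = sampling_distribution(logits, prev_ids, **kwargs)
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.5, 0.3, -1.0]  # toy 4-token vocabulary
    print(sampling_distribution(toy_logits, prev_ids=[0], top_k=2))
    print(sample_next(toy_logits, [0], random.Random(0), top_k=2))
```

At temperature 0.2 the distribution is already sharp, so top_k mainly cuts off the long tail, and a penalty of 1.05 only mildly discourages repeating earlier tokens.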
| Model | Description |
|---|---|
| LFM2.5-8B-A1B | Original model checkpoint in native format. Best for fine-tuning or inference with Transformers, vLLM, and SGLang. |
| LFM2.5-8B-A1B-GGUF | Quantized format for llama.cpp and compatible tools. Optimized for edge inference and local deployment. |
| LFM2.5-8B-A1B-ONNX | ONNX Runtime format for cross-platform deployment. |
| LFM2.5-8B-A1B-MLX | MLX format for Apple Silicon. Optimized for fast inference on Mac devices. |
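If a deployment script has to choose among these variants, the table can be encoded as a small lookup. The helper below is hypothetical: the runtime keys are names chosen for illustration, and the "LiquidAI/" organization prefix is assumed from the model_id used in the Transformers quick start.

```python
# Hypothetical lookup from target runtime to the checkpoint variant in the table.
CHECKPOINTS = {
    "transformers": "LFM2.5-8B-A1B",
    "vllm": "LFM2.5-8B-A1B",
    "sglang": "LFM2.5-8B-A1B",
    "llama.cpp": "LFM2.5-8B-A1B-GGUF",
    "onnxruntime": "LFM2.5-8B-A1B-ONNX",
    "mlx": "LFM2.5-8B-A1B-MLX",
}


def pick_checkpoint(runtime: str, org: str = "LiquidAI") -> str:
    """Return an '<org>/<checkpoint>' id for a runtime name (case-insensitive)."""
    key = runtime.strip().lower()
    if key not in CHECKPOINTS:
        raise ValueError(f"no checkpoint listed for {runtime!r}; known: {sorted(CHECKPOINTS)}")
    return f"{org}/{CHECKPOINTS[key]}"


if __name__ == "__main__":
    for rt in ("Transformers", "llama.cpp", "MLX"):
        print(rt, "->", pick_checkpoint(rt))
```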
We recommend using LFM2.5-8B-A1B for agentic workflows, tool use, structured outputs, multilingual assistants, and on-device personal-assistant applications. It is not the best fit for heavy programming or knowledge-intensive question answering without retrieval.
Inference
LFM2.5-8B-A1B is supported by many inference frameworks. See the Inference documentation for the full list.
| Name | Description | Docs | Notebook |
|---|---|---|---|
| Transformers | Simple inference with direct access to model internals. | Link | Colab link |
| vLLM | High-throughput production deployments with GPU. | Link | Colab link |
| llama.cpp | Cross-platform inference with CPU offloading. | Link | Colab link |
| MLX | Apple's machine learning framework optimized for Apple Silicon. | Link | N/A |
| LM Studio | Desktop application for running LLMs locally. | Link | N/A |
Quick start with Transformers (compatible with transformers>=5.0.0):
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_id = "LiquidAI/LFM2.5-8B-A1B-Base"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    dtype="bfloat16",
    # attn_implementation="flash_attention_2",  # uncomment on a compatible GPU
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Print tokens as they are generated, without echoing the prompt
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

prompt = "What is C. elegans?"
# return_dict=True returns both input_ids and attention_mask
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Recommended sampling settings: temperature 0.2, top_k 80, repetition_penalty 1.05
output = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.2,
    top_k=80,
    repetition_penalty=1.05,
    max_new_tokens=8192,
    streamer=streamer,
)
```
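The quick start requests up to 8,192 new tokens; with long prompts, the prompt plus the generated tokens must still fit in the 131,072-token context window. The hypothetical helper below clamps the generation budget accordingly; it is plain arithmetic and does not call any library.

```python
CONTEXT_LENGTH = 131_072  # context length from the model details


def max_new_tokens_budget(prompt_tokens: int, requested: int = 8192,
                          context_length: int = CONTEXT_LENGTH) -> int:
    """Clamp max_new_tokens so that prompt + generation fits in the context window."""
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    remaining = context_length - prompt_tokens
    if remaining <= 0:
        raise ValueError(f"prompt ({prompt_tokens} tokens) already fills the "
                         f"{context_length}-token context")
    return min(requested, remaining)


if __name__ == "__main__":
    print(max_new_tokens_budget(prompt_tokens=1_200))
    print(max_new_tokens_budget(prompt_tokens=130_000))
```

With the Transformers snippet, the prompt length can be read as inputs["input_ids"].shape[-1] and the result passed as max_new_tokens.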
Fine-Tuning
We recommend fine-tuning LFM2.5 for your specific use case to achieve the best results.
| Name | Description | Docs | Notebook |
|---|---|---|---|
| CPT (Unsloth) | Continued Pre-Training using Unsloth for text completion. | Link | Colab link |
| CPT (Unsloth) | Continued Pre-Training using Unsloth for translation. | Link | Colab link |
| SFT (Unsloth) | Supervised Fine-Tuning with LoRA using Unsloth. | Link | Colab link |
| SFT (TRL) | Supervised Fine-Tuning with LoRA using TRL. | Link | Colab link |
| DPO (TRL) | Direct Preference Optimization with LoRA using TRL. | Link | Colab link |
| GRPO (Unsloth) | GRPO with LoRA using Unsloth. | Link | Colab link |
| GRPO (TRL) | GRPO with LoRA using TRL. | Link | Colab link |
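Two ideas recur in the table above. "With LoRA" means the pretrained weight matrix W stays frozen and only a low-rank update B·A (rank r, scaled by alpha / r) is trained. GRPO scores several sampled completions per prompt and normalizes each reward against its own group's mean and standard deviation. The pure-Python sketch below illustrates both; it is not the Unsloth, TRL, or PEFT implementation, the function names are hypothetical, and the choice of population standard deviation is an assumption (implementations differ).

```python
import statistics
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    """Multiply a row-major matrix by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Sequence[float],
                 alpha: float, r: int) -> List[float]:
    """y = W x + (alpha / r) * B (A x).
    W: d_out x d_in (frozen), A: r x d_in and B: d_out x r (trainable)."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: normalize each completion's reward by the mean and
    standard deviation of the group sampled for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    A = [[0.5, 0.5]]      # r = 1
    B = [[0.0], [0.0]]    # B is commonly zero-initialized, so training starts at W x
    print(lora_forward(W, A, B, [2.0, 4.0], alpha=2.0, r=1))
    print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
```

Because B starts at zero in this sketch, the adapted layer initially reproduces the base model exactly, and only the r · (d_in + d_out) adapter entries need gradients.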
Contact
- Got questions or want to connect? Join our Discord community.
- If you are interested in custom solutions with edge deployment, please contact our sales team.
Citation
```bibtex
@article{liquidAI20268BA1B,
  author = {Liquid AI},
  title = {LFM2.5-8B-A1B: Personal Assistant On Your Laptop},
  journal = {Liquid AI Blog},
  year = {2026},
  note = {www.liquid.ai/blog/lfm2-5-8b-a1b},
}

@article{liquidai2025lfm2,
  title = {LFM2 Technical Report},
  author = {Liquid AI},
  journal = {arXiv preprint arXiv:2511.23404},
  year = {2025}
}
```
