
LFM2.5-8B-A1B-Base

LFM2.5 is a new family of hybrid models designed for on-device deployment. It builds on the LFM2 architecture with extended pre-training and reinforcement learning.

On-device personal assistant: Designed to power real-world applications that chain tool calls and follow complex instructions, running locally on a wide range of devices.
Compressed performance: Competitive with much larger dense and MoE models on instruction following and agentic tasks.
Unmatched throughput: Fastest in its size class on both CPU and GPU inference, with day-one support for llama.cpp, MLX, vLLM, and SGLang.

Find more information about LFM2.5-8B-A1B in our blog post.


*AA-Omniscience Index (higher is better) rewards correct answers and penalizes hallucinations. Scores range from -100 to 100. See more results on Artificial Analysis.

🗒️ Model Details

Model	Parameters	Description
LFM2.5-8B-A1B-Base	8.3B total / 1.5B active	Pre-trained base model for fine-tuning
LFM2.5-8B-A1B	8.3B total / 1.5B active	Reasoning-tuned general-purpose model
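The split between total and active parameters reflects a sparse mixture-of-experts (MoE) design: a router sends each token to a few experts, so only a fraction of the weights take part in any one forward pass. The plain-Python sketch below shows generic top-k softmax routing. The expert count, k, router logits, and toy experts are all hypothetical and do not describe LFM2.5's actual router.

```python
import math
from typing import Callable, List, Tuple


def top_k_route(router_logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize their scores.

    Generic top-k MoE gating sketch (HYPOTHETICAL setup), not LFM2.5's exact router.
    """
    if not 0 < k <= len(router_logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Indices of the k largest router logits, best first
    chosen = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over the selected logits only (subtract the max for stability)
    m = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - m) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def moe_layer(x: float, experts: List[Callable[[float], float]],
              router_logits: List[float], k: int) -> float:
    """Run only the routed experts and mix their outputs by gate weight."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


# HYPOTHETICAL example: 8 toy experts, each just scales its input
experts = [lambda x, s=s: s * x for s in range(8)]
logits = [0.1, 2.0, -1.0, 1.5, 0.0, 0.3, -0.5, 0.9]
routes = top_k_route(logits, k=2)
print(routes)  # experts 1 and 3 are selected
print(moe_layer(1.0, experts, logits, k=2))
```

Only the selected experts run for each token, so per-token compute follows the active count, while memory still has to hold every expert and therefore follows the total count.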

LFM2.5-8B-A1B is a general-purpose text-only model with the following features:

Total parameters: 8.3B
Active parameters: 1.5B
Number of layers: 24 (18 double-gated LIV conv + 6 GQA)
Training budget: 38 trillion tokens
Context length: 131,072 tokens
Vocabulary size: 128,000
Languages: English, Arabic, Chinese, French, German, Japanese, Korean, Portuguese, Spanish
Generation parameters: We recommend the following settings:
- temperature: 0.2
- top_k: 80
- repetition_penalty: 1.05
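The figures above also support a back-of-the-envelope memory estimate for on-device deployment. The sketch below takes the total parameter count, the 6 GQA layers, and the 131,072-token context from the list. The KV-head count and head dimension are HYPOTHETICAL placeholders, and the sketch assumes the convolution layers need no per-token cache, so read the real values from the checkpoint's config before relying on the numbers.

```python
# Rough memory estimate for LFM2.5-8B-A1B. Values marked HYPOTHETICAL are
# placeholders, not taken from the model config.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1, "int4": 0.5}

TOTAL_PARAMS = 8.3e9      # total parameters, from the spec list
N_ATTN_LAYERS = 6         # GQA layers, from the spec list
CONTEXT_LEN = 131_072     # context length in tokens, from the spec list
N_KV_HEADS = 8            # HYPOTHETICAL
HEAD_DIM = 64             # HYPOTHETICAL


def weight_bytes(n_params: float, dtype: str) -> float:
    """Every expert must be resident, so use the total parameter count."""
    return n_params * BYTES_PER_PARAM[dtype]


def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_value: float = 2) -> float:
    """Keys plus values (factor 2) per attention layer per cached token.

    ASSUMPTION: the convolution layers contribute no per-token cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


GIB = 1024 ** 3
for dtype in ("bfloat16", "int4"):
    print(f"weights ({dtype}): {weight_bytes(TOTAL_PARAMS, dtype) / GIB:.1f} GiB")
kv = kv_cache_bytes(CONTEXT_LEN, N_ATTN_LAYERS, N_KV_HEADS, HEAD_DIM)
print(f"KV cache at full context (bf16): {kv / GIB:.2f} GiB")
```

Under these assumptions, the same formula applied to a model whose 24 layers were all attention layers would give four times the KV cache, which illustrates why limiting attention to a few layers helps at long context.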

Model	Description
LFM2.5-8B-A1B	Original model checkpoint in native format. Best for fine-tuning or inference with Transformers, vLLM, and SGLang.
LFM2.5-8B-A1B-GGUF	Quantized format for llama.cpp and compatible tools. Optimized for edge inference and local deployment.
LFM2.5-8B-A1B-ONNX	ONNX Runtime format for cross-platform deployment.
LFM2.5-8B-A1B-MLX	MLX format for Apple Silicon. Optimized for fast inference on Mac devices.

We recommend LFM2.5-8B-A1B for agentic workflows, tool use, structured outputs, multilingual assistants, and on-device personal-assistant applications. It is less suited to programming-heavy tasks or to knowledge-intensive question answering without retrieval.

🏃 Inference

LFM2.5-8B-A1B is supported by many inference frameworks. See the Inference documentation for the full list.

Name	Description	Docs	Notebook
Transformers	Simple inference with direct access to model internals.	Link	Colab link
vLLM	High-throughput production deployments on GPUs.	Link	Colab link
llama.cpp	Cross-platform inference with CPU offloading.	Link	Colab link
MLX	Apple's machine learning framework optimized for Apple Silicon.	Link	—
LM Studio	Desktop application for running LLMs locally.	Link	—
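For the vLLM route, a common pattern is to start the server with vllm serve LiquidAI/LFM2.5-8B-A1B-Base (default port 8000) and call its OpenAI-compatible API. The sketch below builds a request body for the /v1/completions endpoint carrying the recommended sampling values; top_k and repetition_penalty are vLLM extensions to the OpenAI schema. Using raw-text completions rather than chat for this base checkpoint is an illustrative choice, and the helper names and URL are assumptions.

```python
import json
import urllib.request

# Recommended sampling settings from this model card
RECOMMENDED = {"temperature": 0.2, "top_k": 80, "repetition_penalty": 1.05}


def build_completion_request(prompt: str, max_tokens: int = 512,
                             model: str = "LiquidAI/LFM2.5-8B-A1B-Base") -> dict:
    """Hypothetical helper: body for vLLM's OpenAI-compatible /v1/completions."""
    return {"model": model, "prompt": prompt, "max_tokens": max_tokens, **RECOMMENDED}


def send_completion(body: dict,
                    url: str = "http://localhost:8000/v1/completions") -> str:
    """POST the body to a running server and return the generated text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),  # a data payload makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["text"]


body = build_completion_request("C. elegans is a")
print(json.dumps(body, indent=2))
# print(send_completion(body))  # requires a running vLLM server
```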

Quick start with Transformers (compatible with transformers>=5.0.0):

from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_id = "LiquidAI/LFM2.5-8B-A1B-Base"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    dtype="bfloat16",
    # attn_implementation="flash_attention_2",  # uncomment on a compatible GPU
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Print tokens as they are generated, without echoing the prompt
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

prompt = "What is C. elegans?"

# Apply the chat template and tokenize; return_dict=True returns
# input_ids and attention_mask together so both reach generate()
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
    tokenize=True,
).to(model.device)

# Sample with the recommended generation parameters
output = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.2,
    top_k=80,
    repetition_penalty=1.05,
    max_new_tokens=8192,
    streamer=streamer,
)
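The generate call above combines three sampling controls. The pure-Python sketch below shows what each one does to a vector of next-token logits: a CTRL-style repetition penalty on tokens already in the context (positive logits are divided by the penalty, negative ones multiplied), then temperature scaling, then top-k filtering, then sampling. It illustrates the general technique rather than Transformers' exact code path, and the toy vocabulary and logits are hypothetical.

```python
import math
import random
from typing import Iterable, List


def apply_repetition_penalty(logits: List[float], seen: Iterable[int],
                             penalty: float) -> List[float]:
    """CTRL-style penalty: lower the logits of tokens already seen."""
    out = list(logits)
    for t in set(seen):
        # Dividing a positive logit, or multiplying a negative one, lowers it
        out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
    return out


def apply_temperature(logits: List[float], temperature: float) -> List[float]:
    """Temperature below 1 sharpens the distribution; above 1 flattens it."""
    return [x / temperature for x in logits]


def top_k_filter(logits: List[float], k: int) -> List[float]:
    """Keep the k largest logits (ties at the threshold are kept); mask the rest."""
    if k >= len(logits):
        return list(logits)
    threshold = sorted(logits, reverse=True)[k - 1]
    return [x if x >= threshold else float("-inf") for x in logits]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


def sample_next(logits: List[float], seen: Iterable[int], temperature: float = 0.2,
                top_k: int = 80, repetition_penalty: float = 1.05,
                rng=random) -> int:
    """Penalty, then temperature, then top-k, then sample one token id."""
    x = apply_repetition_penalty(logits, seen, repetition_penalty)
    x = apply_temperature(x, temperature)
    x = top_k_filter(x, top_k)
    probs = softmax(x)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# HYPOTHETICAL 5-token vocabulary; token 0 already appeared in the context
logits = [2.0, 1.9, 0.5, -1.0, -3.0]
print(softmax(apply_temperature(logits, 1.0)))  # flatter
print(softmax(apply_temperature(logits, 0.2)))  # much sharper
print(sample_next(logits, seen=[0], top_k=2, rng=random.Random(0)))
```

At temperature 0.2 nearly all probability mass concentrates on the top few tokens, so top_k=80 mainly acts as a safety cap on the long tail.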

🔧 Fine-Tuning

We recommend fine-tuning LFM2.5 for your specific use case to achieve the best results.
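Most of the recipes below train LoRA adapters instead of all 8.3B weights. LoRA keeps a weight matrix W (d_out × d_in) frozen and learns a low-rank update scaled by alpha / r, with B of shape d_out × r and A of shape r × d_in, so each adapted matrix adds only r · (d_in + d_out) trainable parameters. The plain-Python sketch below shows the forward pass and the parameter count; the matrix sizes and rank are HYPOTHETICAL and are not LFM2.5's actual dimensions.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(M: Matrix, v: Vector) -> Vector:
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector,
                 alpha: float, r: int) -> Vector:
    """h = W x + (alpha / r) * B (A x); only A and B would be trained."""
    base = matvec(W, x)                # frozen pretrained path
    update = matvec(B, matvec(A, x))   # low-rank path through r dimensions
    scale = alpha / r
    return [h + scale * u for h, u in zip(base, update)]


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added by one adapted matrix (A plus B)."""
    return r * (d_in + d_out)


# HYPOTHETICAL sizes: one square 2048 x 2048 projection, rank 16
d, r = 2048, 16
full = d * d
lora = lora_param_count(d, d, r)
print(f"full: {full:,}  LoRA: {lora:,}  ratio: {lora / full:.2%}")
```

In the standard LoRA initialization B starts at zero, so the adapted model initially reproduces the base model exactly and training moves it away gradually.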

Name	Description	Docs	Notebook
CPT (Unsloth)	Continued Pre-Training using Unsloth for text completion.	Link	Colab link
CPT (Unsloth)	Continued Pre-Training using Unsloth for translation.	Link	Colab link
SFT (Unsloth)	Supervised Fine-Tuning with LoRA using Unsloth.	Link	Colab link
SFT (TRL)	Supervised Fine-Tuning with LoRA using TRL.	Link	Colab link
DPO (TRL)	Direct Preference Optimization with LoRA using TRL.	Link	Colab link
GRPO (Unsloth)	Group Relative Policy Optimization (GRPO) with LoRA using Unsloth.	Link	Colab link
GRPO (TRL)	GRPO with LoRA using TRL.	Link	Colab link
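The DPO and GRPO recipes optimize preference and reward signals rather than next-token targets. As a hedged sketch of the standard formulations, not the exact TRL or Unsloth code: DPO scores a chosen/rejected pair by how much the policy raises the chosen response's log-probability relative to a frozen reference model, with loss -log σ(β · margin); GRPO samples a group of completions per prompt and uses each reward's deviation from the group mean, divided by the group standard deviation, as its advantage, so no separate value model is needed. The log-probabilities, rewards, and β below are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import List


def _softplus(t: float) -> float:
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def dpo_loss(pi_chosen: float, pi_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one pair; inputs are summed sequence log-probabilities."""
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return _softplus(-beta * margin)  # equals -log(sigmoid(beta * margin))


def grpo_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Group-relative advantages for one prompt's sampled completions.

    Uses the population standard deviation; implementations differ on this detail.
    """
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# HYPOTHETICAL values
print(dpo_loss(-12.0, -15.0, -13.0, -14.0))   # policy favors chosen more than ref does
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # rewarded samples get positive advantage
```

A DPO loss below log 2 means the policy already prefers the chosen response more strongly than the reference does; a GRPO group with identical rewards yields zero advantages and therefore no learning signal for that prompt.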

📬 Contact

Got questions or want to connect? Join our Discord community.
If you are interested in custom solutions with edge deployment, please contact our sales team.

Citation

@article{liquidAI20268BA1B,
 author = {Liquid AI},
 title = {LFM2.5-8B-A1B: Personal Assistant On Your Laptop},
 journal = {Liquid AI Blog},
 year = {2026},
 note = {www.liquid.ai/blog/lfm2-5-8b-a1b},
}

@article{liquidai2025lfm2,
 title = {LFM2 Technical Report},
 author = {Liquid AI},
 journal = {arXiv preprint arXiv:2511.23404},
 year = {2025}
}


Safetensors model size: 8B params
Tensor types: F32, BF16


Paper: LFM2 Technical Report, arXiv 2511.23404, published Nov 28, 2025

URL: https://huggingface.co/LiquidAI/LFM2.5-8B-A1B-Base
