URL: https://huggingface.co/LiquidAI/LFM2.5-8B-A1B-Base

LFM2.5-8B-A1B-Base

LFM2.5 is a new family of hybrid models designed for on-device deployment. It builds on the LFM2 architecture with extended pre-training and reinforcement learning.

  • On-device personal assistant: Designed to power real-world applications by chaining tool calls and following complex instructions on all devices.
  • Compressed performance: Competitive with much larger dense and MoE models on instruction following and agentic tasks.
  • Unmatched throughput: Fastest in its size class on both CPU and GPU inference, with day-one support for llama.cpp, MLX, vLLM, and SGLang.

Find more information about LFM2.5-8B-A1B in our blog post.

*AA-Omniscience Index (higher is better) rewards correct answers and penalizes hallucinations. Scores range from -100 to 100. See more results on Artificial Analysis.

🗒️ Model Details

| Model | Parameters | Description |
| --- | --- | --- |
| LFM2.5-8B-A1B-Base | 8.3B total / 1.5B active | Pre-trained base model for fine-tuning |
| LFM2.5-8B-A1B | 8.3B total / 1.5B active | Reasoning-tuned general-purpose model |

LFM2.5-8B-A1B is a general-purpose text-only model with the following features:

  • Total parameters: 8.3B
  • Active parameters: 1.5B
  • Number of layers: 24 (18 double-gated LIV conv + 6 GQA)
  • Training budget: 38 trillion tokens
  • Context length: 131,072
  • Vocabulary size: 128,000
  • Languages: English, Arabic, Chinese, French, German, Japanese, Korean, Portuguese, Spanish
  • Generation parameters: We recommend the following parameters:
    • temperature: 0.2
    • top_k: 80
    • repetition_penalty: 1.05
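
To make the total versus active parameter figures above concrete, here is a minimal Python sketch of the arithmetic. It assumes weights are stored at a fixed number of bytes per parameter and ignores activations, caches, and quantization metadata, so the memory numbers are rough lower bounds rather than measured footprints. The function names are illustrative, not part of any library.

```python
TOTAL_PARAMS = 8.3e9   # total parameters, from the list above
ACTIVE_PARAMS = 1.5e9  # parameters active per token, from the list above


def active_fraction(total: float, active: float) -> float:
    """Share of the weights that participate in a single forward pass."""
    return active / total


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GB (10^9 bytes).

    Assumption: uniform precision, no overhead for scales or metadata.
    """
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    print(f"active fraction: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.1%}")
    for name, nbytes in [("bf16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
        print(f"{name}: ~{weight_memory_gb(TOTAL_PARAMS, nbytes):.1f} GB of weights")
```

All 8.3B weights still need to be stored, while only about 1.5B of them are used per token, which is why a mixture-of-experts model of this shape has the memory cost of its total size but a per-token compute cost closer to its active size.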

| Model | Description |
| --- | --- |
| LFM2.5-8B-A1B | Original model checkpoint in native format. Best for fine-tuning or inference with Transformers, vLLM, and SGLang. |
| LFM2.5-8B-A1B-GGUF | Quantized format for llama.cpp and compatible tools. Optimized for edge inference and local deployment. |
| LFM2.5-8B-A1B-ONNX | ONNX Runtime format for cross-platform deployment. |
| LFM2.5-8B-A1B-MLX | MLX format for Apple Silicon. Optimized for fast inference on Mac devices. |

We recommend using LFM2.5-8B-A1B for agentic workflows, tool use, structured outputs, multilingual assistants, and on-device personal-assistant applications. It is not the best fit for heavy programming or knowledge-intensive question answering without retrieval.

🏃 Inference

LFM2.5-8B-A1B is supported by many inference frameworks. See the Inference documentation for the full list.

| Name | Description | Docs | Notebook |
| --- | --- | --- | --- |
| Transformers | Simple inference with direct access to model internals. | Link | Colab link |
| vLLM | High-throughput production deployments with GPU. | Link | Colab link |
| llama.cpp | Cross-platform inference with CPU offloading. | Link | Colab link |
| MLX | Apple's machine learning framework optimized for Apple Silicon. | Link | n/a |
| LM Studio | Desktop application for running LLMs locally. | Link | n/a |

Quick start with Transformers (compatible with transformers>=5.0.0):

from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_id = "LiquidAI/LFM2.5-8B-A1B-Base"

# Load the weights in bfloat16 and place them on the available device(s).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    dtype="bfloat16",
    # attn_implementation="flash_attention_2",  # uncomment on a compatible GPU
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Print decoded tokens to stdout as they are generated.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

prompt = "What is C. elegans?"

# return_dict=True yields both input_ids and attention_mask for generate().
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
    return_dict=True,
).to(model.device)

# Sample with the recommended generation parameters.
output = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.2,
    top_k=80,
    repetition_penalty=1.05,
    max_new_tokens=8192,
    streamer=streamer,
)
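
For readers who want to see what the sampling arguments passed to `generate` do to the next-token distribution, the following is a dependency-free sketch. It is an illustration under assumptions, not the Transformers implementation: it applies a CTRL-style repetition penalty, then temperature scaling, then top-k filtering, which is the usual ordering of these steps, and all function names are hypothetical.

```python
import math
import random


def apply_repetition_penalty(logits, seen_ids, penalty):
    """Make already-generated tokens less likely (CTRL-style penalty)."""
    out = list(logits)
    for i in set(seen_ids):
        # Positive logits are divided, negative logits are multiplied,
        # so the token's score always moves down when penalty > 1.
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out


def top_k_filter(logits, k):
    """Keep the k largest logits and mask the rest (ties at the cutoff are kept)."""
    if k >= len(logits):
        return list(logits)
    cutoff = sorted(logits, reverse=True)[k - 1]
    return [x if x >= cutoff else -math.inf for x in logits]


def softmax(logits):
    """Numerically stable softmax; masked (-inf) entries get probability 0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_probs(logits, seen_ids, temperature=0.2, top_k=80,
                     repetition_penalty=1.05):
    """Penalty, then temperature, then top-k, then normalize."""
    logits = apply_repetition_penalty(logits, seen_ids, repetition_penalty)
    logits = [x / temperature for x in logits]  # temperature < 1 sharpens
    logits = top_k_filter(logits, top_k)
    return softmax(logits)


def sample(probs, rng=random):
    """Draw one token id from the distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.5, -1.0]  # toy 4-token vocabulary
    probs = next_token_probs(toy_logits, seen_ids=[0], top_k=2)
    print(probs, sample(probs))
```

With a temperature as low as 0.2, the distribution is strongly concentrated on the highest-scoring tokens, so the top-k cutoff of 80 mostly acts as a safety net against rare low-probability tokens.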

🔧 Fine-Tuning

We recommend fine-tuning LFM2.5 for your specific use case to achieve the best results.

| Name | Description | Docs | Notebook |
| --- | --- | --- | --- |
| CPT (Unsloth) | Continued Pre-Training using Unsloth for text completion. | Link | Colab link |
| CPT (Unsloth) | Continued Pre-Training using Unsloth for translation. | Link | Colab link |
| SFT (Unsloth) | Supervised Fine-Tuning with LoRA using Unsloth. | Link | Colab link |
| SFT (TRL) | Supervised Fine-Tuning with LoRA using TRL. | Link | Colab link |
| DPO (TRL) | Direct Preference Optimization with LoRA using TRL. | Link | Colab link |
| GRPO (Unsloth) | GRPO with LoRA using Unsloth. | Link | Colab link |
| GRPO (TRL) | GRPO with LoRA using TRL. | Link | Colab link |
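
Several of the recipes above use LoRA, which trains two small low-rank factors per adapted weight matrix instead of the full matrix. The sketch below shows why that is cheap. The layer shape and rank are hypothetical values chosen for illustration; they are not the actual dimensions of LFM2.5-8B-A1B, and the function names are not from any library.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Parameters in a dense weight matrix of shape (d_out, d_in)."""
    return d_in * d_out


def lora_added_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters in the LoRA factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    # Hypothetical projection size and rank, for illustration only.
    d_in, d_out, rank = 2048, 2048, 16
    added = lora_added_params(d_in, d_out, rank)
    full = full_params(d_in, d_out)
    print(f"LoRA trains {added:,} of {full:,} parameters ({added / full:.2%})")
```

Because the added parameter count grows linearly with the rank and with the layer width, while the dense matrix grows with the product of its dimensions, the trainable fraction stays small even for large layers.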

📬 Contact

Citation

@article{liquidAI20268BA1B,
 author = {Liquid AI},
 title = {LFM2.5-8B-A1B: Personal Assistant On Your Laptop},
 journal = {Liquid AI Blog},
 year = {2026},
 note = {www.liquid.ai/blog/lfm2-5-8b-a1b},
}
@article{liquidai2025lfm2,
 title = {LFM2 Technical Report},
 author = {Liquid AI},
 journal = {arXiv preprint arXiv:2511.23404},
 year = {2025}
}