URL: https://huggingface.co/LiquidAI/LFM2-24B-A2B

LFM2-24B-A2B

LFM2 is a family of hybrid models designed for on-device deployment. LFM2-24B-A2B is the largest model in the family, scaling the architecture to 24 billion parameters while keeping inference efficient.

  • Best-in-class efficiency: A 24B MoE model with only 2B active parameters per token, fitting in 32 GB of RAM for deployment on consumer laptops and desktops.
  • Fast edge inference: 112 tok/s decode on AMD CPU, 293 tok/s on H100. Fits in 32 GB of RAM, with day-one support in llama.cpp, vLLM, and SGLang.
  • Predictable scaling: Quality improves log-linearly from 350M to 24B total parameters, confirming the LFM2 hybrid architecture scales reliably across nearly two orders of magnitude.
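
As a rough sanity check on the 32 GB figure, the weight footprint can be estimated from the parameter count and the number of bits stored per weight. The sketch below is back-of-the-envelope arithmetic, not a measurement: the 4.5 bits-per-weight value for a 4-bit quantization is an assumption (real quantized formats vary), the helper name `weight_memory_gb` is hypothetical, and the estimate ignores the KV cache, activations, and runtime overhead.

```python
def weight_memory_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 24e9  # total parameters, as stated in this model card

# Unquantized BF16 weights use 16 bits each.
bf16_gb = weight_memory_gb(TOTAL_PARAMS, 16)

# ASSUMPTION: about 4.5 bits per weight for a 4-bit quantized checkpoint.
q4_gb = weight_memory_gb(TOTAL_PARAMS, 4.5)

print(f"BF16 weights:  {bf16_gb:.1f} GB")
print(f"4-bit weights: {q4_gb:.1f} GB")
```

Under these assumptions the BF16 weights alone would exceed 32 GB, while a 4-bit quantized version leaves headroom for the cache and the rest of the system.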

Find more information about LFM2-24B-A2B in our blog post.

Model Details

LFM2-24B-A2B is a general-purpose instruct model (without reasoning traces) with the following features:

| Property | LFM2-8B-A1B | LFM2-24B-A2B |
| --- | --- | --- |
| Total parameters | 8.3B | 24B |
| Active parameters | 1.5B | 2.3B |
| Layers | 24 (18 conv + 6 attn) | 40 (30 conv + 10 attn) |
| Context length | 32,768 tokens | 32,768 tokens |
| Vocabulary size | 65,536 | 65,536 |
| Training precision | Mixed BF16/FP8 | Mixed BF16/FP8 |
| Training budget | 12 trillion tokens | 17 trillion tokens |
| License | LFM Open License v1.0 | LFM Open License v1.0 |

Supported languages: English, Arabic, Chinese, French, German, Japanese, Korean, Spanish, Portuguese

Generation parameters:

  • temperature: 0.1
  • top_k: 50
  • repetition_penalty: 1.05

We recommend the following use cases:

  • Agentic tool use: Native function calling, web search, structured outputs. Ideal as the fast inner-loop model in multi-step agent pipelines.
  • Offline document summarization and Q&A: Run entirely on consumer hardware for privacy-sensitive workflows (legal, medical, corporate).
  • Privacy-preserving customer support agent: Deployed on-premise at a company, handles multi-turn support conversations with tool access (database lookups, ticket creation) without data leaving the network.
  • Local RAG pipelines: Serve as the generation backbone in retrieval-augmented setups on a single machine without GPU servers.

We don't recommend using it for coding, as it wasn't optimized for this purpose.

Chat Template

LFM2-24B-A2B uses a ChatML-like format. See the Chat Template documentation for details. Example:

<|startoftext|><|im_start|>system
You are a helpful assistant trained by Liquid AI.<|im_end|>
<|im_start|>user
What is C. elegans?<|im_end|>
<|im_start|>assistant

You can use tokenizer.apply_chat_template() to format your messages automatically.
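
To make the layout concrete, here is a minimal sketch that reproduces the example above by hand. `format_chat` is a hypothetical helper written for illustration. It assumes the template is exactly the layout shown in the example and does not cover tools or other features the real template may handle, so prefer tokenizer.apply_chat_template() in practice.

```python
def format_chat(messages, add_generation_prompt=True):
    """Render messages in the ChatML-like layout shown above (illustrative only)."""
    parts = ["<|startoftext|>"]
    for message in messages:
        # Each turn is wrapped as <|im_start|>{role}\n{content}<|im_end|>\n
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant trained by Liquid AI."},
    {"role": "user", "content": "What is C. elegans?"},
]
prompt_text = format_chat(messages)
print(prompt_text)
```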

Tool Use

LFM2-24B-A2B supports function calling as follows:

  1. Function definition: We recommend providing the list of tools as a JSON object in the system prompt. You can also use the tokenizer.apply_chat_template() function with tools.
  2. Function call: By default, LFM2-24B-A2B writes Pythonic function calls (a Python list between <|tool_call_start|> and <|tool_call_end|> special tokens), as the assistant answer. You can override this behavior by asking the model to output JSON function calls in the system prompt.
  3. Function execution: The function call is executed, and the result is returned in a message with the "tool" role.
  4. Final answer: LFM2-24B-A2B interprets the outcome of the function call to address the original user prompt in plain text.
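
Step 1 can be sketched as follows: the tool list is serialized to JSON and placed in the system prompt. The "List of tools:" prefix follows the example in this section. `build_tool_system_prompt` is a hypothetical helper, and the exact wording the model expects is an assumption, so the Tool Use documentation remains the authoritative reference.

```python
import json


def build_tool_system_prompt(tools):
    """Return a system prompt that lists the available tools as JSON."""
    return "List of tools: " + json.dumps(tools)


tools = [
    {
        "name": "get_candidate_status",
        "description": "Retrieves the current status of a candidate in the recruitment process",
        "parameters": {
            "type": "object",
            "properties": {
                "candidate_id": {
                    "type": "string",
                    "description": "Unique identifier for the candidate",
                }
            },
            "required": ["candidate_id"],
        },
    }
]

system_prompt = build_tool_system_prompt(tools)
print(system_prompt)
```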

See the Tool Use documentation for the full guide. Example:

<|startoftext|><|im_start|>system
List of tools: [{"name": "get_candidate_status", "description": "Retrieves the current status of a candidate in the recruitment process", "parameters": {"type": "object", "properties": {"candidate_id": {"type": "string", "description": "Unique identifier for the candidate"}}, "required": ["candidate_id"]}}]<|im_end|>
<|im_start|>user
What is the current status of candidate ID 12345?<|im_end|>
<|im_start|>assistant
<|tool_call_start|>[get_candidate_status(candidate_id="12345")]<|tool_call_end|>Checking the current status of candidate ID 12345.<|im_end|>
<|im_start|>tool
[{"candidate_id": "12345", "status": "Interview Scheduled", "position": "Clinical Research Associate", "date": "2023-11-20"}]<|im_end|>
<|im_start|>assistant
The candidate with ID 12345 is currently in the "Interview Scheduled" stage for the position of Clinical Research Associate, with an interview date set for 2023-11-20.<|im_end|>
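
On the application side, the Pythonic call in the assistant turn has to be extracted and parsed before the function can run. The sketch below shows one possible approach using Python's `ast` module, so that model output is never passed to `eval`. `parse_tool_calls` is a hypothetical helper, and it assumes calls use keyword arguments with literal values, as in the example above.

```python
import ast
import re

# Matches the text between the tool-call special tokens.
TOOL_CALL_RE = re.compile(
    r"<\|tool_call_start\|>(.*?)<\|tool_call_end\|>", re.DOTALL
)


def parse_tool_calls(assistant_text):
    """Return a list of (function_name, kwargs) parsed from Pythonic tool calls."""
    calls = []
    for block in TOOL_CALL_RE.findall(assistant_text):
        tree = ast.parse(block.strip(), mode="eval")
        if not isinstance(tree.body, ast.List):
            raise ValueError("expected a Python list of function calls")
        for node in tree.body.elts:
            if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
                raise ValueError("expected simple calls such as name(arg=value)")
            if node.args or any(kw.arg is None for kw in node.keywords):
                raise ValueError("only keyword arguments are supported in this sketch")
            # literal_eval accepts AST nodes and only evaluates literals.
            kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
            calls.append((node.func.id, kwargs))
    return calls


reply = (
    '<|tool_call_start|>[get_candidate_status(candidate_id="12345")]'
    "<|tool_call_end|>Checking the current status of candidate ID 12345."
)
parsed_calls = parse_tool_calls(reply)
print(parsed_calls)
```

Each parsed call can then be dispatched to the matching function, with its result sent back as the content of a "tool" message.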

Inference

LFM2-24B-A2B is supported by many inference frameworks. See the Inference documentation for the full list.

| Name | Description | Docs | Notebook |
| --- | --- | --- | --- |
| Transformers | Simple inference with direct access to model internals. | Link | Colab link |
| vLLM | High-throughput production deployments with GPU. | Link | Colab link |
| llama.cpp | Cross-platform inference with CPU offloading. | Link | Colab link |
| MLX | Apple's machine learning framework optimized for Apple Silicon. | Link | - |
| LM Studio | Desktop application for running LLMs locally. | Link | - |

Here's a quick start example with Transformers (compatible with transformers>=5.0.0):

from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_id = "LiquidAI/LFM2-24B-A2B"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    dtype="bfloat16",
    # attn_implementation="flash_attention_2",  # uncomment on a compatible GPU
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

prompt = "What is C. elegans?"

input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
).to(model.device)

output = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.1,
    top_k=50,
    repetition_penalty=1.05,
    max_new_tokens=512,
    streamer=streamer,
)

Fine-Tuning

| Name | Description | Docs | Notebook |
| --- | --- | --- | --- |
| CPT (Unsloth) | Continued Pre-Training using Unsloth for text completion. | Link | Colab link |
| CPT (Unsloth) | Continued Pre-Training using Unsloth for translation. | Link | Colab link |
| SFT (Unsloth) | Supervised Fine-Tuning with LoRA using Unsloth. | Link | Colab link |
| SFT (TRL) | Supervised Fine-Tuning with LoRA using TRL. | Link | Colab link |
| DPO (TRL) | Direct Preference Optimization with LoRA using TRL. | Link | Colab link |
| GRPO (Unsloth) | GRPO with LoRA using Unsloth. | Link | Colab link |
| GRPO (TRL) | GRPO with LoRA using TRL. | Link | Colab link |

Performance

CPU Inference

We compared LFM2-24B-A2B against two popular MoE models of similar size: Qwen3-30B-A3B-Instruct-2507 (30.5B total, 3.3B active parameters) and gpt-oss-20b (21B total, 3.6B active parameters). We measured both prefill and decode throughput with Q4_K_M versions of these models using llama.cpp on an AMD Ryzen AI Max+ 395.

GPU Inference

We also report throughput (total tokens / wall time) achieved with vLLM on a single H100 SXM5 GPU.

Contact

Citation

@article{liquidAI202624B,
 author = {Liquid AI},
 title = {LFM2-24B-A2B: Scaling Up the LFM2 Architecture},
 journal = {Liquid AI Blog},
 year = {2026},
 note = {www.liquid.ai/blog/},
}
@article{liquidai2025lfm2,
 title={LFM2 Technical Report},
 author={Liquid AI},
 journal={arXiv preprint arXiv:2511.23404},
 year={2025}
}