KeyLM-75M-Instruct-GGUF

GGUF builds of KeyLM-75M-Instruct for llama.cpp, LM Studio, Ollama, and other GGUF runtimes.

KeyLM is a 75M-parameter instruction-tuned language model trained from scratch on approximately 18 billion tokens. See the main model card for benchmarks, training details, limitations, and the transformers (safetensors) version.

Files

File	Quant	Size	Notes
`KeyLM-75M-Instruct.F16.gguf`	F16	~144 MB	Unquantized 16-bit weights; recommended. The model is already tiny, so there is little reason to quantize it.
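
As a rough sanity check on the listed size, the sketch below multiplies the parameter count by the width of an F16 weight. It assumes the 75.3M parameter figure reported for this repo and 2 bytes per weight, and it ignores GGUF metadata and tokenizer data, so the real file will be slightly larger.

```shell
# Rough size estimate: parameters x bytes per weight (F16 = 2 bytes).
# ASSUMPTION: 75.3M parameters, no metadata overhead counted.
params=75300000
bytes_per_weight=2
total_bytes=$((params * bytes_per_weight))

# Convert bytes to MiB (1 MiB = 1024 * 1024 bytes); LC_ALL=C keeps the decimal point.
size_mib=$(LC_ALL=C awk -v b="$total_bytes" 'BEGIN { printf "%.1f", b / (1024 * 1024) }')
echo "approx ${size_mib} MiB"
```

The estimate lands close to the ~144 MB in the table, which suggests the listed size is expressed in binary megabytes.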

Run with llama.cpp

# straight from the Hub
llama-cli -hf Eclipse-Senpai/KeyLM-75M-Instruct-GGUF -cnv

# or a local file
llama-cli -m KeyLM-75M-Instruct.F16.gguf -cnv

The chat template (User: / Assistant:, assistant turns ending with </s>) is embedded in the GGUF, so conversation mode (-cnv) applies it automatically.
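
For readers who want to see roughly what the template produces, here is a minimal sketch of the User: / Assistant: layout described above. The helper names `render_exchange` and `render_prompt` are hypothetical, and the exact spacing and newlines are assumptions; the template embedded in the GGUF is authoritative.

```shell
# Hypothetical helpers that mimic the User:/Assistant: layout.
# ASSUMPTION: one space after each role label and a newline between turns.

# A completed exchange: the assistant reply is closed with </s>.
render_exchange() {
  printf 'User: %s\nAssistant: %s</s>\n' "$1" "$2"
}

# An open prompt: ends at "Assistant:" so the model writes the reply.
render_prompt() {
  printf 'User: %s\nAssistant:' "$1"
}

# Command substitution strips the trailing newline, so it is re-added by hand.
history=$(render_exchange "Hi." "Hello!")
prompt="${history}
$(render_prompt "Name three colors.")"

printf '%s\n' "$prompt"
```

In conversation mode none of this is needed. A string built this way would only matter when passing a raw prompt with -p.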

LM Studio / Ollama

LM Studio: load the .gguf; the embedded chat template is detected automatically.
Ollama: ollama run hf.co/Eclipse-Senpai/KeyLM-75M-Instruct-GGUF
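
If the file is already downloaded, Ollama can also import a local GGUF through a Modelfile. The sketch below is one way to do that; the model name `keylm-local` is a hypothetical choice, and the source does not confirm whether Ollama picks up the embedded chat template on a local import, so the hf.co route above is the safer default.

```shell
# Write a minimal Modelfile that points at the local GGUF file.
cat > Modelfile <<'EOF'
FROM ./KeyLM-75M-Instruct.F16.gguf
EOF

# Register the model only if ollama is installed and the file is present.
# "keylm-local" is a hypothetical name chosen for this example.
if command -v ollama >/dev/null 2>&1 && [ -f KeyLM-75M-Instruct.F16.gguf ]; then
  ollama create keylm-local -f Modelfile
fi

# Afterwards: ollama run keylm-local
```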

Notes & limitations

KeyLM is a tiny model: it handles simple instruction following and short chat, but scores near random chance on knowledge and reasoning benchmarks. It is not a factual assistant. Full numbers and caveats are on the main model card.

License

Apache 2.0.

GGUF

Model size: 75.3M params
Architecture: qwen3
Hardware compatibility: 16-bit

Model tree for MinimaLabs/KeyLM-75M-Instruct-GGUF

Base model: MinimaLabs/KeyLM-75M
Finetuned: MinimaLabs/KeyLM-75M-Instruct
Quantized (2): this model

URL: https://huggingface.co/MinimaLabs/KeyLM-75M-Instruct-GGUF