KeyLM-75M-Instruct-GGUF
GGUF builds of KeyLM-75M-Instruct for llama.cpp, LM Studio, Ollama, and other GGUF runtimes.
KeyLM is a 75M-parameter instruction-tuned language model trained from scratch on approximately 18 billion tokens. See the main model card for benchmarks, training details, limitations, and the transformers (safetensors) version.
Files
| File | Quant | Size | Notes |
|---|---|---|---|
| KeyLM-75M-Instruct.F16.gguf | F16 | ~144 MB | Unquantized (16-bit); recommended. The model is already tiny, so there is little reason to quantize further. |
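As a rough sanity check on the size column, 75.3M parameters at 16 bits each come to about 143.6 MiB, which appears to be what "~144 MB" refers to. The sketch below (the `gguf_size_mib` name is hypothetical) estimates tensor data only. Real files also carry tokenizer data and metadata, and quantized types usually keep some tensors at higher precision. For comparison, llama.cpp's Q8_0 stores blocks of 32 int8 weights plus one f16 scale, about 8.5 bits per weight, so it would save only about 70 MiB here:

```sh
# Hypothetical helper: approximate tensor-data size in MiB.
# $1 = parameter count, $2 = bits per weight (16 for F16, ~8.5 for Q8_0).
# Ignores GGUF metadata/tokenizer bytes and mixed-precision tensors.
gguf_size_mib() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 / 1048576 }'
}

# gguf_size_mib 75300000 16    # F16  -> 143.6
# gguf_size_mib 75300000 8.5   # Q8_0 -> 76.3
```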
Run with llama.cpp
```sh
# straight from the Hub
llama-cli -hf Eclipse-Senpai/KeyLM-75M-Instruct-GGUF -cnv
# or a local file
llama-cli -m KeyLM-75M-Instruct.F16.gguf -cnv
```
The chat template (`User:` / `Assistant:`, with assistant turns ending in `</s>`) is embedded in the GGUF, so conversation mode (`-cnv`) applies it automatically.
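Some runtimes or raw-completion setups do not read the embedded template. For those cases, here is a minimal POSIX `sh` sketch of how a conversation might be laid out in the `User:` / `Assistant:` format. The function name `render_prompt` is hypothetical, and the exact spacing and newlines are assumptions. The template stored in the GGUF is authoritative.

```sh
# Hypothetical helper: render alternating turns in the User:/Assistant: style.
# Arguments alternate user, assistant, user, ...; pass an odd number so the
# last one is the new user message. Spacing/newlines are assumptions.
render_prompt() {
  i=0
  for msg in "$@"; do
    if [ $((i % 2)) -eq 0 ]; then
      printf 'User: %s\n' "$msg"            # user turn
    else
      printf 'Assistant: %s</s>\n' "$msg"   # finished assistant turn ends in </s>
    fi
    i=$((i + 1))
  done
  printf 'Assistant:'                       # open turn for the model to complete
}
```

For example, `render_prompt 'Hi' 'Hello!' 'What is GGUF?'` produces two `User:` turns, one closed `Assistant:` turn, and a trailing `Assistant:` for the model to continue.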
LM Studio / Ollama
- LM Studio: load the `.gguf` file; the embedded chat template is detected automatically.
- Ollama: `ollama run hf.co/Eclipse-Senpai/KeyLM-75M-Instruct-GGUF`
Notes & limitations
KeyLM is a tiny model. It handles simple instruction following and short chat well, but scores near random chance on knowledge and reasoning benchmarks. It is not a factual assistant. Full numbers and caveats are on the main model card.
License
Apache 2.0.
Model details
- Format: GGUF
- Model size: 75.3M params
- Architecture: qwen3
