20 items • Updated • 1
Menlo/Jan-nano - GGUF
This repository contains GGUF quantizations of Menlo/Jan-nano.
About GGUF
GGUF is a quantization method that allows you to run large language models on consumer hardware by reducing the precision of the model weights.
Files
| Filename | Quant type | File Size | Description |
|---|---|---|---|
| model-f16.gguf | f16 | Large | Original precision |
| model-q4_0.gguf | Q4_0 | Small | 4-bit quantization |
| model-q4_1.gguf | Q4_1 | Small | 4-bit quantization (higher quality) |
| model-q5_0.gguf | Q5_0 | Medium | 5-bit quantization |
| model-q5_1.gguf | Q5_1 | Medium | 5-bit quantization (higher quality) |
| model-q8_0.gguf | Q8_0 | Large | 8-bit quantization |
Usage
You can use these models with llama.cpp or any other GGUF-compatible inference engine.
llama.cpp
./llama-cli -m model-q4_0.gguf -p "Your prompt here"
Python (using llama-cpp-python)
from llama_cpp import Llama
llm = Llama(model_path="model-q4_0.gguf")
output = llm("Your prompt here", max_tokens=512)
print(output['choices'][0]['text'])
Original Model
This is a quantized version of Menlo/Jan-nano. Please refer to the original model card for more information about the model's capabilities, training data, and usage guidelines.
Conversion Details
- Converted using llama.cpp
- Original model downloaded from Hugging Face
- Multiple quantization levels provided for different use cases
License
This model inherits the license from the original model. Please check the original model's license for usage terms.
- Downloads last month
- 20
GGUF
Model size
4B params
Architecture
qwen3
Hardware compatibility
Log In to add your hardware
4-bit
5-bit
8-bit
16-bit
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support
