Menlo/Jan-nano - GGUF

This repository contains GGUF quantizations of Menlo/Jan-nano.

About GGUF

GGUF is a quantization method that allows you to run large language models on consumer hardware by reducing the precision of the model weights.

Files

Filename	Quant type	File Size	Description
model-f16.gguf	f16	Large	Original precision
model-q4_0.gguf	Q4_0	Small	4-bit quantization
model-q4_1.gguf	Q4_1	Small	4-bit quantization (higher quality)
model-q5_0.gguf	Q5_0	Medium	5-bit quantization
model-q5_1.gguf	Q5_1	Medium	5-bit quantization (higher quality)
model-q8_0.gguf	Q8_0	Large	8-bit quantization

Usage

You can use these models with llama.cpp or any other GGUF-compatible inference engine.

llama.cpp

./llama-cli -m model-q4_0.gguf -p "Your prompt here"

Python (using llama-cpp-python)

from llama_cpp import Llama

llm = Llama(model_path="model-q4_0.gguf")
output = llm("Your prompt here", max_tokens=512)
print(output['choices'][0]['text'])

Original Model

This is a quantized version of Menlo/Jan-nano. Please refer to the original model card for more information about the model's capabilities, training data, and usage guidelines.

Conversion Details

Converted using llama.cpp
Original model downloaded from Hugging Face
Multiple quantization levels provided for different use cases

License

This model inherits the license from the original model. Please check the original model's license for usage terms.

Downloads last month: 20

GGUF

Model size

4B params

Architecture

qwen3

Hardware compatibility

4-bit

5-bit

8-bit

16-bit

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for ReallyFloppyPenguin/Jan-nano-GGUF

Base model

Qwen/Qwen3-4B-Base

Finetuned

Qwen/Qwen3-4B

Finetuned

Menlo/Jan-nano

Quantized

(23)

this model

Collection including ReallyFloppyPenguin/Jan-nano-GGUF

20 items • Updated Mar 2 • 1

URL: https://huggingface.co/ReallyFloppyPenguin/Jan-nano-GGUF

⇱ ReallyFloppyPenguin/Jan-nano-GGUF · Hugging Face