VOOZH about

URL: https://huggingface.co/ramgpt/jan-nano-4b-gptqmodel-4bit

⇱ ramgpt/jan-nano-4b-gptqmodel-4bit · Hugging Face


Jan-nano GPTQ 4bit (vLLM-ready)

This is a 4-bit GPTQ quantized version of Menlo/Jan-nano, optimized for fast inference with vLLM.

  • Quantization: GPTQ (4-bit)
  • Group size: 128
  • Dtype: float16
  • Backend: gptqmodel
  • Max context length: 4096 tokens

🔧 Usage with vLLM

vllm serve ./jan-nano-4b-gptqmodel-4bit \
 --quantization gptq \
 --dtype half \
 --max-model-len 4096

📁 Files

  • Sharded .safetensors model weights
  • model.safetensors.index.json
  • tokenizer.json, tokenizer_config.json
  • config.json, generation_config.json, quantize_config.json (if available)

🙏 Credits

  • Original model by Menlo
  • Quantized and shared by ramgpt
Downloads last month
1,713
Safetensors
Model size
4B params
Tensor type
I32
·
BF16
·

Model tree for ramgpt/jan-nano-4b-gptqmodel-4bit

Finetuned
Qwen/Qwen3-4B
Finetuned
Menlo/Jan-nano
Quantized
(23)
this model