GPT-2 ChatML GGUF (no_robots SFT)

This repository contains GGUF quantized models converted from the fine-tuned JustACluelessKid2/gpt2-chatml-fp32.

gpt2-f32.gguf (252.5 MB) - Baseline F16-Embedding GGUF
ggml-model-Q8_0.gguf (136.7 MB) - High-fidelity 8-bit quantization
ggml-model-IQ4_NL.gguf (84.8 MB) - Highly-optimized 4-bit non-linear quantization
ggml-model-IQ4_XS.gguf (82.2 MB) - Imatrix optimized 4-bit quantization
ggml-model-Q6_K.gguf (106.7 MB) - High-quality 6-bit quantization
ggml-model-Q5_K_M.gguf (98.8 MB) - High-quality 5-bit quantization
ggml-model-IQ3_XXS.gguf (64.8 MB) - Imatrix 3-bit quantization (Chromebook-compatible)
ggml-model-IQ2_M.gguf (62.5 MB) - Imatrix optimized 2-bit quantization
ggml-model-IQ2_XXS.gguf (55.5 MB) - Ultra-low 2-bit quantization

These models were calibrated using an importance matrix computed on 1,000 shuffled conversational sequences.

GGUF

Model size

0.1B params

Architecture

gpt2

Hardware compatibility

2-bit

3-bit

4-bit

5-bit

6-bit

8-bit

32-bit

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for JustACluelessKid2/gpt2-chatml-fp32-GGUF

Base model

Finetuned

Quantized

(1)

this model