Voozh

3 min read

👁 sysoft profile

byeongsoo kang

Jun 11

Gemma 4 QAT on a 1080 Ti: What 'Quantization-Aware' Actually Buys — and Fitting the 12B on 8 GB at 16k

#llm #machinelearning #gemma #quantization

5 min read

👁 tech_nuggets profile

Tech_Nuggets

Jun 11

Quantization formats compared: GGUF vs GPTQ vs AWQ vs NF4

#llm #quantization #mlops #tutorial

7 min read

👁 soytuber profile

soy

Jun 10

INT8 Q/DQ Calibration on Blackwell: 1.8× the TRT 10 + FP16 Baseline

#tensorrt #quantization #gpu #machinelearning

7 min read

👁 pat9000 profile

Patrick Hughes

May 13

GGUF Quantization Explained: Q4_K_M vs Q5_K_M vs Q8 — Which to Pick (2026)

#llamacpp #gguf #quantization #localai

4 min read

👁 vystartasv profile

Vilius

May 9

1-bit, 545 megabytes, zero API keys — local AI that beats GPT-5.4

#ai #llm #local #quantization

2 reactions

1 comment

2 min read

👁 alanwest profile

Alan West

May 27

Why your quantized LLM loses its MTP heads and how to keep them

#machinelearning #llm #python #quantization

1 reaction

5 min read

👁 aman_sachan_126d19c4a2773 profile

Aman Sachan

Apr 30

KVQuant: Run 70B LLMs on 8GB RAM with KV Cache Quantization

#python #llm #quantization

1 min read

👁 aman_sachan_126d19c4a2773 profile

Aman Sachan

Apr 30

KVQuant: Run 70B LLMs on 8GB RAM with 4-bit KV Cache Quantization

#python #llm #quantization #optimization

1 min read

👁 alanwest profile

Alan West

Apr 18

Traditional Quantization vs 1.58-Bit Ternary Models: A Practical Comparison

#machinelearning #llm #quantization #ai

1 comment

5 min read

👁 mxguru1 profile

MxGuru

May 20

The Best Result This Week Was a Failed Prediction — Phase-3a Doesn't Transfer

#quantization #hsaq #methodology #granite

1 min read

👁 mxguru1 profile

MxGuru

May 20

Two Localizers, Both Wrong: Bounding a Quantization Cost That Wouldn't Close

#quantization #hsaq #methodology #granite

1 min read

👁 mxguru1 profile

MxGuru

May 20

When the Sensitivity Metric Lies: A Drift-Inversion Smoking Gun in Mixed-Precision LLM Quantization

#quantization #hsaq #awq #granite

8 min read

👁 denlava profile

Denis Lavrentyev

Apr 13

GIMP's Posterization: Simple Quantization vs. Median Cut for Better Visuals

#gimp #posterization #quantization #mediancut

8 min read

👁 plasmon_imp profile

plasmon

Apr 8

Q4 KV Cache Fit 32K Context into 8GB VRAM — Only Math Broke

#llm #quantization #vram #localllm

8 min read

URL: https://dev.to/t/quantization

Quantization - DEV Community

How to Pick a GGUF Quant Level for Your VRAM Budget
