Is it possible to make less than 1 bit quantization?

by RealBar - opened about 18 hours ago

I'm looking for any possible methods to make large frontier models 10x or 20x smaller. Maybe some weight-fusion techniques?

Gattouz0

about 17 hours ago

just don't install it atp

csabakecskemeti

about 16 hours ago

@RealBar you can prune the model

dugrema

about 9 hours ago

I have tried some older GLM models from Cerebras REAP (https://huggingface.co/collections/cerebras/cerebras-reap). They were pruned by about 20% and then quantized (e.g. by Unsloth). But that is still not another 10-20x on top of quantization. REAPed models work OK, but at that point you're probably just chasing shadows.

There are plenty of good-enough smaller models out there if you don't have a few spare million dollars in the bank to whip up terabytes of VRAM.

danielhanchen

Unsloth AI org about 8 hours ago

That'll be hard - 1-bit is currently 86% smaller and retains around 76.2% accuracy
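As a rough check on those numbers, "86% smaller" works out to about a 7x reduction, which is short of the 10-20x asked for. The sketch below only does that arithmetic; the 16-bit baseline is an assumption for illustration and is not stated in the post.

```python
def compression_factor(fraction_smaller):
    """How many times smaller the model is, given the fraction of size removed."""
    return 1.0 / (1.0 - fraction_smaller)


def effective_bits_per_weight(fraction_smaller, baseline_bits):
    """Average bits per weight left over, relative to an assumed baseline width."""
    return baseline_bits * (1.0 - fraction_smaller)


# 86% smaller is the figure quoted in the post.
factor = compression_factor(0.86)

# ASSUMPTION: a 16-bit baseline; the post does not say what the baseline is.
bpw_if_16bit = effective_bits_per_weight(0.86, baseline_bits=16)

print(f"compression factor: {factor:.2f}x")
print(f"average bits per weight (if baseline is 16-bit): {bpw_if_16bit:.2f}")
```

For comparison, a 10x reduction corresponds to 90% smaller and a 20x reduction to 95% smaller.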

nesymerp1

about 7 hours ago • edited about 7 hours ago

xD, man, how much I want to see IQ0_XXXXXS, but no, less-than-1-bit quantization isn't possible with today's compute. The tiniest unit in computing is a 1 or a 0, so... xD

unless I have been lied to

doruison

about 7 hours ago

You are crazy.

ilintar

about 6 hours ago

Yes, it is possible to do below-1-bit quantization, but it's not trivial. Basically, you have to pack individual values in tensors into groups and then quantize the groups, so you quantize something like a [0.5, 0.3, 1.2, 0.9] quadruple into, say, a single code [-2]. As long as the bit budget for each group is smaller than the number of values in the group, you get a below-1-bit quant on average.
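A minimal sketch of that grouping idea, assuming a tiny hand-written codebook (real schemes learn or design the codebook, and this is not how any particular GGUF quant type is implemented). Each group of 4 weights is replaced by a 2-bit index into a 4-entry codebook, so the stored cost is 2 / 4 = 0.5 bits per weight. The names `CODEBOOK`, `quantize`, and `dequantize` are made up for illustration.

```python
def nearest_code(group, codebook):
    """Index of the codebook entry closest to `group` (squared L2 distance)."""
    def dist(entry):
        return sum((g - e) ** 2 for g, e in zip(group, entry))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def quantize(weights, codebook, group_size):
    """Split weights into groups and store one codebook index per group."""
    groups = [weights[i:i + group_size] for i in range(0, len(weights), group_size)]
    return [nearest_code(g, codebook) for g in groups]


def dequantize(indices, codebook):
    """Rebuild approximate weights by looking each index up in the codebook."""
    return [value for i in indices for value in codebook[i]]


def bits_per_weight(codebook, group_size):
    """Bits needed for one index, divided by the weights that index covers."""
    index_bits = (len(codebook) - 1).bit_length()
    return index_bits / group_size


# ASSUMPTION: a hypothetical 4-entry codebook of 4-value patterns.
CODEBOOK = [
    [0.5, 0.5, 1.0, 1.0],
    [-0.5, -0.5, -1.0, -1.0],
    [1.0, -1.0, 1.0, -1.0],
    [0.0, 0.0, 0.0, 0.0],
]

weights = [0.5, 0.3, 1.2, 0.9, -0.1, 0.05, 0.0, 0.1]
indices = quantize(weights, CODEBOOK, group_size=4)
restored = dequantize(indices, CODEBOOK)
bpw = bits_per_weight(CODEBOOK, group_size=4)

print("indices:", indices)
print("restored:", restored)
print("bits per weight:", bpw)
```

The codebook itself also has to be stored, but it is shared across the whole tensor, so its cost per weight shrinks as the tensor grows. The trade-off is reconstruction error: every group is forced onto one of only four patterns.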

URL: https://huggingface.co/unsloth/GLM-5.2-GGUF/discussions/1
