
Part 2 of the quantization series. After Part 1 showed that the per-layer drift metric lies, I spent two days and $12 hunting for the right localizer. Both candidates came back empty: token-level logit divergence at wrong tokens, and AWQ clipping on the surfaced layers. The honest finding: running an 8B model on a 12GB card costs ~12.7% perplexity (PPL) on wikitext-2, the gap is diffuse and proportional, and no clever subset-targeted fix closes it. One process habit, a no-op control that reproduces the baseline to 4 decimals, caught a silent bug that would otherwise have shipped a wrong 'AWQ-clipping wins' claim.
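The no-op control habit can be illustrated with a small sketch. This is not the post's actual harness: `run_pipeline`, `check_noop_control`, and the toy per-token losses are hypothetical names and numbers chosen for illustration. The idea is only that pushing an identity transform through the same evaluation path should reproduce the baseline perplexity to 4 decimals, and any deviation points to a bug in the path rather than an effect of the method under test.

```python
import math


def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def run_pipeline(token_nlls, transform):
    """Hypothetical stand-in for the full eval path: transform, then score."""
    return perplexity([transform(x) for x in token_nlls])


def check_noop_control(baseline_ppl, control_ppl, decimals=4):
    """True if the control matches the baseline to within 10**-decimals."""
    return abs(baseline_ppl - control_ppl) < 10 ** -decimals


# Toy per-token losses (made up for illustration, not measured data).
nlls = [2.1, 1.7, 3.0, 2.4]
baseline = perplexity(nlls)

# A true no-op: the identity transform must reproduce the baseline.
control = run_pipeline(nlls, lambda x: x)
noop_ok = check_noop_control(baseline, control)

# A silent bug: a path that is supposed to be a no-op but rescales the losses.
buggy_control = run_pipeline(nlls, lambda x: x * 0.99)
bug_caught = not check_noop_control(baseline, buggy_control)
```

If the control fails this check, any apparent improvement measured through the same path is suspect until the discrepancy is explained.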

URL: https://dev.to/mxguru1/two-localizers-both-wrong-bounding-a-quantization-cost-that-wouldnt-close-4op

Two Localizers, Both Wrong: Bounding a Quantization Cost That Wouldn't Close - DEV Community