URL: https://x.com/togethercompute/status/2066672024165274064

Together AI on X: "Optimizing GLM 5.1 came down to three things:
-> Rewrote the indexer topk kernel
-> Fused the indexer kernel to reduce memory and launch overhead
-> Eliminated CPU overhead that was gating prefill throughput
The bigger win was in the indexer. Once we fixed that, the rest made https://t.co/a9LH1aJ8pi"

