Optimizing GLM 5.1 came down to three things:
-> Rewrote the indexer topk kernel
-> Fused the indexer kernel to reduce memory and launch overhead
-> Eliminated CPU overhead that was gating prefill throughput
The biggest win was in the indexer. Once we fixed that, the remaining changes made it even faster on Together AI.
