In-Kernel Broadcast Optimization: Co-Designing Kernels for RecSys Inference
TL;DR: Traditional RecSys inference explicitly replicates shared user embeddings/sequences for every candidate. In-Kernel Broadcast Optimization…
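
To make the replication cost concrete, here is a minimal plain-Python sketch of scoring one user against several candidates. It is an illustration under assumptions, not the post's kernel: the function names (`score_replicated`, `score_broadcast`), the dot-product scoring, and the toy data are all hypothetical. The first function copies the user embedding once per candidate before scoring; the second has every candidate read the same single copy.

```python
# Illustrative sketch only: plain Python, not a GPU kernel.
# Function names, dot-product scoring, and data are hypothetical.

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def score_replicated(user_emb, candidate_embs):
    """Explicit replication: build one copy of the user embedding per candidate."""
    replicated = [list(user_emb) for _ in candidate_embs]  # B copies of D floats
    scores = [dot(u, c) for u, c in zip(replicated, candidate_embs)]
    user_floats_held = len(replicated) * len(user_emb)  # B * D
    return scores, user_floats_held


def score_broadcast(user_emb, candidate_embs):
    """Broadcast: every candidate reads the same single user embedding."""
    scores = [dot(user_emb, c) for c in candidate_embs]
    user_floats_held = len(user_emb)  # D, independent of candidate count
    return scores, user_floats_held


user = [1.0, 2.0, 3.0]
candidates = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 1.0],
]

rep_scores, rep_floats = score_replicated(user, candidates)
bc_scores, bc_floats = score_broadcast(user, candidates)

print(rep_scores == bc_scores)  # identical scores either way
print(rep_floats, bc_floats)    # user-side floats held: B * D versus D
```

Both paths produce the same scores; the only difference is how many copies of the user-side data exist. With B candidates and embedding dimension D, the replicated path holds B × D user floats, while the shared path holds D.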