
URL: https://www.phoronix.com/news/Linux-7.1-WQ


WQ_AFFN_CACHE_SHARD Merged For Linux 7.1: Significant Win For CPUs With Many Cores Per LLC

Written by Michael Larabel in Linux Kernel on 15 April 2026 at 03:55 PM EDT.
The workqueue changes merged today for the Linux 7.1 kernel are significant for modern high-end processors, which can have many CPU cores sharing a single last-level cache (LLC / L3 cache). The new WQ_AFFN_CACHE_SHARD affinity scope can reduce contention on such systems and help achieve greater performance.

Meta Linux engineer Breno Leitao wrote the patch series introducing the WQ_AFFN_CACHE_SHARD affinity scope to address bottlenecks observed when many CPU cores share the same L3 cache, which can lead to heavy spinlock contention. Under the default WQ_AFFN_CACHE affinity scope, unbound workqueues get one worker pool per LLC, so on a system with a single shared L3 cache there is just one pool for the entire system. That can lead to contention and hurt I/O performance.
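To make the single-pool problem concrete, here is a minimal Python model of cache-scope grouping. It assumes a hypothetical `llc_of` mapping from CPU number to LLC id (on a real machine this comes from the kernel's cache topology) and treats each resulting group as one shared worker pool. It is an illustration only, not the kernel's workqueue code.

```python
from collections import defaultdict


def group_by_llc(llc_of: dict[int, int]) -> list[list[int]]:
    """Group CPUs by LLC id, modelling the cache affinity scope (one pool per LLC).

    `llc_of` is a hypothetical CPU -> LLC-id mapping used for illustration.
    """
    groups: dict[int, list[int]] = defaultdict(list)
    for cpu in sorted(llc_of):
        groups[llc_of[cpu]].append(cpu)
    # Order groups by their lowest CPU number so the output is stable.
    return sorted(groups.values(), key=lambda g: g[0])


# A 12-core part with one shared L3: every CPU maps to LLC 0.
single_llc = {cpu: 0 for cpu in range(12)}
print(len(group_by_llc(single_llc)))  # prints 1: one group holding all 12 CPUs
```

In this model every CPU on a single-LLC chip ends up in the same group, so all of them would contend for the same pool.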

While this issue is most visible on today's high-core-count Intel, AMD, and Arm processors, it shows up even on a 12-core system with a single shared L3 cache: Oracle engineer Chuck Lever found that when running NFS-over-RDMA with 12 FIO jobs, around 39% of CPU cycles were spent in a spinlock slow path, largely due to the default workqueue behavior.



As a new intermediate affinity level, WQ_AFFN_CACHE_SHARD showed nice throughput gains on Intel Xeon and NVIDIA Grace CPUs. Even on a 16-core Xeon D server, FIO random reads from NVMe storage improved by up to 5.9%. Or, as noted in the merged patch:
"Benchmark on NVIDIA Grace (72 CPUs, single LLC, 50k items/thread), show cache_shard delivers ~5x the throughput and ~6.5x lower p50 latency compared to cache scope on this 72-core single-LLC system."

The WQ_AFFN_CACHE_SHARD affinity scope subdivides each LLC into groups of at most wq_cache_shard_size CPUs; wq_cache_shard_size defaults to eight and can be configured at boot time. With its introduction in Linux 7.1, this new scope also becomes the default affinity scope.
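The sharding step can be sketched in Python as well. The article only states that each LLC is split into groups of at most wq_cache_shard_size CPUs; the balanced split below (the fewest shards possible, with sizes differing by at most one CPU) is an assumed policy for illustration, and `shard_llc` is a hypothetical helper, not a kernel function.

```python
import math


def shard_llc(cpus: list[int], shard_size: int = 8) -> list[list[int]]:
    """Split one LLC's CPUs into the fewest shards of at most shard_size CPUs.

    Assumed policy: shard sizes differ by at most one CPU. The kernel's
    actual grouping may differ; this is a hypothetical illustration.
    """
    if shard_size < 1:
        raise ValueError("shard_size must be >= 1")
    if not cpus:
        return []
    n_shards = math.ceil(len(cpus) / shard_size)
    base, extra = divmod(len(cpus), n_shards)
    shards, start = [], 0
    for i in range(n_shards):
        # The first `extra` shards take one additional CPU.
        size = base + (1 if i < extra else 0)
        shards.append(cpus[start:start + size])
        start += size
    return shards


# 72 CPUs in a single LLC, as in the Grace benchmark quoted above.
grace = shard_llc(list(range(72)))
print(len(grace), max(len(s) for s in grace))  # prints: 9 8
# Under this assumed policy, 12 CPUs become two shards of 6 rather than 8 + 4.
print([len(s) for s in shard_llc(list(range(12)))])  # prints: [6, 6]
```

In this model the 72-CPU Grace LLC, which the cache scope treats as one group, becomes nine groups of eight, so each pool would be shared by at most eight CPUs instead of all 72.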

WQ_AFFN_CACHE_SHARD is the main highlight of the workqueue changes merged today for Linux 7.1, and it is the latest enticing optimization for this next kernel version.

Michael Larabel is the principal author of Phoronix.com and founded the site in 2004 with a focus on enriching the Linux hardware experience. Michael has written more than 20,000 articles covering the state of Linux hardware support, Linux performance, graphics drivers, and other topics. Michael is also the lead developer of the Phoronix Test Suite, Phoromatic, and OpenBenchmarking.org automated benchmarking software. He can be followed via Twitter, LinkedIn, or contacted via MichaelLarabel.com.