URL: https://www.phoronix.com/news/Linux-6.19-Timers

Linux 6.19 Fixes A Thundering Herd Problem For Big NUMA Servers

Written by Michael Larabel in Linux Kernel on 4 December 2025 at 08:11 AM EST.
The "timers/core" pull requests that update the Linux kernel's timer-related code don't tend to be too interesting each kernel cycle, but this time around for Linux 6.19 the pull is notable for addressing a problem HPE discovered on big NUMA servers.

Linux 6.19 fixes a timekeeper CPU issue that could lead to a large number of CPU cores getting stuck on very large NUMA servers. The pull request noted:
"Prevent a thundering herd problem when the timekeeper CPU is delayed and a large number of CPUs compete to acquire jiffies_lock to do the update. Limit it to one CPU with a separate "uncontended" atomic variable."

Steve Wahl of HPE authored the patch to fix the issue, which was spotted at the company. The HPE engineer further explained with the patch:
"On large NUMA systems, while running a test program that saturates the inter-processor and inter-NUMA links, acquiring the jiffies_lock can be very expensive. If the cpu designated to do jiffies updates (tick_do_timer_cpu) gets delayed and other cpus decide to do the jiffies update themselves, a large number of them decide to do so at the same time. The inexpensive check against tick_next_period is far quicker than actually acquiring the lock, so most of these get in line to obtain the lock. If obtaining the lock is slow enough, this spirals into the vast majority of CPUs continuously being stuck waiting for this lock, just to obtain it and find out that time has already been updated by another cpu. For example, on one random entry to kdb by manually-injected NMI, I saw 2912 of 3840 cpus stuck here.

To avoid this, allow only one non-timekeeper CPU to call tick_do_update_jiffies64() at any given time, resetting ts->stalled_jiffies only if the jiffies update function is actually called.

With this change, manually interrupting the test I find at most two CPUs in the tick_do_update_jiffies64 function (the timekeeper and one other)."
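
The gating idea described in the patch message can be sketched in user-space C11. This is only an illustration under stated assumptions, not the kernel patch itself: `backup_gate`, `gate_try_enter()`, `gate_exit()`, `slow_locked_update()`, `maybe_update()` and `struct cpu_state` are hypothetical names, and a C11 `atomic_flag` stands in for the separate atomic variable the pull request mentions.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical gate: set while one backup (non-timekeeper) CPU is updating. */
static atomic_flag backup_gate = ATOMIC_FLAG_INIT;

/* Hypothetical stand-in for the jiffies counter. */
static _Atomic uint64_t tick_count;

/* Hypothetical per-CPU state, loosely modelled on ts->stalled_jiffies. */
struct cpu_state {
    unsigned int stalled;
};

/* Try to become the single backup updater; never blocks or spins. */
static bool gate_try_enter(void)
{
    /* test_and_set returns the previous value: true means already taken. */
    return !atomic_flag_test_and_set_explicit(&backup_gate,
                                              memory_order_acquire);
}

/* Let another CPU become the backup updater. */
static void gate_exit(void)
{
    atomic_flag_clear_explicit(&backup_gate, memory_order_release);
}

/* Stand-in for the expensive, lock-protected jiffies update. */
static void slow_locked_update(void)
{
    atomic_fetch_add(&tick_count, 1);
}

/*
 * Called by a CPU that believes the timekeeper is late.
 * Returns true if this CPU performed the update, false if it backed off
 * because another CPU already holds the gate.
 */
static bool maybe_update(struct cpu_state *cs)
{
    if (!gate_try_enter())
        return false;       /* stalled count is deliberately left untouched */

    slow_locked_update();
    cs->stalled = 0;        /* reset only when the update actually ran */
    gate_exit();
    return true;
}
```

The point of the design is that a losing CPU pays for one uncontended atomic operation and returns immediately, rather than joining a queue for an expensive lock only to discover that the time has already been updated.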

This fix was merged this week for Linux 6.19.

Michael Larabel is the principal author of Phoronix.com and founded the site in 2004 with a focus on enriching the Linux hardware experience. Michael has written more than 20,000 articles covering the state of Linux hardware support, Linux performance, graphics drivers, and other topics. Michael is also the lead developer of the Phoronix Test Suite, Phoromatic, and OpenBenchmarking.org automated benchmarking software. He can be followed via Twitter, LinkedIn, or contacted via MichaelLarabel.com.