URL: https://www.phoronix.com/news/Linux-6.18-rc5-POWER-Regression

Linux 6.18-rc5 To Cut Down Performance Regression Observed On IBM POWER CPUs

Written by Michael Larabel in Linux Kernel on 8 November 2025 at 02:47 PM EST.
Merged today ahead of the Linux 6.18-rc5 kernel due out on Sunday is a partial fix for a performance regression observed on IBM POWER hardware.

Since the "IMMUTABLE" flag was dropped from the kernel's futex code during the Linux 6.17 cycle, IBM engineers have noted a performance regression primarily affecting their hardware. With Linux 6.18-rc5 that regression is cut roughly in half, from about 10% to about 5% according to the patch description.

Intel engineer Peter Zijlstra worked out the partial fix/workaround by optimizing the per-CPU reference counting in the futex code. Zijlstra explained in the now-merged patch:
"Shrikanth noted that the per-cpu reference counter was still some 10% slower than the old immutable option (which removes the reference counting entirely).

Further optimize the per-cpu reference counter by:

- switching from RCU to preempt;
- using __this_cpu_*() since we now have preempt disabled;
- switching from smp_load_acquire() to READ_ONCE().

This is all safe because disabling preemption inhibits the RCU grace period exactly like rcu_read_lock().

Having preemption disabled allows using __this_cpu_*() provided the only access to the variable is in task context -- which is the case here.

Furthermore, since we know changing fph->state to FR_ATOMIC demands a full RCU grace period we can rely on the implied smp_mb() from that to replace the acquire barrier().

This is very similar to the percpu_down_read_internal() fast-path.

The reason this is significant for PowerPC is that it uses the generic this_cpu_*() implementation which relies on local_irq_disable() (the x86 implementation relies on it being a single memop instruction to be IRQ-safe). Switching to preempt_disable() and __this_cpu*() avoids this IRQ state swizzling. Also, PowerPC needs LWSYNC for the ACQUIRE barrier, not having to use explicit barriers safes a bunch.

Combined this reduces the performance gap by half, down to some 5%."
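To make the quoted changes easier to picture, here is a minimal user-space sketch in C that contrasts the two fast paths. It is not the kernel patch: the struct, the function names, `NR_CPUS`, and the `FR_PERCPU` state name are assumptions made for illustration (only `FR_ATOMIC` appears in the quoted text), C11 atomics stand in for `smp_load_acquire()` and `READ_ONCE()`, and the kernel primitives that have no user-space equivalent are shown only as comments. The model is single-threaded and does not reproduce the RCU grace-period guarantee that makes the relaxed load safe in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 4 /* hypothetical CPU count for this model */

/* FR_ATOMIC is named in the commit message; FR_PERCPU is an assumed name. */
enum { FR_PERCPU = 0, FR_ATOMIC };

/* Illustrative stand-in for a per-CPU reference counter with a mode flag. */
struct model_ref {
    _Atomic int   state;            /* FR_PERCPU or FR_ATOMIC */
    unsigned long percpu[NR_CPUS];  /* models the per-CPU counters */
    atomic_ulong  shared;           /* slow path: one shared atomic counter */
};

/* Before: RCU read side, acquire load, IRQ-safe per-CPU increment. */
static bool ref_get_before(struct model_ref *r, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    /* kernel: rcu_read_lock() */
    if (atomic_load_explicit(&r->state, memory_order_acquire) == FR_PERCPU) {
        /* kernel: this_cpu_inc(), which the generic implementation
         * wraps in an IRQ disable/restore pair */
        r->percpu[cpu]++;
        return true;                /* took the per-CPU fast path */
    }
    atomic_fetch_add(&r->shared, 1);
    return false;
    /* kernel: rcu_read_unlock() on both paths */
}

/* After: preemption disabled, relaxed load, plain per-CPU increment. */
static bool ref_get_after(struct model_ref *r, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    /* kernel: preempt_disable() */
    if (atomic_load_explicit(&r->state, memory_order_relaxed) == FR_PERCPU) {
        /* kernel: __this_cpu_inc(), no IRQ state changes needed */
        r->percpu[cpu]++;
        return true;
    }
    atomic_fetch_add(&r->shared, 1);
    return false;
    /* kernel: preempt_enable() on both paths */
}
```

Both functions return `true` when the per-CPU fast path is taken and `false` when they fall back to the shared atomic counter. The only differences are the ones the commit message lists: the memory order of the state load, and which kernel primitive each comment stands for.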

This improvement was merged to the Linux 6.18 Git code today as the sole change of this week's locking/urgent pull request.
