
URL: https://www.phoronix.com/news/Glibc-New-Generic-FMA




GNU C Library Sees Up To 12.9x Improvement With New Generic FMA Implementation

Written by Michael Larabel in GNU on 27 November 2025 at 06:30 AM EST.
Just a few days ago I wrote about the Glibc math code seeing a 4x improvement on AMD Zen by switching which FMA implementation it uses. Merged overnight is a new generic FMA implementation for the GNU C Library, which yields up to a 12.9x throughput improvement on AMD Zen 3.

Adhemerval Zanella contributed this new generic FMA implementation to the GNU C Library. In the patch that lands this new generic Fused Multiply-Add (FMA) implementation, Zanella explained:
"The current implementation relies on setting the rounding mode for different calculations (first to FE_TONEAREST and then to FE_TOWARDZERO) to obtain correctly rounded results. For most CPUs, this adds a significant performance overhead since it requires executing a typically slow instruction (to get/set the floating-point status), it necessitates flushing the pipeline, and breaks some compiler assumptions/optimizations.

This patch introduces a new implementation originally written by Szabolcs for musl, which utilizes mostly integer arithmetic. Floating-point arithmetic is used to raise the expected exceptions, without the need for fenv.h operations.

I added some changes compared to the original code:

* Fixed some signaling NaN issues when the third argument is NaN.

* Use math_uint128.h for the 64-bit multiplication operation. It allows the compiler to use 128-bit types where available, which enables some optimizations on certain targets (for instance, MIPS64).

* Fixed an arm32 issue where the libgcc routine might not respect the rounding mode. This can also be used on other targets to optimize the conversion from int64_t to double.

* Use -fexcess-precision=standard on i686."
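Multiplying two 53-bit double significands exactly needs up to 106 bits, so an integer-based FMA needs a 64×64→128-bit product. That is the operation the patch routes through math_uint128.h. The sketch below is not Glibc's header. It uses a hypothetical helper, mul_64x64_128, that relies on unsigned __int128 when the compiler advertises it through __SIZEOF_INT128__ (GCC and Clang do on 64-bit targets) and otherwise falls back to portable 32-bit limbs.

```c
#include <stdint.h>

/* Portable 64x64 -> 128-bit unsigned multiply built from 32-bit halves. */
static void mul_64x64_128_portable(uint64_t a, uint64_t b,
                                   uint64_t *hi, uint64_t *lo)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    /* Four partial products; each fits in 64 bits. */
    uint64_t p0 = a_lo * b_lo;
    uint64_t p1 = a_lo * b_hi;
    uint64_t p2 = a_hi * b_lo;
    uint64_t p3 = a_hi * b_hi;

    /* Middle column: at most 3 * (2^32 - 1), so it cannot overflow. */
    uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;

    *lo = (mid << 32) | (uint32_t)p0;
    *hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}

/* Prefer the compiler's native 128-bit type when it exists. */
static void mul_64x64_128(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
#if defined(__SIZEOF_INT128__)
    unsigned __int128 p = (unsigned __int128)a * b;
    *hi = (uint64_t)(p >> 64);
    *lo = (uint64_t)p;
#else
    mul_64x64_128_portable(a, b, hi, lo);
#endif
}
```

For significand-sized inputs such as a = b = 2^53 - 1, both paths give a high word of 2^42 - 1 and a low word of 0xFFC0000000000001. With the native type, the compiler can emit a single widening multiply instruction on targets that have one, which matches the patch's note about optimizations on targets such as MIPS64.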

In benchmarks run by Adhemerval Zanella, this new musl-based implementation shows some "large improvements":

[Figure: New FMA implementation benchmarks]


In another commit, Adhemerval Zanella summed up the recent math improvements made for Glibc 2.43 as:
"* Additional optimized and correctly rounded mathematical functions have been imported from the CORE-MATH project, in particular acosh, asinh, atanh, erf, erfc, lgamma, and tgamma.

* Optimized implementations for remainder, remainderf, frexpf, frexp, frexpl (binary128), and frexpl (intel96) have been added.

* The SVID handling for acosf, acoshf, asinhf, atan2f, atanhf, coshf, lgammaf/lgammaf_r, log10f, sinhf, sqrtf, tgammaf, y0/j0, y1/j1, and yn/jn was moved to compat symbols, allowing improvements in performance."

Look for these improvements and more in Glibc 2.43, due for release in February.

Michael Larabel is the principal author of Phoronix.com and founded the site in 2004 with a focus on enriching the Linux hardware experience. Michael has written more than 20,000 articles covering the state of Linux hardware support, Linux performance, graphics drivers, and other topics. Michael is also the lead developer of the Phoronix Test Suite, Phoromatic, and OpenBenchmarking.org automated benchmarking software. He can be followed on Twitter and LinkedIn, or contacted via MichaelLarabel.com.