URL: https://www.phoronix.com/news/Linux-6.17-khugepaged-Optimize

Linux 6.17 Optimizes khugepaged For ARM64 With Huge "16x" Impact For One Code Path

Written by Michael Larabel in Arm on 6 August 2025 at 06:20 AM EDT.
Andrew Morton this week sent in some additional memory management "MM" changes for Linux 6.17 to complement last week's many MM patches, which ranged from new optimizations to more DAMON features. Most notable in this secondary set of patches are khugepaged optimizations that especially help ARM64 Linux systems.

Khugepaged is the kernel thread within the Linux kernel's Transparent Hugepage support that collapses small pages into huge pages, and it is seeing some exciting optimization work in Linux 6.17 for AArch64 hardware. The optimizations improve khugepaged throughput by batching PTE operations for large folios.

Dev Jain of Arm explained on the patch series for the work:
"If the underlying folio mapped by the ptes is large, we can process those ptes in a batch using folio_pte_batch().

For arm64 specifically, this results in a 16x reduction in the number of ptep_get() calls, since on a contig block, ptep_get() on arm64 will iterate through all 16 entries to collect a/d bits. Next, ptep_clear() will cause a TLBI for every contig block in the range via contpte_try_unfold(). Instead, use clear_ptes() to only do the TLBI at the first and last contig block of the range.

For split folios, there will be no pte batching; the batch size returned by folio_pte_batch() will be 1. For pagetable split folios, the ptes will still point to the same large folio; for arm64, this results in the optimization described above, and for other arches, a minor improvement is expected due to a reduction in the number of function calls and batching atomic operations."
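To make the batching idea in the quote more concrete, here is a minimal toy model in Python. It is not the kernel's folio_pte_batch() implementation: the function name toy_folio_pte_batch, the (folio_id, pfn) tuple representation of a PTE, and the sample values are all assumptions made purely for illustration. It only shows why PTEs that map consecutive pages of one large folio can be handled as a single batch, while PTEs of a split folio yield a batch size of 1.

```python
def toy_folio_pte_batch(ptes, start, max_nr):
    """Return how many consecutive PTEs, beginning at `start`, map
    consecutive pages of the same folio (capped at `max_nr`).

    Hypothetical model: each PTE is a (folio_id, pfn) tuple. The real
    kernel helper works on actual page table entries and folios.
    """
    folio_id, pfn = ptes[start]
    nr = 1
    while nr < max_nr and start + nr < len(ptes):
        next_folio, next_pfn = ptes[start + nr]
        # Stop at the first PTE that leaves the folio or breaks the
        # run of consecutive page frame numbers.
        if next_folio != folio_id or next_pfn != pfn + nr:
            break
        nr += 1
    return nr


# Sixteen PTEs that all map one large folio: a single batch of 16.
large_folio_ptes = [(7, 1000 + i) for i in range(16)]

# A split folio: every page now belongs to its own folio, so each
# batch degenerates to one PTE.
split_folio_ptes = [(i, 1000 + i) for i in range(16)]

print(toy_folio_pte_batch(large_folio_ptes, 0, 16))  # prints 16
print(toy_folio_pte_batch(split_folio_ptes, 0, 16))  # prints 1
```

In this model the caller would advance by the returned batch size instead of by one PTE, which is where the reduction in per-entry work comes from.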

ptep_get(), the call seeing the 16x reduction, is a helper function for safely reading page table entries (PTEs). ARM64 systems also see fewer TLB flushes as a result of these khugepaged optimizations.
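The arithmetic behind those two savings can be sketched with a small counting model in Python. This is only a back-of-the-envelope illustration and not kernel code: the function names are hypothetical, the range is assumed to be aligned to contiguous blocks, the batched path is assumed to need one ptep_get() call per 16-entry contiguous block, and the 512-PTE range is an assumed example (a 2 MiB region with 4 KiB pages).

```python
CONT_PTES = 16  # entries per arm64 contiguous block, per the quoted text


def contig_blocks(nr_ptes):
    """Number of contiguous blocks covering nr_ptes (ceiling division)."""
    return -(-nr_ptes // CONT_PTES)


def ptep_get_calls_unbatched(nr_ptes):
    """One ptep_get() call per PTE."""
    return nr_ptes


def ptep_get_calls_batched(nr_ptes):
    """Assumption: one ptep_get() call per contiguous block."""
    return contig_blocks(nr_ptes)


def tlbi_unbatched(nr_ptes):
    """Per the quote: clearing PTE by PTE causes a TLBI for every
    contiguous block in the range."""
    return contig_blocks(nr_ptes)


def tlbi_batched(nr_ptes):
    """Per the quote: a batched clear only needs a TLBI at the first
    and last contiguous block of the range."""
    return min(contig_blocks(nr_ptes), 2)


nr_ptes = 512  # assumed example: 2 MiB range of 4 KiB pages
print(ptep_get_calls_unbatched(nr_ptes), ptep_get_calls_batched(nr_ptes))  # prints 512 32
print(tlbi_unbatched(nr_ptes), tlbi_batched(nr_ptes))  # prints 32 2
```

Under these assumptions the ratio of ptep_get() calls is 512 / 32 = 16, matching the figure in the patch description, while the TLBI count drops from one per contiguous block to at most two for the whole range.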

This additional MM pull request for Linux 6.17 also enables EXECMEM_ROX_CACHE support for ftrace and kprobes and brings performance improvements to the mTHP swap-in code.

Michael Larabel is the principal author of Phoronix.com and founded the site in 2004 with a focus on enriching the Linux hardware experience. Michael has written more than 20,000 articles covering the state of Linux hardware support, Linux performance, graphics drivers, and other topics. Michael is also the lead developer of the Phoronix Test Suite, Phoromatic, and OpenBenchmarking.org automated benchmarking software. He can be followed via Twitter, LinkedIn, or contacted via MichaelLarabel.com.