Glibc Lands A Big Optimization For LoongArch CPUs
Loongson's LoongArch processors are performing decently in our recent Loongson 3B6000 benchmarks, but even better performance is on the way with the next GNU C Library "glibc" release.
Merged yesterday to Glibc Git is a LoongArch-specific change to enable transparent hugepage (THP) aligned load segments by default for LoongArch64. Aligning ELF load segments to THP boundaries provides a consistent performance win for large binaries by reducing translation lookaside buffer (TLB) pressure and improving instruction fetch efficiency.
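To see why the alignment matters, here is a rough Python sketch of the arithmetic rather than glibc's actual loader code. It assumes a 32 MiB PMD-sized huge page, which is what a LoongArch64 kernel with 16 KiB base pages would typically use; the helper names and example addresses are made up for illustration. A segment can only be backed by a huge page where it covers a whole huge-page-aligned chunk, so the start address decides how much of it qualifies.

```python
# Illustrative only: the sizes and addresses below are assumptions, not
# values taken from the glibc patch.
THP_SIZE = 32 * 1024 * 1024  # assumed huge page size (16 KiB base pages)


def align_up(addr: int, alignment: int) -> int:
    """Round addr up to the next multiple of alignment (a power of two)."""
    return (addr + alignment - 1) & ~(alignment - 1)


def align_down(addr: int, alignment: int) -> int:
    """Round addr down to a multiple of alignment (a power of two)."""
    return addr & ~(alignment - 1)


def thp_eligible_bytes(start: int, length: int, thp_size: int = THP_SIZE) -> int:
    """Bytes of [start, start + length) lying in whole thp_size-aligned chunks."""
    first = align_up(start, thp_size)
    last = align_down(start + length, thp_size)
    return max(0, last - first)


SEGMENT_SIZE = 40 * 1024 * 1024  # hypothetical 40 MiB text segment

# Page-aligned but not THP-aligned start: no whole huge page fits.
unaligned = thp_eligible_bytes(0x120004000, SEGMENT_SIZE)

# THP-aligned start: the first 32 MiB can be backed by one huge page.
aligned = thp_eligible_bytes(0x120000000, SEGMENT_SIZE)
```

In this made-up case the same 40 MiB segment goes from having no huge-page-eligible bytes to having 32 MiB eligible just by moving its start address, which is the kind of effect that cuts instruction TLB misses for large binaries.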
Benchmarks for compiling Rust's Cargo on a Loongson 3A6000 show instruction TLB misses dropping by 72%, CPU cycles falling by about 4.7%, and wall time savings of around 4.2%. Compiling the Linux kernel with LLVM yielded a wall time reduction of about 12%. It's quite a big performance win from this patch enabling THP-aligned load segments by default for LoongArch.
That patch is part of a series that also introduced the glibc.elf.thp tunable for THP-aware segment alignment and the new alignment code.
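For anyone wanting to compare behavior with the alignment on and off, glibc tunables are normally set through the GLIBC_TUNABLES environment variable as colon-separated name=value pairs. Below is a small Python sketch for launching a command with the tunable set. The helper names are hypothetical, and it is only an assumption here that a value of 1 enables and 0 disables THP-aware alignment; check the glibc manual for the release you are running.

```python
import subprocess


def with_tunable(env, name, value):
    """Return a copy of env with name=value merged into GLIBC_TUNABLES."""
    merged = dict(env)
    current = merged.get("GLIBC_TUNABLES", "")
    # Keep other tunables, drop any earlier setting of the same name.
    kept = [t for t in current.split(":") if t and not t.startswith(name + "=")]
    kept.append(f"{name}={value}")
    merged["GLIBC_TUNABLES"] = ":".join(kept)
    return merged


def run_with_thp(cmd, env, enabled=True):
    """Run cmd with glibc.elf.thp set (assumed semantics: 1 = on, 0 = off)."""
    value = 1 if enabled else 0
    return subprocess.run(cmd, env=with_tunable(env, "glibc.elf.thp", value), check=True)
```

A comparison run might call run_with_thp(["cargo", "build", "--release"], dict(os.environ), enabled=False) and then again with enabled=True, timing each invocation.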
Will be fun to benchmark these LoongArch improvements soon to see how much better the 3B6000 is looking across a range of workloads.
