URL: https://www.phoronix.com/news/Linux-6.17-AMDGPU-Hibernation


Linux 6.17 To Fix AMDGPU Hibernation So It Doesn't Take ~50 Minutes On Large GPU Servers

Written by Michael Larabel in AMD on 17 July 2025 at 05:56 AM EDT.
Late in the Linux 6.16 cycle, right at the cut-off for queuing new DRM driver feature material for Linux 6.17, an additional drm-misc-next pull request was sent out today with some last-minute kernel graphics driver changes for the next kernel cycle. Motivating this extra pull were the recent AMDGPU system hibernation patches.

The headline change in today's drm-misc-next pull is the set of AMD patches that reduce system memory requirements for hibernation on large AI/GPU servers. The patches and the underlying issue were previously covered on Phoronix in "AMD Instinct Accelerators With So Much vRAM Have Exposed Linux Hibernation Issues."

With the latest AMD Instinct accelerators offering 192GB of device memory each and up to eight of them per server, all that device memory is causing problems for the AMDGPU driver during hibernation. In some cases there is not enough free system memory to create the hibernation image, and when hibernation does succeed it takes a long time because all of the buffer objects have to be archived and then restored.

Besides the possibility of hibernation failing when there is not enough system memory, even when everything goes right it takes an awfully long time:
"For normal hibernation, GPU do not need to be resumed in thaw since it is not involved in writing the hibernation image. Skip resume in this case can reduce the hibernation time.

On VM with 8 * 192GB VRAM dGPUs, 98% VRAM usage and 1.7TB system memory, this can save 50 minutes."

Nearly one hour can be saved with these patches on a maxed out AMD Instinct accelerator server.
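
To put the quoted configuration in perspective, here is a rough back-of-the-envelope sketch of the memory involved. It only does arithmetic on the figures from the patch message (8 GPUs, 192GB each, 98% VRAM usage, 1.7TB of system memory). The function name is made up for illustration, and treating GB/TB as decimal units is an assumption, since the patch message does not say which convention it uses.

```python
# Back-of-the-envelope arithmetic for the configuration quoted above.
# Assumption: GB and TB are treated as decimal units (1 TB = 1000 GB).

def vram_footprint(num_gpus, vram_per_gpu_gb, usage_fraction):
    """Return (total_vram_gb, used_vram_gb) for a multi-GPU server."""
    total = num_gpus * vram_per_gpu_gb
    used = total * usage_fraction
    return total, used

# Figures taken from the quoted patch message.
total_gb, used_gb = vram_footprint(num_gpus=8, vram_per_gpu_gb=192,
                                   usage_fraction=0.98)
system_gb = 1.7 * 1000  # 1.7TB of system memory, decimal units assumed

# How large the in-use VRAM is relative to all of system memory.
share = used_gb / system_gb

print(f"Total VRAM:        {total_gb} GB")
print(f"VRAM in use:       {used_gb:.2f} GB")
print(f"Share of 1.7TB RAM: {share:.1%}")
```

That works out to 1536GB of total VRAM with roughly 1505GB in use, or close to 89% of the system memory in that VM, which helps explain why archiving and restoring all of those buffer objects is both memory-hungry and slow.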


Those patches overhauling the AMDGPU hibernation handling are part of today's drm-misc-next pull request and are what motivated this extra pull. Also contained in this pull request for Linux 6.17 are memory leak fixes in various pieces of code, scheduler improvements for the Nouveau driver, Sitronix ST7567 support, BOE NE14QDM panel support, and other last-minute changes.

Michael Larabel is the principal author of Phoronix.com and founded the site in 2004 with a focus on enriching the Linux hardware experience. Michael has written more than 20,000 articles covering the state of Linux hardware support, Linux performance, graphics drivers, and other topics. Michael is also the lead developer of the Phoronix Test Suite, Phoromatic, and OpenBenchmarking.org automated benchmarking software. He can be followed via Twitter, LinkedIn, or contacted via MichaelLarabel.com.