
URL: https://thenewstack.io/how-meta-patches-linux-at-hyperscale/

How Meta Patches Linux at Hyperscale - The New Stack


2023-12-01 03:00:56
How Meta Patches Linux at Hyperscale

Patching Linux is easy. Except when you need to patch tens of thousands of servers without downtime. Here's how Meta does it.
Dec 1st, 2023 3:00am by Steven J. Vaughan-Nichols
Feature image by Casey Allen on Unsplash.    

RICHMOND, Va. — Anyone with a tech clue can patch a Linux server. But, patching thousands of them without any downtime, that’s not easy.

At the Linux Plumbers Conference, the invite-only gathering of top Linux kernel developers held earlier this month, Meta Linux kernel engineer Breno Leitao explained how Meta pulls the trick off across its millions of servers around the world.

With ordinary techniques, Leitao said, it would take more than 45 days to roll out a new kernel to all machines. As he put it, “Draining and un-draining hosts is hard.” You can say that again.

That may be fine if it’s a minor update, but if it’s a security patch, that won’t work.

So, Meta uses Kernel Live Patching (KLP) with Red Hat’s Kpatch to deliver patches quickly. With KLP, you can apply the latest security updates to a running Linux kernel without rebooting, which maximizes system uptime and availability.

Live Kernel Patches

Kernel live patches are delivered as packages with modified code that are separate from the main kernel package. The live patches are cumulative, so the latest patch contains all fixes from the previous ones for the kernel package. Each kernel live package is tied to the exact kernel revision for which it is issued.
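The two rules in that paragraph (patches are cumulative, and each one is tied to an exact kernel revision) can be sketched in a few lines of Python. This is only an illustration, not Meta's tooling: the `LivePatch` record, the `pick_live_patch` and `is_cumulative` helpers, and the kernel release strings are all hypothetical names made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LivePatch:
    """Hypothetical record describing one live patch package."""
    kernel_release: str  # exact kernel build the patch was made for
    version: int         # cumulative patch level; higher supersedes lower
    fixes: frozenset     # identifiers of the fixes carried at this level


def pick_live_patch(running_kernel: str,
                    available: List[LivePatch]) -> Optional[LivePatch]:
    """Return the newest patch built for exactly this kernel, or None."""
    matching = [p for p in available if p.kernel_release == running_kernel]
    if not matching:
        return None  # a patch for a different kernel revision must not be used
    return max(matching, key=lambda p: p.version)


def is_cumulative(older: LivePatch, newer: LivePatch) -> bool:
    """True if the newer patch carries every fix from the older one."""
    return (older.kernel_release == newer.kernel_release
            and newer.version > older.version
            and older.fixes <= newer.fixes)


# Made-up catalog for illustration only.
catalog = [
    LivePatch("6.4.3-example", 1, frozenset({"fix-a"})),
    LivePatch("6.4.3-example", 2, frozenset({"fix-a", "fix-b"})),
    LivePatch("6.1.0-example", 5, frozenset({"fix-c"})),
]
```

Because the patches are cumulative, a host only ever needs the single newest patch for its kernel, which is why the helper returns one package rather than a chain of them.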

Live patches won’t work on everything, though. You can’t patch data or data structures. Another problem is that extra engineering work is usually required to make a live patch. As Leitao warned, “It’s not just as simple as compiling the live patch, and knowing it’ll be safe and applying it. These are kernel modules, you can break things if you’re not careful. There are no guarantees provided that the patch itself is correct.”

Kpatch works by comparing the original and patched kernels and then building a custom kernel module that patches the new code into the running kernel. It hooks the affected functions with ftrace, the kernel’s function tracer, and examines the stacks of running tasks to see whether the patch can be applied without any harmful effects.

When it’s safe, it redirects the running code to the patched functions and then removes the now outdated code. And, there you are, your server’s patched, and there’s been no downtime.
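One way to see whether a patch has finished switching over is to read the state the kernel exposes. The sketch below assumes the patch module is loaded through the upstream kernel livepatch interface, which publishes `enabled` and `transition` flags for each patch under `/sys/kernel/livepatch`; the article does not say how Meta inspects this state, and the helper names `livepatch_status` and `fully_applied` are made up for the example.

```python
from pathlib import Path


def livepatch_status(root="/sys/kernel/livepatch"):
    """Map each loaded live patch to its 'enabled' and 'transition' flags.

    A flag is True or False when the sysfs attribute exists and None when
    it is missing. An empty dict means nothing is loaded (or the directory
    does not exist on this system).
    """
    base = Path(root)
    status = {}
    if not base.is_dir():
        return status
    for patch_dir in sorted(base.iterdir()):
        if not patch_dir.is_dir():
            continue
        flags = {}
        for name in ("enabled", "transition"):
            attr = patch_dir / name
            if attr.is_file():
                flags[name] = attr.read_text().strip() == "1"
            else:
                flags[name] = None
        status[patch_dir.name] = flags
    return status


def fully_applied(flags):
    """A patch is settled once it is enabled and no longer in transition."""
    return flags["enabled"] is True and flags["transition"] is False


if __name__ == "__main__":
    for patch, flags in livepatch_status().items():
        state = "applied" if fully_applied(flags) else "pending or disabled"
        print(f"{patch}: {state}")
```

The `root` parameter exists only so the function can be pointed at a test directory; on a real host the default path is the one to read.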

Of course, it’s not that simple in practice. Leitao explained, “At Meta, when we apply a live patch, it usually takes one to two seconds to apply the patch to the host. That’s to a single host, obviously not to like the whole fleet of servers, but one to two seconds for a host is really, really fast compared to even kexec,” the Linux kernel mechanism for booting a new kernel directly from the running one. “It doesn’t require any downtime or workload migration, you just apply the live patch, and off you go.”

How to Patch Millions of Machines

But, when you’re talking about millions of machines, that’s not the entire story. Meta does find bugs during its patch rollouts, so the administrators start by patching a release-candidate tier. As the package roller delivers the RPM-based patches, the servers’ health is checked automatically.

Meta looks for crashes, major alarms, application problems, and performance regressions in the new kernels. This data is pulled from a variety of sources, including crash reports, netconsole output, and core dumps. If the error rate exceeds one crash per thousand servers, the patch is pulled and the old kernel is restored.
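The rollback rule lends itself to a small sketch. What follows is a guess at the shape of such a gate, not Meta's package roller: only the one-crash-per-thousand-servers threshold comes from the talk, while the tier names, the `apply_patch` and `count_crashes` callbacks, and the idea of checking the cumulative rate after each tier are assumptions made for illustration.

```python
MAX_CRASHES_PER_THOUSAND = 1  # threshold quoted in the article


def should_roll_back(crashed_hosts, patched_hosts,
                     limit=MAX_CRASHES_PER_THOUSAND):
    """True when the crash rate exceeds `limit` crashes per 1,000 hosts.

    Integer arithmetic avoids floating-point edge cases at the boundary:
    exactly one crash per thousand hosts does not trip the gate.
    """
    if patched_hosts <= 0:
        return False
    return crashed_hosts * 1000 > limit * patched_hosts


def staged_rollout(tiers, apply_patch, count_crashes):
    """Patch tiers in order and stop at the first tier that trips the gate.

    `tiers` is a list of (name, hosts) pairs, smallest-risk tier first.
    `apply_patch(host)` and `count_crashes(hosts)` are hypothetical hooks
    standing in for the real delivery and health-check systems.
    """
    patched = 0
    crashed = 0
    for name, hosts in tiers:
        for host in hosts:
            apply_patch(host)
        patched += len(hosts)
        crashed += count_crashes(hosts)
        if should_roll_back(crashed, patched):
            return ("rolled_back", name)
    return ("completed", None)
```

Starting with a release-candidate tier means a bad patch trips the gate while only a small slice of the fleet is running it.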

With over a billion users, Facebook also keeps a close eye on performance. As Leitao said, “The live patch performance overhead is small, but there is always a concern when a relatively hot function is patched.”

While Meta uses Kpatch, there are alternatives. SUSE offers kGraft, Oracle uses Ksplice, and Canonical supports Livepatch. Whatever the implementation, they all deliver similar results.

So, if you’d rather not have downtime with your servers, data centers, and clouds, follow Meta’s example and use live patching. You’ll be glad you did.

Steven J. Vaughan-Nichols, aka sjvn, has been writing about technology and the business of technology since CP/M-80 was the cutting-edge PC operating system, 300bps was a fast internet connection, WordStar was the state-of-the-art word processor, and we liked it.
Red Hat and Oracle are sponsors of The New Stack.