URL: https://www.phoronix.com/news/Linux-Open-Tree-Namespace

OPEN_TREE_NAMESPACE To Provide A Security & Performance Win For Dealing With Containers

Written by Michael Larabel in Linux Kernel on 19 January 2026 at 02:44 PM EST.
The upcoming Linux 7.0 kernel cycle is expected to add a new OPEN_TREE_NAMESPACE flag for the open_tree() system call. The flag can provide a nice performance win, along with added security benefits, for anyone dealing with a lot of containerized workloads on Linux.

Microsoft engineer Christian Brauner developed the OPEN_TREE_NAMESPACE functionality for open_tree() to make launching containers less wasteful: today the setup copies mounts that are ultimately unnecessary, only for them to be destroyed right afterward. Brauner elaborated in the late December patch series:
"When creating containers the setup usually involves using CLONE_NEWNS via clone3() or unshare(). This copies the caller's complete mount namespace. The runtime will also assemble a new rootfs and then use pivot_root() to switch the old mount tree with the new rootfs. Afterward it will recursively umount the old mount tree thereby getting rid of all mounts.

On a basic system here where the mount table isn't particularly large this still copies about 30 mounts. Copying all of these mounts only to get rid of them later is pretty wasteful.

This is exacerbated if intermediary mount namespaces are used that only exist for a very short amount of time and are immediately destroyed again causing a ton of mounts to be copied and destroyed needlessly.

With a large mount table and a system where thousands or ten-thousands of namespaces are spawned in parallel this quickly becomes a bottleneck increasing contention on the semaphore.

Extend open_tree() with a new OPEN_TREE_NAMESPACE flag. Similar to OPEN_TREE_CLONE only the indicated mount tree is copied. Instead of returning a file descriptor referring to that mount tree OPEN_TREE_NAMESPACE will cause open_tree() to return a file descriptor to a new mount namespace. In that new mount namespace the copied mount tree has been mounted on top of a copy of the real rootfs.

The caller can setns() into that mount namespace and perform any additionally setup such as move_mount()ing detached mounts in there.

This allows OPEN_TREE_NAMESPACE to function as a combined unshare(CLONE_NEWNS) and pivot_root()."
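
As a rough illustration of the flow described above, the sketch below shows how a runtime might request the new mount namespace and enter it. It is a minimal sketch under assumptions, not code from the patch series: the helper name enter_rootfs_namespace() is hypothetical, the flag combination is a guess at typical usage, and OPEN_TREE_NAMESPACE is only used when the installed kernel headers define it (otherwise the helper fails with ENOSYS).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>        /* AT_FDCWD, O_CLOEXEC */
#include <sched.h>        /* setns(), CLONE_NEWNS */
#include <sys/syscall.h>  /* SYS_open_tree */
#include <unistd.h>       /* syscall(), close() */
#include <linux/mount.h>  /* OPEN_TREE_* flags, when the headers provide them */

/*
 * Hypothetical helper, not part of the patch series.
 *
 * Asks open_tree() for a new mount namespace containing a copy of the
 * mount tree at `rootfs`, then switches the calling process into it.
 * Per the patch description, this stands in for the older sequence of
 * unshare(CLONE_NEWNS) followed by pivot_root() and a recursive umount.
 *
 * Returns 0 on success, -1 on failure with errno set.
 */
int enter_rootfs_namespace(const char *rootfs)
{
#if defined(SYS_open_tree) && defined(OPEN_TREE_NAMESPACE)
    /* Assumption: OPEN_TREE_CLOEXEC may be combined with the new flag. */
    int nsfd = (int)syscall(SYS_open_tree, AT_FDCWD, rootfs,
                            OPEN_TREE_NAMESPACE | OPEN_TREE_CLOEXEC);
    if (nsfd < 0)
        return -1;

    /* The returned descriptor refers to a mount namespace, so setns() applies. */
    if (setns(nsfd, CLONE_NEWNS) < 0) {
        int saved = errno;
        close(nsfd);
        errno = saved;
        return -1;
    }

    close(nsfd);
    return 0;
#else
    /* Headers predate the flag: report that the feature is unavailable. */
    (void)rootfs;
    errno = ENOSYS;
    return -1;
#endif
}
```

A runtime would presumably call something like this in the child process with the path to the assembled rootfs, then carry on with any remaining setup such as move_mount()-ing detached mounts. Entering a mount namespace requires elevated privileges, so the call is expected to fail for unprivileged callers.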

In testing out the new functionality, it was found to be around 40% faster:
"With the older pivot_root() based method, I can create about 73k "containers" in 60s. With the newer open_tree() method, I can create about 109k in the same time. So it seems like the new method is roughly 40% faster than the older scheme (and a lot less syscalls too)."

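How large the improvement looks depends on how the comparison is framed. The short sketch below (function names are illustrative only) works through the quoted counts: 109k versus 73k containers in the same 60 seconds is about 49% more containers created, or about 33% less time per container, so "roughly 40%" sits between the two framings.

```c
#include <stdio.h>

/* Percentage gain in containers created within a fixed time window. */
double throughput_gain_pct(double old_count, double new_count)
{
    return (new_count / old_count - 1.0) * 100.0;
}

/* Percentage reduction in average time spent per container. */
double time_saved_pct(double old_count, double new_count)
{
    return (1.0 - old_count / new_count) * 100.0;
}

/* Average milliseconds per container for a run lasting `seconds`. */
double ms_per_container(double seconds, double count)
{
    return seconds * 1000.0 / count;
}

/* Prints both framings for a pair of measurements. */
void print_comparison(double seconds, double old_count, double new_count)
{
    printf("old: %.3f ms per container\n", ms_per_container(seconds, old_count));
    printf("new: %.3f ms per container\n", ms_per_container(seconds, new_count));
    printf("throughput gain: %.1f%%\n", throughput_gain_pct(old_count, new_count));
    printf("time saved:      %.1f%%\n", time_saved_pct(old_count, new_count));
}
```

Calling print_comparison(60.0, 73000.0, 109000.0) reproduces the figures from the quoted test run.
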
Beyond being less wasteful and more efficient, OPEN_TREE_NAMESPACE is also expected to bring security benefits by blocking attacks in which the container root manages to get unmounted in an attempt to access the underlying mounts.


The OPEN_TREE_NAMESPACE patches were queued a few days ago into the vfs-7.0.namespace Git branch of vfs/vfs.git. With the code now there, it will presumably be sent in for the upcoming Linux 6.20~7.0 kernel merge window, barring any last-minute issues.

Michael Larabel is the principal author of Phoronix.com and founded the site in 2004 with a focus on enriching the Linux hardware experience. Michael has written more than 20,000 articles covering the state of Linux hardware support, Linux performance, graphics drivers, and other topics. Michael is also the lead developer of the Phoronix Test Suite, Phoromatic, and OpenBenchmarking.org automated benchmarking software. He can be followed via Twitter, LinkedIn, or contacted via MichaelLarabel.com.