URL: https://thenewstack.io/kueue-can-now-schedule-kubernetes-batch-jobs-across-clusters/

⇱ Kueue Can Now Schedule Kubernetes Batch Jobs Across Clusters - The New Stack


2024-11-21 07:36:55
The "MultiKueue" beta multicluster job dispatching feature allows admins to place workloads on remote clusters.
Nov 21st, 2024 7:36am by Joab Jackson
A batch scheduler from the Kubernetes Batch Working Group can now schedule workloads on external clusters, promising to simplify operations management and potentially expand the range of available computational resources. That is a much-desired feature for organizations with computationally heavy AI workloads.

The new beta capability, called MultiKueue, was deftly demonstrated in a KubeCon+CloudNativeCon North America keynote last week by Ricardo Rocha, platform engineering lab engineer at CERN, no stranger to large computational workloads.

Such software could go a long way in helping “manage a very complex infrastructure, with multiple clusters across multiple administrative domains,” he said.

What Is Kueue?

An open source project under the Apache 2.0 license, Kueue is a Kubernetes resource quota manager that provides a workload queue for Kubernetes clusters, which can be both elastic and heterogeneous.

It decides when pods should be created to start a job, and when the job should stop and its pods be deleted. It can also preempt jobs. Its set of APIs provides the language to set quotas and policies for fair sharing among tenants.

Different types of computational resources, such as GPUs or spot instance-based virtual machines, are described as “ResourceFlavors,” objects that Kueue can then use to fit workloads to resources. The workloads themselves are also captured as objects.

Kueue can be installed atop any vanilla Kubernetes cluster. It builds on existing K8s technologies for autoscaling, pod-to-node scheduling and job lifecycle management.

Kubernetes Scheduling with MultiKueue

On its own, Kubernetes will schedule multiple jobs in the queue in a random order. It will also schedule partial workloads, which can be problematic depending on the type of workload that needs to be executed.

Kueue executes all-or-nothing scheduling. Workloads are queued and are run in their entirety only when there are sufficient resources.
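
The all-or-nothing idea can be sketched in a few lines of Python. This is a toy model, not Kueue's code or API: the Job class, the admit_all_or_nothing function and the CPU figures are all hypothetical, and the sketch assumes a single pool of CPUs with later jobs allowed to start while an earlier, larger one waits.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job description, not a Kueue object."""
    name: str
    pods: int          # number of pods the job needs
    cpus_per_pod: int  # CPU request of each pod


def admit_all_or_nothing(queue, free_cpus):
    """Admit queued jobs in order, each only if all of its pods fit at once."""
    admitted, waiting = [], []
    for job in queue:
        needed = job.pods * job.cpus_per_pod
        if needed <= free_cpus:
            free_cpus -= needed
            admitted.append(job.name)
        else:
            # No partial start: the whole job stays in the queue.
            waiting.append(job.name)
    return admitted, waiting, free_cpus


queue = [Job("train", 4, 4), Job("etl", 2, 2), Job("sweep", 8, 2)]
admitted, waiting, left = admit_all_or_nothing(queue, free_cpus=12)
print(admitted, waiting, left)
```

In this run only the small "etl" job starts. The other two each need 16 CPUs, so neither is started with a subset of its pods.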

Other all-or-nothing scheduling tools include Apache YuniKorn and Volcano.

But Kueue is also advantageous in that it supports multiple queues for different teams. Each research team can get its own dedicated portion of the cluster with its own namespace, and Kueue provides the ability to temporarily share each team’s portion if it is not being used.
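
A rough Python sketch of that sharing arrangement follows. It is an illustration under simple assumptions, not Kueue's implementation: the team names, the quota numbers and the gpus_to_borrow function are hypothetical, and the sketch does not model taking lent capacity back from a borrower.

```python
def gpus_to_borrow(team, request, quotas, usage):
    """Return how many GPUs `team` must borrow to start a job needing
    `request` GPUs, or None if the job cannot start right now.

    quotas and usage map team name -> GPU count (hypothetical data).
    """
    idle_total = sum(quotas.values()) - sum(usage.values())
    if request > idle_total:
        # Not enough idle GPUs anywhere; reclaiming lent quota is not modeled.
        return None
    own_free = max(quotas[team] - usage[team], 0)
    # Whatever the team's own unused quota does not cover is borrowed.
    return max(request - own_free, 0)


quotas = {"physics": 8, "ml": 8}
usage = {"physics": 2, "ml": 8}

print(gpus_to_borrow("ml", 4, quotas, usage))       # ml is at quota, borrows idle physics GPUs
print(gpus_to_borrow("ml", 8, quotas, usage))       # more than is idle anywhere
print(gpus_to_borrow("physics", 5, quotas, usage))  # fits in physics' own quota
```

Here the "ml" team has used its whole share but can still start a four-GPU job on capacity the "physics" team is not using.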

Such queueing can be extremely valuable given the size of AI processing jobs and the relative scarcity of GPUs to run them, noted Marcin Wielgus, software engineer at Google, also in the keynote presentation.

With MultiKueue, Kueue can manage clusters not only on-premises but also from external cloud providers and other High-Performance Computing (HPC) centers.

A job can be submitted to a control cluster, which searches for a home in one of a number of available clusters, placing the job when sufficient capacity is found.

If a job requires GPUs, then that requirement is specified in the workload description, so Kueue will know to place that job only on nodes with sufficient GPUs.
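
The dispatch step might look something like the following Python sketch. It is a simplification under stated assumptions rather than MultiKueue's actual selection logic: the Cluster class, the dispatch function, the cluster names and the GPU counts are hypothetical, and the sketch simply takes the first cluster with enough free GPUs.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical view of a worker cluster as seen from the control cluster."""
    name: str
    free_gpus: int


def dispatch(job_gpus, clusters):
    """Place a job on the first cluster with enough free GPUs.

    Returns the chosen cluster's name, or None if the job has to keep waiting.
    """
    for cluster in clusters:
        if cluster.free_gpus >= job_gpus:
            cluster.free_gpus -= job_gpus  # reserve the capacity
            return cluster.name
    return None  # no home yet; the job stays queued


clusters = [Cluster("on-prem", 2), Cluster("cloud", 16)]
first = dispatch(8, clusters)   # too big for on-prem, lands on cloud
second = dispatch(2, clusters)  # fits on-prem
third = dispatch(16, clusters)  # cloud now has 8 free, so the job waits
print(first, second, third)
```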

Clusters Near and Far

Currently, MultiKueue is a beta feature turned on by default in v0.9 of Kueue.

One organization taking a serious look at incorporating MultiKueue has been CERN.

CERN, the European Organization for Nuclear Research (the acronym comes from “Conseil Européen pour la Recherche Nucléaire”), is currently designing its next particle accelerator. Today, the research facility generates 100PB of data a year, but with new particle accelerators incoming, this number could grow by a factor of 10 or more.

Rocha is part of an engineering team that is looking at building a system to schedule jobs against multiple resources, be they in-house, public cloud providers, or via CERN’s Worldwide LHC Computing Grid, a network of HPC supercomputers around the globe.

Such a system would be built for batch jobs using parameter optimization and work with existing schedulers, such as Slurm and Kubeflow, centralized through a Kueue entry point.

Rocha demonstrated how this project would work with MultiKueue. Within a dashboard, Rocha showed a number of active clusters, one in-house and one located in Germany.

All the jobs for these clusters are queued up and appear in the control cluster. One job Rocha teed up was too large for the local cluster, so Kueue automatically started it on the remote cluster, which had the computational resources available.

“The idea is to submit jobs and not care where they run,” Rocha said.
