VOOZH about

URL: https://volcano.sh/blog/

⇱ Blog | Volcano


Skip to main content

As batch training, inference, AI Agent, HPC, big-data and other diverse workloads are increasingly co-located in the same Kubernetes cluster, the scheduler must make higher-quality decisions under intensifying resource contention while preserving job-level semantics, queue fairness, topology affinity, and operational stability. v1.15.0 delivers enhancements across the scheduling core, heterogeneous resource management, multi-scheduler coordination, and performance observability.

The most notable new capability is Gang-Aware Preemption and Resource Reclamation: preemption decisions are evaluated at gang granularity on both the preemptor and victim sides — the preemptor is placed as a whole gang, and victim candidates are organized and evaluated at job/gang granularity, preferring surplus replicas to avoid per-Pod random eviction that disrupts multiple training jobs while the preemptor itself still cannot start. In addition, v1.15.0 introduces DRA queue quota in the capacity plugin, a pluggable multi-sharding policy framework, a Benchmark and performance observability tool, Kubernetes 1.35 support, NodeGroup preferred ordering, Agent Scheduler stability fixes, GPU/vGPU incremental enhancements, and Scheduling Gates for queue admission control.

Volcano community v1.14 is now officially released. As AI workloads evolve from single offline training to diverse scenarios including online inference and AI Agents, the scheduling system faces unprecedented challenges. v1.14 delivers architecture-level innovations that maintain Volcano's advantages in large-scale batch computing while closing the gap for latency-sensitive workloads, taking a solid step toward the goal of becoming a "unified scheduling platform for AI training, inference, RL, and Agent scenarios."

Today, the Volcano community is proud to announce the launch of Kthena, a new sub-project designed for global developers and MLOps engineers.

Kthena is a cloud-native, high-performance system for LLM inference routing, orchestration, and scheduling, tailored specifically for Kubernetes. Engineered to address the complexity of serving LLMs at production scale, Kthena delivers granular control and enhanced flexibility. Through features like topology-aware scheduling, KV Cache-aware routing, and Prefill-Decode (PD) disaggregation, it significantly improves GPU/NPU utilization and throughput while minimizing latency.

As a sub-project of Volcano, Kthena extends Volcano’s capabilities beyond AI training, creating a unified, end-to-end solution for the entire AI lifecycle.

👁 Image

[HONG KONG, CHINA — June 10, 2025] — The Cloud Native Computing Foundation (CNCF) today announced that iFlytek has won the CNCF End-User Case Study Competition. The CNCF, which is committed to building a sustainable ecosystem for cloud native software, recognized iFlytek for its innovative use of Volcano. The company shared its success in large-scale AI model training at the KubeCon + CloudNativeCon China conference, held in Hong Kong from June 10-11.

Volcano v1.12 released: Advancing Cloud-Native AI and Batch Computing

As AI large model technology rapidly evolves, enterprises are placing higher demands on computing resource efficiency and application performance. For complex application scenarios such as AI, big data, and high-performance computing (HPC), efficiently utilizing accelerators like GPUs, ensuring high system availability, and managing resources with fine granularity are the core areas of focus for the Volcano community's continuous innovation.

Volcano is excited to announce the completion of our CNCF-funded security audit carried out by Ada Logics and facilitated by OSTIF in collaboration with the Volcano maintainers. The audit was scoped to cover the Volcano source code, supply-chain risks and fuzzing. The auditing team identified 10 security issues which the Volcano security team has fixed with the completion of the audit.

The Growing Demand for LLM Workloads and Associated Challenges

The increasing adoption of large language models (LLMs) has led to heightened demand for efficient AI training and inference workloads. As model size and complexity grow, distributed training and inference have become essential. However, this expansion introduces challenges in network communication, resource allocation, and fault recovery within large-scale distributed environments. These issues often create performance bottlenecks that hinder scalability.

As the de facto standard in cloud-native batch computing, Volcano has been widely adopted across various scenarios, including AI, Big Data, and High-Performance Computing (HPC). With over 800 contributors from more than 30 countries and tens of thousands of code commits, Volcano has been deployed in production environments by over 60 enterprises worldwide. It provides the industry with excellent practical standards and solutions for cloud native batch computing.

On Sep 19, 2024, UTC+8, Volcano version v1.10.0 was officially released. This version introduced the following new features: