github volcano-sh/volcano v1.12.0

What's New

Welcome to the v1.12.0 release of Volcano! 🚀 🎉 📣
This release brings a number of significant enhancements that community users have long awaited.

Network Topology Aware Scheduling: Alpha Release

Volcano's network topology-aware scheduling, initially introduced as a preview in v1.11, has now reached its Alpha release in v1.12. This feature aims to optimize the deployment of AI tasks in large-scale training and inference scenarios, such as model parallel training and Leader-Worker inference. It achieves this by scheduling tasks within the same network topology performance domain, which reduces cross-switch communication and significantly enhances task efficiency. Volcano leverages the HyperNode CRD to abstract and represent heterogeneous hardware network topologies, supporting a hierarchical structure for simplified management.

Key features integrated in v1.12 include:

  • HyperNode Auto-Discovery: Volcano now offers automatic discovery of cluster network topologies. Users can configure the discovery type, and the system will automatically create and maintain hierarchical HyperNodes that reflect the actual cluster network topology. Currently, this supports InfiniBand (IB) networks by acquiring topology information via the UFM (Unified Fabric Manager) interface and automatically updating HyperNodes. Future plans include support for more network protocols like RoCE.

  • Prioritized HyperNode Selection:

    This release introduces a scoring strategy based on both node-level and HyperNode-level evaluations, which are accumulated to determine the final HyperNode score.

    • Node-level: It is recommended to configure the BinPack plugin to prioritize filling HyperNodes, thereby reducing resource fragmentation.
    • HyperNode-level: Lower-level HyperNodes are preferred for better performance due to fewer cross-switch communications. For HyperNodes at the same level, those containing more tasks receive higher scores to reduce HyperNode-level resource fragmentation.
  • Support for Label Selector Node Matching:

    HyperNode leaf nodes are associated with physical nodes in the cluster, supporting three matching strategies:

    • Exact Match: Direct matching of node names.
    • Regex Match: Matching node names using regular expressions.
    • Label Match: Matching nodes via standard Label Selectors.
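The three matching strategies above can be illustrated with a minimal Python sketch. The `Node` class and the function names are hypothetical and are not Volcano's implementation. Two assumptions are made: the regular expression must match the whole node name, and a label selector is a plain set of key/value pairs.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical stand-in for a cluster node: a name plus its labels.
    name: str
    labels: dict = field(default_factory=dict)


def exact_match(node, name):
    """Exact Match: the node name must equal the given name."""
    return node.name == name


def regex_match(node, pattern):
    """Regex Match: the node name must match the pattern.

    Assumption: the pattern is anchored to the whole name (fullmatch).
    """
    return re.fullmatch(pattern, node.name) is not None


def label_match(node, match_labels):
    """Label Match: every key/value pair in the selector must be on the node.

    Assumption: only equality-based selectors are modeled here.
    """
    return all(node.labels.get(k) == v for k, v in match_labels.items())
```

For example, a node named `node-12` with the label `rack=r1` satisfies `regex_match(node, r"node-\d+")` and `label_match(node, {"rack": "r1"})`.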

Related PRs: (#3874, #3894, #3969, #3971, #4068, #4213, #3897, #3887, @ecosysbin, @weapons97, @Xu-Wentao, @penggu, @JesseStutler, @Monokaix)

Dynamic MIG Slicing for GPU Virtualization

Volcano's GPU virtualization feature now supports requesting partial GPU resources by memory and compute capacity. This, combined with Device Plugin integration, achieves hardware isolation and improves GPU utilization.

Traditional GPU virtualization restricts GPU usage by intercepting CUDA APIs (a software solution based on HAMi-core). The NVIDIA Ampere architecture introduced MIG (Multi-Instance GPU) technology, which allows a single physical GPU to be partitioned into multiple independent instances. However, general MIG solutions often fix instance sizes in advance, which wastes resources and limits flexibility.

Volcano v1.12 provides dynamic MIG slicing and scheduling capabilities. It selects an appropriate MIG instance size in real time based on the user's requested GPU usage and employs a Best-Fit algorithm to minimize resource waste. It also supports GPU scoring strategies like BinPack and Spread to reduce resource fragmentation and enhance GPU utilization. Users can request resources using the unified volcano.sh/vgpu-number, volcano.sh/vgpu-cores, and volcano.sh/vgpu-memory resource names without needing to concern themselves with the underlying implementation.
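As a rough illustration of the Best-Fit idea, the sketch below picks the smallest instance size that still satisfies a memory request. The profile table and the `best_fit` function are assumptions made for this example; the real selection logic is not described here and likely also accounts for compute capacity.

```python
# Hypothetical profile table: (profile name, memory in GiB). Illustrative only.
PROFILES = [("1g.5gb", 5), ("2g.10gb", 10), ("3g.20gb", 20), ("7g.40gb", 40)]


def best_fit(requested_gib, profiles=PROFILES):
    """Return the smallest profile whose memory covers the request, or None."""
    candidates = [p for p in profiles if p[1] >= requested_gib]
    if not candidates:
        return None  # no single instance is large enough
    # Best fit: minimize the unused memory (profile size minus request).
    return min(candidates, key=lambda p: p[1] - requested_gib)
```

A request for 8 GiB would select the 10 GiB profile in this table, leaving 2 GiB unused instead of the 12 GiB a 20 GiB instance would waste.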

Related PRs: (#4290, #3953, @sailorvii, @archlitchi)

Dynamic Resource Allocation (DRA) Support

Kubernetes DRA (Dynamic Resource Allocation) is a built-in Kubernetes feature designed to provide a more flexible and powerful way to manage heterogeneous hardware resources in a cluster, such as GPUs, FPGAs, and high-performance network cards. It addresses the limitations of traditional Device Plugins in certain advanced scenarios, enabling device vendors and platform administrators to better declare, allocate, and share these hardware resources with Pods and containers.

Volcano v1.12 adds support for DRA. This feature allows the cluster to dynamically allocate and manage external resources, enhancing Volcano's integration with the Kubernetes ecosystem and its resource management flexibility.

Related Documentation:
Unified Scheduling with DRA

Related PR: (#3799, @JesseStutler)

Volcano Global Supports Queue Capacity Management

Queues are a fundamental concept in Volcano. To enable tenant quota management in multi-cluster and multi-tenant environments, Volcano v1.12 introduces enhanced global queue capacity management. Users can now centrally limit tenant resource usage across multiple clusters. The configuration remains consistent with single-cluster setups: tenant quotas are defined by setting the capability field within the queue configuration.
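Conceptually, a global capacity check sums a tenant's usage across clusters and compares it with the queue's capability. The following Python sketch shows that idea only; the function name, the data layout, and the rule that resources absent from the capability are unlimited are all assumptions, not Volcano Global's implementation.

```python
def within_capability(usage_by_cluster, request, capability):
    """Check whether a new request fits under a queue's global capability.

    usage_by_cluster: {cluster name: {resource name: amount in use}}
    request:          {resource name: amount requested}
    capability:       {resource name: global limit}
    Assumption: a resource missing from capability is treated as unlimited.
    """
    total = {}
    for usage in usage_by_cluster.values():
        for resource, amount in usage.items():
            total[resource] = total.get(resource, 0) + amount
    return all(
        total.get(resource, 0) + amount <= capability.get(resource, float("inf"))
        for resource, amount in request.items()
    )
```

With 4 CPUs in use in one cluster and 3 in another, a capability of 8 CPUs admits a request for 1 more CPU but rejects a request for 2.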

Related PR: volcano-sh/volcano-global#16 (@tanberBro)

Security Enhancements

The Volcano community consistently focuses on security. In v1.12, beyond fine-grained control over sensitive permissions like ClusterRole, we've addressed and fixed the following potential security risks:

  • HTTP Server Timeout Settings: Metric and Healthz endpoints for all Volcano components have been configured with server-side ReadHeader, Read, and Write timeouts. This prevents prolonged resource occupation.
  • Warning Logs for Skipping SSL Certificate Verification: When client requests set insecureSkipVerify to true, a warning log is now added. We strongly advise enabling SSL certificate verification in production environments.
  • Volcano Scheduler pprof Endpoint Disabled by Default: To prevent the disclosure of sensitive program information, the profiling data port (used for troubleshooting) is now disabled by default.
  • Removal of Unnecessary File Permissions: Unnecessary execution permissions have been removed from Go source files to maintain minimal file permissions.
  • Security Context and Non-Root Execution for Containers: All Volcano components now run with non-root privileges. We've added seccompProfile and seLinuxOptions, and set allowPrivilegeEscalation to false to prevent container privilege escalation. Additionally, only the necessary Linux capabilities are retained, comprehensively limiting container permissions.
  • HTTP Request Response Body Size Limit: The response body size is now limited for HTTP requests sent by the Extender Plugin and the Elasticsearch service. This prevents excessive resource consumption that could lead to OOM (Out Of Memory) issues.
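The response body size limit in the last item follows a common pattern: read at most a fixed number of bytes and reject anything larger. Here is a minimal Python sketch of that pattern. The names `read_limited` and `ResponseTooLarge` are hypothetical, and Volcano's own implementation may differ.

```python
import io


class ResponseTooLarge(Exception):
    """Raised when a response body exceeds the configured limit."""


def read_limited(stream, max_bytes):
    """Read a body from a binary stream, refusing more than max_bytes."""
    # Ask for one byte more than allowed so an oversized body is detectable
    # without ever buffering the whole (possibly huge) response.
    data = stream.read(max_bytes + 1)
    if len(data) > max_bytes:
        raise ResponseTooLarge(f"response body exceeds {max_bytes} bytes")
    return data
```

For instance, `read_limited(io.BytesIO(b"ok"), 1024)` returns the body unchanged, while a body of 2,000 bytes with the same limit raises `ResponseTooLarge`.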

Performance Improvements in Large-Scale Scenarios

Volcano continuously optimizes performance. Without affecting functionality, the new version removes or disables by default some unnecessary Webhooks, which improves performance when objects are created in large batches:

  • PodGroup Mutating Webhook Disabled by Default: When a PodGroup is created without specifying a queue, this Webhook can read the queue from the Namespace and populate it. Since this scenario is uncommon, the Webhook is disabled by default. Users can enable it as needed.
  • Queue Status Validation Moved from Pod to PodGroup: When a queue is closed, task submission is disallowed. The validation was originally performed during Pod creation. Because PodGroup is Volcano's basic scheduling unit, performing the validation at PodGroup creation is more logical, and because there are fewer PodGroups than Pods, it also reduces Webhook calls and improves performance. If a queue is closed after PodGroup creation, Volcano Scheduler still checks the queue status during Pod scheduling.

Related PRs: (#4128, #4132, @Monokaix)

Gang Scheduling Support for Multiple Workload Types

Gang scheduling is a core capability of Volcano. For Volcano Job and PodGroup objects, users can directly set minMember to define the minimum number of replicas required. For other workload types like Deployment, StatefulSet, and Job, minMember previously defaulted to 1.

In the new version, users can specify the desired minimum number of replicas by setting the annotation scheduling.volcano.sh/group-min-member on the workload. For example, to set minMember for a Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: volcano-group-deployment
  annotations:
    # Set min member=10
    scheduling.volcano.sh/group-min-member: "10"

This setting means that when using Volcano for scheduling, either all 10 replicas are successfully scheduled, or none are, thereby enabling Gang scheduling for various workload types.
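The all-or-nothing rule can be sketched in a few lines of Python. This is a deliberately simplified model with a hypothetical `gang_schedule` function: it counts free "slots" instead of evaluating per-node resources as a real scheduler would.

```python
def gang_schedule(pending_pods, free_slots, min_member):
    """Return the pods to place, or an empty list if the gang cannot start.

    Simplification: capacity is a single count of free slots.
    """
    placeable = pending_pods[:max(free_slots, 0)]
    if len(placeable) < min_member:
        return []  # fewer than min_member fit, so place none of them
    return placeable
```

With `min_member` set to 10, nine free slots place nothing, while ten free slots place all ten replicas.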

Related Documentation:

Multiple Workload Types Support with Gang

Related PR: (#4000, @sceneryback)

Job Flow Enhancements

Job Flow, Volcano's lightweight workflow orchestration framework for Volcano Jobs, received the following enhancements in v1.12:

  • New Monitoring Metrics: Added support for measuring the number of successful and failed Job Flows.
  • DAG Validation: Introduced functionality to validate that a Job Flow's DAG (Directed Acyclic Graph) structure is well-formed.
  • Status Synchronization Fix: Addressed issues with inaccurate Job Flow status synchronization.
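The exact checks performed by the DAG validation are not listed here. As a generic illustration, the Python sketch below rejects dependency graphs that contain a cycle or refer to an unknown job, using Kahn's topological sort. The function name and the input layout are assumptions made for this example.

```python
from collections import deque


def is_valid_dag(deps):
    """deps maps each job name to the list of jobs it depends on."""
    # Every dependency must refer to a job that is defined in the flow.
    for parents in deps.values():
        for parent in parents:
            if parent not in deps:
                return False

    # Kahn's algorithm: repeatedly remove jobs with no unmet dependencies.
    indegree = {job: len(set(parents)) for job, parents in deps.items()}
    children = {job: [] for job in deps}
    for job, parents in deps.items():
        for parent in set(parents):
            children[parent].append(job)

    ready = deque(job for job, degree in indegree.items() if degree == 0)
    visited = 0
    while ready:
        job = ready.popleft()
        visited += 1
        for child in children[job]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    # Jobs left unvisited are part of, or blocked behind, a cycle.
    return visited == len(deps)
```

A flow where `b` depends on `a` and `c` depends on both is accepted; a flow where `a` and `b` depend on each other is rejected.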

Related PRs: (#4169, #4090, #4135, @dongjiang1989)

Finer-Grained Permission Control in Multi-Tenant Scenarios

Volcano natively supports multi-tenant environments and emphasizes permission control in such scenarios to achieve permission isolation for different users. In the new version, Volcano enhances permission control for Volcano Job by adding read-only and read-write ClusterRoles. Users can now assign different read and write permissions to various tenants as needed to achieve permission isolation.

Related PR: (#4174, @Hcryw)

Kubernetes 1.32 Support

Volcano versions closely track Kubernetes community releases. v1.12 supports the latest Kubernetes v1.32, with comprehensive unit test (UT) and E2E test cases ensuring functionality and reliability.

To contribute to Volcano's adaptation work for new Kubernetes versions, please refer to: adapt-k8s-todo.

Related PR: (#4099, @guoqinwill, @danish9039)

Enhanced Queue Monitoring Metrics

Volcano queues now expose several new key resource metrics. For CPU, memory, and extended resources, metrics such as request, allocated, deserved, capacity, and real_capacity can be monitored and visualized, giving a detailed view of each queue's critical resource status.

Related PR: (#3937, @zedongh)

Fuzz Testing Support

Fuzz testing (or fuzzing) is an automated software testing technique that involves injecting large amounts of random, invalid, or abnormal input data into a target program and monitoring its behavior to discover potential defects.

Volcano introduces a fuzz testing framework in this new version, performing fuzz testing on key function units and continuously testing using Google's open-source OSS-Fuzz framework. This aims to proactively identify potential vulnerabilities and defects, enhancing Volcano's security and robustness.
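To make the technique concrete, here is a toy fuzzing loop in Python. It is not the OSS-Fuzz setup used by Volcano; the `fuzz` helper and its parameters are hypothetical. It feeds random byte strings to a target function and records any exception other than the ones the target is expected to raise for bad input.

```python
import random


def fuzz(target, runs=1000, max_len=32, seed=0, expected=(ValueError,)):
    """Call target with random byte strings and collect unexpected errors."""
    rng = random.Random(seed)  # seeded so that a run can be reproduced
    crashes = []
    for _ in range(runs):
        length = rng.randrange(max_len + 1)
        data = bytes(rng.randrange(256) for _ in range(length))
        try:
            target(data)
        except expected:
            pass  # rejecting bad input cleanly is the desired behavior
        except Exception as exc:
            crashes.append((data, exc))  # a defect worth investigating
    return crashes
```

A target that indexes `data[0]` without checking the length would be reported here as soon as the loop generates an empty input.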

Related PR: (#4205, @AdamKorcz)

Stability Enhancements

Multiple stability issues have been resolved in the new version, including:

  • Panics caused by invalid values for the queue capacity fields capability, deserved, and guaranteed.
  • Hierarchical queue validation failures.
  • Queue update concurrency issues.
  • Unnecessary PodGroup refreshes.
  • StatefulSets with 0 replicas still occupying queue resources.

Related PRs: (#4273, #4272, #4179, #4141, #4033, #4012, #3603, @halcyon-r, @guoqinwill, @JackyTYang, @JesseStutler, @zhutong196, @Wang-Kai, @HalfBuddhist)

Important Notes Before Upgrading

Before upgrading to Volcano v1.12, please note the following changes:

  • PodGroup Mutating Webhook Disabled by Default: In v1.12, the PodGroup Mutating Webhook is disabled by default. This Webhook populates the queue of a PodGroup created without one by reading queue information from the associated Namespace, so that population no longer happens unless the Webhook is enabled. This scenario has low usage; if your workflows rely on this behavior, make sure to manually enable the Webhook after upgrading.

  • Queue Status Validation Migration and Behavior Change: The queue status validation logic for task submission has been migrated from the Pod creation phase to the PodGroup creation phase. This means that when a queue is closed, the system will block task submission at the time of PodGroup creation. However, if independent Pods (not submitted via PodGroup) continue to be submitted to a queue after it is closed, these Pods can be submitted successfully, but the Volcano Scheduler will not schedule them.

  • Volcano Scheduler pprof Endpoint Disabled by Default
    To improve security, the pprof endpoint for the Volcano Scheduler is now disabled by default in this release. If you require this endpoint for debugging or monitoring, you will need to explicitly enable it after upgrading, in one of two ways:

    • If you use Helm, specify custom.scheduler_pprof_enable=true during Helm installation or upgrade.
    • Otherwise, manually set the command-line argument --enable-pprof=true when starting the Volcano Scheduler.

    Please be aware of the security implications before enabling this endpoint in production environments.

Full Changelog: v1.11.2...v1.12.0
