What's New

Welcome to the v1.12.0 release of Volcano! 🚀 🎉 📣
In this release, we have brought a bunch of significant enhancements that have long-awaited by community users.

Network Topology Aware Scheduling: Alpha Release

Volcano's network topology-aware scheduling, initially introduced as a preview in v1.11, has now reached its Alpha release in v1.12. This feature aims to optimize the deployment of AI tasks in large-scale training and inference scenarios, such as model parallel training and Leader-Worker inference. It achieves this by scheduling tasks within the same network topology performance domain, which reduces cross-switch communication and significantly enhances task efficiency. Volcano leverages the HyperNode CRD to abstract and represent heterogeneous hardware network topologies, supporting a hierarchical structure for simplified management.

Key features integrated in v1.12 include:

HyperNode Auto-Discovery: Volcano now offers automatic discovery of cluster network topologies. Users can configure the discovery type, and the system will automatically create and maintain hierarchical HyperNodes that reflect the actual cluster network topology. Currently, this supports InfiniBand (IB) networks by acquiring topology information via the UFM (Unified Fabric Manager) interface and automatically updating HyperNodes. Future plans include support for more network protocols like RoCE.
Prioritized HyperNode Selection:
This release introduces a scoring strategy based on both node-level and HyperNode-level evaluations, which are accumulated to determine the final HyperNode score.
- Node-level: It is recommended to configure the BinPack plugin to prioritize filling HyperNodes, thereby reducing resource fragmentation.
- HyperNode-level: Lower-level HyperNodes are preferred for better performance due to fewer cross-switch communications. For HyperNodes at the same level, those containing more tasks receive higher scores to reduce HyperNode-level resource fragmentation.
Support for Label Selector Node Matching:
HyperNode leaf nodes are associated with physical nodes in the cluster, supporting three matching strategies:
- Exact Match: Direct matching of node names.
- Regex Match: Matching node names using regular expressions.
- Label Match: Matching nodes via standard Label Selectors.

Related Documentation:

Related PRs: (#3874, #3894, #3969, #3971, #4068, #4213, #3897, #3887, @ecosysbin, @weapons97, @Xu-Wentao,@penggu @JesseStutler, @Monokaix)

Dynamic MIG Slicing for GPU Virtualization

Volcano's GPU virtualization feature now supports requesting partial GPU resources by memory and compute capacity. This, combined with Device Plugin integration, achieves hardware isolation and improves GPU utilization.

Traditional GPU virtualization restricts GPU usage by intercepting CUDA APIs (based on HAMI-Core software solutions). NVIDIA Ampere architecture introduced MIG (Multi-Instance GPU) technology, allowing a single physical GPU to be partitioned into multiple independent instances. However, general MIG solutions often pre-fix instance sizes, leading to resource waste and insufficient flexibility.

Volcano v1.12 provides dynamic MIG slicing and scheduling capabilities. It can select appropriate MIG instance sizes in real-time based on the user's requested GPU usage and employs a Best-Fit algorithm to minimize resource waste. It also supports GPU scoring strategies like BinPack and Spread to reduce resource fragmentation and enhance GPU utilization. Users can request resources using the unified volcano.sh/vgpu-number, volcano.sh/vgpu-cores, and volcano.sh/vgpu-memory APIs without needing to concern themselves with the underlying implementation.

Related Documentation:

Related PRs: (#4290, #3953, @sailorvii, @archlitchi)

Dynamic Resource Allocation (DRA) Support

Kubernetes DRA (Dynamic Resource Allocation) is a built-in Kubernetes feature designed to provide a more flexible and powerful way to manage heterogeneous hardware resources in a cluster, such as GPUs, FPGAs, and high-performance network cards. It addresses the limitations of traditional Device Plugins in certain advanced scenarios, enabling device vendors and platform administrators to better declare, allocate, and share these hardware resources with Pods and containers.

Volcano v1.12 adds support for DRA. This feature allows the cluster to dynamically allocate and manage external resources, enhancing Volcano's integration with the Kubernetes ecosystem and its resource management flexibility.

Related Documentation:
Unified Scheduling with DRA

Related PR: (#3799, @JesseStutler)

Volcano Global Supports Queue Capacity Management

Queues are a fundamental concept in Volcano. To enable tenant quota management in multi-cluster and multi-tenant environments, Volcano v1.12 introduces enhanced global queue capacity management. Users can now centrally limit tenant resource usage across multiple clusters. The configuration remains consistent with single-cluster setups: tenant quotas are defined by setting the capability field within the queue configuration.

Related PR: volcano-sh/volcano-global#16 (@tanberBro)

Security Enhancements

The Volcano community consistently focuses on security. In v1.12, beyond fine-grained control over sensitive permissions like ClusterRole, we've addressed and fixed the following potential security risks:

HTTP Server Timeout Settings: Metric and Healthz endpoints for all Volcano components have been configured with server-side ReadHeader, Read, and Write timeouts. This prevents prolonged resource occupation.
- PR: #4208
Warning Logs for Skipping SSL Certificate Verification: When client requests set insecureSkipVerify to true, a warning log is now added. We strongly advise enabling SSL certificate verification in production environments.
- PR: #4211
Volcano Scheduler pprof Endpoint Disabled by Default: To prevent the disclosure of sensitive program information, the Profiling data port (used for troubleshooting) is now disabled by default.
- PR: #4173
Removal of Unnecessary File Permissions: Unnecessary execution permissions have been removed from Go source files to maintain minimal file permissions.
- PR: #4171
Security Context and Non-Root Execution for Containers: All Volcano components now run with non-root privileges. We've added seccompProfile, SELinuxOptions, and set allowPrivilegeEscalation to false to prevent container privilege escalation. Additionally, only necessary Linux Capabilities are retained, comprehensively limiting container permissions.
- PR: #4207
HTTP Request Response Body Size Limit: For HTTP requests sent by the Extender Plugin and Elastic Search Service, their response body size is now limited. This prevents excessive resource consumption that could lead to OOM (Out Of Memory) issues.
- Disclosure: GHSA-hg79-fw4p-25p8

Performance Improvements in Large-Scale Scenarios

Volcano continuously optimizes performance. The new version, without affecting functionality, has by default removed and disabled some unnecessary Webhooks, improving performance in large-scale batch creation scenarios:

PodGroup Mutating Webhook Disabled by Default: When creating a PodGroup without specifying a queue, the system can now read from the Namespace to populate it. Since this scenario is uncommon, this Webhook is disabled by default. Users can enable it as needed.
Queue Status Validation Moved from Pod to PodGroup: When a queue is closed, task submission is disallowed. The original validation logic was performed during Pod creation. As Volcano's basic scheduling unit is PodGroup, migrating the validation to PodGroup creation is more logical. Since the number of PodGroups is less than Pods, this reduces Webhook calls, improving performance. For scenarios where a queue is closed after PodGroup creation, Volcano Scheduler will still check the queue status during Pod scheduling.

Related PRs: (#4128, #4132, @Monokaix)

Gang Scheduling Support for Multiple Workload Types

Gang scheduling is a core capability of Volcano. For Volcano Job and PodGroup objects, users can directly set minMember to define the minimum number of replicas required. For other workload types like Deployment, StatefulSet, and Job, minMember was previously defaulted to 1.

In the new version, users can specify the desired minimum number of replicas by setting the annotation scheduling.volcano.sh/group-min-member on the workload. For example, to set minMember for a Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: volcano-group-deployment
  annotations:
    # Set min member=10
    scheduling.volcano.sh/group-min-member: "10"

This setting means that when using Volcano for scheduling, either all 10 replicas are successfully scheduled, or none are, thereby enabling Gang scheduling for various workload types.

Related Documentation:

Multiple Workload Types Support with Gang

Related PR: (#4000, @sceneryback)

Job Flow Enhancements

Job Flow is a volcano's lightweight workflow orchestration framework for Volcano Jobs, received the following enhancements in v1.12:

New Monitoring Metrics: Added support for measuring the number of successful and failed Job Flows.
DAG Validation: Introduced functionality to validate the legality of Job Flow DAG (Directed Acyclic Graph) structures.
Status Synchronization Fix: Addressed issues with inaccurate Job Flow status synchronization.

Related PRs: (#4169, #4090, #4135, #4169, @dongjiang1989)

Finer-Grained Permission Control in Multi-Tenant Scenarios

Volcano natively supports multi-tenant environments and emphasizes permission control in such scenarios to achieve permission isolation for different users. In the new version, Volcano enhances permission control for Volcano Job by adding read-only and read-write ClusterRoles. Users can now assign different read and write permissions to various tenants as needed to achieve permission isolation.

Related PR: (#4174, @Hcryw)

Kubernetes 1.32 Support

Volcano versions closely track Kubernetes community releases. v1.12 supports the latest Kubernetes v1.32, with comprehensive UT and E2E test cases ensuring functionality and reliability.

To contribute to Volcano's adaptation work for new Kubernetes versions, please refer to: adapt-k8s-todo.

Related PR: (#4099, @guoqinwill, @danish9039)

Enhanced Queue Monitoring Metrics

Volcano queues now include several new key resource metrics. Support has been added for monitoring and visualizing CPU, Memory, and extended resource metrics such as request, allocated, deserved, capacity, and real_capacity, providing a detailed view of the queue's critical resource status.

Related PR: (#3937, @zedongh)

Fuzz Testing Support

Fuzz testing (or fuzzing) is an automated software testing technique that involves injecting large amounts of random, invalid, or abnormal input data into a target program and monitoring its behavior to discover potential defects.

Volcano introduces a fuzz testing framework in this new version, performing fuzz testing on key function units and continuously testing using Google's open-source OSS-Fuzz framework. This aims to proactively identify potential vulnerabilities and defects, enhancing Volcano's security and robustness.

Related PR: (#4205, @AdamKorcz)

Stability Enhancements

Multiple stability issues have been resolved in the new version, including:

Panic issues caused by unreasonable settings of queue capacity capability, deserved, and guaranteed.
Hierarchical queue validation failures.
Queue Update Concurrency Issues.
Meaningless PodGroup refresh issues.
StatefulSet replicas being 0 but still occupying queue resources.

(#4273, #4272, #4179, #4141, #4033, #4012, #3603, @halcyon-r, @guoqinwill, @JackyTYang, @JesseStutler, @zhutong196, @Wang-Kai, @HalfBuddhist)

Important Notes Before Upgrading

Before upgrading to Volcano v1.12, please note the following changes:

PodGroup Mutating Webhook Disabled by Default: In v1.12, the PodGroup Mutating Webhook is disabled by default. This means that when creating a PodGroup without specifying a queue, the system will attempt to read queue information from its associated Namespace for population. This scenario has low usage; if your specific workflows rely on this behavior, ensure to manually enable this Webhook after upgrading.
Queue Status Validation Migration and Behavior Change: The queue status validation logic for task submission has been migrated from the Pod creation phase to the PodGroup creation phase. This means that when a queue is closed, the system will block task submission at the time of PodGroup creation. However, if independent Pods (not submitted via PodGroup) continue to be submitted to a queue after it is closed, these Pods can be submitted successfully, but the Volcano Scheduler will not schedule them.
Volcano Scheduler pprof Endpoint Disabled by Default
For security enhancement, the pprof endpoint for the Volcano Scheduler is now disabled by default in this release. If you require this endpoint for debugging or monitoring, you will need to explicitly enable it post-upgrade. This can be achieved by:
- If you are using helm, specifying custom.scheduler_pprof_enable=true during Helm installation or upgrade.
- OR, manually setting the command-line argument --enable-pprof=true when starting the Volcano Scheduler.
Please be aware of the security implications before enabling this endpoint in production environments.

Overall Changes

Extend the default timeout for stale by @wangyang0616 in #2778
Dynamic-mig for volcano-vgpu design by @archlitchi in #3906
Add JesseStutler as reviewer by @JesseStutler in #4002
Fix typos by @co63oc in #4008
[DOCS] improve readme, visit to should be visit by @mahdikhashan in #4016
skip the jobs that have no tasks during the close session step in gang plugin by @Wang-Kai in #4012
add helm values scheduler_plugins_dir by @weapons97 in #3988
Fix: fix an issue where the wrong action name could not be ignored by @xieyanke in #3994
scheduler: correct mismatched error message by @SataQiu in #4013
[docs]: fix passive tone by @mahdikhashan in #4036
Fix typos by @co63oc in #4035
Update uninstall-volcano.sh by @mahmut-Abi in #3938
typo: change configure to configuration by @mahdikhashan in #4024
fix: the problem that the pending tasks cannot be scheduled during the backfill action by @hansongChina in #4029
update document for volcano-vgpu feature by @archlitchi in #4034
Improve overused messaging by @sfc-gh-raravena in #4053
Refactor: move DeviceName const into its own package by @SataQiu in #4045
change to action cache v4 by @Monokaix in #4059
scheduler: fix a bug where the job NodesFitErrors field is not updated when ssn.Allocate failed by @SataQiu in #4009
refactor: Optimized code by @feyounger in #4058
fix typo in comment by @mahdikhashan in #4020
Fix inaccurate statements in node-lock.md by @SataQiu in #4060
[bugfix]fix creating a hierarchical sub-queue will be rejected by @zhutong196 in #4033
[docs] improve the development.md document by @mahdikhashan in #4018
fix custom plugin doc and build script by @Monokaix in #4042
Using TypedRateLimitingInterface instead of deprecated RateLimitingInterface by @SataQiu in #4063
delete label tempalte by @hwdef in #4067
Optimized code by @feyounger in #4082
fix: Fix jobflow status confusion problem by @dongjiang1989 in #4090
take gpu-number into consideration by @linuxfhy in #3901
Add more info when e2e failed by @Monokaix in #4097
[fix] update feature flag for job support by @yuyue9284 in #4092
Optimize append operations for better performance by @feyounger in #4087
[Refactor] controller cache deletePod logic,skip create by @feyounger in #4088
CI: add dependabot by @dongjiang1989 in #4077
update readme by @Monokaix in #4113
Bump docker/setup-buildx-action from 2 to 3 by @dependabot in #4107
fix flaky test by @Monokaix in #4121
Remove pod mutate webhook by default by @Monokaix in #4120
Remove podgroup mutating webhook by default by @Monokaix in #4128
fix: add hierarchy queue validation and update for enqueueable by @JesseStutler in #4119
feat: Volcano adapts to the k8s v1.32 by @guoqinwill in #4099
Move queue state validate from pod to podgroup by @Monokaix in #4132
fix: remove hostpath mount in volcano-scheduler by @cnmcavoy in #4124
Enabled Cooldown Protection Plugin for reclaiming also by @qGentry in #4123
Bump actions/setup-go from 4 to 5 by @dependabot in #4126
chore: Change dependabot schedule interval to weekly by @dongjiang1989 in #4136
Bump golang.org/x/net from 0.26.0 to 0.36.0 by @dependabot in #4115
cleanup residual useless victims code in preempt action by @JesseStutler in #4138
Bump ossf/scorecard-action from 2.3.1 to 2.4.1 by @dependabot in #4108
Bump github/codeql-action from 1 to 3 by @dependabot in #4109
scheduler: consider the nominated node first when allocating Node for Pod by @SataQiu in #4079
Bump docker/login-action from 1 to 3 by @dependabot in #4111
Bump helm/chart-releaser-action from 1.5.0 to 1.7.0 by @dependabot in #4140
Bump actions/setup-java from 3 to 4 by @dependabot in #4110
fix: the problem that PVC will be continuously created indefinitely by @ytcisme in #4142
improve fail messages for pod scheduling in gang unschedulable scenario by @ouyangshengjia in #4117
Replace queue status update by using ApplyStatus method by @JesseStutler in #4141
feat: support scalar resources metric by @zedongh in #3937
fix: remove lessPartly condition in reclaimable fn from capacity and … by @baddoub in #4160
[Security] Remove the execute permission for some files, chmod to 644 by @JesseStutler in #4171
Bump golang.org/x/crypto from 0.35.0 to 0.37.0 by @dependabot in #4176
Fix typos by @co63oc in #4177
fix: panic when total guarantee of child queue exceeds capacity of parent by @JackyTYang in #4179
chore: rename VolcanoNamespace -> VolcanoSubSystemName in metrics by @googs1025 in #4180
fix: allocated in queue status should include allocated tasks, not only running tasks by @Poor12 in #4183
volcano-devices unified config by @archlitchi in #3953
feat/vcctl: add parent column to queue list cmd by @de6p in #4181
Add v1.11 compatibility matrix by @Monokaix in #4184
fix scorecards ci err by @Monokaix in #4185
Add topLevel permission for CI by @Monokaix in #4186
delete job vaild action in openSession by @fjq123123 in #4158
[Security] Add a switch to control whether enable pprof in scheduler by @JesseStutler in #4173
Update ubuntu base image by @Monokaix in #4194
fix(controller): add statefulset gc for podgroup. by @HalfBuddhist in #3603
[Refactor] Cover case checkControllers ut by @feyounger in #4199
Fix: remove controller-manager metrics that should not be introduced by @dongjiang1989 in #4201
Update readme by @Monokaix in #4192
fix docs v100 name by @hiwangzhihui in #4204
Add wangyang0616 as approver by @wangyang0616 in #4206
Fix bug for vgpu type check by @SataQiu in #4149
Bump actions/checkout from 3 to 4 by @dependabot in #4161
use replicas when initializing pg minResources by @sceneryback in #4000
adjust e2e log level by @Monokaix in #4212
[Security] Add http server timeout by @Monokaix in #4208
[Security] Add warning msg when TLS verification disabled by @Monokaix in #4211
Fix ci err caused by slow change of scheduling configMap by @Monokaix in #4223
Clear multiple generated hash values by @feyounger in #4215
[Security] Add security context configuration by @JesseStutler in #4207
render scripts using TAG env by @Monokaix in #4203
Revert github action bump by @Monokaix in #4251
Add fuzz tests for job controller by @AdamKorcz in #4205
Optimize multiple 'if' statements in the code by @feyounger in #4222
in capacity plugin attr.deserved no need MinDimensionResource with attr.request by @kingeasternsun in #3946
chore: change BuildPodWithPreeemptionPolicy -> BuildPodWithPreemptionPolicy by @googs1025 in #4264
fix: prevent the scheduling of pods in noopen queues by @googs1025 in #4263
fix: scheduler leader elect namespace not take effect by @Poor12 in #4282
Fix panic while queues' total guarantee exceed the total resource of the cluster in some situations. by @halcyon-r in #4273
fix: add miss queue state check in allocatable action by @googs1025 in #4274
fix controller panic for mpi job by @guoqinwill in #4272
Bump golang.org/x/net from 0.36.0 to 0.38.0 by @dependabot in #4196
WORLD_SIZE calculation for PyTorch Jobs by @murali1539 in #4281
Feature stale action by @mahdikhashan in #4266
[ci] debug security score workflow artifact upload failure by @mahdikhashan in #4300
feat: add graceful shutdown server for webhook manager. by @fengruotj in #4285
Prevent pod scheduling when reclaim by @Monokaix in #4307
cleanup: update jobflow example docs by @hwdef in #4305
Vcctl supports merging multiple kubeconfig to support context switching among multiple k8s clusters. by @halcyon-r in #4298
Refactor volume binding and add prebind implementation by @JesseStutler in #4152
Fix: Incorrect conversion between integer types by @dongjiang1989 in #4105
Add resync-period flag for k8s native informers by @sfc-gh-raravena in #4047
✨ feat: add clusterrole for editor & viewer by @Hcryw in #4174
Network Topology Aware Scheduling capability move to master by @Monokaix in #4310
fix go mod and informer resync by @Monokaix in #4319
Add vgpu dynamic mig by @sailorvii in #4290
Add citation by @mahdikhashan in #4302
feature: Add dynamic resource allocation(DRA) plugin by @JesseStutler in #3799
fix: Fix jobflow status from running to failed FSM by @dongjiang1989 in #4135
Fix admission webhook with labelSelector for hyperNode by @MondayCha in #4321
Add hyperNode controller framework and IB discovery by @Monokaix in #4322
Add NetworkTopology plugin score doc by @Monokaix in #4325
reconcile hypernode nodeCount status by @Monokaix in #4327
Refine the GPU mode process by @sailorvii in #4329
feat: Add jobflow metrics by @dongjiang1989 in #4169
feat: add jobflow flow dag validate by @dongjiang1989 in #4122
Bump image to v1.12.0 by @Monokaix in #4335

New Contributors

@co63oc made their first contribution in #4008
@mahdikhashan made their first contribution in #4016
@xieyanke made their first contribution in #3994
@mahmut-Abi made their first contribution in #3938
@hansongChina made their first contribution in #4029
@sfc-gh-raravena made their first contribution in #4053
@feyounger made their first contribution in #4058
@linuxfhy made their first contribution in #3901
@yuyue9284 made their first contribution in #4092
@dependabot made their first contribution in #4107
@cnmcavoy made their first contribution in #4124
@qGentry made their first contribution in #4123
@ouyangshengjia made their first contribution in #4117
@baddoub made their first contribution in #4160
@JackyTYang made their first contribution in #4179
@de6p made their first contribution in #4181
@fjq123123 made their first contribution in #4158
@HalfBuddhist made their first contribution in #3603
@hiwangzhihui made their first contribution in #4204
@AdamKorcz made their first contribution in #4205
@halcyon-r made their first contribution in #4273
@murali1539 made their first contribution in #4281
@Hcryw made their first contribution in #4174
@sailorvii made their first contribution in #4290
@MondayCha made their first contribution in #4321

Full Changelog: v1.11.2...v1.12.0

volcano-sh/volcano v1.12.0 on GitHub

What's New

Network Topology Aware Scheduling: Alpha Release

Dynamic MIG Slicing for GPU Virtualization

Dynamic Resource Allocation (DRA) Support

Volcano Global Supports Queue Capacity Management

Security Enhancements

Performance Improvements in Large-Scale Scenarios

Gang Scheduling Support for Multiple Workload Types

Job Flow Enhancements

Finer-Grained Permission Control in Multi-Tenant Scenarios

Kubernetes 1.32 Support

Enhanced Queue Monitoring Metrics

Fuzz Testing Support

Stability Enhancements

Important Notes Before Upgrading

Overall Changes

New Contributors

volcano-sh/volcano v1.12.0
on GitHub