What's New
Support Queue Priority Scheduling Strategy
In traditional big data processing scenarios, users can directly set queue priorities to control the scheduling order of jobs. To ease migration from Hadoop/YARN to cloud-native platforms, Volcano supports setting priorities at the queue level, reducing migration costs for big data users while improving user experience and resource utilization.
Queues are a fundamental resource in Volcano, each with its own priority. By default, a queue's priority is determined by its share value, which is calculated by dividing the resources allocated to the queue by its total capacity. This is done automatically, with no manual configuration needed. The smaller the share value, the smaller the fraction of its capacity the queue is using, making it less saturated and more likely to receive resources first. Thus, queues with smaller share values have higher priority, ensuring fairness in resource allocation.
In production environments, especially in big data scenarios, users often prefer to set queue priorities manually so that the order in which queues are scheduled is easier to predict. Since the share value is dynamic and changes in real time as resources are allocated, Volcano introduces a priority field that lets users set queue priorities more intuitively. The higher the priority value, the higher the queue's standing. High-priority queues receive resources first, while low-priority queues have their jobs reclaimed earlier when resources need to be recycled.
To ensure compatibility with the share mechanism, Volcano also considers the share value when calculating queue priorities. If a user has not set a queue priority, or if priorities are equal, Volcano falls back to comparing share values. In that case, the queue with the smaller share has higher priority. Users can choose either the priority or the share method, depending on their needs.
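The ordering rule described above can be sketched in a few lines of Python. This is only an illustration, not Volcano's implementation: the Queue class, the single-dimension share, and the use of 0 to mean "priority not set" are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Queue:
    name: str
    allocated: float   # resources currently allocated to the queue
    capacity: float    # total capacity of the queue
    priority: int = 0  # assumption: 0 stands for "priority not set"

    @property
    def share(self) -> float:
        # share = allocated resources divided by total capacity
        return self.allocated / self.capacity


def scheduling_order(queues):
    """Higher priority first; equal priorities fall back to the smaller share."""
    return sorted(queues, key=lambda q: (-q.priority, q.share))


queues = [
    Queue("batch", allocated=2, capacity=10),              # share 0.2
    Queue("etl", allocated=8, capacity=10),                # share 0.8
    Queue("prod", allocated=9, capacity=10, priority=10),  # share 0.9
]
order = [q.name for q in scheduling_order(queues)]
print(order)  # ['prod', 'batch', 'etl']
```

The prod queue is served first despite having the largest share, because its explicit priority wins. The other two queues have equal priority, so the less saturated one goes first.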
Queue priority design doc: Queue priority
Related PRs: (#132, #3700, @TaiPark)
Enable Fine-Grained GPU Resource Sharing and Reclaim
Volcano introduced the elastic queue capacity scheduling feature in version v1.9, allowing users to directly set the capacity for each resource dimension within a queue. This feature also supports elastic scheduling based on the deserved value, enabling more fine-grained resource sharing and recycling across queues.
For detailed design information on elastic queue capacity scheduling, refer to the Capacity Scheduling Design Document.
For a step-by-step guide on using the capacity plugin, see the Capacity Plugin User Guide.
In version v1.10, Volcano extends its support to include reporting different types of GPU resources within elastic queue capacities. NVIDIA's default Device Plugin does not distinguish between GPU models, instead reporting all resources uniformly as nvidia.com/gpu. This prevents AI training and inference tasks from selecting specific GPU models, such as A100 or T4, based on their particular needs. To address this, Volcano now supports reporting distinct GPU models at the Device Plugin level, working with the capacity plugin to enable more precise GPU resource sharing and recycling.
For instructions on using the Device Plugin to report various GPU models, refer to the GPU Resource Naming Guide.
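To see why per-model reporting matters for sharing and reclaim, consider the following simplified Python sketch. The resource names nvidia.com/a100 and nvidia.com/t4 are hypothetical (the real names depend on how the Device Plugin is configured), and the comparison is a stand-in for the capacity plugin's logic, not a copy of it.

```python
def over_deserved(allocated, deserved):
    """Return the resource dimensions in which a queue exceeds its deserved amount."""
    return sorted(
        name for name, used in allocated.items()
        if used > deserved.get(name, 0)
    )


def merge_gpus(resources):
    """Collapse all GPU models into a single nvidia.com/gpu dimension."""
    merged = {"cpu": resources["cpu"]}
    merged["nvidia.com/gpu"] = sum(v for k, v in resources.items() if k != "cpu")
    return merged


# Hypothetical per-model resource names, one dimension per GPU model.
deserved = {"cpu": 64, "nvidia.com/a100": 8, "nvidia.com/t4": 4}
allocated = {"cpu": 48, "nvidia.com/a100": 10, "nvidia.com/t4": 2}

per_model = over_deserved(allocated, deserved)
uniform = over_deserved(merge_gpus(allocated), merge_gpus(deserved))
print(per_model)  # ['nvidia.com/a100']
print(uniform)    # []
```

With one dimension per model, the queue is visibly over its deserved A100 amount, so those GPUs are candidates for reclaim. With a single uniform GPU dimension, the same queue appears to be within its deserved amount.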
Note:
In version v1.10.0, the capacity plugin is the default for queue management. Note that the capacity and proportion plugins are incompatible, so after upgrading to v1.10.0, you must set the deserved field for queues to ensure proper functionality. For detailed instructions, refer to the Capacity Plugin User Guide.
The capacity plugin allocates cluster resources based on the deserved value set by the user, while the proportion plugin dynamically allocates resources according to queue weight. Users can select either the capacity or proportion plugin for queue management based on their specific needs. For more details on the proportion plugin, visit: Proportion Plugin.
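The difference between the two approaches can be illustrated with a small Python sketch. It is deliberately simplified: split_by_weight is a hypothetical helper that divides one resource dimension strictly by weight and ignores the caps and iterative adjustments a real scheduler applies.

```python
def split_by_weight(total, weights):
    """Weight-based split: each queue gets total * weight / sum of weights."""
    weight_sum = sum(weights.values())
    return {queue: total * w / weight_sum for queue, w in weights.items()}


# Weight-based allocation: the result shifts whenever a queue is added.
two_queues = split_by_weight(100, {"a": 1, "b": 3})
three_queues = split_by_weight(100, {"a": 1, "b": 3, "c": 1})
print(two_queues)    # {'a': 25.0, 'b': 75.0}
print(three_queues)  # {'a': 20.0, 'b': 60.0, 'c': 20.0}

# Deserved-based allocation: each queue's amount is stated explicitly by the
# user and does not change when other queues appear.
deserved = {"a": 40, "b": 60}
```

Weights suit clusters where relative shares are enough. Explicit deserved values suit cases where each queue needs a fixed, predictable amount per resource dimension.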
Related PR: (#68, @MondayCha)
Introduce Pod Scheduling Readiness Support
By default, a Pod is considered ready for scheduling as soon as it is created, and kube-scheduler does its best to find a suitable node for every pending Pod. In practice, however, some Pods may lack necessary resources for a long time. Such Pods needlessly interfere with the decisions and operation of the scheduler (and of downstream components such as Cluster Autoscaler), causing problems such as wasted resources. Pod Scheduling Readiness is a kube-scheduler feature that became stable (GA) in Kubernetes v1.30. It controls when a Pod is scheduled through the Pod's schedulingGates field.
Volcano supports unified scheduling of online and offline jobs. In order to better support the scheduling of microservices, we also support Pod Scheduling Readiness scheduling in Volcano v1.10 to meet the scheduling needs of users in multiple scenarios.
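The gating rule itself is simple: a Pod whose spec.schedulingGates list is non-empty is held back, and it becomes eligible for scheduling once every gate has been removed. The Python sketch below models this on a plain dictionary; the gate name example.com/quota-check and the helper functions are hypothetical, and no cluster API is called.

```python
def is_ready_for_scheduling(pod):
    """A Pod is eligible for scheduling only when it has no scheduling gates left."""
    return not pod.get("spec", {}).get("schedulingGates")


def remove_gate(pod, gate_name):
    """Drop one gate, as an external controller would once its condition is met."""
    gates = pod["spec"].get("schedulingGates", [])
    pod["spec"]["schedulingGates"] = [g for g in gates if g["name"] != gate_name]


pod = {
    "metadata": {"name": "demo"},
    "spec": {
        # Hypothetical gate name, used for illustration only.
        "schedulingGates": [{"name": "example.com/quota-check"}],
        "containers": [{"name": "app", "image": "nginx"}],
    },
}

gated = is_ready_for_scheduling(pod)
remove_gate(pod, "example.com/quota-check")
released = is_ready_for_scheduling(pod)
print(gated, released)  # False True
```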
For the documentation of Pod Scheduling Readiness features, please refer to: Pod Scheduling Readiness | Kubernetes
Related PR: (#3581, @ykcai-daniel)
Add Sidecar Container Scheduling Capabilities
A Sidecar container is an auxiliary container designed to support the main business container by handling tasks such as logging, monitoring, and network initialization.
Prior to Kubernetes v1.28, the concept of Sidecar containers existed only informally, with no dedicated API to distinguish them from business containers. Both types of containers were treated equally, which meant that Sidecar containers could be started after the business container and might end before it. Ideally, Sidecar containers should start before and finish after the business container to ensure complete collection of logs and monitoring data.
Kubernetes v1.28 introduced formal support for Sidecar containers at the API level, implementing unified lifecycle management for init containers, Sidecar containers, and business containers. This update also adjusted how resource requests and limits are calculated for Pods, and the feature entered Beta status in v1.29.
The development of this feature involved extensive discussions, mainly focusing on maintaining compatibility with existing APIs and minimizing disruptive changes. Rather than introducing a new container type, Kubernetes reuses the init container type and designates Sidecar containers by setting the init container’s restartPolicy to Always. This approach addresses both API compatibility and lifecycle management issues effectively.
With this update, the scheduling of Pods now considers the Sidecar container’s resource requests as part of the business container’s total requests. Consequently, the Volcano scheduler has been updated to support this new calculation method, allowing users to schedule Sidecar containers with Volcano.
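As a rough illustration of the adjusted calculation, the Python sketch below computes a Pod's effective request for a single resource dimension. It reflects our reading of the upstream rule: Sidecar containers keep running, so their requests add to the business containers' total and also to every init container that starts after them. The Container type and function name are assumptions for the example, not Volcano or Kubernetes code.

```python
from dataclasses import dataclass


@dataclass
class Container:
    name: str
    request: int              # e.g. CPU in millicores
    restart_policy: str = ""  # "Always" marks an init container as a Sidecar


def pod_effective_request(init_containers, app_containers):
    """Effective request of a Pod for one resource dimension."""
    sidecar_sum = 0  # requests of the Sidecars started so far
    init_peak = 0    # highest demand seen during the init phase
    for c in init_containers:
        if c.restart_policy == "Always":
            # A Sidecar keeps running, so its request accumulates.
            sidecar_sum += c.request
            init_peak = max(init_peak, sidecar_sum)
        else:
            # A regular init container runs alongside earlier Sidecars only.
            init_peak = max(init_peak, c.request + sidecar_sum)
    steady_state = sum(c.request for c in app_containers) + sidecar_sum
    return max(init_peak, steady_state)


init = [
    Container("log-agent", 200, restart_policy="Always"),  # Sidecar
    Container("db-migrate", 500),                          # regular init container
]
apps = [Container("web", 400)]
total = pod_effective_request(init, apps)
print(total)  # 700
```

Here the init phase peaks at 500 + 200 = 700, which exceeds the steady state of 400 + 200 = 600, so the scheduler must reserve 700.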
For more information on Sidecar containers, visit Sidecar Containers | Kubernetes.
Related PR: (#3706, @Monokaix, @7h3-3mp7y-m4n)
Enhance Vcctl Command Line Tool
vcctl is a command line tool for operating Volcano's built-in CRD resources. It can be conveniently used to view/delete/pause/resume vcjob resources, and supports viewing/deleting/opening/closing/updating queue resources. Volcano has enhanced vcctl in the new version, adding the following features:
- Support creating/deleting/viewing/describing jobflow and jobtemplate resources
- Support querying vcjob in a specified queue
- Support querying Pods by queue and vcjob filtering
For detailed guidance documents on vcctl, please refer to: vcctl Command Line Enhancement.
Related PRs: (#3584, #3543, #3530, #3524, #3508, @googs1025)
Ensure Compatibility with Kubernetes v1.30
Volcano closely follows the Kubernetes release cadence and supports each new Kubernetes release. The latest supported version is v1.30, and complete UT and E2E test cases are run against it to ensure functionality and reliability.
If you want to participate in the development of Volcano adapting to the new version of Kubernetes, please refer to: adapt-k8s-todo for community contributions.
Related PR: (#3556, @guoqinwill, @wangyysde)
Strengthen Volcano Security Measures
Volcano has always attached great importance to the security of the open source software supply chain. It follows the specifications defined by OpenSSF for license compliance, security vulnerability disclosure and repair, repository branch protection, CI checks, and so on. Volcano recently added a new GitHub Actions workflow that runs OpenSSF security checks when code is merged and updates the software security score in real time, to continuously improve software security.
At the same time, Volcano has reduced the RBAC permissions of each component, retaining only the necessary permissions, avoiding potential risks of unauthorized access and improving the security of the system.
Related PRs: (#3655, #3545, #3504, @harshitasao, @Monokaix)
Optimize Volcano for Large-Scale Performance
In large-scale scenarios, Volcano has done a lot of performance optimization work, mainly including:
- Optimize the vcjob update strategy to reduce vcjob update and synchronization frequency, lower API Server pressure, and improve the QPS of submitted tasks
- Add a controller gate switch to the vc controller, so users can disable unnecessary controllers to reduce memory usage and CPU load
- Make all controllers use a shared informer to reduce memory usage
Related PRs: (#3514, #3486, #3541, #3497, #3493, #3598, @wangyysde, @googs1025, @babugeet, @y-ykcir, @lekaf974, @Wang-Kai)
Improve GPU Monitoring Function
The new version of Volcano improves the GPU monitoring metrics: it fixes inaccurate GPU monitoring and adds node information to the GPU computing power and GPU memory metrics, so users can more easily view, for each GPU on each node, its computing power and the total and allocated amounts of GPU memory.
Related PR: (#3620, @archlitchi)
Optimize Helm Chart Installation And Upgrade Processes
Volcano has optimized the Helm chart installation and upgrade process and supports setting more custom parameters at install time, mainly including:
- Use the helm hook mechanism to delete the volcano-admission-init job automatically after Volcano is installed successfully, so that a later helm upgrade does not fail
- Update the secret required by Volcano admission after each successful installation, so that repeatedly installing and uninstalling Volcano without specifying the helm package name no longer causes the Volcano admission process to fail
- Support setting common labels for resource objects in helm packages
- Support setting log level for Volcano components through helm
- Support specifying the image registry of Volcano components through helm
- Support setting container-level securityContext through helm
Related PRs: (#3504, #3653, #3511, #3656, #3436, #3704, @Monokaix, @Aakcht, @chenshiwei-io, @calvin0327, @lekaf974)
Changes
- don't enable error cache if task role spec is empty (#3733 @lowang-bh)
- enquable and allocatable compare resource with the required dimensions and add testcaes (#3732 @lowang-bh)
- fix: volumeZone and podTopologySpread don't work if nodeVolumeLimits is enabled (#3727 @JesseStutler)
- fix: not remove podgroup uid will cause topology annotation to be useless (#3711 @JesseStutler)
- Pod Scheduling Readiness (#3658 @ykcai-daniel)
- feat: Add securityContext support at container level in helm chart templates (#3704 @lekaf974)
- [Ready]docs about job's min resource (#2945 @lowang-bh)
- Proposal for Support of Pod Scheduling Readiness (#3581 @ykcai-daniel)
- Support sidecar scheduling (#3706 @Monokaix)
- fix: vcctl unit test ci failed (#3708 @googs1025)
- Add queue priority (#3700 @TaiPark)
- vcctl(jobtemplate): fix describe miss APIVersion and Kind bug, and remove ManagedFields (#3692 @googs1025)
- Chart: remove duplicate label field (#3707 @Aakcht)
- delete reservation plugin doc (#3546 @hwdef)
- improve unschedule message (#3538 @lowang-bh)
- Add queue priority design doc (#3602 @TaiPark)
- fix: preempt in same job should also skip pod which is not preemptable (#3683 @lowang-bh)
- vulnerability fix (#3690 @harshitasao)
- fix always update pod nominatedNodeName when pod is pipelined (#3680 @bibibox)
- add kind/question label for question issue (#3678 @hwdef)
- Improve capacity plugin: Only compare requested dimension resources when reclaim (#3664 @Monokaix)
- Expose volcano components (controller, scheduler, etc.) log level control to the helm chart values (#3656 @chenshiwei-io)
- Update volcano-admission secret when it already exists (#3653 @Monokaix)
- fix pg controller create redundancy podGroup when schedulerName isn't matched (#3672 @liuyuanchun11)
- feat: add vcctl pod list (#3530 @googs1025)
- Added the scorecard github action and its badge (#3655 @harshitasao)
- feat: add oidc's auth provider (#3663 @snappyyouth)
- fix: remove cache sync repeatedly (#3583 @lx1036)
- Fix predicate return (#3553 @lowang-bh)
- metrics: use milliseconds instead of microseconds as the time unit for scheduling latency (#3548 @microyahoo)
- resource compare support only consider the requested resource item (#3522 @lowang-bh)
- Optimize bind event update logic (#3622 @wangyang0616)
- aggregate err when checking options (#3586 @googs1025)
- feat: add vcctl job list queueName filter (#3524 @googs1025)
- add hwdef as approver (#3643 @hwdef)
- use create pod result for statistic of job status (#3598 @Wang-Kai)
- Use helm package flag to set version in generate chart release (#3648 @Monokaix)
- feat: add vcctl jobflow command (#3543 @googs1025)
- upgrade kube-state-metrics image version to v2.0.0-beta from v1.9.7 (#3566 @lengrongfu)
- update issue template (#3644 @hwdef)
- Add lowang-bh as approver (#3632 @lowang-bh)
- Update volcano-vgpu monitoring system (#3620 @archlitchi)
- Add Monokaix as approver (#3607 @Monokaix)
- Remove enqueue action in preempt e2e case (#3639 @Monokaix)
- Add config field to the ControllerOption (#3615 @liuyuanchun11)
- Improve ginkgo e2e case of preempt (#3626 @Monokaix)
- fix ut that priority can not be guaranteed cross nodes when choose victims (#3576 @lowang-bh)
- Fix spark ci error (#3625 @Monokaix)
- add enhancement vcctl features docs (#3584 @googs1025)
- Fix CSI node selector not take effect (#3594 @Monokaix)
- add ut and refactor for pkg/scheduler/plugins/util/k8s package (#3412 @googs1025)
- Add container log size limit config for kind node (#3609 @Monokaix)
- fix lint error: third part imports should put after system package (#3610 @lowang-bh)
- handle job not found case (#3599 @Wang-Kai)
- improve: continue allocating if remain tasks is more than job's minMember (#3430 @lowang-bh)
- filter out those nodes which are UnschedulableAndUnresolvable when preempting (#3432 @lowang-bh)
- fix: use built-in slices package from go1.21 for simplify code (#3585 @lx1036)
- Remove invalid parameter IgnoredNamespaces (#3557 @xieyanker)
- optimization version command (#3580 @lengrongfu)
- Added pod annotation with job RetryCount (#3544 @belo4ya)
- add controller gate flag (#3486 @googs1025)
- Delete unused variables. (#3532 @MichaelXcc)
- reduce function calling (#3540 @lowang-bh)
- Add feature gates flag (#3571 @Monokaix)
- Update Kubernetes compatibility (#3569 @Monokaix)
- Fix podgroup not created (#3561 @liuyuanchun11)
- Volcano adapts to the k8s v1.30 (#3556 @guoqinwill)
- Shrink permissions of vc scheduler & controller (#3545 @Monokaix)
- update pod status when bind error (#3547 @bibibox)
- Added logic to avoid unnecessary API update (#3541 @babugeet)
- put back the queue to priority queue after job's resource allocating finished (#3515 @lowang-bh)
- Avoid unnecessary API update and re-enqueue action (#3514 @wangyysde)
- typo: fix typo in sla plugin (#3529 @yxxhero)
- Add common labels for chart objects (#3511 @Aakcht)
- feat: add vcctl jobtemplate (#3508 @googs1025)
- feat: add jobflow name for vcjob label and annotation when creating (#3500 @googs1025)
- Add pre-install&pre-upgrade hook for admission-init job (#3504 @Monokaix)
- fix calculations of podgroup min resource (#3057 @lowang-bh)
- Update unit-test && e2e to run on Mac (#3506 @yeahdongcn)
- Replace reflect.DeepEqual by equality.Semantic.DeepEqual (#3493 @lekaf974)
- optimizate controllers with one sharedInformerFactory (#3497 @y-ykcir)
- Mark --lock-object-namespace as deprecated and throw a warning when it is used (#3483 @SataQiu)
- add lock before read Jobs map #3478 (#3479 @Wang-Kai)
- refactor vcctl (#3494 @googs1025)
- Update NominatedNodeName for pipelined task (#3498 @bibibox)
- fix bug3476 (#3480 @wangyysde)
- fix: log info or error print (#3410 @googs1025)
- Feature/allow helm config overrides (#3354 @lukasboettcher)
- feat: added question to issue template (#3423 @shruti2522)
- test: add priority plugin UTs (#3457 @lowang-bh)
- fix job clone miss PgUID (#3462 @lowang-bh)
- Added documentation on what must be done to adapt volcano to the k8s version upgrade. (#3459 @guoqinwill)
- bugfix and code refactor around klog (#3454 @SataQiu)
- fix: log when outputting preempt error (#3447 @googs1025)
- Add options: worker-threads-for-gc (#3425 @WulixuanS)
- add ut for enqueue action (#3453 @googs1025)