Changes since v0.17.0:
Urgent Upgrade Notes
(No, really, you MUST read this before you upgrade)
- AdmissionChecks: Add the alpha RejectUpdatesToCQWithInvalidOnFlavors feature gate (disabled by default) to reject updates to existing ClusterQueues with invalid AdmissionCheckStrategy.OnFlavors references.
When enabling this feature gate, fix any existing invalid OnFlavors references before updating the affected ClusterQueues. (#10384, @ShaanveerS)
- Observability: Replace the "detailed_reason" label of the "evicted_workloads_once_total" metric with the "underlying_cause" label. This is a consistency fix, as all other metrics name the label "underlying_cause".
If you use the "detailed_reason" label for the "evicted_workloads_once_total" metric, migrate to the "underlying_cause" label. (#10637, @vamsikrishna-siddu)
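For the label rename above, saved queries or alert expressions that reference the old label need a one-time rewrite. The following is a minimal sketch of such a rewrite, not part of Kueue: the helper name `migrate_query` and the sample expression are hypothetical, and the metric is matched by substring on the assumption that your deployment may expose it with a name prefix.

```python
import re

OLD_LABEL = "detailed_reason"
NEW_LABEL = "underlying_cause"
METRIC = "evicted_workloads_once_total"

# Match the old label only as a whole word, so longer identifiers
# that merely contain it are left alone.
_OLD_LABEL_RE = re.compile(rf"\b{OLD_LABEL}\b")


def migrate_query(expr: str) -> str:
    """Rename the label in expressions that reference the affected metric.

    Hypothetical helper: expressions that do not mention the metric are
    returned unchanged.
    """
    if METRIC not in expr:
        return expr
    return _OLD_LABEL_RE.sub(NEW_LABEL, expr)


if __name__ == "__main__":
    old = "sum by (detailed_reason) (rate(evicted_workloads_once_total[5m]))"
    print(migrate_query(old))
```

Review each rewritten expression before saving it, since this sketch does plain text substitution and does not parse the query language.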
Changes by Kind
Feature
- Aggregate Kueue CRD read-only clusterRoles to k8s default view clusterRole (#10482, @amy)
- Improve eviction message for AdmissionChecks in Retry state to include per-check name and reason (#10623, @reruno)
- Introduce Concurrent Admission feature (#10610, @PBundyra)
- Kueue-populator: Support initializing different ClusterQueues in a single namespace. (#9746, @samzong)
- Promote MultiKueueRedoAdmissionOnEvictionInWorker to stable. (#10695, @mbobrovskyi)
- Promote MultiKueueWaitForWorkloadAdmitted to stable. (#10656, @mbobrovskyi)
- Promote SkipFinalizersForPodsSuspendedByParent to stable. (#10645, @mbobrovskyi)
Documentation
- Documentation: Add new Kueue-related agent skills under cmd/experimental/agent/skills in the kueue repo. (#10744, @amy)
Failing Test
- Observability: Fix a bug where kueue_cohort_subtree_admitted_workloads_total and kueue_cohort_subtree_admitted_active_workloads metrics could include results for an implicit root Cohort after deletion of a child Cohort or ClusterQueue. (#10080, @mbobrovskyi)
Bug or Regression
- AdmissionChecks: ClusterQueue validation now checks that the flavors specified in AdmissionCheckStrategy.OnFlavors are listed in quota. (#10336, @ShaanveerS)
- AdmissionChecks: Fix a bug where, on backoff, admission checks that span all ResourceFlavors, such as MultiKueue, could be missing from the Workload’s status.
For MultiKueue, the bug manifested when another, non-MultiKueue admission check was configured alongside the MultiKueue admission check. If the workload was evicted on the management cluster while the manager had temporarily lost connection to a worker, the remote workload would keep running on the reconnected worker, even though the workload had no reservation on the manager cluster. (#9359, @Singularity23x0)
- AdmissionFairSharing: Fixed a bug in entry penalties by reducing them when a workload is admitted and clearing them once all resources in the admission entry penalty are zero. (#10156, @MaysaMacedo)
- ElasticJobs: Fix a bug where pods stay gated after scale-up by allowing finished workloads to ungate their own pods. (#10272, @sohankunkerkar)
- FailureRecovery: Forcefully delete pods that are Failed/Succeeded and scheduled on unreachable nodes.
This unblocks cases like a JobSet deleting a Job with foreground cascade being stuck because a pod in a terminal phase exists on one of the unhealthy nodes. (#10853, @kshalot)
- FailureRecoveryPolicy: Fixed an issue where pods could remain stuck terminating if their node became unreachable only after the force-termination timeout had already elapsed. (#10463, @kshalot)
- Fix a bug in HA mode that caused follower replicas to retain stale workload caches after deletion. (#10518, @Ladicle)
- Fix a bug where the batch/v1 Job mutating webhook could still run even when the batch/job integration was disabled. (#10315, @Ladicle)
- Fix a race condition where a deleted ClusterQueue could be kept by a finalizer even after all of its workloads and LocalQueues were deleted. (#10821, @ShaanveerS)
- Fix handling of orphaned workloads which could result in the accumulation of stale workloads after PodsReady timeout eviction for Deployment-owned pods. (#10274, @sebest)
- Fixed a bug in Kueue's cache that could leave stale SubtreeQuota values in ancestor cohorts after a child Cohort was deleted, leading to potential over-admission of workloads and incorrect metrics reporting. (#10797, @mszadkow)
- Fixed a bug where admitted Workloads could fail to patch through the v1beta1 API due to CEL validation of the priorityClassSource immutability rule. (#10594, @kannon92)
- LeaderWorkerSet integration: Fix a bug where the PodTemplate metadata wasn't propagated to the Workload's PodSets. (#10330, @pajakd)
- MultiKueue: Fix a bug where a job, after being dispatched to a worker, would not sync correctly after being evicted there. This also caused its workload to be incorrectly labeled as admitted.
Now the workload and the manager job instance correctly reflect the evicted state, and MultiKueue falls back to dispatching remote workloads to all eligible workers again after eviction from the worker that had previously admitted the workload. An example of such a case is the remote instance being preempted on the worker. (#9670, @Singularity23x0)
- MultiKueue: Fix a bug where, when custom admission checks other than the MultiKueue admission check are configured on the manager cluster, the Job could start running on the selected worker before the other admission checks are satisfied (Ready). The fix defers dispatching the workload until all non-MultiKueue AdmissionChecks become Ready. (#9866, @mszadkow)
- Observability: Fix excessive memory overhead in hot code paths by reusing the named logger in NewLogConstructor and avoiding unnecessary logger cloning. (#10365, @MatteoFari)
- Observability: Fix kueue_cohort_subtree_quota and kueue_cohort_subtree_resource_reservations metrics incorrectly reporting raw milliCPU values instead of CPU units for CPU resources. (#10747, @baoalvin1)
- Observability: Avoid logging update failures as "error" when they are caused by concurrent object modifications, especially when multiple errors are present.
Example log message: "failed to update MultiKueueCluster status: Operation cannot be fulfilled on multikueueclusters.kueue.x-k8s.io "testing-cluster": the object has been modified; please apply your changes to the latest version and try again after failing to load client config: open /tmp/kubeconfig no such file or directory" (#10322, @mbobrovskyi)
- Observability: Downgrade the non-compatible flavor error logs to Info level (v3). (#10636, @maishivamhoo123)
- TAS: Fix a bug where admitted workloads with unhealthy nodes were not evicted when an AdmissionCheck entered Retry or when the PodsReady recovery timeout was exceeded. (#10666, @vamsikrishna-siddu)
- TAS: Fix empty slices for count=0 podSets causing an infinite scheduling loop. (#10478, @jzhaojieh)
- TAS: Fix handling of PodSet groups which could, in some scenarios, lead to an empty topologyAssignment. (#10783, @yuluo-yx)
- TAS: Fix nil pointer panic in TAS node reconciler when unadmitted workloads exist in the cluster. (#10641, @j-skiba)
- TAS: Refine the NodeHotSwap logic to ensure that UnhealthyNodes are only updated for workloads currently assigned to a Node via a topology assignment. This prevents "late pods" from stale topologies from triggering inaccurate health reporting. (#10760, @j-skiba)
- TAS: Fix a bug where Pods whose only TAS annotation is kueue.x-k8s.io/podset-slice-required-topology or kueue.x-k8s.io/podset-slice-required-topology-constraints are not ungated. (#10282, @tg123)
- TAS: Reduce the churn on the TAS-enabled controller, called NonTasUsageReconciler, by skipping Reconcile triggers on Pod changes that are irrelevant from the controller's point of view. (#10488, @MatteoFari)
- VisibilityOnDemand: Fixed a bug in the visibility endpoint where listing workloads from a LocalQueue included workloads from LocalQueues with the same name in other namespaces. (#10672, @mbobrovskyi)
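The VisibilityOnDemand fix in the last entry above comes down to identifying a LocalQueue by namespace and name together, not by name alone. The sketch below illustrates that idea only; it is not Kueue's implementation, and the `PendingWorkload` type, its fields, and `list_for_local_queue` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingWorkload:
    # Hypothetical, simplified view of a pending workload.
    name: str
    namespace: str
    local_queue: str


def list_for_local_queue(workloads, namespace, queue_name):
    """Return workloads queued in the LocalQueue `namespace/queue_name`.

    Matching on the queue name alone would also return workloads from
    same-named LocalQueues in other namespaces.
    """
    return [
        w
        for w in workloads
        if w.namespace == namespace and w.local_queue == queue_name
    ]


if __name__ == "__main__":
    pending = [
        PendingWorkload("job-a", "team-a", "main"),
        PendingWorkload("job-b", "team-b", "main"),
    ]
    for w in list_for_local_queue(pending, "team-a", "main"):
        print(w.namespace, w.name)
```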