Changes since v0.17.1:
Changes by Kind
Bug or Regression
- FailureRecovery: Forcefully delete pods that are Failed/Succeeded and scheduled on unreachable nodes.
  This unblocks cases like a JobSet deleting a Job with foreground cascade being stuck because a pod in a terminal phase exists on one of the unhealthy nodes. (#10856, @kshalot)
- Fix a race condition where a deleted ClusterQueue could be kept by a finalizer even after all of its workloads and LocalQueues were deleted. (#10833, @ShaanveerS)
- Fixed a bug in Kueue's cache that could leave stale SubtreeQuota values in ancestor cohorts after a child Cohort was deleted, leading to potential over-admission of workloads and incorrect metrics reporting. (#10840, @mszadkow)
- Fixed a bug where admitted Workloads could fail to patch through the v1beta1 API due to CEL validation of the priorityClassSource immutability rule. (#10631, @kannon92)
- Observability: Fix kueue_cohort_subtree_quota and kueue_cohort_subtree_resource_reservations metrics incorrectly reporting raw milliCPU values instead of CPU units for CPU resources. (#10754, @baoalvin1)
- Observability: Downgrade the non-compatible flavor error logs to Info level (v3). (#10639, @maishivamhoo123)
- TAS: Fix a bug where admitted workloads with unhealthy nodes were not evicted when an AdmissionCheck entered Retry or when the PodsReady recovery timeout was exceeded. (#10692, @pajakd)
- TAS: Fix handling of PodSet groups, which in some scenarios could lead to an empty topologyAssignment. (#10841, @mimowo)
- TAS: Fix nil pointer panic in TAS node reconciler when unadmitted workloads exist in the cluster. (#10653, @j-skiba)
- TAS: Refine the NodeHotSwap logic to ensure that UnhealthyNodes are only updated for workloads currently assigned to a Node via a topology assignment. This prevents "late pods" from stale topologies from triggering inaccurate health reporting. (#10837, @j-skiba)
- VisibilityOnDemand: Fixed a bug in the visibility endpoint where listing workloads from a LocalQueue also included workloads from LocalQueues with the same name in other namespaces. (#10679, @mbobrovskyi)
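
The VisibilityOnDemand fix above concerns a common class of bug: looking up a namespaced object by name alone. The following is a minimal sketch of that pattern, not Kueue's actual implementation. The `Workload` class, both functions, and the sample data are hypothetical and exist only for illustration. It relies on one assumption, that a workload and the LocalQueue it is submitted to live in the same namespace.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    # Hypothetical, simplified stand-in for a Kueue Workload.
    namespace: str
    name: str
    queue_name: str  # name of the LocalQueue in the workload's own namespace


def list_by_queue_name_only(workloads, queue_name):
    # Problematic pattern: matching on the queue name alone also returns
    # workloads from same-named LocalQueues in other namespaces.
    return [w for w in workloads if w.queue_name == queue_name]


def list_for_local_queue(workloads, namespace, queue_name):
    # Safer pattern: a LocalQueue is identified by (namespace, name),
    # so both parts must match.
    return [
        w
        for w in workloads
        if w.namespace == namespace and w.queue_name == queue_name
    ]


# Illustrative data: two namespaces each have a LocalQueue called "main".
WORKLOADS = [
    Workload("team-a", "job-1", "main"),
    Workload("team-b", "job-2", "main"),
    Workload("team-a", "job-3", "batch"),
]

if __name__ == "__main__":
    leaked = list_by_queue_name_only(WORKLOADS, "main")
    scoped = list_for_local_queue(WORKLOADS, "team-a", "main")
    print("name only:       ", [f"{w.namespace}/{w.name}" for w in leaked])
    print("namespace + name:", [f"{w.namespace}/{w.name}" for w in scoped])
```

In this sketch, the name-only lookup returns workloads from both `team-a` and `team-b`, while the namespace-qualified lookup returns only the `team-a` workload.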