kubernetes-sigs/kueue v0.16.2 on GitHub

Changes since v0.16.1:

Changes by Kind

Feature

KueueViz Helm: Add podSecurityContext and containerSecurityContext configuration options to KueueViz Helm chart for restricted pod security profile compliance (#9319, @ziadmoubayed)
Observability: Increased the maximum finite bucket boundary for admission_wait_time_seconds histogram from ~2.84 hours to ~11.3 hours for better observability of long queue times. (#9507, @mukund-wayve)
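
To make the histogram change above concrete, here is a minimal arithmetic sketch. It assumes a bucket layout that starts at 2.5 s and doubles at each step; that layout, and the helper names `exponentialBuckets` and `maxFiniteBoundaryHours`, are illustrative assumptions and are not taken from these notes. Under that assumption, two extra doublings move the largest finite boundary from 10240 s (about 2.84 hours) to 40960 s (roughly 11.3 to 11.4 hours).

```go
package main

import "fmt"

// exponentialBuckets returns count upper bounds in seconds, starting at start
// and multiplying by factor at each step. Hypothetical helper for illustration.
func exponentialBuckets(start, factor float64, count int) []float64 {
	buckets := make([]float64, 0, count)
	bound := start
	for i := 0; i < count; i++ {
		buckets = append(buckets, bound)
		bound *= factor
	}
	return buckets
}

// maxFiniteBoundaryHours converts the largest finite bucket bound to hours.
func maxFiniteBoundaryHours(buckets []float64) float64 {
	return buckets[len(buckets)-1] / 3600
}

func main() {
	// Assumed layouts: 13 doubling buckets before, 15 after.
	before := exponentialBuckets(2.5, 2, 13) // last bound: 2.5 * 2^12 = 10240 s
	after := exponentialBuckets(2.5, 2, 15)  // last bound: 2.5 * 2^14 = 40960 s

	fmt.Printf("before: %.0f s = %.2f h\n", before[len(before)-1], maxFiniteBoundaryHours(before))
	fmt.Printf("after:  %.0f s = %.2f h\n", after[len(after)-1], maxFiniteBoundaryHours(after))
}
```

Observations above the largest finite boundary only land in the +Inf bucket, so quantile estimates for long waits lose resolution beyond that point; raising the boundary extends the range in which long queue times can be distinguished.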

Bug or Regression

ElasticJobs: fix the temporary double-counting of quota during workload replacement.
In particular, this caused double-counting of quota requests for unchanged PodSets. (#9364, @benkermani)
FairSharing: workloads fitting within their ClusterQueue's nominal quota are now preferred over workloads that require borrowing, preventing heavy borrowing on one flavor from deprioritizing a CQ's nominal entitlement on another flavor. (#9532, @mukund-wayve)
Fix non-deterministic workload ordering in ClusterQueue by adding UID tie-breaker to queue ordering function. (#9140, @sohankunkerkar)
Fix serverName substitution in kustomize prometheus ServiceMonitor TLS patch for cert-manager deployments. (#9188, @IrvingMg)
Fixed invalid field name in the ClusterQueue printer columns. The "Cohort" column will now correctly display the assigned cohort in kubectl, k9s, and other UI tools instead of being blank. (#9422, @polinasand)
Fixed the bug that prevented managing workloads with duplicated environment variable names in initContainers. This issue manifested when creating the Workload via the API. (#9126, @monabil08)
FlavorFungability: fix a bug where the semantics of the flavorFungability.preference enum values
(i.e., PreemptionOverBorrowing and BorrowingOverPreemption) were swapped. (#9486, @tenzen-y)
LeaderWorkerSet: fix an occasional race condition resulting in workload deletion getting stuck during scale down. (#9135, @PannagaRao)
MultiKueue: Fix a bug where MultiKueue GC occasionally left the remote Job object behind,
even after the corresponding Job object on the management cluster was deleted.
This issue was observed for LeaderWorkerSet. (#9310, @sohankunkerkar)
MultiKueue: for the StatefulSet integration, copy the entire StatefulSet onto the worker clusters. This allows
for proper management (and replacement) of Pods on the worker clusters. (#9539, @IrvingMg)
Observability: Fix missing "replica-role" in the logs from the NonTasUsageReconciler. (#9456, @IrvingMg)
Observability: Fix the stale "replica-role" value in scheduler logs after leader election. (#9431, @IrvingMg)
Scheduling: Fix the bug where inadmissible workloads would be re-queued too frequently at scale.
This resulted in excessive processing, lock contention, and starvation of workloads deeper in the queue.
The fix is to throttle the process with a batch period of 1s per CQ or Cohort. (#9490, @gabesaba)
TAS: Fix a bug where, for a LeaderWorkerSet with multiple PodTemplates (.spec.leaderWorkerTemplate.leaderTemplate and .spec.leaderWorkerTemplate.workerTemplate), Pod indexes were not correctly evaluated during rank-based ordering assignments. (#9368, @tenzen-y)
TAS: fix a bug where NodeHotSwap may assign a Pod, based on rank-ordering, to a node which is already
occupied by another running Pod. (#9282, @j-skiba)