github kubernetes-sigs/kueue v0.16.2

latest release: v0.15.5
8 hours ago

Changes since v0.16.1:

Changes by Kind

Feature

  • KueueViz Helm: Add podSecurityContext and containerSecurityContext configuration options to KueueViz Helm chart for restricted pod security profile compliance (#9319, @ziadmoubayed)
  • Observability: Increased the maximum finite bucket boundary for admission_wait_time_seconds histogram from ~2.84 hours to ~11.3 hours for better observability of long queue times. (#9507, @mukund-wayve)
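
The two boundaries quoted in the histogram note are consistent with a doubling bucket series extended by two buckets. The sketch below is only an illustration of that arithmetic: the start value (2.5s), the factor (2), and the bucket counts (13 and 15) are assumptions chosen to reproduce the figures in the note, not values taken from the Kueue source, and `exponentialBuckets` and `maxBoundaryHours` are hypothetical helpers.

```go
package main

import "fmt"

// exponentialBuckets returns count boundaries: start, start*factor, start*factor^2, ...
// Hypothetical helper; it does not use or mirror any Kueue API.
func exponentialBuckets(start, factor float64, count int) []float64 {
	buckets := make([]float64, 0, count)
	v := start
	for i := 0; i < count; i++ {
		buckets = append(buckets, v)
		v *= factor
	}
	return buckets
}

// maxBoundaryHours converts the largest finite boundary (in seconds) to hours.
// The 2.5s start and factor of 2 are assumptions made for this illustration.
func maxBoundaryHours(count int) float64 {
	b := exponentialBuckets(2.5, 2, count)
	return b[len(b)-1] / 3600
}

func main() {
	// 2.5 * 2^12 = 10240s, roughly 2.84 hours.
	fmt.Printf("%.2f hours\n", maxBoundaryHours(13))
	// 2.5 * 2^14 = 40960s, roughly 11.3 hours.
	fmt.Printf("%.2f hours\n", maxBoundaryHours(15))
}
```

Under these assumptions, two extra doublings move the last finite boundary from 10240s to 40960s, so waits between those values no longer fall into the +Inf bucket.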

Bug or Regression

  • ElasticJobs: fix the temporary double-counting of quota during workload replacement.
    In particular, this caused double-counting of quota requests for unchanged PodSets. (#9364, @benkermani)
  • FairSharing: workloads fitting within their ClusterQueue's nominal quota are now preferred over workloads that require borrowing, preventing heavy borrowing on one flavor from deprioritizing a CQ's nominal entitlement on another flavor. (#9532, @mukund-wayve)
  • Fix non-deterministic workload ordering in ClusterQueue by adding UID tie-breaker to queue ordering function. (#9140, @sohankunkerkar)
  • Fix serverName substitution in kustomize prometheus ServiceMonitor TLS patch for cert-manager deployments. (#9188, @IrvingMg)
  • Fixed invalid field name in the ClusterQueue printer columns. The "Cohort" column will now correctly display the assigned cohort in kubectl, k9s, and other UI tools instead of being blank. (#9422, @polinasand)
  • Fixed the bug that prevented managing workloads with duplicated environment variable names in initContainers. This issue manifested when creating the Workload via the API. (#9126, @monabil08)
  • FlavorFungability: fix the bug where the semantics of the flavorFungability.preference enum values
    (i.e., PreemptionOverBorrowing and BorrowingOverPreemption) were swapped. (#9486, @tenzen-y)
  • LeaderWorkerSet: fix an occasional race condition resulting in workload deletion getting stuck during scale down. (#9135, @PannagaRao)
  • MultiKueue: Fix a bug where MultiKueue GC occasionally left the remote Job object behind,
    even when the corresponding Job object on the management cluster was deleted.
    This issue was observed for LeaderWorkerSet. (#9310, @sohankunkerkar)
  • MultiKueue: for the StatefulSet integration, copy the entire StatefulSet onto the worker clusters. This allows
    for proper management (and replacement) of Pods on the worker clusters. (#9539, @IrvingMg)
  • Observability: Fix missing "replica-role" in the logs from the NonTasUsageReconciler. (#9456, @IrvingMg)
  • Observability: Fix the stale "replica-role" value in scheduler logs after leader election. (#9431, @IrvingMg)
  • Scheduling: Fix the bug where inadmissible workloads would be re-queued too frequently at scale.
    This resulted in excessive processing, lock contention, and starvation of workloads deeper in the queue.
    The fix throttles re-queueing with a batch period of 1s per CQ or Cohort. (#9490, @gabesaba)
  • TAS: Fix a bug where, for a LeaderWorkerSet with multiple PodTemplates (.spec.leaderWorkerTemplate.leaderTemplate and .spec.leaderWorkerTemplate.workerTemplate), Pod indexes were not correctly evaluated during rank-based ordering assignments. (#9368, @tenzen-y)
  • TAS: fix a bug where NodeHotSwap may assign a Pod, based on rank-ordering, to a node which is already
    occupied by another running Pod. (#9282, @j-skiba)
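
As an illustration of the UID tie-breaker fix for non-deterministic ordering (#9140), here is a minimal sketch of a comparison function that falls back to the UID when all other keys are equal. The `workload` type, its fields, and the choice of priority and queue timestamp as the leading keys are assumptions made for this example; they are not Kueue's actual types or ordering rules.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// workload is a hypothetical stand-in for a queued workload.
type workload struct {
	Name     string
	UID      string
	Priority int32
	QueuedAt time.Time
}

// less orders by higher priority first, then earlier queue time,
// and finally by UID so that ties always resolve the same way.
func less(a, b workload) bool {
	if a.Priority != b.Priority {
		return a.Priority > b.Priority
	}
	if !a.QueuedAt.Equal(b.QueuedAt) {
		return a.QueuedAt.Before(b.QueuedAt)
	}
	return a.UID < b.UID // tie-breaker: unique per object
}

// order returns workload names sorted with less, leaving the input untouched.
func order(ws []workload) []string {
	sorted := append([]workload(nil), ws...)
	sort.Slice(sorted, func(i, j int) bool { return less(sorted[i], sorted[j]) })
	names := make([]string, 0, len(sorted))
	for _, w := range sorted {
		names = append(names, w.Name)
	}
	return names
}

func main() {
	t := time.Unix(0, 0)
	a := workload{Name: "a", UID: "uid-2", Priority: 1, QueuedAt: t}
	b := workload{Name: "b", UID: "uid-1", Priority: 1, QueuedAt: t}
	// Both input orders produce the same result because the UID decides the tie.
	fmt.Println(order([]workload{a, b}))
	fmt.Println(order([]workload{b, a}))
}
```

Without the final UID comparison, two workloads with equal priority and timestamp compare as equal, and `sort.Slice` (which is not stable) may return them in either order.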
