## Changes by Kind
### Deprecation
- Remove incorrect Kubernetes REST API types in proto messages (#8746, @liggitt, @jackfrancis)
- Deprecating the `scale-down-enabled` flag (see the sketch after this list). (#9060, @YahiaBadr)
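For operators who still set this flag explicitly, a minimal sketch of where it typically appears in a cluster-autoscaler Deployment; the surrounding args and values are illustrative assumptions, not taken from this release:

```yaml
# Sketch only: a fragment of a cluster-autoscaler container spec.
containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.35.0
    command:
      - ./cluster-autoscaler
      - --cloud-provider=gce        # illustrative; any supported provider
      - --scale-down-enabled=true   # deprecated by #9060; plan to drop this arg
```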
### API Change
- `capacitybuffers.autoscaling.x-k8s.io` scope changed from Cluster to Namespaced. (#8824, @jakubin)
- Make CapacityBuffer `ProvisioningStrategy` a string (see the sketch after this list). (#8832, @jakubin)
- The cluster-autoscaler cloudprovider/externalgrpc `NodeGroupTemplateNodeInfoResponse` and `NodeGroupAutoscalingOptions` APIs deprecate the `nodeInfo`, `scaleDownUnneededTime`, `scaleDownUnreadyTime`, and `MaxNodeProvisionTime` fields in favor of the `nodeBytes`, `scaleDownUnneededDuration`, `scaleDownUnreadyDuration`, and `MaxNodeProvisionDuration` fields. (#8747, @liggitt)
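Taken together, a namespaced CapacityBuffer with a plain-string strategy might look like the sketch below. The `v1beta1` version and field names follow the notes in this changelog, but the strategy value and the template name are assumptions, not copied from the release:

```yaml
apiVersion: autoscaling.x-k8s.io/v1beta1
kind: CapacityBuffer
metadata:
  name: web-headroom
  namespace: default             # now namespaced rather than cluster-scoped (#8824)
spec:
  # provisioningStrategy is a plain string after #8832; the value below is
  # an assumed example, not necessarily a strategy shipped in this release.
  provisioningStrategy: buffer.x-k8s.io/active-capacity
  replicas: 2
  podTemplateRef:
    name: web-headroom-template  # hypothetical PodTemplate in the same namespace
```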
### Feature
- Released CapacityBuffer v1beta1 API (#9093, @norbertcyran)
- Integrated CapacityBuffers with the core ResourceQuotas. Buffers will now respect the defined quotas, and the number of the buffer's replicas will be shrunk accordingly if provisioning the buffer would exceed the limit. (#9092, @norbertcyran)
- Add support for counting CSI volume limits when scaling nodes (#9037, @gnufied)
- Add `amd.com/gpu` as a supported GPU resource name (#8629, @yansun1996)
- Cluster autoscaling now works with `gpu.intel.com/xe` resource requests. (#9050, @poussa)
- Add capacity buffers scale-up events. (#8621, @abdelrahman882)
- Load CapacityBuffer resource to schema (#9135, @norbertcyran)
- Added `vpa.recommender` to the cluster-autoscaler Helm chart, allowing VPA to use a specific recommender (#8567, @hobti01)
- Cluster Autoscaler now supports a new flag, `--max-node-startup-time`, to configure the maximum wait time for a node to become ready (see the combined flags sketch after this list). (#8543, @lxuan94-pp)
- Introduced a new metric tracking the time to process all nodes in scale-down simulations. The `--max-node-skip-eval-time-tracker-enabled` flag enables the new metric. (#8614, @shaikenov)
- Pass `imagePullSecrets` to the updater deployment. (#8711, @wilfriedroset)
- Pods can now request `habana.ai/gaudi` as a valid resource (#8853, @DorWeinstock)
- Use optimized scaling loop interval by default (#8019, @kawych)
- Cluster Autoscaler adds a new Prometheus histogram metric (behind a feature flag): `cluster_autoscaler_node_deletion_duration_seconds`, the duration from when a node is marked as unneeded until it is either deleted (`deleted="true"`) or becomes needed again (`deleted="false"`). Reported values are adjusted by subtracting the configured scale-down threshold. (#8485, @ttetyanka)
- New `--predicate-parallelism` flag allowing CA to use more threads to run scheduler predicates. (#8729, @x13n)
- Add a new label called `mainSuccessful` to the `last_activity` metric, emitted after each full loop execution that finishes without errors. (#8964, @jbtk)
- When listing pods, filter out unschedulable and scheduler-unprocessed pods that are not in the list of allowed schedulers. (#8869, @damikag)
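As a rough illustration of how the new flags fit together, a hedged fragment of a cluster-autoscaler container spec; the flag names come from the notes above, but the values are placeholder assumptions:

```yaml
# Sketch only: values are assumptions, not recommended defaults.
containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.35.0
    command:
      - ./cluster-autoscaler
      - --max-node-startup-time=15m                      # max wait for a node to become ready (#8543)
      - --max-node-skip-eval-time-tracker-enabled=true   # enables the scale-down simulation timing metric (#8614)
      - --predicate-parallelism=4                        # threads for scheduler predicates (#8729)
```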
### Bug or Regression
- Add `topology.k8s.aws/zone-id` to the AWS ignore label set (#8910, @DaThumpingRabbit)
- Cluster Autoscaler no longer panics when the DRA flag is false but there are pending Pods referencing ResourceClaims in the cluster. (#8598, @towca)
- Fix: exclude gated pods from proactive scale up (#8580, @abdelrahman882)
- Fix: handle missing node info in `SimulateNodeRemoval` (#8449, @kincoy)
- Fixed a bug causing the `--max-binpacking-time` flag to be ignored. (#8801, @x13n)
- Nodes with GPUs exposed via DRA are no longer treated as unready if they don't have the `nvidia.com/gpu` custom resource in allocatable. (#8547, @mtrqq)
- CoreWeave: Improved behavior of the CoreWeave implementation of the Cluster Autoscaler, preventing scenarios such as too many nodes being removed and node groups being mistakenly scaled up. (#8880, @nickstern2002)
- Fix scale-down behavior when both `scale-down-utilization-threshold` and node utilization are zero, so empty nodes (for example, nodes with only DaemonSet pods) can be scaled down as expected. (#9031, @kincoy)
- Fixed a bug in node removal latency metrics where "flapping" node states caused incorrect or negative latency values. Metrics now accurately track the full duration a node remains unneeded. (#9065, @ttetyanka)
- Fixed an issue where the `--log-file` option stopped working. Logging now correctly writes to the specified file in addition to standard error (see the sketch after this list). (#9104, @lxuan94-pp)
- Fixed event recording for CapacityBuffers. (#9169, @norbertcyran)
- Fixed misleading validation message returned when `PodTemplateRef` and `ScalableRef` are set together in a CapacityBuffer's spec. (#9131, @norbertcyran)
- Fixes an issue where `node_removal_latency_seconds` reported incorrect values (missing the unneeded/unready threshold) when a previously unneeded node became needed again. (#9005, @ttetyanka)
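Since #9104 restores file logging, a minimal sketch of the relevant argument; the path is a placeholder assumption:

```yaml
# Sketch only: --log-file is the flag fixed in #9104; the path is an assumption.
command:
  - ./cluster-autoscaler
  - --log-file=/var/log/cluster-autoscaler.log  # written in addition to standard error
```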
### Cloudprovider Changes
- CoreWeave: Enable scale from zero in the CoreWeave cluster autoscaler (#8924, @nickstern2002)
- Hetzner: add IP range configuration for private network (#8570, @tloesch)
- GCE: Add price info for N4A machine family (#8687, @drjackild)
- GCE: Add N4D pricing (#8657, @mariafromano-25)
- GCE: Minor performance improvement (#9133, @x13n)
- GCE: Add GCE-specific deployment scripts for E2E testing (#9142, @Choraden)
- AWS: Use AWS SDK v2, deprecate v1 (#8896, @domenicbozzuto)
- AWS: Update EC2 static instance list on 2026-01-07 (#9021, @punkwalker)
- AWS: use k8s.io/cloud-provider-aws v1.35.0 (#9025, @jackfrancis)
- AWS: Add g7e and x8i EC2 instance types for AWS (#9108, @ceuity)
- AWS: Log the correct allocatable resources from tags data during EKS node template build flow (#8709, @jackfrancis)
- Azure: Update Azure SDK to v2 (#8784, @mboersma)
- Azure: Regenerate Azure static SKU list (#8507, @yanjar)
- Cluster API: Fix potential race/deadlock in CAPI unstructured handling (#9173, @joelsmith)
- Cluster API: The ClusterAPI provider will now recognize labels that should be propagated from MachineDeployments and MachineSets when scaling from zero (see the sketch after this list). (#8713, @elmiko)
- Scaleway: cloudprovider optimization (#8782, @pablo-ruth)
- Scaleway: Fix Scaleway cloud provider node pricing when cluster has 0 node pools (#9096, @pablo-ruth)
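For the ClusterAPI change, a hedged sketch of a MachineDeployment carrying node labels for scale-from-zero. The `capacity.cluster-autoscaler.kubernetes.io/labels` annotation is the conventional scale-from-zero hint the provider reads, but the key and the label values here are assumptions, not taken from the release notes:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: workers
  namespace: default
  annotations:
    # Assumed annotation key: scale-from-zero label hint read by the CAPI provider.
    capacity.cluster-autoscaler.kubernetes.io/labels: "node-role.example.com/worker=,topology.kubernetes.io/zone=zone-a"
spec: {}  # elided; only the annotation is relevant to the note above
```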
### Other (Cleanup or Flake)
- Update Kubernetes libraries to v1.35.0 (#8953, @jackfrancis)
- Fix race condition in `eligibility_test.go` (#9159, @kukichek)
## Images
- `registry.k8s.io/autoscaling/cluster-autoscaler:v1.35.0`
- `registry.k8s.io/autoscaling/cluster-autoscaler-arm64:v1.35.0`
- `registry.k8s.io/autoscaling/cluster-autoscaler-amd64:v1.35.0`
- `registry.k8s.io/autoscaling/cluster-autoscaler-s390x:v1.35.0`