Changelog
General
- Parallel node drain for scale down is partially implemented, controlled by the --max-scale-down-parallelism and --max-drain-parallelism flags. The feature is still under development and not recommended for production use. Currently, setting --max-drain-parallelism to a value greater than 1 is a no-op, while --max-scale-down-parallelism can be used instead of the existing --max-empty-bulk-delete flag to control the maximum number of empty nodes deleted at the same time. The full feature is targeted for Cluster Autoscaler 1.26 (tracking issue).
- Introduces the --max-pod-eviction-time flag to allow configuration of MaxPodEvictionTime, i.e., the maximum time the cluster autoscaler tries to evict a pod (#4842).
- Backoff time parameters are now configurable via the --initial-node-group-backoff-duration, --max-node-group-backoff-duration, and --node-group-backoff-reset-timeout flags (#3853).
- Event de-duplication is now configurable via the --enable-event-duplication flag (#4921).
- Fix an issue where CA could drastically overshoot scale-up for pods using zonal scheduling constraints (PodTopologySpreading or PodAntiAffinity on zonal topology) (#4970).
- Limit the maximum duration of the binpacking simulation to prevent CA from becoming unresponsive in huge scale-up scenarios. Introduce the --max-nodes-per-scaleup and --max-nodegroup-binpacking-duration flags, which can be used to control this behavior (note: these flags are only meant for fine-tuning scale-up calculation latency; they are not intended for rate-limiting scale-up) (#4970).
- PodDisruptionBudget is bumped from v1beta1 to v1 (#4990).
- Non-root user is now used for Cluster Autoscaler's base image (#4728).
- Add an option to balance node groups exclusively by a set of labels defined by the --balancing-label flag (#4174).
- A new metric (cluster_autoscaler_skipped_scale_events_count) has been added to monitor when CPU and memory resource limits have been exceeded (#5059).
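Balancing node groups exclusively by labels reduces the similarity check to a label comparison. The Go sketch below is a minimal illustration assuming that, when --balancing-label is set, two node groups count as similar exactly when both carry the same value for every listed label, and that a group missing any of those labels is never similar. The nodeGroup type and similarByLabels function are hypothetical and only model the template node's labels.

```go
package main

import "fmt"

// nodeGroup is a hypothetical, simplified view of a node group: its name and
// the labels of its template node. It is not a Cluster Autoscaler type.
type nodeGroup struct {
	name   string
	labels map[string]string
}

// similarByLabels reports whether two node groups carry the same value for
// every key in balancingLabels (the label names given via --balancing-label).
// In this sketch a group that lacks any of the labels is never similar, and
// all other labels and node resources are ignored.
func similarByLabels(a, b nodeGroup, balancingLabels []string) bool {
	for _, key := range balancingLabels {
		va, okA := a.labels[key]
		vb, okB := b.labels[key]
		if !okA || !okB || va != vb {
			return false
		}
	}
	return true
}

func main() {
	webA := nodeGroup{name: "web-a", labels: map[string]string{"pool": "web", "zone": "a"}}
	webB := nodeGroup{name: "web-b", labels: map[string]string{"pool": "web", "zone": "b"}}
	batch := nodeGroup{name: "batch-a", labels: map[string]string{"pool": "batch", "zone": "a"}}

	labels := []string{"pool"}
	fmt.Println(similarByLabels(webA, webB, labels))  // same pool, zones differ: similar
	fmt.Println(similarByLabels(webA, batch, labels)) // different pool: not similar
}
```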
GCE
- Correct memory and ephemeral storage capacity calculations for ARM instances (#4899).
- Add ephemeral storage pricing (#4911).
- Correct invalid pricing for n2-highmem-128 and the n2d family (#4959).
- Fix support for unusual custom machine types (from families other than n1, or using extended memory) (#5024, #5103).
- Make the VM_EXTERNAL_IP_ACCESS_POLICY_CONSTRAINT error code recognizable (#5057).
- Add pricing for new A2 shapes and GPUs (#5070).
AWS
- Support for NVIDIA A10G GPU type added (#4920).
- Instance type list is updated, including c7g, i4i, x2i(e)dn, c6id, and m6id (#4917, #4925).
- Cluster Autoscaler can still work if instance type listing fails (#4873).
- DescribeAutoScalingGroups now supports including tag filters directly, which results in fewer API calls to AWS. Users of the AWS cloud provider may want to update their IAM roles to remove the DescribeTags action, as it is no longer used.
- Add support for attribute-based instance type selection for AWS using available instance requirements (#4588).
Azure
- Update instance types, including Standard_Ls_v3, Standard_HB120, and Standard_NC (#5037).
- Effectively cache instance-type SKUs (#5047).
Hetzner
- Add support for hcloud firewall feature (#4185).
- Add Hetzner public IPv4 and IPv6 configuration (#5001).
- Add metrics for API calls (#5049).
- Cache Hetzner Cloud API requests (#5055).
Cluster API
- Drop deprecated annotations (#4928).
- Add support for scaling to and from zero nodes (#4840). Enabling this feature requires changes by the user; for instructions, see the Cluster API (clusterapi) provider README file.
OVHcloud
- Various bug fixes (#4874).
OCI
- Support for skipping time-consuming findsInstanceByDetails API calls, turned off by default (#4860).
External gRPC
- Proxy cloud provider for pluggable out-of-tree cloud provider implementations over gRPC is implemented (#4654).
CherryServers
- Cluster Autoscaler support for CherryServers is implemented (#4843).
- Support for adding SSH keys to node pools (#4867).
- Support for passing OS partition size when creating nodes (#4955).
Civo
- Cluster Autoscaler support for Civo is implemented (#4852).
Scaleway
- Cluster Autoscaler support for Scaleway is implemented (#5062).
Rancher
- Cluster Autoscaler support for Rancher with RKE2 is implemented (#4975).
Kamatera
- Cluster Autoscaler support for Kamatera is implemented (#5101).
Images
k8s.gcr.io/autoscaling/cluster-autoscaler:v1.25.0
k8s.gcr.io/autoscaling/cluster-autoscaler-arm64:v1.25.0
k8s.gcr.io/autoscaling/cluster-autoscaler-amd64:v1.25.0