Deprecations

The --ignore-taint flag and the ignore-taint.cluster-autoscaler.kubernetes.io/ taint prefix are now deprecated. Instead, use:
- --status-taint flag or status-taint.cluster-autoscaler.kubernetes.io/ taint prefix for taints that denote node status.
- --startup-taint flag or startup-taint.cluster-autoscaler.kubernetes.io/ taint prefix for taints that are used to prevent pods from scheduling before a node is fully initialized (e.g. when using a DaemonSet to install a device plugin).
- For backward compatibility, the --ignore-taint flag and the ignore-taint.cluster-autoscaler.kubernetes.io/ prefix continue to work with behavior identical to a startup taint (which is the same behavior they had before).
- Please see FAQ for more details. - #6132, #6218
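The prefix rules above can be sketched as a small classifier. This is an illustration only, not Cluster Autoscaler's actual code: the function name classify_taint and the returned labels are hypothetical, and only the three prefixes come from these notes.

```python
# Illustrative only: not Cluster Autoscaler's implementation.
STARTUP_PREFIX = "startup-taint.cluster-autoscaler.kubernetes.io/"
STATUS_PREFIX = "status-taint.cluster-autoscaler.kubernetes.io/"
IGNORE_PREFIX = "ignore-taint.cluster-autoscaler.kubernetes.io/"  # deprecated


def classify_taint(key: str) -> str:
    """Return 'startup', 'status', or 'other' for a taint key (hypothetical helper)."""
    if key.startswith(STARTUP_PREFIX) or key.startswith(IGNORE_PREFIX):
        # Deprecated ignore-taint keys keep behaving like startup taints.
        return "startup"
    if key.startswith(STATUS_PREFIX):
        return "status"
    return "other"
```

A key under the deprecated prefix is reported as a startup taint here, mirroring the backward-compatibility note above.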
Flags that were unused in the code (i.e. setting them had no effect) are now deprecated and will be removed in a future release. Affected flags are: --node-autoprovisioning-enabled and --max-autoprovisioned-node-group-count.

General

Added a new flag --bypassed-scheduler-names, empty by default to maintain the original behaviour.
If the flag is set to a non-empty list, CA will not wait for the listed schedulers to mark pods as unschedulable and will also evaluate pods they have not processed yet. Furthermore, when the list is non-empty, CA will not wait for pods to reach a certain age before scaling up, effectively ignoring unschedulablePodTimeBuffer. - #6235
- Enabling this feature can improve autoscaling latency (CA will react to pods faster), but it can also increase load on CA in case of very large scale-ups (thousands of pending pods). This is because limited scheduler throughput can effectively act as a rate limiter, protecting CA from having to process a scale-up of too many pods at the same time. We believe this change will be beneficial in the vast majority of environments, but given that CA scalability varies greatly between cloud providers, we recommend testing this feature before enabling it in large clusters.
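A rough sketch of the selection rule described above, assuming a simplified pod record. The Pod class, its fields, and scale_up_candidates are hypothetical names used for illustration; this is not how Cluster Autoscaler represents or filters pods internally.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Pod:
    # Hypothetical, simplified pod record; not the Kubernetes API type.
    name: str
    scheduler_name: str
    node_name: Optional[str] = None      # set once the pod is bound to a node
    marked_unschedulable: bool = False   # a scheduler already reported it as unschedulable


def scale_up_candidates(pods: List[Pod], bypassed: Set[str]) -> List[Pod]:
    """Pick the pending pods that would be evaluated for scale-up."""
    candidates = []
    for pod in pods:
        if pod.node_name is not None:
            continue  # already scheduled, nothing to do
        # With an empty bypass set, only pods marked unschedulable qualify.
        # Pods owned by a bypassed scheduler qualify without waiting for that mark.
        if pod.marked_unschedulable or pod.scheduler_name in bypassed:
            candidates.append(pod)
    return candidates
```

With an empty set the function reduces to the original behaviour, which is why the flag's default is empty.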
A new flag (--drain-priority-config) is introduced which allows users to configure drain behavior during scale-down based on pod priority. The new flag is mutually exclusive with --max-graceful-termination-sec. --max-graceful-termination-sec can still be used if the new configuration options are not needed. The default behavior is preserved (simple config, default value of --max-graceful-termination-sec). - #6139
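As a sketch of how a priority-based drain configuration might be interpreted, the snippet below assumes the flag value is a comma-separated list of priority:timeout integer pairs (for example "10000:20,1000:100,0:60"). That syntax and the lookup rule are assumptions made for illustration and are not stated in these notes; consult the FAQ for the authoritative format. Both function names are hypothetical.

```python
from typing import List, Tuple


def parse_drain_priority_config(value: str) -> List[Tuple[int, int]]:
    """Parse 'priority:seconds' pairs (assumed syntax) into a list sorted by priority."""
    pairs = []
    for item in value.split(","):
        priority, timeout = item.strip().split(":")
        pairs.append((int(priority), int(timeout)))
    return sorted(pairs)


def grace_period_for(pod_priority: int, config: List[Tuple[int, int]]) -> int:
    """Use the group with the largest configured priority not above the pod's priority.

    Pods below every configured priority fall back to the lowest group.
    This lookup rule is an assumption, not confirmed by the release notes.
    """
    chosen = config[0][1]
    for priority, timeout in config:
        if priority <= pod_priority:
            chosen = timeout
    return chosen
```

Sorting by priority also gives a natural order in which groups could be drained, lowest priority first.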
Added the --dynamic-node-delete-delay-after-taint-enabled flag. Enabling this flag changes the delay between tainting and draining a node from a constant delay to a dynamic one based on Kubernetes api-server latency. This minimizes the risk of race conditions if the api-server connection is slow and improves scale-down throughput when it's fast. - #6019
Added structured logging support via --logging-format json. - #6035
Introduced a new node_group_target_count metric that keeps track of target sizes of each NodeGroup. This metric is only available if --emit-per-nodegroup-metrics flag is enabled. - #6361
Introduced a new node_taints_count metric tracking different types of taints in the cluster. - #6201
Added a new command line option --kube-api-content-type to specify the content type used to communicate with the apiserver. This change also switches the default content type from "application/json" to "application/vnd.kubernetes.protobuf". - #6114
Fixed a bug where resource requests of restartable init containers were not included in utilization calculation. - #6225
Fixed a bug where CA might have created fewer nodes than desired with a message about "Capping binpacking after exceeding threshold of 4 nodes" even though it then didn't actually add four new nodes. - #6165
Fixed support for --feature-gates=ContextualLogging=true. - #6162
Fixed a bug where scale down may have failed with "daemonset.apps not found". - #6122
Optimized CA memory usage. - #6159, #6110
Disambiguated wording in the log messages related to node removal ineligibility caused by high resource allocation. - #6223
Pods with the "cluster-autoscaler.kubernetes.io/safe-to-evict": "false" annotation will now always report an annotation-related warning message if they block scale-down (previously they might have reported, e.g., a message about not being replicated). - #6077

AWS

Added c7a, r7i, mac2-m2 families and sizes in i4i, c7i.metal, r7a.metal, r7iz.metal to the static list of Amazon EC2 instances. - #6347
Added p5.48xlarge - #6131
Updated cloudprovider/aws/aws-sdk-go to 1.48.7 in order to support dynamic auth token. - #6325
Fixed an issue where the capacityType label inferred from an empty AWS ManagedNodeGroup does not match the same label on the nodes after it scales from 0 -> 1. - #6261
Introduced caching to reduce volume of DescribeLaunchTemplateVersions API calls made by Cluster Autoscaler. - #6245
Nodes annotated with k8s.io/cluster-autoscaler-enabled=false will be skipped by CA and will no longer produce spammy logs about missing AWS instances. - #6265, #6301
Added additional log output when updating the ASG information from AWS. - #6282
Fixed a bug where CA might have tried to remove an instance that was already in Terminated state. - #6166
Scale-up from 0 now works with an existing AWS EBS CSI PersistentVolume without having to add a tag to the ASG. - #6090

Azure

Removed AKS vmType. - #6186

Civo

Introduced support for scaling NodeGroup from 0. - #6322

Cluster API

Users of Cluster API can override the default architecture to consider in the templates for autoscaling from zero so that pods requesting non-amd64 nodes in their node selector terms can trigger the scale-up in non-amd64 single-arch clusters. - #6066

Equinix Metal

The packet provider and its configuration parameters are now deprecated in favor of equinixmetal - #6085
- The cluster-autoscaler --cloud-provider flag should now be set to equinixmetal. For backward compatibility, "--cloud-provider=packet" continues to work.
- "METAL_AUTH_TOKEN" replaces "PACKET_AUTH_TOKEN". For backward compatibility, the latter still works.
- "EQUINIX_METAL_MANAGER" replaces "PACKET_MANAGER". For backward compatibility, the latter still works.
- Each node managed by cloud-provider "equinixmetal" will be labeled with the "METAL_CONTROLLER_NODE_IDENTIFIER_LABEL" defined label. For backward compatibility, "PACKET_CONTROLLER_NODE_IDENTIFIER_LABEL" still works.
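The backward-compatible variable names listed above amount to a lookup with a fallback. A minimal sketch follows, assuming the new name takes precedence when both are set; that precedence is an assumption, and lookup_with_fallback is a hypothetical helper rather than the provider's code.

```python
import os
from typing import Mapping, Optional


def lookup_with_fallback(
    new_name: str,
    old_name: str,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the value of new_name, or of the deprecated old_name if new_name is unset."""
    value = env.get(new_name)
    if value:
        return value
    # Deprecated name still honoured for backward compatibility.
    return env.get(old_name)
```

For example, lookup_with_fallback("METAL_AUTH_TOKEN", "PACKET_AUTH_TOKEN") would read the token under either name.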
We now use metros in the Equinix Metal (Packet) cloudprovider. Facilities support has been removed. - #6078

GCE

The --gce-expander-ephemeral-storage-support flag is now deprecated. Ephemeral-storage support is always enabled and the flag itself is ignored.
Added support for paginated MIG instance listing. - #6376
Improved reporting of errors related to GCE Reservations. - #6093

gRPC

The timeout of gRPC calls can now be specified through cloud-config. - #6373
gRPC-based cloud providers can now return the gRPC error code 12, Unimplemented, to signal that they do not implement optional methods. - #5937
Fixed a bug where cluster-autoscaler treated a newly scaled-up node group using the externalgrpc provider as having MaxNodeProvisionTime set to 0 seconds, and so expected the new node to register within 0-10 seconds instead of the default 15m. See #5935 for more info. - #5936
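The Unimplemented signal (gRPC status code 12) mentioned in this section can be pictured from the caller's side as follows. This sketch avoids any gRPC dependency: RpcError is a hypothetical stand-in for a client error carrying a status code, and call_optional is an illustrative helper, not part of Cluster Autoscaler.

```python
UNIMPLEMENTED = 12  # gRPC status code "Unimplemented"


class RpcError(Exception):
    """Hypothetical stand-in for a gRPC client error that carries a status code."""

    def __init__(self, code: int, message: str = ""):
        super().__init__(message)
        self.code = code


def call_optional(method, default):
    """Call an optional provider method; fall back to default if it is unimplemented."""
    try:
        return method()
    except RpcError as err:
        if err.code == UNIMPLEMENTED:
            # The provider signalled that it does not implement this optional method.
            return default
        raise  # any other status code is a real failure
```

Only code 12 is treated as "not supported"; every other error is re-raised so genuine failures stay visible.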

Hetzner

Fixed a bug where failed servers were kept for longer than necessary. - #6364
Fixed a bug where too many requests were sent to the Hetzner Cloud API, causing rate limit issues. - #6308
Each node pool can now have different init configs. - #6184

Kwok

Introduced a new kwok cloud provider (check https://github.com/kubernetes/autoscaler/blob/kwok-poc/cluster-autoscaler/cloudprovider/kwok/README.md for more info).

Images

registry.k8s.io/autoscaling/cluster-autoscaler:v1.29.0
registry.k8s.io/autoscaling/cluster-autoscaler-arm64:v1.29.0
registry.k8s.io/autoscaling/cluster-autoscaler-amd64:v1.29.0
registry.k8s.io/autoscaling/cluster-autoscaler-s390x:v1.29.0

kubernetes/autoscaler cluster-autoscaler-1.29.0 Cluster Autoscaler 1.29.0 on GitHub
