Changelog
General
- Cluster Autoscaler can now provision nodes before all pending pods are created and marked as unschedulable by the scheduler. This behavior is disabled by default and can be enabled with the --enable-proactive-scaleup flag. The --pod-injection-limit flag is introduced to allow fine-tuning this behavior. (#7145)
  - This functionality can significantly speed up provisioning of nodes when hundreds or thousands of pods are created at the same time, and can lead to better scale-up decisions in those cases.
  - Injecting too many pods can make CA unstable, depending on the number of NodeGroups and the scalability of the particular cloud provider integration. --pod-injection-limit can help control this.
- Added support for ProvisioningRequest v1 API. (#7195)
- Allows the user to use the in-cluster Kubernetes configuration when self-hosting cluster-autoscaler as a pod within their cluster. (#7156)
- Faster handling of failed scale ups, useful especially with multiple quota or stockout errors across the cluster. (#7087)
- Bin packing will be cut short after exceeding "maxBinpackingDuration". The "maxBinpackingDuration" is set using a new flag "--max-binpacking-time". This can prevent rare cases where CA becomes unresponsive in scenarios with a very large number of pending pods. (#6556)
- Added a new least-nodes expander (#6792)
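
The time limit on bin packing described above can be pictured with a small sketch. This is not Cluster Autoscaler's implementation: the function name, the first-fit strategy, and the integer pod sizes are assumptions made for illustration. It only shows the general idea of stopping a packing loop once a time budget is spent and reporting what was left unprocessed.

```python
import time


def first_fit_with_deadline(pod_sizes, node_capacity, max_duration_s,
                            clock=time.monotonic):
    """Hypothetical first-fit packing that stops after max_duration_s.

    Returns (bins, unprocessed): bins is a list of lists of pod sizes,
    unprocessed holds the pods that were not looked at before the deadline.
    """
    deadline = clock() + max_duration_s
    bins = []
    for i, size in enumerate(pod_sizes):
        # Check the time budget before handling each pod.
        if clock() >= deadline:
            return bins, list(pod_sizes[i:])
        for b in bins:
            if sum(b) + size <= node_capacity:
                b.append(size)
                break
        else:
            # No existing bin fits; open a new one if the pod can fit at all.
            if size <= node_capacity:
                bins.append([size])
    return bins, []


if __name__ == "__main__":
    bins, rest = first_fit_with_deadline([4, 3, 3, 2], node_capacity=6,
                                         max_duration_s=10.0)
    print(bins, rest)
```

Cutting the loop short trades packing quality for responsiveness: the result covers only the pods seen so far, and the remainder can be handled in a later iteration.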
AWS
- Fix an issue in the Kubernetes Cluster Autoscaler where actual AWS instances could be incorrectly scaled down instead of placeholders. (#6911)
- Fix an issue with reading taints on Managed Node Groups scaled to zero, that can cause scale-up of nodes with taints that pending pods don't tolerate (#6482)
Azure
- ACTION REQUIRED: VMSS GPU Nodes must now also include the kubernetes.azure.com/accelerator label in addition to accelerator. (#7203)
- From now on, users should refer to https://cloud-provider-azure.sigs.k8s.io/install/configs/ for configuration interface (#6947)
- Fixed an issue where environment variables were not being passed in when config file exists (#6947)
- Fixed an issue where some cloud provider configurations were not being validated when UseManagedIdentityExtension is set to true (#6947)
- Renamed several fields in the config file; the old names are still accepted and take precedence: useWorkloadIdentityExtension to useFederatedWorkloadIdentityExtension, vmssCacheTTL to vmssCacheTTLInSeconds, vmssVmsCacheTTL to vmssVirtualMachinesCacheTTLInSeconds, enableVmssFlex to enableVmssFlexNodes (#6947)
- Renamed several environment variables; the old names are still accepted and take precedence: ARM_USE_MANAGED_IDENTITY_EXTENSION to ARM_USE_FEDERATED_WORKLOAD_IDENTITY_EXTENSION, AZURE_VMSS_CACHE_TTL to AZURE_VMSS_CACHE_TTL_IN_SECONDS, AZURE_VMSS_VMS_CACHE_TTL to AZURE_VMSS_VMS_CACHE_TTL_IN_SECONDS, AZURE_ENABLE_VMSS_FLEX to AZURE_ENABLE_VMSS_FLEX_NODES (#6947)
- Fix some cases where the instance cache is outdated but not getting refreshed (#7116)
- Support cloud provider AAD certificate authentication (#7003)
- The getVMSS API is now called when using spot instances to obtain more up-to-date information (#6470)
- The AZURE_CLUSTER_AUTOSCALER_USER_AGENT_SUFFIX variable can be used to customize the user agent for the Azure provider of cluster-autoscaler. Setting this to -my-user-agent results in a user agent like Go/go1.22.5 (amd64-linux) go-autorest/v14.2.1 cluster-autoscaler-my-user-agent/v1.31.0-alpha.2. (#7033)
- You can now optionally specify a default min and max size for Azure VMSSs through the auto discovery tags. Explicit min and max tags on VMSSs will still be given priority over the default. (#6863)
- Skips Azure-specific node labels that might mistakenly categorize nodegroups as different when, in reality, they are similar. (#6634)
Cluster API
- Added configurable autoscaling options to clusterapi provider allowing users to configure e.g. --scale-down-unneeded-time on a per node group level. (#6743)
GCE
- GCE cloud provider will use the Instance.List API to list MIG instances. The IGM.ListManagedInstances API will be used as a fallback mechanism and for listing instances for MIGs that have instances in creating or deleting states. This should improve performance in clusters with a large number of NodeGroups. (#6955)
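
The selection between the two listing calls can be read as a simple decision rule. The sketch below is an interpretation of the entry above, not the provider's code: the function and parameter names are hypothetical, the callables stand in for the Instance.List and IGM.ListManagedInstances calls, and treating any exception as a reason to fall back is an assumption.

```python
def list_mig_instances(mig, list_instances, list_managed_instances,
                       has_transitional_instances):
    """Hypothetical sketch of choosing a listing call for one MIG.

    list_instances stands in for Instance.List, list_managed_instances
    for IGM.ListManagedInstances. Both take a MIG and return a list.
    """
    # MIGs with instances being created or deleted use the managed listing.
    if has_transitional_instances(mig):
        return list_managed_instances(mig)
    try:
        return list_instances(mig)
    except Exception:
        # Assumption: any failure of the primary call triggers the fallback.
        return list_managed_instances(mig)


if __name__ == "__main__":
    print(list_mig_instances(
        "mig-a",
        list_instances=lambda m: ["vm-1", "vm-2"],
        list_managed_instances=lambda m: ["vm-1", "vm-2", "vm-3"],
        has_transitional_instances=lambda m: False,
    ))
```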
Hetzner
- Fixed exhausted node groups not backing off for Hetzner Provider (#6750)
Images
registry.k8s.io/autoscaling/cluster-autoscaler:v1.31.0
registry.k8s.io/autoscaling/cluster-autoscaler-arm64:v1.31.0
registry.k8s.io/autoscaling/cluster-autoscaler-amd64:v1.31.0
registry.k8s.io/autoscaling/cluster-autoscaler-s390x:v1.31.0
Full Changelog: cluster-autoscaler-1.30.0...cluster-autoscaler-1.31.0