Announcements
We are looking for maintainers; reach out in #5432.
Deprecation / Removal
- [Ambassador] Remove code, ci and ansible tags as it's no longer maintained and not working anymore. (#8086, @floryut)
- Drop support for Fedora 33 (#8246, @floryut)
- Remove ovn4nfv support (#8265, @floryut)
- Mitogen: support for the mitogen playbook accelerator is now deprecated in preparation of ansible upgrades, please clean up your playbooks that depend on it. (#8147, @cristicalin)
- Remove registry-proxy of container registry (#8327, @zhengtianbao)
Feature / Major changes
- Replace docker with containerd as the default container_manager (#8175, @cristicalin)
- Add ArgoCD as a kubernetes-app, using the new argocd_enabled variable (#7895, @atorrescogollo)
- Add ServiceTypes support to container registry (using new variables registry_service_type, registry_service_clusterIP, registry_service_loadBalancerIP, registry_service_annotations, registry_service_nodePort) (#8291, @zhengtianbao)
- Add TLS and authentication support to container registry (using new variables registry_tls_secret, registry_htpasswd, registry_config) (#8229, @zhengtianbao)
- Add a new option cert_manager_trusted_internal_ca to specify trusted internal ca of cert_manager. (#8135, @infra-monkey)
- Add a new option metrics_server_resizer (default to false) to control the addon-resizer container deployment in metrics-server pod (#8018, @oomichi)
- Add an optional fallback to node drain during cluster upgrades using --disable-eviction flag (#8094, @utkuozdemir)
- Add capability to use node swap with kubernetes 1.22+ (using new variable kubelet_fail_swap_on, default to true) (#8241, @cristicalin)
- Add possibility of automatic creation of Load Balancers on Google Compute Engine (#8179, @lmercl)
- Add support for Fedora 35 (#8234, @floryut)
- Add support for Rocky Linux (#8095, @ooraini)
- Add support for cgroups v2 (no more reverting to cgroups v1 for Fedora) (#8237, @cristicalin)
- Add the ability to skip some phases in the kubeadm join_phase using kubeadm_join_phases_skip (#8067, @necatican)
- Added terraform support for Hetzner Cloud (#8053, @Xartos)
- Allow to scrape etcd metrics using a service (#8203, @sathieu)
- Default DNS replica count is now set to the minimum value between 2 and the length of k8s_cluster inventory group. (#8112, @smasset)
- Determine root filesystem device and partition before running growpart (so it no longer has to be sda1) (#8024, @mlorenzo-stratio)
- Ensure apparmor is installed on Ubuntu (#8036, @rtsp)
- Fail metrics-server installation when addon-resizer is used on a platform different than amd64 (#8144, @zhengtianbao)
- Krew: upgrade to v0.4.2 (#8168, @zhengtianbao)
- Move deprecated kube_feature_gates from kubelet args to kubelet config (#8048, @fungusakafungus)
- Multiple Ansible versions are now supported (2.9/2.10/2.11) and tested by CI (#8172, @cristicalin)
- Prefer nodelocaldns as dns server over coredns when defined (#7731, @Alvaro-Campesino)
- Python 2.7: revive python2.7 support on EL7, note that this is not properly exercised in CI. (#8192, @cristicalin)
- Remove Terraform 0.14/0.15 support and CI -> Add TF 1.x (#8062, @floryut)
- Support Python 3.10 - ruamel.yaml.clib needs to be updated to 0.2.4 (#8034, @olivierlemasle)
- Update Netchecker to v1.2.2 - now local etcd backend is needed to run (#8074, @cristicalin)
- Update registry template with additional options (security context and probes) and variables (registry_storage_access_mode to change the access mode, registry_replica_count for replicas) (#8198, @zhengtianbao)
- [nodelocaldns] add the capability to hot swap nodelocaldns without causing DNS blackholes during the swap (#8100, @cristicalin)
- Add Ingress support to container registry (using new variables registry_ingress_annotations, registry_ingress_host, registry_ingress_tls_secret) (#8311, @zhengtianbao)
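The new default DNS replica count listed above is plain arithmetic: the smaller of 2 and the number of hosts in the k8s_cluster inventory group. A minimal sketch of that rule follows; the function name and the host lists are hypothetical and only illustrate the calculation, not the actual Ansible template.

```python
def default_dns_replicas(k8s_cluster_hosts):
    """Return the minimum of 2 and the size of the k8s_cluster group."""
    # Hypothetical helper: mirrors the rule stated in the release note.
    return min(2, len(k8s_cluster_hosts))


if __name__ == "__main__":
    # A single-node cluster gets one replica instead of two.
    print(default_dns_replicas(["node1"]))
    # Larger clusters stay capped at two replicas by default.
    print(default_dns_replicas(["node1", "node2", "node3"]))
```

In practice this means single-node clusters no longer try to schedule a second DNS replica that can never be placed.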
Applications
- [cinder-csi] Add new variable cinder_csi_rescan_on_resize to control rescan-on-resize option (#8057, @reneluria)
- [cinder-csi] Added variable cinder_tolerations that sets tolerations for cinder-csi-nodeplugin DaemonSet (no tolerations by default) (#8137, @Ajarmar)
- [cinder-csi] Update version to support Kubernetes 1.22 and up (#8296, @StevenReitsma)
- [Metallb] Allow changing metallb default pool name (var metallb_pool_name) (#8111, @damjanek)
- [Metallb] Allow setting 'auto-assign' property to 'false' for default IP pool (var metallb_auto_assign) (#8193, @IKRozhkov)
- [Openstack] Fix a bug where Openstack cloud provider could not be used with username/password (#8021, @bl0m1)
- [Openstack] Replaces the global use_server_groups with the option to enable and set server group policy for each of the master, etcd, and node server groups respectively. (#8046, @OlleLarsson) (see Notes 2)
- [Openstack] Adds the option to set boot volume type for k8s nodes (using node_volume_type variable) (#8256, @robinAwallace)
- [Openstack] Use a pre-existing floating IP for bastion node, instead of creating a new one. (#8214, @feber)
- [nginx-ingress] Nginx controller now also watches kind:ingress without class (#8128, @LuckySB)
- [vSphere-CSI] Update to 2.4.0 (#8295, @cristicalin)
- [vSphere] Terraform code now documents and requires specification of the OVF template to use and separate specification of the netmask to use. (#8178, @llarsson)
Network
- [Calico] Add support for BGPPeer sourceAddress (#8306, @kakkotetsu)
- [Calico] Reduced calico bird route removal time on large clusters to less than one minute improving Kubernetes node removal performance (#8227, @khatrig)
- [Calico] Bump 3.21.x to 3.21.2 (#8275, @cristicalin)
- [Calico] Add support for container ip forwarding setting, using new variable calico_allow_ip_forwarding (#8184, @zhengtianbao)
- [Calico] Add vxlanEnabled spec in FelixConfiguration to prevent calico network (when using vxlan) from crashing after upgrading the cluster (#8167, @devinjeon)
- [Calico] Check if 'plugins' key exists in calico_cni_config object allowing user to add nodes using both playbooks (#7717, @dlouks)
- [Calico] Fix Kube-bench security warnings on calico controller (file ownership/permissions) (#8072, @oomichi)
- [Calico] Fix typha prometheus causing a deployment error (#8005, @ericlake)
- [Calico] Increase CPU limit to prevent throttling (#8076, @olevitt)
- [Calico] Increase node probe timeouts and add calico_node_readinessprobe_timeout / calico_node_livenessprobe_timeout to tune them (#7981, @cristicalin)
- [Calico] Make calico_min_version check relevant (#7939, @cristicalin)
- [Calico] Make calico 3.20.x the default release and drop support for calico 3.17.x (#7984, @cristicalin)
- [Calico] When default pool already exists and calico_pool_blocksize is defined in inventory, the assertion on blocksize equality wrongly fails because a string cast is missing (#8321, @emiran-orange)
- [Cilium] During upgrades, wait for cilium pod to be ready before uncordoning node, add new option upgrade_post_cilium_wait_timeout to control that (By default 120 seconds) (#7978, @reneluria)
- [Cilium] Fix operator metrics activation (enable-metrics key missing) (#8000, @L3o-pold)
- [Weave] Allow EXTRA_ARGS to be configured for weave-npc, using weave_npc_extra_args (#8140, @brainfair)
- [Weave] Update template to match upstream (#8013, @frankfil)
- [ovn4nfv] Move crd API to v1, update crd spec (#8006, @floryut)
Container-Managers
- Container engine is no longer installed on separate etcd nodes when using etcd_deployment_type: host (#7532, @VannTen)
- [Docker] When using containerd_manager==docker (default config) you will now need to use docker_containerd_version to change the containerd version instead of the established containerd_version (#8130, @cristicalin)
- [Kata-Containers] Update versions 2.2.0 (new default) and 2.1.1 (bugfix replacing 2.1.0). (#8017, @cristicalin)
- [Kata-Containers] add support for version 2.3.0 (needs kubernetes 1.22.0+) (#8276, @cristicalin)
- [containerd] Add the hashes for containerd versions 1.4.12 and 1.5.8 and make 1.5.8 the new default. (#8239, @cristicalin)
- [containerd] upgrade versions 1.4.11 and 1.5.7 and make 1.4.11 the default (#8129, @cristicalin)
- [containerd] Add support for SuSE distributions (#8261, @cristicalin)
- [containerd] Download containerd from upstream instead of using distro specific packages (#7970, @cristicalin)
- [containerd] Allow 'stable' and 'edge' ContainerD values on validation (#8020, @electrocucaracha)
- [containerd] Ensure pulling, exporting and importing images for the target platform when dealing with multi-platform images to avoid partial import issues (#8245, @cristicalin)
- [containerd] Fix the usage of cgroupfs with containerd and introduce cgroupfs-specific variables (⚠️ containerd_runtimes is now containerd_additional_runtimes) (#8123, @pasqualet)
- [containerd] Moved containerd and runc from /usr/bin to bin_dir (defaults to /usr/local/bin) - Fixing install for FCOS (#8107, @mafn)
- [containerd] Switch default resolvconf_mode to host_resolvconf (#8247, @cristicalin)
- [containerd] Insecure registry support (#8298, @Morion-Self)
- [cri-o] Add support for cri-o user namespaces (#8268, @nmasse-itix)
- [cri-o] Enable experimental modules when rpm-ostree version >= 2021.9 (#8202, @zhengtianbao)
- [gVisor] Update gVisor to 20210921 release (#8015, @cristicalin)
- [runc] upgrade to v1.0.3 and add arm64 (#8274, @cristicalin)
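Two of the entries above change which variable an existing inventory has to set. As a rough aid for upgrades, the sketch below scans a dictionary of inventory variables for those two cases. The function name and the sample values are hypothetical, and it assumes the Docker condition refers to the container_manager variable named under Feature / Major changes; it is not part of Kubespray.

```python
# Renames taken from the Container-Managers entries above.
RENAMED_VARS = {
    "containerd_runtimes": "containerd_additional_runtimes",
}


def find_outdated_vars(inventory_vars):
    """Return human-readable hints for variables affected by this release."""
    findings = []
    for old_name, new_name in RENAMED_VARS.items():
        if old_name in inventory_vars:
            findings.append(f"{old_name} is now {new_name}")
    # Assumption: "docker" is selected through the container_manager variable.
    uses_docker = inventory_vars.get("container_manager") == "docker"
    if uses_docker and "containerd_version" in inventory_vars:
        findings.append(
            "containerd_version: use docker_containerd_version "
            "when the container manager is docker"
        )
    return findings


if __name__ == "__main__":
    # Hypothetical inventory values, for illustration only.
    sample = {
        "container_manager": "docker",
        "containerd_version": "1.4.12",
        "containerd_runtimes": [],
    }
    for hint in find_outdated_vars(sample):
        print(hint)
```

An inventory that already uses the new names produces an empty list.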
Bug or Regression
- Add gather facts to remove-node playbook to prevent issue with os evaluation (#8231, @IKRozhkov)
- Add missing 'stable' and 'edge' keys in docker_cli_versioned_pkg dict (#8019, @electrocucaracha)
- Add missing proxy settings for subscription-manager in RHEL OS (if http_proxy is defined) (#8012, @oomichi)
- Change dns upstream condition for coredns (use upstream dns even when resolvconf_mode is set to docker_dns) (#8263, @toplordsaito)
- Change etcd-events listen port (2381 -> 2383) to avoid conflicts (#8232, @zhengtianbao)
- DeprecationWarning occurs when indentfirst=None is specified in coredns-config.yml.j2 (#8224, @Ishizuka427)
- Fix CentOS7 issue with allowPrivilegeEscalation value from metrics-server (#8014, @oomichi)
- Fix Heketi deployment logic that was broken by the ansible 3.4 upgrade (#8118, @cristicalin)
- [REVERTED] Fix apiserver_loadbalancer_domain_name pointing to external LB instead of dbip (#8299, @singeleaf)
- Fix a conflict with containerd and podman under CentOS 8.x (remove podman when installing Docker/Containerd) (#8016, @panpan0000)
- Fix bad indentation in cert-manager when trusted internal ca is defined (#8314, @infra-monkey)
- Fix calico's inventory check (Check if inventory match current cluster configuration) conversion (#8120, @juliohm1978)
- Fix cert_manager ClusterIssuer manifest by removing deprecated ClusterIssuer (#8064, @rtsp)
- Fix cloud_provider check in preinstall task, allowing oci value (and removing deprecated ones) (#8164, @oomichi)
- Fix containerd failed to start if apparmor is not installed (#8011, @rtsp)
- Fix debian 9 check for apt cache update in bootstrap-os (#8215, @floryut)
- Fix deploying loadbalancer to masters when bind-address is not set to 0.0.0.0 (and loadbalancer_apiserver_localhost is true) (#8262, @Bledai)
- Fix forgotten update of etcd-servers list in apiserver manifest when scaling (#8253, @liupeng0518)
- Fix k8s-certs-renew cp path wrongly using /usr/bin/ (#7992, @lazybetrayer)
- Fix k8scsi/csi-resizer repo (from gcr to quay) (#8270, @oomichi)
- Fix kata-containers runtime with version 2.x (#8068, @cristicalin)
- Fix kubespray flatcar ansible_os_family and ansible_distribution for backward compatibility (#8029, @isantospardo)
- Fix quorum check when recovering broken etcd cluster (with etcd 3.5.x) (#8126, @floryut)
- Fix reset playbook for Fedora OS (#8205, @cristicalin)
- Fix wrong baseurl for centos extra repo for Oracle Linux (missing /os/) (#8208, @buker)
- Fixes incongruence between metrics-server resources limits/requests defined in official templates (#8088, @irizzant)
- [Calico] Fix support for version 3.21.x (#8250, @cristicalin)
- [Calico] add missing verbs in ClusterRole (#8136, @krystianmlynek)
- Fix resolved config when nodelocaldns is not enabled (#8351, @liupeng0518)
Other noteworthy changes
- Add auto completion for krew addon (#8171, @zhengtianbao)
- Added Ubuntu 21.04 (hirsute) in restart network task (reset role) (#8134, @seungjinyu)
- Limit kubectl delete node to k8s nodes and not etcd (#8101, @VannTen)
- NetworkManager tasks can now be run with ansible check_mode (#8133, @Isakgicu)
- Remove comparison of kubelet_shutdown_grace_period and kubelet_shutdown_grace_period_critical_pods (#7993, @cristicalin) (see Notes 1)
- Replace deprecated --delete-local-data in pre-remove/pre-upgrade tasks (#8081, @mzaian)
- Replace path_join (in reset role) to support Ansible 2.9 (#8160, @zhengtianbao)
- Update local-volume-provisioner image from quay to k8s.gcr (#8054, @foxdalas)
- Use kube_config_dir for kubeconfig instead of hard path in multiple plays (#7996, @oomichi)
- Add Glusterfs daemonset readiness and liveness params (and increase initial_delay_seconds to 10 seconds) (#8309, @zemkogabor)
- Simplify usage of pre-remove role (#8334, @VannTen)
Component versions:
- Kubernetes v1.22.5
- Etcd 3.5.0
- Docker 20.10
- Containerd 1.5.8
- CRI-O 1.22
- CNI-plugins v1.0.1
- Calico v3.20.3
- Cilium 1.9.11
- Flannel 0.15.1
- Kube-ovn 1.8.1
- Kube-Router 1.3.2
- Multus 3.8
- Weave 2.8.1
- CoreDNS 1.8.0
- Nodelocaldns 1.21.1
- Helm 3.7.1
- Nginx-ingress 1.0.4
- Cert-manager 1.5.4
- Kubernetes Dashboard v2.4.0
Known issues
n/a
Notes
- This PR removes the comparison of kubelet_shutdown_grace_period to kubelet_shutdown_grace_period_critical_pods because ansible cannot do time interval comparisons sanely so we defer to the better judgement of the deployer.
- The terraform variable use_server_groups has been removed; please use master_server_group_policy, node_server_group_policy and etcd_server_group_policy instead.
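The first note leaves the comparison of the two kubelet shutdown grace periods to the deployer. The sketch below shows one way to check it by hand, and why comparing the raw strings is unreliable. It assumes the values are written as whole-number durations using only h, m and s units (for example 2m or 1m30s); the helper name and sample values are hypothetical.

```python
import re

# Seconds per supported unit; only h, m and s are handled in this sketch.
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}
_PART = re.compile(r"(\d+)([hms])")


def duration_seconds(text):
    """Convert a duration such as '1m30s' into whole seconds."""
    parts = _PART.findall(text)
    # Reject anything that is not fully made of <number><unit> pairs.
    if not parts or "".join(num + unit for num, unit in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return sum(int(num) * _UNIT_SECONDS[unit] for num, unit in parts)


if __name__ == "__main__":
    # Hypothetical values for the two variables named in the note.
    grace_period = "2m"
    critical_pods_period = "90s"
    # Plain string comparison goes character by character and gets this wrong.
    print(grace_period > critical_pods_period)
    # Comparing the parsed values gives the intended answer.
    print(duration_seconds(grace_period) > duration_seconds(critical_pods_period))
```

Here the string comparison reports that 2m is not greater than 90s, while the parsed comparison correctly treats 120 seconds as longer than 90 seconds.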