Important
- With the user-authz module enabled, the permission model for `admin.conf` has been changed. Instead of binding to the built-in `cluster-admin` role, DKP now uses its managed RBAC model with more granular permissions. This improves cluster access security. Emergency access via `super-admin.conf` is still available. If your cluster uses third-party Kubernetes operators (not part of DKP), make sure to explicitly grant permissions for their CRDs; otherwise, access to those resources will be restricted.
- The default version of the Ingress NGINX Controller has been updated from 1.10 to 1.12. Controllers using the default version will be upgraded automatically.
- The fencing-agent component of the node-manager module now uses a gossip-based protocol (`memberlist` library) for distributed node health monitoring. This reduces the risk of false worker node reboots when the control plane is unavailable or the API server is under heavy load.
- Fixed an issue where cluster-autoscaler could get stuck after a failed machine creation in the cloud (for errors other than `ResourceExhausted`). The cluster now automatically recovers its ability to scale after temporary cloud provider failures.
Deckhouse subsystem
- Added support for deploying a DKP cluster using the Proxy mode for accessing the container registry.
- `kubectl` on cluster nodes has been replaced with the `d8 k` alias (Deckhouse CLI utility).
- Changes in dhctl (CLI installer):
  - Added support for running `dhctl` as a standalone binary without downloading the full image with all dependencies. Required components are pulled from the registry during execution, reducing startup time.
  - Added validation of public SSH keys. Invalid keys are now detected earlier, before they can cause connection issues.
  - Improved handling of TLS certificates for registries. The system certificate pool is now used together with a custom CA.
  - Added validation of `InstanceClass` against the selected cloud provider before installation, allowing early detection and installation abort in case of misconfiguration.
  - Fixed namespace configuration during bootstrap. `dhctl` now updates settings even if the namespace already exists.
- Improved multi-master resiliency. Kubernetes versions are now correctly detected even when some master nodes are unavailable.
- `kube-apiserver` no longer routes requests to `etcd` nodes in the learner state.
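
To illustrate the kind of early check that public SSH key validation makes possible, here is a minimal sketch in Python. It is not the `dhctl` implementation; the function name `validate_public_ssh_key` is hypothetical, and the check only covers the standard OpenSSH public key layout (a key type, a space, and a base64 blob whose first length-prefixed field repeats the key type).

```python
import base64
import binascii
import struct


def validate_public_ssh_key(line: str) -> bool:
    """Return True if `line` looks like a well-formed OpenSSH public key.

    Hypothetical helper for illustration only: it checks structure,
    not cryptographic validity of the key material.
    """
    parts = line.strip().split(None, 2)  # key type, base64 blob, optional comment
    if len(parts) < 2:
        return False
    key_type, blob_b64 = parts[0], parts[1]

    try:
        blob = base64.b64decode(blob_b64, validate=True)
    except (binascii.Error, ValueError):
        return False

    # The blob starts with a big-endian uint32 length followed by the key type name.
    if len(blob) < 4:
        return False
    (name_len,) = struct.unpack(">I", blob[:4])
    if name_len == 0 or 4 + name_len >= len(blob):
        return False  # no name, or no key material after the name

    embedded_type = blob[4:4 + name_len]
    return embedded_type == key_type.encode("ascii", errors="ignore")
```

A key whose declared type does not match the type embedded in the blob, or whose blob is not valid base64, is rejected before any connection attempt is made.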
Kubernetes & Scheduling subsystem
- The `d8-cluster-kubernetes` ConfigMap now includes supported Kubernetes versions (`supportedVersions`), versions available for upgrade and downgrade (`availableVersions`), and the current automatic version (`automaticVersion`).
- Changes in the descheduler module:
  - Upgraded to version 0.35.1. Added namespace filtering via label selectors and other improvements (see module changelog for details).
  - Added the `deschedulingInterval` parameter to control how often descheduling runs: `Frequent`, `Moderate`, or `Rare`.
  - Added the `RemovePodsHavingTooManyRestarts` strategy, which evicts pods exceeding a restart threshold, freeing node resources from pods in `CrashLoopBackOff` state.
  - The module now automatically uses the Kubernetes Metrics API when the `metrics.k8s.io` API group is available in the cluster. This provides real resource consumption data for strategies to use.
- The vertical-pod-autoscaler module updated to version 1.6.0. Key changes:
  - `InPlaceOrRecreate` mode is now used by default. In this mode, VPA attempts to adjust pod requests and limits without recreating pods (in-place updates).
  - Reduced likelihood of pod restarts when in-place updates are not possible (via the `--in-place-skip-disruption-budget` flag); also, the `minReplicas` value is no longer considered for in-place updates.
- VPA memory recommendations are now rounded to 64 MiB, improving readability and reducing unnecessary pod restarts caused by minor recommendation fluctuations.
- For Kubernetes 1.34 and later, the `DRAExtendedResource` feature gate is enabled by default.
- Added the `D8EtcdHighRequestsLatency` alert to detect high etcd request latency, which can lead to API server timeouts. A corresponding metric has been added to the Control Plane Status dashboard.
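
The effect of rounding VPA memory recommendations to 64 MiB can be shown with a little arithmetic. The sketch below is illustrative only: the function name `round_up_memory` is hypothetical, and rounding up to the next multiple is an assumption, since the notes above state the step (64 MiB) but not the rounding direction.

```python
MIB = 1024 * 1024


def round_up_memory(recommendation_bytes: int, step_bytes: int = 64 * MIB) -> int:
    """Round a memory recommendation up to the next multiple of `step_bytes`.

    Assumption: rounding goes upward; the release notes only state the step.
    """
    if recommendation_bytes <= 0:
        return 0
    # Ceiling division without floating point.
    return -(-recommendation_bytes // step_bytes) * step_bytes


# Three slightly different raw recommendations collapse into one value,
# so a minor fluctuation no longer produces a new recommendation.
raw = [300 * MIB, 310 * MIB, 318 * MIB]
rounded = [round_up_memory(r) // MIB for r in raw]
print(rounded)  # [320, 320, 320]
```

Because nearby raw values map to the same rounded value, a small change in measured usage does not by itself change the recommendation applied to the pod.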
IAM subsystem
- Changes in the user-authn module:
  - Disabled the insecure OAuth 2.0 Implicit Flow. In that flow, access tokens are passed in URL fragments, posing leakage risks (for example, via the `Referer` header, browser history, or logs). Any integrations using the Authorization Code Flow are unaffected.
  - Added SAML 2.0 support via the `DexProvider` resource. DKP clusters can now be connected to such identity providers as AD FS, Okta, Keycloak, OneLogin, and Shibboleth.
  - Added refresh token support for SAML in Dex. This allows `DexAuthenticator` and `kubeconfig-generator` to refresh tokens without a manual user login.
  - Added Single Logout (SLO) support. The identity provider can end user sessions via `/saml/slo/{connector}`.
  - Fixed stale cache issue on the login page by adding a Dex build-based cache parameter to the CSS URL.
- In multitenancy mode, users without permissions defined via `ClusterAuthorizationRule` no longer see all namespaces in the Deckhouse web UI. The default behavior is now deny-by-default: users get an empty namespace list unless access is explicitly granted. This does not affect privileged groups (`system:masters`, `kubeadm:cluster-admins`). Documentation of the user-authz module now includes an example for granting access to all namespaces.
- Added support for the service-with-health-checks module in isolated projects. A `NetworkPolicy` exception for the `d8-service-with-healthchecks` namespace has been added to the project template, which allows module agents to verify service availability and correctly operate under conditions of network isolation.
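
The difference between the disabled Implicit Flow and the Authorization Code Flow is visible in the redirect URL itself. The following sketch uses only the Python standard library; the callback URLs, token, and code values are made up for illustration, and the helper name `credentials_in_redirect` is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def credentials_in_redirect(url: str) -> dict:
    """Report which credentials are carried directly in a redirect URL."""
    parts = urlsplit(url)
    fragment = parse_qs(parts.fragment)  # Implicit Flow puts the token here
    query = parse_qs(parts.query)        # Authorization Code Flow puts a code here
    return {
        "access_token": fragment.get("access_token", [None])[0],
        "code": query.get("code", [None])[0],
    }


# Hypothetical redirect URLs for the two flows.
implicit = "https://app.example.com/callback#access_token=SECRET&token_type=bearer"
code_flow = "https://app.example.com/callback?code=ONE_TIME_CODE&state=xyz"

print(credentials_in_redirect(implicit))
print(credentials_in_redirect(code_flow))
```

In the first URL the access token itself is part of the address, so anything that records the URL also records the token. In the second, the URL carries only an authorization code, which the client still has to exchange for tokens at the token endpoint.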
Security subsystem
- Gatekeeper upgraded to 3.22.0 and Ratify to 1.4.0. Key changes include a unified CEL and Rego policy behavior, added namespace context support in policies, and a CLI tool for policy benchmarking.
- The `denyVulnerableImages` setting has been moved from admission-policy-engine to operator-trivy, making it the single owner of this logic.
- The cert-manager module upgraded to 1.20.0. Key changes include added support for Azure DNS Private Zones in the DNS01 challenge, added annotation `acme.cert-manager.io/http01-ingress-ingressclassname` for redefining IngressClasses in HTTP-01 solvers, and security fixes.
- Updated vulnerable Go dependencies in multitenancy-manager (`go-jose`, `golang.org/x/crypto`, `grpc`, etc.).
Cluster & Infrastructure subsystem
- Changes in the node-manager module:
  - Improved draining hook event generation for better observability of the node eviction process.
  - Fixed a `StaticInstance` issue that caused infinite infrastructure deletion loops, preventing endless cleanup and queueing cycles.
  - `Apiserver-proxy` rewritten from NGINX to a native Go implementation for better stability and reconnection speed.
- Changes in the cloud-provider-dvp module:
  - Added hybrid cluster support. Static master nodes and cloud worker nodes can now be used in a single cluster.
  - Improved converge reliability. Reworked the mechanism of resource readiness check and added the fail-fast behavior for possible errors when creating the infrastructure.
  - Added default StorageClass inheritance for Cluster API child clusters.
  - Improved VM lifecycle handling, reducing potential problems when deleting VMs and associated objects.
  - Added support for the `masterNodeGroup.instanceClass.additionalDisks` parameter that lets you configure additional disks for master nodes.
- Changes in the cloud-provider-aws module:
  - Spot node draining logic was moved into DKP, making the spot node handling more unified.
  - Fixed handling of regions without `DescribeInstanceTopology`, reducing the number of false IAM errors.
- Changes in the cloud-provider-openstack module:
  - Added `ipMode: Proxy` support for LoadBalancer in Kubernetes 1.32 and later.
  - Added the `csiDriver.fsGroupPolicy` parameter for the CSI driver, which defines whether the volume supports changing owner and permissions before mounting.
- Changes in the cloud-provider-vsphere module:
  - Enabled the CSI snapshotter, which makes volume snapshots available.
  - Fixed zone and datastore detection. Only the zones specified in the provider configuration are now considered.
- In the cloud-provider-azure module, added support for NVMe disks on Gen2 VMs.
- In the cloud-provider-gcp module, added `enableNestedVirtualization` and `additionalDisks` parameters to `GCPInstanceClass`, which let you configure nodes for workloads using virtualization and storage scenarios.
- In the cloud-provider-yandex module, fixed converge failure when removing `externalIPAddresses`.
- DVP, Huawei Cloud, and zVirt providers were migrated to Cluster API v1beta2.
- For Huawei Cloud and VCD providers, added `SecurityPolicyException` resources and enabled security policy validations.
Network subsystem
- Added support for handling incoming traffic via Gateway API using the alb module.
- Changes in the cni-cilium module:
  - Added conntrack table synchronization for live VM migration.
  - Added ICMP (ping) support via ExternalIP for LoadBalancer services with MetalLB.
- Added support for switching between supported CNI plugins in DKP clusters.
- In the node-local-dns module, added the `disableIPv6` parameter to disable IPv6 DNS resolution.
Component version updates
The following DKP components have been updated:
| Component | Version |
|---|---|
| cert-manager | 1.20.0 |
| descheduler | 0.35.1 |
| etcd | 3.6.10 |
| Gatekeeper | 3.22.0 |
| Ingress NGINX Controller | 1.12 |
| Kubernetes patch versions | 1.33.11, 1.34.7, and 1.35.4 |
| Ratify | 1.4.0 |