Release v2.8.2
After testing and observation in the community, Rancher v2.8.2 is now considered a stable release and also designated as a Rancher Prime release. To learn more about Rancher Prime, visit https://www.rancher.com/products/rancher-platform.
Important: Review the Install/Upgrade notes before upgrading to any Rancher version.
Rancher v2.8.2 is a security release to address the following issues:
Security Fixes for Rancher Vulnerabilities
This release addresses the following Rancher security issues:
- Fixed an issue where the Rancher Audit Log was leaking sensitive data. For more information, see CVE-2023-22649.
- Updated the versions of Norman and API Server to address CVE-2023-32193 and CVE-2023-32192, respectively, which could lead to unauthenticated cross-site scripting (XSS) in Rancher's APIs.
- Fixed an issue where users with permissions in custom API groups could manage namespaces in the core API group. For more information, see CVE-2023-32194.
- Bumped runc to v1.1.12 to fix CVE-2024-21626. See the related runc security advisory.
For more details, see the Security Advisories and CVEs page in Rancher's documentation page or in Rancher's GitHub repo.
Rancher General
Features and Enhancements
- Rancher now supports Kubernetes v1.27. See #41840.
- Rancher's new Rancher Kubernetes API (RK-API) lets you manage Rancher in the same way you manage Kubernetes. You can now use the Rancher Kubernetes API to interact with Rancher CRDs via Kubernetes tooling. This includes convenient documentation via the kubectl explain command. A limited set of our most widely-used CRDs will be available in Rancher v2.8.0. More are on the way.
- Custom Global Role configuration is now more flexible. Additional fields are being added to the Global Role CRD, starting with the inheritedClusterRoles field, which allows users to give permissions for all downstream clusters with a single GlobalRole and GlobalRoleBinding. This lets you easily define and apply roles tailored to specific use cases, creating a level of flexibility not present in built-in global roles, such as Restricted Admin.
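As a rough illustration of the inheritedClusterRoles field, the sketch below writes a custom GlobalRole manifest to a file. The role name, display name, file name, and the choice of cluster-member as the inherited role are assumptions made for this example. The kubectl commands in the comments are not run here and assume access to the Rancher local cluster.

```shell
# Write a hypothetical GlobalRole manifest. The name and the inherited role
# are illustrative assumptions, not values taken from these release notes.
cat > custom-global-role.yaml <<'EOF'
apiVersion: management.cattle.io/v3
kind: GlobalRole
displayName: All Downstream Member
metadata:
  name: all-downstream-member
inheritedClusterRoles:
  - cluster-member
EOF

# To try it against the Rancher local cluster (not run here):
#   kubectl explain globalroles.inheritedClusterRoles
#   kubectl apply -f custom-global-role.yaml
```

Binding this role to a user with a GlobalRoleBinding would then grant the inherited cluster role on every downstream cluster, which is the behavior the field is described as providing.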
Major Bug Fixes
- Rancher no longer rapidly requests unnecessary upgrade operations from helm-operation pods. See #41746.
- Importing a generic cluster with system-default-registry uses registry images, and the registration.yaml file no longer contains docker.io images when the system-default-registry is defined. See #42048.
- Fetching images for ui-extensions from a GitHub URL no longer fails, allowing Rancher to start in air-gapped environments. See #43910 and #43892.
Behavior Changes
- Rancher Compose is no longer supported, and all parts of it are being removed in the v2.8 release line. See #43341.
- Kubernetes v1.23 and v1.24 are no longer supported. Before you upgrade to Rancher v2.8.0, make sure that all clusters are running Kubernetes v1.25 or later. See #42828.
Cluster Provisioning
Features and Enhancements
- The new imagePolicyPull global setting allows you to configure the image pull policy for machine-provision-image. The default value is Always, which preserves the default behavior in previous Rancher versions. You can set this value during Rancher initialization, with the CATTLE_MACHINE_PROVISION_IMAGE_PULL_POLICY environment variable. See #42294.
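One possible way to set the environment variable at initialization is through a Helm values file, sketched below. This assumes the Rancher chart's extraEnv value is used to pass environment variables, that IfNotPresent is an accepted alternative to the default Always, and that the chart repository alias is rancher-stable. None of these details come from these notes.

```shell
# Hypothetical Helm values file that sets the pull policy when Rancher starts.
# "IfNotPresent" is an assumed alternative to the default "Always".
cat > rancher-pull-policy-values.yaml <<'EOF'
extraEnv:
  - name: CATTLE_MACHINE_PROVISION_IMAGE_PULL_POLICY
    value: IfNotPresent
EOF

# Example use with a Helm install or upgrade (not run here):
#   helm upgrade --install rancher rancher-stable/rancher \
#     --namespace cattle-system -f rancher-pull-policy-values.yaml
```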
Major Bug Fixes
- Rancher no longer requires you to force-refresh the UI to accept a web certificate during initial setup. See #7867.
Behavior Changes
- Kontainer Engine v1 (KEv1) provisioning and the respective cluster drivers are now deprecated. KEv1 provided plug-ins for different targets using cluster drivers. The Rancher-maintained cluster drivers for EKS, GKE and AKS have been replaced by the hosted provider drivers, EKS-Operator, GKE-Operator and AKS-Operator. Node drivers are now available for self-managed Kubernetes.
RKE Provisioning
Behavior Changes
- Rancher no longer supports the Amazon Web Services (AWS) in-tree cloud provider for RKE clusters. This is in response to upstream Kubernetes removing the in-tree AWS provider in Kubernetes v1.27. You should instead use the out-of-tree AWS cloud provider for any Rancher-managed clusters running Kubernetes v1.27 or later. See #43175.
- The Weave CNI plugin for RKE v1.27 and later is now deprecated. Weave will be removed in RKE v1.30. See #42730.
Known Issues
- Scaling up etcd nodes on RKE may fail, with nodes stuck waiting to register with Kubernetes. This causes the cluster to hang. There are two workarounds available, depending on whether the cluster is active or hanging. See #43356.
- Workaround for active clusters:
- Add one etcd node, wait for the cluster to become active again, then repeat as needed.
- Workaround for hanging clusters:
- Delete the stuck etcd nodes.
- Find the leader pod: kubectl -n kube-system get configmap cattle-controller
- Restart the leader pod. This terminates the gRPC goroutine.
- Wait for the stuck nodes to be removed.
- Add one etcd node, wait for the cluster to become active again, then repeat as needed.
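The "find the leader pod" step can be sketched as follows. The helper name leader_from_record is hypothetical. The sketch assumes that the ConfigMap stores a standard Kubernetes leader-election record, which is a JSON string with a holderIdentity field under the control-plane.alpha.kubernetes.io/leader annotation, and that the Rancher pods run in the cattle-system namespace. The kubectl lines are shown as comments and are not run here.

```shell
# Extract holderIdentity from a leader-election record (a JSON string).
leader_from_record() {
  printf '%s\n' "$1" | sed -n 's/.*"holderIdentity":"\([^"]*\)".*/\1/p'
}

# Sample record, shaped like a Kubernetes leader-election annotation value.
record='{"holderIdentity":"rancher-7d9c5b-abcde","leaseDurationSeconds":45}'
leader_from_record "$record"   # prints rancher-7d9c5b-abcde

# Against a live cluster (not run here; annotation key and namespace are assumptions):
#   record=$(kubectl -n kube-system get configmap cattle-controller \
#     -o jsonpath='{.metadata.annotations.control-plane\.alpha\.kubernetes\.io/leader}')
#   kubectl -n cattle-system delete pod "$(leader_from_record "$record")"
```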
RKE2 Provisioning
Major Bug Fixes
- You can use IPv6 to provision RKE2 clusters. In previous Rancher versions, attempting to provision a new cluster with IPv6 would fail. See #42411.
- RKE2 clusters no longer become stuck in an Updating state after a change to the node pool. Previously, the node pool failed to rotate because the init node couldn't be deleted, which prevented the provisioning process from completing. See #42709.
Behavior Changes
- Rancher no longer supports the Amazon Web Services (AWS) in-tree cloud provider for RKE2 clusters. This is in response to upstream Kubernetes removing the in-tree AWS provider in Kubernetes v1.27. You should instead use the out-of-tree AWS cloud provider for any Rancher-managed clusters running Kubernetes v1.27 or later. See #42749.
- Similar to Rancher v2.7.9, when you upgrade to Rancher v2.8.0 with provisioned RKE2/K3s clusters in an unhealthy state, you may encounter the error message, implausible joined server for entry. This requires manually marking the nodes in the cluster with a joined server. See #42856.
Known Issues
- Scaling down etcd node pools on RKE2/K3s machine-provisioned clusters may cause unexpected behavior. To avoid this, define multiple machine pools for etcd nodes, each with a quantity of one. You can then scale down simply by deleting machine pools. As a further mitigation, have a robust backup strategy and store your etcd snapshots in a safe location. Restoring from an etcd snapshot allows you to return the cluster to an operational state if you're affected by unexpected behaviors. See #42582 and #43097.
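The "one etcd node per machine pool" layout can be sketched as a machinePools fragment for a provisioned cluster's rkeConfig. The pool names and output file name are hypothetical, and other fields that a real machine pool needs, such as machineConfigRef, are left out for brevity.

```shell
# Generate a machinePools fragment with three single-node etcd pools.
# Pool names are hypothetical; other required pool fields are omitted.
{
  echo "machinePools:"
  for pool in etcd-a etcd-b etcd-c; do
    printf '  - name: %s\n    etcdRole: true\n    quantity: 1\n' "$pool"
  done
} > etcd-pools-fragment.yaml
cat etcd-pools-fragment.yaml
```

With this layout, scaling down means deleting one whole pool entry rather than lowering a quantity, which is the mitigation described above.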
Rancher App (Global UI)
Features and Enhancements
- The Rancher UI sidebar has been updated, to facilitate switching between clusters and navigating to global areas of the UI. The new sidebar menu includes a filterable search for finding clusters by name. You can customize cluster icons with colors and letters by giving them a badge through the Cluster Dashboard. See #9407.
Behavior Changes
- The built-in restricted-admin role is being deprecated in favor of a more flexible global role configuration, which is now available for use cases beyond restricted-admin. If you want to replicate the permissions given through this role, use the new inheritedClusterRoles feature to create a custom global role. A custom global role, like the restricted-admin role, grants permissions on all downstream clusters. See #42462. Although deprecated, the restricted-admin role will continue to be included in future builds of Rancher through the v2.8.x and v2.9.x release lines. However, in accordance with the CVSS standard, only security issues scored as critical will be backported and fixed in the restricted-admin role until it is completely removed from Rancher.
- Reverse DNS server functionality has been removed. The associated rancher/rdns-server repository is now archived. Reverse DNS is already disabled by default.
- The Rancher CLI configuration file ~/.rancher/cli2.json previously had permissions set to 0644. Although 0644 would usually indicate that all users have read access to the file, the parent directory would block users' access. New Rancher CLI configuration files will only be readable by the owner (0600). Invoking the CLI will trigger a warning, in case old configuration files are world-readable or group-readable. See #42838.
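If an older configuration file triggers the warning, one way to bring it in line with the new default is to restrict it to the owner, as sketched below. The helper name restrict_cli_config is hypothetical, and the sketch assumes that tightening the mode to 0600 is enough to clear the warning.

```shell
# Restrict a Rancher CLI config file to owner read/write (0600) if it exists.
restrict_cli_config() {
  if [ -f "$1" ]; then
    chmod 600 "$1"
  fi
}

# Apply to the default path named above; nothing happens if the file is absent.
restrict_cli_config "$HOME/.rancher/cli2.json"
```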
Known Issues
Legacy Custom Banner settings can cause the top left Menu to disappear. See #10140.
As a workaround:
- Go to <rancher url>/v3/settings/ui-banners.
- Press Edit, change the value to {}, press Show Request and then Send Request. Note that this removes custom banners.
- Go back to the UI and refresh the browser.
- The nav should be back, and re-applying custom banners should not result in the bug again.
Role-Based Access Control (RBAC) Framework
Features and Enhancements
- A new optional field is now available in Global Roles. The inheritedClusterRoles field grants cluster-level permissions on all downstream clusters and lets you create custom global roles. See #42213.
Pod Security Standards (PSS) & Pod Security Admissions (PSA)
Known Issues
- After an upgrade from Rancher v2.7.2 - v2.7.6, Rancher doesn't update the PSA configuration template, rancher-restricted, to include cattle-provisioning-capi-system and cattle-fleet-local-system under the exemptions.namespaces list. As a workaround, manually update rancher-restricted to add cattle-provisioning-capi-system and cattle-fleet-local-system under the exemptions.namespaces list. See #43150.
Security
Features and Enhancements
- TLS v1.3 is now supported for Rancher app ingresses. See #42027.
Behavior Changes
- TLS v1.0 and v1.1 are no longer supported for Rancher app ingresses. See #42027.
Authentication
Major Bug Fixes
- Changing your vSphere credentials, such as your account password, no longer causes node deletion to fail. See #40608.
Behavior Changes
- The kubeconfig-token-ttl-minutes setting has been replaced by the setting, kubeconfig-default-token-ttl-minutes, and is no longer available in the UI. See #38535.
- API tokens now have default time periods after which they expire. Authentication tokens expire after 90 days, while kubeconfig tokens expire after 30 days. See #41919.
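Because the TTL setting named above is expressed in minutes, it can help to convert the stated lifetimes from days. The helper name days_to_minutes is hypothetical. The sketch only does the arithmetic and does not read or change any Rancher setting.

```shell
# Convert a lifetime in days to minutes.
days_to_minutes() {
  echo $(( $1 * 24 * 60 ))
}

days_to_minutes 30   # kubeconfig tokens: prints 43200
days_to_minutes 90   # authentication tokens: prints 129600
```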
Rancher Webhook
Features and Enhancements
- You can now configure the webhook to authenticate requests. This ensures that requests come from the kube-apiserver and not from illegitimate sources. By default, the webhook can accept traffic from any source. See Hardening the Rancher Webhook for advice on using authentication or network policy resources to protect your clusters.
Behavior Changes
- Rancher's webhook now honors the bind and escalate verbs for GlobalRoles. Users who have * set on GlobalRoles will now have both of these verbs, and could potentially use them to escalate privileges in Rancher v2.8.0 and later. You should review current custom GlobalRoles, especially cases where bind, escalate, or * are granted, before you upgrade.
Continuous Delivery
Features and Enhancements
- All clusters can now be assigned to Fleet workspaces even if they aren't running RKE, via setting the experimental provisioningv2-fleet-workspace-back-population feature. See #36132.
UI Plug-in Operator
Features and Enhancements
- The UI plugin operator can be installed in air-gapped mode. See #42092.
Apps & Marketplace
Major Bug Fixes
- Helm repositories will successfully access the Internet even if the Rancher chart contains --set useBundledSystemChart=true. See #39532.
Behavior Changes
- Legacy code for the following v1 charts is no longer available in the rancher/system-charts repository:
- rancher-cis-benchmark
- rancher-gatekeeper-operator
- rancher-istio
- rancher-logging
- rancher-monitoring
The code for these charts will remain available for previous versions of Rancher.
- Helm v2 support is deprecated as of the Rancher v2.7 line and will be removed in Rancher v2.9.
OPA Gatekeeper
Behavior Changes
- OPA Gatekeeper is now deprecated and will be removed in a future release. As a replacement for OPA Gatekeeper, consider switching to Kubewarden. See #42627.
Monitoring
Major Bug Fixes
- Some components in Rancher Monitoring no longer ignore SystemDefaultRegistry, allowing for successful setup in air-gapped environments. See #44006.
Known Issues
- Read-only project permissions and the View Monitoring role aren't sufficient to view links on the Monitoring index page. Users won't be able to see monitoring links. As a workaround, you can perform the following steps:
- If you haven't already, install Monitoring on the project.
- Move the cattle-monitoring-system namespace into the project.
- Grant project users the View Monitoring (monitoring-ui-view) role, and read-only or higher permissions on at least one project in the cluster.
See #4466.
Install/Upgrade Notes
- If you're installing Rancher for the first time, your environment must fulfill the installation requirements.
Upgrade Requirements
- Creating backups: Create a backup before you upgrade Rancher. To roll back Rancher after an upgrade, you must first back up and restore Rancher to the previous Rancher version. Because Rancher will be restored to the same state as when the backup was created, any changes post-upgrade will not be included after the restore.
- CNI requirements:
- For Kubernetes v1.19 and later, disable firewalld as it's incompatible with various CNI plugins. See #28840.
- When upgrading or installing a Linux distribution which uses nf_tables as the backend packet filter, such as SLES 15, RHEL 8, Ubuntu 20.10, Debian 10, or later, upgrade to RKE v1.19.2 or later to get Flannel v0.13.0. Flannel v0.13.0 supports nf_tables. See Flannel #1317.
- Requirements for air gapped environments:
- When using a proxy in front of an air-gapped Rancher instance, you must pass additional parameters to NO_PROXY. See the documentation and issue #2725.
- When installing Rancher with Docker in an air-gapped environment, you must supply a custom registries.yaml file to the docker run command, as shown in the K3s documentation. If the registry has certificates, then you'll also need to supply those. See #28969.
- Requirements for general Docker installs:
- When starting the Rancher Docker container, you must use the privileged flag. See documentation.
- When upgrading a Docker installation, a panic may occur in the container, which causes it to restart. After restarting, the container will come up and work as expected. See #33685.
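The Docker requirements above can be sketched together. The example writes a minimal registries.yaml in the K3s private-registry format and prints, without running, a docker command that mounts it and passes the privileged flag. The registry host registry.example.com:5000 is a placeholder, and the mount path /etc/rancher/k3s/registries.yaml and the published ports are assumptions to check against the Rancher and K3s documentation.

```shell
# Write a hypothetical registries.yaml in the K3s private-registry format.
# registry.example.com:5000 is a placeholder for your private registry.
cat > registries.yaml <<'EOF'
mirrors:
  docker.io:
    endpoint:
      - "https://registry.example.com:5000"
EOF

# Print (do not run) a docker command that mounts the file and uses --privileged.
echo "docker run -d --restart=unless-stopped -p 80:80 -p 443:443" \
  "-v $(pwd)/registries.yaml:/etc/rancher/k3s/registries.yaml" \
  "--privileged registry.example.com:5000/rancher/rancher:v2.8.2"
```

If the registry uses certificates, the K3s format also allows a configs section with TLS settings, which this sketch leaves out.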
Versions
Please refer to the README for the latest and stable Rancher versions.
Please review our version documentation for more details on versioning and tagging conventions.
Images
- rancher/rancher:v2.8.2
Tools
Kubernetes Versions
- v1.27.8 (Default)
- v1.26.11
- v1.25.16
Rancher Helm Chart Versions
In Rancher v2.6.0 and later, in the Apps & Marketplace UI, many Rancher Helm charts are named with a major version that starts with 100. This avoids simultaneous upstream changes and Rancher changes from causing conflicting version increments. This also complies with semantic versioning (SemVer), which is a requirement for Helm. You can see the upstream version number of a chart in the build metadata, for example: 100.0.0+up2.1.0. See #32294.
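As a small illustration of this naming scheme, the sketch below splits the example version string into its Rancher part and its upstream part using shell parameter expansion. The variable names are chosen for this example only.

```shell
# Split a Rancher chart version into its Rancher part and upstream part.
chart_version="100.0.0+up2.1.0"
rancher_part="${chart_version%%+*}"    # text before the "+"
upstream_part="${chart_version#*+up}"  # text after "+up"
echo "rancher=$rancher_part upstream=$upstream_part"
# prints: rancher=100.0.0 upstream=2.1.0
```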
Other Notes
Experimental Features
Dual-stack and IPv6-only support for RKE1 clusters using the Flannel CNI has been experimental since v1.23.x. See the upstream Kubernetes docs. Dual-stack is not currently supported on Windows. See #165.
Deprecated Upstream Projects
In June 2023, Microsoft deprecated the Azure AD Graph API that Rancher had been using for authentication via Azure AD. When updating Rancher, update the configuration to make sure that users can still use Rancher with Azure AD. See the documentation and issue #29306 for details.
Removed Legacy Features
Apps functionality in the cluster manager has been deprecated as of the Rancher v2.7 line. This functionality has been replaced by the Apps & Marketplace section of the Rancher UI.
Also, rancher-external-dns and rancher-global-dns have been deprecated as of the Rancher v2.7 line.
The following legacy features have been removed as of Rancher v2.7.0. The deprecation and removal of these features was announced in previous releases. See #6864.
UI and Backend
- CIS Scans v1 (Cluster)
- Pipelines (Project)
- Istio v1 (Project)
- Logging v1 (Project)
- RancherD
UI
- Multiclusterapps (Global): Apps within the Multicluster Apps section of the Rancher UI.
Previous Rancher Behavior Changes
Previous Rancher Behavior Changes - Cluster Provisioning
- Rancher v2.7.2:
- When you provision a downstream cluster, the cluster's name must conform to RFC-1123. Previously, characters that did not follow the specification, such as a period (.), were permitted and would result in clusters being provisioned without the necessary Fleet components. See #39248.
- Privilege escalation is disabled by default when creating deployments from the Rancher API. See #7165.
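A cluster name can be checked locally before provisioning. The sketch below tests a name against the RFC 1123 label rules: at most 63 characters, lowercase alphanumerics or hyphens, starting and ending with an alphanumeric character. The helper name is_rfc1123_label is hypothetical, and the exact validation that Rancher applies is an assumption here and may differ.

```shell
# Return success if the name is a valid RFC 1123 label:
# 1-63 characters, lowercase alphanumerics or "-", starting and ending
# with an alphanumeric character.
is_rfc1123_label() {
  [ "${#1}" -le 63 ] &&
    printf '%s\n' "$1" | grep -Eq '^[a-z0-9]([-a-z0-9]*[a-z0-9])?$'
}

is_rfc1123_label "prod-cluster-1" && echo "prod-cluster-1: ok"
is_rfc1123_label "prod.cluster" || echo "prod.cluster: rejected"
```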
Previous Rancher Behavior Changes - Cluster API
- Rancher v2.7.7:
- The cluster-api core provider controllers run in a pod in the cattle-provisioning-capi-system namespace, within the local cluster. These controllers are installed with a Helm chart. Previously, Rancher ran cluster-api controllers in an embedded fashion. This change makes it easier to maintain cluster-api versioning. See #41094.
- The token hashing algorithm generates new tokens using SHA3. Existing tokens that don't use SHA3 won't be re-hashed. This change affects ClusterAuthTokens (the downstream synced version of tokens for ACE) and Tokens (only when token hashing is enabled). SHA3 tokens should work with ACE and Token Hashing. Tokens that don't use SHA3 may not work when ACE and token hashing are used in combination. If, after upgrading to Rancher v2.7.7, you experience issues with ACE while token hashing is enabled, re-generate any applicable tokens. See #42062.
Previous Rancher Behavior Changes - Rancher App (Helm Chart)
- Rancher v2.7.0:
- When installing or upgrading an official Rancher Helm chart app in a RKE2/K3s cluster, if a private registry exists in the cluster configuration, that registry will be used for pulling images. If no cluster-scoped registry is found, the global container registry will be used. A custom default registry can be specified during the Helm chart install and upgrade workflows. Previously, only the global container registry was used when installing or upgrading an official Rancher Helm chart app for RKE2/K3s node driver clusters.
Previous Rancher Behavior Changes - Pod Security Standard (PSS) & Pod Security Admission (PSA)
- Rancher v2.7.2:
- You must manually change the psp.enabled value in the chart install yaml when you install or upgrade v102.x.y charts on hardened RKE2 clusters. Instructions for updating the value are available. See #41018.
Previous Rancher Behavior Changes - Authentication
- Rancher v2.7.2:
- Rancher might retain resources from a disabled auth provider configuration in the local cluster, even after you configure another auth provider. To manually trigger cleanup for a disabled auth provider, add the management.cattle.io/auth-provider-cleanup annotation with the unlocked value to its auth config. See #40378.
Previous Rancher Behavior Changes - Rancher Webhook
- Rancher v2.7.5:
- Rancher installs the same pinned version of the rancher-webhook chart not only in the local cluster but also in all downstream clusters. Restoring Rancher from v2.7.5 to an earlier version will result in downstream clusters' webhooks being at the version set by Rancher v2.7.5, which might cause incompatibility issues. Local and downstream webhook versions need to be in sync. See #41730 and #41917.
- The mutating webhook configuration for secrets is no longer active in downstream clusters. See #41613.
Previous Rancher Behavior Changes - Apps & Marketplace
- Rancher v2.7.0:
- Rancher no longer validates an app registration's permissions to use Microsoft Graph on endpoint updates or initial setup. You should add Directory.Read.All permissions of type Application. If you configure a different set of permissions, Rancher may not have sufficient privileges to perform some necessary actions within Azure AD, causing errors.
Previous Rancher Behavior Changes - Feature Charts
- Rancher v2.7.0:
- A configurable priorityClass is available in the Rancher pod and its feature charts. Previously, pods critical to running Rancher didn't use a priority class. This could cause a cluster with limited resources to evict Rancher pods before other noncritical pods. See #37927.
Previous Rancher Behavior Changes - Backup/Restore
- Rancher v2.7.7:
- If you use a version of backup-restore older than v102.0.2+up3.1.2 to take a backup of Rancher v2.7.7, the migration will encounter a capi-webhook error. Make sure that the chart version used for backups is v102.0.2+up3.1.2, which has cluster.x-k8s.io/v1alpha4 resources removed from the resourceSet. If you can't use v102.0.2+up3.1.2 for backups, delete all cluster.x-k8s.io/v1alpha4 resources from the backup tar before using it. See #382.
Previous Rancher Behavior Changes - Logging
- Rancher v2.7.0:
- Rancher defaults to using the bci-micro image for sidecar audit logging. Previously, the default image was Busybox. See #35587.
Previous Rancher Behavior Changes - Monitoring
- Rancher v2.7.2:
- Rancher maintains a /v1/counts endpoint that the UI uses to display resource counts. The UI subscribes to changes to the counts for all resources through a websocket to receive the new counts for resources.
- Rancher aggregates the changed counts and only sends a message every 5 seconds. This, in turn, requires the UI to update the counts at most once every 5 seconds, improving UI performance. Previously, Rancher would send a message each time the resource counts changed for a resource type. This led to the UI needing to constantly stop other areas of processing to update the resource counts. See #36682.
- Rancher now only sends back a count for a resource type if the count has changed from the previously known number, improving UI performance. Previously, each message from this socket would include all counts for every resource type in the cluster, even if the counts only changed for one specific resource type. This would cause the UI to need to re-update resource counts for every resource type at a high frequency, with a significant performance impact. See #36681.
Previous Rancher Behavior Changes - Project Monitoring
- Rancher v2.7.2:
- The Helm Controller in RKE2/K3s respects the managedBy annotation. In its initial release, Project Monitoring V2 required a workaround to set helmProjectOperator.helmController.enabled: false, since the Helm Controller operated on a cluster-wide level and ignored the managedBy annotation. See #39724.
Long-standing Known Issues
Long-standing Known Issues - Cluster Provisioning
- Not all cluster tools can be installed on a hardened cluster.
- Rancher v2.7.2:
- You need to force-refresh the Rancher UI after initial Rancher setup, to trigger the prompt to accept the self-signed certificate. As a workaround, visit the Rancher portal, accept the self-signed certificate, and go through the setup process. Once done, go to the address bar of your browser and click the lock icon. Select the option to allow you to receive certificate errors for the Rancher website. You'll then be prompted again to accept the new certificate. See #7867.
- When you upgrade your Kubernetes cluster, you might see the following error: Cluster health check failed. This is a benign error that occurs as part of the upgrade process, and will self-resolve. It's caused by the Kubernetes API server becoming temporarily unavailable as it is being upgraded within your cluster. See #41012.
- Once you configure a setting with an environment variable, it can't be updated through the Rancher API or the UI. It can only be updated by changing the value of the environment variable. Setting the environment variable to "" (the empty string) changes the value in the Rancher API but not in Kubernetes. As a workaround, run kubectl edit setting <setting-name>, then set the value and source fields to "", and re-deploy Rancher. See #37998.
- Rancher v2.6.1:
- When using the Rancher UI to add a new port of type ClusterIP to an existing Deployment created using the legacy UI, the new port won't be created upon your first attempt to save the new port. You must repeat the procedure to add the port again. The Service Type field will display Do not create a service during the second procedure. Change this to ClusterIP and save to create the new port. See #4280.
Long-standing Known Issues - RKE2 Provisioning
- Rancher v2.7.6:
- Rancher v2.6.3:
- When provisioning clusters with an RKE2 cluster template, the rootSize for AWS EC2 provisioners doesn't take an integer when it should, and an error is thrown. As a workaround, wrap the EC2 rootSize in quotes. See #40128.
Long-standing Known Issues - K3s Provisioning
- Rancher v2.7.6:
Long-standing Known Issues - Hosted Rancher
- Rancher v2.7.5:
- The Cluster page shows the Registration tab when updating or upgrading a hosted cluster. See #8524.
Long-standing Known Issues - Cloud Credentials
- Rancher v2.7.2:
- When enabling some custom node drivers, the Cloud Credential creation page does not show the correct default fields and has an uneditable foo key. See #8563.
Long-standing Known Issues - Docker Install
- Rancher v2.5.0:
- Rancher v2.6.4:
- In certain cases, no users are listed in Users and Authentication, and selecting Create does not display the entire new user creation form. As a workaround, perform a hard refresh to log back in. See #37531.
- Single node Rancher won't start on Apple M1 devices with Docker Desktop 4.3.0 or later. See #35930.
- Rancher v2.6.3:
- On a Docker install upgrade and rollback, Rancher logs repeatedly display the messages "Updating workload ingress-nginx/nginx-ingress-controller" and "Updating service frontend with public endpoints". Ingresses and clusters are functional and active, and logs resolve eventually. See #35798.
Long-standing Known Issues - Windows
- Rancher v2.7.6:
- Downstream Windows clusters get stuck after a Rancher upgrade. Windows nodes become stuck in an unavailable state, with a failed to list *v1.ConfigMap: configmaps "kube-root-ca.crt" is forbidden error message. As a workaround, reboot the node. See #42426.
- Rancher v2.6.7:
- CSI Proxy for Windows will now work in an air-gapped environment.
- Rancher v2.5.8:
- Windows nodeAgents are not deleted when performing a helm upgrade after disabling Windows logging on a Windows cluster. See #32325.
- If you deploy Monitoring V2 on a Windows cluster with win_prefix_path set, you must deploy Rancher Wins Upgrader to restart wins on the hosts. This will allow Rancher to start collecting metrics in Prometheus. See #32535.
Long-standing Known Issues - RKE
- Rancher v2.7.7:
- The SAML authentication pop-up throws a 404 error on high-availability RKE installations. Single node Docker installations aren't affected. If you refresh the browser window and select Resend, the authentication request will succeed, and you will be able to log in. See #31163.
- Rancher v2.7.4:
- RKE clusters may be unable to restore from an etcd snapshot. See #41547. There is a workaround available to re-start the kubelet container on affected worker nodes. This approach recreates the kubelet container on all nodes, not just the worker nodes. Rancher handles the process to ensure zero downtime, just like for any other modification of the cluster. If you prefer to use the extra_args blob, be sure both the arg and its value are valid. See the Kubernetes documentation for available options:
- On the Cluster Management page, select Edit config on the target cluster.
- In the "Cluster Options" section, select Edit as YAML.
- Modify any value under the extra_args or extra_env section under rancher_kubernetes_engine_config.services.kubelet.
- Save the change to trigger a cluster upgrade. It may take a few moments for the upgrade to finish.
- When this problem is triggered, containers from previous pods will become orphaned on downstream nodes. To remove the orphaned containers, run docker restart kubelet on every affected worker node.
- Rancher v2.7.2:
- When running CIS scans on RKE and RKE2 clusters on Kubernetes v1.25, some tests will fail if the rke-profile-hardened-1.23 or the rke2-profile-hardened-1.23 profile is used. These failures are expected because the tests rely on PSPs, which have been removed in Kubernetes v1.25. See #39851.
Long-standing Known Issues - RKE2
- Rancher v2.7.7:
- Due to the backoff logic in various components, downstream provisioned K3s and RKE2 clusters may take longer to re-achieve Active status after a migration. If you see that a downstream cluster is still updating or in an error state immediately after a migration, please let it attempt to resolve itself. This might take up to an hour to complete. See #34518 and #42834.
- Rancher v2.7.6:
- Scaling down etcd nodes on K3s/RKE2 machine-provisioned clusters may inadvertently delete all etcd nodes in the pool. This is linked to an upstream cluster-api bug that causes the controllers to delete more than the desired quantity of etcd nodes when reconciling an RKE Machine Pool. This issue affects etcd node scale-down operations on K3s/RKE2 machine-provisioned clusters. To help mitigate the issue, have a robust backup strategy and store your etcd snapshots in a safe location. See #42582.
- Rancher v2.7.4:
- RKE2 clusters with invalid values for tolerations or affinity agent customizations don't display an error message, and remain in an Updating state. This causes cluster creation to hang. See #41606.
- Rancher v2.7.2:
- When running CIS scans on RKE and RKE2 clusters on Kubernetes v1.25, some tests will fail if the rke-profile-hardened-1.23 or the rke2-profile-hardened-1.23 profile is used. These failures are expected because the tests rely on PSPs, which have been removed in Kubernetes v1.25. See #39851.
- When viewing or editing the YAML configuration of downstream RKE2 clusters through the UI, spec.rkeConfig.machineGlobalConfig.profile is set to null, which is an invalid configuration. See #8480.
- Deleting nodes from custom RKE2/K3s clusters in Rancher v2.7.2 can cause unexpected behavior, if the underlying infrastructure isn't thoroughly cleaned. When deleting a custom node from your cluster, ensure that you delete the underlying infrastructure for it, or run the corresponding uninstall script for the Kubernetes distribution installed on the node. See #41034.
-
Rancher v2.6.x:
- Deleting a control plane node results in worker nodes also reconciling. See #39021.
-
Rancher v2.6.4:
- Communication between the ingress controller and the pods doesn't work when you create an RKE2 cluster with Cilium as the CNI and activate project network isolation. See documentation and #34275.
-
Rancher v2.6.0:
- Amazon ECR Private Registries don't work from RKE2/K3s. See #33920.
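For the custom-node cleanup issue listed under Rancher v2.7.2 above, the following is a minimal sketch of how a node could locate the uninstall script for whichever distribution is installed. It assumes the default locations used by the RKE2 and K3s installers (`rke2-uninstall.sh`, `k3s-uninstall.sh`, and `k3s-agent-uninstall.sh`). The function name `find_uninstall_script` and its optional root-prefix argument are hypothetical and exist only for illustration and testing.

```shell
#!/bin/sh
# Print the path of the first RKE2/K3s uninstall script found on this node.
# The optional first argument is a hypothetical root prefix (for dry runs and
# tests); when omitted, the real filesystem paths are checked.
find_uninstall_script() {
    root="${1:-}"
    for script in \
        /usr/local/bin/rke2-uninstall.sh \
        /usr/bin/rke2-uninstall.sh \
        /usr/local/bin/k3s-uninstall.sh \
        /usr/local/bin/k3s-agent-uninstall.sh
    do
        # Only report scripts that exist and are executable.
        if [ -x "${root}${script}" ]; then
            printf '%s\n' "${root}${script}"
            return 0
        fi
    done
    # No known uninstall script was found.
    return 1
}
```

On a node that has already been removed from the cluster, something like `script=$(find_uninstall_script) && sudo "$script"` would run it. Check the printed path before executing, since the script removes the distribution from the node.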
Long-standing Known Issues - Windows Nodes in RKE2 Clusters
-
Rancher v2.6.5:
- When upgrading Windows nodes in RKE2 clusters via the Rancher UI, Windows worker nodes must reboot after the upgrade completes. See #3376457645.
-
Rancher v2.6.4:
- NodePorts do not work on Windows Server 2022 in RKE2 clusters due to a Windows kernel bug. See #159.
Long-standing Known Issues - K3s
-
Rancher v2.7.7:
- Due to the backoff logic in various components, downstream provisioned K3s and RKE2 clusters may take longer to re-achieve `Active` status after a migration. If you see that a downstream cluster is still updating or in an error state immediately after a migration, please let it attempt to resolve itself. This might take up to an hour to complete. See #34518 and #42834.
-
Rancher v2.7.6:
- Scaling down etcd nodes on K3s/RKE2 machine-provisioned clusters may inadvertently delete all etcd nodes in the pool. This is linked to an upstream cluster-api bug that causes the controllers to delete more than the desired quantity of etcd nodes when reconciling an RKE Machine Pool. This issue affects etcd node scale-down operations on K3s/RKE2 machine-provisioned clusters. To help mitigate the issue, have a robust backup strategy and store your etcd snapshots in a safe location. See #42582.
-
Rancher v2.7.2:
- Clusters remain in an `Updating` state even when they contain nodes in an `Error` state. See #39164.
- Deleting nodes from custom RKE2/K3s clusters in Rancher v2.7.2 can cause unexpected behavior if the underlying infrastructure isn't thoroughly cleaned. When deleting a custom node from your cluster, ensure that you delete the underlying infrastructure for it, or run the corresponding uninstall script for the Kubernetes distribution installed on the node. See #41034.
-
Rancher v2.6.0:
Long-standing Known Issues - AKS
-
Rancher v2.7.2:
- Imported Azure Kubernetes Service (AKS) clusters don't display workload-level metrics. This bug affects Monitoring V1. A workaround is available. See #4658.
-
Rancher v2.6.x:
- Windows node pools are not currently supported. See #32586.
-
Rancher v2.6.0:
- When editing or upgrading an Azure Kubernetes Service (AKS) cluster, do not make changes from the Azure console or CLI at the same time. These actions must be done separately. See #33561.
Long-standing Known Issues - EKS
- Rancher v2.7.0:
- EKS clusters on Kubernetes v1.21 or below on Rancher v2.7 cannot be upgraded. See #39392.
Long-standing Known Issues - GKE
- Rancher v2.5.8:
- Basic authentication must be explicitly disabled in GCP before upgrading a GKE cluster to Kubernetes v1.19+ in Rancher. See #32312.
Long-standing Known Issues - Pod Security Standard (PSS) & Pod Security Admission (PSA)
- Rancher v2.6.4:
- The deployment's `securityContext` section is missing when a new workload is created. This prevents pods from starting when Pod Security Policy (PSP) support is enabled. See #4815.
Long-standing Known Issues - Authentication
- Rancher v2.6.2:
- Users on certain LDAP setups don't have permission to search LDAP. When they attempt to perform a search, they receive the error message `Result Code 32 "No Such Object"`. See #35259.
Long-standing Known Issues - Encryption
- Rancher v2.5.4:
- Rotating encryption keys with a custom encryption provider is not supported. See #30539.
Long-standing Known Issues - Rancher Webhook
- Rancher v2.7.2:
- A webhook is installed in all downstream clusters. There are several issues that users may encounter with this functionality:
- If you roll back from Rancher v2.7.2 or later to a Rancher version earlier than v2.7.2, the webhooks will remain in downstream clusters. Since the webhook is designed to be 1:1 compatible with specific versions of Rancher, this can cause unexpected behaviors to occur downstream. The Rancher team has developed a script that should be used after the rollback is complete (meaning after a Rancher version earlier than v2.7.2 is running). This removes the webhook from affected downstream clusters. See #40816.
Long-standing Known Issues - Harvester
-
Upgrades from Harvester v0.3.0 are not supported.
-
Rancher v2.7.2:
- If you're using Rancher v2.7.2 with Harvester v1.1.1 clusters, you won't be able to select the Harvester cloud provider when deploying or updating guest clusters. The Harvester release notes contain instructions on how to resolve this. See #3750.
-
Rancher v2.6.1:
- Deploying Fleet to Harvester clusters is not yet supported. Clusters, whether Harvester or non-Harvester, imported using the Virtualization Management page will result in the cluster not being listed on the Continuous Delivery page. See #35049.
Long-standing Known Issues - Continuous Delivery
-
Rancher v2.7.6:
- Target customization can produce custom resources that exceed the Rancher API's maximum bundle size. This results in `Request entity too large` errors when attempting to add a GitHub repo. Only target customizations that modify the Helm chart URL or version are affected. As a workaround, use multiple paths or GitHub repos instead of target customization. See #1650.
- When updating Rancher, sometimes Fleet is not upgraded to the latest version. See #1590. To ensure you upgrade Fleet:
- Refresh the `rancher-charts` catalog resource.
- Go to Apps, select All Workspaces, and note the versions of `fleet-crd` and `fleet-chart`.
- Click ⋮ and select Upgrade to check if there is a newer Fleet version listed.
- Update the `fleet` chart. Fleet will automatically update the agents.
-
Rancher v2.6.1:
- Deploying Fleet to Harvester clusters is not yet supported. Clusters, whether Harvester or non-Harvester, imported using the Virtualization Management page will result in the cluster not being listed on the Continuous Delivery page. See #35049.
-
Rancher v2.6.0:
- Multiple `fleet-agent` pods may be created and deleted during initial downstream agent deployment, rather than just one. This resolves itself quickly, but is unintentional behavior. See #33293.
Long-standing Known Issues - Apps & Marketplace
- Rancher v2.7.0:
- The multi-cluster app legacy feature is no longer available. See #39525.
Long-standing Known Issues - Feature Charts
- Rancher v2.6.5:
- After installing an app from a partner chart repo, the partner chart will upgrade to feature charts if the chart also exists in the feature charts default repo. See #5655.
Long-standing Known Issues - CIS Scan
- Rancher v2.7.2:
- When running CIS scans on RKE and RKE2 clusters on Kubernetes v1.25, some tests will fail if the `rke-profile-hardened-1.23` or the `rke2-profile-hardened-1.23` profile is used. These failures are expected because the affected test cases rely on PSPs, which have been removed in Kubernetes v1.25. See #39851.
Long-standing Known Issues - Backup/Restore
-
When migrating to a cluster with the Rancher Backup feature, the server-url cannot be changed to a different location. It must continue to use the same URL.
-
Rancher v2.7.7:
- Due to the backoff logic in various components, downstream provisioned K3s and RKE2 clusters may take longer to re-achieve `Active` status after a migration. If you see that a downstream cluster is still updating or in an error state immediately after a migration, please let it attempt to resolve itself. This might take up to an hour to complete. See #34518 and #42834.
-
Rancher v2.6.3:
- Because Kubernetes v1.22 drops the apiVersion `apiextensions.k8s.io/v1beta1`, trying to restore an existing backup file into a v1.22+ cluster will fail. The backup file contains CRDs with the apiVersion `v1beta1`. There are two workarounds for this issue: update the default `resourceSet` to collect the CRDs with the apiVersion v1, or update the default `resourceSet` and the client to use the new APIs internally. See the documentation and #34154.
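As a pre-restore check for the Kubernetes v1.22 issue above, the sketch below lists files in an extracted backup that still reference the removed apiVersion. It assumes the backup archive has already been unpacked into a directory of plain-text manifests. The function name `find_v1beta1_crds` and the directory layout are illustrative assumptions, not part of the Rancher Backup tooling.

```shell
#!/bin/sh
# List files under an extracted backup directory that still reference the
# apiVersion removed in Kubernetes v1.22.
# Assumption: the backup has been unpacked into plain-text manifest files.
find_v1beta1_crds() {
    backup_dir="$1"
    # -r: recurse, -l: print file names only, -F: match the literal string.
    grep -rlF 'apiextensions.k8s.io/v1beta1' "$backup_dir" 2>/dev/null | sort
}
```

An empty result suggests that no manifest in the directory references `apiextensions.k8s.io/v1beta1`. Any file it lists would need one of the workarounds described above before the restore.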
Long-standing Known Issues - Istio
-
Istio v1.12 and below do not work on Kubernetes v1.23 clusters. To use the Istio charts, please do not update to Kubernetes v1.23 until the next charts' release.
-
Rancher v2.6.4:
- Applications that inject Istio sidecars fail on SELinux-enabled RHEL 8.4 clusters. A temporary workaround for this issue is to run the following command on each cluster node before creating a cluster: `mkdir -p /var/run/istio-cni && semanage fcontext -a -t container_file_t /var/run/istio-cni && restorecon -v /var/run/istio-cni`. See #33291.
-
Rancher v2.6.1:
- Deprecated resources are not automatically removed and will cause errors during upgrades. You must manually migrate or clean up these resources before performing an upgrade. See #34699.
Long-standing Known Issues - Logging
- Rancher v2.5.8:
- Windows nodeAgents are not deleted when performing a helm upgrade after disabling Windows logging on a Windows cluster. See #32325.
Long-standing Known Issues - Monitoring
- Rancher v2.7.2:
- Imported Azure Kubernetes Service (AKS) clusters don't display workload-level metrics. This bug affects Monitoring V1. A workaround is available. See #4658.
Long-standing Known Issues - Project Monitoring
- Rancher v2.5.8:
- If you deploy Monitoring V2 on a Windows cluster with `win_prefix_path` set, you must deploy Rancher Wins Upgrader to restart wins on the hosts. This will allow Rancher to start collecting metrics in Prometheus. See #32535.