Release Highlights
This release of Percona Operator for MongoDB includes the following new features and improvements:
Percona Server for MongoDB 8.0 is now the default version
To let you enjoy all the features and improvements that come with the latest major version out of the box, the Operator now deploys clusters with Percona Server for MongoDB 8.0 by default. You can still choose a different version for installation and update. Check the list of Percona certified images for the database versions available for this release. For previous Operator versions, learn how to query the Version Service and retrieve the available images from it.
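If you prefer to stay on a specific version, here is a minimal sketch of pinning the database image in the Custom Resource (the tag below is one of the certified images for this release; substitute the one you need):

spec:
  # pin the database to a certified image instead of the 8.0 default
  image: percona/percona-server-mongodb:7.0.24-13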
PMM3 support
The Operator is natively integrated with PMM 3, enabling you to monitor the health and performance of your Percona Distribution for MongoDB deployment and at the same time enjoy enhanced performance, new features, and improved security that PMM 3 provides.
Note that the Operator supports both PMM2 and PMM3. Which PMM version is used depends on the authentication method you provide in the Operator configuration: PMM2 uses API keys, while PMM3 uses service account tokens. If the Operator configuration contains both authentication methods with non-empty values, PMM3 takes priority.
To use PMM, ensure that the PMM client image is compatible with the PMM Server version. Check Percona certified images for the correct client image.
For instructions on configuring monitoring with PMM, see the documentation.
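As an illustration, here is a minimal sketch of the users Secret carrying both authentication methods. The key names PMM_SERVER_API_KEY and PMM_SERVER_TOKEN are assumptions based on the Operator's Secrets layout, so verify them against the documentation for your version:

apiVersion: v1
kind: Secret
metadata:
  name: my-cluster-name-secrets
type: Opaque
stringData:
  # PMM2 authentication (API key)
  PMM_SERVER_API_KEY: ""
  # PMM3 authentication (service account token);
  # if both values are non-empty, PMM3 takes priority
  PMM_SERVER_TOKEN: "<service-account-token>"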
Hidden nodes support
In addition to arbiters and non-voting nodes, you can now deploy hidden nodes in your Percona Server for MongoDB cluster. These nodes hold a full copy of the data but remain invisible to client applications, which makes them well suited for tasks like backups and reporting: they can access the data without affecting normal traffic.
Hidden nodes are added as voting members and can participate in primary elections. Therefore, the Operator enforces rules to ensure the number of voting members is odd and doesn't exceed seven, which is the maximum allowed number of voting members:
- If the total number of voting members is even, the Operator converts one node to non-voting to maintain an odd number of voters. The node to convert is typically the last Pod in the list.
- If the number of voting members is odd and not more than 7, all nodes participate in elections.
- If the number of voting members exceeds 7, the Operator automatically converts some nodes to non-voting to stay within MongoDB’s limit.
To inspect the current configuration, connect to the cluster with clusterAdmin privileges and run the rs.config().members command.
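For illustration, here is a minimal sketch of enabling hidden nodes in the Custom Resource. The hidden block and its fields are assumptions modeled on the existing arbiter and nonvoting blocks, so check the CR reference for the exact names:

spec:
  replsets:
    - name: rs0
      size: 3          # regular data-bearing members
      hidden:
        enabled: true
        size: 2        # hidden members: 3 + 2 = 5 voters, an odd number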
Support for Google Cloud Client library in PBM
The Operator comes with the latest PBM version 2.11.0, which includes support for the Google Cloud client library and authentication with service account keys.
To use Google Cloud Storage for backups with service account keys, you need to do the following:
- Create a service account key
- Create a Secrets object with this key
- Configure the storage in the Custom Resource
See the Configure Google Cloud Storage documentation for detailed steps.
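For illustration, a minimal sketch of such a storage definition follows. The type: gcs value and the gcs block field names are assumptions about the new native connection type, so confirm them in the linked documentation:

spec:
  backup:
    storages:
      gcs-backups:
        type: gcs
        gcs:
          bucket: my-backup-bucket
          prefix: psmdb
          # Secret holding the service account JSON key
          credentialsSecret: my-gcs-sa-key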
The configuration of Google Cloud Storage with HMAC keys remains unchanged. However, PBM has a known issue with HMAC keys and GCS, reported in PBM-1605: uploading large files (~512 MB and above) can fail when the network is unstable. Such backups may be corrupted or incomplete, yet they are incorrectly treated as valid and pose a risk of restore failures. Therefore, we recommend migrating to the native GCS connection type with service account (JSON) keys after the upgrade.
Improved operational resilience and observability with persistent cluster-level logging for MongoDB Pods
Debugging distributed systems just got easier. The Percona Operator for MongoDB now supports cluster-level logging, ensuring that logs from your mongod instances are stored persistently, even across Pod restarts.
Cluster-level logging is done with Fluent Bit, which runs as a sidecar container within each database Pod.
Currently, logs are collected only for the mongod instances. All other logs are ephemeral, meaning they will not persist after a Pod restart. Logs are stored for 7 days and rotated afterwards.
Learn more about cluster-level logging in the documentation.
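For illustration, a minimal sketch of turning the log collector on in the Custom Resource. The logcollector block mirrors the one used by other Percona Operators and is an assumption here, so verify the exact field names in the CR reference:

spec:
  logcollector:
    enabled: true
    # resources for the Fluent Bit sidecar can be tuned as usual
    resources:
      requests:
        cpu: 100m
        memory: 100M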
Improved backup retention for streamlined management of scheduled backups in cloud storage
A new backup retention configuration gives you more control over how backups are managed in storage and retained in Kubernetes.
With the deleteFromStorage flag, you can disable automatic deletion from AWS S3 or Azure Blob storage and instead rely on native cloud lifecycle policies. This makes backup cleanup more efficient and better aligned with flexible storage strategies.
The legacy keep option is now deprecated and mapped to the new retention block for compatibility. We encourage you to start using the backup.tasks.retention configuration:
spec:
  backup:
    tasks:
      - name: daily-s3-us-west
        enabled: true
        schedule: "0 0 * * *"
        retention:
          count: 3
          type: count
          deleteFromStorage: true
        storageName: s3-us-west
        compressionType: gzip
        compressionLevel: 6
Improved operational efficiency with support for concurrent cluster reconciliation
Reconciliation is a Kubernetes mechanism to keep your cluster in sync with its desired state. Previously, the Operator ran only one reconciliation loop at a time. This sequential processing meant that other clusters managed by the same Operator had to wait for the current reconciliation to complete before receiving updates.
With this release, the Operator supports concurrent reconciliation and can process several clusters simultaneously. You can define the maximum number of concurrent reconciles with an environment variable on the Operator Deployment.
This enhancement significantly improves scalability and responsiveness, especially in multi-cluster environments.
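For example, here is a sketch of setting this variable on the Operator Deployment; the variable name MAX_CONCURRENT_RECONCILES is an assumption, so verify it against the Operator documentation:

spec:
  template:
    spec:
      containers:
        - name: percona-server-mongodb-operator
          env:
            # maximum number of clusters reconciled in parallel
            - name: MAX_CONCURRENT_RECONCILES
              value: "5"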
Added labels to identify the version of the Operator
The Custom Resource Definition (CRD) is compatible with the last three Operator versions. To let you know which Operator version is attached to it, we've added labels to all Custom Resource Definitions. The labels help you identify the current Operator version and decide whether you need to update the CRD.
To view the labels, run:
$ kubectl get crd perconaservermongodbs.psmdb.percona.com --show-labels
View backup size
You can now see the size of each backup when viewing the backup list, whether via the command line or from Everest and other apps integrated with the Operator. This improvement makes it easier to monitor storage usage and manage your backups efficiently.
Delegate PVC resizing to an external autoscaler
You can now configure the Operator to use an external storage autoscaler instead of its own resizing logic. This may be useful for organizations that need centralized, advanced, or cross-application scaling policies.
To use an external autoscaler, set the spec.enableExternalVolumeAutoscaling option to true in the Custom Resource manifest.
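For example:

spec:
  # skip the Operator's own PVC resizing logic and let an
  # external storage autoscaler manage volume growth
  enableExternalVolumeAutoscaling: true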
Deprecation, rename and removal
- The backup.schedule.keep field is deprecated and will be removed in future releases. We recommend using the backup.schedule.retention field instead, as follows:

  schedule:
    - name: "sat-night-backup"
      schedule: "0 0 * * 6"
      retention:
        count: 3
        type: count
        deleteFromStorage: true
      storageName: s3-us-west
- The S3-compatible implementation of Google Cloud Storage (GCS) using HMAC keys is deprecated in the Operator. We encourage you to switch to the native GCS connection type with service account (JSON) keys after the upgrade.
Changelog
New features
- K8SPSMDB-297 - Added cluster-level logging with the Fluent Bit log collector
- K8SPSMDB-1268 - Added support for PMM v3.
- K8SPSMDB-723 - Added the ability to add hidden members to MongoDB replica sets for specialized purposes.
Improvements
- K8SPSMDB-1072 - Added the ability to configure retention policy for scheduled backups
- K8SPSMDB-1216 - Updated the command to describe the mongod instance role to db.hello(), which is the one currently in use.
- K8SPSMDB-1243 - Added the ability to pass PBM restore configuration options to the Operator.
- K8SPSMDB-1261 - Improved the test suite for physical backups to run on every supported platform individually.
- K8SPSMDB-1262 - Improved the test suite for on-demand backups to run on OpenShift
- K8SPSMDB-1272 - The helm upgrade command now displays warnings to clarify when CRDs are not updated.
- K8SPSMDB-1284 - Clearer error messages are now displayed if a filesystem backup deletion fails.
- K8SPSMDB-1285 - CRDs now include labels that make it easy to identify their associated Operator version.
- K8SPSMDB-1304 - Added labels recommended by Kubernetes to the Operator deployment object
- K8SPSMDB-1318 - Added the ability to configure concurrent reconciles to speed up cluster reconciliation in setups where the Operator manages several database clusters.
- K8SPSMDB-1319 - Scheduled database backups now wait for the database to be healthy before starting, preventing unnecessary failures.
- K8SPSMDB-1339 - Added validation for the selected restore time, preventing the point-in-time restore process from starting with an invalid date or time.
- K8SPSMDB-1344, K8SPSMDB-871 - Added the ability to retrieve and store the backup size
- K8SPSMDB-1398 - Added the ability to configure the use of an external autoscaler (Thank you, Terry, for the contribution)
- K8SPSMDB-1412 - Added support for Google Cloud Storage with authentication via service account keys.
Fixed bugs
- K8SPSMDB-1154 - MongoDB clusters using the inMemory storage engine now deploy correctly (Thank you, user KOS, for reporting this issue).
- K8SPSMDB-1292 - Fixed an issue where physical restores failed when a TLS configuration was defined; the Operator now uses it to construct the correct MongoDB connection string URL.
- K8SPSMDB-1297 - Exposed the data directory for the pmm-client sidecar container to enable it to gather required metrics.
- K8SPSMDB-1308 - Improved PBM restore logging to store logs for the latest restore in the /data/db/pbm-restore-logs directory.
- K8SPSMDB-1336 - Logical backups can now be restored to a new cluster without encountering Time monotonicity violation errors or service restarts.
- K8SPSMDB-1371 - Physical point-in-time recovery using the latest type no longer crashes but gracefully fails the restore process when oplog data is unavailable.
- K8SPSMDB-1400 - Resolved an issue that caused physical restores to fail on AKS and EKS environments.
- K8SPSMDB-1425 - Restoring a MongoDB cluster with point-in-time recovery now succeeds even when source and target storage prefixes differ.
- K8SPSMDB-1480 - Fixed an issue that caused cluster errors when scaling replica sets resulted in an invalid number of voting members.
Documentation improvements
- The multi-cluster and multi-region deployment section has been improved and expanded with information about multi-cluster deployment, its value, and how it works. It provides improved guidance on multi-cluster services, a step-by-step tutorial for enabling multi-cluster deployments on GKE, and revised instructions for deploying and interconnecting sites for replication. The docs also walk you through planned switchover and controlled failover procedures in disaster scenarios.
- Updated the Scale Percona Server for MongoDB on Kubernetes topic with information about the pvc-resize-in-progress annotation and how it works.
- Updated the Configure backup storage topic with the Google Cloud Storage configuration.
- Configuration for config server split horizons is now accurately documented, simplifying multi-cluster deployments and external DNS integration.
- The Data-at-rest encryption topic has been updated with the correct steps for using HashiCorp Vault.
- New documentation is available detailing important considerations for upgrading your Kubernetes cluster before updating any Operator.
Supported software
The Operator was developed and tested with the following software:
- Percona Server for MongoDB 6.0.25-20, 7.0.24-13, and 8.0.12-4
- Percona Backup for MongoDB 2.11.0
- PMM Client 3.4.1
- LogCollector based on fluent-bit 4.0.1
Other options may also work but have not been tested.
Supported platforms
Percona Operators are designed for compatibility with all CNCF-certified Kubernetes distributions. Our release process includes targeted testing and validation on major cloud provider platforms and OpenShift, as detailed below for Operator version {{release}}:
- Google Kubernetes Engine (GKE) 1.31-1.33
- Amazon Elastic Kubernetes Service (EKS) 1.31-1.34
- OpenShift Container Platform 4.16 - 4.19
- Azure Kubernetes Service (AKS) 1.31-1.33
- Minikube 1.37.0 based on Kubernetes 1.34.0
This list only includes the platforms that the Percona Operators are specifically tested on as part of the release process. Other Kubernetes flavors and versions depend on the backward compatibility offered by Kubernetes itself.