We're delighted to announce the release of GitLab Environment Toolkit 3.0.0!
As a major release, 3.0.0 contains numerous large changes. As always, we recommend reviewing these release notes in full, in particular the Upgrade Notes section, before upgrading, as well as following the general upgrade advice in the documentation.
New Key Features
- AWS refactors - EKS IAM Roles for Service Accounts (IRSA), IMDSv2 and more!
- GCP refactors - Service Accounts and Application Default Credentials
- GCP Cloud SQL support
- gitlab-sshd support
- NFS requirement removal
- Object Storage refactor
- Further Updates and Improvements
Upgrade Notes
- Maintenance Downtime Requirements
- AWS EKS IRSA and Kubernetes namespace
- GCP Service Account key cleanup
- GCP Service Account prefix
- NFS Removal & cleanup
- Expected Terraform Changes
Breaking Changes
- Grafana Removal
- Elasticsearch Removal
- AWS OpenSearch variable rename
- AWS Internal Load Balancer rebuild
- GKE Backups via Toolbox pod key requirement
- Unintended use of internal NFS Server
- Feedback
ℹ The Toolkit is a collection of opinionated Terraform and Ansible scripts to assist with the deployment of a self-managed GitLab environment. It's recommended that users review the Before You Start section before use. Users should have a good working knowledge of infrastructure management, Terraform, Ansible and GitLab administration, and should be aware that self-managed environments are ultimately the responsibility of the user. As such, it's strongly recommended that you independently review the Toolkit in full to ensure it meets your requirements, especially around security or data integrity.
New Key Features
AWS refactors - EKS IAM Roles for Service Accounts (IRSA), IMDSv2 and more!
AWS environments have been fully refactored to follow several of the latest security best practices:
- AWS EKS environments now use the recommended IAM Roles for Service Accounts (IRSA) throughout.
- AWS environments are now configured to allow IMDSv2 only. IMDSv1 has been disabled.
- Internal Security Groups have been refactored to be membership based instead of CIDR based where applicable. Of note, the optional Internal Load Balancer has also been rebuilt to use Security Group membership.
- Cluster Autoscaler now supports scaling node pools to and from zero on EKS clusters version 1.24 and up. To enable this for EKS, Launch Templates are now also used by default, which allows for more graceful node upgrades in certain cases and more flexibility in the future.
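For context on the IMDSv2 change: IMDSv2 is the session-oriented version of the EC2 Instance Metadata Service. A client must first obtain a token with a PUT request and then present that token on every metadata read; with IMDSv1 disabled, a plain GET without a token is rejected. The following is a minimal Python sketch of that flow using only the standard library. It is not part of the Toolkit, it only works from inside an EC2 instance, and the token TTL and the instance-id path are example choices.

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254"
TOKEN_TTL_SECONDS = 21600  # example value: the maximum TTL IMDSv2 accepts (6 hours)


def token_request(ttl: int = TOKEN_TTL_SECONDS) -> urllib.request.Request:
    """Step 1: a PUT to /latest/api/token returns a session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl)},
    )


def metadata_request(token: str, path: str) -> urllib.request.Request:
    """Step 2: every metadata read carries the token in a header.

    With IMDSv1 disabled, a GET without this header is rejected (HTTP 401).
    """
    return urllib.request.Request(
        f"{IMDS_BASE}/latest/meta-data/{path.lstrip('/')}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_metadata(path: str, timeout: float = 2.0) -> str:
    """Fetch one metadata item; only works from inside an EC2 instance."""
    with urllib.request.urlopen(token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    with urllib.request.urlopen(metadata_request(token, path), timeout=timeout) as resp:
        return resp.read().decode()
```

On an instance, fetch_metadata("instance-id") would return the instance ID. Splitting request construction from the network calls keeps the two-step token handshake easy to inspect.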
ℹ Users who are using a non-default Kubernetes namespace for the GitLab charts deployment will need to follow some additional steps for this upgrade, as detailed in the AWS EKS IRSA and Kubernetes namespace section of the Upgrade Notes.
GCP refactors - Service Accounts and Application Default Credentials
- Merge Request(s): !1034
GCP environments have been refactored as follows:
- The default Service Account is no longer used across the stack. Infrastructure that requires a Service Account will now be given a dedicated SA with minimal permissions.
- Authentication to GCP services has been switched to use Application Default Credentials via dedicated Service Accounts. Users no longer need to pass in a Service Account key in most cases. Refer to the GCP Service Account key cleanup section in the Upgrade Notes for more information on how to clean up old keys.
- GCP Buckets now have public access prevention enabled by default.
ℹ Users who are deploying more than one environment to the same GCP project will need to provide a separate Service Account prefix for each, as detailed in the GCP Service Account prefix section of the Upgrade Notes.
GCP Cloud SQL support
- Merge Request(s): !1055
- Documentation
Support for Google Cloud SQL has been added for the main and praefect databases!
Note that Geo support will come in a future update.
gitlab-sshd support
- Merge Request(s): !996 !1076
- Documentation
Support for gitlab-sshd as the SSH daemon has been added!
Additionally, GitLab SSH ports can now be fully configured as required.
NFS requirement removal
- Merge Request(s): !996 !1044
- Documentation
The Toolkit no longer requires an NFS server to be present to propagate configuration. Users who are already using Object Storage can remove the NFS node as detailed in the Upgrade Notes section.
NFS servers can still optionally be configured for GitLab object data only, but note that Object Storage is recommended.
Object Storage refactor
- Merge Request(s): !996 !1044
- Documentation
It's now possible to fully configure which Object Storage buckets are provisioned via Terraform and configured via Ansible.
Additionally, a new bucket for the CI Secure Files feature has been added by default.
Further Updates and Improvements
- Minimum Ansible version has been updated to 8.x. !1039
- Minimum Terraform version has been updated to 1.5.x+. !1089 (thanks @bwilkerson13!)
- Terraform AWS provider has been updated to 5.x. !1057
- Terraform GCP provider has been updated to 4.77+. !1088
- Custom AWS Security Groups can be configured for the frontend Rails and Monitor nodes. Refer to the documentation for more information. !1073 !1095
- Object Storage proxy download setting is now disabled by default to match the Linux package (Omnibus). !1030
- Sidekiq can now be deployed separately in setups that have a single node Postgres server. !1046
- Geo replication is now supported from a multi-node Primary site using Patroni to a single-node secondary site without Patroni. !1031 !1032 (thanks @nwestbury!)
- Gitaly node root disk IOPS can now be configured in AWS environments. !1045
- Docker chunksize setting is now configurable for Container Registry. !1038 !1043 (thanks @psodre!)
- Container Registry settings have been fixed on separate Sidekiq nodes. !1068
- Consul updates for the Linux package (Omnibus) are now applied sequentially across nodes as recommended. !1077
- The Troubleshooting documentation page has been expanded. !982
- AWS OpenSearch service variables have been renamed from opensearch_* to opensearch_service_* to better reflect the AWS OpenSearch Service they configure. Refer to the related Breaking Changes section for more information. !1086 (thanks @bwilkerson13!)
- Support for deploying Node Exporter on ARM nodes, when required, has been added. !1100 (thanks @niskhakova!)
- Support for deploying Grafana has been removed due to licensing restrictions. Refer to the related Breaking Changes section for more information. !1071 !1082 !1083
- Support for deploying Elasticsearch servers has been removed due to licensing restrictions. Refer to the Breaking Changes section for more information. !1048
- Various deprecated code has been removed. !1094
- Various other small updates, improvements and fixes.
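As a quick pre-upgrade check, the new minimum versions above can be compared against what you have installed. The Python sketch below is hypothetical and not part of the Toolkit: it assumes you have already collected the installed version strings yourself, uses made-up component keys, and treats "8.x", "1.5.x+", "5.x" and "4.77+" as lower bounds only. Note that ansible --version reports the ansible-core version, which is numbered differently from the Ansible community package.

```python
from typing import Dict, Tuple

# Lower bounds from the release notes; "x" and "+" are treated as "or later".
# The keys are hypothetical labels, not Toolkit settings.
MINIMUMS: Dict[str, Tuple[int, ...]] = {
    "ansible": (8, 0),
    "terraform": (1, 5),
    "terraform-provider-aws": (5, 0),
    "terraform-provider-google": (4, 77),
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.5.7' or 'v5.31.0' into (1, 5, 7) / (5, 31, 0); stops at 'x'."""
    parts = []
    for piece in text.strip().lstrip("v").split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: Tuple[int, ...]) -> bool:
    have = parse_version(installed)
    width = max(len(have), len(minimum))
    # Pad with zeros so that "8" compares equal to "8.0".
    have += (0,) * (width - len(have))
    need = minimum + (0,) * (width - len(minimum))
    return have >= need


def outdated(installed: Dict[str, str]) -> Dict[str, str]:
    """Return the components whose installed version is below the minimum."""
    return {
        name: version
        for name, version in installed.items()
        if name in MINIMUMS and not meets_minimum(version, MINIMUMS[name])
    }
```

For example, outdated({"terraform": "1.4.6", "ansible": "8.5.0", "terraform-provider-google": "4.76.0"}) flags Terraform and the GCP provider but not Ansible.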
Upgrade Notes
Please review this section in full for upgrade guidance.
Maintenance Downtime Requirements
As this is a major release, we've opted to make several larger changes that have been long awaited. However, due to Cloud Provider limitations, upgrades will result in periods of downtime for environments. As such, we recommend that you upgrade during a maintenance window.
Below is a list of the changes that will result in downtime:
- The AWS EKS IRSA refactor requires EKS Node Pools to be rebuilt. Downtime will occur while this rebuild is happening. Estimated time: 30 minutes to rebuild, and a subsequent Ansible run will be required afterwards (~40 minutes).
- The AWS Internal Load Balancer will be recreated to set Security Group membership due to AWS limitations. Its address will unfortunately change as part of this process and will need to be configured again in Ansible. Estimated time: 5 minutes to rebuild the Load Balancer, and a subsequent Ansible run will be required afterwards (~40 minutes).
- The GCP Service Account refactor requires nodes to be restarted. Downtime will occur as part of this process. Estimated time: 20 minutes to reconfigure, and a subsequent Ansible run will be required afterwards (~40 minutes).
- Numerous under-the-hood configuration tweaks, such as Security Group rules (AWS) or Object Storage config, will apply quickly but may take a small amount of time to fully propagate on AWS's and GCP's end. Estimated time: 10 minutes (but please note this may vary).
ℹ An Ansible run is not required after each change. Only one run, after Terraform has finished, should be required.
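As a rough planning aid, the estimates above can be combined into a maintenance window. Because only one Ansible run is needed once Terraform has finished, the ~40 minute run is counted once rather than per change. The Python sketch below sums the stated estimates under the conservative assumption that the Terraform-side changes apply one after another; the change labels are illustrative names, not Toolkit settings, and real timings will vary.

```python
from typing import Iterable

# Terraform-side estimates (minutes) from the downtime list above.
TERRAFORM_MINUTES = {
    "aws_eks_irsa_node_pool_rebuild": 30,
    "aws_internal_load_balancer_rebuild": 5,
    "gcp_service_account_node_restart": 20,
    "config_propagation": 10,
}
ANSIBLE_RUN_MINUTES = 40  # needed once, after Terraform has finished


def estimated_window(changes: Iterable[str]) -> int:
    """Sum the applicable Terraform estimates, then add a single Ansible run."""
    terraform_total = sum(TERRAFORM_MINUTES[name] for name in changes)
    return terraform_total + ANSIBLE_RUN_MINUTES


aws_eks_with_ilb = estimated_window([
    "aws_eks_irsa_node_pool_rebuild",
    "aws_internal_load_balancer_rebuild",
    "config_propagation",
])
gcp = estimated_window(["gcp_service_account_node_restart", "config_propagation"])
print(aws_eks_with_ilb, gcp)  # 85 70
```

Following each change with its own Ansible run would instead give 125 minutes for the AWS case, which is why a single run at the end is worth planning for.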
AWS EKS IRSA and Kubernetes namespace
With the switch to EKS IRSA, the Kubernetes namespace used by the GitLab charts must now match between Terraform and Ansible. By default this namespace is set to default.
If you have already configured a different namespace, you will need to add configuration in Terraform to ensure it matches Ansible. Refer to the documentation for more information.
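Background on why the namespace must match: with IRSA, an IAM role's trust policy typically restricts which Kubernetes service accounts may assume it by matching the OIDC token's sub claim, which takes the form system:serviceaccount:NAMESPACE:SERVICE_ACCOUNT. If the role is built in Terraform for one namespace but the charts are deployed by Ansible into another, the claim no longer matches and the pods cannot assume the role. The Python sketch below only illustrates that comparison; the service account name is a hypothetical example, and this is not how the Toolkit itself is implemented.

```python
from typing import Iterable, Set

DEFAULT_NAMESPACE = "default"  # the Toolkit default noted above


def irsa_subject(namespace: str, service_account: str) -> str:
    """Value of the OIDC 'sub' claim for a Kubernetes service account."""
    return f"system:serviceaccount:{namespace}:{service_account}"


def trusted_subjects(namespace: str, service_accounts: Iterable[str]) -> Set[str]:
    """Subjects a role trust policy would allow if built for this namespace."""
    return {irsa_subject(namespace, sa) for sa in service_accounts}


def pod_can_assume_role(trusted: Set[str], pod_namespace: str, service_account: str) -> bool:
    """True only if the pod's namespace and service account are both trusted."""
    return irsa_subject(pod_namespace, service_account) in trusted


# Hypothetical: role built in Terraform for "default", charts deployed elsewhere.
trusted = trusted_subjects(DEFAULT_NAMESPACE, ["example-sa"])
print(pod_can_assume_role(trusted, "default", "example-sa"))  # True
print(pod_can_assume_role(trusted, "gitlab", "example-sa"))   # False
```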
GCP Service Account key cleanup
As part of the GCP Service Account refactor, passing a Service Account key via the gcp_service_account_host_file variable in Ansible is no longer required and the variable can be removed.
Keys will still be present on machines in existing environments, so we recommend running the maintenance/gcp_service_account_cleanup.yml playbook as a one-off task to remove any old keys.
Finally, please note that there is currently still one specific case where a key is required: enabling backups on GKE Cloud Native Hybrid environments. Refer to the documentation for more information.
GCP Service Account prefix
The Toolkit will now create Service Accounts by default in GCP and this should be seamless in the background. However, due to naming restrictions there will be a clash if you are deploying more than one environment in the same GCP project.
As a workaround, you can define the prefix the Toolkit uses for the Service Accounts it creates. Refer to the documentation for more information.
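The clash arises because GCP Service Account IDs must be unique within a project, so two environments that derive their Service Account names from the same prefix will collide. GCP also requires account IDs to be 6 to 30 characters long and to match [a-z]([-a-z0-9]*[a-z0-9]). The Python sketch below checks candidate IDs for both problems before you apply Terraform. The {prefix}-{role} naming scheme and the role list are hypothetical and may not match what the Toolkit actually generates.

```python
import re
from collections import defaultdict
from typing import Dict, List

# GCP rule for account IDs: 6-30 chars matching [a-z]([-a-z0-9]*[a-z0-9]).
ACCOUNT_ID_RE = re.compile(r"^[a-z](?:[-a-z0-9]{4,28}[a-z0-9])$")

# Hypothetical roles; substitute whatever your environment actually creates.
ROLES = ["gitaly", "rails", "gke"]


def account_ids(prefix: str) -> List[str]:
    """Hypothetical '{prefix}-{role}' naming scheme."""
    return [f"{prefix}-{role}" for role in ROLES]


def check_project(prefixes: Dict[str, str]) -> List[str]:
    """Report problems for environments (env name -> prefix) sharing one project."""
    problems = []
    owners = defaultdict(list)
    for env, prefix in prefixes.items():
        for account_id in account_ids(prefix):
            if not ACCOUNT_ID_RE.match(account_id):
                problems.append(f"{env}: invalid account ID {account_id!r}")
            owners[account_id].append(env)
    for account_id, envs in owners.items():
        if len(envs) > 1:
            problems.append(f"clash on {account_id!r}: {', '.join(sorted(envs))}")
    return problems
```

With the same prefix for two environments, check_project({"staging": "gitlab", "production": "gitlab"}) reports a clash for every role, while distinct prefixes such as "gl-stg" and "gl-prod" report nothing.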
NFS Removal & cleanup
As noted, the Toolkit no longer requires an NFS server to propagate configuration. As such, you can remove the node entirely if you're using Object Storage as recommended. If you are still using NFS for object data, though, you should keep the node in place.
If you are removing the NFS server, you should also run the maintenance/nfs_cleanup.yml playbook in Ansible as a one-off task to clean up old mount configuration.
ℹ Users who specifically did not have an NFS node deployed but had the Ansible variable gitlab_object_storage_type set to nfs will need to follow additional steps before running cleanup, as this becomes a breaking change. Please refer to the Unintended use of internal NFS Server section under Breaking Changes for more information. Users who were already using a separate NFS node or Object Storage are unaffected and can run cleanup as normal.
Expected Terraform Changes
Below is a list of Terraform changes that are expected as part of the upgrade:
- Various updates to AWS S3 Policies.
- Various AWS Security Group changes. Some of these changes include switching to newer recommended resources in Terraform.
- EKS nodes being switched to use Launch Templates and updated naming conventions.
- Changes to AWS node metadata to enforce the use of IMDSv2.
- AWS Internal Load Balancer being rebuilt with added Security Group membership.
- Various GCP Service Accounts being created and attached to resources.
- Various updates to GCP Object Storage config.
- A new bucket for CI Secure Files added in AWS, GCP and Azure (along with standard access config).
Breaking Changes
We endeavour to keep breaking changes to a minimum, but there are some that may apply depending on your setup, as detailed below.
Grafana Removal
Grafana has been deprecated and removed from GitLab packages due to licensing restrictions. As a result, support for Grafana has also been removed from the Toolkit.
Users who wish to keep using Grafana will need to switch to a different install method that complies with its license.
ℹ Users who are intending to upgrade to 16.3 and have the monitor_enable_deprecated_grafana setting enabled should follow the same advice given in the 2.8.6 release notes to avoid failures.
Elasticsearch Removal
Elasticsearch deployment support has been removed from the Toolkit due to licensing restrictions.
Please note that GitLab still supports the use of Elasticsearch as an Advanced Search backend, but the Toolkit can no longer deploy that Elasticsearch backend for you.
Users should explore deploying Elasticsearch directly via a different install method, or switch to OpenSearch, which is now provided by the Toolkit.
AWS OpenSearch variable rename
As noted, the AWS OpenSearch variables in Terraform have been renamed.
Users who have AWS OpenSearch Service deployed will need to rename all opensearch_* variables to opensearch_service_* in Terraform.
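Because the rename is a mechanical prefix change, it can be scripted. The Python sketch below rewrites identifiers that start with opensearch_ to start with opensearch_service_ in the given Terraform files, leaving already-renamed names alone. It is a hypothetical helper, not part of the Toolkit, and it is purely textual, so review the resulting diff (including any string values or comments it touched) before applying Terraform.

```python
import re
import sys
from pathlib import Path

# Match identifiers starting with "opensearch_" that are not already
# "opensearch_service_". The leading \b leaves names such as
# "aws_opensearch_domain" untouched, since "_" is a word character.
OLD_PREFIX = re.compile(r"\bopensearch_(?!service_)")


def rename_opensearch_vars(text: str) -> str:
    return OLD_PREFIX.sub("opensearch_service_", text)


def main(paths):
    for path in map(Path, paths):
        original = path.read_text()
        updated = rename_opensearch_vars(original)
        if updated != original:
            path.write_text(updated)
            print(f"updated {path}")


if __name__ == "__main__":
    # Usage: python rename_opensearch.py environment.tf variables.tf ...
    main(sys.argv[1:])
```

The negative lookahead makes the script safe to run more than once: a second pass finds nothing left to rename.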
AWS Internal Load Balancer rebuild
As noted, the optional AWS Internal Load Balancer will need to be rebuilt to enable Security Group membership.
Users who have the AWS Internal Load Balancer deployed will need to retrieve its new address from Terraform after the rebuild and configure it in Ansible.
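One way to retrieve the new address after the rebuild is to read it from the JSON printed by terraform output -json, where each output is an object carrying a value key. The Python sketch below assumes that format. The output and key names in the example are hypothetical, so check your own environment's outputs for the one that holds the Internal Load Balancer address.

```python
import json
import subprocess
from typing import Any


def read_outputs(terraform_dir: str) -> dict:
    """Run 'terraform output -json' in the given directory and parse it."""
    result = subprocess.run(
        ["terraform", "output", "-json"],
        cwd=terraform_dir, check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def output_value(outputs: dict, name: str, *keys: str) -> Any:
    """Return outputs[name]['value'], optionally drilling into nested map keys."""
    value = outputs[name]["value"]
    for key in keys:
        value = value[key]
    return value


# Hypothetical output shape: a module output containing a map of addresses.
sample = {
    "example_module": {
        "sensitive": False,
        "type": ["object", {"internal_lb_address": "string"}],
        "value": {"internal_lb_address": "internal-example.elb.amazonaws.com"},
    }
}
print(output_value(sample, "example_module", "internal_lb_address"))
```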
GKE Backups via Toolbox pod key requirement
As noted, the GCP Service Account refactor has one specific exception on Cloud Native Hybrid environments on GKE.
To continue taking backups via the Toolbox pod, a Service Account key will still need to be passed. Refer to the documentation for more information.
Unintended use of internal NFS Server
Users who specifically did not have an NFS node deployed but had the Ansible variable gitlab_object_storage_type set to nfs will potentially have object data stored on an internal NFS server that was deployed on the first Gitaly node (or the first Rails node if no separate Gitaly node was configured) under the /mnt/gitlab-nfs directory (/srv/gitlab-nfs on Azure environments).
This internal NFS server setup was only intended for use by the Toolkit. However, while unlikely, the above combination of settings may have led to unintended usage by GitLab itself, so out of an abundance of caution, and to avoid any data loss, affected users must take additional action. In this specific scenario you must first migrate this data to Object Storage or a separate NFS node before upgrading and running the cleanup playbook, as detailed in the NFS Removal & cleanup section. For further assistance, please reach out to our Support team.
Users who had a separate NFS node or are already using Object Storage are unaffected and should ignore this section.
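The conditions above can be summarised as a small decision helper. The Python sketch below is illustrative only: its inputs mirror the settings named in this section (whether a separate NFS node was deployed, and the value of gitlab_object_storage_type, where any value other than nfs counts as unaffected), and it returns the directory to inspect for object data on the first Gitaly or Rails node, or None if the environment is unaffected. The cloud provider labels are assumptions.

```python
from typing import Optional


def internal_nfs_path_to_check(has_separate_nfs_node: bool,
                               gitlab_object_storage_type: str,
                               cloud_provider: str) -> Optional[str]:
    """Return the internal NFS directory that may hold object data, or None.

    Only environments WITHOUT a separate NFS node that nevertheless set
    gitlab_object_storage_type to "nfs" are affected. Their data must be
    migrated to Object Storage or a separate NFS node before upgrading and
    running maintenance/nfs_cleanup.yml.
    """
    if has_separate_nfs_node or gitlab_object_storage_type != "nfs":
        return None  # unaffected: run the cleanup playbook as normal
    # Internal NFS server location: /srv/gitlab-nfs on Azure, /mnt/gitlab-nfs elsewhere.
    return "/srv/gitlab-nfs" if cloud_provider == "azure" else "/mnt/gitlab-nfs"
```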
Feedback
Got any feedback or found an issue? Please feel free to create an issue on our tracker.