github runs-on/runs-on v2.8.1

pre-release, 4 months ago

A large release: RunsOn can now use an external networking stack, encryption is enabled on all S3 buckets, Windows boot times are halved and CloudWatch agent monitoring is enabled, and there are lots of quality-of-life improvements and bug fixes. Be sure to read the upgrade notes.

What's changed

Networking

  • Can now reuse an existing networking stack, by setting the NetworkingStack stack parameter to external instead of embedded. Fixes #198, fixes #265, fixes #230 (a community-provided networking stack can provide this feature).

  • Some not-so-useful stack outputs have been removed. Some outputs may be "-" if using an external VPC.

Caching

  • Fix invalid cache key restoration for Magic Cache. Thanks @erikburt from ChainlinkLabs for the troubleshooting.

Security

  • Enable server-side encryption using AWS-managed KMS key on all S3 buckets. Fixes #276.

  • No longer expose JIT token in cloud-init-output logs. The token is no longer valid after a job is run, but still.
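As an illustration of what default server-side encryption with an AWS-managed KMS key looks like at the S3 API level, here is a minimal Python sketch. RunsOn itself applies this through its CloudFormation template, so the helper name and the bucket name below are hypothetical and not taken from the project. The dictionary follows the shape accepted by the S3 PutBucketEncryption call; leaving out a key ID makes S3 fall back to the AWS-managed key.

```python
def default_encryption_config() -> dict:
    """Build a default-encryption configuration for an S3 bucket.

    Hypothetical helper for illustration. No KMSMasterKeyID is given,
    so S3 uses the AWS-managed KMS key for the account.
    """
    return {
        "Rules": [
            {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "aws:kms"}}
        ]
    }


# Possible usage with boto3 (not executed here; the bucket name is a placeholder):
#   import boto3
#   boto3.client("s3").put_bucket_encryption(
#       Bucket="my-runs-on-bucket",
#       ServerSideEncryptionConfiguration=default_encryption_config(),
#   )
print(default_encryption_config())
```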

QoL improvements

  • Add AppDebug (true or false) stack parameter, which allows disabling the auto-shutdown of runners when the bootstrap fails. Useful to investigate what is going on when the runner initializes.

  • Add AppCustomPolicy stack parameter: Optional managed IAM Policy ARN to assign to the App runner service role. Can be used to e.g. allow access to KMS decryption keys for AMIs. Thanks @dsme94!

  • Add AppGithubApiStrategy (normal or conservative) stack parameter to opt into minimizing GitHub API usage. If set to conservative, runners won't be automatically unregistered from GitHub's internal database (GitHub will still clean them up after 24h). This helps users with a very large number (20k+) of jobs launched every day. Fixes #285.

  • Now bootstraps runners using runs-on/bootstrap binary, preinstalled on official RunsOn images (faster and more extensible).

  • On spot interruption, give the job more time to possibly complete before shutdown is triggered. Shutdown is now triggered 20 seconds before the expected time sent by AWS, instead of 15 seconds after the notification is received. Fixes #277.
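The new spot interruption timing can be sketched in a few lines of Python. This is not the RunsOn implementation: the function name and constant are hypothetical, and the sketch assumes the notice has the shape of the EC2 spot instance-action metadata document (an action plus an ISO 8601 UTC time). It only shows the arithmetic of starting shutdown 20 seconds before the time announced by AWS.

```python
from datetime import datetime, timedelta, timezone

# Lead time taken from the release note: shut down 20 seconds before AWS's stated time.
SHUTDOWN_LEAD = timedelta(seconds=20)


def planned_shutdown(instance_action: dict) -> datetime:
    """Return the moment shutdown should start for a spot interruption notice.

    Hypothetical helper. `instance_action` is assumed to look like
    {"action": "terminate", "time": "2017-09-18T08:22:00Z"}.
    """
    deadline = datetime.strptime(
        instance_action["time"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return deadline - SHUTDOWN_LEAD


notice = {"action": "terminate", "time": "2017-09-18T08:22:00Z"}
print(planned_shutdown(notice).isoformat())  # 2017-09-18T08:21:40+00:00
```

Compared with the previous behavior (a fixed 15 seconds after the notification is received), anchoring on the time sent by AWS lets the job use most of the notice period.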

Windows

  • Shaved about 50s from Windows boot times: SSH is no longer automatically installed on Windows (SSM agent is available now), and no longer using Invoke-WebRequest helped a lot (TIL).

  • CloudWatch agent is automatically installed on Windows AMIs, and EC2Launch logs are shipped to CloudWatch (same naming as for Linux runners: e.g. LOG_GROUP_NAME/INSTANCE_ID/cloud-init-output.log). Also added support for roc connect on Windows AMIs in the RunsOn CLI.

Bug fixes

  • Fix for invalid CreateTags requests. Fixes #288.

  • Fix for invalid EC2 rate-limiter being used when uploading user-data file to S3. Fixes #286.

  • Adjust ownership rule for S3 bucket logging, from BucketOwnerPreferred to BucketOwnerEnforced. Fixes #291.

Upgrade notes

Service interruption

Important: this upgrade is better performed during slow hours. During the few minutes that the upgrade lasts, and a few minutes after (while the old AppRunner instance(s) are taken out of traffic), jobs that are still hitting the old AppRunner instance(s) of the RunsOn service will fail to launch because the bootstrap script can no longer be downloaded. This is a side-effect of enabling S3 encryption, which disables non-authenticated downloads.

Be prepared to re-run some of the jobs. Alternatively, you can create and configure a new stack from scratch, and remove the old one when finished (some duplicate instances will be launched, but they will be cleaned up after 10 minutes).

Switching to an external networking stack

When switching from the embedded to an external Networking Stack:

  • if you were using the VPC peering CloudFormation template, you will need to delete that stack first.
  • your external networking stack will have to take care of VPC endpoints, NAT gateways etc, as you see fit. An S3 gateway endpoint is strongly recommended (S3 traffic is kept within the VPC and it's free).
  • RunsOn will perform networking checks when booting: if you have enabled the Private mode, it will ensure you have at least one private subnet with some kind of external connection (NAT gateway, Transit gateway, VPC peering, EC2 instance acting as gateway, etc.). If you don't have Private mode set, it will verify that there is at least one public subnet with an internet gateway attached. If the verification fails, you will be alerted through the SNS topic, and the RunsOn service will fail to boot.
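The boot-time networking check described above boils down to a simple rule, sketched here in Python. The Subnet type, its fields and the function name are hypothetical and only model the decision; how RunsOn actually inspects subnets and route tables is not shown in these notes.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Subnet:
    """Hypothetical, simplified view of a subnet in the external networking stack."""
    subnet_id: str
    public: bool                 # True for a public subnet
    has_internet_gateway: bool   # an internet gateway is attached
    has_external_route: bool     # NAT gateway, transit gateway, VPC peering, gateway instance...


def networking_check(subnets: Iterable[Subnet], private_mode: bool) -> bool:
    """Return True when the subnets satisfy the rule for the selected mode."""
    subnets = list(subnets)
    if private_mode:
        # Private mode: at least one private subnet with some external connection.
        return any(not s.public and s.has_external_route for s in subnets)
    # Otherwise: at least one public subnet with an internet gateway attached.
    return any(s.public and s.has_internet_gateway for s in subnets)


subnets = [
    Subnet("subnet-public", public=True, has_internet_gateway=True, has_external_route=True),
    Subnet("subnet-isolated", public=False, has_internet_gateway=False, has_external_route=False),
]
print(networking_check(subnets, private_mode=False))  # True
print(networking_check(subnets, private_mode=True))   # False
```

In the second call the only private subnet has no external connection, which corresponds to the case where the service would fail to boot and an alert would be sent.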

Upgrade
