New spot circuit breaker that snoozes spot requests when too many interruptions are detected, monitoring improvements, StepSecurity integration, and more.
What's changed
Spot circuit breaker
- Allow switching to on-demand requests when the spot interruption frequency is too high over a defined time interval. Fixes #226.
For instance, if SpotCircuitBreaker is set to 2/30/60, it means that after at least 2 interruptions in the last 30 minutes, RunsOn will switch to on-demand requests for the next 60 minutes.
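The count/window/cooldown semantics described above can be illustrated with a minimal sketch. This is not RunsOn's actual implementation: the class and function names are hypothetical, and details such as whether the window boundary is inclusive, or whether a new interruption during the cooldown extends it, are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass(frozen=True)
class CircuitBreakerConfig:
    max_interruptions: int  # interruptions needed to trip the breaker
    window: timedelta       # look-back interval for counting interruptions
    cooldown: timedelta     # how long to stay on on-demand once tripped


def parse_config(value: str) -> CircuitBreakerConfig:
    """Parse a 'count/window_minutes/cooldown_minutes' string such as '2/30/60'."""
    count, window_min, cooldown_min = (int(part) for part in value.split("/"))
    return CircuitBreakerConfig(
        max_interruptions=count,
        window=timedelta(minutes=window_min),
        cooldown=timedelta(minutes=cooldown_min),
    )


class SpotCircuitBreaker:
    """Hypothetical in-memory breaker; a real system would persist this state."""

    def __init__(self, config: CircuitBreakerConfig) -> None:
        self.config = config
        self.interruptions: List[datetime] = []
        self.open_until: Optional[datetime] = None

    def record_interruption(self, at: datetime) -> None:
        self.interruptions.append(at)
        # Keep only interruptions inside the look-back window
        # (assumption: the window boundary itself is excluded).
        cutoff = at - self.config.window
        self.interruptions = [t for t in self.interruptions if t > cutoff]
        if len(self.interruptions) >= self.config.max_interruptions:
            # Assumption: each trip (re)starts the cooldown from this interruption.
            self.open_until = at + self.config.cooldown

    def use_on_demand(self, now: datetime) -> bool:
        """True while the breaker is tripped and the cooldown has not elapsed."""
        return self.open_until is not None and now < self.open_until


# Example with the 2/30/60 setting from the text above.
breaker = SpotCircuitBreaker(parse_config("2/30/60"))
start = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
breaker.record_interruption(start)
breaker.record_interruption(start + timedelta(minutes=10))
on_demand_now = breaker.use_on_demand(start + timedelta(minutes=15))    # True
on_demand_later = breaker.use_on_demand(start + timedelta(minutes=90))  # False
```

With 2/30/60, two interruptions 10 minutes apart trip the breaker, and requests go on-demand until 60 minutes after the second interruption. Two interruptions 40 minutes apart would not trip it, because the first one falls outside the 30-minute window.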
Monitoring
- Add workflow job conclusion to Prometheus labels. Fixes #178. Also add job_conclusion and run_attempt to all log lines.
- Support SQS queue oldest message age alarms. Helps with compliance and with detecting whether RunsOn has issues dequeuing messages fast enough. Fixes #228.
- Use a scheduled event to compute and send cost reports at midnight UTC. Fixes #216.
Native integration with StepSecurity
- Support StepSecurity integration with new images (announcement soon).
jobs:
  job-with-stepsecurity:
    runs-on: "runs-on=${{ github.run_id }}/runner=2cpu-linux-x64/image=ubuntu22-stepsecurity-x64"
    steps:
      - name: External call
        run: curl https://google.com
Misc
- Reduce agent binary size.
- Update Go dependencies.
- Allow injection of custom runner agent (internal testing only).
- Remove magic cache ON annotation. Fixes #234.
Upgrade
- Upgrade Guide
- CloudFormation Versioned template URL: https://runs-on.s3.eu-west-1.amazonaws.com/cloudformation/template-v2.6.7.yaml