Terrapod v0.11.0 — fixes binary cache download failures and adds comprehensive test coverage and operational documentation.
Highlights
- Binary cache redirect fix — runner entrypoint now follows S3 redirects on presigned URLs, preventing "not a valid zip" errors when the storage bucket region differs from the API configuration
- Upstream fallback for binary downloads — if the binary cache is unavailable or disabled, runners fall back to downloading terraform/tofu directly from upstream (releases.hashicorp.com / GitHub Releases)
- Zip validation — runner validates downloaded files are valid zip archives before attempting extraction, with actionable error messages pointing to storage configuration
- Configurable stale timeout — reconciler's stale run timeout is now configurable via `runners.staleTimeoutSeconds` (Helm values) instead of a hardcoded 1 hour
- 839 backend tests — added 136 new unit tests covering auth modules (runner tokens, CA, RBAC), VCS services (GitHub, GitLab), agent pool service, and integration tests
- Playwright E2E suite — expanded from 16 to 41 end-to-end tests across 11 spec files
- Operational runbooks — 8 failure-scenario runbooks covering stale runs, state divergence, scheduler stall, listener offline, run queue backlog, DB pool exhaustion, Redis connection loss, and storage errors
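
The download behaviour described in the first three highlights (follow redirects, validate the zip, fall back to upstream) can be sketched roughly as below. This is an illustrative Python sketch, not the actual runner entrypoint: the names `http_fetch`, `is_valid_zip`, `fetch_binary_archive`, and `BinaryDownloadError` are hypothetical, and the real implementation may differ in language and structure.

```python
from __future__ import annotations

import io
import urllib.request
import zipfile
from typing import Callable, Iterable


class BinaryDownloadError(Exception):
    """Raised when no source yields a valid zip archive (hypothetical name)."""


def http_fetch(url: str, timeout: float = 30.0) -> bytes:
    # urllib's default opener follows HTTP redirects for GET requests,
    # which is the behaviour a redirecting presigned URL needs.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def is_valid_zip(payload: bytes) -> bool:
    # is_zipfile looks for the zip end-of-central-directory record, so an
    # XML or HTML error body returned instead of the archive fails here.
    return zipfile.is_zipfile(io.BytesIO(payload))


def fetch_binary_archive(
    sources: Iterable[tuple[str, str]],
    fetch: Callable[[str], bytes] = http_fetch,
) -> tuple[str, bytes]:
    """Try each (label, url) in order; return the first valid zip payload."""
    failures = []
    for label, url in sources:
        try:
            payload = fetch(url)
        except OSError as exc:  # urllib.error.URLError subclasses OSError
            failures.append(f"{label}: request failed ({exc})")
            continue
        if not is_valid_zip(payload):
            failures.append(
                f"{label}: response is not a valid zip; "
                "check the storage bucket region and endpoint configuration"
            )
            continue
        return label, payload
    raise BinaryDownloadError("; ".join(failures))
```

Passing the binary cache URL first and the upstream URL second gives the fallback order described above. The `fetch` parameter is injectable so the logic can be exercised without network access.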
Bug Fixes
- Fix missing `Content-Encoding: none` header for run-events SSE endpoint — Next.js compression was silently buffering SSE messages
- Fix React key anti-pattern in sessions page (array index replaced with `token_hint`)
- Fix binary cache download failures when S3 presigned URLs return redirects (#138)
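
To illustrate the SSE fix, here is a minimal Python sketch of the response headers and event framing involved. The release notes do not say where the header is set, so the helper names `sse_headers` and `format_sse_event` are hypothetical and the sketch only shows the shape of the response, not Terrapod's code.

```python
from __future__ import annotations


def sse_headers() -> dict[str, str]:
    # "Content-Encoding: none" is the header named in the fix; per the notes,
    # without it Next.js compression buffered the SSE messages.
    return {
        "Content-Type": "text/event-stream",
        "Cache-Control": "no-cache",
        "Content-Encoding": "none",
    }


def format_sse_event(event: str, data: str) -> str:
    # An SSE frame is an "event:" line, one "data:" line per payload line,
    # and a terminating blank line.
    lines = [f"event: {event}"]
    lines.extend(f"data: {line}" for line in data.split("\n"))
    return "\n".join(lines) + "\n\n"
```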
Full Changelog: v0.10.0...v0.11.0