feat(terraform): registry cache, RC management, and multi-platform mirror @osterman (#2582)
## what

- Add a transparent Terraform/OpenTofu registry cache: an ephemeral local HTTPS network-mirror proxy (`pkg/http/proxy`, `pkg/terraform/{cache,registry}`) that caches providers and modules in the canonical `filesystem_mirror` layout, enabled with `components.terraform.cache.enabled: true`.
- Add the `atmos terraform cache` command group: `list`, `stats`, `prune`, `delete`, plus `mirror` (alias `warm`) for eager multi-platform pre-seeding and `trust`/`untrust` for the proxy certificate.
- Add declarative Terraform CLI-config (`.terraformrc`) management via `components.terraform.rc`, exposed to the subprocess through `TF_CLI_CONFIG_FILE`/`TOFU_CLI_CONFIG_FILE`.
- Add a first-class `components.terraform.platforms` setting (target `<os>_<arch>` list) that drives both eager `atmos terraform cache mirror` pre-seeding (`--all`/`--components`/`--query`/`-s`, package-manager-style TUI, `--format json|yaml`) and automatic completion of `.terraform.lock.hcl`.
- Keep `.terraform.lock.hcl` complete across platforms: a built-in `after.terraform.init` provisioner runs `terraform`/`tofu providers lock -platform=…` for the declared `platforms` whenever a customized provider installation method (the default plugin cache, or the registry cache) is active. Because it runs after `init`, it sees the fully JIT-vendored and code-generated working directory, so the generated provider set (including stack-config provider versions) is what gets locked, and committed lock files install cleanly on every platform in a fleet.
- Generate and cache a self-signed loopback certificate so the proxy can serve HTTPS (required by Terraform/OpenTofu network mirrors); trusted automatically via `SSL_CERT_FILE` on Linux/CI and via a one-time `atmos terraform cache trust` on macOS/Windows.
- Add `examples/caching` (auto-installs OpenTofu via the toolchain), PRDs, command + configuration docs, blog posts, and a roadmap update.
## why

- Repeated and CI runs re-download the same providers and modules; the cache eliminates that, keeps runs working through registry outages, and preserves the exact versions a deployment used.
- Atmos enables a provider plugin cache (`TF_PLUGIN_CACHE_DIR`) by default, and network mirrors behave the same way: Terraform can no longer record the registry's signed cross-platform checksums, so `init` writes a `.terraform.lock.hcl` with hashes for only the current platform and prints the "Incomplete lock file information for providers" warning. Declaring `components.terraform.platforms` lets Atmos complete the lock automatically for every target platform.
- The lazy proxy only caches the host platform, so mixed CI/developer fleets and air-gapped reproducible builds need declarative multi-platform pre-seeding; `components.terraform.platforms` + `cache mirror` provide it.
- Declarative `rc` lets teams manage provider mirrors, credentials, and other CLI-config directives from `atmos.yaml` instead of per-machine `.terraformrc` files.
## references

- Closes #2150
- `docs/prd/terraform-registry-cache.md`, `docs/prd/terraform-rc-management.md`, `docs/prd/terraform-registry-cache-tls.md`
Summary by CodeRabbit
Release Notes
- New Features
  - Added an experimental Terraform/OpenTofu registry cache with disk-backed mirroring, metadata freshness controls (TTL + stale-while-revalidate), per-key locking, and a savings report.
  - Added `atmos terraform cache` subcommands: `list`/`stats` (table/JSON/YAML output), `prune` (`--older-than`, `--dry-run`, `--all`), `delete <key>`, `mirror` (`warm` alias, optional eager pre-seeding).
  - Added `trust`/`untrust` for HTTPS certificate trust on macOS/Windows (actionable when untrusted).
- Documentation
  - Added/updated guides and CLI references for registry cache, CLI RC management (`components.terraform.rc`), and all cache subcommands.
feat: Atmos Git — foundational capability for GitOps enablement @osterman (#2597)
## what

Atmos Git: Git becomes a foundational platform capability, on par with Toolchain, Auth, and Hooks. It is the enablement layer for GitOps workflows where Atmos commits generated artifacts to a source-of-truth repository. PRD: `docs/prd/git-ops.md`.

- Top-level `git` config: `git.repositories.<name>` declares managed repositories (uri, branch, remote, clone depth/filter/single-branch/submodules, `auth.identity`, `commit.signing`/`commit.author`, `push.retries`), `git.hooks` declares local Git hooks, `git.list` configures list output. Workdirs default to automatic XDG cache locations (`$XDG_CACHE_HOME/atmos/git/repositories/<name>`) so the native CI cache captures and restores managed clones.
- `pkg/git` service: provider registry (registry pattern) with the `cli` provider in v1 (chosen because GitHub STS materializes credentials as `GIT_CONFIG_*` env vars, which subprocess git honors and go-git ignores). Clone is defined as reconcile (clone-if-absent, else fetch + checkout + ff-only) so stale CI-cache restores are just faster clones. Safety rules: ff-only pull, no force push ever, push retry-with-rebase on non-fast-forward rejection, path-scoped commits that fail on unrelated dirty files, worktree path-traversal validation, per-invocation commit author injection (CI runners need no `user.name`), provenance trailers (`Atmos-Stack`, `Atmos-Component`, `Atmos-Source-SHA`).
- `atmos git` command group: `clone`, `pull`, `status`, `diff`, `commit`, `push`, `list`, `clean`, plus `git hooks install|uninstall|run`, registered under the Git help group. `--all` bulk operations (bounded concurrency, attempt-all with `errors.Join`). Clone accepts configured names, plain URLs, and go-getter `git::...?ref=&depth=` URIs. No-arg clone in native CI (`ci.enabled: true`) infers the current repository from CI metadata and clones into the workspace, acting as an `actions/checkout` replacement. `atmos list git-repositories` alias registered.
- `git` hook kind: publishes generated artifacts on lifecycle events (`after.terraform.apply`, ...) to the current repository by default or a named managed repository, with templated commit messages, trailers, clean no-ops, and push-after-commit with retry. Inherits `--skip-hooks` and `on_failure`.
- Local Git hook shims: `atmos git hooks install` writes worktree-aware `.git/hooks/*` shims (marker-protected, `--force` to overwrite, warns when `core.hooksPath` is set); `run` dispatches configured commands with stdin forwarding and exit-code propagation.
- Error handling: new sentinels (`ErrGitRepositoryNotFound`, `ErrGitAuthFailed`, `ErrGitPushRejected`, `ErrGitDirtyUnmanagedFiles`, `ErrGitPathEscapesWorktree`, `ErrGitHookNotConfigured`, `ErrGitRepositoryRequired`, `ErrGitProviderNotFound`) with error-builder hints and exit-code mapping. Git stderr streams to the masked writer and is never embedded in error chains.
- Docs & example: command pages under `website/docs/cli/commands/git/`, `git` configuration reference, hook kind docs, changelog blog post (`atmos-gitops`), roadmap milestone (CI/CD Simplification initiative), and a GitOps publishing demo at `examples/gitops` (reconcile → review → publish against a managed deployment repo via custom commands).
What this is — and isn't
Atmos owns the publishing side of GitOps: render → diff → commit → push, with centralized safety rules. Reconciliation stays with the consumer — Argo CD or Flux pulls from the repository, or CI applies on merge. There are no agents and no drift-correction loop in Atmos itself (explicit non-goal in the PRD); Atmos is the producer feeding the reconciler. This also isn't a replacement for the existing GitHub Actions plan/apply integration — it's the Git plumbing those pipelines use.
## why
GitOps workflows have always needed glue: ad hoc scripts to render manifests into deployment repos, commit them, survive push races, and wire credentials. Atmos already owns rendering, lifecycle events, toolchain, and credentials (GitHub STS) — this PR gives it the Git operations between them, with centralized safety rules instead of per-pipeline shell scripts. It is the foundation for Kubernetes deployment-repository provisioning (Argo CD / Flux rendered-manifest publishing, on the kubernetes component branch) and a future github provider for pull-request-based publishing to protected branches.
## references

- PRD: `docs/prd/git-ops.md` (in this PR)
- Coverage: `pkg/git` 86%, `pkg/git/providers/cli` 88%, `pkg/hooks/kinds/git` 94%, `cmd/git` 81%
- Related: native CI cache (XDG-root archiving) and the Kubernetes component branch (consumes `provision.git` next)
🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes
- New Features
  - Added the experimental `atmos git` command group: `clone`, `pull` (fast-forward-only), `status`, `diff`, `commit`, `push` (retries on contention), `clean`, `list`, `init`, plus `hooks` (`install`, `run`, `uninstall`).
  - Introduced managed Git repositories in `atmos.yaml` (`git.repositories`) with deterministic workdirs, `git::` URI query params (`ref`, `depth`), CI no-arg checkout, and concurrent `--all` operations.
  - `atmos git list` now supports configurable columns/formatting and optional status probing.
- Documentation
  - Added/updated CLI, configuration, GitOps, and hook documentation for the new Git surfaces.
feat: support dotenv files in !include @osterman (#1930)
## Summary

Adds explicit dotenv file support to the existing `!include` YAML function. Dotenv files now resolve to maps, so they can be used directly in CLI and stack `env` sections and with YAML merge keys.

```yaml
env:
  <<: !include .env
  AWS_REGION: us-east-2
```

Dotenv files can also be layered with YAML merge sequences. This uses YAML's `<<` merge-key syntax, the same YAML mechanism commonly used with anchors and aliases:

```yaml
env:
  <<:
    - !include .env.local
    - !include .env
  AWS_REGION: us-east-2
```

YAML merge sequence precedence is "earlier item wins", and inline keys under `env` override all merged values.
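The precedence rule can be modeled in a few lines of Go. This is only a sketch of the merge-key semantics described above, not the Atmos YAML processor; `mergeEnv` and the sample values are made up for illustration.

```go
package main

import "fmt"

// mergeEnv (hypothetical) applies YAML merge-key precedence: inline keys
// override every merged map, and among merged maps the earlier one wins.
func mergeEnv(inline map[string]string, merged ...map[string]string) map[string]string {
	out := make(map[string]string)
	// Walk merged maps from last to first so earlier maps overwrite later ones.
	for i := len(merged) - 1; i >= 0; i-- {
		for k, v := range merged[i] {
			out[k] = v
		}
	}
	// Inline keys are applied last and therefore always win.
	for k, v := range inline {
		out[k] = v
	}
	return out
}

func main() {
	envLocal := map[string]string{"API_URL": "http://localhost:8080"} // stands in for .env.local
	envBase := map[string]string{                                      // stands in for .env
		"API_URL":    "https://api.example.com",
		"AWS_REGION": "us-west-2",
		"LOG_LEVEL":  "info",
	}
	inline := map[string]string{"AWS_REGION": "us-east-2"}

	env := mergeEnv(inline, envLocal, envBase)
	fmt.Println(env["API_URL"], env["AWS_REGION"], env["LOG_LEVEL"])
}
```

Here `API_URL` comes from the earlier `.env.local` stand-in, `AWS_REGION` from the inline key, and `LOG_LEVEL` falls through to the `.env` stand-in.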
## What Changed

- Parse `.env`, `.env.*`, and exact `*.env` filenames as dotenv files when used with `!include`
- Support `env: !include .env` and `env: { <<: !include .env }` / block merge forms in stack config
- Support dotenv `!include` in `atmos.yaml` `env`, including merge sequences for layered dotenv files
- Preserve `!include.raw` behavior for raw file contents
- Keep `.envrc` and `foo.env.local` unsupported/raw; Atmos does not auto-load or execute dotenv files
- Preserve YAML custom tags during schema validation so `env: !include .env` satisfies stack manifest schema rules
- Update the stack manifest JSON schema description for `env` to document the `!include` string form
- Document dotenv includes in both CLI `env` and stack `env` docs, including YAML merge-key behavior, include path resolution, and layered files
- Add a short blog post for explicit dotenv inclusion
- Add a roadmap milestone entry for the shipped dotenv `!include` support
- Add coverage-focused tests for dotenv merge-key retry handling, include path helpers, case-preservation helpers, and YAML custom-tag conversion
- Harden the LocalStack demo provider config to use the local edge endpoint directly, path-style S3, and skip AWS account-ID discovery so Terraform does not hang before reaching LocalStack in CI
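The filename rules in the list above (`.env`, `.env.*`, and exact `*.env` are dotenv; `.envrc` and `foo.env.local` are not) can be expressed as a small predicate. The sketch below is an interpretation of those rules, with `isDotenvFile` as a hypothetical name; the actual matching code in Atmos may differ in edge cases.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// isDotenvFile (hypothetical) reports whether a path's base name looks like
// a dotenv file under the rules listed in the release notes.
func isDotenvFile(path string) bool {
	base := filepath.Base(path)
	switch {
	case base == ".env": // plain .env
		return true
	case strings.HasPrefix(base, ".env."): // .env.local, .env.production, ...
		return true
	case strings.HasSuffix(base, ".env"): // exact *.env, e.g. prod.env
		return true
	}
	return false
}

func main() {
	for _, name := range []string{".env", ".env.local", "prod.env", ".envrc", "foo.env.local"} {
		fmt.Printf("%-14s %v\n", name, isDotenvFile(name))
	}
}
```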
## Tests

- `cd examples/demo-localstack && ATMOS_IDENTITY=false go run ../.. describe component demo -s dev --format json --logs-level Off | jq '.providers.aws'`
- `cd examples/demo-localstack && ATMOS_IDENTITY=false go run ../.. validate stacks --logs-level Off`
- `go test ./pkg/config ./pkg/validator ./pkg/filetype`
- `go test ./internal/exec -run 'TestGenerateProviderOverrides|TestGenerateProviderOverridesForAliases|TestProcessStackConfigProviderSection'`
- `go test ./pkg/config ./pkg/validator -coverprofile=.context/dotenv-include-coverage.out`
- `go test ./pkg/utils -run 'TestInclude(Dotenv|ExtensionBased|RawFunction|WithNoExtension)'`
- `node -e "import('./website/src/data/roadmap.js').then(() => console.log('roadmap import ok'))"`
- `git diff --check`
- Real stack manifest schema regression: `env: !include .env` validates against `tests/fixtures/schemas/atmos/atmos-manifest/1.0/atmos-manifest.json`
- Commit hooks passed: go-fumpt, Go build, go mod tidy, golangci-lint, whitespace/EOF/large-file checks

Closes DEV-2990
feat(ci): GitHub Actions build cache (atmos ci cache) @osterman (#2579)
## what

- Add a CI build cache that restores the well-known Atmos cache root (`~/.cache/atmos`: toolchain binaries, vendored components, remote import clones, provider/plugin caches) at startup and saves it at exit, using the same store `actions/cache` uses (GitHub Actions Cache Service v2).
- New `atmos ci cache` subcommands: `restore`, `save`, `list`, `delete`, so the lifecycle can run in one invocation or be spread across CI steps.
- New `ci.cache` configuration block (`enabled`, `auto: off|restore|save|both`, `root`, `paths`, `key`, `restore_keys`, `compression`) with `ATMOS_CI_CACHE_*` env overrides.
- Model it as a CI-provider capability (`provider.CacheProvider` + `ci.DetectCache()`) with a backend registry (`pkg/ci/cache`) and a GitHub Actions implementation (`pkg/ci/cache/github`), mirroring the existing artifact subsystem; outside a runner it's a clean no-op.
- Consolidate the default toolchain install path under the XDG cache root (`~/.cache/atmos/toolchain`) so a single cache captures it; add a PRD, command/config docs, blog post, and roadmap entry.
## why

- In CI, every job re-downloads the toolchain, providers, and modules from upstream, wasting time and bandwidth and exposing runs to transient and rate-limit failures. Persisting the cache root across jobs makes executions faster and more reliable, and reduces supply-chain exposure.
- Teams otherwise hand-wire an `actions/cache` step and own the `key`/`path` logic themselves; Atmos already knows its cache root and can derive a stable key from `toolchain.lock.yaml` + OS/arch, so it's two settings to enable.
- Cache entries are write-once; a per-run state marker makes automatic and manual usage idempotent (an exact-key hit on restore skips the save), so the same operations work whether triggered automatically or via the subcommands.
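As an illustration of deriving "a stable key from `toolchain.lock.yaml` + OS/arch", the Go sketch below hashes the lock file contents and combines the digest with the platform. The key layout (`atmos-<os>-<arch>-<hash>`), the 16-character truncation, and the sample lock contents are all assumptions for this example; the real key template is defined by the `ci.cache` configuration and is not reproduced here.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"runtime"
)

// cacheKey (hypothetical) builds a deterministic cache key from the lock
// file contents and the target OS/arch. The format is illustrative only.
func cacheKey(lockFile []byte, goos, goarch string) string {
	sum := sha256.Sum256(lockFile)
	return fmt.Sprintf("atmos-%s-%s-%s", goos, goarch, hex.EncodeToString(sum[:])[:16])
}

func main() {
	// Made-up lock contents; a real run would read toolchain.lock.yaml from disk.
	lock := []byte("tools:\n  opentofu: 1.9.0\n")
	fmt.Println(cacheKey(lock, runtime.GOOS, runtime.GOARCH))
}
```

Because the key changes only when the lock file or platform changes, write-once cache entries stay valid across jobs until the pinned tools change.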
## references

- PRD: `docs/prd/native-ci/framework/ci-cache.md`
- Docs: `/cli/commands/ci/cache` and `/cli/configuration/ci/cache`
- GitHub Actions Cache Service v2 (the store `actions/cache` uses)
Summary by CodeRabbit
- New Features
  - Added native CI build caching: `atmos ci cache` group with `paths`, `restore`, `save`, `list`, and `delete`, including GitHub Actions-backed caching, admin list/delete, and template-based key/restore-key generation.
  - Automatic restore-on-start and save-on-exit when enabled and cache-capable; provider capability is respected outside supported CI.
- Documentation
  - New/updated docs for CLI commands, `ci.cache` configuration, PRD/blog, and supporting GitHub Actions.
- Tests
  - Expanded unit/integration coverage for archive safety, key/config resolution, backend behavior, manager lifecycle, and CLI output.
- Chores
  - Updated acceptance caching, snapshots/docs, and aligned toolchain default install path with XDG cache.
🚀 Enhancements
fix(flags): scope --skip-hooks to the terraform command subtree @osterman (#2578)
## what

- Scope `--skip-hooks` to the terraform command subtree. The flag (and `ATMOS_SKIP_HOOKS`) moved off the global flag set onto `atmos terraform` and its subcommands, so it no longer appears in the help of unrelated commands (`auth`, `helmfile`, `atlantis`, `toolchain`, `about`, `secret`, …). Lifecycle hooks only ever run on `terraform plan`/`apply`/`deploy`.
- Stop tracking native-ci CI scratch output. `tests/fixtures/scenarios/native-ci/{github-output,github-step-summary}.txt` are runtime artifacts; gitignored and untracked (matching the newer `native-ci-gha-plan` scenario).
- Standardize the CLI test suite on OpenTofu. The suite forces `ATMOS_COMPONENTS_TERRAFORM_COMMAND=tofu` via a single test-harness default, gates every binary-invoking test on a precondition so a missing binary skips cleanly (instead of baking "executable file not found" into goldens), and sanitizes the harness-injected env var out of debug snapshots. A small parity set (`terraform -help`/`-version` passthrough) opts back into `terraform`.
- Provision test tooling via the Atmos toolchain (dogfooding). `TestMain` installs any missing pinned binary (terraform/tofu/packer/helmfile/helm) through the Atmos toolchain itself and prepends it to `PATH` ("install as necessary"), so CI (which supplies them via `setup-*` actions) downloads nothing while local runs become self-contained. No host binaries (brew, etc.) required.
## why

- `--skip-hooks` on every command was misleading: hooks only run on terraform. Mirrors the existing `--github-token`/toolchain scoping precedent.
- The native-ci scratch files were tracked, so every local run without terraform dirtied them. They're CI artifacts, not fixtures.
- Test runs depended on whatever terraform/tofu binary was on the host; a missing binary silently corrupted golden snapshots and tracked fixtures. Standardizing on a single, license-clean (MPL) OpenTofu, with explicit preconditions, makes the suite deterministic and host-independent. The product runtime default stays `terraform`; only tests change.
- Provisioning tools through the toolchain dogfoods the feature and removes the dependency on host-installed binaries, so the suite runs the same way everywhere.
## references

- Follows the `--github-token`/toolchain flag-scoping precedent in `pkg/flags/global_builder.go`.
Summary by CodeRabbit
- New Features
  - Added `--skill` flag for AI context features across CLI commands (requires `--ai`).
- Changes
  - Moved `--skip-hooks` from global flags to the `atmos terraform` command flags.
  - `--skip-hooks` applies to Terraform subcommands (`plan`/`apply`/`deploy`) and supports both no-value usage and comma-separated hook-name selection.
- Documentation
  - Added/updated `--skip-hooks` documentation under Terraform command usage.
  - Removed `--skip-hooks` and `ATMOS_SKIP_HOOKS` from core global flag/environment variable references; updated hooks documentation accordingly.
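The two usage modes of `--skip-hooks` noted above (no value, or a comma-separated list of hook names) could be modeled roughly as follows. This Go sketch assumes the flag parser reports separately whether the flag was present and what value, if any, it carried; `parseSkipHooks` and the hook names in `main` are hypothetical, and the real flag wiring in Atmos is not shown.

```go
package main

import (
	"fmt"
	"strings"
)

// skipHooks (hypothetical) records which hooks a --skip-hooks value disables.
type skipHooks struct {
	all   bool
	names map[string]bool
}

// parseSkipHooks interprets the flag: absent means skip nothing, present
// with no names means skip everything, otherwise skip only the listed hooks.
func parseSkipHooks(present bool, value string) skipHooks {
	s := skipHooks{names: map[string]bool{}}
	if !present {
		return s
	}
	for _, name := range strings.Split(value, ",") {
		if name = strings.TrimSpace(name); name != "" {
			s.names[name] = true
		}
	}
	s.all = len(s.names) == 0
	return s
}

// skips reports whether the named hook should be skipped.
func (s skipHooks) skips(hook string) bool { return s.all || s.names[hook] }

func main() {
	all := parseSkipHooks(true, "")
	some := parseSkipHooks(true, "publish, notify")
	fmt.Println(all.skips("publish"), some.skips("notify"), some.skips("store-outputs"))
}
```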
fix(toolchain): retry cosign verification on transport-level network errors @osterman (#2604)
## what

- Add a `transportFlakeMarkers` allowlist to the cosign retry classifier (`pkg/toolchain/verification/signature_rekor.go`) so transport-level network errors are retried like other transient Sigstore Rekor flakes:
  - `stream error: stream ID` (Go `net/http2` stream errors; covers all HTTP/2 error codes and both send/recv variants)
  - `connection reset by peer`
  - `TLS handshake timeout`
  - `i/o timeout`
  - `unexpected EOF`
- Extend `TestClassifyCosignError` with the exact error observed in CI plus one case per new marker, and add `TestRunCosignWithRetry_RecoversFromTransportFlake` covering end-to-end retry recovery.
## why

CI failed on `TestToolchainCustomCommands_InstallAllTools/Install_tofu` while `toolchain install opentofu/opentofu@1.9.0` was verifying the download signature. Cosign's query to the Sigstore Rekor transparency log died with:
searching log query: stream error: stream ID 1; INTERNAL_ERROR; received from peer
Atmos already retries cosign flakes (`runCosignWithRetry`, 5 attempts with exponential backoff), but the retryable classification is a deliberate allowlist that only recognized Rekor HTTP response flakes (`searchLogQueryBadRequest`, the `IEEE_P1363` decode error, and 5xx scoped to the tlog retrieve endpoint). An HTTP/2 transport error matched none of the markers, so it surfaced on the first attempt with no retry.
Broadening to transport-level failures is safe within the allowlist's design rule: the allowlist exists so a real signature verdict (tampering, identity mismatch, expired cert) is never silently retried away. A transport failure means the request never completed and no verdict was rendered, so retrying it categorically cannot mask tampering. Existing negative tests (tampered artifact, identity mismatch, generic failure) continue to assert those still fail on the first attempt.
## references

- Observed failure: Acceptance Tests (linux), `TestToolchainCustomCommands_InstallAllTools/Install_tofu`
Summary by CodeRabbit
- Bug Fixes
- Signature verification now automatically retries on transient network/transport failures (e.g., HTTP/2 stream errors, connection resets, TLS handshake/timeouts, I/O timeouts, unexpected EOF), improving reliability during temporary infrastructure disruptions.
- Tests
- Added tests that validate retry behavior and recovery from transport-layer flakes.