github cloudposse/atmos v1.217.0

docs(roadmap): curate featured; drop internal-refactor changelog posts @osterman (#2384)

what

  • Cap featured[] in website/src/data/roadmap.js at 6 curated strategic initiatives. Drop devcontainer, workflows, instance-status-upload, and chunked-stack-uploads. Final 6: atmos-ai, cloud-auth, native-ci, pro-commit, source-provisioning, toolchain.
  • Add equivalent milestones to the ci-cd initiative for the two demoted Atmos Pro items so their changelogs stay reachable from the roadmap. Recalc ci-cd.progress 89 → 92.
  • Delete three internal-only refactor blog posts and their corresponding quality initiative milestones: process-args-flags-refactor, refactoring-executeterraform-for-testability, describe-stacks-complexity-reduction. Recalc quality.progress 86 → 75.
  • Update .claude/agents/roadmap.md with two new rules: (1) featured[] is manually curated, max 6, edited only when the user explicitly asks; (2) internal-only refactors with no user-visible change do not get changelog posts. Adds matching schema docs and quality-check items.

why

  • The featured section had drifted into a per-release announcement feed — every minor Atmos Pro plumbing improvement (chunked uploads, instance status, etc.) was rendering at the top of /roadmap next to transformative initiatives like Atmos AI and Cloud Auth. That diluted its meaning.
  • The roadmap maintainer agent had no documented rule for featured[], so it was being modified on every release. Codifying "max 6, opt-in only" stops the drift at the source.
  • Internal refactor posts (cyclomatic complexity reductions, function decomposition) are engineering wins but produce zero user-visible change. They belong in PR descriptions and git log, not the user-facing changelog.

references

  • No issue tracker reference.
  • no-release — content/data only; no Go code, no user-visible CLI behavior change. Removing already-published changelog entries that should not have been published.

Summary by CodeRabbit

  • Documentation

    • Removed three technical blog posts documenting internal refactors
    • Clarified roadmap maintenance guidance: changelog should omit internal-only refactors; featured entries are curated with a hard cap of 6 and should not be modified unless explicitly requested
  • Chores

    • Reorganized featured initiatives and adjusted roadmap milestone tracking
    • Updated CI/CD progress to 92% and Quality progress to 75%
    • Updated NOTICE with concrete upstream license URLs
  • Quality

    • Added checks to prevent improper featured changes and to omit internal refactors from the changelog
docs: document the component `provision:` block (provision.backend, provision.workdir) @osterman (#2378)

what

  • Add a new stack-config schema page at website/docs/stacks/components/provision.mdx that documents the entire provision: block as a coherent feature, with sections for provision.backend.enabled (terraform-only), provision.workdir.enabled (all four toolchains), toolchain defaults, component-level overrides, and global defaults via settings.provision.workdir.{enabled,ttl} in atmos.yaml.
  • Add a :::tip callout to website/docs/stacks/components/terraform/backend.mdx clarifying that backend: (where state is stored) is distinct from provision.backend: (auto-create that location).
  • Cross-link website/docs/components/terraform/backend-provisioning.mdx to the new schema page so the conceptual deep-dive points at the schema reference.

why

  • The provision: block (with provision.backend.enabled and provision.workdir.enabled) is functional and used in fixtures, but had no dedicated documentation page in Stack Configuration. The only references were a CLI command page (atmos terraform workdir), a passing mention in cli/configuration/components/terraform.mdx, and the backend-provisioning conceptual page — none of which document the schema directly.
  • Closes a discoverability gap: a user reviewing the components sidebar saw entries for *.metadata, ansible, helmfile, packer, terraform/backend and noticed provision was missing entirely.
  • The added :::tip on the backend page resolves long-standing confusion between the backend: block (state location) and the provision.backend: block (whether to bootstrap that location).

references

  • Schema source: pkg/schema/schema.go:402-414 (ProvisionWorkdirSettings), pkg/provisioner/workdir/types.go:57-61 (WorkdirConfig), pkg/provisioner/backend_hook.go:111-125 (provision.backend.enabled).
  • Canonical fixture: tests/fixtures/scenarios/workdir/stacks/catalog/workdir-defaults.yaml.
  • Verified with cd website && npm run build (zero broken links; new page registered as the 544th content route).

Summary by CodeRabbit

  • New Features

    • Added list: CLI configuration for customizable components, instances, and stacks output
    • Added global settings.provision and provision.workdir docs for workdir defaults/TTL
  • Documentation

    • Added component provisioning docs (backend + workdir) and relocated backend provisioning links/site redirects
    • Added telemetry configuration page with privacy guarantees
    • Expanded VCS token injection docs; Bitbucket username standardized to BITBUCKET_USERNAME
  • Chores

    • Updated NOTICE license URL entries to Unknown
docs: refresh CI page with native CI workflows; deprecate legacy GH Actions @osterman (#2373)

what

  • Move six legacy Cloud Posse GitHub Action docs (affected-stacks, atmos-terraform-plan/apply, drift detection/remediation, and the index) from website/docs/integrations/github-actions/ to website/docs/deprecated/github-actions/, with :::warning Deprecated banners on each page pointing readers to /ci.
  • Rebuild website/docs/ci/ci.mdx around the two reference repos cloudposse-examples/atmos-native-ci and cloudposse-examples/atmos-native-ci-advanced — concrete excerpts for deploy-on-PR, deploy-on-merge, preview cleanup, and an atmos describe affected --format=matrix fan-out, plus a discreet pointer to the deprecated content.
  • Add client-redirects from all six legacy /integrations/github-actions/* URLs to /ci, add a collapsed "Deprecated" sub-category at the bottom of the Resources sidebar, keep setup-atmos and component-updater in the (now smaller) GitHub Actions sidebar entry, and repoint cross-links across docs, two blog posts, and the roadmap.

why

  • Atmos now ships native CI integration (job summaries, output variables, status checks, planfile storage) directly in the CLI, so atmos terraform plan/apply/deploy already produces the artifacts the wrapper actions used to provide — the legacy actions are no longer the recommended path for new projects.
  • The previous structure buried native CI behind a single page while giving the legacy actions a first-class sidebar section, conflicting with the recommendation in the legacy index page itself; this PR aligns navigation with the recommended path and makes the deprecated material reachable but de-emphasized.

references

Summary by CodeRabbit

  • New Features

    • describe affected --format=matrix now auto-writes matrix output to $GITHUB_OUTPUT in CI when enabled (explicit --output-file still wins); workflow examples updated.
  • Documentation

    • Added a comprehensive "Native CI for GitHub Actions" guide, updated many docs and blog posts to use native CI wording, and published a blog post about the matrix auto-output behavior.
    • Marked legacy GitHub Actions wrapper actions/pages as deprecated with warning guidance.
  • Chores

    • Added legacy URL redirects and a new "Deprecated" docs section.
feat(list): --process-templates and --process-functions flags; fix list instances --upload auth @aknysh (#2363)

what

  • Added --process-templates and --process-functions CLI flags (and ATMOS_PROCESS_TEMPLATES / ATMOS_PROCESS_FUNCTIONS env vars) to every atmos list subcommand that processes stack manifests: list instances, list components, list metadata, list sources, list stacks. Defaults are true, matching atmos describe affected / atmos describe stacks / atmos describe component.
  • Clarified the flag descriptions that used to conflate YAML functions with Go template functions. --process-templates toggles Go templates (including atmos.Component(...)); --process-functions toggles YAML functions (!terraform.state, !terraform.output, !store, !aws.*, …).
  • Fixed the underlying atmos list instances --upload hang in CI: per-component auth resolution in internal/exec/describe_stacks_component_processor.go was gated on processYamlFunctions only, so the template-only path (atmos.Component(...) inside Go templates) ran terraform init with an empty AuthContext against remote backends and failed with No valid credential sources found. Guard now fires when either templates or YAML functions will run.
  • Refactored the per-component auth resolver for testability: extracted shouldResolvePerComponentAuth(...) predicate, resolveComponentAuthManager(...) method, and an injectable componentAuthManagerResolver field on describeStacksProcessor so the decision can be exercised without running real OIDC/STS.
  • Threaded the two flags through InstancesCommandOptions / MetadataOptions in pkg/list/ and through both the matrix-format and tree-format branches of list_instances.go, so every output path of the same invocation honors the same flag values.
  • Added three layers of regression coverage for each command that just got the flags (parser wiring, options struct, flag propagation to ExecuteDescribeStacks) plus a dedicated auth-guard regression suite (TestShouldResolvePerComponentAuth, TestResolveComponentAuthManager 6-row table, TestResolveComponentAuthManager_ResolverErrorFallsBackToParent).
  • Documented the two flags on every affected atmos list command page, added a blog post announcing the feature, and added a shipped milestone to the Discoverability & List Commands roadmap initiative.
  • Bumped Go modules to latest where compatible (aws-sdk-go-v2/service/s3 → 1.100.0, smithy-go → 1.25.1, anthropic-sdk-go → 1.38.0, hashicorp/terraform-exec → 0.25.1, posthog-go → 1.12.1, k8s.io/client-go → 0.36.0, plus many transitive indirects). Three transitive pins remain, now documented inline in go.mod: sentry-go v0.45.1 (cockroachdb/errors v1.12.0 still references the removed Extra field), gocloud.dev v0.41.0 (gomplate/v3 s3blob uses removed ConfigProvider), hairyhenderson/go-fsimpl v0.3.1 (transitive via the gocloud.dev pin).

why

  • atmos list instances --upload was broken in CI for any repo whose component sections call atmos.Component(...) inside Go templates with a stack-level default identity — the exact shape used by the Atmos Pro release workflow. Users reported the command failing with No valid credential sources found while atmos describe affected --upload in the same workflow succeeded.
  • Root cause: atmos.Component(...) is a Go template function, not a YAML function. The processor's per-component auth resolver assumed YAML functions were the only consumer of info.AuthContext and gated itself on processYamlFunctions. The template path reads the same AuthContext and shells out to terraform init + terraform output, so disabling per-component auth broke template-only invocations.
  • Users expected atmos list flags to line up with atmos describe flags. They didn't: only list affected, list settings, and list values had the two knobs. A user workflow actually relied on --process-functions on list instances (where it didn't exist), which produced an unknown flag error and a confusing escape hatch. Adding the two flags everywhere the command processes stacks closes that gap.
  • The flag rollout intentionally defaults both flags to true for parity. Users who run atmos list locally without tofu / terraform on $PATH can opt out with --process-functions=false or ATMOS_PROCESS_FUNCTIONS=false; the auth-guard fix above ensures the true, true default works end-to-end in CI.
  • Module update was due. The three remaining pins are annotated so the next go get -u ./... pass doesn't trip over them blindly.

references

  • Fix design doc: `docs/fixes/2026-04-24-list-instances-per-component-auth.md`
  • Blog post: `website/blog/2026-04-24-list-process-flags.mdx`
  • Roadmap milestone: `website/src/data/roadmap.js` (Discoverability & List Commands initiative)
  • Previous related fix: `docs/fixes/2026-04-08-atmos-auth-identity-resolution-fixes.md` (Category A vs B caller split that this change builds on)

Summary by CodeRabbit

  • New Features

    • Added --process-templates and --process-functions flags to list subcommands to control Go template vs YAML function processing (both default to enabled).
  • Bug Fixes

    • Restored per-component authentication resolution when templates are processed, fixing upload failures in CI.
  • Documentation

    • Updated CLI docs, added a blog post and roadmap entry describing the new flags and examples.
  • Tests

    • Extensive new and updated unit/integration tests covering flag parsing, behavior permutations, and regressions.
  • Chores

    • Updated NOTICE/license references, added missing license entries, bumped dependencies and example default version to 1.217.0.
Document CI statuses configuration options @goruha (#2362)

what

  • Document CI statuses configuration options

why

  • Improve documentation

Summary by CodeRabbit

  • Documentation
    • Added docs and example configuration for new CI post-commit status summary options: component, add, change, and destroy (flags default to true in the example).
    • Clarified required permissions to enable these status checks (GitHub checks: write or a commit-status-scoped API token for GitLab).

🚀 Enhancements

fix(jit): honor metadata.component subpath for JIT source-provisioned components @zack-is-cool (#2371)

What

JIT-provisioned components can now point at a submodule inside a cloned upstream repo via metadata.component, the same way non-JIT components already can. Every JIT-capable code path is covered: terraform plan/apply, terraform generate varfile, terraform shell, helmfile, packer, and ansible. The resolver now lives in pkg/component/ rather than being curve-fitted to Terraform inside internal/exec/.

Before this PR, metadata.component: modules/iam-policy was silently ignored on the JIT/workdir code path — atmos cloned the repo to .workdir/<type>/<stack>-<component>/ and ran the underlying tool against that root, so generated files (backend.tf.json, varfile, .terraform/, helmfile state, packer cache, ansible inventory) all landed at the repo root instead of at .workdir/<type>/<stack>-<component>/modules/iam-policy/. The tools then either failed with confusing errors or silently ran against the wrong directory (e.g. a repo root with no .tf files).

Fixes #2364.

Why this matters

Some upstream repos organize modules under modules/<name>/ rather than at the repo root (e.g. terraform-aws-modules/terraform-aws-iam). For JIT to be useful against those repos, atmos needs to clone the whole repo into a workdir (so relative parent-path references like ../../shared-vars.tf resolve) and then run the tool against a specific submodule inside it. Non-JIT components already support this via metadata.component; this PR brings JIT/non-JIT parity for that capability and applies it uniformly across every executor.

What changed (revised after maintainer review)

This PR was originally scoped to Terraform with helpers in internal/exec/. Maintainer feedback on the first review was clear: the helpers belong in pkg/, and the same fix should apply to Helmfile/Packer/Ansible — not be curve-fitted to Terraform.

Both items are addressed in this revision.

New package: pkg/component/

Helpers extracted into a single component-type-parameterized package, used by all four executors:

```go
package component

// Pure path logic (no I/O outside stat).
func ResolveWorkdirSubpath(metadataSubpath, workdirRoot string) (string, error)

// In-place mutation of WorkdirPathKey, idempotent via private sentinel.
func ApplyWorkdirSubpathToSection(info *schema.ConfigAndStacksInfo) (string, error)

// Post-ProcessStacks resolver: BuildPath(componentType) + Resolve.
func BuildAndResolveWorkdirPath(
    atmosConfig *schema.AtmosConfiguration,
    info *schema.ConfigAndStacksInfo,
    componentType string,
) (string, bool, error)

// Full orchestrator: existence check → AutoProvisionSource → Apply → re-check.
// componentType is one of cfg.{Terraform,Helmfile,Packer,Ansible}ComponentType.
func ProvisionAndResolveComponentPath(
    ctx context.Context,
    atmosConfig *schema.AtmosConfiguration,
    info *schema.ConfigAndStacksInfo,
    componentType, fallbackComponentPath string,
) (string, bool, error)
```

All five executor entry points collapse to one call

| Entry point | Before | After |
|---|---|---|
| internal/exec/terraform_execute_helpers.go (plan/apply/etc.) | provisionComponentSource(...) (terraform-only helper) | component.ProvisionAndResolveComponentPath(ctx, ..., cfg.TerraformComponentType, ...) |
| internal/exec/terraform_generate_varfile.go | tryJITProvision(...) + private checkDirectoryExists (terraform-only reimpl, subpath bolted on) | component.ProvisionAndResolveComponentPath(ctx, ..., cfg.TerraformComponentType, ...) |
| internal/exec/helmfile.go (prev. lines 121-155) | 35 lines of inline existence check + provSource.AutoProvisionSource + raw WorkdirPathKey lookup (subpath ignored) | component.ProvisionAndResolveComponentPath(ctx, ..., cfg.HelmfileComponentType, ...) |
| internal/exec/packer.go (prev. lines 121-155) | Same pattern as helmfile (subpath ignored) | component.ProvisionAndResolveComponentPath(ctx, ..., cfg.PackerComponentType, ...) |
| pkg/component/ansible/executor.go (prev. lines 380-429) | Same pattern as helmfile/packer (subpath ignored) | component.ProvisionAndResolveComponentPath(ctx, ..., cfg.AnsibleComponentType, ...) |

Helmfile, Packer, Ansible, and terraform generate varfile silently inherited the same #2364 bug; this PR fixes them at the same time as Terraform plan/apply, with zero curve-fitting. terraform shell and the post-ProcessStacks resolvers (terraform plan-diff, terraform verify-plan) call component.ApplyWorkdirSubpathToSection and component.BuildAndResolveWorkdirPath respectively for the same reason.

Adjacent behavior change: JIT runs whenever source.uri is set

Before this PR, helmfile/packer/ansible/terraform generate varfile only invoked AutoProvisionSource when the local fallback component dir was missing. Ansible and terraform generate varfile additionally short-circuited the moment that dir existed, never running JIT. After the refactor, all five entry points (terraform plan/apply, terraform generate varfile, helmfile, packer, ansible) take the same path: when source.uri is declared, the source provisioner runs unconditionally, and only the YAML's source.uri decides whether JIT is in play.

This is safe under steady-state operation because AutoProvisionSource already self-debounces via two cache layers — invocationDoneKey (no-ops a second call within the same command lifecycle) and needsProvisioning (skips re-provisioning when the version, URI, and freshness pin all match) — both in pkg/provisioner/source/provision_hook.go. Net effect for users: every JIT-capable entry point now honors source.uri the same way terraform plan/apply always has, and the previously preferred lazy-skip-on-stale-local-dir path is gone.
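The two debounce layers can be illustrated with an in-memory Go model. Only the roles of invocationDoneKey (per-invocation no-op) and needsProvisioning (version/URI/freshness-pin comparison) come from the PR. The provisionState struct, the map-based caches, and the clone counter are hypothetical stand-ins for whatever pkg/provisioner/source/provision_hook.go actually stores.

```go
package main

import "fmt"

// provisionState is what this sketch compares between runs.
type provisionState struct {
	URI, Version, FreshnessPin string
}

// sourceProvisioner models the two cache layers (hypothetical names and storage).
type sourceProvisioner struct {
	invocationDone  map[string]bool           // layer 1: reset on every command invocation
	lastProvisioned map[string]provisionState // layer 2: would be persisted metadata in practice
	clones          int                       // counts simulated git clones
}

func newSourceProvisioner() *sourceProvisioner {
	return &sourceProvisioner{
		invocationDone:  map[string]bool{},
		lastProvisioned: map[string]provisionState{},
	}
}

// startInvocation simulates a fresh command lifecycle.
func (p *sourceProvisioner) startInvocation() { p.invocationDone = map[string]bool{} }

// needsProvisioning is true when nothing was provisioned yet or the pin changed.
func (p *sourceProvisioner) needsProvisioning(key string, want provisionState) bool {
	have, ok := p.lastProvisioned[key]
	return !ok || have != want
}

// AutoProvision is called unconditionally when source.uri is set, but only does
// real work if neither cache layer short-circuits.
func (p *sourceProvisioner) AutoProvision(key string, want provisionState) {
	if p.invocationDone[key] {
		return
	}
	p.invocationDone[key] = true
	if !p.needsProvisioning(key, want) {
		return
	}
	p.clones++ // stand-in for the actual clone/copy
	p.lastProvisioned[key] = want
}

func main() {
	p := newSourceProvisioner()
	want := provisionState{URI: "github.com/cloudposse/terraform-null-label", Version: "0.25.0"}
	p.AutoProvision("dev-null-label", want)
	p.AutoProvision("dev-null-label", want) // same invocation: no-op
	p.startInvocation()
	p.AutoProvision("dev-null-label", want) // new invocation, same pin: no-op
	fmt.Println(p.clones)                   // 1
}
```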

Post-ProcessStacks resolvers also use the shared helper

internal/exec/terraform_plan_diff.go and internal/exec/terraform_verify_plan.go previously called the terraform-private resolveWorkdirComponentPath; both now call component.BuildAndResolveWorkdirPath(atmosConfig, info, cfg.TerraformComponentType).

Existence-gated subpath join (the disambiguation)

metadata.component has two valid uses for JIT components — a real subdirectory inside the cloned repo, or an inheritance/identity pointer to an abstract base. The fix consults the filesystem rather than guessing from the string. After clone, either the joined subdirectory exists (case 1, apply the join) or it doesn't (case 2, leave the workdir root alone). String-shape heuristics (e.g. checking for /) are unreliable; the filesystem already encodes the right answer after git clone.

An unexported workdirSubpathAppliedKey constant + private subpathAppliedMarker struct type (both in pkg/component/) guard against double-joining if the orchestrator is invoked twice against the same info.ComponentSection map. YAML deserialization can't produce this type, so a stack manifest containing _workdir_subpath_applied: <anything> cannot bypass the join. The constant lives in pkg/component/ rather than pkg/provisioner/workdir/ because read/write access is confined to this package — keeping the protocol single-sourced next to the only code that uses it.
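The unforgeable sentinel relies on a Go type assertion. A value decoded from YAML can be a bool, string, number, or map, but never an instance of an unexported struct type. In this sketch only the _workdir_subpath_applied key spelling comes from the PR. The workdir_path key name and applySubpathOnce are hypothetical, and the filesystem existence check is omitted to keep the focus on the once-only guard.

```go
package main

import (
	"fmt"
	"path/filepath"
)

const (
	workdirPathKey           = "workdir_path" // hypothetical key name
	workdirSubpathAppliedKey = "_workdir_subpath_applied"
)

// subpathAppliedMarker is unexported, so no YAML decoder can produce it.
type subpathAppliedMarker struct{}

// applySubpathOnce joins subpath onto the workdir path at most once per section.
func applySubpathOnce(section map[string]any, subpath string) bool {
	if _, done := section[workdirSubpathAppliedKey].(subpathAppliedMarker); done {
		return false
	}
	root, _ := section[workdirPathKey].(string)
	if root == "" || subpath == "" {
		return false
	}
	section[workdirPathKey] = filepath.Join(root, subpath)
	section[workdirSubpathAppliedKey] = subpathAppliedMarker{}
	return true
}

func main() {
	// A stack manifest tries to forge the sentinel with a plain bool.
	section := map[string]any{
		workdirPathKey:           "/work/terraform/dev-iam",
		workdirSubpathAppliedKey: true,
	}
	fmt.Println(applySubpathOnce(section, "modules/iam-policy")) // true: forged value ignored
	fmt.Println(applySubpathOnce(section, "modules/iam-policy")) // false: already applied
	fmt.Println(section[workdirPathKey])                        // /work/terraform/dev-iam/modules/iam-policy
}
```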

Error sentinel precision

Three sentinels carry distinct meaning across the orchestrator and its callers:

  • errUtils.ErrProvisionerFailed: AutoProvisionSource (the JIT hook) failed.
  • errUtils.ErrWorkdirProvision: path resolution, stat, or absolute-subpath rejection failed on the workdir path.
  • errUtils.ErrInvalidComponent: stat failed on the local component directory (the no-source fallback path).

Three related fixes during review:

  1. The first revision wrapped AutoProvisionSource failures with ErrWorkdirProvision, which conflicted with the established pattern in pkg/provisioner/registry.go, internal/exec/terraform_shell.go, and the (now-removed) ansible executor code, all of which used ErrProvisionerFailed. The wrap now uses ErrProvisionerFailed again, restoring ansible's existing semantics.
  2. The orchestrator's componentDirExists helper used to wrap every stat failure with ErrWorkdirProvision, including the !HasSource branch where the path is a local component directory, not a workdir. It now takes a sentinel parameter so the wrap matches the actual classification.
  3. terraform_shell.go no longer re-wraps an already-wrapped ErrWorkdirProvision with ErrProvisionerFailed; the original sentinel survives in the chain so errors.Is triage works correctly.

Design notes

.. in metadata.component is allowed. Many upstream Terraform modules reference shared files via relative parent paths (../../shared-vars.tf) and need the full repo on disk with the working directory at a subdirectory. Restricting to strict subpaths would break those layouts. Atmos's threat model assumes a trusted operator running atmos against their own stack configs — metadata.component is YAML-author-controlled, on par with !exec, !template, and !terraform.state, all of which can read or invoke arbitrary host resources. The godoc on ResolveWorkdirSubpath spells this out for future readers.

Absolute metadata.component is rejected. An absolute value violates the documented contract (metadata.component is a relative subpath inside the workdir). filepath.Join would silently coerce it into a child of the workdir root on Unix and apply drive-letter semantics on Windows — coercing it would mask author error. Rejected up-front with a wrapped ErrWorkdirProvision.

Same-name inheritance is a no-op. When metadata.component equals the component instance name (e.g. a component named vpc with metadata.component: vpc), atmos already clears the field during stack processing, so info.BaseComponentPath is empty and filepath.Join is never called.

Trade-off: typos fall through. A typo in metadata.component (e.g. modules/iam-polic when the user meant modules/iam-policy) silently falls back to the workdir root rather than failing fast. This matches pre-PR behavior for invalid subpaths and is logged at debug level for traceability. A future enhancement could surface a warning when the join falls back, distinguishing typos from intentional inheritance-pointer use.

What is not changed

The orchestrator short-circuits at !provSource.HasSource(...), so non-source components never reach the new code. Behavior of each non-JIT shape:

| Component shape | Effect |
|---|---|
| Plain local (no source, no workdir) | Zero change: the orchestrator returns the fallback path immediately. |
| Workdir-only + metadata.component | Zero functional change: workdir-only copies local files to the workdir root, so the candidate <workdirRoot>/<subpath> doesn't exist on disk; the resolver returns exists=false and the original componentPath is preserved. |
| Workdir-only without metadata.component | Small related fix in plan-diff and verify-plan only: these two commands now resolve to the provisioned workdir root (where state actually lives) instead of falling back to the local component path. Live execution paths were already correct. |

Other deferred scope:

  • atmos terraform generate planfile — this command builds componentPath directly from atmosConfig.TerraformDirAbsolutePath + info.FinalComponent and never invokes JIT provisioning. As a result, generate planfile does not support JIT components today (with or without a metadata.component subpath) — a pre-existing limitation, not a regression introduced by this PR. Wiring up JIT provisioning there is out of scope.

Tests

Unit (pkg/component/workdir_path_test.go):

| Test | Covers |
|---|---|
| TestResolveWorkdirSubpath_JoinedPathExists | Joined path is used when the subpath is a directory on disk |
| TestResolveWorkdirSubpath_JoinedPathMissingFallsBack | Falls back to the workdir root when the subpath does not exist (inheritance-pointer case) |
| TestResolveWorkdirSubpath_EmptySubpathReturnsRoot | Empty metadata.component short-circuits to the root |
| TestResolveWorkdirSubpath_AllowsParentSegment | .. resolves naturally; codifies the design decision |
| TestResolveWorkdirSubpath_RejectsAbsolutePath | Absolute subpath wraps ErrWorkdirProvision |
| TestResolveWorkdirSubpath_RejectsAbsolutePathOutsideWorkdir | Absolute path outside the workdir is also rejected |
| TestResolveWorkdirSubpath_RegularFileAtCandidate | Wraps ErrWorkdirProvision when the candidate exists but is not a directory |
| TestApplyWorkdirSubpathToSection_JoinsSubpath | Mutates WorkdirPathKey to the joined subpath and sets the typed sentinel |
| TestApplyWorkdirSubpathToSection_InheritancePointerPreservesRoot | Regression guard: when the subpath does not exist, WorkdirPathKey stays at the workdir root |
| TestApplyWorkdirSubpathToSection_NoWorkdirPathKey | No-op when WorkdirPathKey is absent |
| TestApplyWorkdirSubpathToSection_EmptyWorkdirPath | No-op when WorkdirPathKey is the empty string |
| TestApplyWorkdirSubpathToSection_DoubleCallAppliesOnce | Idempotent across repeat calls (init then plan) |
| TestApplyWorkdirSubpathToSection_SentinelGatesDoubleJoin | Negative path: deleting the sentinel re-enables the join, proving the sentinel is the gate |
| TestApplyWorkdirSubpathToSection_UserYAMLCannotForgeSentinel | YAML-author values (bool, string, int, map) cannot impersonate the typed sentinel |
| TestBuildAndResolveWorkdirPath_ExistingDir | Returns (joined-path, true, nil) when the workdir subpath exists |
| TestBuildAndResolveWorkdirPath_AllComponentTypes | Component-type parity: terraform/helmfile/packer/ansible all resolve under .workdir/<componentType>/ |
| TestBuildAndResolveWorkdirPath_AllComponentTypesWithSubpath | Component-type parity: all four honor the metadata.component subpath join (issue #2364 across executors) |
| TestBuildAndResolveWorkdirPath_InheritancePointerFallsBack | Returns (workdir-root, true, nil) when the workdir root exists but the subpath doesn't |
| TestBuildAndResolveWorkdirPath_NonExistentDir | Returns (candidate, false, nil) when the workdir is not provisioned yet |
| TestBuildAndResolveWorkdirPath_RegularFileAtCandidate | Wraps ErrWorkdirProvision when the candidate exists but is not a directory |
| TestBuildAndResolveWorkdirPath_StatErrorPropagates | Wraps ErrWorkdirProvision for non-ENOENT stat failures (EACCES) |
| TestProvisionAndResolveComponentPath_NoSourceReturnsFallback | Orchestrator short-circuits cleanly when no source is declared |
| TestProvisionAndResolveComponentPath_NoSourceMissingDir | Reports exists=false correctly when the fallback dir is absent |

Integration (tests/cli_source_provisioner_workdir_test.go):

| Test | Drives | Asserts |
|---|---|---|
| TestJITSource_MetadataComponentSubpath | atmos terraform generate varfile null-label-exports -s dev | Generated *.terraform.tfvars.json lands at <workdir>/exports/, not the workdir root. Reverting the fix moves the varfile and fails this assertion. |
| TestJITSource_MetadataComponentSubpath_TerraformShell | atmos terraform shell null-label-exports -s dev --dry-run | Captures the dry-run banner from stderr and asserts that both the printed Working directory and Component path include the metadata.component subpath. Reverting the fix prints the bare workdir root and fails this assertion. |

Both use github.com/cloudposse/terraform-null-label@0.25.0 with metadata.component: exports and skip gracefully on offline runners (no GitHub access or no git binary) via the existing precondition helpers.

References

Summary by CodeRabbit

  • New Features

    • Workdir-aware component resolution: metadata.component subpaths are applied when resolving JIT-provisioned components so modules can live under workdir subdirectories (e.g., exports/).
  • Bug Fixes

    • Consolidated and standardized component provisioning/resolution with improved validation and clearer error propagation across executors.
  • Tests

    • Added end-to-end and unit tests covering workdir subpath behavior, provisioning, and error cases.
test(describe-affected): accept all three valid Source values in TestResolveBase_PullRequest_Closed @aknysh (#2388)

what

Fix a CI-blocking test bug in `pkg/ci/providers/github/base_test.go` introduced silently by PR #2380.

`TestResolveBase_PullRequest_Closed` passes on PR runs (where `merge-base` or `HEAD~1` is reachable) but fails on post-merge runs to `main` (where only the documented `event.pull_request.base.sha` fallback is available).

why

The assertion

```go
assert.Contains(t, res.Source, "merge-base", "HEAD~1")
```

has a silent bug: testify treats the 4th positional argument to `assert.Contains` as the failure message, not an alternate value to match. So the test only ever checked for `"merge-base"`, and quietly passed on PR runs (where `merge-base` or `HEAD~1` was reachable) while failing on post-merge runs to `main` (where the GitHub Actions checkout depth and missing `origin/` fetch leave only the third documented fallback, `event.pull_request.base.sha`).

`ResolveBase()`'s closed-PR fallback chain in `pkg/ci/providers/github/base.go` documents three valid Sources:

  1. `"merge-base(HEAD, origin/)"`
  2. `"HEAD~1 (merged PR, merge-base unavailable)"`
  3. `"event.pull_request.base.sha"`

Replace the broken `Contains`-with-msg call with an explicit OR over the three substrings. The fix matches the test's own existing comment ("merge-base and HEAD~1 may or may not work; either way we get a valid resolution") -- the bug was that the assertion didn't actually check for "either way."
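testify's `assert.Contains` has the signature `Contains(t TestingT, s, contains interface{}, msgAndArgs ...interface{}) bool`, which is why `"HEAD~1"` was consumed as a failure message. One way to express the explicit OR is a small helper. `containsAny` and `closedPRSources` are hypothetical names, not taken from the actual fix, which may inline the three `strings.Contains` checks instead.

```go
package main

import (
	"fmt"
	"strings"
)

// closedPRSources are substrings of the three documented Source values.
var closedPRSources = []string{"merge-base", "HEAD~1", "event.pull_request.base.sha"}

// containsAny reports whether s contains at least one of subs.
func containsAny(s string, subs ...string) bool {
	for _, sub := range subs {
		if strings.Contains(s, sub) {
			return true
		}
	}
	return false
}

func main() {
	// In the test this would back an assertion such as:
	//   assert.True(t, containsAny(res.Source, closedPRSources...), "unexpected Source %q", res.Source)
	for _, src := range []string{
		"merge-base(HEAD, origin/)",
		"HEAD~1 (merged PR, merge-base unavailable)",
		"event.pull_request.base.sha",
		"unknown",
	} {
		fmt.Println(src, containsAny(src, closedPRSources...))
	}
}
```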

No production code change -- `ResolveBase()` already implements all three fallback paths correctly.

references

fix(ci): use terraform exit code as the source of truth for CI status @osterman (#2382)

what

  • Make the terraform exit code the authoritative signal for success/failure (and, for terraform plan with -detailed-exitcode, for change detection) in the CI summary path. Text parsing of stdout/stderr is downgraded to enrichment only — it still extracts resource counts, output values, and error message bodies, but no longer drives the binary HasErrors / HasChanges decisions.
  • Plumb the exit code through cmd/terraform/utils.go → pkg/hooks RunCIHooks → pkg/ci ExecuteOptions → plugin.HookContext so the plugin handler has a clean signal independent of output format.
  • Rewrite parseOutputWithError (pkg/ci/plugins/terraform/handlers.go) so that:
    • apply/deploy: HasErrors = (exitCode != 0)
    • plan: HasErrors = (exitCode == 1); exitCode == 2 also implies HasChanges
    • other commands: HasErrors = (exitCode != 0)
    • exit-code success discards spurious "Error:" matches from text; exit-code failure still falls back to CommandError.Error() for the body if text parsing didn't find one.
  • Wire the enriched *plugin.OutputResult from parseOutputWithError through writeSummary and buildTemplateContext (it had been silently dropped — writeSummary had _ *plugin.OutputResult as its second arg, and buildTemplateContext re-parsed ctx.Output from scratch). buildTemplateContext keeps a nil-fallback so legacy callers continue to work.
  • Refactor RunCIHooks to take a *RunCIHooksOptions struct (per the repo's options pattern) since the parameter list grew past the linter's max-args limit.
  • Add tests covering all the new branches: exit-code-only failure rendering, exit-code 2 → HasChanges for plan, apply exit 0 with stray Error: in output → no error, plus the original failure-summary tests for plan/apply/deploy.
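The decision rules for parseOutputWithError are small enough to sketch directly. This minimal Go version assumes only two inputs are available: the exit code and a change flag derived from text parsing. The ciOutcome type and classify function are hypothetical and are not the code in handlers.go.

```go
package main

import "fmt"

// ciOutcome is a hypothetical, trimmed-down stand-in for the HasErrors and
// HasChanges fields of the plugin's output result.
type ciOutcome struct {
	HasErrors  bool
	HasChanges bool
}

// classify derives the binary CI state from the exit code. textHasChanges is
// whatever stdout parsing reported: text can add detail but never decides
// whether the run failed.
func classify(command string, exitCode int, textHasChanges bool) ciOutcome {
	out := ciOutcome{HasChanges: textHasChanges}
	if command == "plan" {
		// plan -detailed-exitcode: 0 = no diff, 1 = error, 2 = diff present.
		out.HasErrors = exitCode == 1
		if exitCode == 2 {
			out.HasChanges = true
		}
		return out
	}
	// apply, deploy and every other command: any non-zero exit is a failure,
	// even when the output contains no parseable "Error:" line.
	out.HasErrors = exitCode != 0
	return out
}

func main() {
	// Auth failed before terraform ran: exit 1, nothing useful on stdout.
	fmt.Printf("%+v\n", classify("deploy", 1, false))
	// plan under -detailed-exitcode that found changes.
	fmt.Printf("%+v\n", classify("plan", 2, false))
}
```

Because the text flag is never consulted for HasErrors, a stray "Error:" line in a successful apply cannot flip the result.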

why

  • Reported regression: atmos terraform deploy <component> -s <stack> --upload-status failing at the authentication step (before terraform itself ran, exit code 1) still produced a job summary that read ## No Changes Applied for eks/karpenter-node-pool in e98d-gov-use1-dss with a NO CHANGE badge. The check run was correctly marked failed, but the summary contradicted it.
  • Root cause was architectural: the CI summary path used text parsing as the primary source of truth for failure/change state. The auth-failure stderr did not match ExtractErrors's ^Error: regex (it's emitted as **Error:** in markdown form), and writeSummary silently dropped the already-enriched OutputResult, so the apply template fell through to the no-changes branch. Anything that fails before terraform runs — auth, OOM, signal kill, network — would have hit the same bug.
  • Terraform exit codes are well-defined and stable (apply: 0 = success / non-zero = error; plan -detailed-exitcode: 0/1/2). Using them as the authoritative signal makes the hook robust against output-format drift between Terraform and OpenTofu, and against any pre-terraform failure that produces no parseable output. errUtils.GetExitCode already unwraps exec.ExitError, ExecError, exitCoder, and WorkflowStepError, so the existing error chains carry it through without further plumbing.

references

  • Affected handlers: pkg/ci/plugins/terraform/handlers.go (parseOutputWithError, writeSummary).
  • Affected helper: pkg/ci/plugins/terraform/plugin.go (buildTemplateContext).
  • Plumbing: pkg/ci/internal/plugin/types.go, pkg/ci/executor.go, pkg/hooks/hooks.go, cmd/terraform/utils.go.
  • Templates (unchanged): pkg/ci/plugins/terraform/templates/{apply,plan}.md already had {{ if .Result.HasErrors }} branches; they just weren't being reached.

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Improved error detection and failure reporting by treating command exit codes as the authoritative indicator of success/failure, fixing edge cases where errors occur before terraform produces output.
    • Enhanced CI/check-run status accuracy for plan and apply operations, properly handling plan changes and command execution failures.
  • Tests

    • Added comprehensive test coverage for exit code handling, error state reconciliation, and CI hook execution workflows.
fix(auth): preserve AWS SDK error in assume-role / web-identity / assume-root failures @aknysh (#2385)

what

  • Adds WithCause(err) at the three STS error sites in
    pkg/auth/identities/aws/:
    • assume_role.go: standard AssumeRole path.
    • assume_role.go: AssumeRoleWithWebIdentity (OIDC) path.
    • assume_root.go: AssumeRoot (centralized root access) path.
  • Adds regression tests in pkg/auth/identities/aws/assume_sdk_error_test.go
    that point STS at a local httptest.Server returning AWS-style XML
    error envelopes (via the existing aws.resolver.url mechanism). Each
    test asserts the sentinel is preserved (errors.Is(err, ErrAuthenticationFailed)), the AWS error code and message are
    reachable in err.Error(), and the SDK error is also reachable
    through errors.As(err, &apiErr), where apiErr is declared as a smithy.APIError.
  • Adds docs/fixes/2026-05-01-assume-role-error-swallows-aws-cause.md
    documenting the issue and fix.

why

  • The three error sites built an enriched error with
    errUtils.Build(ErrAuthenticationFailed).WithExplanation(...).WithHint(...).Err()
    but never threaded the underlying SDK err into the chain. Operators
    saw only "authentication failed: identity=<name> step=<n>: authentication failed", with no AWS context.
  • That made it impossible to tell, without re-running under
    ATMOS_LOGS_LEVEL=Debug, whether the failure was AccessDenied,
    NoSuchEntity, InvalidIdentityToken, ExpiredTokenException,
    MalformedPolicyDocumentException, throttling, etc. Each has a
    different remediation; the hint list ("verify the role ARN", "check
    the trust policy", ...) effectively enumerated every plausible cause
    because the actual one had been dropped.
  • The error builder already exposes WithCause(err) for exactly this
    case (errors/builder.go:104-167). It chains the cause via
    fmt.Errorf("%w: %w", sentinel, cause), preserves the sentinel for
    errors.Is checks, and merges any hints/safe details the cause
    already carried. The canonical pattern is already used at
    pkg/auth/identities/aws/webflow_token.go:88-97. The three assume
    sites just hadn't adopted it yet.
  • After the fix, the same failure renders with the AWS-side reason
    inline:
    authentication failed: identity=<name> step=<n>: authentication failed: operation error STS: AssumeRoleWithWebIdentity, https response error StatusCode: 403, RequestID: ..., api error AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity
    — which makes the trust-policy / token / role-ARN problems
    diagnosable from the first run.
  • Verified by reverting just the three WithCause(err) lines and
    confirming the new tests fail; restoring the fix turns them green
    again. Full pkg/auth/... test suite (~25 packages) passes.

references

  • docs/fixes/2026-05-01-assume-role-error-swallows-aws-cause.md:
    full root-cause writeup, code paths, and rationale (added in this
    PR).
  • errors/builder.go:104-167: WithCause / WithCausef helpers
    used by the fix.
  • pkg/auth/identities/aws/webflow_token.go:88-97 — canonical
    pattern referenced as the model for these three sites.
  • pkg/auth/manager_chain.go:570 — chain wrapper that already
    expected the leaf to thread the cause via the trailing %w; this
    PR makes the leaf actually do so.

Summary by CodeRabbit

  • Bug Fixes

    • Preserve and surface underlying AWS STS error details in authentication failures while retaining existing sentinel behavior.
  • Tests

    • Added regression tests that verify sentinel preservation, inclusion of AWS error text, and access to typed SDK errors across multiple STS error scenarios.
  • Documentation

    • Added a doc with before/after examples and end-to-end test descriptions for the error-handling change.
fix(describe-affected): resolve PR base via merge-base with shallow-clone self-heal @osterman (#2380)

what

Fix atmos describe affected reporting many more affected components
than the PR actually modified, specifically when the PR is out of
date with the target branch.

  • pkg/git/merge_base.go adds MergeBaseWithAutoFetch that runs a
    targeted git fetch origin <target> (and optionally one
    --deepen=200) when MergeBase can't resolve. Bounded retries.
  • pkg/ci/providers/github/base.go:resolvePRBase keeps merge-base as
    the primary strategy and drops the buggy last-resort path that
    returned refs/remotes/origin/<target> (which downstream resolved
    to the current tip of the target branch, producing the
    false-positives). New last-resort is event.pull_request.base.sha,
    which is frozen at the last PR sync and never points to the
    current tip.
  • ExecuteDescribeAffectedWithTargetRefCheckout accepts a new
    targetBranch parameter and self-heals via git fetch when
    worktree creation hits a missing target commit.
  • Adds pkg/git/fetch.go (FetchRef, DeepenFetch) lifted from
    PR #2285. New TargetBranch field on BaseResolution and
    DescribeAffectedCmdArgs.

why

A customer reported that atmos describe affected on an out-of-date
PR listed components the PR did not touch. The root cause was a
fallback path documented as "handles this gracefully" in the PRD
that, in practice, silently produced wrong results when the local
repo was a shallow checkout (the actions/checkout@v4 default).
Walkthrough and rationale are in
docs/fixes/2026-04-30-describe-affected-out-of-date-pr.md.

The user's suggestion — using pull_request.merge_commit_sha as
the base — would also work and is documented as a considered
alternative in the fixes doc. We chose merge-base + auto-fetch
because it preserves the existing PRD architecture, doesn't require
separately fetching the parent of the merge commit M, and works naturally with
actions/checkout@v4's default merge-ref checkout.

supersedes #2285

PR #2285 proposed promoting pull_request.base.sha to the primary
strategy. This PR keeps merge-base as primary (gold standard) and
uses base.sha only as a fallback that replaces the buggy
ref-tip path. The fetch helpers and signature plumbing are lifted
from #2285; credit to the original work.

tests

  • pkg/git/merge_base_test.go: new
    TestMergeBaseWithAutoFetch_RecoversFromMissingRef builds an
    origin/clone pair, deletes origin/main to simulate a shallow
    CI checkout, and asserts the recovered SHA is the fork point —
    not the current main tip.
  • pkg/ci/providers/github/base_test.go:
    TestResolveBase_PullRequest_OutOfDate_FallsBackToPayloadSHA
    reproduces the customer scenario at unit-test level.
  • internal/exec/describe_affected_test.go:TestResolveBaseFromCI
    hardened to require describe.SHA is populated and
    describe.Ref empty — guards against any future regression that
    re-introduces the ref-tip fallback.

references

  • supersedes #2285
  • closes the customer-reported regression introduced in #2241

Summary by CodeRabbit

  • Bug Fixes
    • More reliable PR base resolution in CI: auto-fetch + one-step deepen for shallow checkouts, targeted ref retry when refs are missing, and safer fallback to event payload base SHA to reduce false positives.
  • New Features
    • Merge-base recovery with targeted fetch/deepen, worktree retry on missing commits, and explicit propagation of PR target branch for CI resolutions.
  • Documentation
    • Updated CI/base-resolution docs and troubleshooting note for out-of-date PRs.
  • Tests
    • New and expanded unit/integration tests covering recovery, fetch/deepen, and fallback paths.
fix: authbridge resolver reads auth context from manager's stackInfo, not caller's @MrZablah (#2379)

what

  • Fix !store.get failing with "AWS auth context not available" when a store backend is configured with an identity: field
  • authbridge.Resolver now reads the post-authentication AWS/Azure/GCP context from the auth manager's own internal stackInfo (via GetStackInfo())
    instead of the caller's stackInfo
  • Add regression test TestResolveAWSAuthContext_PointerMismatch that directly reproduces the pointer mismatch scenario

why

  • pkg/auth.createAuthManagerInstance allocates its own *schema.ConfigAndStacksInfo for the auth manager — a different pointer than the info passed by
    the terraform executor to authbridge.NewResolver
  • After AuthManager.Authenticate() succeeds, PostAuthenticate writes credential file paths and profile info into the manager's own
    stackInfo.AuthContext.AWS, never the caller's info
  • The resolver was checking r.stackInfo.AuthContext.AWS (the caller's pointer, always nil) instead of r.authManager.GetStackInfo().AuthContext.AWS (the
    manager's pointer, populated by auth)
  • Result: every !store.get call with an identity: configured would succeed at authentication but then immediately fail with "AWS auth context not
    available"
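The pointer mismatch is easy to reproduce in isolation. This minimal Go sketch uses hypothetical, heavily trimmed stand-ins for schema.ConfigAndStacksInfo, the auth manager and the resolver. Only the fields needed to show the bug are modeled. The credential path and profile values are illustrative.

```go
package main

import "fmt"

// Hypothetical, trimmed-down stand-ins; only what the bug needs.
type awsContext struct{ CredentialsFile, Profile string }

type stackInfo struct {
	AuthContext struct{ AWS *awsContext }
}

type authManager struct{ info *stackInfo }

// newAuthManager allocates its OWN stackInfo; the caller's pointer is not reused.
func newAuthManager() *authManager { return &authManager{info: &stackInfo{}} }

// Authenticate plays the role of PostAuthenticate: it populates the
// manager's stackInfo, never the caller's.
func (m *authManager) Authenticate() {
	m.info.AuthContext.AWS = &awsContext{CredentialsFile: "/tmp/example-credentials", Profile: "dev"}
}

func (m *authManager) GetStackInfo() *stackInfo { return m.info }

type resolver struct {
	stackInfo   *stackInfo // the caller's pointer
	authManager *authManager
}

// buggyAWS reads the caller's stackInfo, which auth never touches.
func (r *resolver) buggyAWS() *awsContext { return r.stackInfo.AuthContext.AWS }

// fixedAWS reads the manager's stackInfo, which auth actually populated.
func (r *resolver) fixedAWS() *awsContext { return r.authManager.GetStackInfo().AuthContext.AWS }

func main() {
	mgr := newAuthManager()
	r := &resolver{stackInfo: &stackInfo{}, authManager: mgr}
	mgr.Authenticate()
	fmt.Println(r.buggyAWS() == nil) // true: "AWS auth context not available"
	fmt.Println(r.fixedAWS().Profile) // dev
}
```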

references

Summary by CodeRabbit

  • Bug Fixes

    • Fixed auth context resolution so cloud-specific authentication is sourced from the auth manager rather than resolver-held data.
  • Chores

    • Pinned Go toolchain to 1.26.2.
  • Tests

    • Updated resolver tests to model manager-owned stack info separately and added a regression test for pointer-mismatch behavior.
test: increase test coverage in pkg/flags, pkg/filesystem, pkg/http, and pkg/function @[copilot-swe-agent[bot]](https://github.com/apps/copilot-swe-agent) (#2173)

  • [x] Explore all affected files
  • [x] internal/exec/stack_processor_utils_test.go: Convert hardcoded path strings to filepath.Join (both test functions)
  • [x] pkg/filesystem/export_test.go: Add trailing period to inline comment on line 35
  • [x] Build & test verification

Summary by CodeRabbit

  • New Features
    • Bounded, configurable glob-pattern cache (TTL, max entries, empty-result toggle) with runtime metrics exposed via /debug/vars
    • Safer GitHub auth handling with host allowlisting and Authorization stripping on cross-host redirects
  • Bug Fixes
    • Consistent non-nil empty-slice result for glob no-matches and improved cache correctness
  • Documentation
    • Added changelog and minimum Go toolchain guidance (go.mod → Go 1.26+)
  • Tests
    • Large suite of new tests across globbing, atomic writes, flags, and HTTP client
  • Chores
    • New test-race Makefile target (race detector + shuffled execution)
fix(output): stop after-* hooks from corrupting backend.tf.json when backend uses !terraform.state @zack-is-cool (#2358)

Summary

Fixes #2356. The after-terraform-apply store hook path regenerated
backend.tf.json / providers_override.tf.json from un-rendered
component sections when the backend referenced !terraform.state,
overwriting a correctly-rendered file with literal YAML-function strings:

-        "bucket": "atmos-tfstate-dev",
-        "dynamodb_table": "atmos-tfstate-lock-dev",
+        "bucket": "!terraform.state tfstate-backend dev s3_bucket_id",
+        "dynamodb_table": "!terraform.state tfstate-backend dev dynamodb_table_name",

The hook then failed its tofu output call with:

Error: Backend initialization required: please run "tofu init"
Reason: Backend configuration block has changed

Why

Regression introduced in v1.216.0 by #2309 (commit 3c0e748ce) and the
follow-up commit c7ef142a9 ("fix: skip-init should skip yaml function
evaluation"). c7ef142a9 added a guard disabling YAML-function
evaluation when SkipInit && authManager == nil to avoid failing on
auth-requiring functions in the post-hook context. The guard is overly
broad — it also disables evaluation of non-auth functions like
!terraform.state — so sections returned from DescribeComponent retain
literal YAML-function strings. execute() then extracts config.Backend
from those sections and writes them to disk via GenerateBackendIfNeeded.

Fix

Thread processYamlFunctions bool through execute() in
pkg/terraform/output/executor.go and guard the artifact-regeneration
block (Step 4 / Step 5) behind it. When YAML functions were not
evaluated upstream, execute() must not regenerate artifacts from the
un-rendered sections. The backend file on disk from the init/apply phase
is already correct; leaving it alone is always safe. Output reading
(tofu output) still works via the on-disk state.
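The guard itself is a single conditional around the regeneration step. This minimal Go sketch shows its shape. The backendGenerator interface, its simplified method signatures, execute and countingGenerator are hypothetical stand-ins, since the real BackendGenerator signatures are not shown in these notes. The counting type plays a similar role to the backend-generator mock described in the test plan.

```go
package main

import "fmt"

// backendGenerator is a hypothetical, simplified stand-in for the injected
// BackendGenerator dependency.
type backendGenerator interface {
	GenerateBackendIfNeeded(sections map[string]any) error
	GenerateProvidersIfNeeded(sections map[string]any) error
}

// execute regenerates artifacts only when YAML functions were evaluated
// upstream. Otherwise sections may still hold literal strings such as
// "!terraform.state ...", so the rendered file already on disk is left alone.
func execute(gen backendGenerator, sections map[string]any, processYamlFunctions bool) error {
	if processYamlFunctions {
		if err := gen.GenerateBackendIfNeeded(sections); err != nil {
			return fmt.Errorf("generate backend: %w", err)
		}
		if err := gen.GenerateProvidersIfNeeded(sections); err != nil {
			return fmt.Errorf("generate providers: %w", err)
		}
	}
	// Reading outputs from state would follow here (omitted).
	return nil
}

// countingGenerator records how often each generator method is called.
type countingGenerator struct{ backend, providers int }

func (c *countingGenerator) GenerateBackendIfNeeded(map[string]any) error {
	c.backend++
	return nil
}

func (c *countingGenerator) GenerateProvidersIfNeeded(map[string]any) error {
	c.providers++
	return nil
}

func main() {
	sections := map[string]any{
		"backend": map[string]any{"bucket": "!terraform.state tfstate-backend dev s3_bucket_id"},
	}
	gen := &countingGenerator{}
	_ = execute(gen, sections, false)
	fmt.Println(gen.backend, gen.providers) // 0 0
	_ = execute(gen, sections, true)
	fmt.Println(gen.backend, gen.providers) // 1 1
}
```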

Minimal, localized diff — four commits:

  1. refactor(output): inject BackendGenerator and thread processYamlFunctions through execute() — pure DI plumbing, no behavior change.
  2. fix(output): skip artifact regeneration when YAML functions were not processed — the actual guard.
  3. test(output): assert backend-generator calls match processYamlFunctions in SkipInit tests — locks in the invariant in four existing SkipInit tests.
  4. test(output): regression test for #2356 backend.tf.json corruption — byte-identical integration assertion.

Test plan

  • New unit test TestExecutor_Execute_SkipsArtifactRegen_WhenYamlFunctionsNotProcessed (demonstrably red before the guard, green after).
  • Four existing SkipInit tests strengthened with zero-call expectations on the backend-generator mock.
  • Inverse assertion in TestExecutor_GetAllOutputs_SkipInit_WithAuthManager_ProcessesYamlFunctions: GenerateBackendIfNeeded + GenerateProvidersIfNeeded called exactly once when auth is present.
  • Integration regression test TestExecutor_Regression_Issue2356_BackendFileUnchangedInSkipInitPath: writes a rendered backend.tf.json, drives GetOutputWithOptions(SkipInit=true, authManager=nil), asserts the file is byte-identical. Fails without the guard; passes with it.
  • go test ./pkg/terraform/output/... -count=1 green.
  • make lint / ./custom-gcl run --new-from-rev=origin/main clean (one dupl warning on the new test vs the existing SkipInit test is suppressed with //nolint:dupl + justification — they test contrasting invariants at the same call site; extracting shared scaffolding would obscure the red/green comparison).
  • Manual end-to-end via LocalStack + Redis repro (ignore/issues/post-apply-hook-backend-racecondition/repro.sh in the branch, referenced from #2356). Exits 0 with FIX VERIFIED on this branch; backend file byte-diff is empty after the after-apply hook.
  • CI full suite — opening this PR runs it.

Follow-up

The processYamlFunctions = false guard in GetOutputWithOptions /
fetchAndCacheOutputs is the deeper design issue — auth availability
should not gate evaluation of non-auth YAML functions. Tracked in #2357.
This PR is the minimal regression fix for v1.216.x.

Release

  • fix: conventional commit → patch release (v1.216.1).
  • No schema changes, no user-facing config changes.
  • No roadmap update (regression fix, not a feature).

Summary by CodeRabbit

  • Bug Fixes

    • Backend and provider override files are regenerated only when YAML functions are processed, preventing unnecessary rewrites.
    • Fixed a case where skip-initialization could overwrite already-rendered backend/provider files, preserving existing configurations.
  • Tests

    • Added regression tests to ensure backend/provider files remain unchanged in the skip-initialization path and to validate correct conditional regeneration behavior.

Associated Pull Requests

Deployment Status

To view the Atmos Pro deployment status of this release, see #2390.
