docs(roadmap): curate featured; drop internal-refactor changelog posts @osterman (#2384)
what
- Cap `featured[]` in `website/src/data/roadmap.js` at 6 curated strategic initiatives. Drop `devcontainer`, `workflows`, `instance-status-upload`, and `chunked-stack-uploads`. Final 6: `atmos-ai`, `cloud-auth`, `native-ci`, `pro-commit`, `source-provisioning`, `toolchain`.
- Add equivalent milestones to the `ci-cd` initiative for the two demoted Atmos Pro items so their changelogs stay reachable from the roadmap. Recalc `ci-cd.progress` 89 → 92.
- Delete three internal-only refactor blog posts and their corresponding `quality` initiative milestones: `process-args-flags-refactor`, `refactoring-executeterraform-for-testability`, `describe-stacks-complexity-reduction`. Recalc `quality.progress` 86 → 75.
- Update `.claude/agents/roadmap.md` with two new rules: (1) `featured[]` is manually curated, max 6, edited only when the user explicitly asks; (2) internal-only refactors with no user-visible change do not get changelog posts. Adds matching schema docs and quality-check items.
why
- The featured section had drifted into a per-release announcement feed: every minor Atmos Pro plumbing improvement (chunked uploads, instance status, etc.) was rendering at the top of `/roadmap` next to transformative initiatives like Atmos AI and Cloud Auth. That diluted its meaning.
- The roadmap maintainer agent had no documented rule for `featured[]`, so it was being modified on every release. Codifying "max 6, opt-in only" stops the drift at the source.
- Internal refactor posts (cyclomatic complexity reductions, function decomposition) are engineering wins but produce zero user-visible change. They belong in PR descriptions and `git log`, not the user-facing changelog.
references
- No issue tracker reference.
- `no-release`: content/data only; no Go code, no user-visible CLI behavior change. Removes already-published changelog entries that should not have been published.
Summary by CodeRabbit
- Documentation
  - Removed three technical blog posts documenting internal refactors
  - Clarified roadmap maintenance guidance: changelog should omit internal-only refactors; featured entries are curated with a hard cap of 6 and should not be modified unless explicitly requested
- Chores
  - Reorganized featured initiatives and adjusted roadmap milestone tracking
  - Updated CI/CD progress to 92% and Quality progress to 75%
  - Updated NOTICE with concrete upstream license URLs
- Quality
  - Added checks to prevent improper featured changes and to omit internal refactors from the changelog
docs: document the component `provision:` block (provision.backend, provision.workdir) @osterman (#2378)
what
- Add a new stack-config schema page at `website/docs/stacks/components/provision.mdx` that documents the entire `provision:` block as a coherent feature, with sections for `provision.backend.enabled` (terraform-only), `provision.workdir.enabled` (all four toolchains), toolchain defaults, component-level overrides, and global defaults via `settings.provision.workdir.{enabled,ttl}` in `atmos.yaml`.
- Add a `:::tip` callout to `website/docs/stacks/components/terraform/backend.mdx` clarifying that `backend:` (where state is stored) is distinct from `provision.backend:` (auto-create that location).
- Cross-link `website/docs/components/terraform/backend-provisioning.mdx` to the new schema page so the conceptual deep-dive points at the schema reference.
why
- The `provision:` block (with `provision.backend.enabled` and `provision.workdir.enabled`) is functional and used in fixtures, but had no dedicated documentation page in Stack Configuration. The only references were a CLI command page (`atmos terraform workdir`), a passing mention in `cli/configuration/components/terraform.mdx`, and the backend-provisioning conceptual page, none of which document the schema directly.
- Closes a discoverability gap: a user reviewing the components sidebar saw entries for `*.metadata`, `ansible`, `helmfile`, `packer`, `terraform/backend` and noticed `provision` was missing entirely.
- The added `:::tip` on the backend page resolves long-standing confusion between the `backend:` block (state location) and the `provision.backend:` block (whether to bootstrap that location).
references
- Schema source: `pkg/schema/schema.go:402-414` (`ProvisionWorkdirSettings`), `pkg/provisioner/workdir/types.go:57-61` (`WorkdirConfig`), `pkg/provisioner/backend_hook.go:111-125` (`provision.backend.enabled`).
- Canonical fixture: `tests/fixtures/scenarios/workdir/stacks/catalog/workdir-defaults.yaml`.
- Verified with `cd website && npm run build` (zero broken links; new page registered as the 544th content route).
Summary by CodeRabbit
- New Features
  - Added `list:` CLI configuration for customizable `components`, `instances`, and `stacks` output
  - Added global `settings.provision` and `provision.workdir` docs for workdir defaults/TTL
- Documentation
  - Added component provisioning docs (backend + workdir) and relocated backend provisioning links/site redirects
  - Added telemetry configuration page with privacy guarantees
  - Expanded VCS token injection docs; Bitbucket username standardized to `BITBUCKET_USERNAME`
- Chores
  - Updated NOTICE license URL entries to `Unknown`
docs: refresh CI page with native CI workflows; deprecate legacy GH Actions @osterman (#2373)
what
- Move six legacy Cloud Posse GitHub Action docs (`affected-stacks`, `atmos-terraform-plan/apply`, drift detection/remediation, and the index) from `website/docs/integrations/github-actions/` to `website/docs/deprecated/github-actions/`, with `:::warning Deprecated` banners on each page pointing readers to `/ci`.
- Rebuild `website/docs/ci/ci.mdx` around the two reference repos `cloudposse-examples/atmos-native-ci` and `cloudposse-examples/atmos-native-ci-advanced`: concrete excerpts for deploy-on-PR, deploy-on-merge, preview cleanup, and an `atmos describe affected --format=matrix` fan-out, plus a discreet pointer to the deprecated content.
- Add client-redirects from all six legacy `/integrations/github-actions/*` URLs to `/ci`, add a collapsed "Deprecated" sub-category at the bottom of the Resources sidebar, keep `setup-atmos` and `component-updater` in the (now smaller) GitHub Actions sidebar entry, and repoint cross-links across docs, two blog posts, and the roadmap.
why
- Atmos now ships native CI integration (job summaries, output variables, status checks, planfile storage) directly in the CLI, so `atmos terraform plan/apply/deploy` already produces the artifacts the wrapper actions used to provide. The legacy actions are no longer the recommended path for new projects.
- The previous structure buried native CI behind a single page while giving the legacy actions a first-class sidebar section, conflicting with the recommendation in the legacy index page itself; this PR aligns navigation with the recommended path and makes the deprecated material reachable but de-emphasized.
references
- `cloudposse-examples/atmos-native-ci`: basic example workflows excerpted in the new `/ci` page
- `cloudposse-examples/atmos-native-ci-advanced`: matrix workflow excerpted in the new `/ci` page
Summary by CodeRabbit
- New Features
  - `describe affected --format=matrix` now auto-writes matrix output to `$GITHUB_OUTPUT` in CI when enabled (explicit `--output-file` still wins); workflow examples updated.
- Documentation
  - Added a comprehensive "Native CI for GitHub Actions" guide, updated many docs and blog posts to use native CI wording, and published a blog post about the matrix auto-output behavior.
  - Marked legacy GitHub Actions wrapper actions/pages as deprecated with warning guidance.
- Chores
  - Added legacy URL redirects and a new "Deprecated" docs section.
feat(list): --process-templates and --process-functions flags; fix list instances --upload auth @aknysh (#2363)
what
- Added `--process-templates` and `--process-functions` CLI flags (and `ATMOS_PROCESS_TEMPLATES` / `ATMOS_PROCESS_FUNCTIONS` env vars) to every `atmos list` subcommand that processes stack manifests: `list instances`, `list components`, `list metadata`, `list sources`, `list stacks`. Defaults are `true`, matching `atmos describe affected` / `atmos describe stacks` / `atmos describe component`.
- Clarified the flag descriptions that used to conflate YAML functions with Go template functions. `--process-templates` toggles Go templates (including `atmos.Component(...)`); `--process-functions` toggles YAML functions (`!terraform.state`, `!terraform.output`, `!store`, `!aws.*`, …).
- Fixed the underlying `atmos list instances --upload` hang in CI: per-component auth resolution in `internal/exec/describe_stacks_component_processor.go` was gated on `processYamlFunctions` only, so the template-only path (`atmos.Component(...)` inside Go templates) ran `terraform init` with an empty `AuthContext` against remote backends and failed with `No valid credential sources found`. The guard now fires when either templates or YAML functions will run.
- Refactored the per-component auth resolver for testability: extracted a `shouldResolvePerComponentAuth(...)` predicate, a `resolveComponentAuthManager(...)` method, and an injectable `componentAuthManagerResolver` field on `describeStacksProcessor` so the decision can be exercised without running real OIDC/STS.
- Threaded the two flags through `InstancesCommandOptions` / `MetadataOptions` in `pkg/list/` and through both the matrix-format and tree-format branches of `list_instances.go`, so every output path of the same invocation honors the same flag values.
- Added three layers of regression coverage for each command that just got the flags (parser wiring, options struct, flag propagation to `ExecuteDescribeStacks`) plus a dedicated auth-guard regression suite (`TestShouldResolvePerComponentAuth`, `TestResolveComponentAuthManager` 6-row table, `TestResolveComponentAuthManager_ResolverErrorFallsBackToParent`).
- Documented the two flags on every affected `atmos list` command page, added a blog post announcing the feature, and added a shipped milestone to the Discoverability & List Commands roadmap initiative.
- Bumped Go modules to latest where compatible (aws-sdk-go-v2/service/s3 → 1.100.0, smithy-go → 1.25.1, anthropic-sdk-go → 1.38.0, hashicorp/terraform-exec → 0.25.1, posthog-go → 1.12.1, k8s.io/client-go → 0.36.0, plus many transitive indirects). Three transitive pins remain, now documented inline in `go.mod`: `sentry-go v0.45.1` (cockroachdb/errors v1.12.0 still references the removed `Extra` field), `gocloud.dev v0.41.0` (gomplate/v3 s3blob uses removed `ConfigProvider`), `hairyhenderson/go-fsimpl v0.3.1` (transitive via the gocloud.dev pin).
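The auth-guard change in this list can be illustrated with a minimal sketch. The real signature of `shouldResolvePerComponentAuth(...)` is not shown in these notes, so the two-boolean form below and the `legacyGuard` helper are assumptions made purely for illustration.

```go
package main

import "fmt"

// shouldResolvePerComponentAuth is a hypothetical reconstruction of the
// predicate named in this PR. Per-component auth is needed when either Go
// templates or YAML functions will run, because both read the auth context.
func shouldResolvePerComponentAuth(processTemplates, processYamlFunctions bool) bool {
	return processTemplates || processYamlFunctions
}

// legacyGuard models the pre-fix behavior described above: the guard was
// gated on YAML functions only, so the template-only path skipped auth.
func legacyGuard(_ bool, processYamlFunctions bool) bool {
	return processYamlFunctions
}

func main() {
	cases := []struct{ templates, functions bool }{
		{false, false}, {false, true}, {true, false}, {true, true},
	}
	for _, tc := range cases {
		fmt.Printf("templates=%t functions=%t legacy=%t fixed=%t\n",
			tc.templates, tc.functions,
			legacyGuard(tc.templates, tc.functions),
			shouldResolvePerComponentAuth(tc.templates, tc.functions))
	}
}
```

The only row where the two guards disagree is `templates=true, functions=false`, which is the template-only invocation the fix targets.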
why
- `atmos list instances --upload` was broken in CI for any repo whose component sections call `atmos.Component(...)` inside Go templates with a stack-level default identity, the exact shape used by the Atmos Pro release workflow. Users reported the command failing with `No valid credential sources found` while `atmos describe affected --upload` in the same workflow succeeded.
- Root cause: `atmos.Component(...)` is a Go template function, not a YAML function. The processor's per-component auth resolver assumed YAML functions were the only consumer of `info.AuthContext` and gated itself on `processYamlFunctions`. The template path reads the same `AuthContext` and shells out to `terraform init` + `terraform output`, so disabling per-component auth broke template-only invocations.
- Users expected `atmos list` flags to line up with `atmos describe` flags. They didn't: only `list affected`, `list settings`, and `list values` had the two knobs. A user workflow actually relied on `--process-functions` on `list instances` (where it didn't exist), which produced an `unknown flag` error and a confusing escape hatch. Adding the two flags everywhere the command processes stacks closes that gap.
- The flag rollout intentionally defaults both flags to `true` for parity. Users who run `atmos list` locally without `tofu` / `terraform` on `$PATH` can opt out with `--process-functions=false` or `ATMOS_PROCESS_FUNCTIONS=false`; the auth-guard fix above ensures the `true, true` default works end-to-end in CI.
- Module update was due. The three remaining pins are annotated so the next `go get -u ./...` pass doesn't trip over them blindly.
references
- Fix design doc: `docs/fixes/2026-04-24-list-instances-per-component-auth.md`
- Blog post: `website/blog/2026-04-24-list-process-flags.mdx`
- Roadmap milestone: `website/src/data/roadmap.js` (Discoverability & List Commands initiative)
- Previous related fix: `docs/fixes/2026-04-08-atmos-auth-identity-resolution-fixes.md` (Category A vs B caller split that this change builds on)
Summary by CodeRabbit
- New Features
  - Added `--process-templates` and `--process-functions` flags to list subcommands to control Go template vs YAML function processing (both default to enabled).
- Bug Fixes
  - Restored per-component authentication resolution when templates are processed, fixing upload failures in CI.
- Documentation
  - Updated CLI docs, added a blog post and roadmap entry describing the new flags and examples.
- Tests
  - Extensive new and updated unit/integration tests covering flag parsing, behavior permutations, and regressions.
- Chores
  - Updated NOTICE/license references, added missing license entries, bumped dependencies and example default version to 1.217.0.
Document CI statuses configuration options @goruha (#2362)
what
- Document CI statuses configuration options
why
- Improve documentation
Summary by CodeRabbit
- Documentation
  - Added docs and example configuration for new CI post-commit status summary options: `component`, `add`, `change`, and `destroy` (flags default to true in the example).
  - Clarified required permissions to enable these status checks (GitHub `checks: write` or a commit-status-scoped API token for GitLab).
🚀 Enhancements
fix(jit): honor metadata.component subpath for JIT source-provisioned components @zack-is-cool (#2371)
What
JIT-provisioned components can now point at a submodule inside a cloned upstream repo via metadata.component, the same way non-JIT components already can. Every JIT-capable code path is covered — terraform plan/apply, terraform generate varfile, terraform shell, helmfile, packer, and ansible — and the resolver now lives in pkg/component/ rather than being curve-fitted to Terraform inside internal/exec/.
Before this PR, metadata.component: modules/iam-policy was silently ignored on the JIT/workdir code path — atmos cloned the repo to .workdir/<type>/<stack>-<component>/ and ran the underlying tool against that root, so generated files (backend.tf.json, varfile, .terraform/, helmfile state, packer cache, ansible inventory) all landed at the repo root instead of at .workdir/<type>/<stack>-<component>/modules/iam-policy/. The tools then either failed with confusing errors or silently ran against the wrong directory (e.g. a repo root with no .tf files).
Fixes #2364.
Why this matters
Some upstream repos organize modules under modules/<name>/ rather than at the repo root (e.g. terraform-aws-modules/terraform-aws-iam). For JIT to be useful against those repos, atmos needs to clone the whole repo into a workdir (so relative parent-path references like ../../shared-vars.tf resolve) and then run the tool against a specific submodule inside it. Non-JIT components already support this via metadata.component; this PR brings JIT/non-JIT parity for that capability and applies it uniformly across every executor.
What changed (revised after maintainer review)
This PR was originally scoped to Terraform with helpers in internal/exec/. Maintainer feedback on the first review was clear: the helpers belong in pkg/, and the same fix should apply to Helmfile/Packer/Ansible — not be curve-fitted to Terraform.
Both items are addressed in this revision.
New package: pkg/component/
Helpers extracted into a single component-type-parameterized package, used by all four executors:
```go
package component

// Pure path logic (no I/O outside stat).
func ResolveWorkdirSubpath(metadataSubpath, workdirRoot string) (string, error)

// In-place mutation of WorkdirPathKey, idempotent via private sentinel.
func ApplyWorkdirSubpathToSection(info *schema.ConfigAndStacksInfo) (string, error)

// Post-ProcessStacks resolver: BuildPath(componentType) + Resolve.
func BuildAndResolveWorkdirPath(
	atmosConfig *schema.AtmosConfiguration,
	info *schema.ConfigAndStacksInfo,
	componentType string,
) (string, bool, error)

// Full orchestrator: existence check → AutoProvisionSource → Apply → re-check.
// componentType is one of cfg.{Terraform,Helmfile,Packer,Ansible}ComponentType.
func ProvisionAndResolveComponentPath(
	ctx context.Context,
	atmosConfig *schema.AtmosConfiguration,
	info *schema.ConfigAndStacksInfo,
	componentType, fallbackComponentPath string,
) (string, bool, error)
```
All five executor entry points collapse to one call
| Entry point | Before | After |
|---|---|---|
| `internal/exec/terraform_execute_helpers.go` (plan/apply/etc.) | `provisionComponentSource(...)` (terraform-only helper) | `component.ProvisionAndResolveComponentPath(ctx, ..., cfg.TerraformComponentType, ...)` |
| `internal/exec/terraform_generate_varfile.go` | `tryJITProvision(...)` + private `checkDirectoryExists` (terraform-only reimpl, subpath bolted on) | `component.ProvisionAndResolveComponentPath(ctx, ..., cfg.TerraformComponentType, ...)` |
| `internal/exec/helmfile.go` (prev. lines 121-155) | 35 lines of inline existence check + `provSource.AutoProvisionSource` + raw `WorkdirPathKey` lookup (subpath ignored) | `component.ProvisionAndResolveComponentPath(ctx, ..., cfg.HelmfileComponentType, ...)` |
| `internal/exec/packer.go` (prev. lines 121-155) | Same pattern as helmfile (subpath ignored) | `component.ProvisionAndResolveComponentPath(ctx, ..., cfg.PackerComponentType, ...)` |
| `pkg/component/ansible/executor.go` (prev. lines 380-429) | Same pattern as helmfile/packer (subpath ignored) | `component.ProvisionAndResolveComponentPath(ctx, ..., cfg.AnsibleComponentType, ...)` |
Helmfile, Packer, Ansible, and terraform generate varfile silently inherited the same #2364 bug; this PR fixes them at the same time as Terraform plan/apply, with zero curve-fitting. terraform shell and the post-ProcessStacks resolvers (terraform plan-diff, terraform verify-plan) call component.ApplyWorkdirSubpathToSection and component.BuildAndResolveWorkdirPath respectively for the same reason.
Adjacent behavior change: JIT runs whenever source.uri is set
Before this PR, helmfile/packer/ansible/terraform generate varfile only invoked AutoProvisionSource when the local fallback component dir was missing. Ansible and terraform generate varfile additionally short-circuited the moment that dir existed, never running JIT. After the refactor, all five entry points (terraform plan/apply, terraform generate varfile, helmfile, packer, ansible) take the same path: when source.uri is declared, the source provisioner runs unconditionally, and only the YAML's source.uri decides whether JIT is in play.
This is safe under steady-state operation because AutoProvisionSource already self-debounces via two cache layers — invocationDoneKey (no-ops a second call within the same command lifecycle) and needsProvisioning (skips re-provisioning when the version, URI, and freshness pin all match) — both in pkg/provisioner/source/provision_hook.go. Net effect for users: every JIT-capable entry point now honors source.uri the same way terraform plan/apply always has, and the previously preferred lazy-skip-on-stale-local-dir path is gone.
Post-ProcessStacks resolvers also use the shared helper
internal/exec/terraform_plan_diff.go and internal/exec/terraform_verify_plan.go previously called the terraform-private resolveWorkdirComponentPath; both now call component.BuildAndResolveWorkdirPath(atmosConfig, info, cfg.TerraformComponentType).
Existence-gated subpath join (the disambiguation)
metadata.component has two valid uses for JIT components — a real subdirectory inside the cloned repo, or an inheritance/identity pointer to an abstract base. The fix consults the filesystem rather than guessing from the string. After clone, either the joined subdirectory exists (case 1, apply the join) or it doesn't (case 2, leave the workdir root alone). String-shape heuristics (e.g. checking for /) are unreliable; the filesystem already encodes the right answer after git clone.
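The existence-gated join described above can be sketched as follows. This is an illustrative reconstruction, not the code in `pkg/component/`: the lowercase `resolveWorkdirSubpath` name and the local `errWorkdirProvision` sentinel are stand-ins, and the branch order is an assumption based on the behaviors listed in this PR.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// errWorkdirProvision is a local stand-in for errUtils.ErrWorkdirProvision.
var errWorkdirProvision = errors.New("workdir provision failed")

// resolveWorkdirSubpath joins the subpath onto the workdir root only when the
// joined directory exists on disk; otherwise it leaves the root untouched.
func resolveWorkdirSubpath(metadataSubpath, workdirRoot string) (string, error) {
	if metadataSubpath == "" {
		return workdirRoot, nil // nothing to join
	}
	if filepath.IsAbs(metadataSubpath) {
		return "", fmt.Errorf("%w: metadata.component must be relative, got %q",
			errWorkdirProvision, metadataSubpath)
	}
	candidate := filepath.Join(workdirRoot, metadataSubpath)
	fi, err := os.Stat(candidate)
	switch {
	case errors.Is(err, os.ErrNotExist):
		// Inheritance-pointer case: no such directory, keep the root.
		return workdirRoot, nil
	case err != nil:
		return "", fmt.Errorf("%w: stat %s: %v", errWorkdirProvision, candidate, err)
	case !fi.IsDir():
		return "", fmt.Errorf("%w: %s is not a directory", errWorkdirProvision, candidate)
	}
	return candidate, nil // real subdirectory inside the cloned repo
}

func main() {
	root, err := os.MkdirTemp("", "workdir")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)
	if err := os.MkdirAll(filepath.Join(root, "modules", "iam-policy"), 0o755); err != nil {
		panic(err)
	}
	for _, sub := range []string{"modules/iam-policy", "vpc-defaults", ""} {
		p, err := resolveWorkdirSubpath(sub, root)
		fmt.Printf("%q -> %s (err=%v)\n", sub, p, err)
	}
}
```

In this sketch only `modules/iam-policy` resolves to a joined path; the other two inputs return the workdir root unchanged.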
An unexported workdirSubpathAppliedKey constant + private subpathAppliedMarker struct type (both in pkg/component/) guard against double-joining if the orchestrator is invoked twice against the same info.ComponentSection map. YAML deserialization can't produce this type, so a stack manifest containing _workdir_subpath_applied: <anything> cannot bypass the join. The constant lives in pkg/component/ rather than pkg/provisioner/workdir/ because read/write access is confined to this package — keeping the protocol single-sourced next to the only code that uses it.
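A minimal sketch of why a typed marker cannot be forged from YAML, assuming the section is a `map[string]any` and that the key string is `_workdir_subpath_applied` as the paragraph above suggests. The helper names `markApplied` and `alreadyApplied` are hypothetical.

```go
package main

import "fmt"

// Key and marker type mirror the names mentioned in this PR; the helper
// functions around them are illustrative only.
const workdirSubpathAppliedKey = "_workdir_subpath_applied"

// subpathAppliedMarker is unexported, so no YAML decoder can produce it.
type subpathAppliedMarker struct{}

// markApplied records that the subpath join already happened.
func markApplied(section map[string]any) {
	section[workdirSubpathAppliedKey] = subpathAppliedMarker{}
}

// alreadyApplied is true only when the value has the private marker type.
func alreadyApplied(section map[string]any) bool {
	_, ok := section[workdirSubpathAppliedKey].(subpathAppliedMarker)
	return ok
}

func main() {
	// A stack manifest can only yield plain YAML values such as bool or string.
	forged := map[string]any{workdirSubpathAppliedKey: true}
	fmt.Println("forged value bypasses the join:", alreadyApplied(forged))

	real := map[string]any{}
	markApplied(real)
	fmt.Println("marker set by the package:", alreadyApplied(real))
}
```

The type assertion fails for every value a decoder could produce (bool, string, int, map), so only code inside the package can set the gate.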
Error sentinel precision
Three sentinels carry distinct meaning across the orchestrator and its callers:
- `errUtils.ErrProvisionerFailed`: `AutoProvisionSource` (the JIT hook) failed.
- `errUtils.ErrWorkdirProvision`: path resolution / stat / abs-subpath rejection failure on the workdir path.
- `errUtils.ErrInvalidComponent`: stat failure on the local component directory (the no-source fallback path).
Two related fixes during review:
- The first revision wrapped `AutoProvisionSource` failures with `ErrWorkdirProvision`, which conflicted with the established pattern in `pkg/provisioner/registry.go`, `internal/exec/terraform_shell.go`, and the (now-removed) ansible executor, all of which used `ErrProvisionerFailed`. This revision restores ansible's existing semantics.
- The orchestrator's `componentDirExists` helper used to wrap every stat failure with `ErrWorkdirProvision`, including the `!HasSource` branch where the path is a local component directory, not a workdir. It now takes a sentinel parameter so the wrap matches the actual classification.

`terraform_shell.go` no longer re-wraps an already-wrapped `ErrWorkdirProvision` with `ErrProvisionerFailed`; the original sentinel survives in the chain so `errors.Is` triage works correctly.
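The sentinel split above can be exercised with `errors.Is`. The sketch below uses local stand-ins for the three `errUtils` sentinels, plus hypothetical `statFailure` and `classify` helpers; it shows the pattern of passing the sentinel in as a parameter, not the actual Atmos code.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins for the errUtils sentinels named in this PR.
var (
	errProvisionerFailed = errors.New("provisioner failed")
	errWorkdirProvision  = errors.New("workdir provision failed")
	errInvalidComponent  = errors.New("invalid component")
	errPermission        = errors.New("permission denied") // sample cause
)

// statFailure wraps a stat error with the sentinel chosen by the caller, so a
// local component directory and a workdir are classified differently.
func statFailure(sentinel error, path string, cause error) error {
	return fmt.Errorf("%w: %s: %v", sentinel, path, cause)
}

// classify triages an error chain by sentinel.
func classify(err error) string {
	switch {
	case errors.Is(err, errProvisionerFailed):
		return "JIT hook failed"
	case errors.Is(err, errWorkdirProvision):
		return "workdir path failure"
	case errors.Is(err, errInvalidComponent):
		return "local component directory failure"
	default:
		return "unclassified"
	}
}

func main() {
	local := statFailure(errInvalidComponent, "components/terraform/vpc", errPermission)
	workdir := statFailure(errWorkdirProvision, ".workdir/terraform/dev-vpc", errPermission)
	// An outer layer adds context with %w, so the original sentinel survives.
	outer := fmt.Errorf("terraform shell: %w", workdir)

	fmt.Println(classify(local))
	fmt.Println(classify(workdir))
	fmt.Println(classify(outer))
}
```

Because the outer layer only adds context with `%w` and does not substitute a different sentinel, callers still see the workdir classification.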
Design notes
.. in metadata.component is allowed. Many upstream Terraform modules reference shared files via relative parent paths (../../shared-vars.tf) and need the full repo on disk with the working directory at a subdirectory. Restricting to strict subpaths would break those layouts. Atmos's threat model assumes a trusted operator running atmos against their own stack configs — metadata.component is YAML-author-controlled, on par with !exec, !template, and !terraform.state, all of which can read or invoke arbitrary host resources. The godoc on ResolveWorkdirSubpath spells this out for future readers.
Absolute metadata.component is rejected. An absolute value violates the documented contract (metadata.component is a relative subpath inside the workdir). filepath.Join would silently coerce it into a child of the workdir root on Unix and apply drive-letter semantics on Windows — coercing it would mask author error. Rejected up-front with a wrapped ErrWorkdirProvision.
Same-name inheritance is a no-op. When metadata.component equals the component instance name (e.g. a component named vpc with metadata.component: vpc), atmos already clears the field during stack processing, so info.BaseComponentPath is empty and filepath.Join is never called.
Trade-off: typos fall through. A typo in metadata.component (e.g. modules/iam-polic when the user meant modules/iam-policy) silently falls back to the workdir root rather than failing fast. This matches pre-PR behavior for invalid subpaths and is logged at debug level for traceability. A future enhancement could surface a warning when the join falls back, distinguishing typos from intentional inheritance-pointer use.
What is not changed
The orchestrator short-circuits at !provSource.HasSource(...), so non-source components never reach the new code. Behavior of each non-JIT shape:
| Component shape | Effect |
|---|---|
| Plain local (no source, no workdir) | Zero change: the orchestrator returns the fallback path immediately. |
| Workdir-only + `metadata.component` | Zero functional change: workdir-only copies local files to the workdir root, so the candidate `<workdirRoot>/<subpath>` doesn't exist on disk; the resolver returns `exists=false` and the original `componentPath` is preserved. |
| Workdir-only without `metadata.component` | Small related fix in `plan-diff` and `verify-plan` only: these two commands now resolve to the provisioned workdir root (where state actually lives) instead of falling back to the local component path. Live execution paths were already correct. |
Other deferred scope:
- `atmos terraform generate planfile`: this command builds `componentPath` directly from `atmosConfig.TerraformDirAbsolutePath + info.FinalComponent` and never invokes JIT provisioning. As a result, `generate planfile` does not support JIT components today (with or without a `metadata.component` subpath). This is a pre-existing limitation, not a regression introduced by this PR. Wiring up JIT provisioning there is out of scope.
Tests
Unit (pkg/component/workdir_path_test.go):
| Test | Covers |
|---|---|
| `TestResolveWorkdirSubpath_JoinedPathExists` | Joined path is used when the subpath is a directory on disk |
| `TestResolveWorkdirSubpath_JoinedPathMissingFallsBack` | Falls back to the workdir root when the subpath does not exist (inheritance-pointer case) |
| `TestResolveWorkdirSubpath_EmptySubpathReturnsRoot` | Empty `metadata.component` short-circuits to the root |
| `TestResolveWorkdirSubpath_AllowsParentSegment` | `..` resolves naturally; codifies the design decision |
| `TestResolveWorkdirSubpath_RejectsAbsolutePath` | Absolute subpath wraps `ErrWorkdirProvision` |
| `TestResolveWorkdirSubpath_RejectsAbsolutePathOutsideWorkdir` | Absolute path outside the workdir is also rejected |
| `TestResolveWorkdirSubpath_RegularFileAtCandidate` | Wraps `ErrWorkdirProvision` when the candidate exists but is not a directory |
| `TestApplyWorkdirSubpathToSection_JoinsSubpath` | Mutates `WorkdirPathKey` to the joined subpath and sets the typed sentinel |
| `TestApplyWorkdirSubpathToSection_InheritancePointerPreservesRoot` | Regression guard: when the subpath does not exist, `WorkdirPathKey` stays at the workdir root |
| `TestApplyWorkdirSubpathToSection_NoWorkdirPathKey` | No-op when `WorkdirPathKey` is absent |
| `TestApplyWorkdirSubpathToSection_EmptyWorkdirPath` | No-op when `WorkdirPathKey` is the empty string |
| `TestApplyWorkdirSubpathToSection_DoubleCallAppliesOnce` | Idempotent across repeat calls (init then plan) |
| `TestApplyWorkdirSubpathToSection_SentinelGatesDoubleJoin` | Negative-path: deleting the sentinel re-enables the join, proving the sentinel is the gate |
| `TestApplyWorkdirSubpathToSection_UserYAMLCannotForgeSentinel` | YAML-author values (bool, string, int, map) cannot impersonate the typed sentinel |
| `TestBuildAndResolveWorkdirPath_ExistingDir` | Returns `(joined-path, true, nil)` when the workdir subpath exists |
| `TestBuildAndResolveWorkdirPath_AllComponentTypes` | Component-type parity: terraform/helmfile/packer/ansible all resolve under `.workdir/<componentType>/` |
| `TestBuildAndResolveWorkdirPath_AllComponentTypesWithSubpath` | Component-type parity: all four honor the `metadata.component` subpath join (issue #2364 across executors) |
| `TestBuildAndResolveWorkdirPath_InheritancePointerFallsBack` | Returns `(workdir-root, true, nil)` when the workdir root exists but the subpath doesn't |
| `TestBuildAndResolveWorkdirPath_NonExistentDir` | Returns `(candidate, false, nil)` when the workdir is not provisioned yet |
| `TestBuildAndResolveWorkdirPath_RegularFileAtCandidate` | Wraps `ErrWorkdirProvision` when the candidate exists but is not a directory |
| `TestBuildAndResolveWorkdirPath_StatErrorPropagates` | Wraps `ErrWorkdirProvision` for non-ENOENT stat failures (EACCES) |
| `TestProvisionAndResolveComponentPath_NoSourceReturnsFallback` | Orchestrator short-circuits cleanly when no source declared |
| `TestProvisionAndResolveComponentPath_NoSourceMissingDir` | Reports `exists=false` correctly when fallback dir is absent |
Integration (tests/cli_source_provisioner_workdir_test.go):
| Test | Drives | Asserts |
|---|---|---|
| `TestJITSource_MetadataComponentSubpath` | `atmos terraform generate varfile null-label-exports -s dev` | Generated `*.terraform.tfvars.json` lands at `<workdir>/exports/`, not the workdir root. Reverting the fix moves the varfile and fails this assertion. |
| `TestJITSource_MetadataComponentSubpath_TerraformShell` | `atmos terraform shell null-label-exports -s dev --dry-run` | Captures the dry-run banner from stderr and asserts both the printed `Working directory` and `Component path` include the `metadata.component` subpath. Reverting the fix prints the bare workdir root and fails this assertion. |
Both use github.com/cloudposse/terraform-null-label@0.25.0 with metadata.component: exports and skip gracefully on offline runners (no GitHub access or no git binary) via the existing precondition helpers.
References
- Issue: #2364
Summary by CodeRabbit
- New Features
  - Workdir-aware component resolution: `metadata.component` subpaths are applied when resolving JIT-provisioned components so modules can live under workdir subdirectories (e.g., `exports/`).
- Bug Fixes
  - Consolidated and standardized component provisioning/resolution with improved validation and clearer error propagation across executors.
- Tests
  - Added end-to-end and unit tests covering workdir subpath behavior, provisioning, and error cases.
test(describe-affected): accept all three valid Source values in TestResolveBase_PullRequest_Closed @aknysh (#2388)
what
Fix a CI-blocking test bug in `pkg/ci/providers/github/base_test.go` introduced silently by PR #2380.
`TestResolveBase_PullRequest_Closed` passes on PR runs (where `merge-base` or `HEAD~1` is reachable) but fails on post-merge runs to `main` (where only the documented `event.pull_request.base.sha` fallback is available):
- https://github.com/cloudposse/atmos/actions/runs/25254046463/job/74053358101
- https://github.com/cloudposse/atmos/actions/runs/25254046463/job/74053358088
why
The assertion
```go
assert.Contains(t, res.Source, "merge-base", "HEAD~1")
```
has a silent bug: testify treats the 4th positional argument to `assert.Contains` as the failure message, not an alternate value to match. So the test only ever checked for `"merge-base"`, and quietly passed on PR runs (where `merge-base` or `HEAD~1` was reachable) while failing on post-merge runs to `main` (where the GitHub Actions checkout depth and missing `origin/` fetch leave only the third documented fallback, `event.pull_request.base.sha`).
`ResolveBase()`'s closed-PR fallback chain in `pkg/ci/providers/github/base.go` documents three valid Sources:
- `"merge-base(HEAD, origin/)"`
- `"HEAD~1 (merged PR, merge-base unavailable)"`
- `"event.pull_request.base.sha"`
Replace the broken `Contains`-with-msg call with an explicit OR over the three substrings. The fix matches the test's own existing comment ("merge-base and HEAD~1 may or may not work; either way we get a valid resolution") -- the bug was that the assertion didn't actually check for "either way."
No production code change -- `ResolveBase()` already implements all three fallback paths correctly.
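The "explicit OR over the three substrings" can be sketched without testify as a small helper. The helper name `isValidClosedPRSource` is hypothetical; the three substrings come from the fallback chain listed above.

```go
package main

import (
	"fmt"
	"strings"
)

// validClosedPRSources lists a distinguishing substring for each of the three
// documented Source values.
var validClosedPRSources = []string{
	"merge-base",
	"HEAD~1",
	"event.pull_request.base.sha",
}

// isValidClosedPRSource reports whether source matches any documented value.
// This is the explicit OR that a single assert.Contains call cannot express,
// because its extra arguments are treated as the failure message.
func isValidClosedPRSource(source string) bool {
	for _, want := range validClosedPRSources {
		if strings.Contains(source, want) {
			return true
		}
	}
	return false
}

func main() {
	for _, s := range []string{
		"merge-base(HEAD, origin/)",
		"HEAD~1 (merged PR, merge-base unavailable)",
		"event.pull_request.base.sha",
		"something-else",
	} {
		fmt.Printf("%q valid=%t\n", s, isValidClosedPRSource(s))
	}
}
```

In a testify-based test, the result of such a helper could be passed to `assert.True`, with the observed `res.Source` included in the failure message.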
references
fix(ci): use terraform exit code as the source of truth for CI status @osterman (#2382)
what
- Make the terraform exit code the authoritative signal for success/failure (and, for `terraform plan` with `-detailed-exitcode`, for change detection) in the CI summary path. Text parsing of stdout/stderr is downgraded to enrichment only: it still extracts resource counts, output values, and error message bodies, but no longer drives the binary `HasErrors`/`HasChanges` decisions.
- Plumb the exit code through `cmd/terraform/utils.go` → `pkg/hooks` `RunCIHooks` → `pkg/ci` `ExecuteOptions` → `plugin.HookContext` so the plugin handler has a clean signal independent of output format.
- Rewrite `parseOutputWithError` (`pkg/ci/plugins/terraform/handlers.go`) so that:
  - `apply`/`deploy`: `HasErrors = (exitCode != 0)`
  - `plan`: `HasErrors = (exitCode == 1)`; `exitCode == 2` also implies `HasChanges`
  - other commands: `HasErrors = (exitCode != 0)`
  - exit-code success discards spurious "Error:" matches from text; exit-code failure still falls back to `CommandError.Error()` for the body if text parsing didn't find one.
- Wire the enriched `*plugin.OutputResult` from `parseOutputWithError` through `writeSummary` and `buildTemplateContext` (it had been silently dropped: `writeSummary` had `_ *plugin.OutputResult` as its second arg, and `buildTemplateContext` re-parsed `ctx.Output` from scratch). `buildTemplateContext` keeps a `nil`-fallback so legacy callers continue to work.
- Refactor `RunCIHooks` to take a `*RunCIHooksOptions` struct (per the repo's options pattern) since the parameter list grew past the linter's max-args limit.
- Add tests covering all the new branches: exit-code-only failure rendering, exit-code 2 → `HasChanges` for plan, apply exit 0 with stray `Error:` in output → no error, plus the original failure-summary tests for plan/apply/deploy.
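The exit-code mapping described in this list can be sketched roughly as follows. The `Result` type and `classify` function are hypothetical names, and the sketch derives `HasChanges` from the exit code alone, whereas the real handler may also enrich it from parsed output.

```go
package main

import "fmt"

// Result holds the two binary decisions discussed above.
// Hypothetical type; not the actual plugin.OutputResult.
type Result struct {
	HasErrors  bool
	HasChanges bool
}

// classify derives the binary state from the command name and exit code only.
func classify(command string, exitCode int) Result {
	switch command {
	case "plan":
		// With -detailed-exitcode: 0 = no changes, 1 = error, 2 = changes present.
		return Result{HasErrors: exitCode == 1, HasChanges: exitCode == 2}
	default:
		// apply, deploy, and every other command: any non-zero exit is a failure.
		return Result{HasErrors: exitCode != 0}
	}
}

func main() {
	fmt.Printf("%+v\n", classify("plan", 2))
	fmt.Printf("%+v\n", classify("apply", 1))
	fmt.Printf("%+v\n", classify("deploy", 0))
}
```

Because the decision never looks at output text, a failure that happens before terraform prints anything is still classified as an error.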
why
- Reported regression: `atmos terraform deploy <component> -s <stack> --upload-status` failing at the authentication step (before terraform itself ran, exit code 1) still produced a job summary that read `## No Changes Applied for eks/karpenter-node-pool in e98d-gov-use1-dss` with a `NO CHANGE` badge. The check run was correctly marked failed, but the summary contradicted it.
- Root cause was architectural: the CI summary path used text parsing as the primary source of truth for failure/change state. The auth-failure stderr did not match `ExtractErrors`'s `^Error:` regex (it's emitted as `**Error:**` in markdown form), and `writeSummary` silently dropped the already-enriched `OutputResult`, so the apply template fell through to the no-changes branch. Anything that fails before terraform runs (auth, OOM, signal kill, network) would have hit the same bug.
- Terraform exit codes are well-defined and stable (`apply`: 0 = success / non-zero = error; `plan -detailed-exitcode`: 0/1/2). Using them as the authoritative signal makes the hook robust against output-format drift between Terraform and OpenTofu, and against any pre-terraform failure that produces no parseable output. `errUtils.GetExitCode` already unwraps `exec.ExitError`, `ExecError`, `exitCoder`, and `WorkflowStepError`, so the existing error chains carry it through without further plumbing.
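To illustrate why the text-based check missed the auth failure, here is a small sketch of an anchored `^Error:` pattern applied to both forms. The multiline flag and the sample messages are assumptions; only the `^Error:` pattern and the `**Error:**` prefix come from the description above.

```go
package main

import (
	"fmt"
	"regexp"
)

// errorLine approximates the `^Error:` pattern; the (?m) flag is an assumption.
var errorLine = regexp.MustCompile(`(?m)^Error:`)

func main() {
	plain := "Error: Invalid provider configuration"  // illustrative terraform-style line
	markdown := "**Error:** failed to authenticate"   // illustrative markdown-style line

	fmt.Println(errorLine.MatchString(plain))    // true
	fmt.Println(errorLine.MatchString(markdown)) // false
}
```

The markdown form starts with `**`, so the anchored pattern never sees `Error:` at the start of a line, which is the gap the exit code closes.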
references
- Affected handlers: `pkg/ci/plugins/terraform/handlers.go` (`parseOutputWithError`, `writeSummary`).
- Affected helper: `pkg/ci/plugins/terraform/plugin.go` (`buildTemplateContext`).
- Plumbing: `pkg/ci/internal/plugin/types.go`, `pkg/ci/executor.go`, `pkg/hooks/hooks.go`, `cmd/terraform/utils.go`.
- Templates (unchanged): `pkg/ci/plugins/terraform/templates/{apply,plan}.md` already had `{{ if .Result.HasErrors }}` branches; they just weren't being reached.
Summary by CodeRabbit
Release Notes
- Bug Fixes
  - Improved error detection and failure reporting by treating command exit codes as the authoritative indicator of success/failure, fixing edge cases where errors occur before terraform produces output.
  - Enhanced CI/check-run status accuracy for `plan` and `apply` operations, properly handling plan changes and command execution failures.
- Tests
  - Added comprehensive test coverage for exit code handling, error state reconciliation, and CI hook execution workflows.
fix(auth): preserve AWS SDK error in assume-role / web-identity / assume-root failures @aknysh (#2385)
what
- Adds `WithCause(err)` at the three STS error sites in `pkg/auth/identities/aws/`:
  - `assume_role.go`: standard `AssumeRole` path.
  - `assume_role.go`: `AssumeRoleWithWebIdentity` (OIDC) path.
  - `assume_root.go`: `AssumeRoot` (centralized root access) path.
- Adds regression tests in `pkg/auth/identities/aws/assume_sdk_error_test.go` that point STS at a local `httptest.Server` returning AWS-style XML error envelopes (via the existing `aws.resolver.url` mechanism). Each test asserts the sentinel is preserved (`errors.Is(err, ErrAuthenticationFailed)`), the AWS error code and message are reachable in `err.Error()`, and the SDK error is also reachable through `errors.As(err, &smithy.APIError)`.
- Adds `docs/fixes/2026-05-01-assume-role-error-swallows-aws-cause.md` documenting the issue and fix.
why
- The three error sites built an enriched error with `errUtils.Build(ErrAuthenticationFailed).WithExplanation(...).WithHint(...).Err()` but never threaded the underlying SDK `err` into the chain. Operators saw only `authentication failed: identity=<name> step=<n>: authentication failed` with no AWS context.
- That made it impossible to tell, without re-running under `ATMOS_LOGS_LEVEL=Debug`, whether the failure was `AccessDenied`, `NoSuchEntity`, `InvalidIdentityToken`, `ExpiredTokenException`, `MalformedPolicyDocumentException`, throttling, etc. Each has a different remediation; the hint list ("verify the role ARN", "check the trust policy", ...) effectively enumerated every plausible cause because the actual one had been dropped.
- The error builder already exposes `WithCause(err)` for exactly this case (`errors/builder.go:104-167`). It chains the cause via `fmt.Errorf("%w: %w", sentinel, cause)`, preserves the sentinel for `errors.Is` checks, and merges any hints/safe details the cause already carried. The canonical pattern is already used at `pkg/auth/identities/aws/webflow_token.go:88-97`. The three assume sites just hadn't adopted it yet.
- After the fix, the same failure renders with the AWS-side reason inline: `authentication failed: identity=<name> step=<n>: authentication failed: operation error STS: AssumeRoleWithWebIdentity, https response error StatusCode: 403, RequestID: ..., api error AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity`, which makes the trust-policy / token / role-ARN problems diagnosable from the first run.
- Verified by reverting just the three `WithCause(err)` lines and confirming the new tests fail; restoring the fix turns them green again. Full `pkg/auth/...` test suite (~25 packages) passes.
references
- `docs/fixes/2026-05-01-assume-role-error-swallows-aws-cause.md`: full root-cause writeup, code paths, and rationale (added in this PR).
- `errors/builder.go:104-167`: `WithCause` / `WithCausef` helpers used by the fix.
- `pkg/auth/identities/aws/webflow_token.go:88-97`: canonical pattern referenced as the model for these three sites.
- `pkg/auth/manager_chain.go:570`: chain wrapper that already expected the leaf to thread the cause via the trailing `%w`; this PR makes the leaf actually do so.
Summary by CodeRabbit
- Bug Fixes
  - Preserve and surface underlying AWS STS error details in authentication failures while retaining existing sentinel behavior.
- Tests
  - Added regression tests that verify sentinel preservation, inclusion of AWS error text, and access to typed SDK errors across multiple STS error scenarios.
- Documentation
  - Added a doc with before/after examples and end-to-end test descriptions for the error-handling change.
fix(describe-affected): resolve PR base via merge-base with shallow-clone self-heal @osterman (#2380)
what
Fix `atmos describe affected` reporting many more affected components
than the PR actually modified, specifically when the PR is out of
date with the target branch.
- `pkg/git/merge_base.go` adds `MergeBaseWithAutoFetch` that runs a targeted `git fetch origin <target>` (and optionally one `--deepen=200`) when `MergeBase` can't resolve. Bounded retries.
- `pkg/ci/providers/github/base.go`: `resolvePRBase` keeps merge-base as the primary strategy and drops the buggy last-resort path that returned `refs/remotes/origin/<target>` (which downstream resolved to the current tip of the target branch, producing the false-positives). New last-resort is `event.pull_request.base.sha`, which is frozen at the last PR sync and never points to the current tip.
- `ExecuteDescribeAffectedWithTargetRefCheckout` accepts a new `targetBranch` parameter and self-heals via `git fetch` when worktree creation hits a missing target commit.
- Adds `pkg/git/fetch.go` (`FetchRef`, `DeepenFetch`) lifted from PR #2285. New `TargetBranch` field on `BaseResolution` and `DescribeAffectedCmdArgs`.
why
A customer reported that `atmos describe affected` on an out-of-date
PR listed components the PR did not touch. The root cause was a
fallback path documented as "handles this gracefully" in the PRD
that, in practice, silently produced wrong results when the local
repo was a shallow checkout (the `actions/checkout@v4` default).
Walkthrough and rationale are in
`docs/fixes/2026-04-30-describe-affected-out-of-date-pr.md`.
The user's suggestion, using `pull_request.merge_commit_sha` as
the base, would also work and is documented as a considered
alternative in the fixes doc. We chose merge-base + auto-fetch
because it preserves the existing PRD architecture, doesn't require
fetching M's parent separately, and works naturally with
`actions/checkout@v4`'s default merge-ref checkout.
supersedes #2285
PR #2285 proposed promoting `pull_request.base.sha` to the primary
strategy. This PR keeps merge-base as primary (gold standard) and
uses `base.sha` only as a fallback that replaces the buggy
ref-tip path. The fetch helpers and signature plumbing are lifted
from #2285; credit to the original work.
tests
- `pkg/git/merge_base_test.go`: new `TestMergeBaseWithAutoFetch_RecoversFromMissingRef` builds an origin/clone pair, deletes `origin/main` to simulate a shallow CI checkout, and asserts the recovered SHA is the fork point, not the current main tip.
- `pkg/ci/providers/github/base_test.go`: `TestResolveBase_PullRequest_OutOfDate_FallsBackToPayloadSHA` reproduces the customer scenario at unit-test level.
- `internal/exec/describe_affected_test.go`: `TestResolveBaseFromCI` hardened to require that `describe.SHA` is populated and `describe.Ref` is empty, which guards against any future regression that re-introduces the ref-tip fallback.
references
Summary by CodeRabbit
- Bug Fixes
  - More reliable PR base resolution in CI: auto-fetch + one-step deepen for shallow checkouts, targeted ref retry when refs are missing, and safer fallback to event payload base SHA to reduce false positives.
- New Features
  - Merge-base recovery with targeted fetch/deepen, worktree retry on missing commits, and explicit propagation of PR target branch for CI resolutions.
- Documentation
  - Updated CI/base-resolution docs and troubleshooting note for out-of-date PRs.
- Tests
  - New and expanded unit/integration tests covering recovery, fetch/deepen, and fallback paths.
fix: authbridge resolver reads auth context from manager's stackInfo, not caller's @MrZablah (#2379)
what
- Fix `!store.get` failing with "AWS auth context not available" when a store backend is configured with an `identity:` field
- `authbridge.Resolver` now reads the post-authentication AWS/Azure/GCP context from the auth manager's own internal `stackInfo` (via `GetStackInfo()`) instead of the caller's `stackInfo`
- Add regression test `TestResolveAWSAuthContext_PointerMismatch` that directly reproduces the pointer mismatch scenario
why
- `pkg/auth.createAuthManagerInstance` allocates its own `*schema.ConfigAndStacksInfo` for the auth manager, a different pointer than the info passed by the terraform executor to `authbridge.NewResolver`
- After `AuthManager.Authenticate()` succeeds, `PostAuthenticate` writes credential file paths and profile info into the manager's own `stackInfo.AuthContext.AWS`, never the caller's info
- The resolver was checking `r.stackInfo.AuthContext.AWS` (the caller's pointer, always nil) instead of `r.authManager.GetStackInfo().AuthContext.AWS` (the manager's pointer, populated by auth)
- Result: every `!store.get` call with an `identity:` configured would succeed at authentication but then immediately fail with "AWS auth context not available"
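The pointer mismatch can be reproduced in miniature. All types below are simplified stand-ins with illustrative field names, not the real `schema.ConfigAndStacksInfo` or `authbridge.Resolver`; only the shape of the bug (two separately allocated structs, one of which is never written to) follows the description above.

```go
package main

import "fmt"

// Simplified stand-ins for the schema types (assumption).
type AWSAuthContext struct{ Profile string }
type StackInfo struct{ AWS *AWSAuthContext }

// Manager owns its own StackInfo, allocated independently of the caller's.
type Manager struct{ info *StackInfo }

func NewManager() *Manager { return &Manager{info: &StackInfo{}} }

// Authenticate writes the post-auth context into the manager's StackInfo only.
func (m *Manager) Authenticate()            { m.info.AWS = &AWSAuthContext{Profile: "dev"} }
func (m *Manager) GetStackInfo() *StackInfo { return m.info }

type Resolver struct {
	callerInfo *StackInfo
	manager    *Manager
}

// buggy reads the caller's pointer, which authentication never touches.
func (r *Resolver) buggy() *AWSAuthContext { return r.callerInfo.AWS }

// fixed reads the manager-owned StackInfo populated by Authenticate.
func (r *Resolver) fixed() *AWSAuthContext { return r.manager.GetStackInfo().AWS }

func main() {
	r := &Resolver{callerInfo: &StackInfo{}, manager: NewManager()}
	r.manager.Authenticate()

	fmt.Println(r.buggy() == nil) // true
	fmt.Println(r.fixed() != nil) // true
}
```

Authentication succeeds in both cases; only the lookup location decides whether the context appears to be available.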
references
- closes #2377
Summary by CodeRabbit
- Bug Fixes
  - Fixed auth context resolution so cloud-specific authentication is sourced from the auth manager rather than resolver-held data.
- Chores
  - Pinned Go toolchain to 1.26.2.
- Tests
  - Updated resolver tests to model manager-owned stack info separately and added a regression test for pointer-mismatch behavior.
test: increase test coverage in pkg/flags, pkg/filesystem, pkg/http, and pkg/function @[copilot-swe-agent[bot]](https://github.com/apps/copilot-swe-agent) (#2173)
- [x] Explore all affected files
- [x] `internal/exec/stack_processor_utils_test.go`: Convert hardcoded path strings to `filepath.Join` (both test functions)
- [x] `pkg/filesystem/export_test.go`: Add trailing period to inline comment on line 35
- [x] Build & test verification
Summary by CodeRabbit
- New Features
  - Bounded, configurable glob-pattern cache (TTL, max entries, empty-result toggle) with runtime metrics exposed via `/debug/vars`
  - Safer GitHub auth handling with host allowlisting and Authorization stripping on cross-host redirects
- Bug Fixes
  - Consistent non-nil empty-slice result for glob no-matches and improved cache correctness
- Documentation
  - Added changelog and minimum Go toolchain guidance (`go.mod` → Go 1.26+)
- Tests
  - Large suite of new tests across globbing, atomic writes, flags, and HTTP client
- Chores
  - New `test-race` Makefile target (race detector + shuffled execution)
fix(output): stop after-* hooks from corrupting backend.tf.json when backend uses !terraform.state @zack-is-cool (#2358)
Summary
Fixes #2356. The `after-terraform-apply` store hook path regenerated
`backend.tf.json` / `providers_override.tf.json` from un-rendered
component sections when the backend referenced `!terraform.state`,
overwriting a correctly-rendered file with literal YAML-function strings:

    - "bucket": "atmos-tfstate-dev",
    - "dynamodb_table": "atmos-tfstate-lock-dev",
    + "bucket": "!terraform.state tfstate-backend dev s3_bucket_id",
    + "dynamodb_table": "!terraform.state tfstate-backend dev dynamodb_table_name",

The hook then failed its `tofu output` call with:

    Error: Backend initialization required: please run "tofu init"
    Reason: Backend configuration block has changed
Why
Regression introduced in v1.216.0 by #2309 (commit `3c0e748ce`) +
follow-up commit `c7ef142a9` ("fix: skip-init should skip yaml function
evaluation"). `c7ef142a9` added a guard disabling YAML-function
evaluation when `SkipInit && authManager == nil` to avoid failing on
auth-requiring functions in the post-hook context. The guard is overly
broad: it also disables evaluation of non-auth functions like
`!terraform.state`, so sections returned from `DescribeComponent` retain
literal YAML-function strings. `execute()` then extracts `config.Backend`
from those sections and writes them to disk via `GenerateBackendIfNeeded`.
Fix
Thread `processYamlFunctions bool` through `execute()` in
`pkg/terraform/output/executor.go` and guard the artifact-regeneration
block (Step 4 / Step 5) behind it. When YAML functions were not
evaluated upstream, `execute()` must not regenerate artifacts from the
un-rendered sections. The backend file on disk from the init/apply phase
is already correct; leaving it alone is always safe. Output reading
(`tofu output`) still works via the on-disk state.
Minimal, localized diff in four commits:
- `refactor(output): inject BackendGenerator and thread processYamlFunctions through execute()`: pure DI plumbing, no behavior change.
- `fix(output): skip artifact regeneration when YAML functions were not processed`: the actual guard.
- `test(output): assert backend-generator calls match processYamlFunctions in SkipInit tests`: locks in the invariant in four existing SkipInit tests.
- `test(output): regression test for #2356 backend.tf.json corruption`: byte-identical integration assertion.
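The shape of the guard, together with the injected generator that makes it assertable, can be sketched as follows. The interface, the `countingGenerator` fake, and this simplified `execute` signature are assumptions for illustration; the real `execute()` in `pkg/terraform/output/executor.go` does considerably more.

```go
package main

import "fmt"

// BackendGenerator is a hypothetical injected dependency that writes artifacts.
type BackendGenerator interface {
	GenerateBackendIfNeeded(sections map[string]any) error
}

// countingGenerator is a test double that records how often it was called.
type countingGenerator struct{ calls int }

func (g *countingGenerator) GenerateBackendIfNeeded(map[string]any) error {
	g.calls++
	return nil
}

// execute regenerates artifacts only when the sections were rendered upstream.
func execute(gen BackendGenerator, sections map[string]any, processYamlFunctions bool) error {
	if !processYamlFunctions {
		// Sections may still hold literal "!terraform.state ..." strings,
		// so the file already on disk is left untouched.
		return nil
	}
	return gen.GenerateBackendIfNeeded(sections)
}

func main() {
	gen := &countingGenerator{}
	unrendered := map[string]any{"bucket": "!terraform.state tfstate-backend dev s3_bucket_id"}

	_ = execute(gen, unrendered, false)
	fmt.Println("calls after skip-init path:", gen.calls)

	_ = execute(gen, map[string]any{"bucket": "atmos-tfstate-dev"}, true)
	fmt.Println("calls after rendered path:", gen.calls)
}
```

Injecting the generator is what lets a test assert a zero-call expectation in the skip path and an exactly-once expectation when functions were processed.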
Test plan
- New unit test `TestExecutor_Execute_SkipsArtifactRegen_WhenYamlFunctionsNotProcessed` (demonstrably red before the guard, green after).
- Four existing SkipInit tests strengthened with zero-call expectations on the backend-generator mock.
- Inverse assertion in `TestExecutor_GetAllOutputs_SkipInit_WithAuthManager_ProcessesYamlFunctions`: `GenerateBackendIfNeeded` + `GenerateProvidersIfNeeded` called exactly once when auth is present.
- Integration regression test `TestExecutor_Regression_Issue2356_BackendFileUnchangedInSkipInitPath`: writes a rendered `backend.tf.json`, drives `GetOutputWithOptions(SkipInit=true, authManager=nil)`, asserts the file is byte-identical. Fails without the guard; passes with it.
- `go test ./pkg/terraform/output/... -count=1` green.
- `make lint` / `./custom-gcl run --new-from-rev=origin/main` clean (one `dupl` warning on the new test vs the existing SkipInit test is suppressed with `//nolint:dupl` + justification: they test contrasting invariants at the same call site; extracting shared scaffolding would obscure the red/green comparison).
- Manual end-to-end via LocalStack + Redis repro (`ignore/issues/post-apply-hook-backend-racecondition/repro.sh` in the branch, referenced from #2356). Exits 0 with `FIX VERIFIED` on this branch; backend file byte-diff is empty after the after-apply hook.
- CI full suite: opening this PR runs it.
Follow-up
The `processYamlFunctions = false` guard in `GetOutputWithOptions` /
`fetchAndCacheOutputs` is the deeper design issue: auth availability
should not gate evaluation of non-auth YAML functions. Tracked in #2357.
This PR is the minimal regression fix for v1.216.x.
Release
- `fix:` conventional commit → patch release (v1.216.1).
- No schema changes, no user-facing config changes.
- No roadmap update (regression fix, not a feature).
Summary by CodeRabbit
- Bug Fixes
  - Backend and provider override files are regenerated only when YAML functions are processed, preventing unnecessary rewrites.
  - Fixed a case where skip-initialization could overwrite already-rendered backend/provider files, preserving existing configurations.
- Tests
  - Added regression tests to ensure backend/provider files remain unchanged in the skip-initialization path and to validate correct conditional regeneration behavior.
Associated Pull Requests
Deployment Status
To view the Atmos Pro deployment status of this release, see #2390.