feat(auth): interactive profile suggestion + profiles.default @osterman (#2333)
what
- Add interactive profile suggestion for two scenarios where the loaded `atmos.yaml` lacks the auth config the user is asking for:
  - Identity-specific: `atmos … --identity foo` when `foo` isn't defined in the base config but exists in one or more profiles.
  - Identity-agnostic: `atmos auth login/exec/shell/env/console/whoami` when the base config has no `auth.identities`/`auth.providers` at all but at least one profile does.
- Interactive terminals get a themed `huh` prompt (yes/no for a single match, select list for multiple). Picking a profile re-execs Atmos with `--profile <picked>` prepended to the original argv.
- Non-interactive terminals (CI, pipes) get the original error enriched with hints naming the candidate profile(s) and the exact `--profile <name>` flag to re-run.
- Add `profiles.default` config field — implicit default profile loaded when `--profile` and `ATMOS_PROFILE` are unset; explicit selection always wins; nested defaults are ignored to prevent recursion.
- Loop guard `ATMOS_PROFILE_FALLBACK=1` prevents infinite re-prompt cycles when the picked profile also fails to resolve the identity.
- Ctrl+C / Esc abort the prompt cleanly with a single `User aborted.` message instead of cascading identity-not-found / authentication-failed errors.
- Prompt titles use `ui.FormatInline()` so backticks render as styled inline code, matching the rest of Atmos.
- New `cfg.ProfilesWithIdentity()` and `cfg.ProfilesWithAuthConfig()` helpers (scoped Viper, no global state pollution) to discover candidate profiles.
- New `AuthManager.MaybeOfferAnyProfileFallback()` interface method called by every relevant auth command before returning the terminal "no identity" error.
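The `profiles.default` field described above can be sketched as a minimal `atmos.yaml` fragment (the profile name `developer` is only illustrative):

```yaml
# atmos.yaml — illustrative sketch of the new field
profiles:
  default: developer   # used only when --profile and ATMOS_PROFILE are both unset
```

Explicit selection (`atmos --profile <name> …` or `ATMOS_PROFILE=<name>`) always overrides this default, per the precedence described in this PR.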
why
- Profile-based repos (e.g. `cloudposse/infra-live` patterns) put all `auth.identities`/`auth.providers` inside `profiles/<name>/atmos.yaml` and leave the root `atmos.yaml` minimal. Running `atmos auth login` there errored with `no providers available` and no actionable next step — users had to read source / docs to discover they needed `--profile`.
- The fix turns a dead-end error into a one-keystroke recovery in interactive sessions and a copy-pasteable command in CI logs.
- `profiles.default` lets teams pin a sensible profile (e.g. `developer`) without forcing every shell to export `ATMOS_PROFILE`, while keeping explicit overrides authoritative.
references
- PRD: `docs/prd/interactive-profile-suggestion.md`
- Blog: `website/blog/2026-04-16-interactive-profile-suggestion.mdx`
- Docs: `website/docs/cli/configuration/profiles.mdx`
Summary by CodeRabbit
- New Features
  - `profiles.default` in `atmos.yaml` (precedence: `--profile` → `ATMOS_PROFILE` → `profiles.default`). Interactive profile suggestion to re-run with `--profile` when identities or auth config are missing; non‑TTY flows provide enriched rerun hints. Re-exec depth guard and portable re-exec helpers for safer re-run behavior.
- Bug Fixes
  - Clearer, more actionable error messages and improved fallback routing when identities or auth config are absent.
- Documentation
  - Added PRD, CLI docs, and blog post describing the behavior.
- Tests
  - Extensive unit/integration tests covering fallback flows, profile discovery, re-exec, and helpers.
fix: Add missing doc redirects for old core-concepts URLs @osterman (#2287)
what
- Adds 25 new client-side redirects for old `/core-concepts/` URLs that are still indexed by Google and cached by LLMs, causing 404 errors
- Fixes 2 existing redirects that had invalid trailing slashes on `/vendor/component-manifest/` targets (was causing Docusaurus build validation errors)
New redirect categories:
- 4 screenshot-confirmed 404s (vendoring, component-management, provisioning, schemas)
- 7 project section redirects (`/core-concepts/projects/*` → `/projects/` and `/cli/configuration/`)
- 7 stacks sub-pages (define-components, settings, components, backend, vars, env, providers)
- 2 share-data / remote-state redirects
- 2 vendor sub-pages (component-manifest, vendor-manifest)
- 1 describe page redirect
- 2 component sub-pages (packer, ansible)
why
- Old `/core-concepts/` URLs are still indexed by Google and widely cached in LLM training data
- LLMs frequently generate links to these old URLs when helping users with Atmos, leading to broken links and poor developer experience
- Each broken URL was verified by live-fetching the page and confirming a 404 response
- Each redirect target was cross-referenced against `llms.txt` to ensure validity
references
- Verified via `site:atmos.tools/core-concepts` Google searches
- All redirect targets validated against the Docusaurus build (`npm run build` passes)
Summary by CodeRabbit
- Bug Fixes
  - Fixed numerous broken documentation links and improved navigation by adding and updating redirect rules across Projects, Stacks, Components, Vendor, and related pages (including removal of trailing-slash redirect mismatches) so users are directed to correct docs URLs.
- Chores
  - Updated CI workflow runner constraints to refine automated job scheduling.
feat(list): add matrix output format to list instances command @johncblandii (#2322)
what
- Add `--format=matrix` support to `atmos list instances`, producing GitHub Actions-compatible matrix JSON identical to `atmos describe affected --format=matrix`
- Add `--output-file` flag for writing results in `key=value` format (for `$GITHUB_OUTPUT`)
- Extract shared matrix types and output logic into `pkg/matrix/` for DRY reuse across both `describe affected` and `list instances`
why
- CI/CD pipelines need matrix output from `list instances` to drive parallel GitHub Actions jobs, just like `describe affected` already supports
- Sharing the matrix output logic between commands avoids duplication and ensures consistent output format
- The `--output-file` flag enables direct integration with GitHub Actions `$GITHUB_OUTPUT` without shell redirection
references
- Output format matches `atmos describe affected --format=matrix` exactly: `{"include":[{"stack":"...","component":"...","component_path":"...","component_type":"..."}]}`
- When using `--output-file`, writes `matrix=<json>` and `affected_count=<N>` lines
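A typical consumer of this output is a two-job GitHub Actions workflow: one job produces the matrix, a second fans out over it. This is an illustrative sketch (job, step, and stack/component names are hypothetical; only the Atmos flags come from this PR):

```yaml
# Hypothetical workflow consuming the matrix output
jobs:
  discover:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.instances.outputs.matrix }}
    steps:
      - id: instances
        # writes matrix=<json> and affected_count=<N> into $GITHUB_OUTPUT
        run: atmos list instances --format=matrix --output-file "$GITHUB_OUTPUT"

  apply:
    needs: discover
    runs-on: ubuntu-latest
    strategy:
      matrix: ${{ fromJSON(needs.discover.outputs.matrix) }}
    steps:
      - run: atmos terraform apply ${{ matrix.component }} -s ${{ matrix.stack }}
```

Because the JSON already has the `{"include":[…]}` shape, it can be fed to `strategy.matrix` directly via `fromJSON`.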
Summary by CodeRabbit
- New Features
  - Added `--format=matrix` to emit GitHub Actions–compatible matrix JSON
  - Added `--output-file` / `-o` to write matrix results as `key=value` (for `$GITHUB_OUTPUT`); only supported with `--format=matrix`
  - Matrix entries include stack, component, component_path, and component_type
  - `--format=matrix` disallows `--upload` and triggers CI-friendly output behavior
- Tests
  - Added coverage for matrix format, output-file flag, and file/stdout writing
- Documentation
  - Added docs, blog post, and roadmap entry for matrix support
chore: Update Atmos Pro workflow to use v1.215.0 container image @osterman (#2323)
what
- Updated the Atmos Pro CI workflow container image from `1.214.0` to `1.215.0`
- Removed the "Build atmos from source" step that compiled a dev binary via `go build`
- Changed `atmos docs generate readme` and `atmos pro commit` to use the container's pre-installed `atmos` binary instead of `/tmp/atmos-dev`
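The simplified workflow shape looks roughly like this (the image repository and step layout are illustrative — only the tag bump and the removal of the source-build step come from this PR):

```yaml
# Hypothetical excerpt of the Atmos Pro workflow after this change
jobs:
  atmos-pro:
    runs-on: ubuntu-latest
    container:
      image: <registry>/atmos:1.215.0   # registry path illustrative; tag bumped from 1.214.0
    steps:
      - uses: actions/checkout@v4
      # no "Build atmos from source" step — the container ships the binary
      - run: atmos docs generate readme
      - run: atmos pro commit
```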
why
- Atmos v1.215.0 ships with the `pro commit` command built-in, so building from source is no longer necessary
- Simplifies the CI workflow and reduces build time by eliminating the Go compilation step
references
- #2298 (`atmos pro commit` feature)
Summary by CodeRabbit
- Chores
  - Updated GitHub Actions workflow to use atmos container image version 1.215.0 (upgraded from 1.214.0).
  - Streamlined workflow execution by removing the local build step and invoking atmos directly from the container image.
🚀 Enhancements
Fix multi-region provider aliases generating incorrect Terraform JSON format @[copilot-swe-agent[bot]](https://github.com/apps/copilot-swe-agent) (#2210)
When configuring providers with dot-notation aliases (e.g., `aws.use1`), the generated `providers_override.tf.json` emitted invalid structure — separate top-level keys instead of the array-of-objects format Terraform's JSON syntax requires for multiple provider instances.
Changes
- `pkg/terraform/output/backend.go`: Added exported `ProcessProviderAliases` that detects dot-notation provider keys, groups all configurations for the same provider type into an array (base config first, aliases sorted), and leaves non-aliased providers unchanged
- `internal/exec/utils.go`: Updated `generateComponentProviderOverrides` to delegate to `tfoutput.ProcessProviderAliases`, eliminating duplicated logic
Example
Given stack config:
```yaml
providers:
  aws:
    region: us-east-2
  aws.use1:
    region: us-east-1
    alias: use1
```

Before:

```json
{ "provider": { "aws": { "region": "us-east-2" }, "aws.use1": { "alias": "use1", "region": "us-east-1" } } }
```

After:
```json
{
  "provider": {
    "aws": [
      { "region": "us-east-2" },
      { "alias": "use1", "region": "us-east-1" }
    ]
  }
}
```

Original prompt
This section details the original issue you should resolve
<issue_title>Multi-Region with Provider Aliases example is not working</issue_title>
<issue_description>### Describe the Bug
https://atmos.tools/stacks/providers#multi-region-with-provider-aliases — this example is not working; the actual generated file is different from the example.
Expected Behavior
The generated file is the same as the example.
Steps to Reproduce
With the following atmos component config:
```yaml
components:
  terraform:
    eip:
      providers:
        aws:
          region: us-east-2
        aws.use1:
          region: us-east-1
          alias: use1
      metadata:
        component: eip
```

Run the atmos command and check the output of `providers_override.tf.json`
Screenshots
The content of the generated providers_override.tf.json
```json
{ "provider": { "aws": { "region": "us-east-2" }, "aws.use1": { "alias": "use1", "region": "us-east-1" } } }
```

Would expect it to be:

```json
{ "provider": { "aws": [ { "region": "us-east-2" }, { "alias": "use1", "region": "us-east-1" } ] } }
```

Environment
- OS: OSX
- Version: 1.209.0
- Terraform version: v1.14.7
Additional Context
No response</issue_description>
Comments on the Issue (you are @copilot in this section)
- Fixes #2208
Summary by CodeRabbit
- New Features
  - Added support for provider aliases — both explicit and auto-derived from dot-notation provider keys (e.g., `aws.use1`).
  - Providers are now properly grouped into arrays in generated Terraform provider override files.
- Tests
  - Added integration tests for provider alias scenarios.
- Documentation
  - Updated provider documentation to clarify alias auto-derivation behavior.
fix(list): gate `list instances --upload` on `settings.pro.enabled` @osterman (#2330)
what
- Change `atmos list instances --upload` to filter instances by `settings.pro.enabled == true` (strict boolean) instead of `settings.pro.drift_detection.enabled == true`.
- Rename `isProDriftDetectionEnabled` → `isProEnabled` and simplify the check to a single lookup on `settings.pro.enabled`; `drift_detection.enabled` is no longer consulted.
- Update all unit, integration, comprehensive, cmd, and benchmark tests to the new fixture shape; add an explicit case proving `pro.enabled: true` with `drift_detection.enabled: false` is now enabled.
- Update `website/docs/cli/commands/list/list-instances.mdx` to document the filter criterion under `--upload`, in the examples section, and in the `:::tip` block (noting it must be a boolean, not the string `"true"`).
why
- Users with `settings.pro.enabled: true` configured on their components were hitting `No Atmos Pro-enabled instances found; nothing to upload.` even when Pro was clearly enabled, because the filter required the narrower `drift_detection.enabled` sub-key.
- `settings.pro.enabled` is the correct top-level enablement flag for Pro; drift detection is one feature among several and shouldn't gate the whole upload.
- The docs previously described `--upload` without specifying what made an instance eligible, so the failure mode was invisible to users.
Behavior change (callout)
Components that previously qualified via only `settings.pro.drift_detection.enabled: true` (without `pro.enabled: true`) will now be excluded from `--upload`. Users in that shape must add `settings.pro.enabled: true`.
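Migrating a component in that shape is a one-line addition in the stack manifest (the component name here is hypothetical):

```yaml
# Stack manifest — hypothetical component
components:
  terraform:
    vpc:
      settings:
        pro:
          enabled: true        # now required for --upload eligibility; must be a boolean, not "true"
          drift_detection:
            enabled: true      # still valid, but no longer gates the upload
```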
references
- `--upload` was introduced in #2322
Summary by CodeRabbit
- Bug Fixes
  - Pro detection simplified: only an explicit boolean `settings.pro.enabled=true` marks an instance as Pro; missing/non-boolean values are treated as disabled.
  - Upload behavior: all collected instances are uploaded; post-upload summary shows total uploaded plus enabled/disabled and drift-enabled counts.
  - Improved Pro authentication hints for GitHub Actions and workspace ID.
- Documentation
  - CLI docs updated to reflect new upload semantics, payload shape, and the "No instances found; nothing to upload." message.
- Tests
  - Tests updated/added to cover the new Pro flag shape, counting, and upload behavior.
Fix: Identity names with dots incorrectly parsed by Viper @[copilot-swe-agent[bot]](https://github.com/apps/copilot-swe-agent) (#2129)
- [x] Initial plan for fixing identity names with dots
- [x] Add `fixAuthIdentities()` to re-parse identities from raw YAML
- [x] Extract shared decode hooks into `getAtmosDecodeHookFunc()`
- [x] Apply fix in `LoadConfig()` and `loadConfigFromCLIArgs()`
- [x] Add test case `TestIdentityNamesWithDots`
- [x] Use atmosConfig in perf.Track for consistency
- [x] Remove debug log message that caused test snapshot failures
- [x] Add error handling test cases to increase coverage to 84.6%

Original prompt
This section details the original issue you should resolve
<issue_title>Zero-Configuration AWS SSO Identity Management: identity containing dots break it.</issue_title>
<issue_description>### Describe the Bug
Testing:

```yaml
auth:
  providers:
    sso-prod:
      kind: aws/iam-identity-center
      start_url: https://my-org.awsapps.com/start
      region: us-east-1
      auto_provision_identities: true # One line to enable
```

I do get a list of identities in `~/.cache/atmos/auth/sso-prod/provisioned-identities.yaml`. Some of them contain dots, e.g.:

```yaml
product.usa/ReadOnlyAccess: # <=== The "." here breaks it
  kind: aws/permission-set
  provider: sso-prod
  via:
    provider: sso-prod
  principal:
    account:
      id: "000000000000"
      name: product.usa
    name: ReadOnlyAccess
```

Which atmos does not support:

```
$ atmos auth list
Initialize Identities Error: invalid identity kind

## Explanation
unsupported identity kind:

Initialize Identities Error: failed to initialize identities: invalid identity config: identity=product: invalid identity kind: unsupported identity kind:

Error: invalid auth config: failed to create auth manager: failed to initialize identities: invalid identity config: identity=product: invalid identity kind: unsupported identity kind:
```
it works :-)
Steps to Reproduce
Cf. bug description
Screenshots
No response
Environment
atmos 1.207.0
Additional Context
No response</issue_description>
Comments on the Issue (you are @copilot in this section)
- Fixes #2128
fix(toolchain): resolve aliases in `toolchain exec` / `toolchain which` lookups @osterman (#2332)
what
- Route `findBinaryPath` (used by `atmos toolchain exec` and `atmos toolchain which`) through the existing alias-aware `LookupToolVersion` helper instead of a raw `toolVersions.Tools[name]` map lookup.
- Derive `owner/repo` from the resolved canonical key so the computed install path matches what the write side persisted.
- Add a regression test that reproduces the bug: `.tool-versions` storing `helm/helm 3.20.2` + an alias `helm → helm/helm` now resolves via `WhichExec("helm")`.
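The asymmetry is easiest to see in the file itself. Based on the description above, the write side persists the canonical key, so the fixture looks roughly like:

```
# .tool-versions — entry as persisted by the (canonicalizing) write side
helm/helm 3.20.2
```

Before this fix, `atmos toolchain exec -- helm …` looked up the literal key `helm`, missed the `helm/helm` entry, and tried to re-install; with the alias-aware lookup, `helm` resolves to `helm/helm` on the read side too.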
why
- Symptom: `atmos toolchain install helm@3.20.2` succeeds, but `atmos toolchain exec -- helm …` then errors with `tool 'helm' not configured in .tool-versions` and tries to re-install.
- Root cause: the write side already canonicalizes via the resolver (`wouldCreateDuplicate` → `aliasConflictsWithFullName`), so entries land under the `owner/repo` key. The read side did a raw map lookup with no resolver, so an alias query missed the canonical entry — the classic write/read asymmetry.
- Fix keeps the read side symmetric with the write side by reusing the helper that already exists for exactly this purpose.
references
- Out of scope, tracked separately: `RunInstall` persisting the literal string `latest` to `.tool-versions` when installing without an explicit version, and wiring `pkg/toolchain/filemanager` / `pkg/toolchain/lockfile` into install/uninstall/set/exec.
Summary by CodeRabbit
- Bug Fixes
  - Fixed tool alias resolution to correctly locate binary paths when requesting tools by their registered alias names instead of canonical identifiers. The system now properly maps aliases to their resolved canonical entries before checking availability.
fix: resolve JIT workdir path for !terraform.state, !terraform.output, and atmos.Component @zack-is-cool (#2328)
What
Bug fix PR. Makes `!terraform.state`, `!terraform.output`, and `atmos.Component` work correctly for JIT workdir components (`provision.workdir.enabled: true`). All three were silently broken in ways that only surfaced at runtime.
Four fixes:
- `!terraform.state` path resolution — resolves state path from `.workdir/terraform/<stack>-<component>/` instead of the static source directory JIT components never write state to.
- `!terraform.output` / `atmos.Component` auto-provision — provisions the JIT workdir before `terraform init` so output references work on any machine, not just ones with a pre-existing workdir from a prior apply.
- Source-provisioned JIT workdir support — Fix 2 only handled local-copy provisioning. For `source.uri` components, `!terraform.output` now hydrates from the source URI before init. Also fixes `extractComponentName` fallback and a go-getter `FileGetter` dst-must-not-exist invariant.
- Provisioner output interleaving — `ui.ClearLine()` before status writes prevents the bubbletea spinner from leaving leading whitespace on provisioner messages.
Correctness & security fixes:
- TOCTOU race — `sync.Map` `Load`+`Store` replaced with `LoadOrStore` inside the singleflight closure, eliminating the window where two goroutines could both enter `Provision`.
- Context cancellation — switched to `singleflight.DoChan` + `select` so waiters with cancelled contexts exit immediately. Added `context.WithoutCancel` so leader cancellation doesn't abort shared provisioning work.
- Path traversal guard — `extractComponentPath` verifies the derived workdir path stays within `filepath.Abs(basePath)` before returning it; escaping paths fall back to `componentPath`. Mirrors the existing guard in `terraform_backend_local.go`.
- Actionable error hint — `ErrWorkdirProvision` now includes the full YAML path and env var to disable auto-provisioning.
- `loadConfigFromCLIArgs` env var bug — `setEnv(v)` was missing on the `--config`/`--config-path` code path, silently ignoring all `ATMOS_*` overrides when config was loaded from CLI args.
- Documentation — `auto_provision_workdir_for_outputs` and `ATMOS_COMPONENTS_TERRAFORM_AUTO_PROVISION_WORKDIR_FOR_OUTPUTS` added to the config/env var reference docs.
Why
JIT workdir components write their Terraform files to `.workdir/terraform/<stack>-<component>/` via a `before.terraform.init` hook — but that hook only fires during direct `atmos terraform` commands, not YAML function evaluation. Three distinct silent failures resulted:
- `!terraform.state` looked in the source directory where JIT components have no state — unconditional failure.
- `!terraform.output` computed the correct workdir path but never populated the directory before calling `terraform init` — fails with "no such file or directory" on any cold machine.
- `!terraform.output` + `source.uri` — even with Fix 2, `ProvisionWorkdir` only copies local files. Source-provisioned components need `AutoProvisionSource` first, which only fires in the hook system the output executor never reaches.
Note on Fix 3 (`source.uri` components)
`!terraform.output` against a source-provisioned component with a cold workdir will fetch from `source.uri` — the same credentials already needed for `atmos terraform apply`. The fetch is cached per (stack, component) pair per process.
Set `auto_provision_workdir_for_outputs: false` (or `ATMOS_COMPONENTS_TERRAFORM_AUTO_PROVISION_WORKDIR_FOR_OUTPUTS=false`) to disable Fixes 2 and 3.
For state-only reads, prefer `!terraform.state` — no init, no source fetch, no terraform binary required.
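An opt-out sketch in `atmos.yaml` — note the key path under `components.terraform` is inferred from the env var name `ATMOS_COMPONENTS_TERRAFORM_AUTO_PROVISION_WORKDIR_FOR_OUTPUTS` and should be verified against the config reference this PR adds:

```yaml
# atmos.yaml — key path inferred from the env var name, not confirmed by this PR text
components:
  terraform:
    auto_provision_workdir_for_outputs: false   # disables Fixes 2 and 3; Fix 1 (!terraform.state) is unaffected
```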
Migration
No breaking changes. Previously-failing commands now work.
```yaml
# Before (runs terraform init + output on every eval):
vpc_id: '{{ (atmos.Component "vpc" .stack).outputs.vpc_id }}'

# After (reads state file directly, no init):
vpc_id: !terraform.state vpc {{ .stack }} vpc_id
```

Resolves #2167
Summary by CodeRabbit
- New Features
  - Auto-provision JIT working directories before Terraform output evaluation (configurable, enabled by default).
  - Template/YAML functions can resolve state/outputs from JIT-provisioned and source-backed components.
- Security / Bug Fixes
  - Containment checks to prevent path traversal outside configured base path.
  - Safer fallbacks and debug logging when workdir/state resolution fails.
- Documentation
  - Docs and env var added for the new auto-provision setting.
- Tests
  - Extensive unit/integration tests covering JIT provisioning, resolution, caching, concurrency, and inheritance.
fix(auth): crash on standalone `ambient` identity; add global panic handler @aknysh (#2334)
what
- Fix a hard `SIGSEGV` when Atmos authenticates a standalone `ambient` identity (`kind: ambient`). Every `atmos auth login` / `atmos auth whoami` / `atmos terraform ...` against such an identity crashed the process with a Go stack trace.
- Add a process-wide panic handler (`pkg/panics`) so any future uncaught panic renders a short, actionable crash message via `pkg/ui` instead of a raw Go goroutine dump, while preserving the full stack trace in a crash-report file for bug reports.
- Update `github.com/mikefarah/yq/v4` (4.52.5 → 4.53.2) and migrate Atmos's yq logger setup to the new slog-based API.
1. Ambient identity crash (primary fix)
Background: the generic ambient identity kind (`docs/prd/ambient-identity.md`) is a cloud-agnostic passthrough — `Authenticate()` returns `(nil, nil)` by design because credentials are resolved by the cloud SDK at subprocess runtime (IRSA / IMDS / ECS task role / environment), not by Atmos.
Bug: the auth manager forwarded those nil credentials straight to `buildWhoamiInfo`, which unconditionally invoked a method on the credential interface, producing a nil-interface dereference on the main goroutine.
Scope: standalone generic ambient identities. The AWS-specific `aws/ambient` was not affected because its `Authenticate()` resolves via the AWS SDK default chain and always returns real credentials.
Fix: `buildWhoamiInfo` now short-circuits safely when `creds == nil` and still returns a populated `WhoamiInfo` (realm, provider, identity, environment, timestamp). Environment is populated unconditionally so `atmos auth whoami` continues to report the expected surface for pure-passthrough ambient identities. Keystore cache, reference handle, `BuildWhoamiInfo`, and `GetExpiration` branches are skipped — there is nothing to cache for an identity that does not own credentials.
Tests:
- `TestManager_buildWhoamiInfo_NilCredentials` — unit coverage of the nil-creds branch. Before the fix, this test panicked at `manager_whoami.go:25`.
- `TestManager_Authenticate_Ambient_Standalone` — end-to-end via real `NewAuthManager` + `Authenticate()`. Before the fix, this path panicked in the same location through `manager.go:294`.
Both pass post-fix alongside the existing whoami tests.
Full write-up: `docs/fixes/2026-04-17-ambient-identity-nil-credentials.md`.
2. Global panic handler
Motivation: the ambient crash surfaced as a wall of Go runtime output that was useless to end users. Any future bug of the same shape would produce the same bad experience. The handler is defensive infrastructure, not a workaround for the ambient fix — both ship together so a regression cannot reintroduce a raw crash.
Behavior:
- One deferred `panics.Recover(&exitCode)` at the top of `main.run()` covers every code path reachable synchronously from `cmd.Execute()` — every command, the `internal/exec` pipeline, `pkg/auth/`, `pkg/stack/`, etc. Installed before `defer cmd.Cleanup()` so Cleanup runs normally on clean exit and Recover also catches anything that escapes Cleanup itself.
- User-facing output uses `pkg/ui` exclusively (per CLAUDE.md I/O/UI rules): red ✗ `Atmos crashed unexpectedly` headline, Markdown-rendered body with panic summary, version, OS/arch, Go build toolchain, command-line, crash-report path, and an issue-tracker link.
- Full stack is shown inline only when `ATMOS_LOGS_LEVEL=Debug` or `=Trace` (case-insensitive). Otherwise it is written to a `0o600` crash report at `$TMPDIR/atmos-crash-<UTC>-<pid>.txt` whose path appears in the friendly message.
- The panic is wrapped via `cockroachdb/errors.WithStack` and forwarded to `errUtils.CaptureError`, so Sentry (when configured) gets a proper event with breadcrumbs through the existing error pipeline.
- Exit code 1 matches the existing error-exit convention — no CI/pre-commit behavior change.
Tests: 14 unit cases covering string / error / runtime.Error panic values, debug-mode on/off, crash-file write success and graceful failure, option defaults, env-gate matrix (canonical / lower / upper / whitespace / non-debug levels), and Recover with nil and non-nil exit-code pointers.
Manual verification: injected a nil-pointer dereference into the version command, ran `./build/atmos version` in both default and `ATMOS_LOGS_LEVEL=Debug` modes. Exact output is reproduced in the fix doc for PR/release-note reuse.
Full write-up: `docs/fixes/2026-04-17-global-panic-handler.md`.
3. yq bump + logger API migration
`github.com/mikefarah/yq/v4` is bumped from 4.52.5 → 4.53.2. The 4.53 line replaces yqlib's internal logger — previously built on `op/go-logging.v1` — with one built on Go's standard `log/slog`. The old `yqlib.GetLogger().SetBackend(backend logging.Backend)` entry point is gone; the new API exposes `SetLevel(slog.Level)` and `SetSlogger(*slog.Logger)`.
Atmos's `pkg/utils/yq_utils.go` used `SetBackend` with a no-op `logBackend` struct to silence yq's internal chatter unless `Logs.Level == Trace`. Without migration, atmos fails to build against the new yq with `logger.SetBackend undefined`.
Migration:
- Removed the `logBackend` type and its four methods (`Log`, `GetLevel`, `SetLevel`, `IsEnabledFor`) along with the `gopkg.in/op/go-logging.v1` import.
- Rewrote `configureYqLogger` to install an `io.Discard` slog handler via `yqlib.GetLogger().SetSlogger(...)` when the Atmos log level is not Trace. Semantics are preserved: yq's internal diagnostics are suppressed by default and only surface at Trace level.
- Deleted `TestLogBackend` from `pkg/utils/yq_utils_test.go` (tested a type that no longer exists). `TestConfigureYqLogger` and all `EvaluateYqExpression` tests still pass.
No behavior change for end users: templates and YAML-function calls that route through yq produce the same output with the same suppression of yq's internal logs.
Also
- Bump `ATMOS_VERSION=1.216.0` in `examples/quick-start-advanced/Dockerfile` and two test fixtures that referenced the old version.
why
- Ambient identity crash is a complete blocker. Any user running `atmos auth login` against a generic `ambient` identity — the canonical pattern for IRSA / IMDS / ECS task roles / cloud-agnostic passthrough — hits a hard SIGSEGV on every invocation. There is no workaround short of not using the identity kind, which defeats the reason the kind exists.
- The panic handler is defensive UX. Cloud-credential code paths are full of nil-interface boundaries; the ambient crash is proof that a similar bug could slip in again. Intercepting panics at the main-goroutine entry point turns any future incident of the same shape into a crisp bug-report loop (one friendly line + one file path to attach) instead of a wall of goroutine output, with the full stack one env var away for contributors.
- The yq bump is required to stay on a maintained yqlib. 4.53 is the current minor line; staying on 4.52 leaves us one release behind on upstream fixes and drifts further from the slog-based logger API that the rest of the Go ecosystem is converging on. The migration is a one-file change with identical user-visible behavior.
references
- `docs/fixes/2026-04-17-ambient-identity-nil-credentials.md` — ambient crash fix: root cause, scope, tests, and why the fix belongs at the manager layer rather than synthesizing a credential stub in the identity.
- `docs/fixes/2026-04-17-global-panic-handler.md` — panic handler design, sample output (default + debug mode + crash report), test matrix, and follow-up items.
- `docs/prd/ambient-identity.md` — the ambient-identity PRD. The `(nil, nil)` return contract from `ambient.Authenticate()` is intentional for the generic kind; the bug was the manager failing to honor it.
- `.claude/agents/tui-expert.md` — `pkg/ui` output-channel rules the panic handler follows (stderr UI channel via `ui.Error`/`ui.MarkdownMessage`; never `fmt.Fprintf(os.Stderr, ...)`).
- `github.com/mikefarah/yq` v4.53.0 release notes — upstream changelog for the logger migration.
Summary by CodeRabbit
- New Features
  - Global panic recovery with user-friendly crash reports and automatic crash-file generation.
- Bug Fixes
  - Prevented crash when authenticating with generic ambient identities that return nil credentials; authentication now returns stable identity info without panicking.
- Documentation
  - Added detailed fix write-ups for panic recovery and nil-credential behavior.
- Tests
  - Added unit and integration tests covering panic handling and nil-credential authentication paths.
- Chores
  - Updated dependencies, bumped example default version to 1.216.0, adjusted logger handling, and refreshed NOTICE entries.
fix: respect workdir path for generate: writes and hook-triggered terraform @zack-is-cool (#2309)
Summary
Fixes a cluster of bugs in `provision.workdir.enabled: true` mode covering file generation, hook dispatch, store hook correctness, and repeated-apply terraform init prompts.
Bug 1 – generate: writes to base component directory instead of workdir
`resolveAndProvisionComponentPath` called `autoGenerateComponentFiles` before `provisionComponentSource`. Generated files (e.g. `locals_override.tf`) were written to `components/terraform/<component>/` instead of the JIT workdir.
Fix: swap call order — provision source first, then generate into the returned (workdir) path.
Bug 2 – hooks and output executor used base component directory
`extractComponentPath` always returned the base component directory because `_workdir_path` is a runtime key absent from freshly-described sections. Hooks calling `terraform output` would fail with "no such file or directory" when trying to write `backend.tf.json` to a path that doesn't exist.
Fix: check `provision.workdir.enabled` in sections and rebuild the deterministic workdir path via `workdir.BuildPath`.
Bug 3 – hooks fired on every event regardless of events: list
`RunAll` had no event matching — all hooks ran regardless of their `events:` list. YAML uses hyphens (`after-terraform-apply`) but Go `HookEvent` constants use dots (`after.terraform.apply`).
Fix: added `MatchesEvent()` with hyphen→dot normalisation. Hooks with no `events:` field match all events to preserve backward compatibility with configs written before event filtering existed.
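A hook declaration that exercises the new matching looks roughly like this (hook name and store details are illustrative; verify field names against the Atmos hooks docs):

```yaml
# Stack manifest — illustrative hook; YAML event names use hyphens
hooks:
  store-outputs:
    events:
      - after-terraform-apply   # normalised to the Go constant after.terraform.apply
    command: store
    name: redis                 # hypothetical store name
    outputs:
      label_id: .id
# a hook that omits `events:` entirely still fires on every event (backward compatible)
```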
Bug 4 – store hook used wrong output getter and wrong error sentinels
The store hook always used `GetOutput` (which runs `terraform init`) regardless of when it fires. Running init after apply with a closed stdin triggers state-migration prompts. Additionally, errors used `ErrNilTerraformOutput` for both retrieval failures and missing keys, and included no context about which hook or event caused the failure.
Fix: `RunE` now selects the getter based on the event — after- events use `GetOutputSkipInit` (workdir already initialised); before- events use `GetOutput` (init may not have run yet). `IsPostExecution()` helper on `HookEvent` encodes the contract. Error messages now include hook name, event, output key, component, and stack. Correct sentinels: `ErrTerraformOutputFailed` for retrieval errors, `ErrTerraformOutputNotFound` for missing keys.
Bug 5 – "Do you want to migrate all workspaces?" prompt on every apply
This was caused by three interacting problems:
- `-reconfigure` added whenever `WorkdirPathKey` was set — `WorkdirPathKey` is set for both a preserved workdir (TTL not expired) and a wiped/re-provisioned workdir (TTL=0s or expired). Checking it unconditionally added `-reconfigure` even when `.terraform/` was intact.
- `init_run_reconfigure: true` overriding the preserved-workdir guard — even after scoping `-reconfigure` to `WorkdirReprovisionedKey`, the global `InitRunReconfigure` flag bypassed the check and always added `-reconfigure`.
- `cleanTerraformWorkspace` deleting `.terraform/environment` for workdir components — this function was designed for backend-switching on non-workdir components. For workdir components it deleted the active workspace record before every init, causing OpenTofu to see orphaned `terraform.tfstate.d/<workspace>/` directories with no active workspace and prompt for migration.
When combined: `-reconfigure` tells OpenTofu to ignore the saved backend and treat init as fresh. A fresh init with existing workspace state dirs triggers the migration prompt even when the backend is unchanged.
Fix (three parts):
- Introduce `WorkdirReprovisionedKey` (`_workdir_reprovisioned`), set only by `vendorToTarget` (source wiped) or `SyncDir` with file changes (workdir synced). This is the correct signal that `.terraform/` was actually cleared.
- For workdir components with a preserved workdir, ignore `InitRunReconfigure` — the backend is always generated deterministically from the same stack config and never changes between runs. `-reconfigure` is only added when `WorkdirReprovisionedKey` is set or the subcommand is `workspace`.
- Skip `cleanTerraformWorkspace` for workdir-enabled components — the backend is consistent, so there is no reason to clear the workspace record.
Tested end-to-end
Full producer → store → consumer pipeline:
- `null-label` applies with JIT workdir + `generate:` override
- `after-terraform-apply` hook reads the `.id` output and writes it to Redis (no init re-run, no migration prompt)
- `consumer` reads the value via `!store local/redis null-label label_id`, injects it into its own `generate:` template, applies successfully
- Repeated applies do not prompt for workspace migration, with or without `init_run_reconfigure: true` and with or without `ttl: "0s"`
Reproduction
The fix worked for the deployment where I originally hit this issue. A local reproduction follows.
cat << 'SCRIPT' > repro.sh
#!/usr/bin/env bash
# ============================================================
# ATMOS REPRO: generate: writes orphaned override to base
# component directory; hook-triggered terraform fails;
# consumer reads store value into JIT workdir generate:
#
# Stack name: demo (from vars.name + name_template)
# Components: null-label (producer), consumer (reads from store)
#
# Requires: atmos, tofu, docker
# ============================================================
set -euo pipefail
WORKDIR="$(mktemp -d -t atmos-repro-XXXXXX)"
echo "Working in: ${WORKDIR}"
cd "${WORKDIR}"
echo "== starting redis =="
docker stop atmos-repro-redis 2>/dev/null || true
docker run -d --rm --name atmos-repro-redis -p 6379:6379 redis:7-alpine
trap 'docker stop atmos-repro-redis 2>/dev/null || true' EXIT
sleep 1
cat <<'EOF' > atmos.yaml
base_path: "."
stores:
local/redis:
type: redis
options:
url: "redis://localhost:6379"
components:
terraform:
base_path: "components/terraform"
command: "tofu"
workspaces_enabled: true
apply_auto_approve: false
deploy_run_init: true
init_run_reconfigure: true
auto_generate_backend_file: true
auto_generate_files: true
stacks:
name_template: "{{ .vars.name }}"
base_path: "stacks"
included_paths:
- "**/*"
EOF
mkdir -p stacks
cleanup() {
echo "-- cleanup --"
atmos terraform workdir clean --all 2>/dev/null || true
echo "-- cleanup done --"
}
show_dirs() {
local label="${1:-}"
echo
if [[ -n "$label" ]]; then
echo "-- directories: $label --"
fi
echo "components/terraform/null-label"
ls -la components/terraform/null-label/ 2>/dev/null || echo "(does not exist)"
echo ".workdir/terraform/demo-null-label"
ls -la .workdir/terraform/demo-null-label/ 2>/dev/null || echo "(does not exist)"
echo ".workdir/terraform/demo-consumer"
ls -la .workdir/terraform/demo-consumer/ 2>/dev/null || echo "(does not exist)"
}
# ============================================================
# SCENARIO 1: JIT + generate, no hook.
# Verifies generate: writes to the workdir only (not the base
# component directory), and that apply succeeds.
# ============================================================
echo
echo "================================================="
echo "SCENARIO 1: init + apply WITHOUT hook (expect success)"
echo " - generate: must write only to workdir, not base component dir"
echo "================================================="
cleanup
cat <<'EOF' > stacks/demo.yaml
vars:
name: demo
terraform:
backend_type: local
components:
terraform:
null-label:
vars:
namespace: "eg"
stage: "test"
name: "demo"
enabled: true
source:
uri: "git::https://github.com/cloudposse/terraform-null-label.git"
version: "0.25.0"
ttl: "0s"
provision:
workdir:
enabled: true
generate:
locals_override.tf: |
# override file generated by atmos
locals {
name = "THISISANOVERRIDE"
}
EOF
echo "== init =="
atmos terraform init null-label -s demo
show_dirs "after init"
echo "== apply =="
atmos terraform apply null-label -s demo -- -auto-approve
show_dirs "after apply"
echo
echo "SCENARIO 1: PASSED"
# ============================================================
# SCENARIO 2: JIT + generate + after-apply hook writes to Redis.
# The hook fires after apply, reads terraform output, and stores
# it in Redis. Tests that the hook does not re-run init (which
# would prompt for workspace migration with a closed stdin).
# ============================================================
echo
echo "================================================="
echo "SCENARIO 2: init + apply WITH after-apply store hook (expect success)"
echo " - hook reads .id output and stores it in Redis"
echo " - hook must NOT re-run terraform init"
echo "================================================="
cleanup
cat <<'EOF' > stacks/demo.yaml
vars:
name: demo
terraform:
backend_type: local
components:
terraform:
null-label:
vars:
namespace: "eg"
stage: "test"
name: "demo"
enabled: true
source:
uri: "git::https://github.com/cloudposse/terraform-null-label.git"
version: "0.25.0"
ttl: "0s"
provision:
workdir:
enabled: true
generate:
locals_override.tf: |
# override file generated by atmos
locals {
name = "THISISANOVERRIDE"
}
hooks:
store-outputs:
events:
- after-terraform-apply
command: store
name: local/redis
outputs:
label_id: .id
EOF
echo "== init =="
atmos terraform init null-label -s demo
echo "== apply =="
atmos terraform apply null-label -s demo -- -auto-approve
show_dirs "after apply"
echo
echo "== verifying Redis contains label_id =="
STORED=$(docker exec atmos-repro-redis redis-cli KEYS "*label_id*")
if [[ -z "$STORED" ]]; then
echo "SCENARIO 2: FAILED — no label_id key found in Redis"
exit 1
fi
echo "Redis keys: $STORED"
echo
echo "SCENARIO 2: PASSED"
# ============================================================
# SCENARIO 3: Consumer reads label_id from Redis via !store,
# injects it into a generate: template inside its own JIT workdir.
# Tests the full producer → store → consumer pipeline.
# ============================================================
echo
echo "================================================="
echo "SCENARIO 3: consumer reads store value into JIT workdir generate:"
echo " - consumer.vars.label_id: !store local/redis null-label label_id"
echo " - generate: uses {{ .vars.label_id }} in a locals override"
echo " - both components use JIT workdir with ttl: 0s"
echo "================================================="
cat <<'EOF' > stacks/demo.yaml
vars:
name: demo
terraform:
backend_type: local
components:
terraform:
null-label:
vars:
namespace: "eg"
stage: "test"
name: "demo"
enabled: true
source:
uri: "git::https://github.com/cloudposse/terraform-null-label.git"
version: "0.25.0"
ttl: "0s"
provision:
workdir:
enabled: true
generate:
locals_override.tf: |
# override file generated by atmos
locals {
name = "THISISANOVERRIDE"
}
hooks:
store-outputs:
events:
- after-terraform-apply
command: store
name: local/redis
outputs:
label_id: .id
consumer:
vars:
namespace: "eg"
stage: "test"
enabled: true
label_id: !store local/redis null-label label_id
source:
uri: "git::https://github.com/cloudposse/terraform-null-label.git"
version: "0.25.0"
ttl: "0s"
provision:
workdir:
enabled: true
generate:
name_override.tf: |
# override file generated by atmos — value comes from Redis via !store
locals {
name = "{{ .vars.label_id }}-derpderpderp"
}
EOF
echo "== apply consumer =="
atmos terraform apply consumer -s demo -- -auto-approve
show_dirs "after consumer apply"
echo
echo "== verifying consumer output contains the store value =="
CONSUMER_ID=$(atmos terraform output consumer -s demo 2>/dev/null | grep "id =" | head -1)
echo "Consumer id output line: $CONSUMER_ID"
if echo "$CONSUMER_ID" | grep -q "derpderpderp"; then
echo
echo "SCENARIO 3: PASSED — consumer label contains store-derived value"
else
echo
echo "SCENARIO 3: FAILED — consumer output does not contain expected suffix"
echo " Expected 'derpderpderp' in id output"
exit 1
fi
echo
echo "================================================="
echo "ALL SCENARIOS PASSED"
echo "Working directory preserved at: ${WORKDIR}"
echo "================================================="
SCRIPT
bash repro.sh 2>&1 | tee repro.log
Test plan
- `TestHook_MatchesEvent` — hyphen/dot formats, no match, nil/empty events (backward compat), multiple events
- `TestRunAll_EventFiltering` — store called/skipped based on event matching
- `TestExecutor_GetOutputWithOptions_SkipInit` — `terraform init` NOT called when `SkipInit: true`
- `TestBuildInitArgs_ReconfigureWhenWorkdirReprovisioned` — `-reconfigure` added when workdir wiped
- `TestBuildInitArgs_NoReconfigureWhenWorkdirPreserved` — `-reconfigure` NOT added for preserved workdir
- `TestBuildInitArgs_NoReconfigureWhenWorkdirPreserved_InitRunReconfigureIgnored` — global `InitRunReconfigure: true` does not override the preserved-workdir guard
- `TestBuildInitArgs_ReconfigureForNonWorkdir_InitRunReconfigure` — `InitRunReconfigure` still works for non-workdir components
- `TestPrepareInitExecution_SkipsCleanWorkspaceForWorkdir` — `.terraform/environment` preserved for workdir components
- `TestPrepareInitExecution_CleansWorkspaceForNonWorkdir` — `.terraform/environment` still cleaned for non-workdir components
- `TestIsWorkdirEnabled` / `TestExtractComponentPath` / `workdir_enabled_*` — workdir path resolution
- Full `pkg/hooks`, `pkg/terraform/output`, `internal/exec` test suites pass
Summary by CodeRabbit
- New Features
  - Workdir-aware provisioning that targets JIT workdirs and signals reprovisioning
  - Hook event normalization, post-execution detection, and event-matching/filtering
  - New output APIs including skip-init retrieval and advanced output options
- Improvements
  - Smarter terraform init (`-reconfigure`) behavior for workdir flows
  - Preserve workspace files for workdir components to avoid unintended deletions
  - More robust output caching and clearer CLI success/error messaging
- Tests
  - Expanded coverage for workdirs, init args, hooks, output paths, and store commands
fix: treat missing Atmos config in BASE as empty baseline in `atmos describe affected` @[copilot-swe-agent[bot]](https://github.com/apps/copilot-swe-agent) (#2296)
`atmos describe affected` fatally errors with `failed to find import` on greenfield branches (or when the base ref predates Atmos adoption) because `ErrFailedToFindImport` from BASE stack processing was propagated as a hard failure. The correct behavior is to treat an unconfigured BASE as an empty baseline — everything in HEAD is new, therefore everything is affected.
Changes
- `internal/exec/describe_affected_utils.go` — `executeDescribeAffected` now handles `ErrFailedToFindImport` alongside `ErrNoStackManifestsFound` in both BASE processing paths:
  - `FindAllStackConfigsInPathsForStack` returning `ErrFailedToFindImport` (stacks directory absent in BASE) → sets `remoteStackConfigFilesAbsolutePaths = []string{}`
  - `ExecuteDescribeStacks` returning `ErrFailedToFindImport` (imports unresolvable in BASE) → sets `remoteStacks = map[string]any{}`
- Both cases emit a `WARN` log with actionable context: `WARN No Atmos stack manifests found in BASE; treating BASE as empty (all HEAD components will be reported as affected) hint="This is expected for greenfield branches or when the base branch does not yet use Atmos"`
- `tests/describe_affected_greenfield_test.go` — integration test that initializes a bare-minimum git repo (single commit, no Atmos config) as the BASE and asserts all known HEAD components (`component-1`, `component-2` in `prod`/`nonprod`) appear in the affected output without error.
Associated Pull Requests
Deployment Status
To view the Atmos Pro deployment status of this release, see #2342.