feat(auth): interactive profile suggestion + profiles.default @osterman (#2333)
what
- Add interactive profile suggestion for two scenarios where the loaded `atmos.yaml` lacks the auth config the user is asking for:
  - Identity-specific: `atmos … --identity foo` when `foo` isn't defined in the base config but exists in one or more profiles.
  - Identity-agnostic: `atmos auth login/exec/shell/env/console/whoami` when the base config has no `auth.identities`/`auth.providers` at all but at least one profile does.
- Interactive terminals get a themed `huh` prompt (yes/no for a single match, select list for multiple). Picking a profile re-execs Atmos with `--profile <picked>` prepended to the original argv.
- Non-interactive terminals (CI, pipes) get the original error enriched with hints naming the candidate profile(s) and the exact `--profile <name>` flag to re-run.
- Add `profiles.default` config field — implicit default profile loaded when `--profile` and `ATMOS_PROFILE` are unset; explicit selection always wins; nested defaults are ignored to prevent recursion.
- Loop guard `ATMOS_PROFILE_FALLBACK=1` prevents infinite re-prompt cycles when the picked profile also fails to resolve the identity.
- Ctrl+C / Esc abort the prompt cleanly with a single `User aborted.` message instead of cascading identity-not-found / authentication-failed errors.
- Prompt titles use `ui.FormatInline()` so backticks render as styled inline code, matching the rest of Atmos.
- New `cfg.ProfilesWithIdentity()` and `cfg.ProfilesWithAuthConfig()` helpers (scoped Viper, no global state pollution) to discover candidate profiles.
- New `AuthManager.MaybeOfferAnyProfileFallback()` interface method called by every relevant auth command before returning the terminal "no identity" error.
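The `profiles.default` field described above can be sketched as a minimal `atmos.yaml` fragment (the profile name `developer` is only illustrative):

```yaml
# atmos.yaml — illustrative sketch of the new field
profiles:
  default: developer   # used only when --profile and ATMOS_PROFILE are both unset
```

Explicit selection (`atmos --profile <name> …` or `ATMOS_PROFILE=<name>`) always overrides this default, per the precedence described in this PR.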
why
- Profile-based repos (e.g. `cloudposse/infra-live` patterns) put all `auth.identities`/`auth.providers` inside `profiles/<name>/atmos.yaml` and leave the root `atmos.yaml` minimal. Running `atmos auth login` there errored with `no providers available` and no actionable next step — users had to read source / docs to discover they needed `--profile`.
- The fix turns a dead-end error into a one-keystroke recovery in interactive sessions and a copy-pasteable command in CI logs.
- `profiles.default` lets teams pin a sensible profile (e.g. `developer`) without forcing every shell to export `ATMOS_PROFILE`, while keeping explicit overrides authoritative.
references
- PRD: `docs/prd/interactive-profile-suggestion.md`
- Blog: `website/blog/2026-04-16-interactive-profile-suggestion.mdx`
- Docs: `website/docs/cli/configuration/profiles.mdx`
Summary by CodeRabbit
- New Features
  - `profiles.default` in `atmos.yaml` (precedence: `--profile` → `ATMOS_PROFILE` → `profiles.default`). Interactive profile suggestion to re-run with `--profile` when identities or auth config are missing; non‑TTY flows provide enriched rerun hints. Re-exec depth guard and portable re-exec helpers for safer re-run behavior.
- Bug Fixes
  - Clearer, more actionable error messages and improved fallback routing when identities or auth config are absent.
- Documentation
  - Added PRD, CLI docs, and blog post describing the behavior.
- Tests
  - Extensive unit/integration tests covering fallback flows, profile discovery, re-exec, and helpers.
fix: Add missing doc redirects for old core-concepts URLs @osterman (#2287)
what
- Adds 25 new client-side redirects for old `/core-concepts/` URLs that are still indexed by Google and cached by LLMs, causing 404 errors
- Fixes 2 existing redirects that had invalid trailing slashes on `/vendor/component-manifest/` targets (was causing Docusaurus build validation errors)
New redirect categories:
- 4 screenshot-confirmed 404s (vendoring, component-management, provisioning, schemas)
- 7 project section redirects (`/core-concepts/projects/*` → `/projects/` and `/cli/configuration/`)
- 7 stacks sub-pages (define-components, settings, components, backend, vars, env, providers)
- 2 share-data / remote-state redirects
- 2 vendor sub-pages (component-manifest, vendor-manifest)
- 1 describe page redirect
- 2 component sub-pages (packer, ansible)
why
- Old `/core-concepts/` URLs are still indexed by Google and widely cached in LLM training data
- LLMs frequently generate links to these old URLs when helping users with Atmos, leading to broken links and poor developer experience
- Each broken URL was verified by live-fetching the page and confirming a 404 response
- Each redirect target was cross-referenced against `llms.txt` to ensure validity
references
- Verified via `site:atmos.tools/core-concepts` Google searches
- All redirect targets validated against the Docusaurus build (`npm run build` passes)
Summary by CodeRabbit
- Bug Fixes
  - Fixed numerous broken documentation links and improved navigation by adding and updating redirect rules across Projects, Stacks, Components, Vendor, and related pages (including removal of trailing-slash redirect mismatches) so users are directed to correct docs URLs.
- Chores
  - Updated CI workflow runner constraints to refine automated job scheduling.
feat(list): add matrix output format to list instances command @johncblandii (#2322)
what
- Add `--format=matrix` support to `atmos list instances`, producing GitHub Actions-compatible matrix JSON identical to `atmos describe affected --format=matrix`
- Add `--output-file` flag for writing results in `key=value` format (for `$GITHUB_OUTPUT`)
- Extract shared matrix types and output logic into `pkg/matrix/` for DRY reuse across both `describe affected` and `list instances`
why
- CI/CD pipelines need matrix output from `list instances` to drive parallel GitHub Actions jobs, just like `describe affected` already supports
- Sharing the matrix output logic between commands avoids duplication and ensures consistent output format
- The `--output-file` flag enables direct integration with GitHub Actions `$GITHUB_OUTPUT` without shell redirection
references
- Output format matches `atmos describe affected --format=matrix` exactly: `{"include":[{"stack":"...","component":"...","component_path":"...","component_type":"..."}]}`
- When using `--output-file`, writes `matrix=<json>` and `affected_count=<N>` lines
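A typical consumer of this output is a two-job GitHub Actions workflow: one job produces the matrix, a second fans out over it. This is an illustrative sketch (job, step, and stack/component names are hypothetical; only the Atmos flags come from this PR):

```yaml
# Hypothetical workflow consuming the matrix output
jobs:
  discover:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.instances.outputs.matrix }}
    steps:
      - id: instances
        # writes matrix=<json> and affected_count=<N> into $GITHUB_OUTPUT
        run: atmos list instances --format=matrix --output-file "$GITHUB_OUTPUT"

  apply:
    needs: discover
    runs-on: ubuntu-latest
    strategy:
      matrix: ${{ fromJSON(needs.discover.outputs.matrix) }}
    steps:
      - run: atmos terraform apply ${{ matrix.component }} -s ${{ matrix.stack }}
```

Because the JSON already has the `{"include":[…]}` shape, it can be fed to `strategy.matrix` directly via `fromJSON`.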
Summary by CodeRabbit
- New Features
  - Added `--format=matrix` to emit GitHub Actions–compatible matrix JSON
  - Added `--output-file` / `-o` to write matrix results as `key=value` (for `$GITHUB_OUTPUT`); only supported with `--format=matrix`
  - Matrix entries include stack, component, component_path, and component_type
  - `--format=matrix` disallows `--upload` and triggers CI-friendly output behavior
- Tests
  - Added coverage for matrix format, output-file flag, and file/stdout writing
- Documentation
  - Added docs, blog post, and roadmap entry for matrix support
chore: Update Atmos Pro workflow to use v1.215.0 container image @osterman (#2323)
what
- Updated the Atmos Pro CI workflow container image from `1.214.0` to `1.215.0`
- Removed the "Build atmos from source" step that compiled a dev binary via `go build`
- Changed `atmos docs generate readme` and `atmos pro commit` to use the container's pre-installed `atmos` binary instead of `/tmp/atmos-dev`
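The simplified workflow shape looks roughly like this (the image repository and step layout are illustrative — only the tag bump and the removal of the source-build step come from this PR):

```yaml
# Hypothetical excerpt of the Atmos Pro workflow after this change
jobs:
  atmos-pro:
    runs-on: ubuntu-latest
    container:
      image: <registry>/atmos:1.215.0   # registry path illustrative; tag bumped from 1.214.0
    steps:
      - uses: actions/checkout@v4
      # no "Build atmos from source" step — the container ships the binary
      - run: atmos docs generate readme
      - run: atmos pro commit
```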
why
- Atmos v1.215.0 ships with the `pro commit` command built-in, so building from source is no longer necessary
- Simplifies the CI workflow and reduces build time by eliminating the Go compilation step
references
- #2298 (`atmos pro commit` feature)
Summary by CodeRabbit
- Chores
  - Updated GitHub Actions workflow to use atmos container image version 1.215.0 (upgraded from 1.214.0).
  - Streamlined workflow execution by removing the local build step and invoking atmos directly from the container image.
🚀 Enhancements
Fix multi-region provider aliases generating incorrect Terraform JSON format @[copilot-swe-agent[bot]](https://github.com/apps/copilot-swe-agent) (#2210)
When configuring providers with dot-notation aliases (e.g., `aws.use1`), the generated `providers_override.tf.json` emitted invalid structure — separate top-level keys instead of the array-of-objects format Terraform's JSON syntax requires for multiple provider instances.
Changes
- `pkg/terraform/output/backend.go`: Added exported `ProcessProviderAliases` that detects dot-notation provider keys, groups all configurations for the same provider type into an array (base config first, aliases sorted), and leaves non-aliased providers unchanged
- `internal/exec/utils.go`: Updated `generateComponentProviderOverrides` to delegate to `tfoutput.ProcessProviderAliases`, eliminating duplicated logic
Example
Given stack config:
```yaml
providers:
  aws:
    region: us-east-2
  aws.use1:
    region: us-east-1
    alias: use1
```

Before:

```json
{ "provider": { "aws": { "region": "us-east-2" }, "aws.use1": { "alias": "use1", "region": "us-east-1" } } }
```

After:
```json
{
  "provider": {
    "aws": [
      { "region": "us-east-2" },
      { "alias": "use1", "region": "us-east-1" }
    ]
  }
}
```

Original prompt
This section details the original issue you should resolve
<issue_title>Multi-Region with Provider Aliases example is not working</issue_title>
<issue_description>### Describe the Bug
https://atmos.tools/stacks/providers#multi-region-with-provider-aliases — this example is not working; the actual generated file is different from the example.
Expected Behavior
The generated file is the same as the example.
Steps to Reproduce
With the following atmos component config:
```yaml
components:
  terraform:
    eip:
      providers:
        aws:
          region: us-east-2
        aws.use1:
          region: us-east-1
          alias: use1
      metadata:
        component: eip
```

Run the atmos command and check the output of `providers_override.tf.json`
Screenshots
The content of the generated providers_override.tf.json
```json
{ "provider": { "aws": { "region": "us-east-2" }, "aws.use1": { "alias": "use1", "region": "us-east-1" } } }
```

Would expect it to be:

```json
{ "provider": { "aws": [ { "region": "us-east-2" }, { "alias": "use1", "region": "us-east-1" } ] } }
```

Environment
- OS: OSX
- Version: 1.209.0
- Terraform version: v1.14.7
Additional Context
No response</issue_description>
Comments on the Issue (you are @copilot in this section)
- Fixes #2208
Summary by CodeRabbit
- New Features
  - Added support for provider aliases — both explicit and auto-derived from dot-notation provider keys (e.g., `aws.use1`).
  - Providers are now properly grouped into arrays in generated Terraform provider override files.
- Tests
  - Added integration tests for provider alias scenarios.
- Documentation
  - Updated provider documentation to clarify alias auto-derivation behavior.
fix(list): gate `list instances --upload` on `settings.pro.enabled` @osterman (#2330)
what
- Change `atmos list instances --upload` to filter instances by `settings.pro.enabled == true` (strict boolean) instead of `settings.pro.drift_detection.enabled == true`.
- Rename `isProDriftDetectionEnabled` → `isProEnabled` and simplify the check to a single lookup on `settings.pro.enabled`; `drift_detection.enabled` is no longer consulted.
- Update all unit, integration, comprehensive, cmd, and benchmark tests to the new fixture shape; add an explicit case proving `pro.enabled: true` with `drift_detection.enabled: false` is now enabled.
- Update `website/docs/cli/commands/list/list-instances.mdx` to document the filter criterion under `--upload`, in the examples section, and in the `:::tip` block (noting it must be a boolean, not the string `"true"`).
why
- Users with `settings.pro.enabled: true` configured on their components were hitting `No Atmos Pro-enabled instances found; nothing to upload.` even when Pro was clearly enabled, because the filter required the narrower `drift_detection.enabled` sub-key.
- `settings.pro.enabled` is the correct top-level enablement flag for Pro; drift detection is one feature among several and shouldn't gate the whole upload.
- The docs previously described `--upload` without specifying what made an instance eligible, so the failure mode was invisible to users.
Behavior change (callout)
Components that previously qualified via only `settings.pro.drift_detection.enabled: true` (without `pro.enabled: true`) will now be excluded from `--upload`. Users in that shape must add `settings.pro.enabled: true`.
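Migrating a component in that shape is a one-line addition in the stack manifest (the component name here is hypothetical):

```yaml
# Stack manifest — hypothetical component
components:
  terraform:
    vpc:
      settings:
        pro:
          enabled: true        # now required for --upload eligibility; must be a boolean, not "true"
          drift_detection:
            enabled: true      # still valid, but no longer gates the upload
```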
references
- `--upload` was introduced in #2322
Summary by CodeRabbit
- Bug Fixes
  - Pro detection simplified: only an explicit boolean `settings.pro.enabled=true` marks an instance as Pro; missing/non-boolean values are treated as disabled.
  - Upload behavior: all collected instances are uploaded; post-upload summary shows total uploaded plus enabled/disabled and drift-enabled counts.
  - Improved Pro authentication hints for GitHub Actions and workspace ID.
- Documentation
  - CLI docs updated to reflect new upload semantics, payload shape, and the "No instances found; nothing to upload." message.
- Tests
  - Tests updated/added to cover the new Pro flag shape, counting, and upload behavior.
Fix: Identity names with dots incorrectly parsed by Viper @[copilot-swe-agent[bot]](https://github.com/apps/copilot-swe-agent) (#2129)
- [x] Initial plan for fixing identity names with dots
- [x] Add `fixAuthIdentities()` to re-parse identities from raw YAML
- [x] Extract shared decode hooks into `getAtmosDecodeHookFunc()`
- [x] Apply fix in `LoadConfig()` and `loadConfigFromCLIArgs()`
- [x] Add test case `TestIdentityNamesWithDots`
- [x] Use atmosConfig in perf.Track for consistency
- [x] Remove debug log message that caused test snapshot failures
- [x] Add error handling test cases to increase coverage to 84.6%

Original prompt
This section details the original issue you should resolve
<issue_title>Zero-Configuration AWS SSO Identity Management: identity containing dots break it.</issue_title>
<issue_description>### Describe the Bug
Testing:

```yaml
auth:
  providers:
    sso-prod:
      kind: aws/iam-identity-center
      start_url: https://my-org.awsapps.com/start
      region: us-east-1
      auto_provision_identities: true # One line to enable
```

I do get a list of identities in `~/.cache/atmos/auth/sso-prod/provisioned-identities.yaml`. Some of them contain dots, e.g.:

```yaml
product.usa/ReadOnlyAccess: # <=== The "." here breaks it
  kind: aws/permission-set
  provider: sso-prod
  via:
    provider: sso-prod
  principal:
    account:
      id: "000000000000"
      name: product.usa
    name: ReadOnlyAccess
```

Which atmos does not support:

```
$ atmos auth list
Initialize Identities Error: invalid identity kind

## Explanation
unsupported identity kind:

Initialize Identities Error: failed to initialize identities: invalid identity config: identity=product: invalid identity kind: unsupported identity kind:

Error: invalid auth config: failed to create auth manager: failed to initialize identities: invalid identity config: identity=product: invalid identity kind: unsupported identity kind:
```
it works :-)
Steps to Reproduce
Cf. bug description
Screenshots
No response
Environment
atmos 1.207.0
Additional Context
No response</issue_description>
Comments on the Issue (you are @copilot in this section)
- Fixes #2128
fix(toolchain): resolve aliases in `toolchain exec` / `toolchain which` lookups @osterman (#2332)
what
- Route `findBinaryPath` (used by `atmos toolchain exec` and `atmos toolchain which`) through the existing alias-aware `LookupToolVersion` helper instead of a raw `toolVersions.Tools[name]` map lookup.
- Derive `owner/repo` from the resolved canonical key so the computed install path matches what the write side persisted.
- Add a regression test that reproduces the bug: `.tool-versions` storing `helm/helm 3.20.2` + an alias `helm → helm/helm` now resolves via `WhichExec("helm")`.
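The asymmetry is easiest to see in the file itself. Based on the description above, the write side persists the canonical key, so the fixture looks roughly like:

```
# .tool-versions — entry as persisted by the (canonicalizing) write side
helm/helm 3.20.2
```

Before this fix, `atmos toolchain exec -- helm …` looked up the literal key `helm`, missed the `helm/helm` entry, and tried to re-install; with the alias-aware lookup, `helm` resolves to `helm/helm` on the read side too.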
why
- Symptom: `atmos toolchain install helm@3.20.2` succeeds, but `atmos toolchain exec -- helm …` then errors with `tool 'helm' not configured in .tool-versions` and tries to re-install.
- Root cause: the write side already canonicalizes via the resolver (`wouldCreateDuplicate` → `aliasConflictsWithFullName`), so entries land under the `owner/repo` key. The read side did a raw map lookup with no resolver, so an alias query missed the canonical entry — the classic write/read asymmetry.
- Fix keeps the read side symmetric with the write side by reusing the helper that already exists for exactly this purpose.
references
- Out of scope, tracked separately: `RunInstall` persisting the literal string `latest` to `.tool-versions` when installing without an explicit version, and wiring `pkg/toolchain/filemanager` / `pkg/toolchain/lockfile` into install/uninstall/set/exec.
Summary by CodeRabbit
- Bug Fixes
  - Fixed tool alias resolution to correctly locate binary paths when requesting tools by their registered alias names instead of canonical identifiers. The system now properly maps aliases to their resolved canonical entries before checking availability.
fix: resolve JIT workdir path for !terraform.state, !terraform.output, and atmos.Component @zack-is-cool (#2328)
What
Bug fix PR. Makes `!terraform.state`, `!terraform.output`, and `atmos.Component` work correctly for JIT workdir components (`provision.workdir.enabled: true`). All three were silently broken in ways that only surfaced at runtime.
Four fixes:
- `!terraform.state` path resolution — resolves state path from `.workdir/terraform/<stack>-<component>/` instead of the static source directory JIT components never write state to.
- `!terraform.output` / `atmos.Component` auto-provision — provisions the JIT workdir before `terraform init` so output references work on any machine, not just ones with a pre-existing workdir from a prior apply.
- Source-provisioned JIT workdir support — Fix 2 only handled local-copy provisioning. For `source.uri` components, `!terraform.output` now hydrates from the source URI before init. Also fixes `extractComponentName` fallback and a go-getter `FileGetter` dst-must-not-exist invariant.
- Provisioner output interleaving — `ui.ClearLine()` before status writes prevents the bubbletea spinner from leaving leading whitespace on provisioner messages.
Correctness & security fixes:
- TOCTOU race — `sync.Map` `Load`+`Store` replaced with `LoadOrStore` inside the singleflight closure, eliminating the window where two goroutines could both enter `Provision`.
- Context cancellation — switched to `singleflight.DoChan` + `select` so waiters with cancelled contexts exit immediately. Added `context.WithoutCancel` so leader cancellation doesn't abort shared provisioning work.
- Path traversal guard — `extractComponentPath` verifies the derived workdir path stays within `filepath.Abs(basePath)` before returning it; escaping paths fall back to `componentPath`. Mirrors the existing guard in `terraform_backend_local.go`.
- Actionable error hint — `ErrWorkdirProvision` now includes the full YAML path and env var to disable auto-provisioning.
- `loadConfigFromCLIArgs` env var bug — `setEnv(v)` was missing on the `--config`/`--config-path` code path, silently ignoring all `ATMOS_*` overrides when config was loaded from CLI args.
- Documentation — `auto_provision_workdir_for_outputs` and `ATMOS_COMPONENTS_TERRAFORM_AUTO_PROVISION_WORKDIR_FOR_OUTPUTS` added to the config/env var reference docs.
Why
JIT workdir components write their Terraform files to `.workdir/terraform/<stack>-<component>/` via a `before.terraform.init` hook — but that hook only fires during direct `atmos terraform` commands, not YAML function evaluation. Three distinct silent failures resulted:
- `!terraform.state` looked in the source directory where JIT components have no state — unconditional failure.
- `!terraform.output` computed the correct workdir path but never populated the directory before calling `terraform init` — fails with "no such file or directory" on any cold machine.
- `!terraform.output` + `source.uri` — even with Fix 2, `ProvisionWorkdir` only copies local files. Source-provisioned components need `AutoProvisionSource` first, which only fires in the hook system the output executor never reaches.
Note on Fix 3 (`source.uri` components)
`!terraform.output` against a source-provisioned component with a cold workdir will fetch from `source.uri` — the same credentials already needed for `atmos terraform apply`. The fetch is cached per (stack, component) pair per process.
Set `auto_provision_workdir_for_outputs: false` (or `ATMOS_COMPONENTS_TERRAFORM_AUTO_PROVISION_WORKDIR_FOR_OUTPUTS=false`) to disable Fixes 2 and 3.
For state-only reads, prefer `!terraform.state` — no init, no source fetch, no terraform binary required.
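An opt-out sketch in `atmos.yaml` — note the key path under `components.terraform` is inferred from the env var name `ATMOS_COMPONENTS_TERRAFORM_AUTO_PROVISION_WORKDIR_FOR_OUTPUTS` and should be verified against the config reference this PR adds:

```yaml
# atmos.yaml — key path inferred from the env var name, not confirmed by this PR text
components:
  terraform:
    auto_provision_workdir_for_outputs: false   # disables Fixes 2 and 3; Fix 1 (!terraform.state) is unaffected
```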
Migration
No breaking changes. Previously-failing commands now work.
```yaml
# Before (runs terraform init + output on every eval):
vpc_id: '{{ (atmos.Component "vpc" .stack).outputs.vpc_id }}'

# After (reads state file directly, no init):
vpc_id: !terraform.state vpc {{ .stack }} vpc_id
```

Resolves #2167
Summary by CodeRabbit
- New Features
  - Auto-provision JIT working directories before Terraform output evaluation (configurable, enabled by default).
  - Template/YAML functions can resolve state/outputs from JIT-provisioned and source-backed components.
- Security / Bug Fixes
  - Containment checks to prevent path traversal outside configured base path.
  - Safer fallbacks and debug logging when workdir/state resolution fails.
- Documentation
  - Docs and env var added for the new auto-provision setting.
- Tests
  - Extensive unit/integration tests covering JIT provisioning, resolution, caching, concurrency, and inheritance.
fix(auth): crash on standalone `ambient` identity; add global panic handler @aknysh (#2334)
what
- Fix a hard `SIGSEGV` when Atmos authenticates a standalone `ambient` identity (`kind: ambient`). Every `atmos auth login` / `atmos auth whoami` / `atmos terraform ...` against such an identity crashed the process with a Go stack trace.
- Add a process-wide panic handler (`pkg/panics`) so any future uncaught panic renders a short, actionable crash message via `pkg/ui` instead of a raw Go goroutine dump, while preserving the full stack trace in a crash-report file for bug reports.
- Update `github.com/mikefarah/yq/v4` (4.52.5 → 4.53.2) and migrate Atmos's yq logger setup to the new slog-based API.
1. Ambient identity crash (primary fix)
Background: the generic ambient identity kind (`docs/prd/ambient-identity.md`) is a cloud-agnostic passthrough — `Authenticate()` returns `(nil, nil)` by design because credentials are resolved by the cloud SDK at subprocess runtime (IRSA / IMDS / ECS task role / environment), not by Atmos.
Bug: the auth manager forwarded those nil credentials straight to `buildWhoamiInfo`, which unconditionally invoked a method on the credential interface, producing a nil-interface dereference on the main goroutine.
Scope: standalone generic ambient identities. The AWS-specific `aws/ambient` was not affected because its `Authenticate()` resolves via the AWS SDK default chain and always returns real credentials.
Fix: `buildWhoamiInfo` now short-circuits safely when `creds == nil` and still returns a populated `WhoamiInfo` (realm, provider, identity, environment, timestamp). Environment is populated unconditionally so `atmos auth whoami` continues to report the expected surface for pure-passthrough ambient identities. Keystore cache, reference handle, `BuildWhoamiInfo`, and `GetExpiration` branches are skipped — there is nothing to cache for an identity that does not own credentials.
Tests:
- `TestManager_buildWhoamiInfo_NilCredentials` — unit coverage of the nil-creds branch. Before the fix, this test panicked at `manager_whoami.go:25`.
- `TestManager_Authenticate_Ambient_Standalone` — end-to-end via real `NewAuthManager` + `Authenticate()`. Before the fix, this path panicked in the same location through `manager.go:294`.
Both pass post-fix alongside the existing whoami tests.
Full write-up: `docs/fixes/2026-04-17-ambient-identity-nil-credentials.md`.
2. Global panic handler
Motivation: the ambient crash surfaced as a wall of Go runtime output that was useless to end users. Any future bug of the same shape would produce the same bad experience. The handler is defensive infrastructure, not a workaround for the ambient fix — both ship together so a regression cannot reintroduce a raw crash.
Behavior:
- One deferred `panics.Recover(&exitCode)` at the top of `main.run()` covers every code path reachable synchronously from `cmd.Execute()` — every command, the `internal/exec` pipeline, `pkg/auth/`, `pkg/stack/`, etc. Installed before `defer cmd.Cleanup()` so Cleanup runs normally on clean exit and Recover also catches anything that escapes Cleanup itself.
- User-facing output uses `pkg/ui` exclusively (per CLAUDE.md I/O/UI rules): red ✗ `Atmos crashed unexpectedly` headline, Markdown-rendered body with panic summary, version, OS/arch, Go build toolchain, command-line, crash-report path, and an issue-tracker link.
- Full stack is shown inline only when `ATMOS_LOGS_LEVEL=Debug` or `=Trace` (case-insensitive). Otherwise it is written to a `0o600` crash report at `$TMPDIR/atmos-crash-<UTC>-<pid>.txt` whose path appears in the friendly message.
- The panic is wrapped via `cockroachdb/errors.WithStack` and forwarded to `errUtils.CaptureError`, so Sentry (when configured) gets a proper event with breadcrumbs through the existing error pipeline.
- Exit code 1 matches the existing error-exit convention — no CI/pre-commit behavior change.
Tests: 14 unit cases covering string / error / runtime.Error panic values, debug-mode on/off, crash-file write success and graceful failure, option defaults, env-gate matrix (canonical / lower / upper / whitespace / non-debug levels), and Recover with nil and non-nil exit-code pointers.
Manual verification: injected a nil-pointer dereference into the version command, ran `./build/atmos version` in both default and `ATMOS_LOGS_LEVEL=Debug` modes. Exact output is reproduced in the fix doc for PR/release-note reuse.
Full write-up: `docs/fixes/2026-04-17-global-panic-handler.md`.
3. yq bump + logger API migration
`github.com/mikefarah/yq/v4` is bumped from 4.52.5 → 4.53.2. The 4.53 line replaces yqlib's internal logger — previously built on `op/go-logging.v1` — with one built on Go's standard `log/slog`. The old `yqlib.GetLogger().SetBackend(backend logging.Backend)` entry point is gone; the new API exposes `SetLevel(slog.Level)` and `SetSlogger(*slog.Logger)`.
Atmos's `pkg/utils/yq_utils.go` used `SetBackend` with a no-op `logBackend` struct to silence yq's internal chatter unless `Logs.Level == Trace`. Without migration, atmos fails to build against the new yq with `logger.SetBackend undefined`.
Migration:
- Removed the `logBackend` type and its four methods (`Log`, `GetLevel`, `SetLevel`, `IsEnabledFor`) along with the `gopkg.in/op/go-logging.v1` import.
- Rewrote `configureYqLogger` to install an `io.Discard` slog handler via `yqlib.GetLogger().SetSlogger(...)` when the Atmos log level is not Trace. Semantics are preserved: yq's internal diagnostics are suppressed by default and only surface at Trace level.
- Deleted `TestLogBackend` from `pkg/utils/yq_utils_test.go` (tested a type that no longer exists). `TestConfigureYqLogger` and all `EvaluateYqExpression` tests still pass.
No behavior change for end users: templates and YAML-function calls that route through yq produce the same output with the same suppression of yq's internal logs.
Also
- Bump `ATMOS_VERSION=1.216.0` in `examples/quick-start-advanced/Dockerfile` and two test fixtures that referenced the old version.
why
- Ambient identity crash is a complete blocker. Any user running `atmos auth login` against a generic `ambient` identity — the canonical pattern for IRSA / IMDS / ECS task roles / cloud-agnostic passthrough — hits a hard SIGSEGV on every invocation. There is no workaround short of not using the identity kind, which defeats the reason the kind exists.
- The panic handler is defensive UX. Cloud-credential code paths are full of nil-interface boundaries; the ambient crash is proof that a similar bug could slip in again. Intercepting panics at the main-goroutine entry point turns any future incident of the same shape into a crisp bug-report loop (one friendly line + one file path to attach) instead of a wall of goroutine output, with the full stack one env var away for contributors.
- The yq bump is required to stay on a maintained yqlib. 4.53 is the current minor line; staying on 4.52 leaves us one release behind on upstream fixes and drifts further from the slog-based logger API that the rest of the Go ecosystem is converging on. The migration is a one-file change with identical user-visible behavior.
references
- `docs/fixes/2026-04-17-ambient-identity-nil-credentials.md` — ambient crash fix: root cause, scope, tests, and why the fix belongs at the manager layer rather than synthesizing a credential stub in the identity.
- `docs/fixes/2026-04-17-global-panic-handler.md` — panic handler design, sample output (default + debug mode + crash report), test matrix, and follow-up items.
- `docs/prd/ambient-identity.md` — the ambient-identity PRD. The `(nil, nil)` return contract from `ambient.Authenticate()` is intentional for the generic kind; the bug was the manager failing to honor it.
- `.claude/agents/tui-expert.md` — `pkg/ui` output-channel rules the panic handler follows (stderr UI channel via `ui.Error`/`ui.MarkdownMessage`; never `fmt.Fprintf(os.Stderr, ...)`).
- `github.com/mikefarah/yq` v4.53.0 release notes — upstream changelog for the logger migration.
Summary by CodeRabbit
- New Features
  - Global panic recovery with user-friendly crash reports and automatic crash-file generation.
- Bug Fixes
  - Prevented crash when authenticating with generic ambient identities that return nil credentials; authentication now returns stable identity info without panicking.
- Documentation
  - Added detailed fix write-ups for panic recovery and nil-credential behavior.
- Tests
  - Added unit and integration tests covering panic handling and nil-credential authentication paths.
- Chores
  - Updated dependencies, bumped example default version to 1.216.0, adjusted logger handling, and refreshed NOTICE entries.
fix: respect workdir path for generate: writes and hook-triggered terraform @zack-is-cool (#2309)
Summary
Fixes a cluster of bugs in `provision.workdir.enabled: true` mode covering file generation, hook dispatch, store hook correctness, and repeated-apply terraform init prompts.
Bug 1 – generate: writes to base component directory instead of workdir
`resolveAndProvisionComponentPath` called `autoGenerateComponentFiles` before `provisionComponentSource`. Generated files (e.g. `locals_override.tf`) were written to `components/terraform/<component>/` instead of the JIT workdir.
Fix: swap call order — provision source first, then generate into the returned (workdir) path.
Bug 2 – hooks and output executor used base component directory
`extractComponentPath` always returned the base component directory because `_workdir_path` is a runtime key absent from freshly-described sections. Hooks calling `terraform output` would fail with "no such file or directory" when trying to write `backend.tf.json` to a path that doesn't exist.
Fix: check `provision.workdir.enabled` in sections and rebuild the deterministic workdir path via `workdir.BuildPath`.
Bug 3 – hooks fired on every event regardless of events: list
`RunAll` had no event matching — all hooks ran regardless of their `events:` list. YAML uses hyphens (`after-terraform-apply`) but Go `HookEvent` constants use dots (`after.terraform.apply`).
Fix: added `MatchesEvent()` with hyphen→dot normalisation. Hooks with no `events:` field match all events to preserve backward compatibility with configs written before event filtering existed.
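A hook declaration that exercises the new matching looks roughly like this (hook name and store details are illustrative; verify field names against the Atmos hooks docs):

```yaml
# Stack manifest — illustrative hook; YAML event names use hyphens
hooks:
  store-outputs:
    events:
      - after-terraform-apply   # normalised to the Go constant after.terraform.apply
    command: store
    name: redis                 # hypothetical store name
    outputs:
      label_id: .id
# a hook that omits `events:` entirely still fires on every event (backward compatible)
```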
Bug 4 – store hook used wrong output getter and wrong error sentinels
The store hook always used `GetOutput` (which runs `terraform init`) regardless of when it fires. Running init after apply with a closed stdin triggers state-migration prompts. Additionally, errors used `ErrNilTerraformOutput` for both retrieval failures and missing keys, and included no context about which hook or event caused the failure.
Fix: `RunE` now selects the getter based on the event — after- events use `GetOutputSkipInit` (workdir already initialised); before- events use `GetOutput` (init may not have run yet). `IsPostExecution()` helper on `HookEvent` encodes the contract. Error messages now include hook name, event, output key, component, and stack. Correct sentinels: `ErrTerraformOutputFailed` for retrieval errors, `ErrTerraformOutputNotFound` for missing keys.
Bug 5 – "Do you want to migrate all workspaces?" prompt on every apply
This was caused by three interacting problems:
- `-reconfigure` added whenever `WorkdirPathKey` was set — `WorkdirPathKey` is set for both a preserved workdir (TTL not expired) and a wiped/re-provisioned workdir (TTL=0s or expired). Checking it unconditionally added `-reconfigure` even when `.terraform/` was intact.
- `init_run_reconfigure: true` overriding the preserved-workdir guard — even after scoping `-reconfigure` to `WorkdirReprovisionedKey`, the global `InitRunReconfigure` flag bypassed the check and always added `-reconfigure`.
- `cleanTerraformWorkspace` deleting `.terraform/environment` for workdir components — this function was designed for backend-switching on non-workdir components. For workdir components it deleted the active workspace record before every init, causing OpenTofu to see orphaned `terraform.tfstate.d/<workspace>/` directories with no active workspace and prompt for migration.
When combined: `-reconfigure` tells OpenTofu to ignore the saved backend and treat init as fresh. A fresh init with existing workspace state dirs triggers the migration prompt even when the backend is unchanged.
Fix (three parts):
- Introduce `WorkdirReprovisionedKey` (`_workdir_reprovisioned`), set only by `vendorToTarget` (source wiped) or `SyncDir` with file changes (workdir synced). This is the correct signal that `.terraform/` was actually cleared.
- For workdir components with a preserved workdir, ignore `InitRunReconfigure` — the backend is always generated deterministically from the same stack config and never changes between runs. `-reconfigure` is only added when `WorkdirReprovisionedKey` is set or the subcommand is `workspace`.
- Skip `cleanTerraformWorkspace` for workdir-enabled components — the backend is consistent, so there is no reason to clear the workspace record.
Tested end-to-end
Full producer → store → consumer pipeline:
- `null-label` applies with JIT workdir + `generate:` override
- `after-terraform-apply` hook reads the `.id` output and writes it to Redis (no init re-run, no migration prompt)
- `consumer` reads the value via `!store local/redis null-label label_id`, injects it into its own `generate:` template, applies successfully
- Repeated applies do not prompt for workspace migration, with or without `init_run_reconfigure: true` and with or without `ttl: "0s"`
Reproduction
The fix worked for the deployment where I originally hit this issue. A local reproduction follows.
cat << 'SCRIPT' > repro.sh
#!/usr/bin/env bash
# ============================================================
# ATMOS REPRO: generate: writes orphaned override to base
# component directory; hook-triggered terraform fails;
# consumer reads store value into JIT workdir generate:
#
# Stack name: demo (from vars.name + name_template)
# Components: null-label (producer), consumer (reads from store)
#
# Requires: atmos, tofu, docker
# ============================================================
set -euo pipefail
WORKDIR="$(mktemp -d -t atmos-repro-XXXXXX)"
echo "Working in: ${WORKDIR}"
cd "${WORKDIR}"
echo "== starting redis =="
docker stop atmos-repro-redis 2>/dev/null || true
docker run -d --rm --name atmos-repro-redis -p 6379:6379 redis:7-alpine
trap 'docker stop atmos-repro-redis 2>/dev/null || true' EXIT
sleep 1
cat <<'EOF' > atmos.yaml
base_path: "."
stores:
local/redis:
type: redis
options:
url: "redis://localhost:6379"
components:
terraform:
base_path: "components/terraform"
command: "tofu"
workspaces_enabled: true
apply_auto_approve: false
deploy_run_init: true
init_run_reconfigure: true
auto_generate_backend_file: true
auto_generate_files: true
stacks:
name_template: "{{ .vars.name }}"
base_path: "stacks"
included_paths:
- "**/*"
EOF
mkdir -p stacks
cleanup() {
echo "-- cleanup --"
atmos terraform workdir clean --all 2>/dev/null || true
echo "-- cleanup done --"
}
show_dirs() {
local label="${1:-}"
echo
if [[ -n "$label" ]]; then
echo "-- directories: $label --"
fi
echo "components/terraform/null-label"
ls -la components/terraform/null-label/ 2>/dev/null || echo "(does not exist)"
echo ".workdir/terraform/demo-null-label"
ls -la .workdir/terraform/demo-null-label/ 2>/dev/null || echo "(does not exist)"
echo ".workdir/terraform/demo-consumer"
ls -la .workdir/terraform/demo-consumer/ 2>/dev/null || echo "(does not exist)"
}
# ============================================================
# SCENARIO 1: JIT + generate, no hook.
# Verifies generate: writes to the workdir only (not the base
# component directory), and that apply succeeds.
# ============================================================
echo
echo "================================================="
echo "SCENARIO 1: init + apply WITHOUT hook (expect success)"
echo " - generate: must write only to workdir, not base component dir"
echo "================================================="
cleanup
cat <<'EOF' > stacks/demo.yaml
vars:
name: demo
terraform:
backend_type: local
components:
terraform:
null-label:
vars:
namespace: "eg"
stage: "test"
name: "demo"
enabled: true
source:
uri: "git::https://github.com/cloudposse/terraform-null-label.git"
version: "0.25.0"
ttl: "0s"
provision:
workdir:
enabled: true
generate:
locals_override.tf: |
# override file generated by atmos
locals {
name = "THISISANOVERRIDE"
}
EOF
echo "== init =="
atmos terraform init null-label -s demo
show_dirs "after init"
echo "== apply =="
atmos terraform apply null-label -s demo -- -auto-approve
show_dirs "after apply"
echo
echo "SCENARIO 1: PASSED"
# ============================================================
# SCENARIO 2: JIT + generate + after-apply hook writes to Redis.
# The hook fires after apply, reads terraform output, and stores
# it in Redis. Tests that the hook does not re-run init (which
# would prompt for workspace migration with a closed stdin).
# ============================================================
echo
echo "================================================="
echo "SCENARIO 2: init + apply WITH after-apply store hook (expect success)"
echo " - hook reads .id output and stores it in Redis"
echo " - hook must NOT re-run terraform init"
echo "================================================="
cleanup
cat <<'EOF' > stacks/demo.yaml
vars:
name: demo
terraform:
backend_type: local
components:
terraform:
null-label:
vars:
namespace: "eg"
stage: "test"
name: "demo"
enabled: true
source:
uri: "git::https://github.com/cloudposse/terraform-null-label.git"
version: "0.25.0"
ttl: "0s"
provision:
workdir:
enabled: true
generate:
locals_override.tf: |
# override file generated by atmos
locals {
name = "THISISANOVERRIDE"
}
hooks:
store-outputs:
events:
- after-terraform-apply
command: store
name: local/redis
outputs:
label_id: .id
EOF
echo "== init =="
atmos terraform init null-label -s demo
echo "== apply =="
atmos terraform apply null-label -s demo -- -auto-approve
show_dirs "after apply"
echo
echo "== verifying Redis contains label_id =="
STORED=$(docker exec atmos-repro-redis redis-cli KEYS "*label_id*")
if [[ -z "$STORED" ]]; then
echo "SCENARIO 2: FAILED — no label_id key found in Redis"
exit 1
fi
echo "Redis keys: $STORED"
echo
echo "SCENARIO 2: PASSED"
# ============================================================
# SCENARIO 3: Consumer reads label_id from Redis via !store,
# injects it into a generate: template inside its own JIT workdir.
# Tests the full producer → store → consumer pipeline.
# ============================================================
echo
echo "================================================="
echo "SCENARIO 3: consumer reads store value into JIT workdir generate:"
echo " - consumer.vars.label_id: !store local/redis null-label label_id"
echo " - generate: uses {{ .vars.label_id }} in a locals override"
echo " - both components use JIT workdir with ttl: 0s"
echo "================================================="
cat <<'EOF' > stacks/demo.yaml
vars:
name: demo
terraform:
backend_type: local
components:
terraform:
null-label:
vars:
namespace: "eg"
stage: "test"
name: "demo"
enabled: true
source:
uri: "git::https://github.com/cloudposse/terraform-null-label.git"
version: "0.25.0"
ttl: "0s"
provision:
workdir:
enabled: true
generate:
locals_override.tf: |
# override file generated by atmos
locals {
name = "THISISANOVERRIDE"
}
hooks:
store-outputs:
events:
- after-terraform-apply
command: store
name: local/redis
outputs:
label_id: .id
consumer:
vars:
namespace: "eg"
stage: "test"
enabled: true
label_id: !store local/redis null-label label_id
source:
uri: "git::https://github.com/cloudposse/terraform-null-label.git"
version: "0.25.0"
ttl: "0s"
provision:
workdir:
enabled: true
generate:
name_override.tf: |
# override file generated by atmos — value comes from Redis via !store
locals {
name = "{{ .vars.label_id }}-derpderpderp"
}
EOF
echo "== apply consumer =="
atmos terraform apply consumer -s demo -- -auto-approve
show_dirs "after consumer apply"
echo
echo "== verifying consumer output contains the store value =="
CONSUMER_ID=$(atmos terraform output consumer -s demo 2>/dev/null | grep "id =" | head -1)
echo "Consumer id output line: $CONSUMER_ID"
if echo "$CONSUMER_ID" | grep -q "derpderpderp"; then
echo
echo "SCENARIO 3: PASSED — consumer label contains store-derived value"
else
echo
echo "SCENARIO 3: FAILED — consumer output does not contain expected suffix"
echo " Expected 'derpderpderp' in id output"
exit 1
fi
echo
echo "================================================="
echo "ALL SCENARIOS PASSED"
echo "Working directory preserved at: ${WORKDIR}"
echo "================================================="
SCRIPT
bash repro.sh 2>&1 | tee repro.log
Test plan
- `TestHook_MatchesEvent` — hyphen/dot formats, no match, nil/empty events (backward compat), multiple events
- `TestRunAll_EventFiltering` — store called/skipped based on event matching
- `TestExecutor_GetOutputWithOptions_SkipInit` — `terraform init` NOT called when `SkipInit: true`
- `TestBuildInitArgs_ReconfigureWhenWorkdirReprovisioned` — `-reconfigure` added when workdir wiped
- `TestBuildInitArgs_NoReconfigureWhenWorkdirPreserved` — `-reconfigure` NOT added for preserved workdir
- `TestBuildInitArgs_NoReconfigureWhenWorkdirPreserved_InitRunReconfigureIgnored` — global `InitRunReconfigure: true` does not override the preserved-workdir guard
- `TestBuildInitArgs_ReconfigureForNonWorkdir_InitRunReconfigure` — `InitRunReconfigure` still works for non-workdir components
- `TestPrepareInitExecution_SkipsCleanWorkspaceForWorkdir` — `.terraform/environment` preserved for workdir components
- `TestPrepareInitExecution_CleansWorkspaceForNonWorkdir` — `.terraform/environment` still cleaned for non-workdir components
- `TestIsWorkdirEnabled` / `TestExtractComponentPath` / `workdir_enabled_*` — workdir path resolution
- Full `pkg/hooks`, `pkg/terraform/output`, `internal/exec` test suites pass
Summary by CodeRabbit
- New Features
  - Workdir-aware provisioning that targets JIT workdirs and signals reprovisioning
  - Hook event normalization, post-execution detection, and event-matching/filtering
  - New output APIs including skip-init retrieval and advanced output options
- Improvements
  - Smarter terraform init (`-reconfigure`) behavior for workdir flows
  - Preserve workspace files for workdir components to avoid unintended deletions
  - More robust output caching and clearer CLI success/error messaging
- Tests
  - Expanded coverage for workdirs, init args, hooks, output paths, and store commands
fix: treat missing Atmos config in BASE as empty baseline in `atmos describe affected` @[copilot-swe-agent[bot]](https://github.com/apps/copilot-swe-agent) (#2296)
`atmos describe affected` fatally errors with `failed to find import` on greenfield branches (or when the base ref predates Atmos adoption) because `ErrFailedToFindImport` from BASE stack processing was propagated as a hard failure. The correct behavior is to treat an unconfigured BASE as an empty baseline — everything in HEAD is new, therefore everything is affected.
Changes
- `internal/exec/describe_affected_utils.go` — `executeDescribeAffected` now handles `ErrFailedToFindImport` alongside `ErrNoStackManifestsFound` in both BASE processing paths:
  - `FindAllStackConfigsInPathsForStack` returning `ErrFailedToFindImport` (stacks directory absent in BASE) → sets `remoteStackConfigFilesAbsolutePaths = []string{}`
  - `ExecuteDescribeStacks` returning `ErrFailedToFindImport` (imports unresolvable in BASE) → sets `remoteStacks = map[string]any{}`
- Both cases emit a `WARN` log with actionable context: `WARN No Atmos stack manifests found in BASE; treating BASE as empty (all HEAD components will be reported as affected) hint="This is expected for greenfield branches or when the base branch does not yet use Atmos"`
- `tests/describe_affected_greenfield_test.go` — integration test that initializes a bare-minimum git repo (single commit, no Atmos config) as the BASE and asserts all known HEAD components (`component-1`, `component-2` in `prod`/`nonprod`) appear in the affected output without error.
Associated Pull Requests
Deployment Status
To view the Atmos Pro deployment status of this release, see #2342.