github cloudposse/atmos v1.216.0-rc.2

latest release: v1.216.0
pre-release, 5 hours ago
fix: Add missing doc redirects for old core-concepts URLs @osterman (#2287)

what

  • Adds 25 new client-side redirects for old /core-concepts/ URLs that are still indexed by Google and cached by LLMs, causing 404 errors
  • Fixes 2 existing redirects that had invalid trailing slashes on /vendor/component-manifest/ targets (was causing Docusaurus build validation errors)

New redirect categories:

  • 4 screenshot-confirmed 404s (vendoring, component-management, provisioning, schemas)
  • 7 project section redirects (/core-concepts/projects/* → /projects/ and /cli/configuration/)
  • 7 stacks sub-pages (define-components, settings, components, backend, vars, env, providers)
  • 2 share-data / remote-state redirects
  • 2 vendor sub-pages (component-manifest, vendor-manifest)
  • 1 describe page redirect
  • 2 component sub-pages (packer, ansible)

why

  • Old /core-concepts/ URLs are still indexed by Google and widely cached in LLM training data
  • LLMs frequently generate links to these old URLs when helping users with Atmos, leading to broken links and poor developer experience
  • Each broken URL was verified by live-fetching the page and confirming a 404 response
  • Each redirect target was cross-referenced against llms.txt to ensure validity

references

  • Verified via site:atmos.tools/core-concepts Google searches
  • All redirect targets validated against the Docusaurus build (npm run build passes)

Summary by CodeRabbit

  • Bug Fixes

    • Fixed numerous broken documentation links and improved navigation by adding and updating redirect rules across Projects, Stacks, Components, Vendor, and related pages (including removal of trailing-slash redirect mismatches) so users are directed to correct docs URLs.
  • Chores

    • Updated CI workflow runner constraints to refine automated job scheduling.

🚀 Enhancements

Fix multi-region provider aliases generating incorrect Terraform JSON format @[copilot-swe-agent[bot]](https://github.com/apps/copilot-swe-agent) (#2210)

When configuring providers with dot-notation aliases (e.g., `aws.use1`), the generated `providers_override.tf.json` had an invalid structure: it emitted separate top-level keys instead of the array-of-objects format that Terraform's JSON syntax requires for multiple instances of the same provider.

Changes

  • pkg/terraform/output/backend.go: Added exported ProcessProviderAliases that detects dot-notation provider keys, groups all configurations for the same provider type into an array (base config first, aliases sorted), and leaves non-aliased providers unchanged
  • internal/exec/utils.go: Updated generateComponentProviderOverrides to delegate to tfoutput.ProcessProviderAliases, eliminating duplicated logic

Example

Given stack config:

providers:
  aws:
    region: us-east-2
  aws.use1:
    region: us-east-1
    alias: use1

Before:

{ "provider": { "aws": { "region": "us-east-2" }, "aws.use1": { "alias": "use1", "region": "us-east-1" } } }

After:

{
  "provider": {
    "aws": [
      { "region": "us-east-2" },
      { "alias": "use1", "region": "us-east-1" }
    ]
  }
}
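
The grouping described under Changes can be sketched in a few lines of Go. This is a minimal illustration, not the actual `ProcessProviderAliases` code: the function name `groupProviderAliases`, its signature, and the choice to derive a missing `alias` from the key suffix are assumptions made for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// groupProviderAliases is a hypothetical stand-in for the grouping step.
// Keys such as "aws.use1" are folded into an array under "aws", with the
// base config first and the aliased configs sorted by key.
func groupProviderAliases(providers map[string]map[string]any) map[string]any {
	// Collect dotted keys per provider type, e.g. "aws" -> ["aws.use1"].
	aliases := map[string][]string{}
	for key := range providers {
		if typ, _, ok := strings.Cut(key, "."); ok {
			aliases[typ] = append(aliases[typ], key)
		}
	}

	out := make(map[string]any, len(providers))

	// Providers without any alias keep their original object form.
	for key, cfg := range providers {
		if strings.Contains(key, ".") {
			continue
		}
		if _, hasAliases := aliases[key]; !hasAliases {
			out[key] = cfg
		}
	}

	// Providers with aliases become an array of objects.
	for typ, keys := range aliases {
		sort.Strings(keys)
		var group []any
		if base, ok := providers[typ]; ok {
			group = append(group, base)
		}
		for _, key := range keys {
			cfg := map[string]any{}
			for k, v := range providers[key] {
				cfg[k] = v
			}
			// Assumption: derive the alias from the key when it is not set.
			if _, ok := cfg["alias"]; !ok {
				_, alias, _ := strings.Cut(key, ".")
				cfg["alias"] = alias
			}
			group = append(group, cfg)
		}
		out[typ] = group
	}
	return out
}

func main() {
	providers := map[string]map[string]any{
		"aws":      {"region": "us-east-2"},
		"aws.use1": {"region": "us-east-1", "alias": "use1"},
	}
	doc := map[string]any{"provider": groupProviderAliases(providers)}
	b, err := json.MarshalIndent(doc, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```

Run against the stack config from the example, the sketch produces a single `aws` array with the base configuration first, which is the shape shown under After.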
Original prompt

This section details on the original issue you should resolve

<issue_title>Multi-Region with Provider Aliases example is not working</issue_title>
<issue_description>### Describe the Bug

https://atmos.tools/stacks/providers#multi-region-with-provider-aliases, this example is not working, the actual generated file is different from the example.

Expected Behavior

The generated file is the same as the example.

Steps to Reproduce

With the following atmos component config:

components:
  terraform:
    eip:
      providers:
        aws:
          region: us-east-2
        aws.use1:
          region: us-east-1
          alias: use1
      metadata:
        component: eip

Run atmos command and check the output of providers_override.tf.json

Screenshots

The content of the generated providers_override.tf.json

{
  "provider": {
    "aws": {
      "region": "us-east-2"
    },
    "aws.use1": {
      "alias": "use1",
      "region": "us-east-1"
    }
  }
}

Would expect it to be :

{
  "provider": {
    "aws": [
      {
        "region": "us-east-2"
      },
      {
        "alias": "use1",
        "region": "us-east-1"
      }
    ]
  }
}

Environment

  • OS: OSX
  • Version: 1.209.0
  • Terraform version: v1.14.7

Additional Context

No response</issue_description>


Summary by CodeRabbit

  • New Features

    • Added support for provider aliases—both explicit and auto-derived from dot-notation provider keys (e.g., aws.use1).
    • Providers are now properly grouped into arrays in generated Terraform provider override files.
  • Tests

    • Added integration tests for provider alias scenarios.
  • Documentation

    • Updated provider documentation to clarify alias auto-derivation behavior.
fix(list): gate `list instances --upload` on `settings.pro.enabled` @osterman (#2330)

what

  • Change atmos list instances --upload to filter instances by settings.pro.enabled == true (strict boolean) instead of settings.pro.drift_detection.enabled == true.
  • Rename isProDriftDetectionEnabled → isProEnabled and simplify the check to a single lookup on settings.pro.enabled; drift_detection.enabled is no longer consulted.
  • Update all unit, integration, comprehensive, cmd, and benchmark tests to the new fixture shape; add an explicit case proving pro.enabled: true with drift_detection.enabled: false is now enabled.
  • Update website/docs/cli/commands/list/list-instances.mdx to document the filter criterion under --upload, in the examples section, and in the :::tip block (noting it must be a boolean, not the string "true").

why

  • Users with settings.pro.enabled: true configured on their components were hitting No Atmos Pro-enabled instances found; nothing to upload. even when Pro was clearly enabled, because the filter required the narrower drift_detection.enabled sub-key.
  • settings.pro.enabled is the correct top-level enablement flag for Pro; drift detection is one feature among several and shouldn't gate the whole upload.
  • The docs previously described --upload without specifying what made an instance eligible, so the failure mode was invisible to users.

Behavior change (callout)

Components that previously qualified via only settings.pro.drift_detection.enabled: true (without pro.enabled: true) will now be excluded from --upload. Users in that shape must add settings.pro.enabled: true.
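
The strict-boolean rule can be sketched as follows. This is an illustrative reconstruction, not the shipped code: the `isProEnabled` name comes from the notes above, but the signature and the `map[string]any` settings shape are assumptions.

```go
package main

import "fmt"

// isProEnabled is a hypothetical sketch of the strict check: only a real
// boolean true at settings.pro.enabled counts. Missing keys, the string
// "true", and drift_detection.enabled alone are all treated as disabled.
func isProEnabled(settings map[string]any) bool {
	pro, ok := settings["pro"].(map[string]any)
	if !ok {
		return false
	}
	enabled, ok := pro["enabled"].(bool)
	return ok && enabled
}

func main() {
	cases := []struct {
		name     string
		settings map[string]any
	}{
		{"pro.enabled: true", map[string]any{"pro": map[string]any{"enabled": true}}},
		{`pro.enabled: "true"`, map[string]any{"pro": map[string]any{"enabled": "true"}}},
		{"drift_detection only", map[string]any{"pro": map[string]any{
			"drift_detection": map[string]any{"enabled": true},
		}}},
		{"no pro settings", map[string]any{}},
	}
	for _, c := range cases {
		fmt.Printf("%-24s eligible for --upload: %v\n", c.name, isProEnabled(c.settings))
	}
}
```

Under this reading, only the first case is eligible, which matches the behavior change called out above.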

references

  • --upload was introduced in #2322

Summary by CodeRabbit

  • Bug Fixes

    • Pro detection simplified: only an explicit boolean settings.pro.enabled=true marks an instance as Pro; missing/non-boolean values are treated as disabled.
    • Upload behavior: all collected instances are uploaded; post-upload summary shows total uploaded plus enabled/disabled and drift-enabled counts.
    • Improved Pro authentication hints for GitHub Actions and workspace ID.
  • Documentation

    • CLI docs updated to reflect new upload semantics, payload shape, and the "No instances found; nothing to upload." message.
  • Tests

    • Tests updated/added to cover the new Pro flag shape, counting, and upload behavior.
Fix: Identity names with dots incorrectly parsed by Viper @[copilot-swe-agent[bot]](https://github.com/apps/copilot-swe-agent) (#2129)

  • [x] Initial plan for fixing identity names with dots
  • [x] Add `fixAuthIdentities()` to re-parse identities from raw YAML
  • [x] Extract shared decode hooks into `getAtmosDecodeHookFunc()`
  • [x] Apply fix in `LoadConfig()` and `loadConfigFromCLIArgs()`
  • [x] Add test case `TestIdentityNamesWithDots`
  • [x] Use atmosConfig in perf.Track for consistency
  • [x] Remove debug log message that caused test snapshot failures
  • [x] Add error handling test cases to increase coverage to 84.6%
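
Why a dot in an identity name breaks parsing can be shown with a small stdlib-only simulation. Viper treats "." as its key-path delimiter by default; the `setByDelimitedPath` helper below is a hypothetical imitation of that behavior and does not use Viper or Atmos code.

```go
package main

import (
	"fmt"
	"strings"
)

// setByDelimitedPath imitates a key store that splits keys on a delimiter,
// creating nested maps for every path segment. It is a simplified,
// hypothetical model of dot-delimited key handling.
func setByDelimitedPath(root map[string]any, path, delim string, value any) {
	parts := strings.Split(path, delim)
	node := root
	for _, p := range parts[:len(parts)-1] {
		child, ok := node[p].(map[string]any)
		if !ok {
			child = map[string]any{}
			node[p] = child
		}
		node = child
	}
	node[parts[len(parts)-1]] = value
}

func main() {
	name := "product.usa/ReadOnlyAccess"
	identity := map[string]any{"kind": "aws/permission-set"}

	// Dot-delimited storage: the identity appears to be named "product"
	// and has no "kind" of its own.
	split := map[string]any{}
	setByDelimitedPath(split, name, ".", identity)
	for k, v := range split {
		fmt.Printf("delimited: identity=%q kind=%v\n", k, v.(map[string]any)["kind"])
	}

	// Re-parsing from the raw document keeps the key literal.
	literal := map[string]any{name: identity}
	for k, v := range literal {
		fmt.Printf("literal:   identity=%q kind=%v\n", k, v.(map[string]any)["kind"])
	}
}
```

In the delimited case the identity surfaces as `product` with no `kind`, which is consistent with the `identity=product: invalid identity kind` error in the issue below. Keeping the key literal, as re-parsing from raw YAML does, avoids the split.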
Original prompt

This section details on the original issue you should resolve

<issue_title>Zero-Configuration AWS SSO Identity Management: identity containing dots break it.</issue_title>
<issue_description>### Describe the Bug

Testing

auth:
  providers:
    sso-prod:
      kind: aws/iam-identity-center
      start_url: https://my-org.awsapps.com/start
      region: us-east-1
      auto_provision_identities: true  # One line to enable

I do get a list of identities in ~/.cache/atmos/auth/sso-prod/provisioned-identities.yaml.

Some of them contain dots, e.g.

        product.usa/ReadOnlyAccess: # <=== The "." here breaks it
            kind: aws/permission-set
            provider: sso-prod
            via:
                provider: sso-prod
            principal:
                account:
                    id: "000000000000"
                    name: product.usa
                name: ReadOnlyAccess

Which atmos does not support:

$ atmos auth list
   Initialize Identities 

   Error: invalid identity kind
  
  ## Explanation

   unsupported identity kind:

   Initialize Identities 

   Error: failed to initialize identities: invalid identity config: identity=product: invalid identity kind: unsupported identity kind:

   Error 

   Error: invalid auth config: failed to create auth manager: failed to initialize identities: invalid identity config: identity=product: invalid identity kind: unsupported identity kind:

Expected Behavior

it works :-)

Steps to Reproduce

Cf. bug description

Screenshots

No response

Environment

atmos 1.207.0

Additional Context

No response</issue_description>


fix(toolchain): resolve aliases in `toolchain exec` / `toolchain which` lookups @osterman (#2332)

what

  • Route findBinaryPath (used by atmos toolchain exec and atmos toolchain which) through the existing alias-aware LookupToolVersion helper instead of a raw toolVersions.Tools[name] map lookup.
  • Derive owner / repo from the resolved canonical key so the computed install path matches what the write side persisted.
  • Add a regression test that reproduces the bug: .tool-versions storing helm/helm 3.20.2 + an alias helm → helm/helm now resolves via WhichExec("helm").

why

  • Symptom: atmos toolchain install helm@3.20.2 succeeds, but atmos toolchain exec -- helm … then errors with tool 'helm' not configured in .tool-versions and tries to re-install.
  • Root cause: the write side already canonicalizes via the resolver (wouldCreateDuplicate → aliasConflictsWithFullName), so entries land under the owner/repo key. The read side did a raw map lookup with no resolver, so an alias query missed the canonical entry: the classic write/read asymmetry.
  • Fix keeps the read side symmetric with the write side by reusing the helper that already exists for exactly this purpose.
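
The write/read asymmetry and its fix can be sketched as below. The types and the `lookupToolVersion` function are hypothetical simplifications written for this example; they are not the real `LookupToolVersion` helper or its signature.

```go
package main

import (
	"fmt"
	"strings"
)

// toolVersions is a hypothetical, simplified model of .tool-versions:
// canonical "owner/repo" keys mapped to their configured versions.
type toolVersions struct {
	Tools map[string][]string
}

// lookupToolVersion tries the name as given, then its alias target.
// It returns the key that actually matched so callers can derive
// owner and repo from the canonical entry.
func lookupToolVersion(name string, tv toolVersions, aliases map[string]string) (key, version string, ok bool) {
	candidates := []string{name}
	if canonical, isAlias := aliases[name]; isAlias {
		candidates = append(candidates, canonical)
	}
	for _, k := range candidates {
		if versions := tv.Tools[k]; len(versions) > 0 {
			return k, versions[0], true
		}
	}
	return "", "", false
}

func main() {
	tv := toolVersions{Tools: map[string][]string{"helm/helm": {"3.20.2"}}}
	aliases := map[string]string{"helm": "helm/helm"}

	// Raw map lookup, as on the old read side: the alias misses.
	_, found := tv.Tools["helm"]
	fmt.Println("raw lookup finds helm:", found)

	// Alias-aware lookup: resolves to the canonical entry.
	key, version, ok := lookupToolVersion("helm", tv, aliases)
	owner, repo, _ := strings.Cut(key, "/")
	fmt.Println("alias-aware lookup:", ok, owner, repo, version)
}
```

Returning the resolved key rather than the queried name is the design point: anything computed afterwards from owner and repo then agrees with what the write side stored.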

references

  • Out of scope, tracked separately: RunInstall persisting the literal string latest to .tool-versions when installing without an explicit version, and wiring pkg/toolchain/filemanager / pkg/toolchain/lockfile into install/uninstall/set/exec.

Summary by CodeRabbit

  • Bug Fixes
    • Fixed tool alias resolution to correctly locate binary paths when requesting tools by their registered alias names instead of canonical identifiers. The system now properly maps aliases to their resolved canonical entries before checking availability.
fix: resolve JIT workdir path for !terraform.state, !terraform.output, and atmos.Component @zack-is-cool (#2328)

What

Bug fix PR. Makes !terraform.state, !terraform.output, and atmos.Component work correctly for JIT workdir components (provision.workdir.enabled: true). All three were silently broken in ways that only surfaced at runtime.

Four fixes:

  1. !terraform.state path resolution: resolves the state path from .workdir/terraform/<stack>-<component>/ instead of the static source directory, which JIT components never write state to.
  2. !terraform.output / atmos.Component auto-provision: provisions the JIT workdir before terraform init so output references work on any machine, not just ones with a pre-existing workdir from a prior apply.
  3. Source-provisioned JIT workdir support: Fix 2 only handled local-copy provisioning. For source.uri components, !terraform.output now hydrates from the source URI before init. Also fixes the extractComponentName fallback and a go-getter FileGetter dst-must-not-exist invariant.
  4. Provisioner output interleaving: ui.ClearLine() before status writes prevents the bubbletea spinner from leaving leading whitespace on provisioner messages.

Correctness & security fixes:

  • TOCTOU race: sync.Map.Load+Store replaced with LoadOrStore inside the singleflight closure, eliminating the window where two goroutines could both enter Provision.
  • Context cancellation: switched to singleflight.DoChan + select so waiters with cancelled contexts exit immediately. Added context.WithoutCancel so leader cancellation doesn't abort shared provisioning work.
  • Path traversal guard: extractComponentPath verifies the derived workdir path stays within filepath.Abs(basePath) before returning it; escaping paths fall back to componentPath. Mirrors the existing guard in terraform_backend_local.go.
  • Actionable error hint: ErrWorkdirProvision now includes the full YAML path and env var to disable auto-provisioning.
  • loadConfigFromCLIArgs env var bug: setEnv(v) was missing on the --config/--config-path code path, silently ignoring all ATMOS_* overrides when config was loaded from CLI args.
  • Documentation: auto_provision_workdir_for_outputs and ATMOS_COMPONENTS_TERRAFORM_AUTO_PROVISION_WORKDIR_FOR_OUTPUTS added to the config/env var reference docs.

Why

JIT workdir components write their Terraform files to .workdir/terraform/<stack>-<component>/ via a before.terraform.init hook — but that hook only fires during direct atmos terraform commands, not YAML function evaluation. Three distinct silent failures resulted:

  • !terraform.state looked in the source directory where JIT components have no state — unconditional failure.
  • !terraform.output computed the correct workdir path but never populated the directory before calling terraform init — fails with "no such file or directory" on any cold machine.
  • !terraform.output + source.uri — even with Fix 2, ProvisionWorkdir only copies local files. Source-provisioned components need AutoProvisionSource first, which only fires in the hook system the output executor never reaches.

Note on Fix 3 (source.uri components)

!terraform.output against a source-provisioned component with a cold workdir will fetch from source.uri — the same credentials already needed for atmos terraform apply. The fetch is cached per (stack, component) pair per process.

Set auto_provision_workdir_for_outputs: false (or ATMOS_COMPONENTS_TERRAFORM_AUTO_PROVISION_WORKDIR_FOR_OUTPUTS=false) to disable Fixes 2 and 3.

For state-only reads, prefer !terraform.state — no init, no source fetch, no terraform binary required.


Migration

No breaking changes. Previously-failing commands now work.

# Before (runs terraform init + output on every eval):
vpc_id: '{{ (atmos.Component "vpc" .stack).outputs.vpc_id }}'

# After (reads state file directly, no init):
vpc_id: !terraform.state vpc {{ .stack }} vpc_id

Resolves #2167

Summary by CodeRabbit

  • New Features

    • Auto-provision JIT working directories before Terraform output evaluation (configurable, enabled by default).
    • Template/YAML functions can resolve state/outputs from JIT-provisioned and source-backed components.
  • Security / Bug Fixes

    • Containment checks to prevent path traversal outside configured base path.
    • Safer fallbacks and debug logging when workdir/state resolution fails.
  • Documentation

    • Docs and env var added for the new auto-provision setting.
  • Tests

    • Extensive unit/integration tests covering JIT provisioning, resolution, caching, concurrency, and inheritance.
fix(auth): crash on standalone `ambient` identity; add global panic handler @aknysh (#2334)

what

  • Fix a hard SIGSEGV when Atmos authenticates a standalone ambient identity (kind: ambient). Every atmos auth login / atmos auth whoami / atmos terraform ... against such an identity crashed the process with a Go stack trace.
  • Add a process-wide panic handler (pkg/panics) so any future uncaught panic renders a short, actionable crash message via pkg/ui instead of a raw Go goroutine dump, while preserving the full stack trace in a crash-report file for bug reports.
  • Update github.com/mikefarah/yq/v4 (4.52.5 → 4.53.2) and migrate Atmos's yq logger setup to the new slog-based API.

1. Ambient identity crash (primary fix)

Background: the generic ambient identity kind (docs/prd/ambient-identity.md) is a cloud-agnostic passthrough — Authenticate() returns (nil, nil) by design because credentials are resolved by the cloud SDK at subprocess runtime (IRSA / IMDS / ECS task role / environment), not by Atmos.

Bug: the auth manager forwarded those nil credentials straight to buildWhoamiInfo, which unconditionally invoked a method on the credential interface, producing a nil-interface dereference on the main goroutine.

Scope: standalone generic ambient identities. The AWS-specific aws/ambient was not affected because its Authenticate() resolves via the AWS SDK default chain and always returns real credentials.

Fix: buildWhoamiInfo now short-circuits safely when creds == nil and still returns a populated WhoamiInfo (realm, provider, identity, environment, timestamp). Environment is populated unconditionally so atmos auth whoami continues to report the expected surface for pure-passthrough ambient identities. Keystore cache, reference handle, BuildWhoamiInfo, and GetExpiration branches are skipped — there is nothing to cache for an identity that does not own credentials.

Tests:

  • TestManager_buildWhoamiInfo_NilCredentials — unit coverage of the nil-creds branch. Before the fix, this test panicked at manager_whoami.go:25.
  • TestManager_Authenticate_Ambient_Standalone — end-to-end via real NewAuthManager + Authenticate(). Before the fix, this path panicked in the same location through manager.go:294.

Both pass post-fix alongside the existing whoami tests.

Full write-up: docs/fixes/2026-04-17-ambient-identity-nil-credentials.md.

2. Global panic handler

Motivation: the ambient crash surfaced as a wall of Go runtime output that was useless to end users. Any future bug of the same shape would produce the same bad experience. The handler is defensive infrastructure, not a workaround for the ambient fix — both ship together so a regression cannot reintroduce a raw crash.

Behavior:

  • One deferred panics.Recover(&exitCode) at the top of main.run() covers every code path reachable synchronously from cmd.Execute() — every command, the internal/exec/ pipeline, pkg/auth/, pkg/stack/, etc. Installed before defer cmd.Cleanup() so Cleanup runs normally on clean exit and Recover also catches anything that escapes Cleanup itself.
  • User-facing output uses pkg/ui exclusively (per CLAUDE.md I/O/UI rules): red ✗ Atmos crashed unexpectedly headline, Markdown-rendered body with panic summary, version, OS/arch, Go build toolchain, command-line, crash-report path, and an issue-tracker link.
  • Full stack is shown inline only when ATMOS_LOGS_LEVEL=Debug or =Trace (case-insensitive). Otherwise it is written to a 0o600 crash report at $TMPDIR/atmos-crash-<UTC>-<pid>.txt whose path appears in the friendly message.
  • The panic is wrapped via cockroachdb/errors.WithStack and forwarded to errUtils.CaptureError, so Sentry (when configured) gets a proper event with breadcrumbs through the existing error pipeline.
  • Exit code 1 matches the existing error-exit convention — no CI/pre-commit behavior change.

Out of scope (tracked as follow-up): panics on spawned goroutines (signal handler, telemetry flushes, async work) — those need their own deferred Recover at each entry point.

Tests: 14 unit cases covering string / error / runtime.Error panic values, debug-mode on/off, crash-file write success and graceful failure, option defaults, env-gate matrix (canonical / lower / upper / whitespace / non-debug levels), and Recover with nil and non-nil exit-code pointers.

Manual verification: injected a nil-pointer dereference into the version command, ran ./build/atmos version in both default and ATMOS_LOGS_LEVEL=Debug modes. Exact output is reproduced in the fix doc for PR/release-note reuse.

Full write-up: docs/fixes/2026-04-17-global-panic-handler.md.

3. yq bump + logger API migration

github.com/mikefarah/yq/v4 is bumped from 4.52.5 → 4.53.2. The 4.53 line replaces yqlib's internal logger — previously built on op/go-logging.v1 — with one built on Go's standard log/slog. The old yqlib.GetLogger().SetBackend(backend logging.Backend) entry point is gone; the new API exposes SetLevel(slog.Level) and SetSlogger(*slog.Logger).

Atmos's pkg/utils/yq_utils.go used SetBackend with a no-op logBackend struct to silence yq's internal chatter unless Logs.Level == Trace. Without migration, atmos fails to build against the new yq with logger.SetBackend undefined.

Migration:

  • Removed the logBackend type and its four methods (Log, GetLevel, SetLevel, IsEnabledFor) along with the gopkg.in/op/go-logging.v1 import.
  • Rewrote configureYqLogger to install an io.Discard slog handler via yqlib.GetLogger().SetSlogger(...) when the Atmos log level is not Trace. Semantics are preserved: yq's internal diagnostics are suppressed by default and only surface at Trace level.
  • Deleted TestLogBackend from pkg/utils/yq_utils_test.go (tested a type that no longer exists). TestConfigureYqLogger and all EvaluateYqExpression tests still pass.

No behavior change for end users: templates and YAML-function calls that route through yq produce the same output with the same suppression of yq's internal logs.

Also

  • Bump ATMOS_VERSION=1.216.0 in examples/quick-start-advanced/Dockerfile and two test fixtures that referenced the old version.

why

  • Ambient identity crash is a complete blocker. Any user running atmos auth login against a generic ambient identity — the canonical pattern for IRSA / IMDS / ECS task roles / cloud-agnostic passthrough — hits a hard SIGSEGV on every invocation. There is no workaround short of not using the identity kind, which defeats the reason the kind exists.
  • The panic handler is defensive UX. Cloud-credential code paths are full of nil-interface boundaries; the ambient crash is proof that a similar bug could slip in again. Intercepting panics at the main-goroutine entry point turns any future incident of the same shape into a crisp bug-report loop (one friendly line + one file path to attach) instead of a wall of goroutine output, with the full stack one env var away for contributors.
  • The yq bump is required to stay on a maintained yqlib. 4.53 is the current minor line; staying on 4.52 leaves us one release behind on upstream fixes and drifts further from the slog-based logger API that the rest of the Go ecosystem is converging on. The migration is a one-file change with identical user-visible behavior.

references

  • docs/fixes/2026-04-17-ambient-identity-nil-credentials.md — ambient crash fix: root cause, scope, tests, and why the fix belongs at the manager layer rather than synthesizing a credential stub in the identity.
  • docs/fixes/2026-04-17-global-panic-handler.md — panic handler design, sample output (default + debug mode + crash report), test matrix, and follow-up items.
  • docs/prd/ambient-identity.md — the ambient-identity PRD. The (nil, nil) return contract from ambient.Authenticate() is intentional for the generic kind; the bug was the manager failing to honor it.
  • .claude/agents/tui-expert.md: pkg/ui output-channel rules the panic handler follows (stderr UI channel via ui.Error / ui.MarkdownMessage; never fmt.Fprintf(os.Stderr, ...)).
  • github.com/mikefarah/yq v4.53.0 release notes — upstream changelog for the logger migration.

Summary by CodeRabbit

  • New Features

    • Global panic recovery with user-friendly crash reports and automatic crash-file generation.
  • Bug Fixes

    • Prevented crash when authenticating with generic ambient identities that return nil credentials; authentication now returns stable identity info without panicking.
  • Documentation

    • Added detailed fix write-ups for panic recovery and nil-credential behavior.
  • Tests

    • Added unit and integration tests covering panic handling and nil-credential authentication paths.
  • Chores

    • Updated dependencies, bumped example default version to 1.216.0, adjusted logger handling, and refreshed NOTICE entries.
