cloudposse/atmos v1.221.0 on GitHub

feat: describe affected evaluates all provisioned component sections @osterman (#2573)

what

Fix atmos describe affected so it detects changes in every provisioned component section, not just vars/env/settings/metadata/source/provision.
Newly evaluated sections: providers, required_providers (provider versions), required_version, hooks, generate, backend, backend_type, remote_state_backend, remote_state_backend_type, auth, command, and dependencies — including scalar sections (previously only map sections were compared).
Add a configurable describe.affected.sections setting in atmos.yaml that fully replaces the evaluated set (e.g. to track a custom section or narrow the list); metadata/settings are always evaluated.
Refactor the three component processors to a single table-driven comparison, add a documented "Evaluated sections" list, tests, a changelog blog post, and a roadmap milestone.

why

The comparison ran against a hand-maintained allow-list that had drifted out of sync with what Atmos actually merges into a component, so changes to providers, hooks, provider versions, backend, etc. were silently missed — a false negative that could let CI pipelines skip components that genuinely changed.
The table is now tied (via comments) to the sections written in stack_processor_merge.go, and the new config setting gives users an escape hatch so the bug class can't quietly return.
locals, overrides, inheritance, and retry are intentionally excluded (they either fold into other sections or are execution-time only).

references

Docs: Evaluated sections and describe.affected.sections

Summary by CodeRabbit

New Features
- Describe now evaluates and reports changes across a comprehensive set of top-level component sections (including scalar sections) with per-section reasons; first changed section becomes the headline reason.
- Added configurable describe.affected.sections to fully replace the default evaluated set (metadata/settings remain always evaluated).
Documentation
- Blog and CLI/config docs updated with evaluated-sections details, output reason entries, and configuration examples.
Tests
- Added tests for section evaluation, equality behavior, remote-locator logic, and override/no-false-positive cases.
Chores
- Updated snapshots, roadmap, CI workflow pins, link-checker exclusions, and changelog guidance.

feat(hooks): terraform init lifecycle hooks + --skip-hooks before-* fix @osterman (#2574)

what

Fix --skip-hooks for before-* hooks. Previously it only skipped after-* hooks; before-terraform-plan/apply/deploy hooks ran regardless. Now --skip-hooks (skip all) and --skip-hooks=name1,name2 (skip by name) are honored symmetrically for before and after events.
Add before-terraform-init and after-terraform-init lifecycle hooks for the atmos terraform init command. after-terraform-init is new; before-terraform-init was documented but never dispatched to user hooks — now it fires. They run through the same runHooks/RunAll path, so the skip fix applies to them too.
Add tests (real parsed Cobra flag, not viper.Set), strengthen hook-inheritance coverage with a fixture proving top-level terraform.hooks: is inherited by every component (and components.terraform.hooks: is not), update the Hooks docs, blog post, and roadmap.

why

--skip-hooks is a global flag bound to Viper inside RunE, but before-* hooks run earlier in PreRunE — so viper.GetString("skip-hooks") never saw the CLI value and before-hooks fired anyway. The flag is now resolved directly from the parsed command (Viper/ATMOS_SKIP_HOOKS fallback), mirroring how --ci and --verbose are read in PreRunE.
Init had no user-hook surface at all: init.go wired no hooks and the BeforeTerraformInit event was never dispatched. Wiring PreRunE/PostRunE on the init command (like plan/apply/deploy) closes the lifecycle gap so teams can validate tooling, vendor sources, or notify systems around terraform init declaratively.
The previous skip tests injected via viper.Set with a nil command, sidestepping the exact flag-binding lifecycle that was broken — which is how the bug shipped; the new tests fail against the old implementation.
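A minimal Go sketch of the precedence and matching rules described above follows. A value set explicitly on the parsed command wins over the environment/config fallback, and the resolved value either skips every hook or only the named ones, for before-* and after-* events alike. The function names are hypothetical. Representing the bare `--skip-hooks` flag as the value `"true"` is an assumption made for illustration. The real code reads the flag from Cobra rather than taking a boolean parameter.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveSkipHooks picks the effective skip-hooks value. flagChanged reports
// whether the flag was set on the parsed command line; only then does
// flagValue win over the env/config fallback.
func resolveSkipHooks(flagValue string, flagChanged bool, fallback string) string {
	if flagChanged {
		return flagValue
	}
	return fallback
}

// shouldSkipHook reports whether the hook called name should be skipped.
// Assumption: the bare flag resolves to "true" (skip all); otherwise the value
// is a comma-separated list of hook names.
func shouldSkipHook(resolved, name string) bool {
	resolved = strings.TrimSpace(resolved)
	switch resolved {
	case "", "false":
		return false
	case "true":
		return true
	}
	for _, n := range strings.Split(resolved, ",") {
		if strings.TrimSpace(n) == name {
			return true
		}
	}
	return false
}

func main() {
	// The same check runs for the before (PreRunE) and after (PostRunE) events.
	v := resolveSkipHooks("lint", true, "")
	for _, event := range []string{"before-terraform-init", "after-terraform-init"} {
		fmt.Println(event, "skip lint:", shouldSkipHook(v, "lint"), "skip notify:", shouldSkipHook(v, "notify"))
	}
}
```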

references

Hooks documentation: /stacks/hooks
Note: before-/after-terraform-init fire on the explicit atmos terraform init, not the implicit init that plan/apply run.

Summary by CodeRabbit

New Features
- Added Terraform init lifecycle event: after-terraform-init (alongside before-terraform-init); Terraform-scoped default hooks can be inherited by Terraform components.
Bug Fixes
- Fixed --skip-hooks precedence so CLI flag reliably overrides env/config and consistently skips before/after hook phases.
- Clarified hook scope handling so misplaced hook keys aren’t incorrectly applied.
Documentation
- Blog, docs, and roadmap updated to describe init hook events and skip-hooks behavior.
Tests
- Expanded coverage for hook inheritance, scope, init wiring, event filtering, and skip-hooks CLI behavior.
Chores
- CI Codecov step made non-fatal for transient upload errors.

feat(auth): share single OIDC session across aws/iam-identity-center providers @Benbentwo (#2553)

what

Refactors the aws/iam-identity-center (AWS SSO) provider so that multiple providers pointing at the same SSO portal (identical start_url + region) share a single OIDC token — one browser flow now unlocks every provider instead of one flow per provider.
Adds silent refresh-token renewal via ssooidc:CreateToken with grant_type=refresh_token, so a single browser interaction holds for the full portal session (~8h) rather than re-prompting every hour.
Introduces an in-process sessionTokenStore (keyed by sha1(start_url|region)) with per-session mutexes that single-flight concurrent device-auth flows; re-keys the on-disk cache from aws-sso/<provider>/token.json to aws-sso/sessions/<sha1>.json in the AWS SDK ssocreds-compatible format.
Adds the design PRD (docs/prd/aws-sso-session-support.md), a changelog blog post, and a shipped roadmap milestone under the Unified Authentication initiative.
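The session keying and single-flight behavior can be sketched in Go as follows. Hashing `start_url + "|" + region` with SHA-1 is one literal reading of `sha1(start_url|region)` and is an assumption. The names `sessionStore` and `Token` are hypothetical. The sketch shows how a per-session mutex lets concurrent callers for the same portal share one device-auth flow while different portals stay isolated.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"sync"
)

// sessionKey derives a cache key from the SSO portal identity.
// Assumption: the two fields are joined with "|" before hashing.
func sessionKey(startURL, region string) string {
	sum := sha1.Sum([]byte(startURL + "|" + region))
	return hex.EncodeToString(sum[:])
}

// sessionStore is a hypothetical in-process token store with one lock per session.
type sessionStore struct {
	mu     sync.Mutex             // guards locks and tokens
	locks  map[string]*sync.Mutex // per-session locks
	tokens map[string]string      // cached tokens by session key
}

func newSessionStore() *sessionStore {
	return &sessionStore{locks: map[string]*sync.Mutex{}, tokens: map[string]string{}}
}

func (s *sessionStore) lockFor(key string) *sync.Mutex {
	s.mu.Lock()
	defer s.mu.Unlock()
	l, ok := s.locks[key]
	if !ok {
		l = &sync.Mutex{}
		s.locks[key] = l
	}
	return l
}

// Token returns the cached token for the portal, running login at most once
// per session even when several providers ask concurrently.
func (s *sessionStore) Token(startURL, region string, login func() (string, error)) (string, error) {
	key := sessionKey(startURL, region)
	l := s.lockFor(key)
	l.Lock()
	defer l.Unlock()

	s.mu.Lock()
	tok, ok := s.tokens[key]
	s.mu.Unlock()
	if ok {
		return tok, nil
	}
	tok, err := login() // stands in for the browser/device-authorization flow
	if err != nil {
		return "", err
	}
	s.mu.Lock()
	s.tokens[key] = tok
	s.mu.Unlock()
	return tok, nil
}

func main() {
	store := newSessionStore()
	var mu sync.Mutex
	logins := 0
	login := func() (string, error) {
		mu.Lock()
		defer mu.Unlock()
		logins++
		return "token", nil
	}
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ { // e.g. dev, staging and prod providers on one portal
		wg.Add(1)
		go func() {
			defer wg.Done()
			store.Token("https://example.awsapps.com/start", "us-east-1", login)
		}()
	}
	wg.Wait()
	fmt.Println("browser flows:", logins) // browser flows: 1
}
```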

why

A common setup has one provider per environment (dev/staging/prod) all backed by the same corporate SSO portal; previously atmos auth login launched the browser flow once per provider, contradicting AWS's own "credentials have been shared successfully" single-sign-in experience.
The legacy flow re-ran the full browser interaction on every ~1h access-token expiry and keyed its cache by provider name, so renaming a provider silently invalidated a still-valid token — both are eliminated here with zero atmos.yaml config changes.

references

PRD: docs/prd/aws-sso-session-support.md
AWS CLI token provider docs: https://docs.aws.amazon.com/cli/latest/userguide/sso-configure-profile-token.html
AWS SDK for Go v2 ssocreds: https://pkg.go.dev/github.com/aws/aws-sdk-go-v2/credentials/ssocreds

Summary by CodeRabbit

New Features
- Shared AWS SSO sessions across providers for the same portal (start URL + region), reducing duplicate logins and browser prompts.
- Silent refresh via refresh tokens to renew credentials without a browser; per-session locking prevents concurrent device-auth flows.
- Session-keyed on-disk cache (compatible with AWS SDK patterns); logout clears shared session data; added session telemetry.
Documentation
- Product spec and blog post describing session sharing, cache format, refresh behavior, and rollout plan.
Tests
- Added/updated tests validating session sharing, cache semantics, isolation, refresh logic, and concurrency.

feat: implement !append YAML function for list concatenation @osterman (#1513)

what

Implements the !append YAML function that allows fine-grained control over list merging behavior in Atmos stack configurations
Lists tagged with !append will be concatenated with base values instead of replaced
Adds comprehensive unit tests and integration test fixtures

why

Resolves the ongoing challenge of needing to concatenate lists on a case-by-case basis
Currently, users have to fall back to using maps instead of lists when they need append behavior
This is particularly important for fields like depends_on where appending is often the desired behavior rather than replacement
The !append tag provides opt-in, per-field control that works alongside the global list_merge_strategy setting

Key Features

Opt-in behavior: Only lists explicitly tagged with !append use append mode
Works alongside global settings: The !append tag works independently of the global list_merge_strategy setting
Nested support: Works with deeply nested configurations
Backward compatible: No impact on existing configurations without the tag

Example Usage

```
# base.yaml
components:
  terraform:
    eks:
      settings:
        depends_on:
          - vpc
          - iam-role

# override.yaml
components:
  terraform:
    eks:
      settings:
        depends_on: !append  # This tag indicates append mode
          - rds
          - elasticache

# Result: depends_on = [vpc, iam-role, rds, elasticache]
```

Testing

✅ All unit tests pass
✅ Build succeeds without errors
✅ Linting passes with no issues
✅ Code follows Atmos conventions and patterns

references

Linear issue: DEV-2980
Documentation: !append YAML function
Changelog: blog post append-yaml-function; roadmap milestone updated (Extensibility initiative)

Summary by CodeRabbit

New Features
- Added a !append YAML function to append items to lists during configuration merging (per-field, preserves order, supports nested lists/maps, interacts with global list-merge strategies).
Tests
- Added comprehensive unit and integration tests covering append-tag helpers, parsing, merging, and end-to-end scenarios.
Documentation
- Added docs, examples, blog post, and index updates explaining !append usage and behavior.
Chores
- Updated website roadmap/metadata and package config; added a sentinel error alias.

feat: add !unset YAML function to delete keys from configuration @osterman (#1521)

what

Add new !unset YAML function that completely removes keys from configuration during inheritance and merging
Implement processing in both stack merging (yaml_func_utils.go) and config loading (process_yaml.go)
Add comprehensive unit tests for all functionality
Create documentation with examples and use cases
Update YAML functions index documentation

why

Users need a way to explicitly remove inherited configuration values, not just override them with null
Current workarounds require physically removing or commenting out keys in parent configurations
This addresses GitHub issue #227: "A YAML way of undefining a value without removing the key"
Provides fine-grained control over configuration inheritance in complex stack hierarchies

Key Features

Complete removal: Unlike setting to null, !unset completely removes the key from configuration
Inheritance control: Child configurations can remove values inherited from parents
Works everywhere: Functions in all Atmos configuration sections (vars, settings, env, metadata, etc.)
Type-safe: Operates after YAML parsing, ensuring no syntax breakage
Respects skip list: Can be disabled via skip list if needed

Examples

Basic Usage

```
# parent.yaml
components:
  terraform:
    vpc:
      vars:
        enable_nat_gateway: true
        enable_vpn_gateway: true

# child.yaml
import:
  - parent

components:
  terraform:
    vpc:
      vars:
        enable_vpn_gateway: !unset  # Completely removes this key
```

Removing Nested Values

```
config:
  database:
    host: "prod.db.example.com"
    backup_enabled: true

# Override:
config:
  database:
    backup_enabled: !unset  # Remove backup config
    host: "dev.db.example.com"
```
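The effect of `!unset` on a merge can be sketched in Go with a sentinel value. When the override holds the sentinel, the key is deleted from the result instead of being set to nil. The `unset` type and `mergeWithUnset` function are hypothetical illustrations, not the code in `yaml_func_utils.go` or `process_yaml.go`.

```go
package main

import "fmt"

// unset is a hypothetical sentinel produced for a value tagged !unset.
type unset struct{}

// mergeWithUnset deep-merges override onto base without mutating either.
// A key whose override value is the sentinel is removed entirely, unlike an
// explicit nil, which would leave the key present with a nil value.
func mergeWithUnset(base, override map[string]any) map[string]any {
	out := make(map[string]any, len(base))
	for k, v := range base {
		out[k] = v
	}
	for k, v := range override {
		switch ov := v.(type) {
		case unset:
			delete(out, k)
		case map[string]any:
			bm, _ := out[k].(map[string]any)
			out[k] = mergeWithUnset(bm, ov)
		default:
			out[k] = v
		}
	}
	return out
}

func main() {
	parent := map[string]any{"vars": map[string]any{"enable_nat_gateway": true, "enable_vpn_gateway": true}}
	child := map[string]any{"vars": map[string]any{"enable_vpn_gateway": unset{}}}
	vars := mergeWithUnset(parent, child)["vars"].(map[string]any)
	_, present := vars["enable_vpn_gateway"]
	fmt.Println(vars, present) // map[enable_nat_gateway:true] false
}
```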

Testing

All tests pass:

✅ Unit tests for config processing
✅ Unit tests for stack processing
✅ Integration tests with other YAML functions
✅ Skip list functionality tests
✅ Inheritance scenario tests

references

Closes #227
Related to #267 (YAML Explicit Typing support)

Summary by CodeRabbit

New Features
- Added a YAML !unset function to remove keys or list items during config processing and inheritance. Works at any depth, supports multiple unsets, and coexists with other YAML functions.
Tests
- Introduced comprehensive tests covering flat and nested structures, arrays, multiple/nested unsets, inheritance scenarios, and edge cases.
Documentation
- Added dedicated docs and examples for !unset, including usage in stack manifests, nested removals, list handling, and guidance on expected behavior.

feat(imports): cache remote stack-import clones (dedup + opt-in TTL) @osterman (#2571)

what

Clone each remote (Git) stack-import source repository at most once per Atmos invocation instead of once per import — all subdir imports of the same repo now resolve from a single shared clone (within-run dedup, spanning both describe affected passes).
Add an opt-in ttl to reuse the cloned source across runs until it expires: per-import (ttl: in the import map form) and a global imports.ttl default in atmos.yaml. With no ttl, the source refreshes once per run so mutable refs like ?ref=main stay fresh.
Wire the default git-subdir resolve path through the existing ensureSourceDir, add per-session fetch tracking + TTL freshness (timestamp persisted in the .atmos-source-ready marker), and extract a shared duration.IsExpired/IsZeroTTL that the source provisioner now reuses.
Update JSON schemas, add unit tests, document "Caching Remote Imports" in stacks/imports.mdx, add a changelog blog post, and add a roadmap milestone.

why

For hub-and-spoke repos pulling a shared catalog via remote imports, atmos describe affected was re-cloning the hub repo once per import (~68–87×/run, ~7–11 min total), and a warm actions/cache of ~/.cache/atmos/stack-imports/ was ignored because the subdir path re-cloned unconditionally.
Within-run dedup collapses those clones to one per repo (the ~80% win, no staleness risk); the opt-in ttl lets CI reuse the clone across runs (warm cache skips the clone entirely) while keeping mutable refs fresh by default. Shallow clones (depth=1) were already in use — the win is not re-cloning the same repo repeatedly.

references

Cached sources live under the XDG cache dir (~/.cache/atmos/stack-imports/, honoring XDG_CACHE_HOME).
Builds on the source-provisioning TTL mechanism (pkg/duration, pkg/provisioner/source).
Changelog: website/blog/2026-06-05-faster-remote-stack-imports.mdx

Summary by CodeRabbit

New Features
- Per-import ttl and global imports.ttl for optional cross-run caching of remote stack imports.
- Each unique remote Git source is cloned at most once per invocation and shared across nested imports.
- Improved cache freshness semantics, including explicit zero-ttl behavior.
Documentation
- Added caching guide, TTL examples, XDG cache guidance, and a blog post.
Tests
- Added tests for TTL parsing/expiration and remote import caching behavior.

[codex] consolidate terraform bulk execution on scheduler @shirkevich (#2466)

Summary

route Terraform --all, --components, and --query through the scheduler-backed Terraform adapter
build Terraform dependency graphs from dependencies.components first, with settings.depends_on fallback
preserve query-path auth manager setup, store resolver bridging, YAML function processing, and per-component CI hook capture
includes #2348 identity/auth fixes in this stack so local --identity terraform testing works
include the credential-store concurrency-safety prerequisite discovered by concurrency validation
keep effective scheduler concurrency fixed at 1 for this PR

Stacking

This PR is stacked on PR 2 and targets codex/dag-scheduler-core.

PR 4 is #2468 and is stacked on this branch to introduce plan-only --max-concurrency wiring.

Supersedes the earlier fork-headed draft #2462 now that the stack branches exist in cloudposse/atmos.

Draft note

This branch is back to the intended PR 3 review shape: Terraform --all, --components, and --query share the graph-backed scheduler path, but execution remains sequential.

The temporary ATMOS_EXPERIMENTAL_DAG_MAX_CONCURRENCY validation hook has been removed. User-visible plan concurrency now belongs to PR 4.

This branch retains the narrow credential-store concurrency-safety prerequisite discovered during validation:

credential-store initialization no longer mutates global Viper env bindings per component and preserves ATMOS_KEYRING_TYPE precedence
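The shape of that fix can be illustrated with a small Go sketch. Rather than calling `viper.BindEnv` (which writes to Viper's shared global state) for every component, the keyring type is resolved by a pure function that only reads the environment, so concurrent initializations have nothing to race on. The function name, the injected lookup, the example values `"file"` and `"system"`, and the env-over-config precedence are assumptions for illustration.

```go
package main

import (
	"fmt"
	"os"
	"sync"
)

// resolveKeyringType returns the keyring type without touching global state.
// Assumed precedence: ATMOS_KEYRING_TYPE, then the configured value, then def.
func resolveKeyringType(lookupEnv func(string) (string, bool), configured, def string) string {
	if v, ok := lookupEnv("ATMOS_KEYRING_TYPE"); ok && v != "" {
		return v
	}
	if configured != "" {
		return configured
	}
	return def
}

func main() {
	// Many components initializing at once only read the environment, so
	// there are no concurrent map writes to shared configuration.
	var wg sync.WaitGroup
	results := make([]string, 8)
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = resolveKeyringType(os.LookupEnv, "file", "system")
		}(i)
	}
	wg.Wait()
	fmt.Println(results[0])
}
```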

Validation

go test ./pkg/scheduler ./pkg/scheduler/adapters ./internal/exec -run 'TestExecuteTerraformQuery|TestExecuteTerraformQueryNoMatches|TestBuildTerraformDependencyGraph|TestExecuteTerraformAllUsesGraphBackedSequentialOrder|TestExecuteTerraformComponentsUsesGraphBackedSequentialOrder|TestExecuteTerraformQueryUsesGraphBackedSequentialOrder|TestExecuteTerraformKeepsIndependentComponentsSequential|TestBuildTerraformGraph'
go test ./pkg/auth/credentials
go test -race ./pkg/auth/credentials -run TestNewCredentialStoreWithConfig_ConcurrentInitialization
go test ./pkg/auth ./internal/exec -run 'TestCreateAndAuthenticateManagerWithAtmosConfig|TestSetupTerraformAuth|TestProcessComponentConfig_PropagatesAuthManager|TestProcessComponentConfig_AuthManagerGuardBranches'
built build/atmos and live-tested against a downstream stack with terraform plan --all and an explicit identity

Validation findings carried forward

The first concurrency-4 validation run exposed an auth race: per-component credential-store initialization called global viper.BindEnv, causing fatal error: concurrent map writes. This PR fixes that narrowly in pkg/auth/credentials.
Higher-concurrency validation also showed local Terraform working-directory contention when multiple logical aliases share one physical Terraform component directory. PR 4 keeps path-based locking while introducing plan concurrency.

Follow-up discussion

The longer-term way to unlock true parallelism for aliases sharing one physical Terraform folder would be per-node isolated workdirs plus isolated TF_DATA_DIR and generated files. That needs repo-owner discussion because it changes the operator debugging model: Atmos would need to decide whether and how to retain those per-node copies for inspection, how atmos terraform shell maps to them, and how cleanup/debug artifacts are managed.

Summary by CodeRabbit

New Features
- Graph-backed Terraform scheduler with deterministic dependency order, reversed destroy order, per-resource serialization, concurrency control, per-component output capture/hooks, and signal-aware cancellation.
- New Terraform run options: --failure-mode, --max-concurrency, log-order, hide (including no-changes), and execution-summary file.
- Line-prefixing writer for prefixed log output.
Bug Fixes
- Credential keyring type now respects ATMOS_KEYRING_TYPE and is safe for concurrent init.
- Workdir sync/hash skips Terraform/OpenTofu runtime dirs.
- More tolerant Git repo opening for worktrees.
Tests
- Large expansion of tests covering scheduler behavior, CLI options, concurrency, logging, auth, and new utilities.

feat: install Atmos from a branch or tag with --use-version=ref: @osterman (#2569)

what

Add a ref:<name> version spec to --use-version (and version.use in atmos.yaml / ATMOS_USE_VERSION) that installs Atmos from the latest commit of a branch or tag, e.g. atmos --use-version=ref:main version.
Accepts branch names, tag names, and slash-qualified refs for disambiguation: ref:main, ref:release/v1.199, ref:v1.199.0, ref:heads/main, ref:tags/v1.199.0.
Resolves the ref to its full commit SHA via the GitHub API, then reuses the existing sha: install/cache path unchanged; ref versions always re-execute and fail hard on resolution errors.
Add docs (version/use.mdx), a minor blog post, and a roadmap milestone.

why

Previously --use-version only accepted PR numbers (pr:1234), commit SHAs (sha:ceb7526), and releases — a branch name like main was rejected, even though branch/tag pushes already publish the same build-artifacts-* from the Tests workflow.
ref: lets you pin a moving target once (ref:main) instead of chasing a new sha: after every merge, making it trivial to test unreleased fixes on a branch.
The ref is re-resolved on every run so a mutable branch always tracks the latest build, while the SHA-keyed cache avoids reinstalling when the ref hasn't moved. Resolving to the full SHA also sidesteps GitHub's head_sha filter, which only matches full (not short) SHAs.

references

Docs: Version Pinning
Changelog: website/blog/2026-06-04-use-version-ref.mdx

Summary by CodeRabbit

New Features
- Support for git branches/tags via --use-version=ref: (resolves refs to commit SHAs and uses existing artifact download/cache).
Behavior Changes
- CI artifact selection now prefers the newest workflow run that contains the platform artifact (may pick in-progress or failed runs if they include the artifact).
- Re-exec/version switching treats ref: like immutable versions (resolve → install/cache).
Bug Fixes
- Clearer, user-friendly error when a ref does not exist (with actionable hints).
Documentation
- Added CLI docs, blog post, and roadmap entry describing ref: usage and caching.

feat: Add custom component types for custom commands @osterman (#1904)

Summary

Implement shell completion for semantic-typed flags and arguments (component/stack types)
Add interactive prompting for missing required semantic-typed values
Support custom component types in shell completions

What Changed

New custom component type provider system (pkg/component/custom)
Shell completion for semantic-typed arguments and flags in custom commands
Interactive prompting for missing required semantic-typed values
Extended command schema to support semantic types and components
Comprehensive test coverage for completion and prompting functionality

Why This Matters

This feature enables custom commands to provide a better developer experience through:

Tab completion for component and stack arguments/flags
Interactive prompts for required semantic-typed values
Support for custom component types beyond built-in types

References

closes #1787
closes #444
closes #438

Summary by CodeRabbit

New Features
- Custom component types with registry support, CLI integration, and template access to resolved component data.
- Enhanced CLI semantic completion and interactive prompting for selecting component and stack values.
- Aggregated component listing across stacks for discovery and completion.
Documentation
- New guides, examples, and blog post demonstrating custom component types and workflows.
- Schema updates to validate custom component manifests.
Tests
- Broad test coverage for completion, providers, processing, and stack handling.

docs(gists): add Atmos + Packer + GitHub Actions AMI pipeline gist @aknysh (#2560)

what

Add a new gist at gists/aws-ami-packer-github-actions/ demonstrating an end-to-end AWS AMI pipeline with Atmos + Packer + GitHub Actions:
- Build a hardened Amazon Linux 2023 AMI with Packer, orchestrated by Atmos.
- Validate it on a live test instance, optionally scan it, and gate promotion behind a manual approval.
- Tag the approved image ScanStatus=approved and share it across AWS accounts.
Drive the whole build from stack configuration (no hardcoded HCL) and operate the result through a tree of atmos ami custom commands (get-ami-id, tag, share, launch/terminate test instances, …).
Include reference IAM/OIDC policies and an org SCP that enforces "launch only approved AMIs".
Wire the gist into the docs-site file browser (tags + related-docs links) and announce it with a blog post.

why

"How do I use Atmos + Packer to build AMIs, and automate the build → approve → share process?" is a frequent community question. This gist is a vendor-neutral, copy-and-adapt reference recipe that combines several Atmos features into one production-shaped workflow.
Like all gists, it's shared as-is (not part of the CI-tested examples), so users adapt it to their environment and Atmos version.

references

Gist: gists/aws-ami-packer-github-actions/
Blog post: website/blog/2026-06-01-gist-aws-ami-packer-github-actions.mdx

Summary by CodeRabbit

New Features
- Added a complete gist showing an end-to-end AMI build/validate/approve/share pipeline using Atmos + Packer + GitHub Actions, with reusable setup and tool-install steps, approval gate, optional vulnerability scan, and cross-account sharing.
Documentation
- Added detailed README, customization checklist, policy templates, and a blog post documenting setup, governance (OIDC, IAM, SCP), local execution, and cleanup guidance.

feat: add !git.* repository YAML functions and atmos.Resolve template func @osterman (#2558)

what

Add five new !git.* YAML functions that expose Git repository metadata from the origin remote: !git.repository (the <owner>/<repo> slug, e.g. cloudposse/atmos), !git.owner, !git.name, !git.host, and !git.url.
Add the atmos.Resolve template function, which evaluates any Atmos YAML-function string (!git.*, !exec, !store, !terraform.output, …) at template-render time so its result can be composed with other strings and template variables in a single value.
The new YAML functions are parsed generically (GitHub/GitLab/Bitbucket/Azure DevOps), support a fallback value, and work in both stack/component processing and atmos.yaml config preprocessing.
Includes unit tests, per-function docs, two changelog posts, a roadmap update, and a follow-up PRD.
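Generic parsing of the `origin` URL can be sketched as follows. The sketch strips the scheme or SCP-style `user@host:` prefix and any `.git` suffix, takes the host, and treats the last path segment as the name and everything before it as the owner (which also covers GitLab subgroups). This is a hypothetical simplification covering host, owner, name, and repository only. Azure DevOps URLs (`dev.azure.com/<org>/<project>/_git/<repo>`) need special handling that the sketch omits.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// repoInfo holds values analogous to !git.host, !git.owner and !git.name.
type repoInfo struct {
	Host, Owner, Name string
}

// Repository returns the <owner>/<repo> slug, analogous to !git.repository.
func (r repoInfo) Repository() string { return r.Owner + "/" + r.Name }

// parseRemote parses HTTPS, ssh:// and SCP-style (git@host:owner/repo.git) remotes.
func parseRemote(raw string) (repoInfo, error) {
	var host, path string
	if strings.Contains(raw, "://") {
		u, err := url.Parse(raw)
		if err != nil {
			return repoInfo{}, err
		}
		host, path = u.Hostname(), u.Path
	} else if at := strings.Index(raw, "@"); at >= 0 && strings.Contains(raw[at:], ":") {
		rest := raw[at+1:]
		colon := strings.Index(rest, ":")
		host, path = rest[:colon], rest[colon+1:]
	} else {
		return repoInfo{}, fmt.Errorf("unrecognized remote URL %q", raw)
	}
	path = strings.TrimSuffix(strings.Trim(path, "/"), ".git")
	slash := strings.LastIndex(path, "/")
	if host == "" || slash <= 0 || slash == len(path)-1 {
		return repoInfo{}, fmt.Errorf("cannot extract owner/name from %q", raw)
	}
	return repoInfo{Host: host, Owner: path[:slash], Name: path[slash+1:]}, nil
}

func main() {
	for _, remote := range []string{
		"https://github.com/cloudposse/atmos.git",
		"git@github.com:cloudposse/atmos.git",
		"https://gitlab.com/group/subgroup/project.git",
	} {
		info, err := parseRemote(remote)
		fmt.Println(info.Host, info.Repository(), err)
	}
}
```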

why

Users needed the repository slug (and its parts) for tagging resources and building backend paths, previously only achievable by shelling out via !exec echo ${GITHUB_REPOSITORY:-$(git remote get-url origin | sed …)}.
A bare YAML tag owns the entire scalar and Atmos renders Go templates before YAML functions, so composing a function result with extra text (e.g. prefixing workspace_key_prefix with the repo slug) was impossible without !exec; atmos.Resolve makes that composition native:
```
workspace_key_prefix: '{{ atmos.Resolve .settings.context.repo }}/{{ or .metadata.name .metadata.component }}'
```

references

Extends the existing Git YAML function family from the Git YAML Functions changelog.
Docs: /functions/yaml/git.repository, /functions/template/atmos.Resolve.
Follow-up: docs/prd/lazy-yaml-function-template-values.md (lazy-Stringer auto-deref so {{ .settings.context.repo }} evaluates without atmos.Resolve).

Summary by CodeRabbit

New Features
- Added Git repository metadata YAML functions (!git.repository, !git.owner, !git.name, !git.host, !git.url).
- Added atmos.Resolve template function to evaluate YAML functions during template rendering for inline composition.
Documentation
- Added PRD, docs pages, blog posts, and roadmap entries describing the new YAML functions and atmos.Resolve.
Tests
- Added tests covering Git YAML tag resolution and the new template Resolve behavior.
Chores
- Updated link-checker configuration to exclude slow/intermittent targets.

feat(stacks): template variables in import paths from earlier imports @osterman (#2554)

what

Render Go templates in stack import: paths (local paths and a remote import's Git ?ref=) against the settings/vars/env accumulated from imports listed earlier in the same manifest, plus the import's own context.
A single variable (e.g. settings.context.deployment_repo_version, set once in a _defaults) can now pin both a remote catalog import's ref and the component source.version.
Only the import path string is rendered; imported file content templating and its deferral are unchanged. Missing values are a hard error (with hints) unless ignore_missing_template_values is set; skip_templates_processing or a disabled templating engine leaves the path literal.
Adds the ErrImportPathTemplate sentinel, a fixture scenario + unit tests, docs ("Referencing Earlier Imports in Import Paths"), a changelog blog post, and a roadmap milestone.
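Rendering an import path against earlier context can be sketched with Go's `text/template`. The `missingkey=error` option turns an unresolved value into a hard error, and a skip flag leaves the path literal. The `renderImportPath` helper, the context shape, and the example repository URL are hypothetical. Only the behavior (render the path string, fail on missing values unless told otherwise) follows the description above.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// renderImportPath renders Go-template syntax in an import path using values
// accumulated from earlier imports. With skip set, the path is used literally.
func renderImportPath(path string, ctx map[string]any, skip bool) (string, error) {
	if skip || !strings.Contains(path, "{{") {
		return path, nil
	}
	tmpl, err := template.New("import").Option("missingkey=error").Parse(path)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, ctx); err != nil {
		return "", fmt.Errorf("rendering import path %q: %w", path, err)
	}
	return sb.String(), nil
}

func main() {
	// Values merged from an earlier _defaults import (hypothetical shape).
	ctx := map[string]any{
		"settings": map[string]any{
			"context": map[string]any{"deployment_repo_version": "v1.4.0"},
		},
	}
	p, err := renderImportPath(
		"github.com/acme/infra-catalog//stacks/catalog?ref={{ .settings.context.deployment_repo_version }}", ctx, false)
	fmt.Println(p, err) // github.com/acme/infra-catalog//stacks/catalog?ref=v1.4.0 <nil>
}
```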

why

Keep dev and prod in one repo while isolating prod from dev changes: dev uses local catalogs/sources, prod imports a versioned catalog and pins the component source to an immutable ref — both driven by one variable.
Previously the component source.version template worked (resolved late, at component processing) but the import ?ref= had to be hard-coded, because imports are resolved before that context exists. This closes that gap so both come from the same variable.

references

Docs: /stacks/imports#referencing-earlier-imports-in-import-paths
Builds on remote stack imports (#2528) and the git context YAML functions (#2537)

Summary by CodeRabbit

New Features
- Import paths now support Go-template rendering, letting paths reference settings, vars, and env from earlier imports in the same manifest.
Bug Fixes
- Templating failures in import paths now surface a clear error; options added to ignore or skip unresolved import templates.
Documentation
- Added docs and a blog post with examples and operational guidance for templated import paths.

Add ECR Public authentication: `aws/ecr-public` integration and `atmos aws ecr login --public` @osterman (#2231)

what

Add ECR Public authentication to Atmos for authenticated access to public.ecr.aws, solving Docker rate limiting on public ECR images. Two entry points:

atmos aws ecr login --public — direct, zero-config login using ambient AWS credentials (the AWS SDK default chain: env, shared config/profile, SSO, IMDS/IRSA/ECS), or --public --identity <name> to use a specific identity. Ideal for CI.
aws/ecr-public integration kind — for automatic login on atmos auth login and identity linking.

Key changes:

Command (cmd/aws/ecr/login.go): new --public flag on atmos aws ecr login; ambient-credential and identity-based ECR Public login paths; mutually exclusive with a positional integration argument and --registry.
Cloud layer (pkg/auth/cloud/aws/ecr_public.go): GetPublicAuthorizationToken() calls ecrpublic:GetAuthorizationToken, always in us-east-1.
Integration layer (pkg/auth/integrations/aws/ecr_public.go): ECRPublicIntegration factory registering the aws/ecr-public kind, with region validation at config time. Implements the full Integration interface including Cleanup() (docker logout) and Environment() (DOCKER_CONFIG).
Region validation: rejects unsupported regions (only us-east-1 and us-west-2 have service endpoints; auth is us-east-1 only).
Tests: cloud-layer and integration-layer unit tests (token retrieval, region validation, cleanup, error handling) with a generated mock ECR Public client; command tests for the --public flag and mode validation.
Documentation: atmos aws ecr login command reference (added --public flag), ECR authentication tutorial, and a PRD (docs/prd/ecr-public-authentication.md).
Blog post + roadmap: announcement and a shipped milestone linking to the changelog.
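The region rule is small enough to sketch directly. Only `us-east-1` and `us-west-2` are accepted at config time, while the authorization call always targets `us-east-1`. The names below are hypothetical. Defaulting an empty region to the auth region is an assumption of this sketch, and the SDK call itself is deliberately left out.

```go
package main

import "fmt"

// ecrPublicAuthRegion is where the authorization token is always requested.
const ecrPublicAuthRegion = "us-east-1"

// ecrPublicRegions are the regions with ECR Public service endpoints.
var ecrPublicRegions = map[string]bool{"us-east-1": true, "us-west-2": true}

// validateECRPublicRegion rejects regions without an ECR Public endpoint.
// Assumption: an empty region defaults to the auth region.
func validateECRPublicRegion(region string) (string, error) {
	if region == "" {
		return ecrPublicAuthRegion, nil
	}
	if !ecrPublicRegions[region] {
		return "", fmt.Errorf("aws/ecr-public: unsupported region %q (supported: us-east-1, us-west-2)", region)
	}
	return region, nil
}

func main() {
	for _, r := range []string{"", "us-west-2", "eu-west-1"} {
		got, err := validateECRPublicRegion(r)
		fmt.Printf("%q -> %q, err=%v; token requested in %s\n", r, got, err, ecrPublicAuthRegion)
	}
}
```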

Note: this branch has been merged up to main. Following #2144 (atmos auth ecr-login → atmos aws ecr login), ECR login lives under the aws namespace, and the integration was adapted to main's evolved Integration interface (exported BuildAWSConfigFromCreds, new Cleanup/Environment methods).

why

Docker pulls from public.ecr.aws hit rate limits when unauthenticated. This blocks CI workflows, especially those using cloudposse/github-action-docker-build-push which pulls BuildKit/binfmt images on every run. Authenticated pulls have significantly higher (or no) rate limits.

Because public.ecr.aws is global, any valid AWS credentials unlock authenticated pulls — so --public with ambient credentials "just works" in CI with zero configuration. ECR Public otherwise differs from private ECR: it uses the ecrpublic SDK service, a bearer token instead of SigV4, a hardcoded us-east-1 auth region, and a fixed public.ecr.aws registry URL. It requires ecr-public:GetAuthorizationToken and sts:GetServiceBearerToken IAM permissions.

references

ECR Public Authentication Tutorial — configuration examples, multi-environment setup.
atmos aws ecr login Command Reference — command usage, --public flag, integration configuration.
ECR Public Blog Post — announcement and use cases.
PRD: docs/prd/ecr-public-authentication.md.
AWS Docs: ECR Public APIs.

Summary by CodeRabbit

New Features
- ECR Public authentication (aws/ecr-public) with atmos aws ecr login --public, identity-driven auto-provisioning, and enforced us-east-1 auth.
Documentation
- Tutorials, blog post, and roadmap updated with ECR Public examples, permissions, CI guidance, and troubleshooting.
Bug Fixes
- Improved identity selection UX (confirmation message) and safer CLI behavior for non‑TTY identity selection.
Tests
- Extensive unit and integration tests covering ECR Public flows and CLI routing.
Chores
- NOTICE/dependencies updated and minor .gitignore tweak.

feat(auth): Atmos Pro STS — JIT GitHub token broker for CI @osterman (#2546)

what

Add a new auth provider kind: atmos/pro that authenticates the Atmos CLI to Atmos Pro by federating the GitHub Actions runner's OIDC token into an Atmos Pro session JWT (v1 is OIDC-only).
Add a new auth integration kind: github/sts — a just-in-time GitHub token broker for CI. On login it mints short-lived, scoped GitHub App installation tokens via POST /api/v1/sts, materializes them as per-owner GIT_CONFIG_* URL rewrites (env or file mode), and revokes them at command-end (via atmos auth exec in CI) and on atmos auth logout.
Add a passthrough kind: atmos/pro identity, a keyring-registered ProCredentials type, realm scoping for integration state, and via.provider binding for integrations (in addition to via.identity).
Add ATMOS_PRO_GITHUB_TOKEN, preferred by Atmos-native git operations (vendoring, source: provisioning, go-getter) ahead of ATMOS_GITHUB_TOKEN/GITHUB_TOKEN.
Add the PRD (docs/prd/atmos-pro-sts.md), a changelog blog post, a shipped roadmap milestone, and configuration docs; full unit-test coverage for the provider, identity, integration, keyring round-trip, via.provider matching, revoke gating, and token precedence.
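One way to materialize per-owner URL rewrites in env mode is through git's GIT_CONFIG_COUNT / GIT_CONFIG_KEY_<n> / GIT_CONFIG_VALUE_<n> environment variables, sketched below. gitConfigRewriteEnv is a hypothetical helper, and the x-access-token username (the convention GitHub documents for App installation tokens over HTTPS) is an assumption about the exact URL form Atmos writes.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// gitConfigRewriteEnv builds GIT_CONFIG_* entries so that any URL starting with
// https://<host>/<owner>/ is rewritten to carry that owner's short-lived token.
// tokensByOwner (owner -> token) is a hypothetical input shape.
func gitConfigRewriteEnv(host string, tokensByOwner map[string]string) map[string]string {
	owners := make([]string, 0, len(tokensByOwner))
	for owner := range tokensByOwner {
		owners = append(owners, owner)
	}
	sort.Strings(owners) // deterministic index assignment

	env := map[string]string{"GIT_CONFIG_COUNT": strconv.Itoa(len(owners))}
	for i, owner := range owners {
		base := fmt.Sprintf("https://x-access-token:%s@%s/%s/", tokensByOwner[owner], host, owner)
		// git config: url.<base>.insteadOf = <prefix> rewrites <prefix>... to <base>...
		env["GIT_CONFIG_KEY_"+strconv.Itoa(i)] = "url." + base + ".insteadOf"
		env["GIT_CONFIG_VALUE_"+strconv.Itoa(i)] = fmt.Sprintf("https://%s/%s/", host, owner)
	}
	return env
}

func main() {
	env := gitConfigRewriteEnv("github.com", map[string]string{"acme": "ghs_example"})
	keys := make([]string, 0, len(env))
	for k := range env {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, env[k])
	}
}
```

This sketch overwrites GIT_CONFIG_COUNT; a real implementation would need to append after any GIT_CONFIG_* entries already present in the environment.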

why

Fetching private Terraform modules, Atmos source: components, and vendored artifacts in CI today requires a long-lived, over-privileged GitHub credential (PAT, machine user, or deploy key) sitting in a CI secret — a standing breach risk that can't be scoped per-run.
Atmos Pro STS replaces that with least-privilege, deny-by-default, short-lived tokens minted at the start of a run and revoked at the end — with zero .tf changes (the injected GIT_CONFIG_* rewrites are honored by both go-getter and Terraform's native git), and multi-org support because tokens are minted per (installation, permission-set).
Built into Atmos CLI core (CI-native, OIDC-aware) rather than as a GitHub Action, modeled on the existing aws/ecr and aws/eks integrations; the workflow only needs permissions: id-token: write.

references

PRD: docs/prd/atmos-pro-sts.md (includes deferred Future Work: moving Pro connection config under auth, unifying pkg/pro onto auth-issued sessions, and broadening command-end revoke beyond atmos auth exec)
Changelog: website/blog/2026-05-29-atmos-pro-github-sts.mdx

Summary by CodeRabbit

New Features
- Atmos Pro GitHub token broker: new atmos/pro provider + github/sts integration for just-in-time GitHub tokens (env or git-config modes) with realm-scoped state and optional token export.
- ATMOS_PRO_GITHUB_TOKEN added as preferred GitHub token source.
- CI-gated automatic token revocation on command exit/logout.
- Ambient credential broker registry to auto-provision env vars for remote reads.
Documentation
- PRD, docs, and blog post for Atmos Pro STS and usage guidance.

docs: re-date custom commands step types blog post to 2026-05-30 @osterman (#2550)

what

Re-dated the "25+ Interactive Step Types" blog post from 2026-01-03 to 2026-05-30.
Renamed the file prefix (git mv, history preserved) and added a matching date: 2026-05-30 frontmatter field.

why

Aligns the post's publish date with its actual release timing so it surfaces correctly in the changelog feed.
Adds the explicit date: field to match the repo convention (e.g. 2026-05-28-git-yaml-functions.mdx).
The slug is unchanged, so the published URL stays the same.

references

N/A — docs-only date adjustment, no user-facing code change.

Summary by CodeRabbit

Documentation
- Published comprehensive guide to custom commands and workflow step types, featuring 25+ interactive step types with usage examples, including input collection, output formatting, and variable passing conventions for enhanced automation capabilities.

fix(website): use consistent brand-blue announcement bar @osterman (#2551)

what

Removed the per-announcement backgroundColor/textColor overrides from website/src/data/announcements.js so every announcement bar entry inherits the brand-blue (#3578e5) / white-text defaults from the --announcement-bar-* CSS variables.
Documented the convention in the file header so future announcements don't reintroduce per-entry colors.

why

The announcement bar cycled through a rainbow of saturated Tailwind-600 colors (emerald green, violet, cyan, amber, red, indigo, teal...) that looked like "crayola" against the site's dark, near-black theme.
A single restrained, on-brand color reads as sophisticated and consistent with the rest of the dark site, and matches the bar's original styling.

references

N/A (website cosmetic change)

Summary by CodeRabbit

Refactor
- Standardized announcement bar styling configuration to use shared CSS variables instead of per-announcement color settings, improving consistency across announcements.

feat: Implement workflow step types with registry pattern (DEV-263, DEV-2969) @osterman (#1899)

what

Add 20+ step types across 4 categories (Interactive, Output, UI, Command) with extensible registry pattern
Support Go template variable passing between steps (e.g., {{ .steps.step1.value }})
Implement per-step output modes: viewport (pager), raw (passthrough), log (grouped), none (silent)
Interactive handlers with TTY detection and clear error messages in CI environments
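Go's text/template can express the step-to-step variable passing described above. The sketch assumes step results are exposed as a steps map keyed by step name with a value field; StepResult and renderStepTemplate are hypothetical. Using missingkey=error is a design choice so that a typo in a step name fails loudly instead of silently rendering an empty value.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// StepResult is a hypothetical shape for a completed step's output.
type StepResult struct {
	Value string
}

// renderStepTemplate expands text such as "{{ .steps.step1.value }}" against earlier results.
func renderStepTemplate(text string, results map[string]StepResult) (string, error) {
	steps := make(map[string]map[string]string, len(results))
	for name, r := range results {
		steps[name] = map[string]string{"value": r.Value}
	}
	tmpl, err := template.New("step").Option("missingkey=error").Parse(text)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, map[string]any{"steps": steps}); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	results := map[string]StepResult{"step1": {Value: "us-east-2"}}
	out, err := renderStepTemplate("Deploying to {{ .steps.step1.value }}", results)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```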

why

Addresses DEV-263 (add input type to workflows) and DEV-2969 (add viewport support). Enables users to build complex multi-step workflows with user interaction, conditional execution, and flexible result display.

references

Closes #DEV-263
Closes #DEV-2969

Summary by CodeRabbit

Release Notes

New Features
- Added 25+ interactive step types for workflows and custom commands (input, confirm, choose, filter, file, write, markdown, spin, table, style, and more).
- Support for configurable output modes (viewport, raw, log, none) and step-level display options.
- Workflow progress rendering and status indicators.
Documentation
- Comprehensive guides for interactive workflows and custom commands with step type reference.
- New examples demonstrating interactive deployments, credentials collection, and multi-step flows.
Bug Fixes
- Improved error messaging for workflow step validation and execution failures.

Add process and I/O execution foundation @shirkevich (#2464)

Summary

This is PR 1 for the DAG concurrent execution rollout. It introduces the reusable process and stream-isolation foundation without enabling scheduler behavior or changing Terraform bulk routing.

Changes:

Add pkg/process with Runner, TaskSpec, Streams, Result, default os/exec runner, context-aware execution, cancellation reporting, and exit-code preservation.
Extend pkg/io with prefixed per-node stream composition for terminal, file, and capture sinks.
Refactor internal/exec.ExecuteShellCommand() into a backward-compatible wrapper over pkg/process while preserving CI stdout/stderr capture options.
Replace the runTerraformShow() global os.Stdout swap with injected stdout capture.
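A per-node prefixed sink of the kind described for pkg/io can be built as an io.Writer that buffers partial lines and stamps each complete line with the node's label. prefixWriter and prefixString are hypothetical names; the real pkg/io types (and their masking support) are not shown here.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
	"sync"
)

// prefixWriter stamps every complete line with a node label before forwarding it.
type prefixWriter struct {
	mu     sync.Mutex
	dst    io.Writer
	prefix []byte
	buf    []byte // holds a trailing partial line until its newline arrives
}

func newPrefixWriter(dst io.Writer, prefix string) *prefixWriter {
	return &prefixWriter{dst: dst, prefix: []byte(prefix)}
}

// Write forwards each complete line as prefix+line and reports len(p) on success.
func (w *prefixWriter) Write(p []byte) (int, error) {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.buf = append(w.buf, p...)
	for {
		i := bytes.IndexByte(w.buf, '\n')
		if i < 0 {
			return len(p), nil
		}
		line := append(append([]byte{}, w.prefix...), w.buf[:i+1]...)
		if _, err := w.dst.Write(line); err != nil {
			return 0, err
		}
		w.buf = w.buf[i+1:]
	}
}

// Flush emits a trailing partial line, adding the missing newline.
func (w *prefixWriter) Flush() error {
	w.mu.Lock()
	defer w.mu.Unlock()
	if len(w.buf) == 0 {
		return nil
	}
	line := append(append(append([]byte{}, w.prefix...), w.buf...), '\n')
	w.buf = nil
	_, err := w.dst.Write(line)
	return err
}

// prefixString runs input through a prefixWriter (demo/test helper).
func prefixString(prefix, input string) string {
	var out bytes.Buffer
	w := newPrefixWriter(&out, prefix)
	_, _ = w.Write([]byte(input))
	_ = w.Flush()
	return out.String()
}

func main() {
	w := newPrefixWriter(os.Stdout, "[vpc] ")
	fmt.Fprint(w, "planning...\nno chan")
	fmt.Fprint(w, "ges\n")
	_ = w.Flush()
}
```

Buffering until a newline means each complete line reaches the destination in a single Write call, which reduces mid-line interleaving when several nodes share one terminal.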

Scope

No scheduler, CLI routing consolidation, concurrency flags, or Terraform adapter behavior is enabled in this PR.

Stacking

This PR is the bottom of the DAG rollout stack and targets main.

Supersedes the earlier fork-headed draft #2459 now that the stack branches exist in cloudposse/atmos.

Validation

rtk env GOCACHE=/private/tmp/atmos-gocache GOMODCACHE=/private/tmp/atmos-gomodcache go test ./pkg/process ./pkg/io ./internal/exec ./cmd/terraform

Next PR

PR 2 branches from codex/dag-process-io-foundation and adds the generic pkg/scheduler core with ready-queue scheduling, bounded workers, deterministic aggregate results, and isolated unit tests only.

Summary by CodeRabbit

Release Notes

New Features
- Configurable subprocess execution with optional contexts and injectable streams
- Composable, scoped output writers with per-line prefixing and masking
Bug Fixes
- More accurate subprocess exit/error reporting and improved stream-redirection behavior
Tests
- Expanded unit tests for subprocess execution, stream injection/capture, and output utilities
Documentation
- Updated concurrent execution docs to reflect stream-based output handling

Add core git YAML functions @osterman (#2537)

what

Add core Git YAML functions: !git.root, !git.sha, !git.branch, and !git.ref.
Resolve Git metadata through pkg/git, with pkg/utils limited to compatibility shims and YAML tag registration.
Wire Git tag resolution through config preprocessing, stack/component YAML processing, and function registry metadata.
Add a changelog post and roadmap milestone for the new Git YAML functions.

why

Allow dev stack/component source versions to pin to the current Git SHA via !git.ref.
Keep prod pins explicit while giving dev environments PR-aware source refs.
Avoid expanding pkg/utils by placing Git behavior in the self-contained Git package.

references

Summary by CodeRabbit

New Features
- Added Git YAML tags (!git.root / !repo-root, !git.sha, !git.ref, !git.branch) to resolve repo root, commit SHA/ref, and branch in configs and stacks; !git.ref can pin source versions.
Refactor
- Centralized git tag resolution for consistent behavior, alias support, unified fallbacks, and clearer error handling.
Tests
- Expanded coverage for tag resolution, fallbacks, detached‑HEAD behavior, and real-repo scenarios.
Documentation
- Updated blog post and roadmap with examples and usage notes.

🚀 Enhancements

fix(stacks): honor component list_merge_strategy in metadata.inherits… @JaseKoonce (#2565)

what

settings.list_merge_strategy set on a component now applies when merging lists via metadata.inherits
Adds tests covering append, replace, and merge strategies across single and multi-level inheritance chains

why

Component-level list_merge_strategy was only honored on the import/stack merge path (fixed in #2480).
The metadata.inherits resolution path always used the global atmosConfig, so per-component overrides were silently ignored.
A component with list_merge_strategy: append inheriting two bases would get last-wins ([from_b]) instead of the expected accumulated result ([from_a, from_b]).
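The behavior difference can be sketched as a fold over the inheritance chain. effectiveStrategy, mergeLists, and resolveInherited are hypothetical; the "merge" case is simplified to index-wise replacement of scalar elements, whereas Atmos deep-merges map elements by index. The point the fix turns on is that the strategy comes from the inheriting component's settings when present, falling back to the global atmos.yaml value.

```go
package main

import "fmt"

// effectiveStrategy prefers the inheriting component's settings.list_merge_strategy
// and falls back to the global atmos.yaml value.
func effectiveStrategy(global, component string) string {
	if component != "" {
		return component
	}
	return global
}

// mergeLists combines the accumulated list with the next list in the chain.
func mergeLists(strategy string, acc, next []string) ([]string, error) {
	switch strategy {
	case "append":
		return append(append([]string{}, acc...), next...), nil
	case "replace":
		return append([]string{}, next...), nil
	case "merge":
		// Simplified: index-wise, the later element wins; Atmos deep-merges map elements.
		n := len(acc)
		if len(next) > n {
			n = len(next)
		}
		out := make([]string, n)
		for i := range out {
			if i < len(next) {
				out[i] = next[i]
			} else {
				out[i] = acc[i]
			}
		}
		return out, nil
	default:
		return nil, fmt.Errorf("unknown list_merge_strategy %q", strategy)
	}
}

// resolveInherited folds the lists from each base in metadata.inherits order.
func resolveInherited(strategy string, chain ...[]string) ([]string, error) {
	var acc []string
	for _, next := range chain {
		var err error
		if acc, err = mergeLists(strategy, acc, next); err != nil {
			return nil, err
		}
	}
	return acc, nil
}

func main() {
	chain := [][]string{{"from_a"}, {"from_b"}}
	for _, componentStrategy := range []string{"append", ""} {
		got, err := resolveInherited(effectiveStrategy("replace", componentStrategy), chain...)
		fmt.Printf("component strategy %q -> %v (err: %v)\n", componentStrategy, got, err)
	}
}
```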

references

Closes #2396
Follow-up to #2480

Summary by CodeRabbit

Improvements
- Component inheritance now applies per-component list merge strategies during metadata-based inheritance so inherited lists are accumulated, replaced, or merged by index according to the inheriting component’s settings across multi-level chains.
Tests
- Added integration tests and fixture scenarios validating append, replace, multi-level append, and merge-by-index behaviors for metadata inheritance.

fix(auth): unwrap Atmos Pro envelope in github/sts mint @osterman (#2568)

what

Fix the github/sts auth integration ignoring a successfully minted Atmos Pro STS token because mint() decoded the response with a flat struct instead of the canonical API envelope.
Add a shared, reusable primitive — dtos.Envelope[T] + pro.DecodeEnvelope[T] — and route mint() through it so every Atmos Pro response unwraps the nested data payload through one sanctioned path.
Fix the bug-masking test fixture (the simulated broker now emits the real envelope shape) and add a regression test asserting mint() persists 1 token, not 0, plus decoder unit tests including a canary that a flat payload decodes to empty data.

why

Every Atmos Pro API route returns { "success": true, "status": 200, "data": { "tokens": [...], "excluded": [...] } }, but mint() decoded the body straight into the flat stsResponse (top-level tokens), so it always read 0 tokens. The CLI logged GitHub STS: no tokens granted and never wrote the git insteadOf config, so cross-repo import: calls fell back to the ambient GITHUB_TOKEN and failed with remote: Repository not found, even though the server had minted a valid token (the response was HTTP 200, so no error surfaced).
The existing e2e test passed only because its simulated broker returned the unwrapped {tokens,excluded} shape the real server never sends; matching the fixture to the real envelope and adding the regression/canary tests prevents this whole class of "decoded a Pro response without the envelope" bug from recurring.
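The envelope unwrap can be sketched with Go generics. Envelope, stsData, and decodeEnvelope are illustrative stand-ins for dtos.Envelope[T] and pro.DecodeEnvelope[T], whose exact definitions are not shown in these notes; the stsData field types are assumptions. The canary case shows why the original bug was silent: a flat payload decodes without error but yields empty data.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Envelope models the canonical response shape {"success", "status", "data"}.
type Envelope[T any] struct {
	Success bool `json:"success"`
	Status  int  `json:"status"`
	Data    T    `json:"data"`
}

// stsData is an assumed payload shape for the STS mint route.
type stsData struct {
	Tokens   []json.RawMessage `json:"tokens"`
	Excluded []json.RawMessage `json:"excluded"`
}

// decodeEnvelope unwraps the nested data payload into T.
func decodeEnvelope[T any](body []byte) (T, error) {
	var env Envelope[T]
	if err := json.Unmarshal(body, &env); err != nil {
		var zero T
		return zero, fmt.Errorf("decode Atmos Pro envelope: %w", err)
	}
	return env.Data, nil
}

func main() {
	wrapped := []byte(`{"success":true,"status":200,"data":{"tokens":[{"token":"ghs_x"}],"excluded":[]}}`)
	flat := []byte(`{"tokens":[{"token":"ghs_x"}],"excluded":[]}`)
	for name, body := range map[string][]byte{"wrapped": wrapped, "flat": flat} {
		data, err := decodeEnvelope[stsData](body)
		fmt.Printf("%s: %d token(s), err=%v\n", name, len(data.Tokens), err)
	}
}
```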

references

mint() was the only Pro call bypassing the shared AtmosApiResponse envelope that ExchangeOIDCToken / LockStack already use.

Summary by CodeRabbit

Bug Fixes
- Clearer STS error messages and correct unwrapping of canonical API envelopes.
- Prevent ambient tokens from being baked into Git URLs by honoring insteadOf rewrites (including file-mode).
- Avoid invalid git checkout/fetch for empty refs by fetching default branch and skipping bad checkouts.
- Warn when component source is misplaced under metadata and accept simple-form source strings.
New Features
- Provision credential brokers before Git source detection so token rewrites apply.
Tests
- Expanded tests covering envelope decoding, STS handling, broker provisioning, git insteadOf, and default-ref behavior.
Documentation
- Added fix notes on STS envelope/token-shadowing and updated PRD guidance for source.

fix(pro): respect metadata.enabled when uploading instances for drift @osterman (#2563)

what

atmos list instances --upload now collapses the Atmos Pro enabled hierarchy (metadata.enabled > settings.pro.enabled > settings.pro.drift_detection.enabled) before uploading, so the values Atmos Pro persists already reflect any outer disable.
A shared effectiveEnabledState helper is the single source of truth for both the upload payload (extractProSettings) and the success-toast counts, so they can no longer diverge.
Disabled components are still uploaded (as pro.enabled: false) rather than omitted, so Atmos Pro shows them disabled instead of orphaning them.
Reference docs corrected (settings/pro.mdx gains a settings.pro.enabled entry + precedence note; list/list-instances.mdx drops the now-false "preserved verbatim" / "drift is independent of pro.enabled" claims), plus a docs/fixes/ write-up.

why

Components disabled upstream via metadata.enabled: false kept failing scheduled drift detection (dispatchError: "missing_plan_result", drift_status: error): the CLI skips planning them, but the upload serialized the raw settings.pro block and never sent metadata.enabled, so Atmos Pro (whose ingestion contract has no metadata field) persisted them as enabled:true, drift_enabled:true and legitimately dispatched drift.
Fixing it in the CLI keeps the determination where it is already resolved and needs no Atmos Pro change: the stuck error rows self-heal to disabled on the next upload, with no data migration.
pro.enabled defaults to true (matching the Pro server-side default) so the collapse only ever turns things off when an outer level is explicitly disabled — it never regresses default-enabled components.
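The precedence collapse can be sketched as a pure function. proFlags and effectiveState are hypothetical; nil pointers stand for unset keys, metadata.enabled and settings.pro.enabled default to true as described above, and treating an unset drift_detection.enabled as false is an assumption.

```go
package main

import "fmt"

// proFlags holds the three enablement levels; nil means the key is unset.
type proFlags struct {
	MetadataEnabled       *bool // metadata.enabled
	ProEnabled            *bool // settings.pro.enabled
	DriftDetectionEnabled *bool // settings.pro.drift_detection.enabled
}

// effectiveState collapses metadata.enabled > settings.pro.enabled > drift_detection.enabled.
// An explicit outer false turns off everything beneath it; unset outer levels default to true.
func effectiveState(f proFlags) (proEnabled, driftEnabled bool) {
	defaultTrue := func(b *bool) bool { return b == nil || *b }
	proEnabled = defaultTrue(f.MetadataEnabled) && defaultTrue(f.ProEnabled)
	// Assumption: drift detection is opt-in, so an unset value counts as false.
	driftEnabled = proEnabled && f.DriftDetectionEnabled != nil && *f.DriftDetectionEnabled
	return proEnabled, driftEnabled
}

func main() {
	off, on := false, true
	pro, drift := effectiveState(proFlags{MetadataEnabled: &off, ProEnabled: &on, DriftDetectionEnabled: &on})
	// The component is still uploaded, but as pro.enabled: false.
	fmt.Printf("metadata.enabled=false -> pro.enabled=%v drift=%v\n", pro, drift)
}
```

Feeding both the upload payload and the toast counts from one function like this is what keeps the two from diverging.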

references

docs/fixes/2026-06-03-drift-dispatch-ignores-metadata-enabled.md (root-cause analysis, Neon instances evidence, verification steps)
Source of truth for the disabled determination: internal/exec/component_utils.go (isComponentEnabled)

Summary by CodeRabbit

Bug Fixes
- Resolve upload so component enablement honors metadata.enabled, preventing metadata-disabled components from remaining scheduled for drift and correcting counts; disabled components are uploaded as disabled rather than omitted.
Documentation
- Clarify enablement precedence (metadata.enabled > settings.pro.enabled > drift_detection.enabled), upload behavior, and how effective Pro/drift state is reflected in UI counts.
Tests
- Add unit and end-to-end tests validating effective enablement resolution, drift counting, and uploaded payloads.

fix(auth): deduplicate ECR, ECR Public, and EKS integrations to once per process @MrZablah (#2564)

What

Adds a process-level execution cache to triggerIntegrations so that auto-provisioned integrations (aws/ecr, aws/ecr-public, aws/eks) fire at most once per atmos invocation, regardless of how many times Authenticate is called or how many AuthManager instances are created.

The cache key is the integration's canonical target endpoint rather than its config entry name:

aws/ecr → "aws/ecr:<account_id>:<region>"
aws/ecr-public → "aws/ecr-public" (single global registry)
aws/eks → "aws/eks:<cluster_name>:<region>"
everything else → integration name (no behaviour change)

This means two config entries that point at the same registry (e.g. one from the global atmos.yaml and one from a component stack file) are collapsed to a single execution.
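The endpoint-keyed dedupe can be sketched with sync.Map.LoadOrStore. integration, runOnce, and errLoginFailed are hypothetical; integrationTargetKey follows the key formats listed above, and deleting the key on failure mirrors the "failed attempts are evicted" behavior noted in the summary below.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// integration is a hypothetical view of an auth integration config entry.
type integration struct {
	Name      string
	Kind      string
	AccountID string
	Region    string
	Cluster   string
}

// processIntegrationCache records which canonical targets already ran in this process.
var processIntegrationCache sync.Map

// integrationTargetKey returns the canonical target endpoint used as the dedupe key.
func integrationTargetKey(in integration) string {
	switch in.Kind {
	case "aws/ecr":
		return fmt.Sprintf("aws/ecr:%s:%s", in.AccountID, in.Region)
	case "aws/ecr-public":
		return "aws/ecr-public" // single global registry
	case "aws/eks":
		return fmt.Sprintf("aws/eks:%s:%s", in.Cluster, in.Region)
	default:
		return in.Name // no behavior change for other kinds
	}
}

// runOnce executes fn at most once per target key; a failure evicts the key so a retry can run.
func runOnce(in integration, fn func() error) (ran bool, err error) {
	key := integrationTargetKey(in)
	if _, loaded := processIntegrationCache.LoadOrStore(key, struct{}{}); loaded {
		return false, nil
	}
	if err := fn(); err != nil {
		processIntegrationCache.Delete(key)
		return true, err
	}
	return true, nil
}

var errLoginFailed = errors.New("login failed")

func main() {
	attempts := 0
	flaky := func() error {
		attempts++
		if attempts == 1 {
			return errLoginFailed
		}
		return nil
	}
	eks := integration{Name: "eks", Kind: "aws/eks", Cluster: "prod", Region: "us-west-2"}
	for i := 0; i < 3; i++ {
		ran, err := runOnce(eks, flaky)
		fmt.Printf("call %d: ran=%v err=%v\n", i+1, ran, err)
	}
}
```

One caveat of this simple shape: LoadOrStore marks a key as taken before fn finishes, so a concurrent caller for the same target returns immediately while the first login is still in flight. If callers must wait for completion, a per-key sync.Once or channel would be needed.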

Why

atmos terraform plan calls Authenticate from at least three internal paths: setupTerraformAuth, TerraformPreHook, and one call per YAML function (!store.get, !terraform.state). With a 6-tool .tool-versions, this produced 6 ECR logins per command. Switching to a name-keyed cache reduced it to 2, because merged configs can carry two integration entries with different names for the same registry. Keying by target endpoint reduces this to exactly 1.

Changes

pkg/auth/manager_integrations.go: adds the processIntegrationCache sync.Map, resetProcessIntegrationCache() (test helper), and integrationTargetKey() (canonical key helper covering aws/ecr, aws/ecr-public, aws/eks); updates triggerIntegrations to use LoadOrStore on the target key.
pkg/auth/manager_integrations_test.go: adds TestIntegrationTargetKey (table-driven tests for all key variants, including ECR Public) and TestIntegrationTargetKey_Deduplication (verifies that two same-registry entries produce one cache hit).

Notes

aws/ecr-public was added to upstream/main in #2231 after this branch diverged; coverage for it was added here to keep deduplication consistent across all three AWS integration kinds.

references

ECR / ECR Public Login Executes Multiple Times Per atmos terraform Invocation (#2562)

Summary by CodeRabbit

New Features
- Added process-level deduplication for auto-provisioned integrations to prevent redundant provisioning of the same target within a single process.
- Failed provisioning attempts are evicted from the dedupe cache so retries can proceed.
Tests
- Added unit tests validating cache key behavior and deduplication scenarios to ensure consistent provisioning outcomes.

fix(auth): make github/sts compose with default GitHub token injection @osterman (#2557)

what

Stop Atmos's go-getter token injection from silently shadowing github/sts-minted GitHub tokens: CustomGitDetector now skips URL token injection when a live GIT_CONFIG_* insteadOf rewrite already matches the URL's host/owner, so git's rewrite (carrying the correct least-privilege token) wins.
Make the ATMOS_PRO_GITHUB_TOKEN bridge consistent: resolveToken falls back to the live env var (which the broker sets after startup), mirroring pkg/http/client.go.
Default token_env to ATMOS_PRO_GITHUB_TOKEN (was empty) so a single-owner mint reaches gh/REST and Atmos's in-process git path automatically.
Replace the ad-hoc {owner} placeholder with Atmos's standard Go-template syntax ({{ .owner }}, plus .host); update docs, PRD, and add a docs/fixes/ write-up.
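The detection step can be sketched by scanning the GIT_CONFIG_COUNT / GIT_CONFIG_KEY_<n> / GIT_CONFIG_VALUE_<n> variables that git reads at runtime. rewriteMatches and injectToken are hypothetical, the prefix match approximates the host/owner check described above, and the URL form in injectToken is an assumption for illustration; getenv is a parameter so the logic can be exercised without touching the real environment (os.Getenv would be passed in practice).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// rewriteMatches reports whether a live url.<base>.insteadOf entry in the
// GIT_CONFIG_* environment already rewrites rawURL.
func rewriteMatches(rawURL string, getenv func(string) string) bool {
	count, err := strconv.Atoi(getenv("GIT_CONFIG_COUNT"))
	if err != nil || count <= 0 {
		return false
	}
	for i := 0; i < count; i++ {
		// Section and variable names are case-insensitive in git config.
		key := strings.ToLower(getenv("GIT_CONFIG_KEY_" + strconv.Itoa(i)))
		value := getenv("GIT_CONFIG_VALUE_" + strconv.Itoa(i))
		if !strings.HasPrefix(key, "url.") || !strings.HasSuffix(key, ".insteadof") {
			continue
		}
		if value != "" && strings.HasPrefix(rawURL, value) {
			return true
		}
	}
	return false
}

// injectToken skips ambient injection when git will already rewrite the URL.
func injectToken(rawURL, token string, getenv func(string) string) string {
	if token == "" || rewriteMatches(rawURL, getenv) {
		return rawURL // git's insteadOf rewrite supplies the minted token
	}
	// Hypothetical injection format, for illustration only.
	return strings.Replace(rawURL, "https://", "https://x-access-token:"+token+"@", 1)
}

func main() {
	env := map[string]string{
		"GIT_CONFIG_COUNT":   "1",
		"GIT_CONFIG_KEY_0":   "url.https://x-access-token:MINTED@github.com/acme/.insteadOf",
		"GIT_CONFIG_VALUE_0": "https://github.com/acme/",
	}
	getenv := func(k string) string { return env[k] }
	for _, u := range []string{"https://github.com/acme/infra.git", "https://github.com/other/infra.git"} {
		fmt.Println(injectToken(u, "AMBIENT", getenv))
	}
}
```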

why

A real CI job resolving a remote import: from a second private repo failed with remote: Repository not found — the minted token was correct, but the ambient GITHUB_TOKEN was being injected into the URL ahead of it, defeating git's insteadOf rewrite. The only fix was the settings.inject_github_token: false workaround.
These changes make github/sts (introduced in #2546) compose with the default settings.inject_github_token: true, so it "just works" with no workaround. Reproduced first with a simulated-broker e2e test, then fixed.

references

Fixes the github/sts feature shipped in #2546
docs/fixes/2026-06-01-github-sts-token-injection-shadowing.md (root cause, fix, and why this is a fix doc rather than a changelog entry)
docs/prd/atmos-pro-sts.md

Summary by CodeRabbit

Bug Fixes
- Prevented minted GitHub tokens from being silently overridden by detecting broker-provided git URL rewrites and skipping ambient token injection.
New Features
- token_env accepts Go-template names (e.g., GH_TOKEN_{{ .owner }}) and defaults to ATMOS_PRO_GITHUB_TOKEN when appropriate.
- Token resolution prefers a live exported broker token before falling back to configured values; minted tokens are not logged.
Documentation
- Clarified github/sts token_env semantics, templating, multi-owner behavior, and URL-rewrite interactions.
Tests
- Added/expanded tests for token-env defaults, templating, precedence, and insteadOf handling.
Chores
- Made license NOTICE generation produce deterministic URLs.

fix(auth): report missing exec binary instead of "atmos requires a subcommand" @osterman (#2559)

what

Fix atmos auth exec -- <command> reporting the misleading "The command atmos requires a subcommand" when the executable after -- (e.g. uvx) is not found on PATH.
The missing executable is now reported clearly via the error builder: the command name, the underlying cause, a PATH hint, and exit code 127.
Internally, Cobra's "unknown command" conversion now uses the ErrUnknownSubcommand sentinel, and the root handler intercepts that (via a new testable unknownSubcommand helper) instead of the overloaded ErrCommandNotFound.
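The sentinel split can be sketched with errors.Is. The two sentinel names come from the notes above, but their messages, lookupExecutable, exitCodeFor, and the exit code of 1 for unknown subcommands are assumptions; 127 follows the shell convention for "command not found".

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
)

// Two distinct sentinels instead of one overloaded ErrCommandNotFound.
var (
	ErrUnknownSubcommand = errors.New("unknown subcommand")
	ErrCommandNotFound   = errors.New("command not found")
)

const exitCodeCommandNotFound = 127 // shell convention for a missing executable

// lookupExecutable wraps exec.LookPath failures in ErrCommandNotFound with a PATH hint.
func lookupExecutable(name string) (string, error) {
	path, err := exec.LookPath(name)
	if err != nil {
		return "", fmt.Errorf("%w: %s: %v (is it installed and on PATH?)", ErrCommandNotFound, name, err)
	}
	return path, nil
}

// exitCodeFor maps an error to an exit code; only unknown subcommands show root usage.
func exitCodeFor(err error) (code int, showUsage bool) {
	switch {
	case err == nil:
		return 0, false
	case errors.Is(err, ErrUnknownSubcommand):
		return 1, true
	case errors.Is(err, ErrCommandNotFound):
		return exitCodeCommandNotFound, false
	default:
		return 1, false
	}
}

func main() {
	_, err := lookupExecutable("uvx")
	code, showUsage := exitCodeFor(err)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
	fmt.Println("exit code:", code, "show usage:", showUsage)
}
```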

why

auth exec and the registry executor both wrapped the shared ErrCommandNotFound sentinel, so a missing user binary was indistinguishable from an unknown Atmos subcommand and got masked as root usage output — hiding the real cause.
Separating the two sentinels gives accurate errors for both cases (genuine unknown subcommands still show root usage with suggestions; missing executables now say "command not found" with a hint), and also fixes the same latent masking for pkg/hooks command lookups.

references

Regression from the atmos auth → command-registry migration (#1919) combined with the registry executor's Cobra-error conversion (#1643).

Summary by CodeRabbit

Bug Fixes
- Clearer "command not found" errors with install guidance and enforced exit code 127.
- Distinguish missing external executables from unknown subcommands so help is shown only for genuine unknown subcommands.
Tests
- Added/updated tests to guard error-classification behaviors and prevent regressions.
Documentation
- Adjusted BSD dependency listing to mark the URL as Unknown.

fix: allow --use-version artifact downloads without GitHub token @osterman (#2212)

what

Allow unauthenticated artifact downloads for public repositories via --use-version flag
Metadata fetching (PR info, workflow runs, artifact listing) and artifact downloads now work without authentication on public repos per GitHub API docs
Replace upfront GetGitHubTokenOrError() gate with optional GetGitHubToken() in InstallFromPR() and InstallFromSHA()
Skip Authorization header when token is unavailable in downloadPRArtifact()
Add smart HTTP error handling with buildDownloadHTTPError() to distinguish auth failures from rate limiting
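A sketch of separating rate limiting from auth failures, based on GitHub's documented behavior of returning 403 or 429 with X-RateLimit-Remaining: 0 when the primary rate limit is exhausted. classifyFailure and buildDownloadError are hypothetical stand-ins for buildDownloadHTTPError, whose real signature and messages are not shown here.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// classifyFailure returns "rate_limited", "auth", or "other".
func classifyFailure(status int, remaining string) string {
	switch {
	case status == http.StatusTooManyRequests, status == http.StatusForbidden && remaining == "0":
		return "rate_limited"
	case status == http.StatusUnauthorized, status == http.StatusForbidden:
		return "auth"
	default:
		return "other"
	}
}

// buildDownloadError turns a failed response into an actionable error.
func buildDownloadError(resp *http.Response, hasToken bool) error {
	switch classifyFailure(resp.StatusCode, resp.Header.Get("X-RateLimit-Remaining")) {
	case "rate_limited":
		msg := fmt.Sprintf("GitHub API rate limit exceeded (HTTP %d)", resp.StatusCode)
		if reset, err := strconv.ParseInt(resp.Header.Get("X-RateLimit-Reset"), 10, 64); err == nil {
			msg += "; resets at " + time.Unix(reset, 0).UTC().Format(time.RFC3339)
		}
		if !hasToken {
			msg += "; unauthenticated requests allow 60/hr, a GitHub token raises this to 5,000/hr"
		}
		return errors.New(msg)
	case "auth":
		return fmt.Errorf("GitHub authentication failed (HTTP %d); check your token", resp.StatusCode)
	default:
		return fmt.Errorf("artifact download failed (HTTP %d)", resp.StatusCode)
	}
}

func main() {
	h := http.Header{}
	h.Set("X-RateLimit-Remaining", "0")
	h.Set("X-RateLimit-Reset", "1700000000")
	fmt.Println(buildDownloadError(&http.Response{StatusCode: http.StatusForbidden, Header: h}, false))
}
```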

why

Users without GitHub token environment variables couldn't install PR artifacts, even for public repositories
Rate limit errors (429) were reported generically as "HTTP 429" with no actionable context
Need to properly surface rate limit information (60/hr for unauthenticated, 5,000/hr for authenticated) to guide users

references

Fixes the issue where atmos --use-version=2129 fails with "authentication failed" when no GITHUB_TOKEN is set
GitHub API documentation confirms artifact downloads work without authentication for public repositories

Summary by CodeRabbit

New Features
- Added optional unauthenticated access for public GitHub artifacts (subject to rate limits)
- New ATMOS_GITHUB_CLI env var to control/disable CLI-based token retrieval
Bug Fixes
- Clearer handling and messaging for auth vs rate-limit errors, with improved hints and retry info
- GitHub token is now optional for artifact operations (falls back to anonymous when available)
Tests
- Expanded tests for artifact downloads and HTTP auth/rate-limit scenarios
Documentation
- Documented ATMOS_GITHUB_CLI usage and behavior

fix(version): honor ATMOS_USE_VERSION env var for version re-exec @osterman (#2556)

what

Honor the documented ATMOS_USE_VERSION environment variable so Atmos actually switches to (and downloads, if needed) the requested version during early re-exec.
resolveRequestedVersion now reads ATMOS_USE_VERSION, with precedence ATMOS_VERSION_USE > ATMOS_USE_VERSION > ATMOS_VERSION > version.use.
cmd/root.go also honors ATMOS_USE_VERSION from the environment so version-management commands (e.g. atmos version) re-exec on it just like the --use-version flag.
Add a table case and a precedence test covering the new behavior.
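The precedence chain reduces to a first-non-empty lookup. pickRequestedVersion is a hypothetical simplification of resolveRequestedVersion (its real signature is not shown here); lookupEnv is injected so os.LookupEnv can be swapped for a map in tests, and configVersion stands in for version.use.

```go
package main

import (
	"fmt"
	"os"
)

// pickRequestedVersion returns the first non-empty value in precedence order:
// ATMOS_VERSION_USE > ATMOS_USE_VERSION > ATMOS_VERSION > version.use (configVersion).
func pickRequestedVersion(lookupEnv func(string) (string, bool), configVersion string) string {
	for _, name := range []string{"ATMOS_VERSION_USE", "ATMOS_USE_VERSION", "ATMOS_VERSION"} {
		if v, ok := lookupEnv(name); ok && v != "" {
			return v
		}
	}
	return configVersion
}

func main() {
	fmt.Println("requested version:", pickRequestedVersion(os.LookupEnv, "1.221.0"))
}
```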

why

ATMOS_USE_VERSION is advertised as the primary env var (docs at website/docs/cli/environment-variables.mdx and the flag binding WithEnvVars("use-version", "ATMOS_USE_VERSION")), but the re-exec resolver never read it — it only checked the internal ATMOS_VERSION_USE (set solely by the CLI flag), the ATMOS_VERSION alias, and version.use config.
An env-populated flag is not marked Changed() and maps to viper key use-version rather than version.use, so ATMOS_USE_VERSION fell through every code path — setting it was a complete no-op.
This surfaced in CI where ATMOS_USE_VERSION was set for atmos describe affected --upload but Atmos ran the already-installed version instead of switching. This brings the code in line with the existing documentation.

references

Docs already describe the intended behavior: website/docs/cli/environment-variables.mdx

Summary by CodeRabbit

New Features
- Added support for the ATMOS_USE_VERSION environment variable as an alternative to the --use-version CLI flag.
- Updated version selection precedence to consider environment variables in the defined order.
Tests
- Extended test coverage for environment-variable-driven version selection scenarios.
Chores
- Updated NOTICE entry for a dependency license URL.

fix(auth): honor keyring.type config and send DPoP proof on AWS webflow @osterman (#2545)

what

Honor auth.keyring.type from atmos.yaml across all auth-manager entrypoints by threading authConfig into credentials.NewCredentialStoreWithConfig(...) (was silently dropped via the no-arg NewCredentialStore()), and inject the manager's config-aware store into AWS user identities via a new optional SetCredentialStore interface.
Add an RFC 9449 DPoP proof (EC P-256 / ES256, stdlib-only) to the AWS browser webflow token requests; generate the key per session, persist it in the refresh-token cache, and reuse it on refresh (a cache without a key falls back to the browser flow).
Add AuthManager.CredentialStoreType() for observability/testability, mark the no-arg NewCredentialStore() constructor Deprecated, and add unit tests for both fixes (keyring backend selection, DPoP proof structure/signature, key round-trip, header presence).
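A stdlib-only DPoP proof per RFC 9449 looks roughly like the sketch below: a JWT whose header carries typ dpop+jwt, alg ES256, and the public JWK, with jti, htm, htu, and iat claims, signed in the raw 64-byte R||S form that JWS requires for ES256. newP256Key, newDPoPProof, and verifyDPoPProof are hypothetical; the sketch omits the optional ath and nonce claims and key persistence, and the token URL in main is a placeholder.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"math/big"
	"strings"
	"time"
)

var b64 = base64.RawURLEncoding // JOSE uses unpadded base64url

func newP256Key() (*ecdsa.PrivateKey, error) {
	return ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
}

// newDPoPProof returns a compact-serialized DPoP proof JWT for one HTTP request.
func newDPoPProof(key *ecdsa.PrivateKey, htm, htu string, iat int64) (string, error) {
	pub, err := key.PublicKey.ECDH()
	if err != nil {
		return "", err
	}
	raw := pub.Bytes() // 0x04 || X (32 bytes) || Y (32 bytes) for P-256
	jti := make([]byte, 16)
	if _, err := rand.Read(jti); err != nil {
		return "", err
	}
	header, err := json.Marshal(map[string]any{
		"typ": "dpop+jwt",
		"alg": "ES256",
		"jwk": map[string]string{"kty": "EC", "crv": "P-256", "x": b64.EncodeToString(raw[1:33]), "y": b64.EncodeToString(raw[33:65])},
	})
	if err != nil {
		return "", err
	}
	claims, err := json.Marshal(map[string]any{"jti": b64.EncodeToString(jti), "htm": htm, "htu": htu, "iat": iat})
	if err != nil {
		return "", err
	}
	signingInput := b64.EncodeToString(header) + "." + b64.EncodeToString(claims)
	digest := sha256.Sum256([]byte(signingInput))
	r, s, err := ecdsa.Sign(rand.Reader, key, digest[:])
	if err != nil {
		return "", err
	}
	sig := make([]byte, 64) // JWS ES256: R || S, each left-padded to 32 bytes (not ASN.1 DER)
	r.FillBytes(sig[:32])
	s.FillBytes(sig[32:])
	return signingInput + "." + b64.EncodeToString(sig), nil
}

// verifyDPoPProof checks only the ES256 signature (demo/test helper; no claim validation).
func verifyDPoPProof(proof string, pub *ecdsa.PublicKey) bool {
	parts := strings.Split(proof, ".")
	if len(parts) != 3 {
		return false
	}
	sig, err := b64.DecodeString(parts[2])
	if err != nil || len(sig) != 64 {
		return false
	}
	digest := sha256.Sum256([]byte(parts[0] + "." + parts[1]))
	return ecdsa.Verify(pub, digest[:], new(big.Int).SetBytes(sig[:32]), new(big.Int).SetBytes(sig[32:]))
}

func main() {
	key, err := newP256Key()
	if err != nil {
		panic(err)
	}
	proof, err := newDPoPProof(key, "POST", "https://signin.example.com/v1/token", time.Now().Unix())
	if err != nil {
		panic(err)
	}
	fmt.Println("DPoP:", proof)
	fmt.Println("signature valid:", verifyDPoPProof(proof, &key.PublicKey))
}
```

Because the refresh token is bound to the key, reusing the same private key across refreshes (rather than generating one per request) is what keeps refresh working; each request still gets a fresh jti.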

why

#2544: with auth.keyring.type: memory set, Atmos still selected the default system keyring and hung indefinitely on hosts where the keyring service is present but unusable (e.g. a locked gnome-keyring-daemon). The config value was read and then thrown away before backend selection — only ATMOS_KEYRING_TYPE worked. Now the configured backend is honored everywhere an auth manager is built.
#2542: AWS sign-in's /v1/token endpoint now rejects requests without a DPoP proof (HTTP 400 INVALID_REQUEST), so browser-based authentication for aws/user identities failed at the code-exchange step. Sending the proof restores the flow; because the public-client refresh token is bound to the DPoP key, the key is persisted and reused on refresh.

references

closes #2542
closes #2544
RFC 9449 (DPoP): https://datatracker.ietf.org/doc/html/rfc9449

Summary by CodeRabbit

New Features
- Added RFC 9449 DPoP support for AWS OAuth token exchanges to strengthen token binding.
- Auth now respects configured keyring backend across authentication flows.
Bug Fixes
- Fixed AWS token parsing to match real-world snake_case responses.
Improvements
- Auth manager exposes credential store backend type for easier diagnostics.

fix(yaml-functions): honor init.pass_vars when resolving !terraform.output (#1412) @thejrose1984 (#2548)

what

When components.terraform.init.pass_vars: true is set, forward the component's vars to the internal terraform init that runs while resolving !terraform.output, via TF_VAR_* environment variables.

ComponentConfig gains PassVars + Vars, populated in ExtractComponentConfig.
SetupEnvironment injects TF_VAR_* for each var when PassVars is true (strings verbatim, other types JSON-encoded).
Regression tests cover the enabled path (string/number/bool/list), the disabled default, and env-section precedence.
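The forwarding rule can be sketched as follows. tfVarEnv is a hypothetical stand-in for the SetupEnvironment change: it copies the env section first and skips any var whose TF_VAR_* key is already set there, so the explicit value wins.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// tfVarEnv returns the env for the implicit init. When passVars is true, each var becomes
// TF_VAR_<name>: strings verbatim, other types JSON-encoded (valid HCL2 for lists/maps).
func tfVarEnv(passVars bool, vars map[string]any, envSection map[string]string) (map[string]string, error) {
	out := make(map[string]string, len(envSection)+len(vars))
	for k, v := range envSection {
		out[k] = v
	}
	if !passVars {
		return out, nil // default: no-op unless opted in
	}
	for name, value := range vars {
		key := "TF_VAR_" + name
		if _, explicit := envSection[key]; explicit {
			continue // an explicit env-section TF_VAR_* wins
		}
		if s, ok := value.(string); ok {
			out[key] = s
			continue
		}
		encoded, err := json.Marshal(value)
		if err != nil {
			return nil, fmt.Errorf("encode var %q: %w", name, err)
		}
		out[key] = string(encoded)
	}
	return out, nil
}

func main() {
	env, err := tfVarEnv(true,
		map[string]any{"aks_version": "1.29", "node_count": 3, "zones": []string{"1", "2"}},
		map[string]string{"TF_VAR_node_count": "5"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	keys := make([]string, 0, len(env))
	for k := range env {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, env[k])
	}
}
```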

why

Closes #1412.

The main terraform path honors pass_vars by passing -var-file to init (terraform_execute_helpers.go), so modules with init-time variable dependencies (e.g. a module version/source bound to var.aks_version) can initialize. But the init that runs while resolving !terraform.output goes through pkg/terraform/output, which uses the terraform-exec library and never honored pass_vars:

runInit only set Upgrade(false) + optional Reconfigure(true).
ComponentConfig had no PassVars/vars plumbing.
terraform-exec's initConfig has no var-file field — it structurally cannot pass -var-file to init.

So atmos tofu init/plan -s <stack> failed with Unable to compute static value / module.aks.version depends on var.aks_version which is not available whenever an init-time var came from a component resolved via !terraform.output.

Why TF_VAR_* rather than a var-file

terraform-exec can't attach a var-file to init, and an auto-loaded *.auto.tfvars.json on disk would risk cross-stack contamination when components are resolved concurrently. TF_VAR_* is process/runner-scoped, reaches init transparently through the existing SetEnv call, and Terraform/OpenTofu accept these values for the matching variable types (JSON encodings of lists/maps are valid HCL2). Gated behind pass_vars (default false), so it's a no-op unless opted in; an explicit TF_VAR_* in the component env section still wins.

references

Closes #1412

test plan

go test ./pkg/terraform/output/...

New tests:

TestDefaultEnvironmentSetup_PassVars — vars exported as TF_VAR_* with correct encoding.
TestDefaultEnvironmentSetup_PassVarsDisabled — no TF_VAR_* when pass_vars is off.
TestDefaultEnvironmentSetup_PassVarsEnvSectionWins — explicit env-section TF_VAR_* wins.

Validation note: verified at the unit level (init env now carries the component vars when pass_vars is set; previously the init invocation was unchanged whether pass_vars was on or off). I don't have terraform/tofu in this environment to re-run the reporter's full tofu init/plan end-to-end, so a maintainer check against a real init-time-dependent module would be worthwhile before release.

Summary by CodeRabbit

New Features
- Added an option to forward component variables as TF_VAR_* environment entries during Terraform/OpenTofu init; existing TF_VAR_* values are preserved and non-string values are JSON-encoded.
Tests
- Added tests for enabled/disabled forwarding, JSON encoding of non-strings, precedence of explicit env values, and end-to-end propagation to the runner env.
Documentation
- Docs updated to note init.pass_vars also applies to implicit init runs and how forwarded vars are presented as TF_VAR_*.

test(yaml-functions): regression test for mixed state/output circular dependency (#2005) @thejrose1984 (#2547)

what

Add a regression test and fixture for a cross-component circular dependency that mixes !terraform.state and !terraform.output (component-a → !terraform.state component-b; component-b → !terraform.output component-a).
New fixture: tests/fixtures/scenarios/yaml-functions-circular-deps-mixed.
New test: TestYAMLFunctionsCrossComponentCycleMixed.

why

This is the exact scenario from #2005. It was the same root cause as #2457 and was fixed by #2533 (making ProcessCustomYamlTags reuse the goroutine-local ResolutionContext so the Visited map survives nested walks). That fix covers both state↔state and the mixed state↔output path, but only the state↔state case had a regression test — so #2005 could silently regress while the existing test stayed green.

Verified the mixed cycle hangs (infinite recursion / goroutine stack overflow) on the commit before #2533, and returns a clean ErrCircularDependency on current main.

references

Closes #2005
Follow-up to #2533 / #2457

test plan

go test ./tests -run TestYAMLFunctionsCrossComponentCycle -v

Both TestYAMLFunctionsCrossComponentCycle (state↔state) and TestYAMLFunctionsCrossComponentCycleMixed (state↔output) pass. The mixed test asserts ErrCircularDependency is returned and that the MaxResolutionDepth safety net is not what fired (which would indicate the primary cycle detector regressed).

fix: defer custom-command/built-in collision warning to invocation time @thejrose1984 (#2549)

what

Scope is intentionally narrow: change only when the existing collision warning fires — defer it from command-registration time to the moment the conflicting command is actually invoked.

No change to collision behavior: the built-in still wins and custom steps are still ignored.
No override:/invoke: work — that opt-in design is tracked separately in the custom-command-builtin-override PRD.
Implemented by wrapping the conflicting built-in command's PreRunE in processCustomCommands (preserving any existing PreRunE/PreRun and honoring Cobra's precedence of PreRunE over PreRun).
Adds a regression test asserting the warning is absent at registration and present (exactly once) on invocation.

why

Today the warning (introduced in #2191) is emitted from processCustomCommands, which runs during root init on every Atmos invocation. So a single colliding custom command makes every command — atmos list stacks, atmos terraform ..., etc. — print a warning about, say, a plan collision it never touched. The result is worse than noisy:

It's misleading — the warning points at a command the user didn't run.
It breaks scripting/CI that reads stderr, since every command (except version) emits it.

Deferring the warning to invocation makes it accurate and actionable: it appears exactly once, only when you run the command the warning is actually about, and stderr stays clean for every other command. Same information, delivered at the moment it's relevant instead of on every unrelated call.

Behavior

Invocation	Before	After
`atmos list stacks` (with a colliding custom `plan`)	⚠ warning printed	no warning
`atmos <colliding command>`	⚠ warning printed (and also for every other command)	⚠ warning printed once, here only

references

Refs #2102
Related: #2191 (introduced the collision guard / warning)

test

go test ./cmd/ -run 'TestCustomCommand_.*Collision|TestCustomCommand_StepsConflictWarning|TestCustomCommand_NamespaceMerge|TestCustomCommand_DeepNesting'

Verified the new test fails against the previous (emit-at-registration) behavior and passes with the fix.

Summary by CodeRabbit

Bug Fixes
- Collision warnings for custom commands that overlap built-in leaf commands are now deferred until the conflicting command is invoked, reducing startup noise and preserving existing pre-run error behavior.
Tests
- Added regression tests to verify deferred warnings are emitted exactly once on invocation and that existing pre-run behavior and error propagation remain intact; tests skip on Windows where stderr capture is unreliable.

fix(flags): register --settings-list-merge-strategy as a global flag (#2398) @thejrose1984 (#2540)

what

Register --settings-list-merge-strategy as a global persistent flag on RootCmd, with env binding to ATMOS_SETTINGS_LIST_MERGE_STRATEGY.
Add a Cobra-direct fallback in ProcessCommandLineArgs so the value reaches ConfigAndStacksInfo even when Cobra strips the flag from RunE's args.
In setSettingsConfig, scan os.Args (mirroring setLogConfig's parseFlags() pattern) so command paths that call InitCliConfig directly with a zero-value ConfigAndStacksInfo (e.g. describe config) still honor the flag.
Unit test the registration, inheritance, defaults, CLI value, and env-var path.

why

The flag is advertised in two places:

atmos.yaml:344 — "Can also be set using 'ATMOS_SETTINGS_LIST_MERGE_STRATEGY' environment variable, or '--settings-list-merge-strategy' command-line argument"
website/docs/cli/configuration/settings/settings.mdx:54

And Atmos's internal arg/flag layer already expects it:

pkg/config/const.go:147 — SettingsListMergeStrategyFlag = "--settings-list-merge-strategy"
internal/exec/cli_utils.go:72 — listed in commonFlags
internal/exec/cli_utils.go:495 — string-flag handler that writes info.SettingsListMergeStrategy
pkg/config/utils.go:726 — applies it onto atmosConfig.Settings.ListMergeStrategy

But it was never registered with Cobra at the global level. Subcommands that don't whitelist unknown flags (e.g. terraform plan, which has no FParseErrWhitelist) rejected the flag before the legacy commonFlags post-processing ever ran:

$ atmos --settings-list-merge-strategy=append terraform plan vpc -s test
Error: unknown flag --settings-list-merge-strategy for command atmos terraform plan

references

Closes #2398

test plan

Unit tests added in pkg/flags/global_registry_test.go:

flag is registered on RootCmd as persistent
defaults to empty string
CLI flag value flows through Viper
ATMOS_SETTINGS_LIST_MERGE_STRATEGY env var flows through Viper
subcommand inherits the persistent flag

End-to-end verification on a minimal project (atmos.yaml has settings.list_merge_strategy: replace):

Invocation	`list_merge_strategy`
`atmos describe config`	`replace` (baseline from `atmos.yaml`)
`atmos --settings-list-merge-strategy=append describe config`	`append`
`atmos describe config --settings-list-merge-strategy=merge`	`merge`
`ATMOS_SETTINGS_LIST_MERGE_STRATEGY=append atmos describe config`	`append`

atmos --help now lists --settings-list-merge-strategy.

Full test suites pass for the touched packages:

ok  github.com/cloudposse/atmos/pkg/flags
ok  github.com/cloudposse/atmos/pkg/flags/global
ok  github.com/cloudposse/atmos/pkg/config
ok  github.com/cloudposse/atmos/internal/exec

Summary by CodeRabbit

New Features
- Added --settings-list-merge-strategy CLI flag (replace, append, merge) and ATMOS_SETTINGS_LIST_MERGE_STRATEGY env var to override list-merge behavior for an invocation
Documentation
- Documented the new flag and environment variable with usage and defaults
Tests
- Updated CLI help snapshots to include the new flag and refreshed help text formatting across commands

Fix templated store hook execution @osterman (#2539)

what

Render hook execution fields only after a hook matches the current event and skip filters.
Preserve static hook discovery/preflight while supporting !template and bare Go templates in store hook names, output keys, and output values.
Add regression tests for templated store hooks and non-matching hooks with invalid execution-only templates.
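The ordering fix (filter first, render second) can be illustrated with Go's text/template. This is a minimal sketch covering only the bare Go template form. The hook struct, prepareHook, and the example store name and output keys are hypothetical, and !template handling is out of scope. A non-matching hook is returned untouched, so an execution-only template that is invalid for the current event is never parsed.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// hook is a hypothetical, trimmed-down store hook definition.
type hook struct {
	Events  []string
	Name    string            // store name; may contain a Go template
	Outputs map[string]string // output key -> value; both may be templated
}

// renderField renders one execution field; missing keys are errors, not "<no value>".
func renderField(s string, data map[string]any) (string, error) {
	tmpl, err := template.New("hook").Option("missingkey=error").Parse(s)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// prepareHook checks the event filter first and only then renders execution fields.
func prepareHook(h hook, event string, data map[string]any) (hook, bool, error) {
	matched := false
	for _, e := range h.Events {
		if e == event {
			matched = true
			break
		}
	}
	if !matched {
		return h, false, nil // skipped: templates are never parsed
	}
	name, err := renderField(h.Name, data)
	if err != nil {
		return h, true, fmt.Errorf("hook name: %w", err)
	}
	outputs := make(map[string]string, len(h.Outputs))
	for k, v := range h.Outputs {
		rk, err := renderField(k, data)
		if err != nil {
			return h, true, fmt.Errorf("output key %q: %w", k, err)
		}
		rv, err := renderField(v, data)
		if err != nil {
			return h, true, fmt.Errorf("output %q: %w", k, err)
		}
		outputs[rk] = rv
	}
	return hook{Events: h.Events, Name: name, Outputs: outputs}, true, nil
}

func main() {
	data := map[string]any{"stage": "dev"}
	h := hook{
		Events:  []string{"after-terraform-apply"},
		Name:    "{{ .stage }}-ssm",
		Outputs: map[string]string{"/{{ .stage }}/vpc_id": "vpc-{{ .stage }}"},
	}
	rendered, matched, err := prepareHook(h, "after-terraform-apply", data)
	fmt.Println(rendered.Name, matched, err, rendered.Outputs)
}
```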

why

Fixes a regression where templated store-outputs.name values were used literally, causing store lookup failures.
Keeps pre-auth hook discovery safe while allowing execution-time hook fields to use the fully available component context.
Prevents future regressions for both YAML function and bare Go template forms.

references

Closes #2538

Summary by CodeRabbit

New Features
- Hooks now resolve execution-time templates and custom YAML functions, supporting nested templating, rendering into hook execution fields, stronger type validation, and clearer hook-specific error messages.
Tests
- Added tests for template rendering, YAML-function evaluation, nested value processing, error cases, and store-hook execution behavior.

fix(auth): normalize override keys to uppercase in filterAtmosOverrides (#2349) @thejrose1984 (#2541)

what

Uppercase the override key before the prefix check (and in the returned map) inside pkg/auth/manager_env_overrides.go:filterAtmosOverrides.
Add regression test cases in TestFilterAtmosOverrides covering Viper-lowercased keys, mixed-case keys, and mixed atmos/non-atmos casings.

why

filterAtmosOverrides did a case-sensitive strings.HasPrefix(k, "ATMOS_"). The function's documented contract was "only keys with the ATMOS_* prefix", but in production the only realistic source of its input map is an MCP server env: block in atmos.yaml / .atmos.d/mcp.yaml, which Viper loads with all map keys lowercased.

This is the same Viper-lowercasing pitfall already documented and handled on a sibling code path by pkg/mcp/client/mcpconfig.go:copyEnv (the CLI-provider pass-through that writes config files for Claude Code / Codex / Gemini). That fix wasn't applied to the auth code path, so an authored config like this:

mcp:
  servers:
    atmos:
      command: atmos
      args: ["mcp", "start"]
      env:
        ATMOS_PROFILE: managers
      identity: core-root/terraform

reached filterAtmosOverrides as {"atmos_profile": "managers"}, was silently dropped, and the auth manager was rebuilt against the default profile. Identity resolution then surfaced as:

✗ Server failed to start
   Error: MCP server failed to start: atmos: auth setup failed for "atmos": identity not found: core-root/terraform

I confirmed Viper's lowercasing end-to-end against the actual schema.MCPServerConfig shape (Env map[string]string):

env key="atmos_profile" value="managers"
env key="aws_region"    value="us-east-1"

— so the authored ATMOS_PROFILE is gone by the time the filter runs.

scope of behavior change

Already-uppercase callers (ATMOS_PROFILE): unchanged.
Previously-dropped lowercase/mixed-case callers (atmos_profile, Atmos_Profile): now honored — and those are exactly the users hitting the documented bug.
Non-ATMOS_* keys: still dropped, regardless of case (aws_profile, FOO, foo).
Existing TestFilterAtmosOverrides cases still pass unchanged.
Existing TestCreateAndAuthenticateManagerWithEnvOverrides_* tests still pass unchanged.

alternatives considered

I weighed three fix locations on the original issue:

1. Uppercase inside filterAtmosOverrides (this PR). Smallest possible surface, single source of truth for the auth path, doesn't touch the MCP layer.
2. copyEnv (or equivalent) inside ScopedAuthProvider.ForServer. Localizes the fix to the MCP adapter; the downside is that a future non-MCP consumer of CreateAndAuthenticateManagerWithEnvOverrides that loads its env map from YAML would hit the same trap.
3. Uppercase at ParseConfig time. Widest reach: it would also affect subprocess env propagation, a real (if narrow) behavior change for users who deliberately set unconventionally-cased env vars in env: and expect them to be passed verbatim to the spawned MCP server.

Option 1 fixes the documented case without altering any other code path's behavior or risking the subprocess-env corner case in Option 3.

references

Closes #2349
Related context: pkg/mcp/client/mcpconfig.go:128 (copyEnv) — the parallel fix on the CLI-provider pass-through path that documents the Viper-lowercasing trap.

test plan

go test ./pkg/auth -run 'TestFilterAtmosOverrides|TestCreateAndAuthenticateManagerWithEnvOverrides' -v
go test ./pkg/auth ./pkg/mcp/client/...

Both pass. New regression subtests:

viper-lowercased atmos key is normalized to uppercase
mixed-case atmos key is normalized to uppercase
viper-lowercased non-atmos key is dropped
mixed casings across atmos and non-atmos keys

Summary by CodeRabbit

Bug Fixes

Fixed an issue where environment configuration overrides specified in lowercase format (from YAML configuration files) were incorrectly dropped during processing. Environment override keys are now properly normalized to ensure consistent handling regardless of the input format used.