cloudposse/atmos v1.208.0 on GitHub

Exclude unsupported windows/arm from goreleaser build matrix @goruha (#2133)

what

Add ignore rule to the shared goreleaser config (.github/goreleaser.yml) to exclude the windows/arm (32-bit ARM) build target
Prevents "unsupported GOOS/GOARCH pair windows/arm" build failures for any org repo using Go 1.24+

why

Go 1.24 (February 2025) deprecated the windows/arm port, and Go 1.25+ removed it entirely. Any repo that upgrades past Go 1.23 and uses this shared goreleaser config will fail during the release build after spending ~33 minutes compiling the other 13 targets
The ignore rule is harmless for repos still on Go < 1.24 — goreleaser simply skips a target that would otherwise build successfully. No binaries are lost for any currently-supported platform
windows/arm (32-bit ARM on Windows) had negligible real-world usage — Windows on ARM devices run 64-bit Windows 11 (windows/arm64), which remains supported

references

Go 1.24 release notes — Ports: "Go 1.24 is the last release that supports building for 386 and arm GOOS targets on Windows"
Go issue #67001: Remove windows/arm port
GoReleaser ignore docs: filtering unsupported GOOS/GOARCH pairs
cloudposse/.github#246

Summary by CodeRabbit

Chores
- Excluded Windows ARM 32-bit builds from release distribution.

Add AI Agent Skills for LLM-Powered Infrastructure Development @aknysh (#2121)

what

Added 21 AI agent skills following the Agent Skills Open Standard and the AGENTS.md standard (Linux Foundation AAIF)
Skills packaged as a single Claude Code plugin (atmos@cloudposse) -- one install command, all 21 skills
Added Claude Code plugin marketplace manifest (.claude-plugin/marketplace.json) and plugin manifest (agent-skills/.claude-plugin/plugin.json)
Added AGENTS.md skill-activation router for cross-tool compatibility (Codex, Gemini, Cursor, Windsurf, Copilot)
Added 21 .claude/skills/ symlinks for contributor auto-discovery when working in the Atmos repo
Added website documentation at website/docs/integrations/ai/agent-skills.mdx (skill reference) and website/docs/projects/setup-editor/ai-assistants.mdx (tool setup)
Added blog post at website/blog/2026-02-27-ai-agent-skills.mdx
Added PRD at docs/prd/atmos-agent-skills.md
Added CI workflow (.github/workflows/validate-agent-skills.yml) to validate skill structure, size limits, frontmatter, and code fence tags
Updated roadmap and sidebars

Skills (21 total, 1 plugin)

Each skill follows a 3-tier progressive disclosure pattern: AGENTS.md router → SKILL.md instructions → references/*.md deep dives.

All 21 skills live in a flat agent-skills/skills/ directory:

atmos-ansible, atmos-auth, atmos-components, atmos-config, atmos-custom-commands, atmos-design-patterns, atmos-devcontainer, atmos-gitops, atmos-helmfile, atmos-introspection, atmos-packer, atmos-schemas, atmos-stacks, atmos-stores, atmos-templates, atmos-terraform, atmos-toolchain, atmos-validation, atmos-vendoring, atmos-workflows, atmos-yaml-functions

Claude Code Plugin Marketplace

Install with two commands:

/plugin marketplace add cloudposse/atmos
/plugin install atmos@cloudposse

Team auto-discovery via .claude/settings.json:

{
  "enabledPlugins": {
    "atmos@cloudposse": true
  }
}

Other AI Tools

For Gemini CLI, OpenAI Codex, Cursor, Windsurf, and GitHub Copilot, use Atmos vendoring:

# vendor.yaml
apiVersion: atmos/v1
kind: AtmosVendorConfig
metadata:
  name: atmos-agent-skills
  description: Vendor Atmos AI agent skills
spec:
  sources:
    - component: "agent-skills"
      source: "github.com/cloudposse/atmos.git//agent-skills?ref={{.Version}}"
      version: "main"
      targets:
        - "agent-skills"

atmos vendor pull --component agent-skills

Open Standards

Built on two open standards:

AGENTS.md -- Cross-tool instruction file (OpenAI, Google, Cursor, Linux Foundation AAIF)
Agent Skills -- Skill packaging format (Anthropic, Microsoft, OpenAI, GitHub)

why

AI coding assistants need domain-specific context to generate correct Atmos configurations. Without skills, they guess at YAML format, use wrong CLI flags, and miss Atmos patterns like deep merging, abstract components, and YAML functions. Skills provide structured, up-to-date knowledge directly in the repository so AI tools generate accurate guidance.

references

Agent Skills Specification
AGENTS.md Standard
PRD: docs/prd/atmos-agent-skills.md
Documentation: AI Agent Skills | Configure AI Assistants

feat: add `!aws.organization_id` YAML function @aknysh (#2117)

what

Add a new !aws.organization_id YAML function that retrieves the AWS Organization ID by calling the AWS Organizations DescribeOrganization API
New pkg/aws/organization/ package with Getter interface, per-auth-context caching with double-checked locking, and mock support
Full integration with Atmos Authentication — uses credentials from the active identity when available, falls back to standard AWS SDK credential resolution
Handles AWSOrganizationsNotInUseException with a clear error message when the account is not in an organization
Added ErrAwsDescribeOrganization sentinel error
Updated Go toolchain references to 1.26

why

Users need to reference the AWS Organization ID in stack configurations for governance, tagging, cross-account trust policies, and SCP scoping
Currently the organization ID must be hardcoded or retrieved through workarounds
This is the Atmos equivalent of Terragrunt's get_aws_org_id() function
Closes #2073

references

closes #2073
AWS API: DescribeOrganization
Terragrunt equivalent: get_aws_org_id()

Summary by CodeRabbit

Release Notes

New Features
- Added !aws.organization_id YAML function to retrieve AWS Organization ID from stack configurations with automatic per-invocation caching.
Chores
- Updated Go toolchain from 1.25 to 1.26.
- Updated Atmos installer version from 1.206.0 to 1.208.0.
- Updated AWS SDK and other key dependencies to latest stable versions.
Documentation
- Added comprehensive documentation and blog post for the new AWS Organization ID function, including usage examples and prerequisites.

chore: update the website with ansible support @RoseSecurity (#2116)

what

Added "Ansible" to the animated list of tools in the hero section and updated the visually hidden text for accessibility to include Ansible.
Updated the footer message to mention Ansible alongside Terraform/OpenTofu and Packer, making it clear that teams can use these tools with Atmos.

why

This pull request updates the website's homepage to highlight support for Ansible in addition to existing tools. The changes ensure Ansible is included in both the animated tool list and the page's accessibility text, as well as in the footer messaging.

Summary by CodeRabbit

New Features
- Added Ansible to the homepage's rotating featured technologies.
- Updated homepage promotional copy to mention Ansible alongside Terraform/OpenTofu and Packer.
Accessibility
- Updated visually-hidden/screen-reader text to include Ansible in the spoken description of the rotating showcase.

docs(ansible): add documentation and examples for Ansible integration @RoseSecurity (#2108)

what

This pull request introduces a comprehensive demo example, documentation updates, and test cases. The changes expand the Atmos component model to include Ansible alongside Terraform, Helmfile, and Packer, and provide users with clear guidance and examples for configuring, running, and testing Ansible playbooks through Atmos.

Added a complete demo-ansible example, including stack manifests, catalog defaults, Ansible playbook, inventory, Atmos configuration, and .gitignore entries for Ansible artifacts. This demonstrates variable handling, catalog pattern, and per-environment overrides for Ansible components. [1] [2] [3] [4] [5] [6] [7] [8]
Introduced test cases for the demo example to verify stack listing, variable resolution, and dry-run functionality for Ansible playbooks.
Updated Atmos documentation to describe Ansible component support, including configuration schema, directory structure, and usage patterns in atmos.yaml and stack manifests. [1] [2]
Revised component overview and configuration docs to include Ansible as a supported component type, updating descriptions, tables, and directory structure examples. [1] [2] [3] [4] [5] [6] [7] [8] [9] [10]
Enhanced the ansible-playbook command reference with embedded example files and an interactive demo, improving clarity and real-world applicability. [1] [2] [3]\

why

These changes collectively make Ansible a first-class citizen in Atmos, providing users with practical examples, test coverage, and detailed documentation for configuration management and automation workflows.

Summary by CodeRabbit

New Features
- Ansible added as a first-class component type with playbook, inventory, vars, and dry-run support (prints command in dry-run).
Documentation
- Comprehensive Ansible docs, configuration guides, examples and CLI pages added; site navigation updated to surface Ansible content and demo.
Tests
- New test suite validating the Ansible demo workflows (dev/prod, describe, dry-run).
Chores
- Updated ignore rules to exclude CI-managed lockfiles and build artifacts.

feat(stores): add identity-based authentication for stores @aknysh (#2099)

what

Stores (!store YAML function) can now reference an Atmos auth identity via a new identity field in the store configuration
When identity is set, the store authenticates using that identity's credentials instead of the default credential chain (environment variables, default AWS profiles, etc.)
Supported for all cloud-backed store types: AWS SSM Parameter Store, Azure Key Vault, and Google Secret Manager
Redis and Artifactory stores emit a warning if identity is set (unsupported since they don't map to cloud provider identity types)
Stores with identity use lazy client initialization (sync.Once) — the cloud client is created on first Get/Set access rather than at construction time
Fully backward compatible: stores without the identity field work exactly as before

why

Previously, stores always used the default credential chain, requiring separate credential management for secrets access vs. Terraform execution
This change lets users reuse the same atmos auth identity system for both, simplifying credential management and enabling more granular access control
Lazy initialization avoids circular dependency issues: stores are registered during config loading, but auth happens later during command execution

references

Closes #2082
PRD: docs/prd/store-identity-support.md
Blog post: website/blog/2026-02-22-store-identity-support.mdx
Roadmap updated in website/src/data/roadmap.js
Example: examples/auth-stores/

Configuration example

stores:
  prod/aws-ssm:
    type: aws-ssm-parameter-store
    identity: prod-admin
    options:
      region: us-east-1

Example

See examples/auth-stores/ for a complete multi-cloud configuration showing AWS SSM, Azure Key Vault, and GCP Secret Manager stores with identity-based authentication.

Files changed

Area	Files	Description
Schema	`pkg/store/config.go`	Added `Identity` field to `StoreConfig`
Interfaces	`pkg/store/identity.go`	`AuthContextResolver`, `IdentityAwareStore`, local auth config types
Errors	`pkg/store/errors.go`	`ErrIdentityNotConfigured`, `ErrAuthContextNotAvailable`
AWS SSM	`pkg/store/aws_ssm_param_store.go`	Lazy init, `SetAuthContext`, identity-based AWS config loading
Azure KV	`pkg/store/azure_keyvault_store.go`	Lazy init, `SetAuthContext`, identity-based credential creation
GCP GSM	`pkg/store/google_secret_manager_store.go`	Lazy init, `SetAuthContext`, identity-based client creation
Registry	`pkg/store/registry.go`	Pass identity to constructors, `SetAuthContextResolver()`
Bridge	`pkg/store/authbridge/resolver.go`	Bridges store ↔ auth packages (avoids circular deps)
Wiring	`internal/exec/terraform.go`, `terraform_shell.go`	Injects resolver after auth manager creation
Tests	`pkg/store/identity_test.go`, `authbridge/resolver_test.go`, `registry_test.go`	40+ unit tests
Example	`examples/auth-stores/`	Multi-cloud auth + stores configuration example
Docs	PRD, blog post, roadmap	Feature documentation

Summary by CodeRabbit

Release Notes

New Features
- Stores (AWS SSM, Azure Key Vault, Google Secret Manager) can now authenticate using Atmos auth identities instead of default credential chains.
- Added identity field in store configuration to specify which auth identity each store should use.
Documentation
- Added comprehensive guide and examples for identity-based store authentication.
- Added blog post explaining the new store identity support feature.

🚀 Enhancements

fix: Add retry and missing workflow step properties to all schema copies @[copilot-swe-agent[bot]](https://github.com/apps/copilot-swe-agent) (#2113)

Schema validation rejected retry field in workflow steps despite Go code supporting it since the feature was added. Users hitting validation errors when using retry with YAML anchors or inline configuration.

Changes

Added missing workflow step properties to all 6 atmos-manifest schema copies in the repository:

website/static/schemas/atmos/atmos-manifest/1.0/atmos-manifest.json (public schema)
pkg/datafetcher/schema/atmos/manifest/1.0.json (embedded schema - compiled into binary)
tests/fixtures/schemas/atmos/atmos-manifest/1.0/atmos-manifest.json (test fixtures)
examples/demo-localstack/schemas/atmos-manifest.json
examples/demo-context/schemas/atmos-manifest.json
examples/demo-helmfile/schemas/atmos-manifest.json

Properties added to workflow step definitions:

retry - Full RetryConfig spec with max_attempts, initial_delay, max_delay, backoff_strategy, multiplier, random_jitter, max_elapsed_time
working_directory - Step-level working directory override
identity - Authentication identity for step execution
env - Step-level environment variables

All properties support !include directive and follow existing schema patterns (e.g., source_retry).

Example

workflows:
  deploy:
    steps:
      - name: apply
        command: terraform apply demo
        retry:
          max_attempts: 3
          backoff_strategy: exponential
          max_elapsed_time: 5m
        working_directory: /tmp/deploy
        env:
          TF_LOG: DEBUG

Previously failed with additionalProperties 'retry' not allowed. Now validates correctly across all schema copies including the embedded schema used at runtime.

Original prompt

This section details on the original issue you should resolve

<issue_title>Schema: error with schema not recognizing retry in workflows</issue_title>
<issue_description>### Describe the Bug

Added retry to the workflows properties to out workflows and now atmos describe breaks

Expected Behavior

Retry should be added to schema so atmos doesn't blow up

Steps to Reproduce

# ============================================================
# ATMOS REPRO: `retry` field in workflow steps breaks `atmos describe stacks`
# ============================================================

# --- 0) Create isolated workspace ---
WORKDIR="$(mktemp -d -t atmos-repro-XXXXXX)"
echo "Working in: ${WORKDIR}"
cd "${WORKDIR}"

# --- 1) Write atmos.yaml ---
cat <<'EOF' > atmos.yaml
base_path: "."
components:
  terraform:
    base_path: "components/terraform"
    command: "tofu"
stacks:
  name_template: "{{ .vars.name }}"
  base_path: "stacks"
  included_paths:
    - "orgs/**/*"
workflows:
  base_path: "stacks/workflows"

# Validation schemas (for validating atmos stacks and components)
schemas:
  # https://json-schema.org
  jsonschema:
    # Can also be set using 'ATMOS_SCHEMAS_JSONSCHEMA_BASE_PATH' ENV var, or '--schemas-jsonschema-dir' command-line argument
    # Supports both absolute and relative paths
    base_path: "stacks/schemas/jsonschema"
  opa:
    # Can also be set using `ATMOS_SCHEMAS_OPA_BASE_PATH` ENV var, or `--schemas-opa-dir` command-line arguments
    # Supports both absolute and relative paths
    base_path: "stacks/schemas/opa"
  # JSON Schema to validate Atmos manifests
  atmos:
    # Can also be set using 'ATMOS_SCHEMAS_ATMOS_MANIFEST' ENV var, or '--schemas-atmos-manifest' command-line arguments
    # Supports both absolute and relative paths (relative to the `base_path` setting in `atmos.yaml`)
    manifest: "https://atmos.tools/schemas/atmos/atmos-manifest/1.0/atmos-manifest.json"
EOF

# --- 2) Write a minimal stack so describe has something to parse ---
mkdir -p stacks/orgs/demo
cat <<'EOF' > stacks/orgs/demo/demo.yaml
vars:
  name: demo
terraform:
  backend_type: local
components:
  terraform:
    demo:
      vars: {}
EOF

# --- 3) Write the workflow WITH retry (the reproducer) ---
mkdir -p stacks/workflows

cat <<'EOF' > stacks/workflows/demo.yaml
name: "demo"

x-retry: &retry-config
  max_attempts: 3
  max_elapsed_time: 120m

workflows:
  01-plan:
    description: "Plan demo"
    steps:
      - name: demo
        command: terraform plan demo
        retry: *retry-config
      - name: end
        type: shell
        command: echo "done"
EOF

# --- 4) Sanity check layout ---
echo "== tree =="
find . -maxdepth 4 -type f | sed 's|^\./||' | sort

# --- 5) The actual repro ---
echo ""
echo "== describe stacks (should succeed without retry field, breaks with it) =="
atmos describe stacks 

echo ""
echo "Done. Workspace preserved at: ${WORKDIR}"

== describe stacks (should succeed without retry field, breaks with it) ==

   Error 

   Error: Atmos manifest JSON Schema validation error in the file 'workflows/demo.yaml':
   {
   "valid": false,
   "errors": [
   {
   "keywordLocation": "",
   "absoluteKeywordLocation": "https://json.schemastore.org/atmos-manifest.json#",
   "instanceLocation": "",
   "error": "doesn't validate with https://json.schemastore.org/atmos-manifest.json#"
   },
   {
   "keywordLocation": "/properties/workflows/$ref",
   "absoluteKeywordLocation": "https://json.schemastore.org/atmos-manifest.json#/properties/workflows/$ref",
   "instanceLocation": "/workflows",
   "error": "doesn't validate with '/definitions/workflows'"
   },
   {
   "keywordLocation": "/properties/workflows/$ref/oneOf",
   "absoluteKeywordLocation": "https://json.schemastore.org/atmos-manifest.json#/definitions/workflows/oneOf",
   "instanceLocation": "/workflows",
   "error": "oneOf failed"
   },
   {
   "keywordLocation": "/properties/workflows/$ref/oneOf/0/type",
   "absoluteKeywordLocation": "https://json.schemastore.org/atmos-manifest.json#/definitions/workflows/oneOf/0/type",
   "instanceLocation": "/workflows",
   "error": "expected string, but got object"
   },
   {
   "keywordLocation": "/properties/workflows/$ref/oneOf/1/patternProperties/%5E%5B1a-zA-Z0-9-_%7B%7D.%20%5D+$/$ref",
   "absoluteKeywordLocation": "https://json.schemastore.org/atmos-manifest.json#/definitions/workflows/oneOf/1/patternPropertie
s/%5E%5B1a-zA-Z0-9-%7B%7D.%20%5D+$/$ref",                                                                                         "instanceLocation": "/workflows/01-plan",
   "error": "doesn't validate with '/definitions/workflow_manifest'"
   },
   {
   "keywordLocation": "/properties/workflows/$ref/oneOf/1/patternProperties/%5E%5B~1a-zA-Z0-9-%7B%7D.%20%5D+$/$ref/oneOf",
   "absoluteKeywordLocation": "https://json.schemastore.org/atmos-manifest.json#/definitions/workflow_manifest/oneOf",
   "instanceLocation": "/workflows/01-plan",
   "error": "oneOf failed"
   },
   {...

</details>



<!-- START COPILOT CODING AGENT SUFFIX -->

- Fixes cloudposse/atmos#2112

<!-- START COPILOT CODING AGENT TIPS -->
---

💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more [Copilot coding agent tips](https://gh.io/copilot-coding-agent-tips) in the docs.
</details>

<details>
  <summary>Fix: Convert toolchain paths to absolute in PATH to resolve exec.LookPath failures @[copilot-swe-agent[bot]](https://github.com/apps/copilot-swe-agent) (#2095)</summary>
Custom command steps that invoke `atmos terraform` subcommands fail with:

exec: "terraform": cannot run executable found relative to current directory


This occurs when the toolchain feature is enabled with `.tool-versions` managing tool versions and `install_path` is configured with relative paths (e.g., `.tools`). The issue doesn't occur when running the same `atmos terraform` command directly from the shell.

**Root Cause:** The `BuildToolchainPATH()` function was constructing PATH entries using potentially relative paths (e.g., `.tools/bin/hashicorp/terraform/1.14.2`). Go 1.19+ introduced a security change in `exec.LookPath()` that rejects executables found via relative PATH entries, causing the custom commands to fail.

## Solution

Convert all toolchain paths to absolute paths before adding them to PATH in the `BuildToolchainPATH()` function. This ensures compatibility with Go 1.19+ exec.LookPath security requirements.

## Changes

**`pkg/dependencies/installer.go`:**
- Convert `binPath` to absolute path using `filepath.Abs()` before appending to PATH
- Add explanatory comments about Go 1.19+ exec.LookPath requirements
- Simplified error handling (filepath.Abs rarely fails in practice)

**`pkg/dependencies/installer_test.go`:**
- `TestBuildToolchainPATH_ConvertsRelativeToAbsolute` - Verifies relative paths (`.tools`) are converted to absolute paths
- `TestBuildToolchainPATH_WithAbsolutePath` - Tests absolute path handling  
- `TestBuildToolchainPATH_WithMultipleTools` - Tests multiple tools in one call
- `TestBuildToolchainPATH_SkipsInvalidTools` - Tests error handling for invalid tools

## Code Diff

**Before:**
```go
binPath := filepath.Join(toolsDir, "bin", owner, repo, version)
paths = append(paths, binPath)

After:

binPath := filepath.Join(toolsDir, "bin", owner, repo, version)

// Convert to absolute path to avoid Go 1.19+ exec.LookPath security issues.
// Go 1.19+ rejects executables found via relative PATH entries.
// Note: filepath.Abs rarely fails in practice; we trust it to succeed here.
absBinPath, _ := filepath.Abs(binPath)

paths = append(paths, absBinPath)

Example

Before (fails in custom commands):

PATH=.tools/bin/hashicorp/terraform/1.14.2:/usr/bin

After (works correctly):

PATH=/absolute/path/to/project/.tools/bin/hashicorp/terraform/1.14.2:/usr/bin

Test Coverage

Coverage Metrics

BuildToolchainPATH function: 100% coverage (was 91.7%)
Overall package coverage: 96.7% (was 95.7%)
Patch coverage: 100% (was 22.22%)

Tests Verify

✅ Relative paths (.tools) converted to absolute paths
✅ Absolute paths handled correctly
✅ Multiple tools processed correctly
✅ Invalid tools skipped gracefully
✅ All existing functionality maintained
✅ Backward compatibility preserved

Original prompt

This section details on the original issue you should resolve
<issue_title>Toolchain prepends relative paths to PATH, causing exec: "terraform": cannot run executable found relative to current directory in custom commands</issue_title>
<issue_description>### Describe the Bug
Custom command steps that invoke atmos terraform subcommands fail with the error:
exec: "terraform": cannot run executable found relative to current directory
This only occurs when the command is executed through a custom command step. Running the exact same atmos terraform command directly from the shell succeeds without issues.
For example, this fails:
atmos util lock <component> -s <stack>
But the underlying command it executes works fine when run directly:
atmos terraform providers lock -platform=windows_amd64 -platform=darwin_amd64 -platform=linux_amd64 <component> -s <stack>
The toolchain feature is enabled with .tool-versions managing terraform and atmos versions.
Expected Behavior

The toolchain should prepend absolute paths to PATH (e.g., /absolute/path/to/project/.tools/bin/hashicorp/terraform/1.14.2) so that subprocess invocations resolve toolchain binaries correctly regardless of working directory context.
Steps to Reproduce

Configure atmos.yaml with toolchain enabled:
    toolchain:
      file_path: ".tool-versions"
      install_path: ".tools"

    components:
      terraform:
        command: terraform
Create .tool-versions:
    atmos 1.206.2
    terraform 1.14.2
Define a custom command in .atmos.d/commands.yaml:
    commands:
      - name: util
        commands:
          - name: lock
            description: Execute 'terraform lock' command for all OS platforms
            arguments:
              - name: component
                description: Name of the component
            flags:
              - name: stack
                shorthand: s
                description: Name of the stack
                required: true
            steps:
              - atmos terraform providers lock -platform=windows_amd64 -platform=darwin_amd64 -platform=linux_amd64 {{ .Arguments.component }} -s {{ .Flags.stack }}
Run the custom command:
    atmos util lock <component> -s <stack>
Observe the error:
    Error: exec: "terraform": cannot run executable found relative to current directory
Verify by adding a debug step (echo "PATH=$PATH") — PATH shows:
    PATH=.tools/bin/cloudposse/atmos/1.206.2:.tools/bin/hashicorp/terraform/1.14.2:/usr/local/bin:...
Running the same command directly works:
    atmos terraform providers lock -platform=windows_amd64 -platform=darwin_amd64 -platform=linux_amd64 <component> -s <stack>
Screenshots

No response
Environment

Atmos version: 1.206.2
Terraform version: 1.14.2
OS: macOS (darwin/arm64)

Additional Context

The root cause is in the toolchain PATH setup logic. When atmos prepends toolchain directories to PATH for custom command execution, it uses relative paths like .tools/bin/hashicorp/terraform/1.14.2 instead of resolving them to absolute paths first. This conflicts with Go's exec.LookPath security change in Go 1.19 that rejects executables found via relative PATH entries.
Current workaround is to override PATH in the custom command step with absolute paths:
steps:
  - >-
    PATH="$(pwd)/.tools/bin/cloudposse/atmos/1.206.2:$(pwd)/.tools/bin/hashicorp/terraform/1.14.2:$PATH"
    atmos terraform providers lock ...
```</issue_description>

## Comments on the Issue (you are @copilot in this section)

<comments>
</comments>

Fixes #2089

💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Fix workdir collision for component instances sharing base component @[copilot-swe-agent[bot]](https://github.com/apps/copilot-swe-agent) (#2093)

Closes #2091

Problem

When multiple component instances share the same Terraform component (metadata.component), they cannot be applied in parallel because they all write to the same workdir. This is because the workdir path was generated using metadata.component (the base component name) instead of the component instance name.

Example scenario that was broken:

12 ElastiCache clusters, all with metadata.component: elasticache
All instances mapped to .workdir/terraform/dev-elasticache
Parallel execution caused lock contention and file conflicts

Solution

Modified the workdir path generation in pkg/provisioner/workdir/workdir.go to use atmos_component (the full component instance path) instead of metadata.component. This ensures each component instance gets its own unique workdir:

.workdir/terraform/dev-elasticache-redis-cluster-1
.workdir/terraform/dev-elasticache-redis-cluster-2
.workdir/terraform/dev-elasticache-redis-cluster-3

The fix includes:

Prioritize atmos_component for workdir path generation
Fallback to extractComponentName() for backward compatibility
Comprehensive test coverage for component instances sharing the same base component

Changes

pkg/provisioner/workdir/workdir.go: Updated component name extraction to use atmos_component first
pkg/provisioner/workdir/integration_test.go: Added TestComponentInstancesWithSameBaseComponent test

Total changes: 2 files, 118 insertions, 3 deletions

Testing

All existing workdir unit tests pass (30+ tests)
New test validates that 3 component instances sharing the same base component get unique workdirs
Verified backward compatibility with the fallback mechanism
Clean rebase on latest main

Impact

Enables parallel terraform apply for component instances like:

components:
  terraform:
    elasticache-redis-cluster-1:
      metadata:
        component: elasticache  # Base component
      provision:
        workdir:
          enabled: true

    elasticache-redis-cluster-2:
      metadata:
        component: elasticache  # Same base component
      provision:
        workdir:
          enabled: true

Before: Both instances → .workdir/terraform/dev-elasticache (shared, conflicts)
After: Unique workdirs → .workdir/terraform/dev-elasticache-redis-cluster-{1,2} (isolated)

Each instance now gets its own .terraform/ directory, lock files, and generated configs—no coordination required.

Original prompt

This section details on the original issue you should resolve

<issue_title>Parallel apply of multiple component instances sharing the same Terraform component is not possible</issue_title>
<issue_description>### Describe the Feature

Atmos cannot apply multiple component instances that share the same Terraform component (metadata.component) in parallel. All instances write to the same component source directory, causing lock contention, checksum races, and corrupted provider binaries. The existing provision.workdir.enabled feature does not solve this — it isolates by <stack>-<component>, so all instances of the same component within the same stack still share one workdir.

Expected Behavior

Running multiple atmos terraform apply commands in parallel for component instances that share the same base component should work without file conflicts. Each instance already has its own Terraform workspace and separate remote state — the only barrier is local filesystem contention that atmos should manage internally.

Use Case

We have 12 ElastiCache clusters, all referencing metadata.component: elasticache, deployed to the same stack. Each has its own Terraform workspace and separate S3 state file. Applying them sequentially is slow. They are completely independent resources with no dependencies between them — there is no reason they can't run concurrently.

This pattern is common: many instances of the same component type (N Redis clusters, N IAM roles, N S3 buckets) in a single stack, all sharing one Terraform module.

Describe Ideal Solution

Option A: The workdir path should incorporate the full component instance path, not just the base metadata.component name. The workdirs should be:

.workdir/terraform/<stack>-elasticache-redis-cluster-1
.workdir/terraform/<stack>-elasticache-redis-cluster-2
.workdir/terraform/<stack>-elasticache-redis-cluster-3

Instead of all mapping to:

.workdir/terraform/<stack>-elasticache

Option B: A built-in parallel apply mechanism:

atmos terraform apply --parallel \
  components/elasticache/redis-cluster-1 \
  components/elasticache/redis-cluster-2 \
  -s my-stack

Investigation details

Root cause analysis

When atmos runs terraform apply for a component, it writes several files to the component source directory:

.terraform/ — provider binaries, module cache, local state lock (terraform.tfstate)
.terraform.lock.hcl — provider dependency checksums
backend.tf.json — generated backend configuration
providers_override.tf.json — generated provider overrides
*.terraform.tfvars.json — generated variable files
*.planfile — plan output files

When 12 processes write to the same directory simultaneously, we observed three distinct failure modes.

Test 1: Naive parallel apply (no isolation)

for component in "${COMPONENTS[@]}"; do
  atmos terraform apply "$component" -s "$STACK" &
done
wait

Result: Most processes fail. .terraform lock file contention, provider checksum mismatches on .terraform.lock.hcl, and corrupted generated files from concurrent writes.

Test 2: `TF_DATA_DIR` isolation

TF_DATA_DIR is an official Terraform env var that redirects the .terraform directory to a custom path. We gave each parallel process its own:

for component in "${COMPONENTS[@]}"; do
  TF_DATA_DIR="/tmp/work/tf-data/$(basename "$component")" \
    atmos terraform apply "$component" -s "$STACK" &
done

Result: 7/12 succeeded, 5/12 failed. TF_DATA_DIR isolates the .terraform directory, but .terraform.lock.hcl lives in the component source directory, NOT inside .terraform. So all 12 processes still race on writing that file.

Failure mode A: provider checksum mismatch (4 failures)

Error: Required plugins are not installed

the cached package for registry.terraform.io/hashicorp/aws 6.31.0
does not match any of the checksums recorded in the dependency lock file

Process A writes checksums to .terraform.lock.hcl, process B overwrites them, then process A's cached provider no longer matches. Classic TOCTOU race.

Failure mode B: corrupt provider binary (1 failure)

Error: Failed to load plugin schemas
Could not load the schema for provider registry.terraform.io/hashicorp/aws:
failed to instantiate provider
Unrecognized remote plugin message: Failed to read any lines from plugin's stdout

Multiple processes downloaded the AWS provider to TF_PLUGIN_CACHE_DIR simultaneously. One process read a partially-written binary. The architecture check passed (darwin arm64 matches arm64) — the binary was simply incomplete.

Test 3: `TF_DATA_DIR` + `TF_PLUGIN_CACHE_DIR` + pre-init (working workaround)

export TF_PLUGIN_CACHE_DIR="/tmp/work/plugin-cache"

# Single init to populate .terraform.lock.hcl and provider cache B...

</details>



<!-- START COPILOT CODING AGENT SUFFIX -->

- Fixes cloudposse/atmos#2091

<!-- START COPILOT CODING AGENT TIPS -->
---

✨ Let Copilot coding agent [set things up for you](https://github.com/cloudposse/atmos/issues/new?title=✨+Set+up+Copilot+instructions&body=Configure%20instructions%20for%20this%20repository%20as%20documented%20in%20%5BBest%20practices%20for%20Copilot%20coding%20agent%20in%20your%20repository%5D%28https://gh.io/copilot-coding-agent-tips%29%2E%0A%0A%3COnboard%20this%20repo%3E&assignees=copilot) — coding agent works faster and does higher quality work when set up for your repo.

</details>

<details>
  <summary>fix(auth): propagate TTY state to subprocesses for SSO device flow in workflows @Benbentwo (#2126)</summary>

## What

Fixes `atmos auth login` failing with "AWS SSO device flow requires an interactive terminal (TTY)" when called as a step inside an atmos workflow.

## Why

When atmos runs workflow command steps via `ExecuteShellCommand`, `cmd.Stderr` is wrapped in a `MaskWriter` (for secret masking). This creates a pipe between the parent and subprocess, so the subprocess's `term.IsTerminal(os.Stderr.Fd())` returns `false` — even when the user is sitting at an interactive terminal. As a result, the SSO device auth flow's `isInteractive()` guard blocks authentication.

**Two-layer fix:**

1. **`pkg/auth/providers/aws/sso.go`** — `isInteractive()` now checks `--force-tty` / `ATMOS_FORCE_TTY` via viper before falling back to raw TTY detection. This allows any caller to explicitly signal an interactive context.

2. **`internal/exec/shell_utils.go`** — `ExecuteShellCommand` now checks whether the _parent_ process has a real TTY on stderr. When it does (and `ATMOS_FORCE_TTY` is not already set), it injects `ATMOS_FORCE_TTY=true` into the subprocess environment. This propagates the interactive context through the MaskWriter pipe automatically.

**Behavior matrix after fix:**

| Scenario | Before | After |
|---|---|---|
| Direct `atmos auth login` from terminal | ✅ works | ✅ works |
| Workflow `command: auth login` from terminal | ❌ fails | ✅ works |
| Workflow in CI/CD (no TTY) | ❌ fails (correct) | ❌ fails (correct) |
| `ATMOS_FORCE_TTY=true` set manually | ❌ ignored | ✅ respected |
| Cached SSO token exists | ✅ works | ✅ works |

## Tests

- `TestIsInteractive_ForceTTY` — verifies `force-tty` viper key enables interactive mode
- `TestEnvKeyIsSet` — table-driven tests for the `envKeyIsSet` helper (6 cases)
- `TestExecuteShellCommand` — two new subtests:
  - `ATMOS_FORCE_TTY preserved when explicitly set in step env` — user-set value is never overridden
  - `ATMOS_FORCE_TTY not auto-injected in non-TTY environment` — CI/pipe environments stay non-interactive

## References

- Root cause: `MaskWriter` wraps `os.Stderr` as a pipe in `ExecuteShellCommand`, breaking `term.IsTerminal()` in subprocess even when the parent process has a real TTY

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

* **Bug Fixes**
  * More reliable TTY detection and propagation so subprocesses (e.g., interactive auth flows) behave correctly across CI, non‑TTY, and parent‑TTY environments.
  * Prevents unintended overrides when a TTY-related environment flag is explicitly set.

* **Tests**
  * Expanded coverage validating TTY behavior and environment‑flag handling across multiple scenarios.

* **Chores**
  * Updated displayed version list content in CLI output snapshots.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
</details>

<details>
  <summary>fix(security): prevent SSRF in GitHub OIDC token URL handling (CWE-918) @[copilot-swe-agent[bot]](https://github.com/apps/copilot-swe-agent) (#2106)</summary>
- [x] Identify SSRF vulnerability in GitHub OIDC URL handling (CWE-918, CodeQL alert #5164)
- [x] Fix SSRF in `pkg/auth/providers/github/oidc.go` - add `validateGitHubActionsURL` helper, injectable HTTP client
- [x] Fix SSRF in `pkg/auth/providers/azure/oidc.go` - add scheme + hostname validation in `fetchGitHubActionsToken`
- [x] Fix SSRF in `pkg/pro/api_client.go` - add URL validation, proper URL query manipulation, variadic client injection
- [x] Update `pkg/auth/providers/github/oidc_test.go` - use TLS server, inject client
- [x] Update `pkg/auth/providers/azure/oidc_test.go` - use TLS server, inject client
- [x] Update `pkg/pro/api_client_test.go` - use TLS server, inject client via `oidcHTTPClientOverride`
- [x] Update `pkg/pro/api_client_get_github_oidc_token_test.go` - use TLS server, https URL for network test
- [x] Add `TestValidateGitHubActionsURL` unit tests
- [x] Add `TestGetGitHubOIDCToken_URLValidation` unit tests
- [x] Apply gofmt formatting fixes
- [x] Add `TestOIDCProvider_FetchGitHubActionsToken_URLValidation` (azure) - cover http scheme and empty host error paths
- [x] Add `TestOIDCProvider_getHTTPClient` (github) - cover both branches of getHTTPClient
- [x] Add `TestOIDCProvider_Authenticate_URLValidation` (github) - cover validateGitHubActionsURL error path in requestParams
- [x] All tests pass

<!-- START COPILOT CODING AGENT TIPS -->
---

💬 We'd love your input! Share your thoughts on Copilot coding agent in our [2 minute survey](https://gh.io/copilot-coding-agent-survey).

</details>

<details>
  <summary>Add retry field to workflow step schema @utafrali (#2114)</summary>
Adds `retry` field support to workflow steps with a new `workflow_retry` schema defining retry behavior (max attempts, backoff strategy, delay config). Fixes #2112.

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

**New Features**
- Workflow steps now support optional retry configuration with control over maximum attempts, initial and maximum delays, backoff strategies (constant, linear, exponential), delay multiplier, random jitter, and total elapsed time limits.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
</details>

<details>
  <summary>fix(auth): auto-detect GitHub Actions WIF with proper audience, host validation, and lazy GSM init @shirkevich (#2109)</summary>

## What

Fix GCP Workload Identity Federation (WIF) for GitHub Actions CI environments by auto-detecting the OIDC token source, constructing the correct audience, and validating token URL hosts.

### Core fixes (GitHub Actions WIF)
- **Auto-detect GitHub Actions OIDC** when `token_source` is not explicitly configured — detects `ACTIONS_ID_TOKEN_REQUEST_URL` and defaults to URL-based token fetch.
- **Auto-construct WIF audience** from `project_number`, `workload_identity_pool_id`, and `workload_identity_provider_id` so the OIDC token's `aud` claim matches what GCP STS expects. Without this, GitHub defaults to the repo owner URL (e.g., `https://github.com/org`), which STS rejects.
- **Validate env-sourced OIDC URLs** against a GitHub Actions host whitelist (`*.actions.githubusercontent.com`) instead of skipping host validation entirely.
- **Fail fast on empty audience** when running in GitHub Actions with incomplete WIF config, instead of hitting a cryptic STS audience mismatch error downstream.

### Supporting fixes
- **Defer GSM client creation** to first use via `ensureClient()` instead of eagerly creating it in the constructor — solves the chicken-and-egg problem where store init during config loading happens before auth sets credentials.
- **Treat ADC `client_id`/`client_secret` as an atomic pair** — custom ID without matching secret falls back to the full default pair to prevent `invalid_client` errors.
- **Drop unused error return** from `resolveADCClientCredentials()` (always returns a valid pair).
- **Add `gitleaks:allow`** marker to public gcloud OAuth secret.
- **Add `//nolint:forbidigo`** to `os.Getenv` calls in auth code that runs before Viper/flags are wired.
- **Add missing docstrings and `perf.Track`** calls per project conventions.
- **Make tests hermetic** — replace external GitHub URL dependency with `httptest` servers; remove redundant `os.Unsetenv` calls.

## Why

GCP WIF authentication was failing in GitHub Actions CI with three sequential errors:
1. `token_source not configured` — nil token source with no auto-detection
2. `token URL host not allowed` — GitHub Actions uses dynamic subdomains
3. `audience mismatch` — OIDC `aud` claim defaulted to repo owner URL instead of WIF provider resource name

## References
- Tested in CI via pre-release binaries on `shirkevich/atmos`
</details>

<details>
  <summary>fix: resolve !include failure in describe affected for base-ref stacks @aknysh (#2100)</summary>

## what

- Fix `!include` YAML function failing when `atmos describe affected` processes base-ref stacks
- Save and restore `BasePath` and `BasePathAbsolute` in `executeDescribeAffected()` when switching context to the remote repo
- Update `findLocalFile` to prefer `BasePathAbsolute` over `BasePath` for reliable path resolution
- Add comprehensive unit and integration tests for `!include` with `describe affected`
- Expand `!include` integration test coverage for `!include.raw`, `.txt`, `.tf`, extensionless files, and advanced YQ expressions

## why

- `atmos describe affected` compares HEAD and BASE stacks by temporarily pointing `atmosConfig` paths to the base-ref checkout. It saved/restored 5 path fields but missed `BasePath` and `BasePathAbsolute`, causing `!include` to fail when resolving files relative to the base path in the remote repo
- `findLocalFile` used `BasePath` (which can be a relative path like `"./"`) instead of `BasePathAbsolute`, causing resolution failures when the CWD differs from the repo root
- The first `!include` in a file could succeed (file exists in CWD) while the second fails (file only exists in the PR branch), making the bug appear intermittent

## references

- Closes #2090
- Fix documentation: `docs/fixes/2026-02-22-describe-affected-include-basepath.md`

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

* **Bug Fixes**
  * Fixes include resolution during base-ref comparisons so manifest includes resolve correctly across HEAD and BASE, avoiding missing-file errors and incorrect path lookups.

* **Tests**
  * Adds extensive unit and integration tests covering base-path handling, multi-include sequences, raw includes, cross-repo/base-ref scenarios, and extended include cases.

* **Documentation**
  * Clarifies ordered resolution for !include and !include.raw with examples and guidance on manifest-relative vs base_path-relative paths.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
</details>

<details>
  <summary>fix(security): replace math/rand with math/rand/v2 to resolve CWE-338 @[copilot-swe-agent[bot]](https://github.com/apps/copilot-swe-agent) (#2105)</summary>
CodeQL alert 5157 flags `math/rand.New(rand.NewSource(time.Now().UnixNano()))` in `pkg/retry/retry.go` as CWE-338 (Weak PRNG) — the seed is fully predictable from system time.

## Changes

- **`pkg/retry/retry.go`**: Replace `math/rand` (time-seeded) with `math/rand/v2` global functions, which are automatically seeded with cryptographically random entropy by the Go 1.22+ runtime
  - Drop `rand *rand.Rand` field from `Executor` struct
  - Remove explicit `rand.New(rand.NewSource(time.Now().UnixNano()))` initialization
  - Call `rand.Float64()` directly (global, auto-seeded) instead of via `e.rand`

```go
// Before
type Executor struct {
    config schema.RetryConfig
    rand   *rand.Rand  // seeded with time.Now().UnixNano()
}

// After
type Executor struct {
    config schema.RetryConfig
    // rand.Float64() from math/rand/v2 — globally auto-seeded
}

Jitter behavior is unchanged; only the entropy source improves.

💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.

Fix terraform shell with JIT source vendoring + workdir @[copilot-swe-agent[bot]](https://github.com/apps/copilot-swe-agent) (#2111)

Test code in `pkg/provisioner/workdir/workdir_test.go` used spaces instead of tabs for indentation, bypassing precommit formatting checks.

Changes

Applied gofumpt formatting to fix indentation in test table structs
Corrected tab/space inconsistencies throughout the test file

The formatting issues affected the TestHasSource and related test functions where table-driven tests were using space indentation instead of the Go standard tab indentation.

Original prompt

This section details on the original issue you should resolve
<issue_title>atmos terraform shell does not work with JIT Vendoring</issue_title>
<issue_description>### Describe the Bug
I updated to Atmos 1.207.0
Getting this error when using JIT vendoring, which I think was partially worked on here? #2054
Extra context

this same stack works with an atmos terraform apply...
I've tried this with both a populated .workdir and empty workdir

Expected Behavior

It should drop you into the component after hydration of the backend.tf and varfiles and tf initialization
Steps to Reproduce
# Minimal repro: Atmos config + one stack using a local backend.

WORKDIR="$(mktemp -d -t atmos-repro-XXXXXX)"
echo "Working in: ${WORKDIR}"
cd "${WORKDIR}"

# atmos.yaml
cat <<'EOF' > atmos.yaml
base_path: "."

components:
  terraform:
    base_path: "components/terraform" # Path to your Terraform components
    command: "tofu" # Explicitly setting OpenTofu binary
    workspaces_enabled: true
    apply_auto_approve: false
    deploy_run_init: true
    init_run_reconfigure: true
    # Enable automatic generation of backend.tf files
    auto_generate_backend_file: true

stacks:
  name_template: "{{ .vars.name }}"
  base_path: "stacks"
  included_paths:
    - "**/*"
EOF

# stacks/demo.yaml (includes local backend)
mkdir -p stacks
cat <<'EOF' > stacks/demo.yaml
vars:
  name: demo

terraform:
  backend_type: local

components:
  terraform:
    demo:
      source:
        uri: "git::github.com/cloudposse/terraform-aws-ecs-codepipeline"
        version: "main"
      provision:
        workdir:
          enabled: true
EOF


echo
echo "== discovered stacks =="
atmos describe stacks

echo
echo "== shell (drops into the component workdir) =="
echo "this will fail"
atmos terraform shell demo -s demo

echo
echo "== generate varfile =="
atmos terraform generate varfile demo -s demo



echo "Done. Workspace preserved at: ${WORKDIR}"
echo "To clean up, run: rm -rf ${WORKDIR}"
Error
atmos terraform shell demo -s demo
ℹ Provisioning workdir for component 'demo'

   Error 

   Error: provisioner failed: provisioner failed: workdir provisioning failed
  
  ## Explanation

   provisioner execution failed before terraform shell
  
  ## Hints

   💡 Check that the component exists at components/terraform/demo
Screenshots

No response
Environment

Atmos version 1.207.0
Additional Context

No response</issue_description>
<agent_instructions>Please confirm if we handle source provisioning with atmos terraform shell together with workdir as well.</agent_instructions>
Comments on the Issue (you are @copilot in this section)
@osterman Aha, we probably did not test this. Will kick off an investigation.

Fixes #2110

✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

fix(toolchain): resolve install failures for checkov, trivy, yq, and jq @osterman (#2107)

Summary

Fix 5 bugs in Aqua registry parsing and installer download logic that caused atmos toolchain install to fail for checkov, trivy, yq, and jq (only terraform worked)
Add regression tests using real-world Aqua registry YAML for all 4 failing tools
Rename findToolInLegacyRegistries → findToolInBuiltinRegistries

Root Causes

versionOverride.Files missing Src field — checkov needs src: dist/checkov to locate the binary inside the archive. The anonymous struct only had Name.
findMatchingOverride not stripping version prefix before semver — jq uses version_prefix: "jq-", so semver.NewVersion("jq-1.8.1") failed, silently skipping all semver constraints.
buildAssetTemplateData ignoring tool.Replacements — OS/Arch values like amd64 weren't mapped to tool-specific values (X86_64 for checkov, 64bit for trivy, macos for jq).
tryFallbackVersion hardcoded "v" prefix — jq fallback produced "vjq-1.8.1" instead of stripping "jq-" prefix.
versionOverride struct missing Aqua fields — rosetta2, windows_arm_emulation, no_asset, supported_envs, checksum, version_prefix were silently dropped during YAML parsing.

Summary by CodeRabbit

New Features
- Per-tool version prefixes, richer registry overrides (per-env, per-OS format), Rosetta2 and Windows ARM emulation fallbacks, improved asset/binary resolution, two-pass asset template rendering, and optional GitHub-tag based version lookup.
Bug Fixes
- More robust version-fallback and download fallback behavior; platform-matching and env-based overrides handle prefixed versions and emulation cases correctly.
Tests
- Extensive new/regression tests for version fallback, constraint evaluation, templating, platform/env overrides, caching, and file-extension handling.
Docs/UX
- Improved GitHub rate-limit error messaging with actionable hints.

fix(vendor): show descriptive explanation instead of bare count in error @osterman (#2103)

what

Vendor pull failures now show descriptive error explanations listing failed component names instead of just a bare integer count
When 1 of 7 components fails, the error now shows "Failed to vendor 1 of 7 components: my-vpc" in the explanation section instead of just "1"
Added tracking of failed package names in the model and proper error builder usage with WithExplanation()

why

Error messages showed "## Explanation\n1" which was confusing and unhelpful. Users couldn't tell which component failed or how many total components were being vendored. The fix uses the error builder pattern to provide meaningful context about what went wrong.

references

Implements error handling best practices from Atmos error builder pattern
Adds regression tests to prevent future occurrences of bare integer explanations

Summary by CodeRabbit

Bug Fixes
- Vendoring failure messages now include a clear explanation and an explicit list/count of which components failed for easier troubleshooting.
- No changes to public APIs or other user-facing behavior beyond improved messaging.
Tests
- Added tests to ensure error formatting and failed-component tracking behave as expected.

fix: YAML functions fail with templated args when custom delimiters include quotes @aknysh (#2098)

what

Fixed a bug where all YAML functions (!terraform.state, !terraform.output, !store, !store.get, !env, !exec, !include, !include.raw, !random, etc.) fail with yaml: line NNN: did not find expected key when custom template delimiters contain single-quote characters (e.g., ["'{{", "}}'"])
Added ConvertToYAMLPreservingDelimiters function that detects delimiter/YAML quoting conflicts and forces double-quoted YAML style for affected scalar values, preserving delimiter characters literally in the serialized output
Updated all 5 template processing call sites (internal/exec/utils.go, internal/exec/describe_stacks.go (3 places), internal/exec/terraform_generate_varfiles.go, internal/exec/terraform_generate_backends.go) to use the new function

why

When custom delimiters include single quotes, the yaml.v3 encoder's single-quoted style escapes internal ' as ''. For example, !terraform.state vpc '{{ .stack }}' vpc_id becomes '!terraform.state vpc ''{{ .stack }}'' vpc_id'
The Go template engine with delimiters '{{ and }}' then finds '{{ within the ''{{ sequence and performs template replacement, producing '!terraform.state vpc 'nonprod' vpc_id' — which is invalid YAML because unescaped single quotes break the single-quoted string
The fix uses double-quoted YAML style instead, which does not escape single quotes: "!terraform.state vpc '{{ .stack }}' vpc_id". After template replacement: "!terraform.state vpc nonprod vpc_id" — valid YAML
The fix is generic at the YAML serialization level (any scalar containing ' gets double-quoted when delimiters contain '), so it automatically protects all current and future YAML functions
Static arguments and default delimiters ({{/}}) are unaffected — the function falls back to standard encoding when there is no conflict

Tests

Unit tests (pkg/utils/yaml_utils_delimiter_test.go) — 63 subtests:

TestDelimiterConflictsWithYAMLQuoting — 8 subtests for delimiter conflict detection (nil, empty, single-element, default, custom with quotes)
TestEnsureDoubleQuotedForDelimiterSafety — 6 subtests for yaml.Node style modification (scalars, mappings, sequences, document nodes, nil safety)
TestConvertToYAMLPreservingDelimiters — 10 subtests for end-to-end serialization (round-trip value preservation, template replacement producing valid YAML, standard encoding breaking with custom delimiters, nested maps, lists, custom indent, empty data)
TestAllYAMLFunctionsPreservedWithCustomDelimiters — 12 subtests verifying every YAML function prefix (!terraform.state, !terraform.output, !store, !store.get, !env, !exec, !template, !include, !include.raw, !repo-root, !cwd, !random)
TestAllYAMLFunctionsTemplateReplacementWithCustomDelimiters — 9 subtests simulating the full pipeline (serialize → template replace → parse) for each function
TestStandardEncodingBreaksAllYAMLFunctionsWithCustomDelimiters — 18 subtests proving standard encoding breaks AND delimiter-safe encoding works for each of the 9 affected functions

Integration tests (tests/yaml_functions_custom_delimiters_test.go) — 4 subtests:

TestTerraformStateWithCustomDelimiters — Regular templates with custom delimiters, !terraform.state with static args, !terraform.state with templated stack arg (core issue reproduction)
TestCustomDelimitersTemplateProcessing — Settings template resolution with custom delimiters

Test fixture (tests/fixtures/scenarios/atmos-terraform-state-custom-delimiters/):

atmos.yaml with custom delimiters ["'{{", "}}'"]
Stack file with components using both template expressions and !terraform.state with templated args
Mock Terraform component with local state file

references

closes #2052

Summary by CodeRabbit

Bug Fixes
- Fixed YAML function failures for templated arguments when custom delimiters contain quotes (resolves terraform.state-related templating issues).
Documentation
- Added comprehensive docs describing the issue, root cause, and delimiter-safe YAML behavior.
Tests
- Added extensive unit/integration tests and fixtures covering custom-delimiter templates, YAML quoting, and terraform state/output scenarios.
Chores
- Updated CLI snapshot data affecting the displayed available versions list.

cloudposse/atmos v1.208.0 on GitHub

what

why

references

Summary by CodeRabbit

what

Skills (21 total, 1 plugin)

Claude Code Plugin Marketplace

Other AI Tools

Open Standards

why

references

what

why

references

Summary by CodeRabbit

Release Notes

what

why

Summary by CodeRabbit

what

why

Summary by CodeRabbit

what

why

references

Configuration example

Example

Files changed

Summary by CodeRabbit

Release Notes

🚀 Enhancements

Changes

Example

Expected Behavior

Steps to Reproduce

Example

Test Coverage

Coverage Metrics

Tests Verify

Expected Behavior

Steps to Reproduce

Screenshots

Environment

Additional Context

Problem

Solution

Changes

Testing

Impact

Expected Behavior

Use Case

Describe Ideal Solution

Investigation details

Root cause analysis

Test 1: Naive parallel apply (no isolation)

Test 2: TF_DATA_DIR isolation

Failure mode A: provider checksum mismatch (4 failures)

Failure mode B: corrupt provider binary (1 failure)

Test 3: TF_DATA_DIR + TF_PLUGIN_CACHE_DIR + pre-init (working workaround)

Changes

Expected Behavior

Steps to Reproduce

Error

Screenshots

Environment

Additional Context

Comments on the Issue (you are @copilot in this section)

Summary

Root Causes

Summary by CodeRabbit

what

why

references

Summary by CodeRabbit

what

why

Tests

references

Summary by CodeRabbit

cloudposse/atmos v1.208.0
on GitHub

Test 2: `TF_DATA_DIR` isolation

Test 3: `TF_DATA_DIR` + `TF_PLUGIN_CACHE_DIR` + pre-init (working workaround)