github cloudposse/atmos v1.212.0

7 hours ago
feat: add required field to identities for multi-account Terraform components @osterman (#2180)

what

  • Add required: true boolean field to identity config for automatic authentication without prompting
  • required and default are orthogonal: default: true sets the primary identity, required: true means auto-authenticate
  • Multiple identities can be required: true — all are authenticated before Terraform runs
  • Required non-default identities are authenticated as secondary (profiles written to shared credentials file)
  • Non-primary identity failures are non-fatal (logged as warnings)
  • Precedence for primary: --identity CLI flag > default: true identity > error
  • Align embedded schema with structured auth definitions (component_auth, auth_providers, auth_identity, etc.)
  • Fix patternProperties regex to allow slash-delimited identity names (e.g., core/network)

why

When Terraform components use multiple AWS provider aliases for multi-account patterns (hub-spoke networking, cross-account peering), each provider assumes a different IAM role. In CI with OIDC, only the primary identity's profile existed in the shared credentials file, causing "failed to get shared config profile" errors for additional provider aliases.

Marking identities as required: true ensures they are automatically authenticated — no prompting, no selection. This is cleaner than the previous auth.needs array approach because the declaration lives on the identity itself rather than in a disconnected list.

example

auth:
  identities:
    core-network:
      kind: aws/assume-role
      default: true       # Primary identity (sets AWS_PROFILE)
      required: true      # Auto-authenticate without prompting
    plat-prod:
      kind: aws/assume-role
      required: true      # Auto-authenticate as secondary
    plat-staging:
      kind: aws/assume-role
      required: true      # Auto-authenticate as secondary
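
The selection rules described above can be sketched in a few lines of Python. This is an illustrative model only, not the Atmos implementation (which is written in Go). The `Identity` class, `plan_authentication`, `authenticate_all`, and the `authenticate` callback are hypothetical names. Two behaviors are assumptions the notes do not specify: the first `default: true` identity wins if several are marked, and a required default identity is treated as a secondary when the CLI flag selects a different primary.

```python
from dataclasses import dataclass


@dataclass
class Identity:
    name: str
    default: bool = False   # marks the primary identity
    required: bool = False  # auto-authenticate without prompting


def plan_authentication(identities, cli_identity=None):
    """Return (primary_name, secondary_names) following the stated precedence."""
    by_name = {i.name: i for i in identities}

    # Precedence for primary: --identity CLI flag > default: true > error.
    if cli_identity is not None:
        if cli_identity not in by_name:
            raise ValueError(f"unknown identity: {cli_identity}")
        primary = cli_identity
    else:
        defaults = [i.name for i in identities if i.default]
        if not defaults:
            raise ValueError("no --identity flag and no default identity")
        primary = defaults[0]  # assumption: first default wins

    # Every other required identity is authenticated as a secondary.
    secondaries = [i.name for i in identities if i.required and i.name != primary]
    return primary, secondaries


def authenticate_all(identities, authenticate, cli_identity=None):
    """Authenticate the primary (fatal on failure), then secondaries (warn only)."""
    primary, secondaries = plan_authentication(identities, cli_identity)
    authenticate(primary)  # a failure here propagates to the caller
    warnings = []
    for name in secondaries:
        try:
            authenticate(name)
        except Exception as exc:  # non-primary failures are non-fatal
            warnings.append(f"warning: {name}: {exc}")
    return primary, warnings
```

With the three identities from the example, `plan_authentication` selects core-network as primary and plat-prod and plat-staging as secondaries. A failure while authenticating plat-staging becomes a warning rather than an error.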

references

  • Design discussion on concurrent identity authentication
  • Refactored from auth.needs array to per-identity required boolean

Summary by CodeRabbit

  • New Features

    • Mark identities as required for automatic, non-interactive authentication; default identity remains primary and CLI identity flag still takes precedence
    • Authenticate required non-primary identities as secondary (failures logged as warnings, non-fatal)
  • Configuration

    • Added component-level auth schema and new auth properties (needs, realm, integrations)
    • Provider/identity names may include slashes
  • Documentation

    • New guide and blog post explaining required identities and multi-account workflows
  • Tests

    • Added schema and unit tests covering auth and required-identity behavior

fix: resolve deep merge type-override regression and optimize describe affected --upload (250x speedup) @aknysh (#2248)

what

Two critical fixes addressing regressions and performance issues:

1. Deep merge type-mismatch regression

  • Remove overly strict type-check guards in deepMergeNative that rejected stack configs where a list is overridden with {} (empty map), a scalar, or null. Overriding a list with {} is technically a misconfiguration (the correct value is []), but the pattern exists in production configs and worked with the previous mergo-based merge
  • The native merge (PR #2201) added type-check guards that were stricter than the previous mergo-based merge, which allowed any type to override any other type at the same key. The native merge rejected some of these overrides (specifically list→map and list→scalar), breaking configs that relied on the previous behavior
  • Example of the misconfiguration that broke: allow_ingress_from_vpc_accounts: {} overriding a list of maps. The correct value is allow_ingress_from_vpc_accounts: [], but the {} pattern worked before and must not break in a patch release
  • This fix preserves backward compatibility; a future release may add warnings for type-mismatched overrides to guide users toward correct patterns
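
The restored behavior ("any type may override any other type at the same key") can be illustrated with a minimal Python sketch. It is not the deepMergeNative code: `deep_merge` is a hypothetical helper, it ignores Atmos's list merge strategies (a list overriding a list simply replaces it here), and the sample values are made up for illustration.

```python
import copy


def deep_merge(base, override):
    """Merge override into base; maps merge recursively, anything else replaces."""
    if isinstance(base, dict) and isinstance(override, dict):
        merged = {key: copy.deepcopy(value) for key, value in base.items()}
        for key, value in override.items():
            if key in base:
                merged[key] = deep_merge(base[key], value)
            else:
                merged[key] = copy.deepcopy(value)
        return merged
    # Type mismatch (list -> map, list -> scalar, list -> None) or non-map values:
    # the override wins, with no type check.
    return copy.deepcopy(override)


inherited = {"vars": {"allow_ingress_from_vpc_accounts": [{"tenant": "core"}], "name": "vpc"}}
stack_override = {"vars": {"allow_ingress_from_vpc_accounts": {}}}
result = deep_merge(inherited, stack_override)
```

Here `result["vars"]["allow_ingress_from_vpc_accounts"]` is the empty map from the override, and the sibling key `name` is preserved. A stricter merge would instead raise an error at the list→map mismatch.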

2. describe affected --upload timeout on large infrastructures (~250x speedup)

  • --upload forces --include-dependents, which called ExecuteDescribeDependents for every affected component — each call did a full ExecuteDescribeStacks resolution from scratch with no caching
  • For large infrastructures with ~2,400 affected components, this resulted in ~2,400 full stack resolutions. At ~1s each that is 40+ minutes, and in practice the command never completed
  • Applied three incremental optimizations:
    • Cache ExecuteDescribeStacks result: called once instead of N times (40+ min → ~3.5 min)
    • Cache component lookup: extract component sections from cached stacks instead of calling ExecuteDescribeComponent per item (~3.5 min → ~1:54)
    • Pre-built reverse dependency index: build index once from stacks data, then O(1) lookup per component instead of O(stacks × components) scan (~1:54 → ~10s)
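
The third optimization, a reverse dependency index, can be sketched as follows. The function names echo those in the coverage report (findDependentsByScan, buildDependencyIndex, findDependentsFromIndex), but this Python code is only an illustration of the technique. The `stacks` layout and the `depends_on` key holding `(stack, component)` tuples are assumptions made for the example, not the Atmos data model.

```python
from collections import defaultdict


def find_dependents_by_scan(stacks, target):
    """Baseline: scan every component in every stack for each lookup."""
    return sorted(
        (stack, name)
        for stack, components in stacks.items()
        for name, section in components.items()
        if target in section.get("depends_on", [])
    )


def build_dependency_index(stacks):
    """Build the reverse map once: dependency -> components that depend on it."""
    index = defaultdict(list)
    for stack, components in stacks.items():
        for name, section in components.items():
            for dependency in section.get("depends_on", []):
                index[dependency].append((stack, name))
    return index


def find_dependents_from_index(index, target):
    """Dictionary lookup per component instead of a full scan."""
    return sorted(index.get(target, []))


# Hypothetical stacks data: two stacks, with dependencies on the prod VPC.
stacks = {
    "plat-prod": {"vpc": {}, "eks": {"depends_on": [("plat-prod", "vpc")]}},
    "plat-staging": {
        "vpc": {},
        "rds": {"depends_on": [("plat-staging", "vpc"), ("plat-prod", "vpc")]},
    },
}
index = build_dependency_index(stacks)
```

Both lookups return the same dependents. The scan repeats the walk over all stacks for every affected component, while the index pays that cost once.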

why

Deep merge regression

  • PR #2201 (native deep merge, 3.5x faster) introduced type-mismatch guards that are too strict for real-world Atmos configurations
  • Some production configs override inherited lists with {} (empty map) instead of [] (empty list) — while this is a misconfiguration, it worked with the previous mergo-based merge and must not break in a patch release
  • Stack-processing commands (list stacks, describe stacks, etc.) fail on affected configs

describe affected --upload timeout

  • The --upload flag is used by Atmos Pro integration to upload affected stacks
  • Large infrastructures with many components across many stacks generate thousands of affected items
  • The O(N × full_stack_resolution) cost made the command unusable, blocking CI/CD pipelines
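
A rough cost model shows why resolving stacks once matters. In this Python sketch, `CountingResolver` is a hypothetical stand-in for an expensive full stack resolution, and the ~1s per resolution figure is taken from the notes above. The model counts only resolutions, so it does not account for the remaining time reported after caching.

```python
class CountingResolver:
    """Stand-in for an expensive full stack resolution that counts its calls."""

    def __init__(self, stacks):
        self.stacks = stacks
        self.calls = 0

    def __call__(self):
        self.calls += 1
        return self.stacks


def dependents_uncached(affected, resolve):
    # Re-resolves all stacks for every affected component: N resolutions.
    return {name: sorted(resolve().get(name, [])) for name in affected}


def dependents_cached(affected, resolve):
    stacks = resolve()  # resolved once, reused for every component
    return {name: sorted(stacks.get(name, [])) for name in affected}


def estimated_minutes(resolutions, seconds_each=1.0):
    """Time spent in stack resolution alone, in minutes."""
    return resolutions * seconds_each / 60


affected = [f"component-{i}" for i in range(2400)]
uncached = CountingResolver({})
cached = CountingResolver({})
slow = dependents_uncached(affected, uncached)
fast = dependents_cached(affected, cached)
```

With 2,400 affected components the uncached path performs 2,400 resolutions (about 40 minutes at ~1s each), while the cached path performs one.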

Test results

Deep merge type-override

Test | Result
11 new type-override unit tests (list→map, list→scalar, list→nil, list→bool, nested, with slice flags) | All pass
Minimal stack fixture (merge-type-override) with 3 override patterns | atmos list stacks succeeds
All existing merge tests (updated 6 tests from error→success expectations) | All pass
Merge package coverage | 92.5% overall, 100% on merge_native.go functions

describe affected --upload optimization

Metric | Before | After
describe affected (no dependents) | ~7s | ~7s (unchanged)
describe affected --include-dependents | 40+ min (never completes) | ~10s
Payload size (with dependents) | N/A | ~1.2 MB

Test | Result
All 30+ existing affected/dependent tests | All pass
8 new dependency index tests (build, lookup, self-reference, abstract skip, multi-stack, helmfile, edge cases) | All pass
Changed function coverage: all above 80% (findDependentsByScan 95.3%, findDependentsFromIndex 100%, buildDependencyIndex 81.1%) | Above threshold

references

  • PR #2201: perf: replace mergo with native deep merge (introduced the type-mismatch regression)
  • docs/fixes/2026-03-24-deep-merge-type-mismatch-regression.md: detailed analysis of the merge regression
  • docs/fixes/2026-03-24-describe-affected-upload-timeout.md: detailed analysis with incremental timing breakdown

Summary by CodeRabbit

  • Bug Fixes

    • Restored deep-merge behavior so stack overrides can replace list-typed values with maps, scalars or nulls.
    • Fixed describe-affected --upload hang/timeout by caching stack resolution and using an indexed dependent lookup.
  • Documentation

    • Added detailed pages describing both regressions, root causes, and verification steps.
  • Tests

    • Added unit and end-to-end tests covering cross-type merge overrides and dependency-index dependent resolution.

fix: describe affected crashes on deleted components when dependents enabled @milldr (#2237)

What

Skip dependent resolution for deleted components in atmos describe affected. When --include-dependents or --upload is used, deleted components are now skipped during dependent resolution instead of crashing with "invalid component".

One-line fix: Added a guard in addDependentsToAffected to skip components marked Deleted: true, giving them empty Dependents slices.
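
The shape of that guard can be sketched in Python. This is an illustration, not the Go code in addDependentsToAffected: the dictionary keys (`deleted`, `dependents`, `component`, `stack`) and the `resolve_dependents` callback are hypothetical.

```python
def add_dependents_to_affected(affected, resolve_dependents):
    """Fill in 'dependents' for each affected item, skipping deleted components."""
    for item in affected:
        if item.get("deleted"):
            # Deleted components do not exist in HEAD, so resolving them would fail.
            item["dependents"] = []
            continue
        item["dependents"] = resolve_dependents(item["component"], item["stack"])
    return affected


def resolve_dependents(component, stack):
    """Stand-in resolver that fails for components missing from HEAD."""
    if component == "old-bucket":
        raise LookupError("invalid component")
    return [f"{stack}/app"]


affected = add_dependents_to_affected(
    [
        {"component": "vpc", "stack": "plat-prod"},
        {"component": "old-bucket", "stack": "plat-prod", "deleted": True},
    ],
    resolve_dependents,
)
```

The deleted item ends up with an empty dependents list and the resolver is never called for it, so the "invalid component" error cannot occur on that path.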

Why

When a PR removes a component or stack, atmos describe affected correctly detects it as deleted via detectDeletedComponents. However, when --include-dependents is enabled (also auto-enabled by --upload), the code attempts to resolve dependents for ALL affected items, including deleted ones. Since deleted components don't exist in HEAD, the call chain ExecuteDescribeDependents → ExecuteDescribeComponent → detectComponentType fails with "invalid component".

This blocks Atmos Pro from handling deleted stacks entirely — the CLI crashes before any data reaches the API.

Reproduction:

  1. Create a PR that removes a component import from a stack YAML
  2. Run atmos describe affected --upload
  3. Crash: Error: invalid component — Could not find the component X in the stack Y

Summary by CodeRabbit

  • Bug Fixes

    • Prevented processing of dependents for deleted components, avoiding a prior crash and ensuring deleted items report no dependents.
  • Tests

    • Added test coverage confirming deleted components are marked deleted, have empty dependents, and that dependent-resolution no longer errors.

perf(merge): replace mergo pre-copy loop with reflection-free native deep merge (3.5× faster) @nitrocode (#2201)
  • All prior CR items (v1-v9) resolved in previous sessions
  • TestExecuteMainTerraformCommand_Error_Propagates (item 2 exec test)
  • TestMergeWithOptions_EmptyInputs_ReturnsEmptyMap, StrategyFlags_WireThrough (item 2 merge tests)
  • validate_stacks_test hardening (item 3)
  • Workspace recovery negative-path logging (item 4)
  • compare_mergo header with CrossValidate (item 5)
  • BenchmarkMerge_ProductionScale 10×25 keys + node_groups (item 6 code)
  • Mergo follow-up issue #2242 in blog (item 7 partial)
  • compile-guard, isTerraformCurrentWorkspace comment (item 8 partial)
  • terraform_execute_exit_wrapping_test.go — contract test for ExitCodeError wrapping
  • terraform_execute_single_invocation_test.go — spy counter via _ATMOS_TEST_COUNTER_FILE in testmain
  • testmain_test.go updated to support _ATMOS_TEST_COUNTER_FILE
  • Blog updated: 25 keys + node_groups list-of-map-of-list (item 6 blog)
  • roadmap.js: added Migrate remaining mergo call-sites milestone with issue: 2242 (item 7)

Summary by CodeRabbit

  • New Features

    • ~3.5× faster deep-merge for stack configuration resolution (native implementation)
    • Improved Terraform workspace detection and more tolerant recovery behavior
  • Bug Fixes

    • Fixed slice-merge precedence and eliminated unintended data aliasing/corruption
    • Resolved workspace/state edge cases and clarified recovery logging
  • Tests

    • Strengthened test coverage with runtime gating, negative-path checks, and cross-validation opt‑ins
  • Documentation

    • Added detailed deep-merge blog post and a release notes/fixes page documenting behavior and migration guidance
