feat: add required field to identities for multi-account Terraform components @osterman (#2180)
what
- Add a `required: true` boolean field to identity config for automatic authentication without prompting
- `required` and `default` are orthogonal: `default: true` sets the primary identity, `required: true` means auto-authenticate
- Multiple identities can be `required: true`; all are authenticated before Terraform runs
- Required non-default identities are authenticated as secondary (profiles written to the shared credentials file)
- Non-primary identity failures are non-fatal (logged as warnings)
- Precedence for primary: `--identity` CLI flag > `default: true` identity > error
- Align embedded schema with structured auth definitions (`component_auth`, `auth_providers`, `auth_identity`, etc.)
- Fix `patternProperties` regex to allow slash-delimited identity names (e.g., `core/network`)
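The precedence rules above can be sketched in Go as follows. This is a minimal illustration under simplifying assumptions, not Atmos's implementation: the `Identity` type, `resolveIdentities` function, and `errNoPrimary` error are hypothetical, and the sketch assumes at most one identity sets `default: true`.

```go
package main

import (
	"errors"
	"sort"
)

// Identity is a hypothetical, simplified view of an entry under auth.identities.
type Identity struct {
	Default  bool // default: true marks the primary-identity candidate
	Required bool // required: true means authenticate without prompting
}

// errNoPrimary is a hypothetical error for when no primary identity can be chosen.
var errNoPrimary = errors.New("no primary identity: pass --identity or set default: true")

// resolveIdentities picks the primary identity (CLI flag > default: true > error)
// and returns the other required identities, to be authenticated as secondaries.
// Assumes at most one identity has Default set; the flag value is not validated here.
func resolveIdentities(identities map[string]Identity, flagIdentity string) (string, []string, error) {
	primary := flagIdentity
	if primary == "" {
		for name, id := range identities {
			if id.Default {
				primary = name
				break
			}
		}
	}
	if primary == "" {
		return "", nil, errNoPrimary
	}
	var secondaries []string
	for name, id := range identities {
		if id.Required && name != primary {
			secondaries = append(secondaries, name)
		}
	}
	sort.Strings(secondaries) // map iteration order is random; sort for stable output
	return primary, secondaries, nil
}
```

With the `core-network` / `plat-prod` / `plat-staging` configuration from the example, `resolveIdentities(ids, "")` would return `core-network` as primary and the two `plat-*` identities as secondaries; passing `--identity plat-prod` would demote `core-network` to a secondary because it is still `required: true`.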
why
When Terraform components use multiple AWS provider aliases for multi-account patterns (hub-spoke networking, cross-account peering), each provider assumes a different IAM role. In CI with OIDC, only the primary identity's profile existed in the shared credentials file, causing "failed to get shared config profile" errors for additional provider aliases.
Marking identities as `required: true` ensures they are automatically authenticated, with no prompting and no selection. This is cleaner than the previous `auth.needs` array approach because the declaration lives on the identity itself rather than in a disconnected list.
example
```yaml
auth:
  identities:
    core-network:
      kind: aws/assume-role
      default: true   # Primary identity (sets AWS_PROFILE)
      required: true  # Auto-authenticate without prompting
    plat-prod:
      kind: aws/assume-role
      required: true  # Auto-authenticate as secondary
    plat-staging:
      kind: aws/assume-role
      required: true  # Auto-authenticate as secondary
```

references
- Design discussion on concurrent identity authentication
- Refactored from `auth.needs` array to per-identity `required` boolean
Summary by CodeRabbit
- New Features
  - Mark identities as required for automatic, non-interactive authentication; the default identity remains primary and the CLI identity flag still takes precedence
  - Authenticate required non-primary identities as secondary (failures logged as warnings, non-fatal)
- Configuration
  - Added component-level auth schema and new auth properties (needs, realm, integrations)
  - Provider/identity names may include slashes
- Documentation
  - New guide and blog post explaining required identities and multi-account workflows
- Tests
  - Added schema and unit tests covering auth and required-identity behavior
fix: resolve deep merge type-override regression and optimize describe affected --upload (250x speedup) @aknysh (#2248)
what
Two critical fixes addressing regressions and performance issues:
1. Deep merge type-mismatch regression
- Remove overly strict type-check guards in `deepMergeNative` that rejected stack configs where lists are overridden with `{}` (empty map), scalars, or null. While overriding a list with `{}` is technically a misconfiguration (the correct way is `[]`), this pattern exists in production configs and worked with the previous mergo-based merge
- The native merge (PR #2201) added type-check guards that were stricter than the previous mergo-based merge. The previous merge allowed any type to override any other type at the same key. The native merge rejected some of these overrides (specifically list→map and list→scalar), breaking configs that relied on the previous behavior
- Example of the misconfiguration that was broken: `allow_ingress_from_vpc_accounts: {}` overriding a list of maps. The correct way would be `allow_ingress_from_vpc_accounts: []`, but the `{}` pattern worked before and must not break in a patch release
- This fix preserves backward compatibility to prevent regressions; a future release may add warnings for type-mismatched overrides to guide users toward correct patterns
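The restored "any type may override any type" rule can be illustrated with a minimal Go sketch. The `deepMerge` function below is hypothetical and is not the code of `deepMergeNative`: it recurses only when both sides at a key are maps, replaces everything else outright, and does not model Atmos's list-merge strategy flags. Values from `base` are copied shallowly.

```go
package main

// deepMerge is a hypothetical sketch of a permissive deep merge. When both
// values at a key are maps it recurses; otherwise the override replaces the
// base value regardless of type (list -> map, list -> scalar, list -> nil).
// Lists are replaced wholesale; list-merge strategies are not modeled here.
func deepMerge(base, override map[string]any) map[string]any {
	out := make(map[string]any, len(base)+len(override))
	for k, v := range base {
		out[k] = v // shallow copy of base entries
	}
	for k, ov := range override {
		bm, baseIsMap := out[k].(map[string]any)
		om, overIsMap := ov.(map[string]any)
		if baseIsMap && overIsMap {
			out[k] = deepMerge(bm, om)
			continue
		}
		// No type-check guard: any type may override any other type.
		out[k] = ov
	}
	return out
}
```

Under this rule, overriding `allow_ingress_from_vpc_accounts` (a list of maps) with `{}` simply yields an empty map instead of an error, which matches the pre-#2201 behavior described above.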
2. describe affected --upload timeout on large infrastructures (~250x speedup)
- `--upload` forces `--include-dependents`, which called `ExecuteDescribeDependents` for every affected component; each call did a full `ExecuteDescribeStacks` resolution from scratch with no caching
- For large infrastructures with ~2,400 affected components, this resulted in ~2,400 full stack resolutions (~1s each = 40+ minutes, never completing)
- Applied three incremental optimizations:
  - Cache `ExecuteDescribeStacks` result: called once instead of N times (40+ min → ~3.5 min)
  - Cache component lookup: extract component sections from cached stacks instead of calling `ExecuteDescribeComponent` per item (~3.5 min → ~1:54)
  - Pre-built reverse dependency index: build the index once from stacks data, then O(1) lookup per component instead of an O(stacks × components) scan (~1:54 → ~10s)
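The third optimization (build a reverse index once, then look up dependents per component) can be sketched in Go as below. The types and functions here (`ComponentRef`, `ComponentConfig`, `buildReverseIndex`, `dependentsOf`) are hypothetical simplifications, not the PR's `buildDependencyIndex` / `findDependentsFromIndex`; the sketch assumes dependencies have already been extracted from the cached stacks into a flat `DependsOn` list.

```go
package main

import "sort"

// ComponentRef is a hypothetical identifier for a component instance in a stack.
type ComponentRef struct {
	Stack     string
	Component string
}

// ComponentConfig is a hypothetical, simplified component section.
type ComponentConfig struct {
	Abstract  bool           // abstract components are never deployed, so they are skipped
	DependsOn []ComponentRef // components this instance depends on
}

// buildReverseIndex walks the (already resolved and cached) stacks once and
// maps each dependency to the component instances that depend on it.
func buildReverseIndex(stacks map[string]map[string]ComponentConfig) map[ComponentRef][]ComponentRef {
	index := make(map[ComponentRef][]ComponentRef)
	for stack, components := range stacks {
		for name, cfg := range components {
			if cfg.Abstract {
				continue
			}
			self := ComponentRef{Stack: stack, Component: name}
			for _, dep := range cfg.DependsOn {
				if dep == self {
					continue // ignore self-references
				}
				index[dep] = append(index[dep], self)
			}
		}
	}
	// Sort each dependents list in place for deterministic output.
	for _, refs := range index {
		sort.Slice(refs, func(i, j int) bool {
			if refs[i].Stack != refs[j].Stack {
				return refs[i].Stack < refs[j].Stack
			}
			return refs[i].Component < refs[j].Component
		})
	}
	return index
}

// dependentsOf is a single map lookup per affected component.
func dependentsOf(index map[ComponentRef][]ComponentRef, ref ComponentRef) []ComponentRef {
	return index[ref]
}
```

The index costs one pass over all stacks and components; after that, each of the ~2,400 affected items needs only a map lookup instead of its own full scan.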
why
Deep merge regression
- PR #2201 (native deep merge, 3.5x faster) introduced type-mismatch guards that are too strict for real-world Atmos configurations
- Some production configs override inherited lists with `{}` (empty map) instead of `[]` (empty list). While this is a misconfiguration, it worked with the previous mergo-based merge and must not break in a patch release
- Stack-processing commands (`list stacks`, `describe stacks`, etc.) fail on affected configs
describe affected --upload timeout
- The `--upload` flag is used by the Atmos Pro integration to upload affected stacks
- Large infrastructures with many components across many stacks generate thousands of affected items
- The O(N × full_stack_resolution) cost made the command unusable, blocking CI/CD pipelines
Test results
Deep merge type-override
| Test | Result |
|---|---|
| 11 new type-override unit tests (list→map, list→scalar, list→nil, list→bool, nested, with slice flags) | All pass |
| Minimal stack fixture (merge-type-override) with 3 override patterns | `atmos list stacks` succeeds |
| All existing merge tests (updated 6 tests from error→success expectations) | All pass |
| Merge package coverage | 92.5% overall, 100% on merge_native.go functions |
describe affected --upload optimization
| Metric | Before | After |
|---|---|---|
| `describe affected` (no dependents) | ~7s | ~7s (unchanged) |
| `describe affected --include-dependents` | 40+ min (never completes) | ~10s |
| Payload size (with dependents) | N/A | ~1.2 MB |
| Test | Result |
|---|---|
| All 30+ existing affected/dependent tests | All pass |
| 8 new dependency index tests (build, lookup, self-reference, abstract skip, multi-stack, helmfile, edge cases) | All pass |
| Changed function coverage: all above 80% (findDependentsByScan 95.3%, findDependentsFromIndex 100%, buildDependencyIndex 81.1%) | Above threshold |
references
- PR #2201: perf: replace mergo with native deep merge (introduced the type-mismatch regression)
- `docs/fixes/2026-03-24-deep-merge-type-mismatch-regression.md`: detailed analysis of the merge regression
- `docs/fixes/2026-03-24-describe-affected-upload-timeout.md`: detailed analysis with incremental timing breakdown
Summary by CodeRabbit
- Bug Fixes
  - Restored deep-merge behavior so stack overrides can replace list-typed values with maps, scalars, or nulls.
  - Fixed `describe affected --upload` hang/timeout by caching stack resolution and using an indexed dependent lookup.
- Documentation
  - Added detailed pages describing both regressions, root causes, and verification steps.
- Tests
  - Added unit and end-to-end tests covering cross-type merge overrides and dependency-index dependent resolution.
fix: describe affected crashes on deleted components when dependents enabled @milldr (#2237)
What
Skip dependent resolution for deleted components in atmos describe affected. When --include-dependents or --upload is used, deleted components are now skipped during dependent resolution instead of crashing with "invalid component".
One-line fix: Added a guard in addDependentsToAffected to skip components marked Deleted: true, giving them empty Dependents slices.
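The shape of that guard can be sketched as follows. This is an illustration only: `AffectedItem` and `addDependents` are hypothetical stand-ins for the real affected-item type and `addDependentsToAffected`, and the `resolve` callback stands in for the dependent lookup that fails for components missing from HEAD.

```go
package main

// AffectedItem is a hypothetical, trimmed-down affected entry.
type AffectedItem struct {
	Component  string
	Stack      string
	Deleted    bool
	Dependents []string
}

// addDependents sketches the guard: deleted components no longer exist in HEAD,
// so they get an empty (non-nil) Dependents slice and the lookup is skipped.
func addDependents(items []AffectedItem, resolve func(component, stack string) ([]string, error)) error {
	for i := range items {
		if items[i].Deleted {
			items[i].Dependents = []string{}
			continue
		}
		deps, err := resolve(items[i].Component, items[i].Stack)
		if err != nil {
			return err
		}
		items[i].Dependents = deps
	}
	return nil
}
```

Using an empty slice rather than `nil` keeps the field present as an empty list if the items are later serialized for upload.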
Why
When a PR removes a component or stack, atmos describe affected correctly detects it as deleted via detectDeletedComponents. However, when --include-dependents is enabled (also auto-enabled by --upload), the code attempts to resolve dependents for ALL affected items — including deleted ones. Since deleted components don't exist in HEAD, ExecuteDescribeDependents → ExecuteDescribeComponent → detectComponentType fails with "invalid component".
This blocks Atmos Pro from handling deleted stacks entirely — the CLI crashes before any data reaches the API.
Reproduction:
- Create a PR that removes a component import from a stack YAML
- Run `atmos describe affected --upload`
- Crash: `Error: invalid component — Could not find the component X in the stack Y`
References
- Deleted detection implementation: #2063
- Atmos Pro side: cloudposse-corp/apps#915
- Linear: AP-161
Summary by CodeRabbit
- Bug Fixes
  - Prevented processing of dependents for deleted components, avoiding a prior crash and ensuring deleted items report no dependents.
- Tests
  - Added test coverage confirming deleted components are marked deleted, have empty dependents, and that dependent resolution no longer errors.
perf(merge): replace mergo pre-copy loop with reflection-free native deep merge (3.5× faster) @nitrocode (#2201)
- All prior CR items (v1-v9) resolved in previous sessions
- `TestExecuteMainTerraformCommand_Error_Propagates` (item 2 exec test)
- `TestMergeWithOptions_EmptyInputs_ReturnsEmptyMap`, `StrategyFlags_WireThrough` (item 2 merge tests)
- validate_stacks_test hardening (item 3)
- Workspace recovery negative-path logging (item 4)
- compare_mergo header with CrossValidate (item 5)
- BenchmarkMerge_ProductionScale 10×25 keys + node_groups (item 6 code)
- Mergo follow-up issue #2242 in blog (item 7 partial)
- compile-guard, isTerraformCurrentWorkspace comment (item 8 partial)
- `terraform_execute_exit_wrapping_test.go`: contract test for ExitCodeError wrapping
- `terraform_execute_single_invocation_test.go`: spy counter via `_ATMOS_TEST_COUNTER_FILE` in testmain
- `testmain_test.go` updated to support `_ATMOS_TEST_COUNTER_FILE`
- Blog updated: 25 keys + node_groups list-of-map-of-list (item 6 blog)
- roadmap.js: added `Migrate remaining mergo call-sites` milestone with issue: 2242 (item 7)
Summary by CodeRabbit
- New Features
  - ~3.5× faster deep merge for stack configuration resolution (native implementation)
  - Improved Terraform workspace detection and more tolerant recovery behavior
- Bug Fixes
  - Fixed slice-merge precedence and eliminated unintended data aliasing/corruption
  - Resolved workspace/state edge cases and clarified recovery logging
- Tests
  - Strengthened test coverage with runtime gating, negative-path checks, and cross-validation opt-ins
- Documentation
  - Added a detailed deep-merge blog post and a release notes/fixes page documenting behavior and migration guidance
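Aliasing bugs in a deep merge typically come from sharing maps or slices between the inputs and the merged result, so a later write through one mutates the other. The hypothetical `deepCopyValue` helper below is a minimal Go sketch of the reflection-free, type-switch style of copying that avoids this, assuming configs are decoded as `map[string]any` / `[]any`; it is not the PR's code.

```go
package main

// deepCopyValue recursively copies maps and slices so the result shares no
// mutable state with the input. Other values (strings, numbers, bools, nil)
// are returned as-is.
func deepCopyValue(v any) any {
	switch t := v.(type) {
	case map[string]any:
		out := make(map[string]any, len(t))
		for k, val := range t {
			out[k] = deepCopyValue(val)
		}
		return out
	case []any:
		out := make([]any, len(t))
		for i, val := range t {
			out[i] = deepCopyValue(val)
		}
		return out
	default:
		return v
	}
}
```

A type switch over the two container shapes that YAML decoding produces avoids the per-value reflection cost, which is the general idea behind replacing a reflection-based pre-copy step.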