fix: resolve auth realm isolation issues @aknysh (#2075)
what
- Fixed 4 auth realm isolation issues introduced in PR #2043 (v1.206.0) that broke CI/CD pipelines, profile handling, and multi-stack configurations
- Added realm mismatch warning that detects when credentials exist under a different realm than the current one
- Added realm isolation documentation to the auth configuration guide
- Improved test coverage with behavior-descriptive test names
why
After merging PR #2043 (auth realm isolation with credential separation), several critical regressions were reported:
Issue 1: Profile Incorrectly Required for Default Identity (#2071)
Problem: After upgrading to v1.206.0, any atmos command (including atmos version) fails with a "profile not found" error when ATMOS_PROFILE is set to an auth identity name (e.g., ATMOS_PROFILE=root-admin).
Cause: Users set ATMOS_PROFILE thinking it controls auth identity selection, but it controls configuration profiles. When set to an identity name that doesn't match any profile directory, loadProfiles() fails.
Fix: Enhanced the "profile not found" error to detect when the profile name matches an auth identity name and suggest using ATMOS_IDENTITY or --identity instead.
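The improved error can be pictured roughly as follows. This is a minimal sketch, not the Atmos implementation: the function name `profileNotFoundError`, its parameters, and the exact message wording are assumptions made for illustration.

```go
package main

import "fmt"

// profileNotFoundError is a hypothetical helper (not the Atmos API). It builds
// the "profile not found" error and, when the requested profile name matches a
// configured auth identity, appends a hint pointing at ATMOS_IDENTITY.
func profileNotFoundError(profile string, identityNames []string) error {
	for _, name := range identityNames {
		if name == profile {
			return fmt.Errorf(
				"profile %q not found; %q is an auth identity, so use ATMOS_IDENTITY or --identity instead",
				profile, profile,
			)
		}
	}
	// No identity with that name: return the plain error without a hint.
	return fmt.Errorf("profile %q not found", profile)
}

func main() {
	identities := []string{"root-admin", "dev-readonly"} // illustrative names
	fmt.Println(profileNotFoundError("root-admin", identities))
	fmt.Println(profileNotFoundError("staging", identities))
}
```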
Issue 2: Realm-Scoped Credential Paths Break CI/CD Workflows
Problem: CI/CD pipelines fail with AWS credential errors because credential file paths changed from {baseDir}/aws/{provider}/credentials to {baseDir}/{realm}/aws/{provider}/credentials.
Cause: PR #2043 auto-generated a realm from a SHA256 hash of the CLI config path, making realm isolation always-on.
Fix: Made realm isolation opt-in. When auth.realm is not configured and ATMOS_AUTH_REALM is not set, credentials use the backward-compatible path with no realm subdirectory. Also fixed CleanupAll() to safely handle an empty realm.
Issue 3: Loading Unrelated Stack Files for Auth Defaults (#2072)
Problem: Running a command scoped to a specific stack causes a "multiple default identities" error when different stacks define different defaults.
Cause: LoadStackAuthDefaults() scanned ALL stack files without knowing which stack is the target.
Fix: When different stack files define conflicting default identities, the defaults are discarded (the loader returns an empty result) instead of raising an error. The per-stack default is then resolved after full stack processing.
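A minimal sketch of the "discard on conflict" rule, assuming the scan yields one default identity name per stack file. The function `globalDefaultIdentity` and the file and identity names are hypothetical and do not mirror the real LoadStackAuthDefaults() signature.

```go
package main

import "fmt"

// globalDefaultIdentity is a hypothetical sketch. Given the default identity
// declared by each stack file, it returns the shared default, or "" when the
// files disagree (or none declares a default).
func globalDefaultIdentity(defaultsByFile map[string]string) string {
	chosen := ""
	for _, identity := range defaultsByFile {
		if identity == "" {
			continue // this file declares no default
		}
		if chosen == "" {
			chosen = identity
			continue
		}
		if identity != chosen {
			// Conflict: discard instead of erroring. The per-stack default
			// is resolved later, once the target stack is known.
			return ""
		}
	}
	return chosen
}

func main() {
	conflicting := map[string]string{
		"stacks/dev.yaml":  "dev-admin",
		"stacks/prod.yaml": "prod-admin",
	}
	fmt.Printf("conflicting: %q\n", globalDefaultIdentity(conflicting))

	agreeing := map[string]string{
		"stacks/dev.yaml":     "dev-admin",
		"stacks/sandbox.yaml": "dev-admin",
	}
	fmt.Printf("agreeing: %q\n", globalDefaultIdentity(agreeing))
}
```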
Issue 4: ECR Docker Push Fails with 403 After Upgrading
Problem: GitHub Actions CI/CD fails with 403 Forbidden when pushing to AWS ECR. Docker login succeeds but push fails. Pinning to v1.205.1 resolves it.
Cause: assumeRoleIdentity hardcoded an empty realm ("") in Environment(), PrepareEnvironment(), and CredentialsExist() instead of using i.realm. As a result, Authenticate() wrote credentials to the realm-scoped path while GetEnvironmentVariables() returned the non-realm path, so the AWS SDK fell back to the runner's default IAM role.
Fix: Changed all three NewAWSFileManager("", "") calls to NewAWSFileManager("", i.realm) to match the pattern used by all other identity types.
Realm Mismatch Warning
Added detection that warns users when credentials exist under a different realm than the current one. This helps diagnose issues after changing auth.realm or ATMOS_AUTH_REALM. The check runs only on credential-not-found failure paths, so it adds no overhead to the happy path.
Realm Documentation
Added "Credential Realm Isolation" section to the auth configuration guide covering: configuration, precedence, credential storage paths, naming rules, and usage.
Summary by CodeRabbit
- New Features
  - Opt-in credential realm isolation via env/config, with backward-compatible default behavior.
- Bug Fixes
  - Fixed AWS credential path mismatches (assume-role/ECR) and added realm-aware handling.
  - Prevented loading unrelated stack defaults; conflicting stack defaults are now detected and not applied globally.
  - Emits a one-time warning when credentials are found in a different realm.
- Documentation
  - Added docs describing credential realm isolation and precedence.
- Tests
  - Expanded tests covering realm behavior, stack default conflicts, and assume-role paths.
feat: strip unused fields from `describe affected --upload` output @milldr (#2067)
what
- When the --upload flag is used, strips fields not needed by Atmos Pro's Inngest processing
- Reduces payload size by ~70-75% to stay within Inngest's 256KB limit
- Fields kept: component, stack, included_in_dependents, dependents, settings.pro
- Fields removed: settings.depends_on, settings.github, component_type, component_path, namespace, tenant, environment, stage, stack_slug, affected
why
- Large infrastructure repos exceed Inngest's 256KB payload limit, causing 500 errors
- The removed fields are not used in Atmos Pro's Inngest event handlers
- Stripping at CLI level prevents uploading unnecessary data (vs server-side stripping)
- Tested against realistic fixtures: 328KB → 85KB (74% reduction)
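The trimming can be sketched as a recursive copy into a smaller type. The `affected` and `uploadItem` types and the `stripForUpload` function below are hypothetical, not the actual Atmos types, and only a few of the removed fields are modeled. The sketch assumes the JSON keys match the field names listed above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// affected is a hypothetical, abbreviated model of one describe-affected entry.
type affected struct {
	Component            string         `json:"component"`
	Stack                string         `json:"stack"`
	ComponentType        string         `json:"component_type,omitempty"`
	ComponentPath        string         `json:"component_path,omitempty"`
	StackSlug            string         `json:"stack_slug,omitempty"`
	Affected             string         `json:"affected,omitempty"`
	IncludedInDependents bool           `json:"included_in_dependents"`
	Dependents           []affected     `json:"dependents,omitempty"`
	Settings             map[string]any `json:"settings,omitempty"`
}

// uploadItem carries only the fields kept for upload.
type uploadItem struct {
	Component            string         `json:"component"`
	Stack                string         `json:"stack"`
	IncludedInDependents bool           `json:"included_in_dependents"`
	Dependents           []uploadItem   `json:"dependents,omitempty"`
	Settings             map[string]any `json:"settings,omitempty"`
}

// stripForUpload copies the kept fields, keeps only settings.pro, and recurses
// into dependents so nested entries are trimmed the same way.
func stripForUpload(items []affected) []uploadItem {
	out := make([]uploadItem, 0, len(items))
	for _, it := range items {
		slim := uploadItem{
			Component:            it.Component,
			Stack:                it.Stack,
			IncludedInDependents: it.IncludedInDependents,
		}
		// Reading from a nil map is safe in Go and reports ok == false.
		if pro, ok := it.Settings["pro"]; ok {
			slim.Settings = map[string]any{"pro": pro}
		}
		if len(it.Dependents) > 0 {
			slim.Dependents = stripForUpload(it.Dependents)
		}
		out = append(out, slim)
	}
	return out
}

func main() {
	items := []affected{{
		Component:     "vpc",
		Stack:         "plat-ue2-dev",
		ComponentType: "terraform",
		Affected:      "component",
		Settings: map[string]any{
			"pro":        map[string]any{"enabled": true},
			"depends_on": map[string]any{"1": "account"},
		},
		Dependents: []affected{{Component: "eks", Stack: "plat-ue2-dev", StackSlug: "plat-ue2-dev-eks"}},
	}}
	payload, err := json.MarshalIndent(stripForUpload(items), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(payload))
}
```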
Summary by CodeRabbit
- New Features
  - Added a --upload option to describe-affected that produces a minimized payload for external uploads.
- Documentation
  - Added a PRD detailing the upload behavior, retained fields, compatibility, testing, and migration guidance.
- Tests
  - Added unit tests covering payload trimming, recursive dependents, nil/empty cases, and settings retention.