docs(prd): add PRD for DAG-based concurrent execution @osterman (#2194)
what
Added comprehensive Product Requirements Document for implementing DAG-based concurrent execution in Atmos. The PRD proposes a ready-queue scheduler that enables concurrent execution of components across all types (Terraform, Packer, Ansible, custom registry) while respecting dependency graphs and maintaining safe defaults (sequential by default with opt-in parallelism via --max-concurrency).
why
Currently Atmos executes components sequentially even when they have no dependencies and could safely run in parallel. For large deployments with dozens or hundreds of components, this serialization is the dominant bottleneck. The PRD establishes architectural principles, justifies ready-queue scheduling through industry research (Terragrunt, Make, Ninja, Bazel, Buck2, and 10+ other tools all use this pattern), and provides a phased rollout plan. The document also addresses critical concerns: output isolation under concurrency via stream injection, integration with legacy built-in component types without requiring migration, and configuration of concurrency defaults through atmos.yaml.
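As an illustration of the ready-queue pattern described above, here is a minimal sketch in Go. It is not the scheduler proposed in the PRD: `runDAG`, its signature, and the component names are hypothetical, and error handling for failed components is left out. A node enters the ready queue only when all of its dependencies have finished, and at most `maxConcurrency` nodes run at once (values below 1 fall back to sequential execution).

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// runDAG runs every node in deps exactly once. deps maps a node to the nodes
// it depends on. A node starts only after all of its dependencies finished,
// and at most maxConcurrency nodes run at the same time.
// It returns the order in which nodes completed.
func runDAG(deps map[string][]string, maxConcurrency int, run func(name string)) ([]string, error) {
	if maxConcurrency < 1 {
		maxConcurrency = 1 // sequential by default
	}
	remaining := make(map[string]int, len(deps)) // unmet dependency count per node
	dependents := make(map[string][]string)      // reverse edges
	for node, ds := range deps {
		remaining[node] = len(ds)
		for _, d := range ds {
			if _, ok := deps[d]; !ok {
				return nil, fmt.Errorf("%q depends on unknown node %q", node, d)
			}
			dependents[d] = append(dependents[d], node)
		}
	}
	var ready []string
	for node, n := range remaining {
		if n == 0 {
			ready = append(ready, node)
		}
	}
	sort.Strings(ready) // stable start order for the initial queue

	done := make(chan string)
	finished := make([]string, 0, len(deps))
	inFlight := 0
	for len(finished) < len(deps) {
		// Start as many ready nodes as the concurrency limit allows.
		for inFlight < maxConcurrency && len(ready) > 0 {
			node := ready[0]
			ready = ready[1:]
			inFlight++
			go func(n string) {
				run(n)
				done <- n
			}(node)
		}
		if inFlight == 0 {
			// Nothing is running and nothing is ready: the rest form a cycle.
			return finished, errors.New("dependency cycle detected")
		}
		node := <-done
		inFlight--
		finished = append(finished, node)
		// Completing a node may unblock its dependents.
		for _, dep := range dependents[node] {
			remaining[dep]--
			if remaining[dep] == 0 {
				ready = append(ready, dep)
			}
		}
	}
	return finished, nil
}

func main() {
	deps := map[string][]string{
		"vpc": nil,
		"eks": {"vpc"},
		"rds": {"vpc"},
		"app": {"eks", "rds"},
	}
	order, err := runDAG(deps, 2, func(name string) { fmt.Println("running", name) })
	fmt.Println(order, err)
}
```

In this example `eks` and `rds` may run in parallel once `vpc` finishes, while `app` always runs last.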
references
- Prerequisite: PR #2193 (dependencies.components format with cross-type dependencies)
- Related: PR #1516 (pkg/dependency/ graph package)
- Related: PR #2159 (alternative proposal for concurrent provisioning)
- Related PRD: docs/prd/terraform-dependency-order.md
Summary by CodeRabbit
- Documentation
  - Added specification document for DAG-based concurrent execution capabilities, including cross-component parallelism, configurable concurrency limits, stream isolation, and comprehensive dependency graph handling.
feat: Add EKS kubeconfig authentication integration (ATMOS-157) @Benbentwo (#2149)
what
- EKS kubeconfig integration: Auto-provisions kubeconfig for linked EKS clusters during `atmos auth login` via the integration framework
- `atmos auth eks-token` command: New kubectl exec credential plugin that generates EKS bearer tokens using AWS credentials, eliminating the AWS CLI dependency
- Go SDK execution paths: Enhanced `atmos aws eks update-kubeconfig` with `--integration` flag and direct identity-based cluster access without requiring components or stacks
- Integration cleanup: Linked integrations are cleaned up during `atmos auth logout` (non-fatal, doesn't block logout)
- Environment composition: KUBECONFIG paths from integrations are merged via colon-separated lists with deduplication
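The environment composition step can be pictured with a short Go sketch. This is an assumption about the general technique, not the code from the PR: `mergeKubeconfigPaths` is a hypothetical helper, and the separator is hard-coded to a colon as described above (Windows uses `;` for path lists).

```go
package main

import (
	"fmt"
	"strings"
)

// mergeKubeconfigPaths joins several colon-separated path lists into one,
// keeping the first occurrence of each path and dropping empty entries.
func mergeKubeconfigPaths(lists ...string) string {
	const sep = ":" // assumption: Unix-style path list separator
	seen := make(map[string]bool)
	var merged []string
	for _, list := range lists {
		for _, p := range strings.Split(list, sep) {
			if p == "" || seen[p] {
				continue
			}
			seen[p] = true
			merged = append(merged, p)
		}
	}
	return strings.Join(merged, sep)
}

func main() {
	existing := "/home/me/.kube/config:/tmp/eks-dev"
	fromIntegration := "/tmp/eks-dev:/tmp/eks-prod"
	fmt.Println(mergeKubeconfigPaths(existing, fromIntegration))
}
```

Keeping the first occurrence preserves the precedence of paths that were already present in the environment.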
why
- Authentication in Atmos: After `atmos auth login`, users previously had to manually run AWS CLI commands to generate kubeconfig. This integrates that into the auth flow
- No AWS CLI required: The kubectl exec credential plugin uses AWS SDK directly instead of shelling out to AWS CLI, improving security and eliminating deployment dependencies
- Consistent integration pattern: Follows the established ECR integration pattern, enabling future cloud-specific integrations (GCP, Azure)
- Simplified kubeconfig generation: Supports multiple modes (merge/replace/error) for flexible kubeconfig management across different workflows
references
- PRD: `docs/prd/eks-kubeconfig.md`
- Linear: ATMOS-157
- Related PRD: #1887 (browser-based auth), #1884 (EKS kubeconfig integration)
closes #2076
Summary by CodeRabbit
- New Features
  - AWS EKS support: new eks-token CLI (kubectl exec credential plugin), automatic kubeconfig provisioning on login, and update-kubeconfig modes including --integration/--identity
  - Integrations can now provide deterministic environment variables and perform idempotent cleanup during logout
- Documentation
  - New docs, tutorial, and blog post covering EKS kubeconfig auth, eks-token, and integration configuration
- Chores
  - Updated AWS/Kubernetes-related dependencies and NOTICE license entries
- Tests
  - Extensive unit tests for EKS, token generation, kubeconfig management, integrations, and env composition
Dynamic multi-announcement bar system @osterman (#2228)
what
- Implemented a data-driven multi-announcement queue system that cycles through curated product announcements
- Users dismiss one announcement and after a 3-day cooldown, the next appears automatically
- Added 11 announcements covering key features: Reference Architecture, Atmos Pro, Native CI, AI support, Toolchain, Auth, code generation, component provisioning, locals, backends, and GitHub OIDC
- Each announcement has distinct theming (unique background color) for visual differentiation
- Swizzled the Docusaurus AnnouncementBar component to replace the single static announcement bar
why
- The previous announcement bar was static and one-time dismissible—once dismissed, it never reappeared
- Product messaging was limited to hardcoded content and couldn't evolve with features
- A deliberate queue of announcements keeps messaging fresh and intentional without becoming a changelog feed
- The cooldown between announcements respects user attention while ensuring discovery of important features
- Decoupling announcements into a data file makes it trivial to add, reorder, or update messaging without code changes
references
- Closes the gap for curated product messaging vs. static announcements
- Uses localStorage for persistence and Docusaurus's built-in anti-layout-shift mechanism
- Follows existing Atmos patterns for data-driven configuration (roadmap.js model)
Summary by CodeRabbit
- New Features
  - Reworked announcement banner: closeable, styled multi-announcement queue that persists dismissals and enforces a 3‑day cooldown.
  - Announcements support rich HTML content, configurable background/text colors, and automatically advance to the next item when dismissed.
- Chores
  - Removed the previous global announcement banner configuration.
- Documentation
  - Added a Terraform docs category and updated the Terraform index metadata.
feat: AP-163 send raw instance status to Atmos Pro, extend to apply @milldr (#2216)
What
- CLI sends raw `command` + `exit_code` to Atmos Pro instead of interpreting exit codes into status strings
- Extends status upload from plan-only to both plan and apply (both require explicit `--upload-status` flag)
- Reads `--upload-status` flag via Cobra/Viper (fixes silent no-op when flag was consumed by Cobra before reaching the upload code path)
- Treats plan exit code 2 (changes detected) as success after upload completes, so CI workflows don't fail
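A rough Go sketch of the behavior listed above: the raw command and exit code are reported as-is, and only the process exit code seen by CI is adjusted afterwards. The `statusPayload` type, its JSON layout, and `resolveCIExitCode` are hypothetical names for illustration; the actual request format is defined by Atmos Pro and is not shown in these notes.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statusPayload carries the uninterpreted result of a Terraform run.
// The field names follow the "command + exit_code" wording above; the
// exact wire format is an assumption.
type statusPayload struct {
	Command  string `json:"command"`
	ExitCode int    `json:"exit_code"`
}

// resolveCIExitCode returns the exit code the CLI should finish with.
// Plan exit code 2 (changes detected) counts as success once the status
// has been uploaded; every other code is passed through unchanged.
func resolveCIExitCode(command string, exitCode int, uploaded bool) int {
	if uploaded && command == "plan" && exitCode == 2 {
		return 0
	}
	return exitCode
}

func main() {
	payload := statusPayload{Command: "plan", ExitCode: 2}
	body, err := json.Marshal(payload)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))                         // raw data sent for server-side interpretation
	fmt.Println(resolveCIExitCode("plan", 2, true))  // CI sees success
	fmt.Println(resolveCIExitCode("apply", 1, true)) // real failures still fail
}
```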
Why
Instances on the Atmos Pro dashboard show "Unknown" status after completed workflow runs. The CLI was only uploading status for plan (not apply), and was interpreting exit codes client-side. Moving interpretation server-side means status logic can be updated without a CLI release, and all exit codes (including errors) are now reported.
Ref
- AP-163
- PRD: `docs/prd/instance-status-raw-upload.md`
- Full PRD (Atmos Pro): `cloudposse-corp/apps` → `apps/atmos-pro/prd/instance-status-from-workflow-hooks.md`
- Atmos Pro counterpart: `cloudposse-corp/apps` `qa-1` branch
Summary by CodeRabbit
Release Notes
- New Features
  - Added `--upload-status` flag to upload raw Terraform execution data (command and exit code) to Atmos Pro for both `plan` and `apply` operations.
  - Introduced configurable CI exit code mapping to remap Terraform exit codes while preserving original execution results in cloud uploads.
- Bug Fixes
  - Improved plan output detection to correctly identify cases where only output values change.
- Documentation
  - Added feature documentation and blog post explaining the new upload capability.
feat: Isolated browser sessions for multi-account console access @osterman (#2229)
what
- Add isolated browser sessions support to `atmos auth console`
- New `pkg/browser/` package with cross-platform Chrome detection and browser opening
- Chrome detection on macOS (apps in /Applications), Linux (PATH search), and Windows (Program Files paths)
- Isolated browser opener using Chrome's `--user-data-dir` for per-identity browser contexts
- New `--isolated` flag on `atmos auth console` command
- Global configuration support via `auth.console.isolated_sessions` in `atmos.yaml`
- Session directories are deterministic (hash of realm + identity), allowing session reuse
why
Users frequently need to work with multiple cloud accounts simultaneously (e.g., comparing configs, debugging cross-account issues). Cloud providers enforce single-session per browser context, forcing users to log out/in when switching accounts. Isolated sessions solve this by giving each identity its own Chrome profile via --user-data-dir, allowing multiple console sessions without logout friction.
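To make the deterministic session directory idea concrete, here is a small Go sketch. The hash function (SHA-256), the truncation to 8 bytes, the directory layout, and the helper names `sessionDir` and `chromeArgs` are all assumptions for illustration; only the `--user-data-dir` Chrome flag and the "hash of realm + identity" idea come from the description above.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path/filepath"
)

// sessionDir derives a stable directory for a realm/identity pair, so the
// same identity always reuses the same browser profile.
func sessionDir(baseDir, realm, identity string) string {
	// The NUL separator keeps ("ab", "c") distinct from ("a", "bc").
	sum := sha256.Sum256([]byte(realm + "\x00" + identity))
	return filepath.Join(baseDir, hex.EncodeToString(sum[:8]))
}

// chromeArgs builds the arguments that open url in an isolated profile.
func chromeArgs(userDataDir, url string) []string {
	return []string{"--user-data-dir=" + userDataDir, url}
}

func main() {
	dir := sessionDir("/tmp/atmos-sessions", "acme", "dev-admin")
	fmt.Println(chromeArgs(dir, "https://console.aws.amazon.com/"))
}
```

Because the directory depends only on its inputs, reopening the console for the same identity lands in the same profile, while a different identity gets a separate one.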
references
- #1879 - Related auth system work
- Closes isolated sessions feature request
- See `docs/prd/auth-console-isolated-sessions.md` for full design
Summary by CodeRabbit
- New Features
  - Isolated browser sessions for the auth console so multiple identities can be open simultaneously without logout conflicts.
  - New --isolated flag for per-command isolated sessions and auth.console.isolated_sessions for global control.
  - Deterministic per-identity session directories so sessions are reused per identity.
  - Graceful fallback to the default browser if Chrome/Chromium is unavailable; clearer status messages when opening fails.
- Documentation
  - CLI docs, config docs, PRD, and blog post added/updated with usage examples and guidance.
- Refactor
  - Improved browser-opening flow for more reliable, testable launches and better error surfacing.
🚀 Enhancements
fix: include component name and file location in terraform load errors @osterman (#2243)
what
- When `atmos describe affected` fails to load a terraform component (e.g. HCL syntax errors), the error now includes the component name and file:line location
- Previously the error was generic: `failed to load terraform component Variables not allowed...`
- Now it reads: `failed to load terraform component 'vpc' at main.tf:3: Variables not allowed...`
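The shape of the improved message can be reproduced with ordinary Go error wrapping. This is only a sketch of the idea: `wrapLoadError` and `errVarsNotAllowed` are hypothetical names, and the actual fix extracts the location from HCL diagnostics rather than taking it as parameters.

```go
package main

import (
	"errors"
	"fmt"
)

// errVarsNotAllowed stands in for the underlying parser error.
var errVarsNotAllowed = errors.New("Variables not allowed")

// wrapLoadError adds the component name and file:line location to err while
// keeping err available to errors.Is and errors.As through %w.
func wrapLoadError(component, file string, line int, err error) error {
	return fmt.Errorf("failed to load terraform component '%s' at %s:%d: %w", component, file, line, err)
}

func main() {
	err := wrapLoadError("vpc", "main.tf", 3, errVarsNotAllowed)
	fmt.Println(err)
	fmt.Println(errors.Is(err, errVarsNotAllowed)) // the original cause is still detectable
}
```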
why
- The generic error gave no indication of which component failed, making debugging difficult in repos with many components
- The `describe affected` code path used simple `errors.Join` without context, while the `describe component` path already had rich error reporting via the error builder
references
- Follows the same diagnostic extraction pattern already used in `internal/exec/utils.go` for `ProcessComponentConfig`
Summary by CodeRabbit
- Bug Fixes
  - Improved error messages when configuration parsing fails, now including component names and file location details for faster troubleshooting.
- Tests
  - Added test coverage for configuration parsing error scenarios to validate error reporting accuracy.
refactor: reduce ExecuteDescribeStacks cyclomatic complexity 247→10 + near-100% unit test coverage @nitrocode (#2204)
- [x] Investigate CI test failures (`TestDescribeStacksWithEmptyStacks`, `TestDescribeStacksWithVariousEmptyStacks`)
- [ ] Remove debug test file accidentally created during investigation
- [ ] Add empty stack fixture file to complete scenario so the `includeEmptyStacks` tests are reliable on all platforms
- [ ] Verify tests pass locally
Summary by CodeRabbit
- Bug Fixes
  - Fixed orphan stack entries when `NameTemplate` is set with empty manifest names.
  - Improved component emptiness detection accuracy.
  - Added type assertion guards to prevent potential crashes.
  - Corrected stack file processing order.
- Tests
  - Enhanced test coverage for describe stacks functionality, including edge cases and error handling.
refactor(terraform): reduce ExecuteTerraform complexity 160→9, improve test coverage @nitrocode (#2226)
what
- Extract 30+ focused helper functions from the `ExecuteTerraform` monolith (~900 lines, cyclomatic complexity 160), reducing complexity to ~9 across 4 source files all under 600 lines
- Add 100+ unit tests across 6 test files covering argument builders, auth setup, workspace management, cleanup, exit-code resolution, and the execution pipeline
- Fix all 16 golangci-lint issues: cyclomatic complexity, magic constants, unused params, forbidden API calls, huge params, nested blocks, argument limits
- Address all 10 CodeRabbit audit pass 7 items: remove duplicate/tautological tests, add missing coverage for `tenv != nil`, `buildTerraformCommandArgs` init branch, and `storeAutoDetectedIdentity` guard
- Document the `GenerateFilesForComponent` double-invocation when `AutoGenerateFiles=true` (pre-existing behavior, not a regression)
Architecture after refactor
| File | Lines | Responsibility |
|---|---|---|
| `terraform.go` | 189 | Orchestrator: `ExecuteTerraform → prepareComponentExecution → executeCommandPipeline → cleanupTerraformFiles` |
| `terraform_execute_helpers.go` | 546 | Auth, env vars, init, validation, config generation helpers |
| `terraform_execute_helpers_args.go` | 156 | Per-subcommand argument builders (plan/apply/init/workspace/destroy) |
| `terraform_execute_helpers_exec.go` | 350 | Execution pipeline, workspace setup, TTY guard, exit-code resolution, cleanup |
Key design decisions
- Injectable test vars (`defaultMergedAuthConfigGetter`, `defaultComponentConfigFetcher`, `defaultAuthManagerCreator`) enable isolated unit testing of auth paths without real infrastructure
- Named subcommand constants (`subcommandApply`, `subcommandDeploy`, `subcommandInit`, `subcommandWorkspace`) replace 16+ magic string literals
- Mutual exclusion contract between `executeTerraformInitPhase` and `buildInitSubcommandArgs` is documented: both call `prepareInitExecution` but are guarded by `SubCommand == "init"` branching
Lint fixes (16 issues → 0)
- `revive/cyclomatic`: extract `shouldSkipWorkspaceSetup`, `runPreExecutionSteps`, `autoGenerateComponentFiles`, `provisionComponentSource`, `logAndWriteComponentVars`, `logCliVarsOverrides`, `handlePlanStatusUpload`
- `revive/add-constant`: `subcommandApply/Deploy/Init/Workspace`, `dirPermissions`
- `gocritic/hugeParam`: `handleVersionSubcommand` takes pointer params
- `gocritic/unlambda`: `defaultMergedAuthConfigGetter` uses direct function ref
- `gocritic/filepathJoin`: split path segments in test
- `unparam`: remove unused `planFile` from `buildApplySubcommandArgs`
- `forbidigo`: nolint for `TF_WORKSPACE` (Terraform convention)
- `nestif`: extract `handlePlanStatusUpload`
- `revive/argument-limit`: nolint for variadic opts
why
- `ExecuteTerraform` was the highest-complexity function in the codebase (cyclomatic 160): it was untestable as a unit, and any change risked regressions in auth, workspace, plan-file, or cleanup logic
- The function handled 10+ distinct responsibilities in a single method: path resolution, auto-generation, JIT provisioning, toolchain deps, auth hooks, env var assembly, init pre-step, argument construction, workspace setup, TTY guard, command execution, status upload, and cleanup
- Breaking it into focused helpers means each responsibility is independently testable, and adding new subcommands or flags requires changes in one place instead of wading through 900 lines
- The 16 lint issues were blocking pre-commit hooks on any file in the same package
references
- Fix doc: `docs/fixes/2026-03-20-executeterraform-refactor.md`
- Blog post: `website/blog/2026-03-18-refactoring-executeterraform.mdx`
- Roadmap: `website/src/data/roadmap.js` (Code Quality initiative milestone)
Summary by CodeRabbit
Release Notes
- Refactor
  - Improved internal code organization and maintainability by decomposing complex Terraform execution logic into smaller, focused components. Complexity metrics reduced significantly.
- Tests
  - Added comprehensive unit test suite (100+ tests) to improve reliability and catch edge cases.
- Documentation
  - Added documentation detailing refactoring approach and audit findings.
  - Published blog post about internal improvements and lessons learned.
- Chores
  - Updated roadmap to reflect completed internal optimization work.
refactor(cli_utils): DRY processArgsAndFlags with table-driven flag parsing, 100% unit test coverage @nitrocode (#2225)
- [x] **Bug fix**: Boolean flags (`--dry-run`, `--skip-init`, `--affected`, `--all`, `--process-templates`, `--process-functions`, `-h`/`--help`) were unconditionally stripping `args[i+1]` from Terraform/Helmfile pass-through args, silently dropping user-intended flags like `--refresh=false` or `--parallelism=10`. Fix: added `valueTakingCommonFlags` set; only value-consuming flags strip `args[i+1]`.
- [x] **Bug fix**: `strings.Split(arg, "=")` rejected flag values containing `=` (e.g. `--query=.tags[?env==prod]`, `--append-user-agent=Key=Val`). Fix: `strings.SplitN(arg, "=", 2)` throughout.
- [x] **Bug fix**: `strings.HasPrefix(arg+"=", flag)` was logically inverted: `--terraform-command-extra` matched `--terraform-command`. Fix: `strings.HasPrefix(arg, flag+"=")` at all call sites.
- [x] **Bug fix**: `--from-plan` and `--identity` unconditionally consumed `args[i+1]` even when it was another flag. Fix: both only strip `args[i+1]` when it does not start with `-`.
- [x] **Bug fix**: `--settings-list-merge-strategy` was parsed into `ArgsAndFlagsInfo` but absent from `commonFlags`, leaking it into Terraform pass-through args. Fix: added to both `commonFlags` and `stringFlagDefs`.
- [x] **Refactor**: replaced 25+ copy-paste `if/else` chains (~200 lines) with a table-driven design: `stringFlagDefs` (26 entries), `parseFlagValue` helper, `parseIdentityFlag` helper, `parseFromPlanFlag` helper, and `valueTakingCommonFlags` set. Cyclomatic complexity: ~67 → ~15.
- [x] **Tests**: added `internal/exec/cli_utils_helpers_test.go` with 100% coverage on `processArgsAndFlags`, `parseFlagValue`, `parseIdentityFlag`, `parseFromPlanFlag`, `parseQuotedCompoundSubcommand`. Regression tests confirm boolean flags no longer drop adjacent pass-through args.
- [x] **Docs**: blog post `website/blog/2026-03-18-process-args-flags-refactor.mdx`, roadmap milestone `website/src/data/roadmap.js`, code comment clarifying `--process-templates`/`--process-functions` are Cobra-only flags.
- [x] Closes #2204
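Two of the bug fixes above come down to standard-library string handling, which a short Go sketch can show. The helper names `splitFlag` and `matchesFlag` are hypothetical and are not the functions from the PR; only the `strings.SplitN(arg, "=", 2)` and `strings.HasPrefix(arg, flag+"=")` calls are taken from the description.

```go
package main

import (
	"fmt"
	"strings"
)

// splitFlag splits "--name=value" at the first "=" only, so values that
// themselves contain "=" stay intact.
func splitFlag(arg string) (name, value string, hasValue bool) {
	parts := strings.SplitN(arg, "=", 2)
	if len(parts) == 2 {
		return parts[0], parts[1], true
	}
	return parts[0], "", false
}

// matchesFlag reports whether arg is exactly "<flag>=<something>".
// Appending "=" to the flag (not to the arg) prevents a longer flag name
// from matching a shorter one.
func matchesFlag(arg, flag string) bool {
	return strings.HasPrefix(arg, flag+"=")
}

func main() {
	name, value, _ := splitFlag("--append-user-agent=Key=Val")
	fmt.Println(name, value)

	arg := "--terraform-command-extra=x"
	// The inverted check from before the fix matches the wrong flag.
	fmt.Println(strings.HasPrefix(arg+"=", "--terraform-command"))
	// The corrected check does not.
	fmt.Println(matchesFlag(arg, "--terraform-command"))
}
```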
fix: base path resolution fallback when git root is unavailable @aknysh (#2236)
what
- Fix `failed to find import` error when `ATMOS_BASE_PATH` is set to a relative path on CI workers without a `.git` directory (e.g., Spacelift)
- Make `tryResolveWithGitRoot` and `tryResolveWithConfigPath` source-aware by passing the `source` parameter through the call chain
- For runtime sources (`ATMOS_BASE_PATH` env var, `--base-path` flag, `atmos_base_path` provider param), `tryResolveWithConfigPath` now tries CWD first before config dir
- Both paths use `os.Stat` validation with fallback
- Add 5 bug reproduction tests for the base path resolution fix
- Add 4 cycle detection tests for the `metadata.component` stack overflow fix verification
- Extract `basePathSourceRuntime` and `cwdResolutionErrFmt` constants
- Extract `tryCWDRelative` helper to reduce cognitive complexity
why
The v1.210.1 fix (PR #2215) added os.Stat + CWD fallback to tryResolveWithGitRoot, but this only works when git root IS available. On CI workers without .git (e.g., Spacelift), getGitRootOrEmpty() returns "" and the code falls through to tryResolveWithConfigPath — which lacked the same fallback. It unconditionally joined with cliConfigPath (the atmos.yaml directory), producing the wrong path.
The broken code path (before this fix)
- User sets `ATMOS_BASE_PATH=.terraform/modules/monorepo`
- `processEnvVars` sets `BasePath` and `BasePathSource = "runtime"`
- `resolveAbsolutePath` classifies `.terraform/modules/monorepo` as a bare path → calls `tryResolveWithGitRoot`
- `getGitRootOrEmpty()` returns `""` because there is no `.git` on CI
- Falls to `tryResolveWithConfigPath(".terraform/modules/monorepo", "/workspace")`
- Unconditionally returns `/workspace/.terraform/modules/monorepo`, which is WRONG (doesn't exist)
- Correct path: `/workspace/components/terraform/iam-delegated-roles/.terraform/modules/monorepo`
Why absolute paths work
When ATMOS_BASE_PATH is set to an absolute path, resolveAbsolutePath returns it as-is at the first check (filepath.IsAbs), bypassing all resolution logic.
The asymmetry fixed
| Function | Before | After |
|---|---|---|
| `tryResolveWithGitRoot` | Has `os.Stat` + CWD fallback (v1.210.1) | Unchanged; now passes source to fallback |
| `tryResolveWithConfigPath` | No `os.Stat`, no CWD fallback | Source-aware: runtime → CWD first, config → config dir first, both with `os.Stat` |
Resolution order after fix
| Source | Git Root Available | Git Root Unavailable |
|---|---|---|
| Runtime (env var, CLI flag, provider param) | git root → CWD (existing) | CWD → config dir (new) |
| Config (`base_path` in `atmos.yaml`) | git root → CWD (existing) | config dir → CWD (new) |
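The resolution order in the table can be sketched in Go as follows. This is a simplified illustration, not the code from the fix: `candidatePaths`, `resolveBasePath`, and `sourceRuntime` are hypothetical names, the existence check is injected so the logic can be exercised without touching the filesystem, and returning an error when no candidate exists is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

const sourceRuntime = "runtime" // env var, CLI flag, or provider param

// candidatePaths returns the locations to try, in order, for a relative
// base path, following the table above.
func candidatePaths(rel, source, gitRoot, cwd, configDir string) []string {
	switch {
	case gitRoot != "":
		return []string{filepath.Join(gitRoot, rel), filepath.Join(cwd, rel)}
	case source == sourceRuntime:
		return []string{filepath.Join(cwd, rel), filepath.Join(configDir, rel)}
	default:
		return []string{filepath.Join(configDir, rel), filepath.Join(cwd, rel)}
	}
}

// resolveBasePath returns the first candidate that exists. Absolute paths
// bypass resolution entirely.
func resolveBasePath(rel, source, gitRoot, cwd, configDir string, exists func(string) bool) (string, error) {
	if filepath.IsAbs(rel) {
		return rel, nil
	}
	for _, c := range candidatePaths(rel, source, gitRoot, cwd, configDir) {
		if exists(c) {
			return c, nil
		}
	}
	return "", fmt.Errorf("base path %q not found (source %q)", rel, source)
}

// existsOnDisk is the os.Stat-based check used outside of tests.
func existsOnDisk(p string) bool {
	_, err := os.Stat(p)
	return err == nil
}

func main() {
	cwd, err := os.Getwd()
	if err != nil {
		panic(err)
	}
	// No git root, runtime source: CWD is tried before the config dir.
	fmt.Println(resolveBasePath(".", sourceRuntime, "", cwd, cwd, existsOnDisk))
}
```

With the Spacelift scenario described earlier (no git root, runtime source), the CWD candidate is checked first, so the path under the component directory wins over the non-existent one under `/workspace`.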
Stack overflow verification
The same user also reported fatal error: stack overflow when abstract components have metadata.component set (fixed in v1.210.0, PR #2214). Additional tests were written to verify the cycle detection works for various patterns including multiple abstract/real pairs, cross-component cycles, and defer delete re-entry. All pass — the cycle detection is working correctly for all reproducible patterns.
references
- Follow-up to #2215 (base path resolution fix, v1.210.1)
- Follow-up to #2214 (stack overflow fix for `metadata.component`, v1.210.0)
- Related issue: #2183
- Fix doc: `docs/fixes/2026-03-19-failed-to-find-import-no-git-root-fallback.md`
- Previous fix doc: `docs/fixes/2026-03-17-failed-to-find-import-base-path-resolution.md`
- Stack overflow fix doc: `docs/fixes/2026-03-16-metadata-component-abstract-stack-overflow.md`
Summary by CodeRabbit
- Documentation
  - New guide describing CI-only "failed to find import" behavior with relative ATMOS_BASE_PATH and how resolution differs vs absolute paths.
- Bug Fixes
  - Source-aware base-path resolution with proper existence checks and adjusted fallback ordering to prefer CWD for runtime cases, preventing incorrect joined paths.
- Tests
  - New and updated regression tests covering base-path resolution scenarios and multiple cycle-detection cases to prevent stack-overflow and ensure correct fallback behavior.