cloudposse/atmos v1.219.0 on GitHub

Improve Atmos Algolia search ranking @osterman (#2406)

what

Add a repo-managed Algolia crawler config for atmos.tools with URL taxonomy, canonical URL handling, definition-term weighting, and manual pageRank ranking.
Add Algolia deploy, dry-run, crawler unit test, and opt-in live relevance test scripts under website/algolia.
Add the Algolia GitHub Actions workflow so PRs validate without secrets and only merges to main deploy crawler config/settings through the algolia environment.
Move production crawler reindexing into a separate algolia environment job after website deploy, remove the deprecated preview scraper reindex path, and refresh Algolia indexing docs.

why

Improve atmos auth search results so command/user-intent docs outrank configuration reference definitions.
Keep Algolia credentials and crawler triggers scoped to the algolia GitHub environment instead of the production website deployment environment.
Prevent crawler config uploads from pull requests while still validating the payload before merge.
Delete stale non-crawler indexing behavior and document the new crawler-based deployment split.

Summary by CodeRabbit

New Features
- CI workflows to validate Algolia crawler on PRs and deploy/reindex from main (including manual dispatch and post-deploy trigger).
Documentation
- Rewrote Algolia indexing guide; added crawler README, commands, troubleshooting, and env-secret guidance.
New Tools
- Added crawler config/dashboard, deployment utility, and index management scripts.
Tests
- Expanded crawler unit tests and added optional gated live relevance integration tests.
Chores
- Updated site deploy flow and removed legacy preview reindex path.

feat(claude): add pull-request skill for labeling + changelog + roadmap workflow @osterman (#2424)

what

Adds a new pull-request skill at .claude/skills/pull-request/SKILL.md that future agents (and humans) invoke via /pull-request before opening or updating any PR. The skill encodes the three policies we keep getting burned by:

Every PR needs a semver label. Unlabeled PRs fail the PR Semver Labels CI check.
minor / major PRs require a blog post AND a roadmap update. The Check for changelog and roadmap updates workflow gates merging on both.
featured[] in the roadmap is curated, max 6 items. Never auto-promote a shipped milestone — only the user decides.

what the skill covers

Label decision tree — no-release / patch / minor / major, with the explicit "would a user see ANY change?" framing to default plumbing PRs to no-release instead of inflating them to minor.
Common-mistake call-outs — labeling foundation PRs as minor because they're part of a larger feature; labeling default flips as patch because "it's just a default."
Apply-the-label-first rule — gh pr edit <num> --add-label <label> immediately after opening, not after CI complains.
Blog post rules — when to write one, file path conventions, MDX template, tags from tags.yml, authors from authors.yml.
Roadmap rules — delegate to the existing roadmap agent, do NOT touch featured[], milestone schema with pr + changelog fields.
End-to-end checklist — 8 items to run through before pushing.
Updating-an-existing-PR path — for the common case where a PR is opened without a label.

why now

I just opened a stack of 6 foundation PRs (#2417–#2423) and shipped them all unlabeled. CI failed on all of them. Backfilling labels worked but the underlying problem — there was nothing telling me (or any future agent) how to decide the label up front — is what this skill fixes.

The skill is also written to be self-contained so an agent invoked cold (no prior context) can still apply it correctly. CLAUDE.md now points to the skill from the existing Pull Requests section.

references

CI workflow: .github/workflows/changelog-check.yml
CI workflow: .github/workflows/feature-release.yml
Existing roadmap agent: .claude/agents/roadmap.md (this skill delegates to it for roadmap edits)
Tags: website/blog/tags.yml
Authors: website/blog/authors.yml

Summary by CodeRabbit

Documentation
- New PR workflow requiring exactly one semver label (no-release/patch/minor/major) chosen via a decision flow and applied atomically; mandatory use of the pull-request helper before opening/updating PRs.
- CI-enforced rules for when changelog blog posts and roadmap updates are required, validated blog MDX/front-matter, allowed tags/authors, and exceptions for internal refactors.
- Mandatory GPG/SSH-signed commits with verification/re-sign guidance, ordered pre/post-push checklists, guidance for relabeling existing PRs, and a CLI body-file workaround for PR creation.

feat(list): add --skip flag; fix --stack/--filter/--query on list instances @osterman (#2413)

what

--skip across every atmos list subcommand. instances, components, metadata, sources, and stacks now accept --skip <yaml-function> (repeatable, e.g. --skip terraform.state --skip terraform.output). Mirrors the surface already exposed by describe affected | component | stacks. Bound to ATMOS_SKIP; the existing ATMOS_AFFECTED_SKIP continues to work on list affected as a back-compat alias. Threads through ExecuteDescribeStacks — which already accepted skip but was being passed nil at every list callsite.
--stack, --filter, and --query on atmos list instances now work. Three documented flags were previously silent no-ops: --stack was ignored (every instance was returned), --filter was a TODO stub, and --query was read into options and dropped. --stack now uses path.Match glob semantics, --filter evaluates a YQ predicate per row, and --query projects each row via YQ (scalars land in a value column, maps flatten to row keys). This change also closes a latent ENV-precedence gap so ATMOS_LIST_FORMAT and ATMOS_UPLOAD are honored via viper.
Tests. Parser, options, and propagation tests for --skip (with a regression test for the literal list instances --upload --skip terraform.state failure). Unit + integration tests for the stack/filter/query work. New pkg/list/filter/yq.go (YQPredicateFilter, YQProjector, isTruthy) with full coverage.
Docs + release artifacts. --skip documented on all five list pages; two release-blog entries; one roadmap milestone under the Discoverability initiative.
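
As a rough illustration of the `path.Match` glob semantics that --stack now uses, here is a minimal sketch. The `instance` type and `filterByStack` helper are hypothetical names invented for this example and are not taken from the Atmos codebase; only the standard library's `path.Match` is real.

```go
package main

import (
	"fmt"
	"path"
)

// instance is a hypothetical, simplified row; the real rows carry more fields.
type instance struct {
	Component string
	Stack     string
}

// filterByStack keeps only the rows whose Stack matches the glob pattern.
// An empty pattern keeps every row. A malformed pattern surfaces as an error
// (path.Match returns path.ErrBadPattern) instead of silently matching nothing.
func filterByStack(rows []instance, pattern string) ([]instance, error) {
	if pattern == "" {
		return rows, nil
	}
	var out []instance
	for _, r := range rows {
		ok, err := path.Match(pattern, r.Stack)
		if err != nil {
			return nil, fmt.Errorf("invalid stack pattern %q: %w", pattern, err)
		}
		if ok {
			out = append(out, r)
		}
	}
	return out, nil
}

func main() {
	rows := []instance{
		{Component: "vpc", Stack: "plat-ue2-dev"},
		{Component: "vpc", Stack: "plat-ue2-prod"},
		{Component: "eks", Stack: "core-uw2-dev"},
	}
	matched, err := filterByStack(rows, "plat-*")
	if err != nil {
		panic(err)
	}
	for _, r := range matched {
		fmt.Println(r.Component, r.Stack)
	}
}
```

Note that in `path.Match` a `*` does not cross `/` separators, so a pattern such as `plat-*` would not match a stack name containing a slash.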

why

The concrete failure on --skip: atmos list instances --upload --skip terraform.state errored with unknown flag. --process-functions=false is not a substitute because it also disables !template, which Atmos Pro uploads need so settings.pro.enabled evaluates to a real boolean instead of a literal string.
The concrete failure on --stack/--filter/--query: the docs promised filtering on list instances and the implementation didn't honor it. Users hit silent wrong-result behavior, not an error.
Both features ship through the same set of list files; bundling avoids merge churn and keeps the test+docs surface coherent.

references

Pattern source for --skip rollout: #2363 (--process-templates / --process-functions rollout across list)
Origin of --skip (describe family): #1006

Summary by CodeRabbit

New Features
- Repeatable --skip flag (ATMOS_SKIP; legacy ATMOS_AFFECTED_SKIP preserved) added across list commands; list instances adds --stack glob filtering.
Enhancements
- YQ-based --filter (truthy predicates) and --query (projections); format-aware validation for tree/matrix; improved config precedence for list flags.
Bug Fixes
- --stack / --filter / --query now honor documented behavior.
Tests
- Expanded coverage for skip flag, YQ filters/projectors, and stack-glob filtering.
Documentation
- Updated CLI docs, blog posts, and roadmap.

🚀 Enhancements

feat(mcp): example + blog for MCP-for-AI-coding-assistants, plus fixes from review @aknysh (#2425)

what

New example and blog post — "MCP for AI Coding Assistants"

New example at examples/mcp-for-ai-coding-assistants/ showing how to use Atmos as the configuration / auth / toolchain layer for MCP servers consumed by Claude Code, OpenAI Codex CLI, and Google Gemini CLI.
- One atmos.yaml defines 9 MCP servers — the embedded atmos MCP server + the full AWS suite (aws-docs, aws-knowledge, aws-pricing, aws-billing, aws-iam, aws-cloudtrail, aws-security, aws-api) — plus the HTTP-transport Atmos Pro MCP server registered alongside.
New blog post at website/blog/2026-05-16-mcp-for-ai-coding-assistants.mdx walking through the problem (three config formats, three auth flows, three toolchain stories), the solution (one atmos.yaml), and per-CLI wiring instructions for all three CLIs.

MCP implementation review — punch list fixes

Detailed in docs/fixes/2026-05-15-mcp-review-fixes.md. Nine issues found while reviewing `pkg/mcp/` and `cmd/mcp/`:

P1 — `atmos mcp export` now injects the toolchain PATH. Removed the duplicate `mcpJSONConfig`/`mcpJSONServer`/`buildMCPJSONEntry` types from `cmd/mcp/client/export.go` and delegated to the shared `mcpclient.GenerateMCPConfig` (which already threads `toolchainPATH` into each server's `env.PATH`). Symptom this fixes: IDE-spawned subprocesses couldn't find toolchain-managed `uvx` / `npx`.
P1 — MCP server registers the full Atmos AI tool surface. Replaced the hand-rolled 7-tool list in `cmd/mcp/server/start.go::initializeAIComponents` with a single call to the canonical `atmosTools.RegisterTools` factory. Picks up `describe_affected` (which the docs advertised but the registry was missing), `search_files`, `execute_atmos_command`, and the rest.
P2 — `atmos mcp restart` help text clarifies the validation semantics. Stop+start+stop is intentional for stdio servers, but the old `Short` ("Restart an MCP server") misled users.
P2 — Concurrent-safe temp file in `WriteMCPConfigToTempFile`. Switched from a fixed path to `os.CreateTemp(..., "atmos-mcp-config-*.json")`. Two concurrent `atmos ai ask` invocations no longer race.
P3 — Defensive copy in `Session.Tools()`. Returns `append([]*mcpsdk.Tool(nil), s.tools...)` so callers can mutate the slice without affecting the cache.
P3 — Hardened `firstSentence` helper in `cmd/mcp/client/tools.go`. Recognizes `! ` and `? ` (not just `. `), picks the earliest terminator, applies an 80-rune ceiling with ellipsis for descriptions without terminators.
P3 — `atmos mcp tools` migrated to the renderer pipeline. Now exposes `--format` / `--columns` / `--sort` / `--delimiter` with the same env-var fallbacks as `mcp list`. `--format=json` works.
P4 — Dropped unused `ScopedAuthProvider.baseConfig`. Was carrying "for future extensibility" comments while doing nothing.
P4 — `atmos mcp test` no longer double-prints errors. `RunE` returns `nil` so `main.go`'s `errUtils.Format` pipeline is skipped; `printTestResult`'s ✓/✗ markers are the single source of stderr output.

Also incidentally regenerated `pkg/telemetry/mock/mock_posthog_client.go` because `posthog-go` added `EvaluateFlags` and `golangci-lint` was blocking on it — unrelated to MCP, but had to be addressed to land the rest.

Test coverage

Per-issue regression tests landed alongside each fix. After the review-fix sweep we also added a focused coverage push for the lowest-coverage MCP paths:

`cmd/mcp/client`: 49.3% → 54.6%
`pkg/mcp`: 86.8% → 91.2%

Highlights:

`cmd/mcp/client/export.go::executeMCPExport`: 0% → 84.2% (4 new tests covering no-servers early return, happy path with file-mode 0600, existing-file overwrite tightening, and the WriteFile error wrap).
`pkg/mcp/server.go::Server.Run`: 0% → 100% (driven via the MCP SDK's `NewInMemoryTransports()` so the test doesn't need stdio/HTTP).
`pkg/mcp/client/mcpconfig.go::WriteMCPConfigToTempFile`: 52% → 57% (picked up the `os.CreateTemp` failure branch + the `TMPDIR` contract).

why

Example + blog

Anyone using Claude Code, OpenAI Codex CLI, or Google Gemini CLI with the awslabs/mcp suite hits the same friction: three config formats (`.mcp.json` vs `config.toml` vs `settings.json`), three credential stories (`aws configure`, `AWS_PROFILE` juggling, role-ARN copy-paste), and three toolchain stories (where `uvx` lives, which Python it picks up). Each is solvable individually; doing all three for three CLIs is annoying.

Atmos already has the primitives — `atmos.yaml` for centralized config, Atmos Auth for SSO/role assumption with per-server identity routing, the Atmos toolchain for binary pinning, and the embedded `atmos` MCP server for project introspection. `atmos mcp export` ties them together into a single `.mcp.json` every CLI can consume.

The example makes that workflow concrete; the blog post tells the story to drive adoption.

Fixes + tests

The MCP implementation shipped 6 phases of work (per the PRD at `docs/prd/atmos-mcp-integrations.md`) and the architecture is sound, but a review surfaced 9 quality / correctness issues — most notably the `atmos mcp export` path silently dropping toolchain PATH (P1) and the MCP server registering a curated subset of AI tools that drifted from the docs (P1). These are the kinds of issues that quietly degrade the user experience without triggering loud failures, so fixing them with regression tests prevents future drift.

The coverage boost specifically targets the paths the fix doc touched, so the regression tests aren't just present but exercised by the coverage gate.

references

Example: examples/mcp-for-ai-coding-assistants/
Blog: website/blog/2026-05-16-mcp-for-ai-coding-assistants.mdx
Fix doc: docs/fixes/2026-05-15-mcp-review-fixes.md
PRD (already shipped, this PR doesn't change it): docs/prd/atmos-mcp-integrations.md
Related: Atmos Agent Skills announcement — the Skills work that pairs with this MCP example
Related: Atmos Pro MCP server install

Summary by CodeRabbit

New Features
- Standardized mcp tools listing with format/columns/sort/delimiter and truncated descriptions.
- Example project, README and blog post for wiring MCP servers to AI coding assistants.
- mcp export now delegates to the shared generator for consistent exports and identity wrapping.
Bug Fixes / Improvements
- Exports use unique temp files with 0600 permissions to avoid races; restart help clarifies stop+start semantics; Session.Tools returns a defensive copy.
Tests
- Expanded test coverage across export, tools rendering, restart, temp-file handling, auth, server/session regressions.
Documentation
- Added review-fixes doc, examples, README, and blog post.

fix(ci): fire CI hooks per-component in --all / --query plan mode (#2… @thejrose1984 (#2430)

what

Fixes atmos terraform plan --all (and --query, --components, stack-without-component) producing only a single CI summary entry for the last component instead of one entry per component
Adds an optional PerComponentHook callback to ConfigAndStacksInfo that fires after each component executes in multi-component mode, with that component's captured stdout+stderr and exit code
Wires runCIHooksForPlanComponent as the per-component hook for the plan subcommand so $GITHUB_STEP_SUMMARY receives one entry per component with the correct component/stack context
Adds a wasMultiComponentExecution sentinel so the global PostRunE CI hook call is suppressed when per-component hooks have already fired, preventing double-firing
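
The interaction between the per-component hook and the sentinel can be sketched as below. Apart from the names `PerComponentHook` and `wasMultiComponentExecution`, which appear in the description above, everything here (the `componentResult` and `runInfo` types, `runAll`, `postRun`) is hypothetical scaffolding for illustration and does not reflect the actual `ConfigAndStacksInfo` struct or Atmos call paths.

```go
package main

import "fmt"

// componentResult is a hypothetical bundle of what a hook receives for one component.
type componentResult struct {
	Component string
	Stack     string
	Output    string // captured stdout+stderr
	ExitCode  int
}

// runInfo is a hypothetical stand-in for the execution context.
type runInfo struct {
	PerComponentHook           func(componentResult) // optional
	wasMultiComponentExecution bool
}

// runAll fires the per-component hook right after each component finishes
// and records that per-component hooks have been used.
func runAll(info *runInfo, results []componentResult) {
	for _, r := range results {
		// Real code would execute the component here and capture its output.
		if info.PerComponentHook != nil {
			info.PerComponentHook(r)
			info.wasMultiComponentExecution = true
		}
	}
}

// postRun stands in for the global post-run hook. It is skipped when
// per-component hooks already fired, which prevents double-reporting.
func postRun(info *runInfo, last componentResult, global func(componentResult)) {
	if info.wasMultiComponentExecution {
		return
	}
	global(last)
}

func main() {
	info := &runInfo{
		PerComponentHook: func(r componentResult) {
			fmt.Printf("summary entry: %s in %s (exit %d)\n", r.Component, r.Stack, r.ExitCode)
		},
	}
	results := []componentResult{
		{Component: "vpc", Stack: "dev"},
		{Component: "eks", Stack: "dev"},
	}
	runAll(info, results)
	postRun(info, results[len(results)-1], func(r componentResult) {
		fmt.Println("global hook:", r.Component)
	})
}
```

With the hook set, the sketch emits one entry per component and the global hook stays silent; with the hook unset, only the global hook runs, as in single-component mode.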

why

In multi-component mode, terraformRunWithOptions routes to ExecuteTerraformQuery and discards the shellOpts capture buffers passed by plan.go RunE, so capturedPlanOutput was always empty
PostRunE fired once after all components completed, calling RunCIHooks with an empty output buffer and the last component's info.Component/info.Stack, misattributing the summary
For stacks with 5 components, only 1 summary entry appeared (or none) instead of 5 — silently hiding the plan result for every component except the last

references

Closes #2397

Summary by CodeRabbit

New Features
- Per-component CI hook execution for terraform plan in multi-component runs, with captured, cleaned plan output passed to each hook.
Bug Fixes
- Ensure CI plan output capture and hook invocation behave consistently across early exits and multi-component executions to avoid duplicate/incorrect hooks.
Tests
- Added tests validating per-component hook behavior, output handling, and multi-component execution paths.

fix(validate): respect stacks.excluded_paths during validation @kapats (#2389)

The ValidateStacks function scanned all YAML files in the stacks directory using the pattern `**/*` but excluded only template files (.tmpl); it did not respect the user-configured `stacks.excluded_paths` from atmos.yaml.

This caused non-Atmos YAML files (e.g., Helm Chart.yaml with a dependencies: key) placed inside the stacks directory to be incorrectly parsed as Atmos stack manifests, resulting in errors like: "invalid dependencies section in file '...Chart'"

Fix: append atmosConfig.Stacks.ExcludedPaths to the validation exclusion list so users can exclude directories containing non-Atmos YAML files (e.g., **/argocd/**).

Summary by CodeRabbit

New Features
- Stack validation now respects user-configured path exclusions, allowing you to specify directories that should not be validated alongside default YAML file exclusions.

fix(toolchain): resolve aqua packages with binary subdir (e.g. openbao) @osterman (#2414)

what

Fix the aqua registry resolver so it can find packages whose YAML lives at pkgs/<owner>/<repo>/<binary>/registry.yaml — i.e. packages whose binary name differs from the repo name, like OpenBao (openbao/openbao/bao).
GetTool now consults the cached aqua-registry index first and fetches the per-package YAML using the full registry path it supplies, instead of relying on a hardcoded list of known 3-segment prefixes.
The aqua-registry base URL is now a single configurable field (registryBaseURL + WithRegistryBaseURL), used by both the index fetch and the new index-driven per-package fetch.
New tests cover the 3-segment path resolution (regression for #2383), continued 2-segment behavior, graceful fallback when the index is unreachable, and the path-index side effect of convertPackagesToTools.
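
A simplified sketch of index-driven resolution follows. It is not the Atmos resolver: the `resolver` type, `newResolver`, and `registryURL` are hypothetical names, the base URL is a placeholder, and the first-entry-wins handling of repos that publish several binaries is a simplification made for brevity. The only details taken from the description are the `pkgs/<owner>/<repo>/<binary>/registry.yaml` layout, the index entry form (for example `openbao/openbao/bao`), and the 2-segment fallback.

```go
package main

import (
	"fmt"
	"strings"
)

// resolver maps "owner/repo" to the full package path published by the index.
type resolver struct {
	baseURL string            // configurable base, e.g. a mirror of the registry's pkgs directory
	index   map[string]string // "owner/repo" -> "owner/repo[/binary]"
}

// newResolver builds the lookup table from index entry names.
// Simplification: if a repo publishes several packages, the first one wins.
func newResolver(baseURL string, names []string) *resolver {
	r := &resolver{baseURL: strings.TrimRight(baseURL, "/"), index: map[string]string{}}
	for _, name := range names {
		parts := strings.Split(name, "/")
		if len(parts) < 2 {
			continue
		}
		key := parts[0] + "/" + parts[1]
		if _, seen := r.index[key]; !seen {
			r.index[key] = name
		}
	}
	return r
}

// registryURL returns the per-package YAML URL, preferring the full path from
// the index and falling back to the plain 2-segment layout when the package
// is not in the index (or the index could not be loaded).
func (r *resolver) registryURL(owner, repo string) string {
	key := owner + "/" + repo
	full, ok := r.index[key]
	if !ok {
		full = key
	}
	return r.baseURL + "/" + full + "/registry.yaml"
}

func main() {
	// The base URL below is a placeholder, not the real registry location.
	r := newResolver("https://registry.example.invalid/pkgs", []string{
		"openbao/openbao/bao",
		"example-owner/example-tool",
	})
	fmt.Println(r.registryURL("openbao", "openbao"))
	fmt.Println(r.registryURL("example-owner", "example-tool"))
}
```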

why

Before this fix, atmos toolchain search openbao listed openbao/openbao but atmos toolchain info openbao/openbao and atmos toolchain install openbao/openbao both failed with tool not in registry. The resolver only probed pkgs/<owner>/<repo>/registry.yaml plus a small hardcoded set of pkg roots (hashicorp, helm, kubernetes/kubernetes, opentofu), so any package whose binary name differs from its repo name and isn't on that list was silently unreachable.
The aqua-registry index already publishes each package's full path (e.g. name: openbao/openbao/bao); we just weren't using it. Driving resolution from the index makes the entire class of binary-subdir packages installable without per-package allowlists.
Verified end-to-end against the live aqua-registry: from a directory with no inline registry config, atmos toolchain install openbao/openbao@v2.5.3 now downloads, installs, and runs (bao --version → OpenBao v2.5.3 …).

references

Closes #2383

Summary by CodeRabbit

New Features
- Custom Aqua registry base URL support.
- Short-name resolution that maps tool short names to canonical owner/repo and reports ambiguities.
- Default registry accessor for callers without configuration.
Improvements
- Index-driven package lookup with caching, aliases, and robust 2-/3-segment fallbacks (monorepo-aware).
- Installer rejects empty tool specs and uses registry short-name resolution.
- Downloads now retry transient failures with exponential backoff (404s not retried).
Tests
- Expanded tests for index resolution, short-name logic, download retries, fallbacks, and ambiguity scenarios.
CI
- OS-specific OpenTofu install steps in the test workflow.

fix(list): make `list components` output deterministic across runs @osterman (#2422)

what

atmos list components (and especially --enabled=false / --locked=true) now returns deterministic, consistent results across invocations on the same workspace.
Aggregate per-stack enabled/locked state into the deduplicated component view: any-disabled-wins for enabled, any-locked-wins for locked. status / status_text are recomputed from the aggregate.
Sort stack names, component names, and the output slice so iteration order no longer depends on Go's randomized map iteration.
Documentation for --enabled / --locked now describes the cross-stack aggregation semantics and points at atmos list instances for per-stack-instance state.
Adds regression tests in pkg/list/extract/: a 200-iteration determinism test for --enabled=false, an aggregation-policy test for enabled/locked, and an output-order stability test.
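
The aggregation policy (any-disabled-wins, any-locked-wins) combined with sorted output can be sketched as follows. The `componentState` type and `aggregate` function are hypothetical and much simpler than the code in `pkg/list/extract/`; the point is only that folding with AND/OR makes the result independent of map iteration order, and sorting the final slice makes the output order stable.

```go
package main

import (
	"fmt"
	"sort"
)

// componentState is a hypothetical, simplified view of one component.
type componentState struct {
	Name    string
	Enabled bool
	Locked  bool
}

// aggregate folds per-stack state into one entry per component.
// Input shape (assumed for this sketch): stack name -> component name -> state.
func aggregate(stacks map[string]map[string]componentState) []componentState {
	agg := map[string]componentState{}
	for _, components := range stacks {
		for name, st := range components {
			cur, seen := agg[name]
			if !seen {
				// Neutral starting values for the AND / OR folds below.
				cur = componentState{Name: name, Enabled: true, Locked: false}
			}
			cur.Enabled = cur.Enabled && st.Enabled // disabled anywhere => disabled
			cur.Locked = cur.Locked || st.Locked    // locked anywhere => locked
			agg[name] = cur
		}
	}
	out := make([]componentState, 0, len(agg))
	for _, st := range agg {
		out = append(out, st)
	}
	// Sort so the output never depends on Go's randomized map iteration.
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

func main() {
	stacks := map[string]map[string]componentState{
		"dev":  {"vpc": {Enabled: true}, "eks": {Enabled: true, Locked: true}},
		"prod": {"vpc": {Enabled: false}, "eks": {Enabled: true}},
	}
	for _, c := range aggregate(stacks) {
		fmt.Printf("%s enabled=%t locked=%t\n", c.Name, c.Enabled, c.Locked)
	}
}
```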

why

Reported in #2359: running atmos list components --enabled=false repeatedly produced different output every time — sometimes empty, sometimes 1 component, sometimes 2 or 3 different components — with no changes to the workspace.
Root cause: extractUniqueComponentType in pkg/list/extract/components.go extracted metadata only from the first stack iterated for each component, and UniqueComponents iterated stacksMap (a Go map) in randomized order. When the same component had enabled: false in some stacks and enabled: true in others, the recorded value depended on iteration order — so BoolFilter.Apply would include or exclude the component non-deterministically.
Sorting alone would have made the output stable but not necessarily correct for users with mixed-state components; the aggregation policy ensures a component disabled anywhere shows up under --enabled=false, which matches the user's expected behavior in the issue.

references

Closes #2359

Summary by CodeRabbit

Bug Fixes
- Resolved non-deterministic output in list components command; results now consistent across invocations.
- Adjusted --enabled and --locked flags to aggregate state across stack instances: a component is shown as disabled/locked if any instance has that state.
Documentation
- Clarified list components flag behavior for cross-stack aggregation reporting.
Tests
- Added regression test coverage for deterministic component aggregation.

fix(auth): normalize --identity=false to disable authentication @osterman (#2412)

what

Normalize boolean-false values (false, 0, no, off, case-insensitive) passed via --identity=<value> and --identity <value> to the disabled sentinel (cfg.IdentityFlagDisabledValue), so the auth pre-hook and CreateAndAuthenticateManager* short-circuit instead of trying to authenticate with the literal name "false".
Patch is in internal/exec/cli_utils.go::parseIdentityFlag — the single arg-walker that feeds info.Identity / configAndStacksInfo.Identity for every ProcessCommandLineArgs consumer (terraform, helmfile, packer, list, describe, workflow, vendor, pro, validate, atlantis, docs, generate).
Add parser-level unit cases in TestParseIdentityFlag for =false / =False / =FALSE / =0 / =no / =off plus space-separated form, and end-to-end cases in TestProcessArgsAndFlags_IdentityFlag{,Helmfile,Packer} asserting info.Identity == cfg.IdentityFlagDisabledValue.
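
The normalization itself amounts to a small case-insensitive lookup. The sketch below is illustrative and is not the code in `parseIdentityFlag`; the `normalizeIdentity` name is hypothetical, and the sentinel literal `__DISABLED__` is assumed here to be the value of `cfg.IdentityFlagDisabledValue`, based on the description of what `isAuthenticationDisabled` matches.

```go
package main

import (
	"fmt"
	"strings"
)

// identityDisabled is assumed to correspond to cfg.IdentityFlagDisabledValue.
const identityDisabled = "__DISABLED__"

// normalizeIdentity maps boolean-false spellings to the disabled sentinel and
// returns any other value unchanged, so real identity names pass through.
func normalizeIdentity(value string) string {
	switch strings.ToLower(value) {
	case "false", "0", "no", "off":
		return identityDisabled
	default:
		return value
	}
}

func main() {
	for _, v := range []string{"false", "FALSE", "0", "no", "off", "prod-admin"} {
		fmt.Printf("%q -> %q\n", v, normalizeIdentity(v))
	}
}
```

Doing this once at the parse site means downstream checks only ever need to compare against the sentinel.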

why

Regression: ATMOS_IDENTITY=false was fixed in #1935 by normalizing the env-var fallback at cli_utils.go:199, but the env fallback only runs when Identity == "". When --identity=false is passed on the CLI, parseIdentityFlag populated the literal "false", the env-fallback branch was skipped, and the literal flowed through to pkg/auth/hooks.go::isAuthenticationDisabled (which only matches __DISABLED__).
#2225 extracted parseIdentityFlag into a new helper without porting the normalization from the env path, silently breaking the documented --identity=false contract. Reported by users for atmos terraform * and atmos list instances.
Centralizing normalization at the parse site means every command sharing ProcessCommandLineArgs is fixed in one place, and matches the behavior already implemented for the StandardParser path (pkg/flags/global_registry.go) and the cmd/identity_helpers.go / cmd/list/utils.go / cmd/list/affected.go per-command identity reads.

references

Builds on #1900 / #1935 (env-var normalization) and corrects the regression introduced by the refactor in #2225.

Summary by CodeRabbit

Bug Fixes
- The --identity flag now recognizes common boolean-false values (false, 0, no, off, case-insensitive) to disable authentication.
New Features
- Passing --identity=false (or equivalent) disables per-component auth resolution and is honored across describe, list, and state-resolution flows.
- Describe and list commands now surface and propagate an "auth disabled" option so outputs respect disabled auth.
Tests
- Expanded tests for identity parsing and auth-disabled behavior across commands, env vars, and execution paths.

Fail fast on missing component identity @osterman (#2411)

what

Fail fast when per-component auth resolution fails for components that declare a default identity.
Stop eager auth-manager error printing so the top-level command emits one formatted error.
Add regression coverage and document the fix under docs/fixes.

why

Missing or invalid component identities are fatal because falling back to parent or ambient auth can use the wrong backend or account.
This prevents atmos list instances --upload from repeating the same identity error per component and continuing after bad auth.

references

See docs/fixes/2026-05-15-list-instances-auth-fail-fast.md.

Summary by CodeRabbit

Bug Fixes
- atmos list instances --upload now fails fast when per-component auth resolution fails for components with a default identity; returns a clear error including component and stack context and stops further processing.
- Eager error printing during auth initialization was removed and centralized to avoid repeated initialization messages.
Documentation
- Added docs describing the updated authentication failure behavior.
Tests
- Updated tests to verify resolver failures are treated as fatal and include component/stack context.