Improve Atmos Algolia search ranking @osterman (#2406)
what
- Add a repo-managed Algolia crawler config for `atmos.tools` with URL taxonomy, canonical URL handling, definition-term weighting, and manual `pageRank` ranking.
- Add Algolia deploy, dry-run, crawler unit test, and opt-in live relevance test scripts under `website/algolia`.
- Add the `Algolia` GitHub Actions workflow so PRs validate without secrets and only merges to `main` deploy crawler config/settings through the `algolia` environment.
- Move production crawler reindexing into a separate `algolia` environment job after website deploy, remove the deprecated preview scraper reindex path, and refresh Algolia indexing docs.
why
- Improve `atmos auth` search results so command/user-intent docs outrank configuration reference definitions.
- Keep Algolia credentials and crawler triggers scoped to the `algolia` GitHub environment instead of the production website deployment environment.
- Prevent crawler config uploads from pull requests while still validating the payload before merge.
- Delete stale non-crawler indexing behavior and document the new crawler-based deployment split.
references
- https://docsearch.algolia.com/docs/v3/required-configuration
- https://www.algolia.com/doc/tools/crawler/apis/configuration/initial-index-settings/
- https://www.algolia.com/doc/rest-api/crawler/patch-config
Summary by CodeRabbit
- New Features
  - CI workflows to validate Algolia crawler on PRs and deploy/reindex from main (including manual dispatch and post-deploy trigger).
- Documentation
  - Rewrote Algolia indexing guide; added crawler README, commands, troubleshooting, and env-secret guidance.
- New Tools
  - Added crawler config/dashboard, deployment utility, and index management scripts.
- Tests
  - Expanded crawler unit tests and added optional gated live relevance integration tests.
- Chores
  - Updated site deploy flow and removed legacy preview reindex path.
feat(claude): add pull-request skill for labeling + changelog + roadmap workflow @osterman (#2424)
what
Adds a new pull-request skill at .claude/skills/pull-request/SKILL.md that future agents (and humans) invoke via /pull-request before opening or updating any PR. The skill encodes the three policies we keep getting burned by:
- Every PR needs a semver label. Unlabeled PRs fail the `PR Semver Labels` CI check.
- `minor` / `major` PRs require a blog post AND a roadmap update. The `Check for changelog and roadmap updates` workflow gates merging on both.
- `featured[]` in the roadmap is curated, max 6 items. Never auto-promote a shipped milestone; only the user decides.
what the skill covers
- Label decision tree: `no-release` / `patch` / `minor` / `major`, with the explicit "would a user see ANY change?" framing to default plumbing PRs to `no-release` instead of inflating them to `minor`.
- Common-mistake call-outs: labeling foundation PRs as `minor` because they're part of a larger feature; labeling default flips as `patch` because "it's just a default."
- Apply-the-label-first rule: run `gh pr edit <num> --add-label <label>` immediately after opening, not after CI complains.
- Blog post rules: when to write one, file path conventions, MDX template, tags from `tags.yml`, authors from `authors.yml`.
- Roadmap rules: delegate to the existing `roadmap` agent, do NOT touch `featured[]`, milestone schema with `pr` + `changelog` fields.
- End-to-end checklist: 8 items to run through before pushing.
- Updating-an-existing-PR path: for the common case where a PR is opened without a label.
why now
I just opened a stack of 6 foundation PRs (#2417–#2423) and shipped them all unlabeled. CI failed on all of them. Backfilling labels worked but the underlying problem — there was nothing telling me (or any future agent) how to decide the label up front — is what this skill fixes.
The skill is also written to be self-contained so an agent invoked cold (no prior context) can still apply it correctly. CLAUDE.md now points to the skill from the existing Pull Requests section.
references
- CI workflow: `.github/workflows/changelog-check.yml`
- CI workflow: `.github/workflows/feature-release.yml`
- Existing roadmap agent: `.claude/agents/roadmap.md` (this skill delegates to it for roadmap edits)
- Tags: `website/blog/tags.yml`
- Authors: `website/blog/authors.yml`
Summary by CodeRabbit
- Documentation
  - New PR workflow requiring exactly one semver label (`no-release` / `patch` / `minor` / `major`) chosen via a decision flow and applied atomically; mandatory use of the pull-request helper before opening/updating PRs.
  - CI-enforced rules for when changelog blog posts and roadmap updates are required, validated blog MDX/front-matter, allowed tags/authors, and exceptions for internal refactors.
  - Mandatory GPG/SSH-signed commits with verification/re-sign guidance, ordered pre/post-push checklists, guidance for relabeling existing PRs, and a CLI body-file workaround for PR creation.
feat(list): add --skip flag; fix --stack/--filter/--query on list instances @osterman (#2413)
what
- `--skip` across every `atmos list` subcommand. `instances`, `components`, `metadata`, `sources`, and `stacks` now accept `--skip <yaml-function>` (repeatable, e.g. `--skip terraform.state --skip terraform.output`). Mirrors the surface already exposed by `describe affected | component | stacks`. Bound to `ATMOS_SKIP`; the existing `ATMOS_AFFECTED_SKIP` continues to work on `list affected` as a back-compat alias. Threads through `ExecuteDescribeStacks`, which already accepted `skip` but was being passed `nil` at every list callsite.
- `--stack`, `--filter`, and `--query` on `atmos list instances` now work. Three documented flags were previously silent: `--stack` was ignored (every instance returned), `--filter` was a TODO stub, and `--query` was read into options and dropped. `--stack` now uses `path.Match` glob semantics, `--filter` evaluates a YQ predicate per row, and `--query` projects each row via YQ (scalars land in a `value` column, maps flatten to row keys). Closes a latent ENV-precedence gap so `ATMOS_LIST_FORMAT` and `ATMOS_UPLOAD` are honored via viper.
- Tests. Parser, options, and propagation tests for `--skip` (with a regression test for the literal `list instances --upload --skip terraform.state` failure). Unit + integration tests for the stack/filter/query work. New `pkg/list/filter/yq.go` (`YQPredicateFilter`, `YQProjector`, `isTruthy`) with full coverage.
- Docs + release artifacts. `--skip` documented on all five list pages; two release-blog entries; one roadmap milestone under the Discoverability initiative.
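The `path.Match` glob semantics mentioned for `--stack` can be illustrated with a minimal sketch. The `instance` type, the `filterByStack` helper, and the stack names are hypothetical and exist only for illustration; only `path.Match` itself is the real standard-library call.

```go
package main

import (
	"fmt"
	"path"
)

// instance is a hypothetical stand-in for one row of `atmos list instances`.
type instance struct {
	Component string
	Stack     string
}

// filterByStack keeps the rows whose stack name matches the glob pattern.
// An empty pattern keeps everything; a malformed pattern is reported as an error.
func filterByStack(rows []instance, pattern string) ([]instance, error) {
	if pattern == "" {
		return rows, nil
	}
	var kept []instance
	for _, r := range rows {
		ok, err := path.Match(pattern, r.Stack)
		if err != nil {
			return nil, fmt.Errorf("invalid stack pattern %q: %w", pattern, err)
		}
		if ok {
			kept = append(kept, r)
		}
	}
	return kept, nil
}

func main() {
	rows := []instance{
		{Component: "vpc", Stack: "plat-ue2-dev"},
		{Component: "vpc", Stack: "plat-ue2-prod"},
		{Component: "eks", Stack: "core-uw2-dev"},
	}
	kept, err := filterByStack(rows, "plat-*")
	if err != nil {
		panic(err)
	}
	for _, r := range kept {
		fmt.Println(r.Component, r.Stack) // prints the two plat-* rows
	}
}
```

Note that with `path.Match`, `*` does not cross a `/` separator, so a pattern such as `plat-*` would not match a stack name that contains a slash.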
why
- The concrete failure on `--skip`: `atmos list instances --upload --skip terraform.state` errored with `unknown flag`. `--process-functions=false` is not a substitute because it also disables `!template`, which Atmos Pro uploads need so `settings.pro.enabled` evaluates to a real boolean instead of a literal string.
- The concrete failure on `--stack` / `--filter` / `--query`: the docs promised filtering on `list instances` and the implementation didn't honor it. Users hit silent wrong-result behavior, not an error.
- Both features ship through the same set of `list` files; bundling avoids merge churn and keeps the test+docs surface coherent.
references
- Pattern source for `--skip` rollout: #2363 (`--process-templates` / `--process-functions` rollout across `list`)
- Origin of `--skip` (describe family): #1006
Summary by CodeRabbit
- New Features
  - Repeatable `--skip` flag (`ATMOS_SKIP`; legacy `ATMOS_AFFECTED_SKIP` preserved) added across list commands; list instances adds `--stack` glob filtering.
- Enhancements
  - YQ-based `--filter` (truthy predicates) and `--query` (projections); format-aware validation for tree/matrix; improved config precedence for list flags.
- Bug Fixes
  - `--stack` / `--filter` / `--query` now honor documented behavior.
- Tests
  - Expanded coverage for skip flag, YQ filters/projectors, and stack-glob filtering.
- Documentation
  - Updated CLI docs, blog posts, and roadmap.
🚀 Enhancements
feat(mcp): example + blog for MCP-for-AI-coding-assistants, plus fixes from review @aknysh (#2425)
what
New example and blog post — "MCP for AI Coding Assistants"
- New example at `examples/mcp-for-ai-coding-assistants/` showing how to use Atmos as the configuration / auth / toolchain layer for MCP servers consumed by Claude Code, OpenAI Codex CLI, and Google Gemini CLI.
  - One `atmos.yaml` defines 9 MCP servers: the embedded `atmos` MCP server plus the full AWS suite (`aws-docs`, `aws-knowledge`, `aws-pricing`, `aws-billing`, `aws-iam`, `aws-cloudtrail`, `aws-security`, `aws-api`), with the HTTP-transport Atmos Pro MCP server registered alongside.
- New blog post at `website/blog/2026-05-16-mcp-for-ai-coding-assistants.mdx` walking through the problem (three config formats, three auth flows, three toolchain stories), the solution (one `atmos.yaml`), and per-CLI wiring instructions for all three CLIs.
MCP implementation review — punch list fixes
Detailed in docs/fixes/2026-05-15-mcp-review-fixes.md. Nine issues found while reviewing `pkg/mcp/` and `cmd/mcp/`:
- P1 — `atmos mcp export` now injects the toolchain PATH. Removed the duplicate `mcpJSONConfig`/`mcpJSONServer`/`buildMCPJSONEntry` types from `cmd/mcp/client/export.go` and delegated to the shared `mcpclient.GenerateMCPConfig` (which already threads `toolchainPATH` into each server's `env.PATH`). Symptom this fixes: IDE-spawned subprocesses couldn't find toolchain-managed `uvx` / `npx`.
- P1 — MCP server registers the full Atmos AI tool surface. Replaced the hand-rolled 7-tool list in `cmd/mcp/server/start.go::initializeAIComponents` with a single call to the canonical `atmosTools.RegisterTools` factory. Picks up `describe_affected` (which the docs advertised but the registry was missing), `search_files`, `execute_atmos_command`, and the rest.
- P2 — `atmos mcp restart` help text clarifies the validation semantics. Stop+start+stop is intentional for stdio servers, but the old `Short` ("Restart an MCP server") misled users.
- P2 — Concurrent-safe temp file in `WriteMCPConfigToTempFile`. Switched from a fixed path to `os.CreateTemp(..., "atmos-mcp-config-*.json")`. Two concurrent `atmos ai ask` invocations no longer race.
- P3 — Defensive copy in `Session.Tools()`. Returns `append([]*mcpsdk.Tool(nil), s.tools...)` so callers can mutate the slice without affecting the cache.
- P3 — Hardened `firstSentence` helper in `cmd/mcp/client/tools.go`. Recognizes `! ` and `? ` (not just `. `), picks the earliest terminator, and applies an 80-rune ceiling with ellipsis for descriptions without terminators.
- P3 — `atmos mcp tools` migrated to the renderer pipeline. Now exposes `--format` / `--columns` / `--sort` / `--delimiter` with the same env-var fallbacks as `mcp list`. `--format=json` works.
- P4 — Dropped unused `ScopedAuthProvider.baseConfig`. Was carrying "for future extensibility" comments while doing nothing.
- P4 — `atmos mcp test` no longer double-prints errors. `RunE` returns `nil` so `main.go`'s `errUtils.Format` pipeline is skipped; `printTestResult`'s ✓/✗ markers are the single source of stderr output.
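The defensive copy in `Session.Tools()` from the list above can be sketched as follows. The `Tool` and `Session` types here are simplified, hypothetical stand-ins for the real SDK and session types; only the `append([]T(nil), s...)` copy idiom is taken from the fix description.

```go
package main

import "fmt"

// Tool is a hypothetical stand-in for the SDK's tool type.
type Tool struct{ Name string }

// Session caches the tool list discovered from a server (simplified).
type Session struct{ tools []*Tool }

// Tools returns a fresh slice so callers can reorder, overwrite, or truncate it
// without touching the cache. The *Tool values themselves are still shared.
func (s *Session) Tools() []*Tool {
	return append([]*Tool(nil), s.tools...)
}

func main() {
	s := &Session{tools: []*Tool{{Name: "describe_affected"}, {Name: "search_files"}}}

	got := s.Tools()
	got[0] = &Tool{Name: "replaced"} // mutate the returned slice only

	fmt.Println(s.tools[0].Name) // prints "describe_affected": the cache is unchanged
}
```

This is a shallow copy: it protects the slice's backing array, not the pointed-to values, which is usually enough when callers only sort or filter the result.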
Also incidentally regenerated `pkg/telemetry/mock/mock_posthog_client.go` because `posthog-go` added `EvaluateFlags` and `golangci-lint` was blocking on it — unrelated to MCP, but had to be addressed to land the rest.
Test coverage
Per-issue regression tests landed alongside each fix. After the review-fix sweep we also added a focused coverage push for the lowest-coverage MCP paths:
- `cmd/mcp/client`: 49.3% → 54.6%
- `pkg/mcp`: 86.8% → 91.2%
Highlights:
- `cmd/mcp/client/export.go::executeMCPExport`: 0% → 84.2% (4 new tests covering no-servers early return, happy path with file-mode 0600, existing-file overwrite tightening, and the WriteFile error wrap).
- `pkg/mcp/server.go::Server.Run`: 0% → 100% (driven via the MCP SDK's `NewInMemoryTransports()` so the test doesn't need stdio/HTTP).
- `pkg/mcp/client/mcpconfig.go::WriteMCPConfigToTempFile`: 52% → 57% (picked up the `os.CreateTemp` failure branch + the `TMPDIR` contract).
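As a rough illustration of the `os.CreateTemp` approach behind `WriteMCPConfigToTempFile`, the sketch below writes a payload to a uniquely named temp file. The function name, the `dir` parameter, and the error messages are assumptions for this example, not the actual Atmos implementation; the pattern string is the one quoted in the fix list.

```go
package main

import (
	"fmt"
	"os"
)

// writeConfigToTempFile is a hypothetical helper. os.CreateTemp replaces the
// "*" in the pattern with a random string and creates the file with mode 0600.
// An empty dir means os.TempDir(), which honors TMPDIR on Unix systems.
func writeConfigToTempFile(dir string, data []byte) (string, error) {
	f, err := os.CreateTemp(dir, "atmos-mcp-config-*.json")
	if err != nil {
		return "", fmt.Errorf("create temp MCP config: %w", err)
	}
	if _, err := f.Write(data); err != nil {
		f.Close()
		os.Remove(f.Name())
		return "", fmt.Errorf("write temp MCP config: %w", err)
	}
	if err := f.Close(); err != nil {
		os.Remove(f.Name())
		return "", fmt.Errorf("close temp MCP config: %w", err)
	}
	return f.Name(), nil
}

func main() {
	payload := []byte(`{"mcpServers":{}}`)

	a, err := writeConfigToTempFile("", payload)
	if err != nil {
		panic(err)
	}
	defer os.Remove(a)

	b, err := writeConfigToTempFile("", payload)
	if err != nil {
		panic(err)
	}
	defer os.Remove(b)

	fmt.Println(a != b) // prints "true": each call gets its own file
}
```

Because every call receives a distinct path, two concurrent invocations no longer write to the same file, which is the race the fixed-path version was exposed to.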
why
Example + blog
Anyone using Claude Code, OpenAI Codex CLI, or Google Gemini CLI with the awslabs/mcp suite hits the same friction: three config formats (`.mcp.json` vs `config.toml` vs `settings.json`), three credential stories (`aws configure`, `AWS_PROFILE` juggling, role-ARN copy-paste), and three toolchain stories (where `uvx` lives, which Python it picks up). Each is solvable individually; doing all three for three CLIs is annoying.
Atmos already has the primitives — `atmos.yaml` for centralized config, Atmos Auth for SSO/role assumption with per-server identity routing, the Atmos toolchain for binary pinning, and the embedded `atmos` MCP server for project introspection. `atmos mcp export` ties them together into a single `.mcp.json` every CLI can consume.
The example makes that workflow concrete; the blog post tells the story to drive adoption.
Fixes + tests
The MCP implementation shipped 6 phases of work (per the PRD at `docs/prd/atmos-mcp-integrations.md`) and the architecture is sound, but a review surfaced 9 quality / correctness issues — most notably the `atmos mcp export` path silently dropping toolchain PATH (P1) and the MCP server registering a curated subset of AI tools that drifted from the docs (P1). These are the kinds of issues that quietly degrade the user experience without triggering loud failures, so fixing them with regression tests prevents future drift.
The coverage boost specifically targets the paths the fix doc touched, so the regression tests aren't just present but exercised by the coverage gate.
references
- Example: `examples/mcp-for-ai-coding-assistants/`
- Blog: `website/blog/2026-05-16-mcp-for-ai-coding-assistants.mdx`
- Fix doc: `docs/fixes/2026-05-15-mcp-review-fixes.md`
- PRD (already shipped, this PR doesn't change it): `docs/prd/atmos-mcp-integrations.md`
- Related: Atmos Agent Skills announcement (the Skills work that pairs with this MCP example)
- Related: Atmos Pro MCP server install
Summary by CodeRabbit
- New Features
  - Standardized `mcp tools` listing with format/columns/sort/delimiter and truncated descriptions.
  - Example project, README and blog post for wiring MCP servers to AI coding assistants.
  - `mcp export` now delegates to the shared generator for consistent exports and identity wrapping.
- Bug Fixes / Improvements
  - Exports use unique temp files with 0600 permissions to avoid races; restart help clarifies stop+start semantics; `Session.Tools` returns a defensive copy.
- Tests
  - Expanded test coverage across export, tools rendering, restart, temp-file handling, auth, server/session regressions.
- Documentation
  - Added review-fixes doc, examples, README, and blog post.
fix(ci): fire CI hooks per-component in --all / --query plan mode (#2… @thejrose1984 (#2430)
what
- Fixes `atmos terraform plan --all` (and `--query`, `--components`, stack-without-component) producing only a single CI summary entry for the last component instead of one entry per component
- Adds an optional `PerComponentHook` callback to `ConfigAndStacksInfo` that fires after each component executes in multi-component mode, with that component's captured stdout+stderr and exit code
- Wires `runCIHooksForPlanComponent` as the per-component hook for the `plan` subcommand so `$GITHUB_STEP_SUMMARY` receives one entry per component with the correct component/stack context
- Adds a `wasMultiComponentExecution` sentinel so the global `PostRunE` CI hook call is suppressed when per-component hooks have already fired, preventing double-firing
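The interaction between the per-component hook and the sentinel can be sketched roughly as below. Everything here (`runInfo`, `target`, `planAll`, `postRun`, and the hook signature) is a simplified, hypothetical model of the behavior described above, not the actual Atmos types or call graph.

```go
package main

import "fmt"

// runInfo is a hypothetical, trimmed-down stand-in for ConfigAndStacksInfo.
type runInfo struct {
	// PerComponentHook, when set, runs after each component with its captured output.
	PerComponentHook func(component, stack, output string, exitCode int)
	// wasMultiComponentExecution records that the multi-component path ran.
	wasMultiComponentExecution bool
}

// target identifies one component instance to plan (hypothetical).
type target struct{ Component, Stack string }

// planAll models the multi-component path: it fires the hook once per component.
func planAll(info *runInfo, targets []target) {
	info.wasMultiComponentExecution = true
	for _, t := range targets {
		output := "plan for " + t.Component // placeholder for captured stdout+stderr
		if info.PerComponentHook != nil {
			info.PerComponentHook(t.Component, t.Stack, output, 0)
		}
	}
}

// postRun models the global hook call: it is skipped after a multi-component run.
func postRun(info *runInfo, globalHook func()) {
	if info.wasMultiComponentExecution {
		return
	}
	globalHook()
}

func main() {
	var summary []string
	info := &runInfo{PerComponentHook: func(c, s, out string, code int) {
		summary = append(summary, fmt.Sprintf("%s/%s exit=%d", s, c, code))
	}}

	planAll(info, []target{{Component: "vpc", Stack: "dev"}, {Component: "eks", Stack: "dev"}})
	postRun(info, func() { summary = append(summary, "global") })

	fmt.Println(summary) // prints [dev/vpc exit=0 dev/eks exit=0]
}
```

The design point is that the sentinel is set by the execution path itself, so the post-run step does not need to know how many components ran or whether their output was captured.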
why
- In multi-component mode, `terraformRunWithOptions` routes to `ExecuteTerraformQuery` and discards the `shellOpts` capture buffers passed by `RunE` in `plan.go`, so `capturedPlanOutput` was always empty
- `PostRunE` fired once after all components completed, calling `RunCIHooks` with an empty output buffer and the last component's `info.Component` / `info.Stack`, misattributing the summary
- For stacks with 5 components, only 1 summary entry appeared (or none) instead of 5, silently hiding the plan result for every component except the last
references
Closes #2397
Summary by CodeRabbit
- New Features
  - Per-component CI hook execution for terraform plan in multi-component runs, with captured, cleaned plan output passed to each hook.
- Bug Fixes
  - Ensure CI plan output capture and hook invocation behave consistently across early exits and multi-component executions to avoid duplicate/incorrect hooks.
- Tests
  - Added tests validating per-component hook behavior, output handling, and multi-component execution paths.
fix(validate): respect stacks.excluded_paths during validation @kapats (#2389)
The `ValidateStacks` function scans all YAML files in the stacks directory using the pattern `**/*`, but previously excluded only template files (`.tmpl`). It did not respect the user-configured `stacks.excluded_paths` from `atmos.yaml`. This caused non-Atmos YAML files (e.g., Helm `Chart.yaml` with a `dependencies:` key) placed inside the stacks directory to be incorrectly parsed as Atmos stack manifests, resulting in errors like: "invalid dependencies section in file '...Chart'"
Fix: append `atmosConfig.Stacks.ExcludedPaths` to the validation exclusion list so users can exclude directories containing non-Atmos YAML files (e.g., `**/argocd/**`).
Summary by CodeRabbit
- New Features
  - Stack validation now respects user-configured path exclusions, allowing you to specify directories that should not be validated alongside default YAML file exclusions.
fix(toolchain): resolve aqua packages with binary subdir (e.g. openbao) @osterman (#2414)
what
- Fix the aqua registry resolver so it can find packages whose YAML lives at `pkgs/<owner>/<repo>/<binary>/registry.yaml`, i.e. packages whose binary name differs from the repo name, like OpenBao (`openbao/openbao/bao`).
- `GetTool` now consults the cached aqua-registry index first and fetches the per-package YAML using the full registry path it supplies, instead of relying on a hardcoded list of known 3-segment prefixes.
- The aqua-registry base URL is now a single configurable field (`registryBaseURL` + `WithRegistryBaseURL`), used by both the index fetch and the new index-driven per-package fetch.
- New tests cover the 3-segment path resolution (regression for #2383), continued 2-segment behavior, graceful fallback when the index is unreachable, and the path-index side effect of `convertPackagesToTools`.
why
- Before this fix, `atmos toolchain search openbao` listed `openbao/openbao` but `atmos toolchain info openbao/openbao` and `atmos toolchain install openbao/openbao` both failed with `tool not in registry`. The resolver only probed `pkgs/<owner>/<repo>/registry.yaml` plus a small hardcoded set of pkg roots (`hashicorp`, `helm`, `kubernetes/kubernetes`, `opentofu`), so any package whose binary name differs from its repo name and isn't on that list was silently unreachable.
- The aqua-registry index already publishes each package's full path (e.g. `name: openbao/openbao/bao`); we just weren't using it. Driving resolution from the index makes the entire class of binary-subdir packages installable without per-package allowlists.
- Verified end-to-end against the live aqua-registry: from a directory with no inline registry config, `atmos toolchain install openbao/openbao@v2.5.3` now downloads, installs, and runs (`bao --version` → `OpenBao v2.5.3 …`).
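A minimal sketch of index-driven resolution with a two-segment fallback might look like the following. The `registry` type, the `packageURL` method, the way the index is keyed by `owner/repo`, and the placeholder base URL are all assumptions made for illustration; the real resolver, its caching, and its index format are not shown in these notes.

```go
package main

import "fmt"

// registry is a hypothetical resolver; field and method names are illustrative.
type registry struct {
	baseURL string
	// index maps "owner/repo" to the full package path published by the index,
	// for example "openbao/openbao" -> "openbao/openbao/bao".
	index map[string]string
}

// packageURL prefers the full path from the index and falls back to the
// two-segment owner/repo layout when the index has no entry or is unavailable.
func (r *registry) packageURL(ownerRepo string) string {
	pkgPath := ownerRepo
	if full, ok := r.index[ownerRepo]; ok { // reading a nil map is safe in Go
		pkgPath = full
	}
	return r.baseURL + "/pkgs/" + pkgPath + "/registry.yaml"
}

func main() {
	r := &registry{
		baseURL: "https://registry.example.invalid", // placeholder, not the real registry URL
		index:   map[string]string{"openbao/openbao": "openbao/openbao/bao"},
	}
	fmt.Println(r.packageURL("openbao/openbao")) // three-segment path from the index
	fmt.Println(r.packageURL("example/tool"))    // two-segment fallback

	offline := &registry{baseURL: r.baseURL} // index unreachable: nil map
	fmt.Println(offline.packageURL("openbao/openbao"))
}
```

Keeping the base URL in one field means the index fetch and the per-package fetch cannot drift apart, which is also what makes the resolver easy to point at a test server.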
references
Closes #2383
Summary by CodeRabbit
- New Features
  - Custom Aqua registry base URL support.
  - Short-name resolution that maps tool short names to canonical owner/repo and reports ambiguities.
  - Default registry accessor for callers without configuration.
- Improvements
  - Index-driven package lookup with caching, aliases, and robust 2-/3-segment fallbacks (monorepo-aware).
  - Installer rejects empty tool specs and uses registry short-name resolution.
  - Downloads now retry transient failures with exponential backoff (404s not retried).
- Tests
  - Expanded tests for index resolution, short-name logic, download retries, fallbacks, and ambiguity scenarios.
- CI
  - OS-specific OpenTofu install steps in the test workflow.
fix(list): make `list components` output deterministic across runs @osterman (#2422)
what
- `atmos list components` (and especially `--enabled=false` / `--locked=true`) now returns deterministic, consistent results across invocations on the same workspace.
- Aggregate per-stack `enabled` / `locked` state into the deduplicated component view: any-disabled-wins for `enabled`, any-locked-wins for `locked`. `status` / `status_text` are recomputed from the aggregate.
- Sort stack names, component names, and the output slice so iteration order no longer depends on Go's randomized map iteration.
- Documentation for `--enabled` / `--locked` now describes the cross-stack aggregation semantics and points at `atmos list instances` for per-stack-instance state.
- Adds regression tests in `pkg/list/extract/`: a 200-iteration determinism test for `--enabled=false`, an aggregation-policy test for `enabled` / `locked`, and an output-order stability test.
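The aggregation policy and the sorting step above can be illustrated with a small sketch. The `state` type, the `aggregate` and `sortedNames` helpers, and the sample data are hypothetical; the real extraction code in `pkg/list/extract/` is more involved.

```go
package main

import (
	"fmt"
	"sort"
)

// state is a hypothetical per-instance view of a component's flags.
type state struct{ Enabled, Locked bool }

// aggregate folds per-stack state into one entry per component:
// any disabled instance makes the component disabled (AND),
// any locked instance makes it locked (OR).
// Both operations are commutative, so map iteration order cannot change the result.
func aggregate(stacks map[string]map[string]state) map[string]state {
	out := map[string]state{}
	for _, components := range stacks {
		for name, s := range components {
			cur, seen := out[name]
			if !seen {
				out[name] = s
				continue
			}
			out[name] = state{Enabled: cur.Enabled && s.Enabled, Locked: cur.Locked || s.Locked}
		}
	}
	return out
}

// sortedNames returns component names in a stable, sorted order for output.
func sortedNames(m map[string]state) []string {
	names := make([]string, 0, len(m))
	for n := range m {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

func main() {
	stacks := map[string]map[string]state{
		"dev":  {"vpc": {Enabled: true}, "eks": {Enabled: false}},
		"prod": {"vpc": {Enabled: true, Locked: true}, "eks": {Enabled: true}},
	}
	agg := aggregate(stacks)
	for _, n := range sortedNames(agg) {
		fmt.Printf("%s enabled=%t locked=%t\n", n, agg[n].Enabled, agg[n].Locked)
	}
	// Output:
	// eks enabled=false locked=false
	// vpc enabled=true locked=true
}
```

Sorting alone would fix the output order; the commutative fold is what makes the recorded value itself independent of which stack happens to be visited first.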
why
- Reported in #2359: running `atmos list components --enabled=false` repeatedly produced different output every time (sometimes empty, sometimes 1 component, sometimes 2 or 3 different components) with no changes to the workspace.
- Root cause: `extractUniqueComponentType` in `pkg/list/extract/components.go` extracted metadata only from the first stack iterated for each component, and `UniqueComponents` iterated `stacksMap` (a Go map) in randomized order. When the same component had `enabled: false` in some stacks and `enabled: true` in others, the recorded value depended on iteration order, so `BoolFilter.Apply` would include or exclude the component non-deterministically.
- Sorting alone would have made the output stable but not necessarily correct for users with mixed-state components; the aggregation policy ensures a component disabled anywhere shows up under `--enabled=false`, which matches the user's expected behavior in the issue.
references
- closes #2359
Summary by CodeRabbit
- Bug Fixes
  - Resolved non-deterministic output in `list components` command; results now consistent across invocations.
  - Adjusted `--enabled` and `--locked` flags to aggregate state across stack instances: a component is shown as disabled/locked if any instance has that state.
- Documentation
  - Clarified `list components` flag behavior for cross-stack aggregation reporting.
- Tests
  - Added regression test coverage for deterministic component aggregation.
fix(auth): normalize --identity=false to disable authentication @osterman (#2412)
what
- Normalize boolean-false values (`false`, `0`, `no`, `off`, case-insensitive) passed via `--identity=<value>` and `--identity <value>` to the disabled sentinel (`cfg.IdentityFlagDisabledValue`), so the auth pre-hook and `CreateAndAuthenticateManager*` short-circuit instead of trying to authenticate with the literal name `"false"`.
- Patch is in `internal/exec/cli_utils.go::parseIdentityFlag`, the single arg-walker that feeds `info.Identity` / `configAndStacksInfo.Identity` for every `ProcessCommandLineArgs` consumer (terraform, helmfile, packer, list, describe, workflow, vendor, pro, validate, atlantis, docs, generate).
- Add parser-level unit cases in `TestParseIdentityFlag` for `=false` / `=False` / `=FALSE` / `=0` / `=no` / `=off` plus space-separated form, and end-to-end cases in `TestProcessArgsAndFlags_IdentityFlag{,Helmfile,Packer}` asserting `info.Identity == cfg.IdentityFlagDisabledValue`.
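The normalization described above can be sketched as a small helper. The function name `normalizeIdentity` and the constant name are hypothetical, and the sketch assumes the disabled sentinel is the `__DISABLED__` string that `isAuthenticationDisabled` is said to match; the real code lives in `parseIdentityFlag`.

```go
package main

import (
	"fmt"
	"strings"
)

// disabledSentinel is assumed to equal cfg.IdentityFlagDisabledValue.
const disabledSentinel = "__DISABLED__"

// normalizeIdentity maps boolean-false spellings (case-insensitive) to the
// disabled sentinel and returns every other value unchanged.
func normalizeIdentity(value string) string {
	switch strings.ToLower(value) {
	case "false", "0", "no", "off":
		return disabledSentinel
	default:
		return value
	}
}

func main() {
	for _, v := range []string{"false", "FALSE", "0", "no", "Off", "dev-admin"} {
		fmt.Printf("%q -> %q\n", v, normalizeIdentity(v))
	}
}
```

Applying this at the single parse site, before the value is stored, is what lets every downstream consumer keep checking for one sentinel instead of re-implementing the boolean parsing.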
why
- Regression: `ATMOS_IDENTITY=false` was fixed in #1935 by normalizing the env-var fallback at `cli_utils.go:199`, but the env fallback only runs when `Identity == ""`. When `--identity=false` is passed on the CLI, `parseIdentityFlag` populated the literal `"false"`, the env-fallback branch was skipped, and the literal flowed through to `pkg/auth/hooks.go::isAuthenticationDisabled` (which only matches `__DISABLED__`).
- #2225 extracted `parseIdentityFlag` into a new helper without porting the normalization from the env path, silently breaking the documented `--identity=false` contract. Reported by users for `atmos terraform *` and `atmos list instances`.
- Centralizing normalization at the parse site means every command sharing `ProcessCommandLineArgs` is fixed in one place, and matches the behavior already implemented for the StandardParser path (`pkg/flags/global_registry.go`) and the `cmd/identity_helpers.go` / `cmd/list/utils.go` / `cmd/list/affected.go` per-command identity reads.
references
- Builds on #1900 / #1935 (env-var normalization) and corrects the regression introduced by the refactor in #2225.
Summary by CodeRabbit
- Bug Fixes
  - The `--identity` flag now recognizes common boolean-false values (`false`, `0`, `no`, `off`, case-insensitive) to disable authentication.
- New Features
  - Passing `--identity=false` (or equivalent) disables per-component auth resolution and is honored across describe, list, and state-resolution flows.
  - Describe and list commands now surface and propagate an "auth disabled" option so outputs respect disabled auth.
- Tests
  - Expanded tests for identity parsing and auth-disabled behavior across commands, env vars, and execution paths.
Fail fast on missing component identity @osterman (#2411)
what
- Fail fast when per-component auth resolution fails for components that declare a default identity.
- Stop eager auth-manager error printing so the top-level command emits one formatted error.
- Add regression coverage and document the fix under docs/fixes.
why
- Missing or invalid component identities are fatal because falling back to parent or ambient auth can use the wrong backend or account.
- This prevents `atmos list instances --upload` from repeating the same identity error per component and continuing after bad auth.
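The fail-fast behavior can be sketched as a loop that returns on the first resolution failure and wraps the error with component and stack context. The `instance` type, `resolveAuth`, `processAll`, and the identity names are hypothetical placeholders, not the actual Atmos auth manager API.

```go
package main

import (
	"errors"
	"fmt"
)

// instance is a hypothetical component instance with an optional default identity.
type instance struct{ Component, Stack, Identity string }

var errIdentityNotFound = errors.New("identity not found")

// resolveAuth is a placeholder resolver that only knows one identity.
func resolveAuth(identity string) error {
	if identity != "dev-admin" {
		return errIdentityNotFound
	}
	return nil
}

// processAll stops at the first component whose declared identity fails to
// resolve, instead of falling back to parent or ambient credentials.
// It returns how many instances were processed before the failure.
func processAll(rows []instance) (int, error) {
	processed := 0
	for _, r := range rows {
		if r.Identity != "" {
			if err := resolveAuth(r.Identity); err != nil {
				return processed, fmt.Errorf("component %q in stack %q: resolving identity %q: %w",
					r.Component, r.Stack, r.Identity, err)
			}
		}
		processed++
	}
	return processed, nil
}

func main() {
	rows := []instance{
		{Component: "vpc", Stack: "dev", Identity: "dev-admin"},
		{Component: "eks", Stack: "dev", Identity: "missing-identity"},
		{Component: "rds", Stack: "dev"},
	}
	n, err := processAll(rows)
	fmt.Println("processed:", n)
	fmt.Println("error:", err)
	fmt.Println("is not-found:", errors.Is(err, errIdentityNotFound))
}
```

Returning the wrapped error once, rather than printing inside the resolver, leaves formatting to the top-level command so the user sees a single message.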
references
- See `docs/fixes/2026-05-15-list-instances-auth-fail-fast.md`.
Summary by CodeRabbit
- Bug Fixes
  - `atmos list instances --upload` now fails fast when per-component auth resolution fails for components with a default identity; returns a clear error including component and stack context and stops further processing.
  - Eager error printing during auth initialization was removed and centralized to avoid repeated initialization messages.
- Documentation
  - Added docs describing the updated authentication failure behavior.
- Tests
  - Updated tests to verify resolver failures are treated as fatal and include component/stack context.