feat: MCP server integrations — connect Atmos to external MCP servers @aknysh (#2267)
## what
- Connect Atmos to external MCP servers (AWS, GCP, Azure, custom) and use their tools in `atmos ai ask`, `atmos ai chat`, and `atmos ai exec`
- Add CLI MCP management commands: `atmos mcp list`, `tools`, `test`, `status`, `restart`, `start`, `export`
- Add smart server routing: automatically selects only the MCP servers relevant to the user's question using a lightweight routing call to the configured AI provider
- Add `--mcp` flag on all AI commands for manual server selection (supports `--mcp aws-iam,aws-billing` and `--mcp aws-iam --mcp aws-billing`), env var `ATMOS_AI_MCP`
- Add `atmos mcp export` to emit `.mcp.json` for Claude Code / Cursor / IDE integration
- Add Atmos Auth integration: `identity` field on server config for automatic credential injection (references identities from the `auth` section)
- Add toolchain integration: resolves `uvx`/`npx` from `.tool-versions` before starting servers
- Add `BridgedTool` pattern to wrap external MCP tools as native Atmos `tools.Tool` interface
- Add human-readable tool names in output (`aws-iam → list_roles` instead of `aws-iam__list_roles`)
- Add tool execution display to `atmos ai ask` output via MarkdownFormatter
- Add configurable `ai.max_tool_iterations` (default 25, was hardcoded 10) to support complex multi-tool queries
- Add complete example with 8 pre-configured AWS MCP servers at `examples/mcp/`
- Add comprehensive documentation: MCP Configuration, MCP Commands, AI Landing Page
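The two `--mcp` forms and the human-readable tool names listed above can be illustrated with a short Go sketch. This is not the Atmos implementation: `parseMCPSelection` and `displayToolName` are hypothetical helper names, and the only details taken from the notes are the accepted flag forms and the `server__tool` to `server → tool` rendering.

```go
package main

import (
	"fmt"
	"strings"
)

// parseMCPSelection (hypothetical helper) flattens repeated and
// comma-separated --mcp values into a de-duplicated list of server
// names, preserving the order in which they were given.
func parseMCPSelection(values []string) []string {
	seen := make(map[string]bool)
	var servers []string
	for _, v := range values {
		for _, name := range strings.Split(v, ",") {
			name = strings.TrimSpace(name)
			if name == "" || seen[name] {
				continue
			}
			seen[name] = true
			servers = append(servers, name)
		}
	}
	return servers
}

// displayToolName (hypothetical helper) turns "server__tool" into
// "server → tool". Names without the separator are returned unchanged.
func displayToolName(qualified string) string {
	server, tool, ok := strings.Cut(qualified, "__")
	if !ok {
		return qualified
	}
	return server + " → " + tool
}

func main() {
	// Both flag forms resolve to the same selection.
	fmt.Println(parseMCPSelection([]string{"aws-iam,aws-billing"}))
	fmt.Println(parseMCPSelection([]string{"aws-iam", "aws-billing"}))
	fmt.Println(displayToolName("aws-iam__list_roles"))
}
```

Normalizing both flag forms into one list early keeps the rest of the selection logic independent of how the user typed the flag.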
why
- Leverage the ecosystem — 100+ MCP servers exist for AWS, GCP, Azure, databases, monitoring, CI/CD. Instead of reimplementing cloud provider functionality, Atmos orchestrates existing MCP servers
- Parity with AI tools — Claude Code, Cursor, Windsurf all manage MCP servers. Atmos should too
- Speed — Installing an AWS MCP server takes seconds. Building equivalent functionality takes weeks
- Composability — Users can mix native Atmos tools (describe stacks, validate) with external tools (AWS billing, security, IAM) in the same AI conversation
references
- PRD: `docs/prd/atmos-mcp-integrations.md`
- Blog post: `website/blog/2026-03-29-mcp-server-integrations.mdx`
- Example: `examples/mcp/` (complete working example with 8 AWS MCP servers)
- AWS MCP Servers (20+ servers for the AWS ecosystem)
- MCP Protocol Specification
See It in Action
All outputs below are from real AWS accounts. Account IDs, resource identifiers,
and internal names have been redacted. Cost figures represent an example of real-world spending.
List configured servers
$ atmos mcp list
NAME STATUS DESCRIPTION
─────────────────────────────────────────────────────────────────────────────────────────
aws-api stopped AWS API — direct AWS CLI access with security controls
aws-billing stopped AWS Billing — billing summaries and payment history
aws-cloudtrail stopped AWS CloudTrail — event history and API call auditing
aws-docs stopped AWS Documentation — search and fetch AWS docs
aws-iam stopped AWS IAM — role/policy analysis and access patterns
aws-knowledge stopped AWS Knowledge — managed AWS knowledge base (remote)
aws-pricing stopped AWS Pricing — real-time pricing and cost analysis
aws-security stopped AWS Security — Well-Architected security posture assessment
Explore tools from a server
$ atmos mcp tools aws-security
TOOL DESCRIPTION
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
CheckSecurityServices Verify if selected AWS security services are enabled in the specified region and account.
GetSecurityFindings Retrieve security findings from AWS security services.
GetStoredSecurityContext Retrieve security services data that was stored in context from a previous CheckSecurityServices call.
CheckStorageEncryption Check if AWS storage resources have encryption enabled.
ListServicesInRegion List all AWS services being used in a specific region.
CheckNetworkSecurity Check if AWS network resources are configured for secure data-in-transit.
Test server connectivity
$ atmos mcp test aws-docs
✓ Server started successfully
✓ Initialization handshake complete
✓ 4 tools available
✓ Server responds to ping
Ask AI — documentation search (smart routing selects aws-knowledge)
$ atmos ai ask "How do I configure S3 bucket lifecycle rules?"
ℹ MCP routing selected 1 of 8 servers: aws-knowledge
ℹ MCP server "aws-knowledge" started (6 tools)
ℹ Registered 6 tools from 1 MCP server(s)
ℹ AI tools initialized: 16
👽 Thinking...
Configuring S3 Bucket Lifecycle Rules
S3 lifecycle rules automate object management by transitioning objects between
storage classes, archiving, or expiring them...
## Tool Executions (1)
1. ✅ aws-knowledge → aws.search_documentation (2874ms)
Ask AI — billing analysis (smart routing selects aws-billing)
$ atmos ai ask "Show our billing summary for the past 2 months"
ℹ MCP routing selected 1 of 8 servers: aws-billing
ℹ MCP server "aws-billing" started (25 tools)
ℹ Registered 25 tools from 1 MCP server(s)
ℹ AI tools initialized: 35
👽 Thinking...
## 📊 AWS Billing Summary — February & March 2026
Service │ Feb 2026 │ Mar 2026 │ Change
──────────────────────────────────────────┼──────────┼───────────┼──────────
Amazon Virtual Private Cloud │ $309.53 │ $261.17 │ ▼ $48.36
EC2 - Other │ $88.12 │ $123.39 │ ▲ $35.27
...
## Tool Executions (1)
1. ✅ aws-billing → cost-explorer (381ms)
Ask AI — security posture across all regions (smart routing selects aws-api + aws-security)
$ atmos ai ask "Is GuardDuty enabled in all regions?"
ℹ MCP routing selected 2 of 8 servers: aws-api, aws-security
ℹ MCP server "aws-api" started (2 tools)
ℹ MCP server "aws-security" started (6 tools)
ℹ Registered 8 tools from 2 MCP server(s)
ℹ AI tools initialized: 18
👽 Thinking...
## ❌ GuardDuty is NOT Enabled in All Regions
GuardDuty is only enabled in 1 out of 34 regions checked.
### ✅ Enabled (1 region)
Region │ Detector ID
───────────┼────────────────────────────────────
us-east-2 │ <detector-id-redacted>
### ❌ Not Enabled (33 regions)
Region │ Region │ Region
────────────────┼────────────────┼────────────────
ap-south-1 │ ap-south-2 │ ap-southeast-1
eu-central-1 │ eu-west-1 │ us-east-1
us-west-2 │ ... │
### 🔒 Recommendations
1. Enable GuardDuty in all active regions
2. Use delegated administrator via AWS Organizations
3. Prioritize us-east-1, us-west-2, eu-west-1 immediately
## Tool Executions (4)
1. ✅ aws-api → call_aws (400ms)
2. ✅ aws-api → call_aws (14ms)
3. ✅ aws-api → call_aws (7ms)
4. ✅ aws-api → call_aws (9450ms)
Ask AI — IAM audit (smart routing selects aws-iam)
$ atmos ai ask "List all IAM roles with admin access"
ℹ MCP routing selected 1 of 8 servers: aws-iam
ℹ MCP server "aws-iam" started (29 tools)
ℹ Registered 29 tools from 1 MCP server(s)
ℹ AI tools initialized: 39
👽 Thinking...
## 🔐 IAM Roles with Admin Access
### 1. ✅ Direct AdministratorAccess Policy (4 attachments)
Role Name │ Description │ Trust Principal
──────────────────────────────────────────────────┼────────────────────────────────────────────────┼───────────────────────────
AWSReservedSSO_AdministratorAccess_... │ Allow Full Administrator access to the account │ AWS SSO (SAML Federation)
AWSReservedSSO_RootAccess_... │ Centralized root access to member accounts │ AWS SSO (SAML Federation)
AWSReservedSSO_TerraformApplyAccess_... │ Full Terraform state and account access │ AWS SSO (SAML Federation)
AWSReservedSSO_TerraformApplyAccess-Core_... │ Full Terraform access (core backend) │ AWS SSO (SAML Federation)
### 🛡️ Security Recommendations
1. Review SSO assignments for AdministratorAccess and RootAccess roles.
2. Audit TerraformApplyAccess roles — ensure MFA/session policies are enforced.
3. Monitor tfstate roles — cross-account trust across 14 accounts.
4. Enable CloudTrail for AssumeRole calls on high-privilege roles.
## Tool Executions (2)
1. ✅ aws-iam → list_roles (314ms)
2. ✅ aws-iam → list_policies (174ms)
Check status of all servers
$ atmos mcp status
NAME STATUS TOOLS DESCRIPTION
─────────────────────────────────────────────────────────────────────────────────────────
aws-api running 2 AWS API — direct AWS CLI access with security controls
aws-billing running 25 AWS Billing — billing summaries and payment history
aws-cloudtrail running 5 AWS CloudTrail — event history and API call auditing
aws-docs running 4 AWS Documentation — search and fetch AWS docs
aws-iam running 29 AWS IAM — role/policy analysis and access patterns
aws-knowledge running 6 AWS Knowledge — managed AWS knowledge base (remote)
aws-pricing running 9 AWS Pricing — real-time pricing and cost analysis
aws-security running 6 AWS Security — Well-Architected security posture assessment
Summary by CodeRabbit
- New Features
  - External MCP server support with new mcp commands: list, tools, test, status, restart, export
  - New --mcp / ATMOS_AI_MCP flag for ai ask/chat/exec to select servers (overrides routing)
  - Smart MCP routing to choose relevant servers per prompt
  - Human-friendly tool names in AI responses
  - Configurable AI request timeouts and max tool-iteration limits
- Documentation
  - Extensive MCP and AI integration docs, examples, and an AWS MCP example
🚀 Enhancements
fix: add process-level credential cache @AleksandrMatveev (#2272)
## what
- Added a process-level in-memory credential cache (`sync.Map`) to `authenticateChain()` that stores successfully authenticated credentials keyed by realm + chain identity
- When a subsequent authentication request matches the same realm and chain within the same CLI invocation, the cached credentials are returned (after expiration validation) without making additional API calls
- The previous fix (dbcba35) that skips cached target identity credentials in keyring/file storage remains intact; this cache layer sits above it
why
- The previous fix correctly prevented stale cached credentials from being returned by always forcing re-authentication of the target identity (e.g., AssumeRole). However, during `atmos describe affected`, each `!terraform.state` YAML function resolution creates a new `AuthManager` via `resolveAuthManagerForNestedComponent`, and each one triggers a full AssumeRole API call
- This caused `atmos describe affected` to degrade from ~2 minutes to ~17 minutes due to N redundant STS AssumeRole calls for N components sharing the same auth chain
- The in-memory cache is inherently safe: unlike keyring/file caches that persist across processes and may contain stale data from different auth mechanisms (e.g., pod credentials vs. role credentials), process-scoped credentials were authenticated during the current invocation and are guaranteed correct
- Cached entries are validated against the existing expiration buffer (`minCredentialValidityBuffer` = 15m) before reuse, and the cache resets naturally when the process exits
references
Summary by CodeRabbit
- New Features
  - Process-level in-memory credential cache to speed repeated authentications and share valid credentials across instances.
  - Automatic detection and removal of expired/invalid cached credentials with transparent re-authentication.
  - Ability to reset the process credential cache.
- Tests
  - Tests verifying cache hits, isolation by realm/chain, expiration handling, and cache reset behavior.
fix: use allowlist DTO for instance uploads to prevent sensitive data leakage @milldr (#2269)
## What
- Introduce `dtos.UploadInstance` as an allowlist struct with only the fields Atmos Pro needs
- Convert `schema.Instance` → `UploadInstance` at the upload boundary in `list_instances.go`
- Sanitize `settings.pro` to handle `map[interface{}]interface{}` from YAML for JSON compatibility

Fields included in upload (allowlist)
- `component`: instance identification
- `stack`: instance identification
- `component_type`: terraform or helm
- `settings.pro`: drift detection configuration (enabled flag, detect/remediate workflows)

Fields excluded (never serialized)
- `vars`: can contain secrets
- `env`: can contain secrets and credentials
- `backend`: contains role ARNs and bucket names
- `source`: not used by Atmos Pro
- `metadata`: not used by Atmos Pro
- All `settings` keys except `pro`: not used by Atmos Pro
Why
- Previously, `InstancesUploadRequest` used `[]schema.Instance` which included all sections from the stack config
- `env` and `vars` can contain secrets that should never leave the CI environment
- Nested YAML maps in the excluded fields produce `map[interface{}]interface{}` types that `encoding/json` cannot marshal, causing `atmos list instances --upload` to fail with `json: unsupported type: map[interface {}]interface {}`
- Using an allowlist DTO ensures new fields added to `schema.Instance` are never inadvertently uploaded
Summary by CodeRabbit
Release Notes
- Refactor
  - Optimized instance upload payloads by streamlining the data structure to include only essential configuration fields (component, stack, component type, and settings).
  - Enhanced upload efficiency by automatically filtering and extracting pro-specific settings before transmission.
refactor(auth): move ECR login from auth to aws namespace (ATMOS-37) @Benbentwo (#2144)
## what
- Move `atmos auth ecr-login` command to `atmos aws ecr login` under the AWS namespace
- Create new `cmd/aws/ecr/` package with parent ECR command and login subcommand
- Move ECR login tests to new package structure (16 tests, all passing)
- Relocate documentation from `auth/` to `aws/` command directory
- Update all cross-references in tutorials, blog posts, and internal design docs
why
The `auth` namespace must remain provider-agnostic per CLI design principles. AWS-specific commands like ECR login belong under the `atmos aws` namespace hierarchy, following the established pattern with `atmos aws eks update-kubeconfig`. This ensures the `auth` namespace is not polluted by provider- or service-specific commands and maintains a clean separation between generic auth operations and cloud-specific integrations.
references
Closes #ATMOS-37
Acceptance criteria from ATMOS-37:
- ✅ No AWS- or ECR-specific commands exist directly under `atmos auth`
- ✅ Command structure aligns with interface-based design
- ✅ CLI help and docs reflect the updated command hierarchy
Summary by CodeRabbit
- New Features
  - Moved ECR login into the AWS namespace as atmos aws ecr login, added top-level ecr subcommand, introduced a command-local --identity flag, and retained multi-registry --registry support.
- Tests
  - Removed legacy test suite and added a comprehensive test suite validating the new command wiring, flags, argument flows, and auth-manager behavior.
- Documentation
  - Updated CLI docs, tutorials, PRD, and blog examples to reference atmos aws ecr login.
- Chores
  - Added new sentinel errors for clearer ECR login failure and identity-selection reporting.