cloudposse/atmos v1.191.0 on GitHub

feat: enhance error messages in atmos validate stacks with file context @osterman (#1494)

Summary

This PR addresses the issue where `atmos validate stacks` shows cryptic merge errors without any indication of which file contains the problem. Users were seeing duplicate, unformatted error messages with no context about where the configuration conflicts occurred.

Problem

When users encountered merge errors (like type mismatches between arrays and strings), they would see:

Duplicate error messages - The same error printed multiple times to stderr
No file context - No indication of which file or import chain caused the issue
Difficult debugging - Impossible to identify the source of conflicts in complex configurations

Example of the problematic output:

cannot override two slices with different type ([]interface {}, string)
cannot override two slices with different type ([]interface {}, string)
cannot override two slices with different type ([]interface {}, string)
...

Solution

This PR implements two key improvements:

1. Fixed Duplicate Error Printing

Root Cause: merge.go was printing errors directly to stderr using theme.Colors.Error.Fprintln() before returning them
Fix: Removed all direct printing statements (3 locations)
Result: Errors now flow through proper logging channels and appear once

2. Added MergeContext for Enhanced Error Messages

File Tracking: Shows exactly which file is being processed when an error occurs
Import Chain: Displays the complete chain of imports leading to the error
Helpful Hints: Provides specific guidance for common merge errors
Debug Logging: Logs merge operations and failures at Debug level with full context

Example Output

Before:

cannot override two slices with different type ([]interface {}, string)
cannot override two slices with different type ([]interface {}, string)

After:

Error: cannot override two slices with different type ([]interface {}, string)

  File being processed: stacks/deploy/prod/us-east-1.yaml
  Import chain:
    → stacks/catalog/base.yaml
    → stacks/mixins/region/us-east-1.yaml
    → stacks/deploy/prod/us-east-1.yaml

  Likely cause: A key is defined as an array in one file and as a string in another.
  Debug hint: Check the files above for keys that have different types.
  Common issues:
    - vars defined as both array and string
    - settings with inconsistent types across imports
    - overrides attempting to change field types

Changes Made

Created MergeContext struct to track file paths and import chains
Added MergeWithContext and ProcessYAMLConfigFileWithContext functions
Removed direct stderr printing from merge.go
Added debug logging for merge operations
Comprehensive test coverage for all changes
Full backward compatibility maintained

Testing

Unit tests for MergeContext functionality
Tests verifying no duplicate error printing
Integration tests for validate command
Test fixtures demonstrating type mismatch scenarios
All existing tests continue to pass

Benefits

Clear Error Location: Users immediately know which file contains the problem
Import Chain Visibility: Shows the complete path of imports leading to the error
No Duplicates: Errors appear once with proper formatting
Actionable Hints: Specific guidance on resolving type mismatches
Debug Support: Debug logging provides additional context when needed
Non-Breaking: Fully backward compatible with existing code

Test Plan

Run unit tests: go test ./pkg/merge -v
Run integration tests: go test ./internal/exec -run TestValidateStacks
Manual testing with type mismatch scenarios
Verify no stderr printing occurs
Verify debug logging works correctly
Ensure backward compatibility

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Context-rich error messages during validation and merges showing the file being processed, import chain, and actionable hints.
Bug Fixes
- Suppressed duplicate stderr printing; errors are now returned and wrapped with contextual information instead of printed.
Documentation
- Added user guide documenting enhanced error formatting, debugging tips, examples, and backward-compatibility notes.
Tests
- Expanded test suite and fixtures validating context-aware error formatting, duplicate-suppression, import chains, and type-mismatch scenarios.

feat: implement precondition-based test skipping for better developer experience @osterman (#1450)

Summary

- Implements a comprehensive precondition-based test skipping strategy to improve developer experience
- Adds SDK-based helper functions for checking AWS, Git, GitHub, and OCI authentication preconditions
- Updates all failing tests to gracefully skip when environmental requirements aren't met

Problem

Previously, ~18 tests were failing locally due to missing environmental setup (AWS profiles, Git configuration, network access, GitHub tokens), making it difficult for developers to contribute to Atmos without extensive environment configuration.

Solution

This PR introduces intelligent precondition checking that:

Detects missing requirements using official SDKs (AWS SDK, go-git) rather than manual checks
Skips tests gracefully with clear, actionable messages explaining what's missing and how to fix it
Provides an escape hatch via ATMOS_TEST_SKIP_PRECONDITION_CHECKS=true for CI/CD environments

Changes

New Files

tests/test_preconditions.go - Centralized helper functions for precondition checking
docs/prd/testing-strategy.md - Product Requirements Document for the testing strategy
Updated tests/README.md - Practical developer guidance

Helper Functions Added

RequireAWSProfile(t, profile) - Validates AWS profile configuration
RequireGitRepository(t) - Checks for Git repository
RequireGitRemoteWithValidURL(t) - Validates Git remote URLs
RequireGitHubAccess(t) - Checks GitHub connectivity and rate limits
RequireOCIAuthentication(t) - Validates GitHub token for OCI registry access
RequireNetworkAccess(t, url) - General network connectivity check
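
A minimal sketch of how such a helper could be shaped, assuming it only needs to inspect environment variables. The `requireEnvToken` function, the `skipper` interface, and the `recorder` fake are hypothetical names introduced for illustration; the real helpers in tests/test_preconditions.go use the AWS SDK and go-git and may look quite different.

```go
package main

import (
	"fmt"
	"os"
)

// skipper is the subset of *testing.T used here; *testing.T satisfies it.
type skipper interface {
	Helper()
	Skipf(format string, args ...any)
}

// requireEnvToken is a hypothetical helper in the spirit of RequireOCIAuthentication.
// It skips the test unless one of the named variables is set, or unless the
// ATMOS_TEST_SKIP_PRECONDITION_CHECKS bypass described above is enabled.
func requireEnvToken(t skipper, getenv func(string) string, names ...string) {
	t.Helper()
	if getenv("ATMOS_TEST_SKIP_PRECONDITION_CHECKS") == "true" {
		return // escape hatch: run the test without checking
	}
	for _, n := range names {
		if getenv(n) != "" {
			return // precondition met
		}
	}
	t.Skipf("token not configured: set one of %v, or set ATMOS_TEST_SKIP_PRECONDITION_CHECKS=true", names)
}

// recorder is a fake skipper so the sketch can run outside `go test`.
// Unlike the real t.Skipf, it does not stop the calling goroutine.
type recorder struct{ skipped string }

func (r *recorder) Helper() {}
func (r *recorder) Skipf(format string, args ...any) {
	r.skipped = fmt.Sprintf(format, args...)
}

func main() {
	r := &recorder{}
	requireEnvToken(r, os.Getenv, "GITHUB_TOKEN", "ATMOS_GITHUB_TOKEN")
	fmt.Printf("skip message: %q\n", r.skipped)
}
```

In a real test the first argument would be the `*testing.T` itself, for example `requireEnvToken(t, os.Getenv, "GITHUB_TOKEN")` at the top of the test function.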

Tests Updated

AWS tests: internal/aws_utils, internal/terraform_backend, pkg/store
Git tests: internal/exec/describe_affected, internal/exec/terraform_utils, pkg/atlantis, pkg/describe
Vendor tests: internal/exec/vendor_utils (GitHub + OCI auth)

Testing

All previously failing packages now pass:

ok  github.com/cloudposse/atmos/internal/aws_utils
ok  github.com/cloudposse/atmos/internal/terraform_backend
ok  github.com/cloudposse/atmos/internal/exec
ok  github.com/cloudposse/atmos/pkg/store
ok  github.com/cloudposse/atmos/pkg/atlantis
ok  github.com/cloudposse/atmos/pkg/describe

Tests skip with helpful messages when preconditions aren't met:

TestExecuteVendorPull: GitHub token not configured: required for pulling OCI images from ghcr.io. 
Set GITHUB_TOKEN or ATMOS_GITHUB_TOKEN environment variable, or set ATMOS_TEST_SKIP_PRECONDITION_CHECKS=true

Benefits

✅ Developers can run tests locally without extensive environment setup
✅ Clear skip messages guide developers to fix specific issues
✅ CI/CD can bypass checks when using mocked dependencies
✅ Improves contributor onboarding experience
✅ Distinguishes between environmental issues and actual code failures

References

Fixes issues with local test execution
Implements testing best practices for open-source projects
Follows Go testing conventions with t.Skipf()

🤖 Generated with Claude Code

feat: add pre-commit hooks and development workflow @osterman (#1469)

what

- Add comprehensive pre-commit configuration with Go-specific hooks
- Create development workflow commands via `.atmos.d/dev.yaml`
- Add GitHub Action for running pre-commit checks in CI
- Document development setup and workflow

why

Prevent broken commits: The go-build-mod hook ensures code compiles before allowing commits
Maintain code quality: Consistent formatting with gofumpt and linting with golangci-lint
Improve developer experience: Simple atmos dev commands for setup and validation
Automate CI checks: Pull requests automatically run pre-commit checks via CloudPosse's GitHub Action

Changes

Pre-commit Configuration

.pre-commit-config.yaml: Hooks for gofumpt, go-build-mod, golangci-lint, go-mod-tidy, and file hygiene

Development Workflow

.atmos.d/dev.yaml: Auto-loaded development commands
- atmos dev setup - One-time setup for contributors
- atmos dev check - Run pre-commit on staged files
- atmos dev validate - Comprehensive validation (build, lint, test)
- atmos dev fix - Auto-fix formatting issues

CI Integration

.github/workflows/pre-commit.yml: GitHub Action using cloudposse/github-action-pre-commit@v4.0.0
- Runs on all pull requests
- Checks entire codebase (not just changed files)
- Includes compilation and dependency verification

Documentation

docs/development.md: Comprehensive developer guide
.atmos.d/README.md: Explains auto-loaded configurations
Updated main README.md with development section

Testing

Verified atmos dev commands work correctly
Tested pre-commit configuration is valid
Confirmed Go compilation check prevents broken commits
Verified .atmos.d/ auto-loading functionality

references

Uses CloudPosse's own github-action-pre-commit for CI consistency
Follows Go best practices for formatting and linting
Leverages Atmos's auto-loading from .atmos.d/ directory
Fixes flaky build introduced in #1453 due to missing GITHUB_TOKEN and GitHub API rate limits

🤖 Generated with Claude Code

Summary by CodeRabbit

Documentation
- Added developer-facing guides (docs/development.md, .atmos.d/README.md) describing local dev workflow, custom commands, setup, tooling, CI notes, and troubleshooting.
Chores
- Added a standardized dev command suite (.atmos.d/dev.yaml), pre-commit configuration, and a GitHub workflow to run pre-commit checks on PRs; updated .gitignore to track only the new dev files.
Tests
- Improved test isolation with per-test mock repo roots and an opt-out for loading repository dev configs; added unit tests for the new exclusion behavior.

fix(telemetry): prevent PostHog errors from leaking to user output @osterman (#1491)

what

- Add custom PostHog logger adapter that integrates with Atmos's structured logging system
- Route PostHog error messages to debug level instead of letting them print directly to stderr
- Add configuration option (`telemetry.posthog_logging`) to control PostHog internal logging
- Implement SilentLogger that completely suppresses PostHog messages when logging is disabled
- Add comprehensive unit tests and documentation for the new telemetry logging features

why

PostHog errors (like "502 Bad Gateway") were appearing in user output and breaking CLI tests
PostHog was using its own internal logger that wrote directly to stderr, bypassing Atmos's logging controls
Telemetry failures should be transparent to users and not impact their experience
Users need ability to debug telemetry issues when setting up their own PostHog instances
Error messages need to follow Atmos's structured logging conventions for consistency

The leaked PostHog output was failing builds because it changed the CLI snapshot output:

"/Users/runner/work/atmos/atmos/tests/snapshots/TestCLICommands_atmos_--help.stdout.golden":
        --- expected
        +++ actual
        @@ -1 +1,7 @@
        +posthog 2025/09/21 23:37:14 INFO: response 502 502 Bad Gateway – <html>
        +<head><title>502 Bad Gateway</title></head>
        +<body>
        +<center><h1>502 Bad Gateway</h1></center>
        +</body>
        +</html>
        +posthog 2025/09/21 23:37:14 ERROR: 1 messages dropped because they failed to be sent and the client was closed

Solution Details

Custom Logger System

Created two logger implementations:

PosthogLogger: Routes PostHog messages through Atmos's structured logging at DEBUG level
SilentLogger: Completely suppresses all PostHog internal messages (default behavior)

New Configuration Option

Added telemetry.posthog_logging setting:

false (default): Uses SilentLogger to suppress all PostHog internal messages
true: Uses PosthogLogger to route messages through Atmos logging at DEBUG level
Can be set via atmos.yaml or ATMOS_TELEMETRY_POSTHOG_LOGGING environment variable

Behavior at Different Log Levels

When posthog_logging: true:

Info Level (default): PostHog messages are hidden from users
Debug Level: PostHog messages appear with proper structured logging format

When posthog_logging: false (default):

All PostHog internal messages are completely suppressed regardless of log level

Test Coverage

Added comprehensive unit tests that verify:

Logger selection based on configuration
Logger methods work correctly with structured output
Errors only appear at debug level when logging is enabled
PostHog errors don't leak to stderr in production
SilentLogger suppresses all messages
Configuration loading from both file and environment variables

Documentation

Updated telemetry documentation to explain:

New posthog_logging configuration option
When to enable PostHog internal logging
How to configure via both atmos.yaml and environment variables

references

Fixes DEV-3633
Linear Issue: https://linear.app/cloudposse/issue/DEV-3633/handle-telemetry-issues-with-posthog-to-avoid-user-impact
example failed build https://github.com/cloudposse/atmos/actions/runs/17900384206/job/50892638861

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Configurable internal telemetry logging with a PostHog logger and a silent/no-op mode; new telemetry Options include logging.
Bug Fixes
- Prevented internal telemetry errors from leaking to stdout/stderr and reduced noise by adjusting log levels.
Improvements
- Telemetry accepts structured options, enqueues events (sent on Close), uses safer client cleanup, and updates messaging to "enqueued".
Tests
- Expanded coverage for logger selection, log-level behavior, env/config flag handling, enqueue/close paths, and no-stderr pollution.
Documentation
- Added telemetry logging configuration docs and fixed a typo.

feat: change !include to use file extension-based parsing and add !include.raw @osterman (#1493)

what

- Changed `!include` function from content-based to extension-based file type detection
- Added new `!include.raw` function that always returns content as raw string
- Proper handling of URLs with query strings and fragments

why

Predictable behavior: Users reported that auto-detecting file content was unpredictable. With extension-based detection, users know exactly how their files will be parsed
Control over parsing: The previous content-based detection made it impossible to include structured data (like JSON) as a raw string. Now users can either use a .txt extension or the new !include.raw function
URL compatibility: URLs with query strings (e.g., config.json?v=2) now work correctly - the query string doesn't affect extension detection

Key Changes

Extension-Based Parsing

.json files are parsed as JSON
.yaml and .yml files are parsed as YAML
.hcl, .tf, and .tfvars files are parsed as HCL
All other extensions return raw strings

New !include.raw Function

# Always returns content as string, regardless of extension
template: !include.raw template.json
script: !include.raw deploy.sh

URL Query String Support

# Extension detected correctly despite query strings
config: !include https://api.example.com/config.json?version=2&format=raw
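
The extension rules above, including the URL case, could be approximated as in the sketch below. The function name `parserFor` and its string return values are hypothetical; this is an illustration of extension-based detection using the Go standard library, not the Atmos implementation.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// parserFor returns which parser the extension rules above would select
// for a local path or URL: "json", "yaml", "hcl", or "raw".
func parserFor(ref string) string {
	p := ref
	// For URLs, look only at the path so ?query and #fragment are ignored.
	if u, err := url.Parse(ref); err == nil && u.Scheme != "" && u.Host != "" {
		p = u.Path
	}
	switch strings.ToLower(path.Ext(p)) {
	case ".json":
		return "json"
	case ".yaml", ".yml":
		return "yaml"
	case ".hcl", ".tf", ".tfvars":
		return "hcl"
	default:
		return "raw"
	}
}

func main() {
	for _, ref := range []string{
		"template.json",
		"deploy.sh",
		"https://api.example.com/config.json?version=2&format=raw",
	} {
		fmt.Printf("%s -> %s\n", ref, parserFor(ref))
	}
}
```

An `!include.raw` equivalent would skip this lookup entirely and return the file content as a string.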

Testing

Comprehensive unit tests for extension detection
Tests for URLs with query strings and fragments
Integration tests for both !include and !include.raw
All existing tests pass - backward compatible

Test plan

[x] Unit tests pass
[x] Integration tests pass
[x] Documentation updated
[x] Backward compatibility verified

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- !include now parses based on file extension (JSON, YAML, HCL → structured; others → raw).
- Added !include.raw to force raw text inclusion.
- Remote includes support extension-based parsing for URLs with extensions.
Improvements
- Better handling of URLs (ignores query strings/fragments), hidden/multi-dot filenames, and mixed local/remote resolution.
- Remote/local include behavior and parsing made more robust.
Documentation
- Updated include docs and added !include.raw page with examples.
Tests
- Extensive tests covering include behaviors and extension parsing.

fix: restore config import override behavior while maintaining Windows fix @osterman (#1489)

what

- Fix config import override behavior while maintaining Windows compatibility
- Add Windows file locking resilience for Terraform state operations
- Ensure main config file settings take precedence over imported configs
- Remove redundant `tempViper.MergeInConfig()` call that was breaking Windows tests
- Refactor `mergeConfig` into smaller, testable functions for better code coverage
- Improve error wrapping in config load functions for better debugging

why

The initial fix for Windows broke the config import override behavior
Windows CI tests were failing with "The process cannot access the file because another process has locked a portion of the file" errors
Test TestInitCliConfig/valid_import_custom_override was failing because imported configs were overriding the main config
The bug was causing YAML template functions (like atmos.Component) to return null values on Windows
Code coverage was below the required threshold (45.45% vs 55.96% target)
Error messages lacked context, making debugging difficult

references

Fixes regression introduced in #1447 ("Make inline atmos config override config from imports")
Addresses test failures in both Windows CI and import override tests
Resolves Windows file locking issues in Terraform output operations

Technical Details

1. Import Override Fix

After processing imports, we now re-merge the original config content to ensure the main config takes precedence:

// Read original content before processing imports
content, err := os.ReadFile(configFilePath)

if processImports {
    // Process imports...
    
    // Re-merge original content to ensure it overrides imports
    err = tempViper.MergeConfig(bytes.NewReader(content))
}
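
To show why the re-merge restores precedence, here is a simplified sketch that uses plain maps in place of Viper. The helpers `mergeInto` and `loadWithImports` are hypothetical and exist only for this example; the snippet above is the authoritative description of the fix.

```go
package main

import "fmt"

// mergeInto deep-merges src into dst; on conflicts the value from src wins.
// Nested maps are copied so later merges never mutate src.
func mergeInto(dst, src map[string]any) {
	for k, v := range src {
		sm, ok := v.(map[string]any)
		if !ok {
			dst[k] = v
			continue
		}
		dm, ok := dst[k].(map[string]any)
		if !ok {
			dm = map[string]any{}
			dst[k] = dm
		}
		mergeInto(dm, sm)
	}
}

// loadWithImports mimics the order described above: load the main config,
// merge the imports, then re-merge the main config so it takes precedence.
func loadWithImports(mainCfg map[string]any, imports ...map[string]any) map[string]any {
	out := map[string]any{}
	mergeInto(out, mainCfg)
	for _, imp := range imports {
		mergeInto(out, imp) // an import may overwrite values from the main config
	}
	mergeInto(out, mainCfg) // re-apply the main config on top
	return out
}

func main() {
	mainCfg := map[string]any{"logs": map[string]any{"level": "Debug"}}
	imported := map[string]any{"logs": map[string]any{"level": "Info", "file": "/dev/stderr"}}

	out := loadWithImports(mainCfg, imported)
	fmt.Println(out["logs"])
}
```

Keys that only the import defines (here `file`) survive, while keys defined in both places end up with the main config's value.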

2. Windows File Locking Resilience

Added platform-specific handling for Windows file locking issues:

Error: Failed to read state file

The state file could not be read: read terraform.tfstate.d\nonprod-component-3\terraform.tfstate: The process cannot access the file because another process has locked a portion of the file.

This error occurred during the test TestCLICommands/terraform_output_function_(no_tty) on Windows CI.

Retry Logic with Exponential Backoff

Retry file operations up to 3 times with delays (100ms, 200ms, 500ms)
Wraps critical operations like terraform output and file deletion
Only applies on Windows via build tags

Strategic Delays

Small delays after workspace operations to allow file handles to release
Prevents "file locked by another process" errors during rapid operations

Implementation

// Windows-specific retry wrapper
func retryOnWindows(fn func() error) error {
    // 3 attempts with exponential backoff
}

// Usage in terraform operations
err = retryOnWindows(func() error {
    outputMeta, err = tf.Output(ctx)
    return err
})

3. Refactoring for Better Testability

Broke the monolithic mergeConfig function into smaller, focused functions:

loadConfigFile() - Loads config files into Viper (100% coverage)
readConfigFileContent() - Reads file contents with error context (100% coverage)
processConfigImportsAndReapply() - Handles import processing and reapplication (85.7% coverage)
marshalViperToYAML() - Marshals Viper settings to YAML (80% coverage)
mergeYAMLIntoViper() - Merges YAML into Viper instance (100% coverage)

4. Improved Error Handling

Wrapped errors with descriptive context throughout config loading
Preserves sentinel errors (like ConfigFileNotFoundError) for compatibility
Provides clear error messages with file paths and operation context

Testing

✅ All existing tests pass (including Windows CI)
✅ Added comprehensive tests for Windows retry logic
✅ Added platform-specific tests (Windows vs non-Windows behavior)
✅ Code coverage improved from 45.45% to 81.25% (exceeding target)
✅ Integration tests for file locking scenarios
✅ Tests verify zero performance impact on non-Windows platforms

Files Changed

Core Fixes

pkg/config/load.go - Config loading and import handling
pkg/config/merge.go - Refactored merge logic
internal/exec/terraform_output_utils.go - Windows retry logic
internal/exec/terraform_utils.go - File operation resilience

Platform-Specific Code (Build Tags)

internal/exec/terraform_output_utils_windows.go - Windows implementation
internal/exec/terraform_output_utils_other.go - Unix implementation

Tests

pkg/config/merge_test.go - Config merge tests
internal/exec/terraform_output_utils_*_test.go - Platform-specific tests
internal/exec/terraform_output_utils_integration_test.go - Integration tests

Summary by CodeRabbit

New Features
- More reliable Terraform operations on Windows with automatic retries and short delays.
Bug Fixes
- Configuration import precedence corrected so local settings override imported ones, including nested imports.
- Improved handling of invalid imported YAML to avoid full failures and surface clear errors.
Style
- Standardized trimming of trailing whitespace with targeted exceptions; Go files use tab indentation.
Tests
- Expanded coverage for config loading, import/merge semantics, Windows behavior, and CLI parsing.