Envoy AI Gateway v0.4.0 - November 07, 2025
Release introducing Model Context Protocol (MCP) Gateway, OpenAI Image Generation, Anthropic support (direct and AWS Bedrock), guided output decoding for GCP Vertex AI/Gemini, cross-namespace references, enhanced authentication, and comprehensive observability improvements.
✨ New Features
Model Context Protocol (MCP) Gateway
New MCPRoute CRD
Introduces the MCPRoute custom resource for routing MCP requests to backend MCP servers, enabling a unified AI API across multiple MCP backends (a sketch appears at the end of this section).
Complete MCP spec implementation
Includes streamable HTTP transport, JSON-RPC 2.0 support, and MCP spec-compliant OAuth 2.0 authorization with JWKS validation and Protected Resource Metadata.
Server multiplexing and tool routing
Aggregates multiple MCP servers behind a single endpoint with intelligent tool routing, tool filtering (exact match and regex patterns), and collision detection.
Upstream authentication
Supports both OAuth-based authentication and API key authentication for secure backend MCP server communication with configurable headers.
Session management
Implements MCP session handling with encryption, rotatable seeds, and graceful session lifecycle management.
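As a rough sketch of how these pieces fit together, an MCPRoute aggregating two MCP servers with tool filtering might look like the following. Field names here are illustrative assumptions (modeled on Gateway API conventions) and should be checked against the MCPRoute API reference.

```yaml
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: MCPRoute
metadata:
  name: mcp-tools
  namespace: default
spec:
  # Attach to an existing Gateway, following Gateway API conventions.
  parentRefs:
    - name: ai-gateway
      kind: Gateway
      group: gateway.networking.k8s.io
  backendRefs:
    # Two MCP servers multiplexed behind a single endpoint.
    - name: mcp-github
      port: 80
      # Illustrative tool filter: expose only matching tools from this server.
      toolSelector:
        include:
          - search_repositories
          - get_issue
    - name: mcp-weather
      port: 80
```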
Anthropic Provider Support
Direct api.anthropic.com support
Native integration with Anthropic's API at api.anthropic.com, complementing existing GCP Vertex AI Anthropic support.
AWS Bedrock native Anthropic Messages API
Support for Claude models on AWS Bedrock using the native Anthropic Messages API format instead of the generic Converse API, enabling full feature parity with direct Anthropic API including prompt caching and extended thinking.
Anthropic API key authentication
Native x-api-key header-based authentication matching Anthropic's API conventions and SDK patterns for direct Anthropic connections.
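As a hedged sketch, configuring this might look like the BackendSecurityPolicy below; the AnthropicAPIKey field names are assumptions modeled on the existing APIKey type, so consult the API reference for the exact shape.

```yaml
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: BackendSecurityPolicy
metadata:
  name: anthropic-auth
spec:
  # Select the AIServiceBackend(s) this policy applies to.
  targetRefs:
    - group: aigateway.envoyproxy.io
      kind: AIServiceBackend
      name: anthropic
  type: AnthropicAPIKey
  anthropicAPIKey:
    # The key is read from a Secret and sent upstream as the x-api-key header.
    secretRef:
      name: anthropic-api-key
```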
Passthrough translator with token usage tracking
Efficient passthrough translation layer that captures token usage and maintains API compatibility while minimizing overhead for both direct and AWS Bedrock Anthropic endpoints.
Standalone CLI auto-configuration
Auto-configuration from the ANTHROPIC_API_KEY environment variable in standalone mode for zero-config deployments.
Guided Output Support for GCP Vertex AI/Gemini
Guided regex support
Constrains model outputs to match specific regular expressions for GCP Vertex AI/Gemini models, enabling structured text generation.
Guided choice support
Restricts model outputs to predefined choices for GCP Vertex AI/Gemini models, ensuring responses conform to expected values.
Guided JSON support
Ensures model outputs are valid JSON conforming to specified schemas for GCP Vertex AI/Gemini models, with OpenAI-compatible API translation.
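Guided decoding is typically requested through vLLM-style extension fields on the OpenAI-compatible chat completions body; the guided_choice field below follows that convention and is an assumption, and the JSON request body is rendered as YAML for readability. Verify the exact wire format against the documentation.

```yaml
# Illustrative /v1/chat/completions request body (JSON, shown as YAML).
model: gemini-2.0-flash
messages:
  - role: user
    content: "Is this review positive or negative?"
# Assumed vLLM-style extension field: constrain output to fixed choices.
guided_choice:
  - positive
  - negative
```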
Provider-Specific Enhancements
OpenAI Image Generation /v1/images/generations endpoint
End-to-end support for OpenAI's image generation API including request/response translation, Brotli encoding/decoding, and full protocol compatibility.
OpenAI legacy /v1/completions endpoint
Full pass-through support for OpenAI's legacy completions endpoint with complete tracing and metrics, ensuring backward compatibility.
Azure OpenAI embeddings support
Native support for Azure OpenAI embeddings API with proper protocol translation and token usage tracking.
AWS Bedrock reasoning tokens
Full support for reasoning/thinking tokens in AWS Bedrock responses for both streaming and non-streaming modes, properly exposing extended thinking processes in Claude models.
GCP Vertex AI safety settings
Support for GCP-specific safety settings configuration, allowing fine-grained control over content filtering and safety thresholds for Gemini models.
GCP Gemini streaming token accounting
Accurate completion_tokens reporting in streaming usage chunks for Gemini models, ensuring proper token accounting during streaming responses.
Cross-Namespace Resource References
Cross-namespace AIServiceBackend references
AIGatewayRoute can now reference AIServiceBackend resources in different namespaces, enabling multi-tenant and organizational separation patterns.
ReferenceGrant validation
Comprehensive ReferenceGrant integration following Gateway API patterns, with automatic validation and clear error messages when grants are missing.
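A minimal sketch of the pattern, trimmed to the relevant fields (the ReferenceGrant shape is standard Gateway API; the AIGatewayRoute fields are abbreviated):

```yaml
# AIGatewayRoute in namespace "ai" referencing a backend in namespace "backends".
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: chat-route
  namespace: ai
spec:
  rules:
    - backendRefs:
        - name: openai
          namespace: backends  # cross-namespace reference, new in v0.4.0
---
# The target namespace must explicitly grant access.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-ai-gateway-routes
  namespace: backends
spec:
  from:
    - group: aigateway.envoyproxy.io
      kind: AIGatewayRoute
      namespace: ai
  to:
    - group: aigateway.envoyproxy.io
      kind: AIServiceBackend
```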
Enhanced Upstream Authentication
AWS SDK default credential chain
Support for AWS SDK's default credential chain including IRSA (IAM Roles for Service Accounts), EKS Pod Identity, EC2 Instance Profiles, and environment variables, eliminating the need for static credentials or OIDC settings.
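As a sketch, omitting explicit credentials from the AWS policy falls back to the SDK default chain; the field names follow the existing AWSCredentials type but should be double-checked against the API reference.

```yaml
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: BackendSecurityPolicy
metadata:
  name: bedrock-auth
spec:
  targetRefs:
    - group: aigateway.envoyproxy.io
      kind: AIServiceBackend
      name: bedrock
  type: AWSCredentials
  awsCredentials:
    region: us-east-1
    # No static credentials or OIDC settings: the SDK default chain
    # (IRSA, EKS Pod Identity, instance profile, env vars) is used.
```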
Azure API key authentication
Native Azure OpenAI API key authentication using the api-key header, matching Azure SDK conventions and console practices.
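Analogous to the Anthropic example above, an assumed sketch of the AzureAPIKey type:

```yaml
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: BackendSecurityPolicy
metadata:
  name: azure-auth
spec:
  targetRefs:
    - group: aigateway.envoyproxy.io
      kind: AIServiceBackend
      name: azure-openai
  type: AzureAPIKey
  azureAPIKey:
    # Sent upstream as the api-key header.
    secretRef:
      name: azure-openai-api-key
```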
Traffic Management and Configuration
Header mutations at route and backend levels
New headerMutation fields in both AIServiceBackend and AIGatewayRouteRuleBackendRef enable header manipulation with smart merge logic for advanced routing scenarios.
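A hedged sketch of a backend-level mutation, assuming the field follows the familiar Gateway API header-filter shape (set/remove):

```yaml
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIServiceBackend
metadata:
  name: openai
spec:
  schema:
    name: OpenAI
  backendRef:
    group: gateway.envoyproxy.io
    kind: Backend
    name: openai
  headerMutation:
    set:
      - name: x-team          # illustrative header injected on every request
        value: platform
    remove:
      - x-internal-debug      # illustrative header stripped before forwarding
```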
InferencePool v1 support
Updated to Gateway API Inference Extension v1.0, providing stable intelligent endpoint selection with enhanced performance and reliability.
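Assuming backendRefs accept a group/kind as in Gateway API, routing to a v1 InferencePool might look like this sketch; all names are illustrative.

```yaml
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: self-hosted-route
spec:
  rules:
    - backendRefs:
        # InferencePool from the Inference Extension v1 API group.
        - group: inference.networking.k8s.io
          kind: InferencePool
          name: llama-pool
```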
Cached token usage tracking for accurate usage reporting
Captures and reports cached token statistics from cloud providers (Anthropic, Bedrock, etc.), providing accurate cost attribution for prompt caching features.
Standalone Mode and CLI
Docker image support
Official Docker images for the aigw CLI published to GitHub Container Registry, enabling containerized standalone deployments with proper health checks and lifecycle management.
Multi-provider auto-configuration
Zero-config standalone mode with automatic configuration from the OPENAI_API_KEY, AZURE_OPENAI_API_KEY, or ANTHROPIC_API_KEY environment variables. Generates a complete Envoy configuration with OpenAI SDK compatibility.
MCP server configuration
Native MCP support in standalone mode via the --mcp-config and --mcp-json flags, enabling unified LLM and MCP server configuration in a single aigw run invocation without Kubernetes.
XDG Base Directory standards
Proper separation of configuration, data, state, and runtime files following XDG Base Directory specification, improving organization and enabling better cleanup and management of aigw state.
Enhanced readiness monitoring
Improved Envoy readiness detection and status reporting in standalone mode, providing clear insights into when the gateway is ready to accept traffic with better error messages.
Consolidated admin server
Unified admin server on a single port serving both the /metrics and /health endpoints, simplifying monitoring and health check configuration.
Improved error handling
The aigw CLI now fails fast and exits cleanly if the external processor fails to start, preventing silent failures and improving the debugging experience.
Type-safe Kubernetes client SDK
Generated client libraries for all AI Gateway CRDs following standard Kubernetes client-go patterns, enabling developers to build controllers, operators, and custom integrations with type safety.
Observability Enhancements
MCP operations observability
Comprehensive monitoring, logging, and tracing for MCP operations with configurable access logs and metrics enrichment for MCP server interactions and tool routing.
Image generation tracing and metrics
OpenInference-compliant distributed tracing and OpenTelemetry Gen AI metrics for image generation requests with detailed request parameters and timing information.
OpenTelemetry native metrics export
Support for OTEL-native metrics export (in addition to Prometheus), enabling integration with Elastic Stack, OTEL-TUI, and other OTEL-native observability systems. Includes console exporter for ad-hoc debugging.
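One plausible wiring uses the standard OpenTelemetry SDK environment variables; how they reach the gateway pods depends on your install method (e.g. Helm values), so treat this as a sketch.

```yaml
# Illustrative pod environment for OTLP metrics export.
env:
  - name: OTEL_METRICS_EXPORTER
    value: otlp                # or "console" for ad-hoc debugging
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: http://otel-collector.observability.svc:4317
```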
Embeddings tracing implementation
Complete OpenInference-compliant tracing for embeddings operations, complementing existing chat completion tracing.
Enhanced /messages endpoint metrics
Distinct metrics for Anthropic's /messages endpoint, providing accurate attribution separate from /chat/completions endpoints.
Original model tracking
Metrics now track both the original requested model and any overridden model names, providing accurate attribution in multi-provider and model virtualization scenarios.
🔗 API Updates
- New MCPRoute CRD
- Introduces the MCPRoute custom resource with comprehensive fields for MCP server configuration, tool filtering, authentication policies (OAuth and API key), and Protected Resource Metadata.
- Cross-namespace references in AIGatewayRoute
- Added namespace field to AIGatewayRouteRuleBackendRef, enabling cross-namespace backend references with ReferenceGrant validation.
- Header mutations at route and backend levels
- Added headerMutation fields to both AIServiceBackend and AIGatewayRouteRuleBackendRef for backend-level and per-route header manipulation with smart merge logic.
- New AWSAnthropic API schema
- Added AWSAnthropic schema for Claude models on AWS Bedrock using the native Anthropic Messages API format, providing full feature parity with direct Anthropic API.
- Anthropic API key authentication
- Added AnthropicAPIKey to BackendSecurityPolicy for x-api-key header authentication.
- Azure API key authentication
- Added AzureAPIKey to BackendSecurityPolicy for api-key header authentication.
- AWS credential chain support
- BackendSecurityPolicy AWS auth now supports SDK default credential chain when credentials are not explicitly provided.
- InferencePool v1
- Updated to support Gateway API Inference Extension v1.0 (inference.networking.k8s.io/v1) instead of v1alpha1.
- Enforced Backend resource requirement
- Added CRD validation to AIServiceBackend explicitly requiring Envoy Gateway Backend resources (Kubernetes Service is not supported).
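For reference, a minimal pairing that satisfies this validation: an Envoy Gateway Backend plus an AIServiceBackend that points at it (fields abbreviated; see the setup docs for the full form).

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
  name: openai
spec:
  endpoints:
    - fqdn:
        hostname: api.openai.com
        port: 443
---
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIServiceBackend
metadata:
  name: openai
spec:
  schema:
    name: OpenAI
  backendRef:
    group: gateway.envoyproxy.io
    kind: Backend   # must be an Envoy Gateway Backend, not a Service
    name: openai
```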
📦 Dependencies Versions
- Go 1.25.3
- Updated to Go 1.25.3 for improved performance and security.
- Envoy Gateway v1.5+
- Built on Envoy Gateway v1.5+ for proven data plane capabilities and enhanced features. This version is also fully compatible with the upcoming v1.6.
- Envoy v1.36
- Leveraging Envoy Proxy v1.36's battle-tested networking capabilities.
- Gateway API v1.4.0
- Support for Gateway API v1.4.0 specifications.
- Gateway API Inference Extension v1.0.2
- Integration with Gateway API Inference Extension v1.0.2 for stable intelligent endpoint selection.
🔮 What's Next (beyond v0.4)
We're already working on exciting features for future releases:
- Advanced MCP features - Further enhancements to the MCP protocol support
- Additional LLM provider integrations - Expanding the ecosystem of supported LLM providers
- Enhanced performance - Improving the runtime performance of the gateway