Envoy AI Gateway v0.3.x
The v0.3.x release line introduces intelligent inference routing with the Endpoint Picker Provider, enhanced observability, Google Vertex AI support, and deeper provider integrations.
v0.3.0
August 21, 2025
Envoy AI Gateway v0.3.0 introduces intelligent inference routing, expanded provider support (including Google Vertex AI and Anthropic), and enhanced observability with OpenInference tracing and configurable metrics. Key features include Endpoint Picker Provider with InferencePool for dynamic load balancing, model name virtualization, and seamless Gateway API Inference Extension integration.
✨ New Features
Endpoint Picker Provider (EPP) Integration
- Gateway API Inference Extension Support
- Complete integration with Gateway API Inference Extension v0.5.1, enabling intelligent endpoint selection based on real-time AI inference metrics like KV-cache usage, queue depth, and LoRA adapter information.
- Dual Integration Approaches
- Support for both `HTTPRoute` + `InferencePool` and `AIGatewayRoute` + `InferencePool` integration patterns, providing flexibility for different use cases, from simple to advanced AI routing scenarios (see the sketch after this list).
- Dynamic Load Balancing
- Intelligent routing that automatically selects the optimal inference endpoint for each request, optimizing resource utilization across your entire inference infrastructure with real-time performance metrics.
- Extensible Architecture
- Support for custom endpoint picker providers, allowing implementation of domain-specific routing logic tailored to unique AI workload requirements.
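Below is a minimal sketch of the `AIGatewayRoute` + `InferencePool` pattern referenced above, assuming an `InferencePool` already exists. The gateway, route, pool, and model names are illustrative, and the `inference.networking.x-k8s.io` API group is the one used by Gateway API Inference Extension v0.5.x; treat the exact field layout as an assumption to verify against the documentation.

```yaml
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: inference-route
spec:
  parentRefs:
    - name: ai-gateway                 # the Gateway this route attaches to (illustrative)
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model      # model name the gateway extracts from the request body
              value: meta-llama/Llama-3.1-8B-Instruct
      backendRefs:
        - group: inference.networking.x-k8s.io   # Inference Extension v0.5.x API group
          kind: InferencePool
          name: vllm-llama3-8b-instruct          # an existing InferencePool (illustrative)
```

Requests matching the model are then handed to the endpoint picker, which selects a pod from the pool using live metrics such as KV-cache usage and queue depth.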
Expanded Provider Ecosystem
- Google Vertex AI Production Support
- Google Vertex AI has moved from work-in-progress to full production support, including complete streaming support for Gemini models with OpenAI API compatibility. View all supported providers →
- Anthropic on Vertex AI Integration
- Complete Anthropic Claude integration via GCP Vertex AI, moving from experimental to production-ready status with multi-tool support and configurable API versions for enterprise deployments.
- Enhanced Gemini Capabilities
- Improved request/response translation for Gemini models with support for tools, response format specification, and advanced conversation handling, making Gemini integration more robust and feature-complete.
- Strengthened OpenAI-Compatible Ecosystem
- Enhanced support for the broader OpenAI-compatible provider ecosystem including Groq, Together AI, Mistral, Cohere, DeepSeek, SambaNova, and more, ensuring seamless integration across the AI provider landscape.
Observability Enhancements
- OpenInference Tracing Support
- Added comprehensive OpenInference distributed tracing with OpenTelemetry integration, providing detailed request tracing and performance monitoring for LLM operations. Includes full chat completion request/response data capture, timing information, and compatibility with evaluation systems like Arize Phoenix (a configuration sketch follows this list). View the documentation →
- Configurable Metrics Labels
- Added support for configuring additional metrics labels corresponding to HTTP request headers. This enables custom labeling of metrics based on specific request headers like user identifiers, API versions, or application contexts, providing more granular monitoring and filtering capabilities.
- Embeddings Metrics Support
- Extended GenAI metrics support to include embeddings operations, providing comprehensive token usage tracking and performance monitoring for both chat completion and embeddings API endpoints with consistent OpenTelemetry semantic conventions.
- Enhanced GenAI Metrics
- Improved AI-specific metrics implementation with better error handling, enhanced attribute mapping, and more accurate token latency measurements. Maintains full compatibility with OpenTelemetry Gen AI semantic conventions while providing more reliable performance analysis data. View the documentation →
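As a configuration sketch for the tracing support above: the exporter follows the standard OpenTelemetry SDK environment variables, so enabling it amounts to setting `OTEL_*` variables on the AI Gateway's request-processing pods. Only the `OTEL_*` names are standard OpenTelemetry conventions; the service name, collector endpoint, and variable placement below are illustrative assumptions.

```yaml
# Illustrative environment for the AI Gateway data-plane (external processor) pods.
# OTEL_* names are standard OpenTelemetry SDK variables; the endpoint assumes an
# OTLP-compatible collector (e.g. Arize Phoenix) running in-cluster.
env:
  - name: OTEL_SERVICE_NAME
    value: ai-gateway
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: http://phoenix-collector.observability.svc.cluster.local:4318
```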
Infrastructure and Configuration
- Model Name Virtualization
- Added a new `modelNameOverride` field in the `backendRef` of `AIGatewayRoute`, enabling flexible model name abstraction across different providers. This allows unified model naming for downstream applications while routing to provider-specific model names, supporting both multi-provider scenarios and fallback configurations. View the documentation →
- Unified Gateway Support
- Enhanced Gateway resource management by allowing both standard `HTTPRoute` and `AIGatewayRoute` to be attached to the same `Gateway` object. This provides a unified routing configuration that supports both AI and non-AI traffic within a single gateway infrastructure, simplifying deployment and management (see the sketch after this list).
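A minimal sketch of the unified pattern: one `Gateway` serving plain HTTP and AI traffic side by side. All resource names here are illustrative.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: web-route
spec:
  parentRefs:
    - name: shared-gateway       # ordinary web traffic through the shared Gateway
  rules:
    - backendRefs:
        - name: web-backend
          port: 80
---
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: llm-route
spec:
  parentRefs:
    - name: shared-gateway       # AI traffic through the same Gateway
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: gpt-4o-mini
      backendRefs:
        - name: openai-backend   # an AIServiceBackend (illustrative)
```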
🔗 API Updates
- BackendSecurityPolicy TargetRefs: Added targetRefs field to BackendSecurityPolicy spec, enabling direct targeting of AIServiceBackend resources using Gateway API policy attachment patterns.
- Gateway API Inference Extension: Allows the InferencePool resource of Gateway API Inference Extension v0.5.1 to be specified as a backend reference in AIGatewayRoute for intelligent endpoint selection.
- modelNameOverride in the backend reference of AIGatewayRoute: Added a modelNameOverride field to the backend reference of AIGatewayRoute, allowing flexible model name rewriting for routing purposes (see the sketch after this list).
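A minimal sketch of both API additions, assuming an `AIServiceBackend` named `openai-backend` and an API-key `Secret` already exist; all names and the model value are illustrative assumptions.

```yaml
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: BackendSecurityPolicy
metadata:
  name: openai-key
spec:
  targetRefs:                          # new: Gateway API-style policy attachment
    - group: aigateway.envoyproxy.io
      kind: AIServiceBackend
      name: openai-backend
  type: APIKey
  apiKey:
    secretRef:
      name: openai-api-key
---
# AIGatewayRoute rule excerpt: clients request a virtual model name, while the
# provider receives the overridden, provider-specific one.
backendRefs:
  - name: openai-backend
    modelNameOverride: gpt-4o-mini
```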
Deprecations
- `backendSecurityPolicyRef` Pattern: The old pattern of `AIServiceBackend` referencing `BackendSecurityPolicy` is deprecated in favor of the new `targetRefs` approach. Existing configurations will continue to work but should be migrated before v0.4.
- `AIGatewayRoute`'s `targetRefs` Pattern: The `targetRefs` pattern is no longer supported for `AIGatewayRoute`. Existing configurations will continue to work but should be migrated to `parentRefs`.
- `AIGatewayRoute`'s `schema` Field: The `schema` field is no longer needed for `AIGatewayRoute`. Existing configurations will continue to work but should be removed before v0.4.
- `controller.envoyGatewayNamespace` helm value is no longer necessary: This value is no longer needed, and configuring it is redundant.
- `controller.podEnv` helm value will be removed: Use `controller.extraEnvVars` instead (see the sketch after this list). The `controller.podEnv` value will be removed in v0.4.
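For the two helm deprecations, the fix is a rename in your values file. A minimal sketch, where the `LOG_LEVEL` entry is purely illustrative:

```yaml
# values.yaml
controller:
  # podEnv:                    # deprecated, removed in v0.4
  #   - name: LOG_LEVEL
  #     value: debug
  extraEnvVars:                # use this instead of podEnv
    - name: LOG_LEVEL
      value: debug
```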
📖 Upgrade Guidance
For users upgrading from v0.2.x to v0.3.0:
1. Upgrade Envoy Gateway to v1.5.0 - Ensure you are using Envoy Gateway v1.5.0 or later, as this is required for compatibility with the new AI Gateway features.
2. Update Envoy Gateway config - Update your Envoy Gateway configuration to include the new settings shown below. The full manifest is available in the manifests/envoy-gateway-config/config.yaml file, as described in the getting started guide.
```diff
--- a/manifests/envoy-gateway-config/config.yaml
+++ b/manifests/envoy-gateway-config/config.yaml
@@ -43,9 +43,19 @@ data:
       extensionManager:
         hooks:
           xdsTranslator:
+            translation:
+              listener:
+                includeAll: true
+              route:
+                includeAll: true
+              cluster:
+                includeAll: true
+              secret:
+                includeAll: true
             post:
-            - VirtualHost
             - Translation
+            - Cluster
+            - Route
```
3. Upgrade Envoy AI Gateway to v0.3.0
4. Migrate Gateway target references - Update from the deprecated AIGatewayRoute.targetRefs pattern to the new AIGatewayRoute.parentRefs approach after upgrading to v0.3.0 (see the sketch after these steps).
5. Migrate backendSecurityPolicy references - Update from the deprecated AIServiceBackend.backendSecurityPolicyRef pattern to the new BackendSecurityPolicy.targetRefs approach after upgrading to v0.3.0.
6. Remove AIGatewayRoute.schema - Remove the schema field from AIGatewayRoute resources after upgrading to v0.3.0, as it is no longer used.
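A before/after sketch of steps 4-6, with the deprecated fields left as comments. Resource names are illustrative, the exact shape of the deprecated fields is an assumption based on the deprecation notes above, and the BackendSecurityPolicy targetRefs counterpart for step 5 is shown under API Updates.

```yaml
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: llm-route
spec:
  # targetRefs:                      # step 4: deprecated Gateway reference
  #   - name: ai-gateway
  #     kind: Gateway
  #     group: gateway.networking.k8s.io
  parentRefs:                        # step 4: new Gateway reference
    - name: ai-gateway
  # schema:                          # step 6: no longer used; remove
  #   name: OpenAI
  # rules: [...]                     # rules unchanged; elided here
---
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIServiceBackend
metadata:
  name: openai-backend
spec:
  # backendSecurityPolicyRef:        # step 5: deprecated; replaced by targetRefs
  #   name: openai-key               # on the BackendSecurityPolicy itself
  schema:
    name: OpenAI                     # backend schema stays; only the route-level field goes
  backendRef:
    name: openai
    kind: Backend
    group: gateway.envoyproxy.io
```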
📦 Dependencies Versions
- Go 1.24.6
- Updated to latest Go version for improved performance and security.
- Envoy Gateway v1.5
- Built on Envoy Gateway for proven data plane capabilities.
- Envoy v1.35
- Leveraging Envoy Proxy's battle-tested networking capabilities.
- Gateway API v1.3.1
- Support for latest Gateway API specifications.
- Gateway API Inference Extension v0.5.1
- Integration with Gateway API Inference Extension for intelligent endpoint selection.
🙏 Acknowledgements
This release represents the collaborative effort of our growing community. Special thanks to contributors from Tetrate, Bloomberg, Tencent, Google, Nutanix and our independent contributors who made this release possible through their code contributions, testing, feedback, and community participation.
The Endpoint Picker Provider integration represents a significant milestone in making AI inference routing more intelligent and efficient. We appreciate all the feedback and testing from the community that helped shape this feature.
New Contributors
- @sukumargaonkar made their first contribution in #635
- @isyangban made their first contribution in #729
- @whzghb made their first contribution in #743
- @yduwcui made their first contribution in #781
- @googs1025 made their first contribution in #799
- @cvrajeesh made their first contribution in #792
- @terrytangyuan made their first contribution in #866
- @nagar-ajay made their first contribution in #969
- @gavrissh made their first contribution in #947
- @surenraju-careem made their first contribution in #981
- @Hyzhou made their first contribution in #1035
- @VarSuren made their first contribution in #1022
- @carlory made their first contribution in #1072
- @tekumara made their first contribution in #1089
Full Changelog: v0.2.0...v0.3.0