Envoy AI Gateway v0.7.0
Envoy AI Gateway v0.7.0 adds hostname-based routing to AIGatewayRoute, enabling multi-tenant deployments where different hostnames expose different model sets through a single Gateway. A new Anthropic Messages → AWS Bedrock Converse translator lets Anthropic-native clients reach Bedrock without switching protocols. OpenAI audio transcription and translation endpoints arrive alongside Azure OpenAI Responses API support. Quota-aware rate limiting takes its first step with backend rate limit filter injection for QuotaPolicy. Claude Opus 4.7 gains full reasoning support including the display parameter and xhigh effort tier. Anthropic-to-OpenAI translation now handles reasoning blocks and images end-to-end. MCP tools/list responses respect authorization rules, and multimodal support grows with audio_url and video_url content types. Several SSE streaming and provider translation bugs are fixed.
✨ New Features
Multi-Tenant Hostname Routing
- Hostname-based model scoping on
AIGatewayRoute— Serve different model sets from a single Gateway by assigning hostnames to eachAIGatewayRoute. The/v1/modelsendpoint automatically returns only the models declared by routes matching the request'sHostheader, so tenants onteamA.ai.example.comandteamB.ai.example.comeach see their own catalog without separate Gateways. Wildcard hostnames (*.ai.example.com) are supported following the Gateway API hostname matching rules.
Provider Translation
- Anthropic
/v1/messages→ AWS Bedrock Converse API — Send requests in Anthropic Messages format and have them translated to Bedrock's Converse and ConverseStream APIs automatically. Supports text, images, tool use, thinking blocks, and streaming — so Anthropic-native clients can reach any Bedrock model without changing their integration. Complements the existing OpenAI → Bedrock Converse and Anthropic → Bedrock InvokeModel paths. - Reasoning and image support for Anthropic-to-OpenAI translation — The Anthropic
/v1/messages→ OpenAI/v1/chat/completionspath now handles thinking/reasoning content and image blocks end-to-end. Thinking config (enabled/disabled/adaptive) passes through, thinking and redacted_thinking blocks are preserved in multi-turn conversations, and image blocks (base64 and URL) convert to OpenAIimage_urlformat. Previously these were silently dropped. - Claude Opus 4.7 and Mythos Preview reasoning — Full support for Claude Opus 4.7's reasoning features: the
displayparameter (summarized/omitted) controls thinking content visibility, andxhighjoins the reasoning effort tiers for long-horizon agentic and coding tasks. Bothclaude-opus-4-7andclaude-mythos-previewmodels are recognized for effort-based thinking control. - Custom request paths for Anthropic backends via
prefix— Theprefixfield onVersionedAPISchemanow works for Anthropic-schema backends, producing endpoints like/{prefix}/messagesinstead of the default/v1/messages. Useful for routing to Anthropic-compatible providers that use a non-standard path. - Anthropic
anthropic-betaheader forwarded to AWSAnthropic — Theanthropic-betarequest header is now mapped into theanthropic_betabody field when routing to AWSAnthropic backends, so beta features like extended thinking and token counting work through the gateway without manual body rewriting.
OpenAI API Compatibility
- Audio transcription and translation endpoints — Full data-plane support for OpenAI's
/v1/audio/transcriptions(Whisper transcription) and/v1/audio/translations(Whisper translation) endpoints. These acceptmultipart/form-datarequests containing audio files, enabling speech-to-text workloads to flow through the gateway with the same auth, rate limiting, and observability as other traffic. - Azure OpenAI Responses API — The OpenAI-compatible
/v1/responsesendpoint now works with Azure OpenAI backends, routing requests to Azure's/openai/responses?api-version=...path while preserving existing request and response handling. Azure users get Responses API support without changing client code. audio_urlandvideo_urlcontent types — OpenAI chat completion requests can now includeaudio_urlandvideo_urlcontent parts, enabling multimodal audio and video inputs for compatible backends like vLLM with phi-4-mm and Qwen 3.5 models.
Quota-Aware Routing
- Backend quota rate limit filter injection — First step toward quota-aware routing: the controller now injects a backend rate limit filter when a
QuotaPolicyis attached to anAIServiceBackend. The QuotaPolicy controller reconciles the policy, builds rate limit descriptor trees, and configures the rate limit service. This enables per-backend request throttling based on upstream provider quotas.
MCP Gateway
- Authorization-filtered
tools/listresponses — MCPtools/listnow applies the same authorization rules used bytools/call, omitting tools the caller isn't authorized to invoke. Prevents unauthorized callers from discovering tool names and avoids wasted LLM turns on tools that would fail at call time.
Observability
- Smarter log redaction preserves developer-authored metadata — Debug log redaction (
--enableRedaction) no longer masks developer-authored schema metadata that was previously over-redacted: tool definitiondescriptionandparameters, tool callfunction.name,response_format.json_schema, andguided_jsonare now visible in debug logs. User-provided content and AI-generated text remain redacted, making debug logs significantly more useful without compromising privacy.
🔗 API Updates
AIGatewayRoute.spec.hostnames— New optional field accepting a list of hostnames for hostname-based request filtering. When specified, the generatedHTTPRouteincludes these hostnames, and the/v1/modelsendpoint scopes its response to models from matching routes. Follows Gateway API hostname semantics including wildcard support.AIGatewayRoute.spec.rulescapped at 15 — Maximum rules perAIGatewayRoutereduced from 128 to 15 to match the Gateway APIHTTPRoutelimit (one slot is reserved for a controller-injected catch-all rule). To configure more rules on the same Gateway, split them across multipleAIGatewayRouteresources.VersionedAPISchema.prefixsupported for Anthropic — Theprefixfield now applies to Anthropic-schema backends in addition to OpenAI. Theversionfield is ignored for Anthropic; useprefixfor custom paths. Note:prefixis ignored forAWSAnthropicandGCPAnthropicas these override paths internally.QuotaPolicyrate limit filter injection (runtime enforcement) — TheQuotaPolicyCRD (introduced as API-only in v0.6) now has its first runtime behavior: when attached to anAIServiceBackend, a backend rate limit filter is injected to enforce quota-based throttling. Full quota-aware routing across multiple backends is planned for future releases.
🐛 Bug Fixes
- SSE parser handles fields without space after colon — The SSE event parser now correctly handles fields formatted as
data:{json}(no space after the colon), in addition to the standarddata: {json}. Fixes silent field drops when proxying responses from providers that omit the optional space. - Responses API streaming SSE buffering — OpenAI Responses API and speech streaming translators now buffer incomplete SSE events across response body chunks instead of treating each chunk as self-contained. Fixes dropped or mangled events when TCP segment boundaries split an SSE event mid-frame.
- Responses API token usage from incomplete and failed streams — Token usage is now captured from
response.incompleteandresponse.failedSSE events, not justresponse.completed. Streams that hitmax_output_tokensor encounter post-generation failures no longer report zero tokens. - Nil output guard in AWS Bedrock response translator — Bedrock can return HTTP 200 with no
outputfield (e.g. guardrail interventions orUnknownOperationException). Previously this caused a nil-pointer panic in the ext-proc; now it returns a clean error to the caller. - Comprehensive Gemini finish-reason mapping — Gemini finish reasons like
SAFETY,BLOCKLIST,RECITATION,MALFORMED_FUNCTION_CALL, and others now map to their correct OpenAI equivalents instead of all falling through tocontent_filter. Unknown reasons map toerrorrather than silently misreporting as a content filter event. - Empty delta in GCP Vertex AI streaming chunks — Streaming response chunks from GCP Vertex AI that lack candidate content now emit an empty
deltaobject instead of omitting the field, conforming to the OpenAI streaming contract and fixing parse errors in strict clients. - Typeless assistant output messages in Responses API — Multi-turn Responses API inputs that include assistant messages without an explicit
type: "message"field (e.g. from OpenCode) now parse correctly. Previously these were treated as easy-input messages, causing unmarshalling failures onoutput_textcontent blocks.
📖 Upgrade Guidance
Using Hostname-Based Routing
To serve different model sets per hostname, add hostnames to your AIGatewayRoute:
apiVersion: aigateway.envoyproxy.io/v1beta1
kind: AIGatewayRoute
metadata:
name: team-a-route
spec:
hostnames:
- "team-a.ai.example.com"
rules:
- matches:
- headers:
- name: x-ai-eg-model
value: gpt-4o
backendRefs:
- name: openai-backendRoutes without hostnames remain accessible on all hosts. When at least one route uses hostname scoping, the /v1/models endpoint automatically returns only the models for the matching host.
Rules-Per-Route Limit
AIGatewayRoute.spec.rules is now capped at 15 (down from 128) to match the Gateway API HTTPRoute limit. If you have routes with more than 15 rules, split them across multiple AIGatewayRoute resources attached to the same Gateway.
Using Anthropic Prefix
If you route Anthropic traffic to a provider with a non-standard path, use the prefix field:
schema:
name: Anthropic
prefix: /custom/v2 # produces /custom/v2/messagesNote: prefix is ignored for AWSAnthropic and GCPAnthropic backends as they override paths internally.
Adopting Claude Opus 4.7 Reasoning
If you use Claude Opus 4.7 or Mythos Preview models, note that display defaults to omitted (unlike earlier Claude models which default to summarized). To receive summarized thinking content, set display: "summarized" explicitly. The new xhigh reasoning effort tier is available for long-horizon agentic tasks.
📦 Dependency Versions
| Dependency | Version |
|---|---|
| Go | 1.26.2 |
| Envoy Gateway | v1.7.0 |
| Envoy Proxy | v1.37 |
| Gateway API | v1.4.1 |
| Gateway API Inference Extension | v1.0.2 |
| MCP Go SDK | v1.6.0 |
🙏 Acknowledgements
We extend our gratitude to all contributors who made this release possible. Special thanks to:
- The growing community of adopters for their valuable feedback and production insights
- Everyone who reported bugs, submitted PRs, and participated in design discussions
- The Envoy Gateway team for their continued collaboration
🔮 What's Next
We're already working on features for future releases:
- Full quota-aware routing — building on the rate limit filter injection landed in v0.7 to route around rate-limited upstreams automatically across multiple backends
- MCPBackend CRD — a dedicated custom resource for MCP backend servers, decoupling MCP backend configuration from MCPRoute
- Expanded multimodal support — additional audio, video, and image generation backends across cloud providers
- Deeper MCP authorization — finer-grained policy across tools, resources, and prompts
- More provider translation paths — filling coverage gaps across Anthropic, Bedrock, and Vertex AI