github envoyproxy/ai-gateway v0.6.0


Envoy AI Gateway v0.6.0

Envoy AI Gateway v0.6.0 marks the first production-ready API surface:

  • The core CRDs (AIGatewayRoute, AIServiceBackend, BackendSecurityPolicy, GatewayConfig, MCPRoute) are now served at v1beta1.
  • AWS Bedrock gains a native InvokeModel path for Claude alongside Titan embeddings via the OpenAI /v1/embeddings contract.
  • Gemini gets first-class embeddings and Anthropic-style prefix context caching.
  • Cross-provider clients can hit Anthropic's /v1/messages endpoint on any OpenAI-compatible backend, and a single reasoning_effort knob now works across Anthropic, OpenAI, and Gemini.
  • Operators get GKE Workload Identity via Application Default Credentials, configurable webhook host networking, request/response body redaction for compliance, and the Go 1.26.2 + Envoy 1.37 + Envoy Gateway 1.7 baseline.

Two breaking changes land in v0.6: AIGatewayRoute.spec.filterConfig is removed (move to GatewayConfig), and the deprecated version-as-prefix behavior on VersionedAPISchema is removed (use prefix). See Upgrade Guidance below.

📖 Full documentation

⚠️ Breaking Changes

  • AIGatewayRoute.spec.filterConfig removed. The filterConfig field on AIGatewayRoute has been removed. Move external-processor configuration (resources, env vars, image overrides) to a GatewayConfig resource referenced from the Gateway via the aigateway.envoyproxy.io/gateway-config annotation. v0.5 deprecated the resources subfield with a pointer to GatewayConfig; v0.6 removes the entire filterConfig struct, so anything still set there must move now. See the upgrade guidance below.
  • VersionedAPISchema.version no longer acts as an endpoint prefix for OpenAI-schema backends. The legacy behavior deprecated in v0.5 is gone. Use the prefix field instead (e.g. prefix: /v1beta/openai for Gemini's OpenAI-compatible API, prefix: /compatibility/v1 for Cohere). See the upgrade guidance below.

✨ New Features

AWS Bedrock

  • Native InvokeModel API for Claude — Send requests to Claude models on Bedrock through Bedrock's native InvokeModel endpoint, complementing the existing Converse API path. Useful when applications already speak the Anthropic Messages format and want a thin translation layer.
  • OpenAI → Bedrock Titan embeddings translation — Call Amazon Titan embedding models on Bedrock through the standard OpenAI /v1/embeddings contract. Switch embedding providers without changing client code. Cohere and other Bedrock embedding models are not yet covered and will follow in a later release.
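
A minimal routing sketch for the Titan embeddings path, assuming an AIServiceBackend named bedrock-backend with the AWSBedrock schema already exists; the route, backend, and model names here are hypothetical, and the x-ai-eg-model header is the gateway's model-extraction convention:

```yaml
# Hedged sketch: route OpenAI-style /v1/embeddings requests for a Titan
# model to a Bedrock backend. Names are illustrative, not prescriptive.
apiVersion: aigateway.envoyproxy.io/v1beta1
kind: AIGatewayRoute
metadata:
  name: embeddings-route
  namespace: default
spec:
  parentRefs:
    - name: ai-gateway
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model              # populated from the request body's model field
              value: amazon.titan-embed-text-v2:0
      backendRefs:
        - name: bedrock-backend                # AIServiceBackend with schema AWSBedrock
```

Clients keep sending standard OpenAI /v1/embeddings requests; the gateway translates them to Bedrock's Titan contract on the way through.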

Anthropic and Cross-Provider Translation

  • Anthropic /v1/messages endpoint on OpenAI backends — Expose any OpenAI-compatible backend through Anthropic's Messages API. Lets Claude-style clients reach OpenAI, Azure OpenAI, or any other OpenAI-compatible provider behind the gateway without rewriting requests.
  • Structured output for Claude models — Pass JSON schema constraints through to Claude so responses conform to your declared shape. Available on Anthropic and AWS Bedrock Claude backends today; GCP Vertex AI Claude is excluded pending upstream provider support.
  • Cleaner handling when max_tokens is omitted on Anthropic requests — Requests without an explicit max_tokens no longer crash the translator; they're forwarded so the provider returns a normal validation error. Removes a long-standing footgun when forwarding OpenAI-shaped requests through the Anthropic path.
  • Adaptive thinking for claude-opus-4.6 — Translate Claude's new adaptive thinking mode end-to-end. Adaptive lets the model decide thinking depth per request rather than committing to a fixed budget, so callers can opt in without bespoke provider code.
  • Unified reasoning_effort across Anthropic, OpenAI, and Gemini — A single OpenAI-style reasoning_effort value (low/medium/high/xhigh) now maps onto Anthropic's thinking budgets and Gemini 3's thinking controls. One client knob, three providers.

Gemini Provider

  • Gemini embeddings translation — Use Gemini embedding models through the OpenAI /v1/embeddings contract, completing Gemini coverage alongside chat completions and Responses.
  • Gemini context caching with prefix-style API — Activate Gemini's context caching using the same Anthropic-style cache_control prefix surface already supported elsewhere. Cut input token costs on long, repeated system prompts without a Gemini-specific code path.
  • Gemini reasoning surfaced as thinking blocks — Non-streaming Gemini reasoning is now exposed as both string content and structured thinking_blocks, matching the shape clients already use for Anthropic responses. Streaming responses still surface reasoning as string content only.

OpenAI API Compatibility

  • Responses API — context management and richer streaming — Second wave of Responses API work fills in context management and improved streaming so the /v1/responses path is closer to parity with /v1/chat/completions. If you held off on /v1/responses due to missing features, retest now.
  • Compatibility with open-source Responses API implementations — Improved compatibility with non-OpenAI implementations of the Responses API (e.g. open-source inference servers that expose a /v1/responses endpoint), broadening which Responses-aware clients can sit in front of the gateway.
  • Text-to-speech endpoint /v1/audio/speech — Route OpenAI text-to-speech requests through the gateway, so audio workloads benefit from the same auth, rate limiting, and observability as chat traffic.

MCP Gateway

  • Per-backend header forwarding with rename — MCPRouteBackendRef.forwardHeaders accepts a list of inbound headers to forward to each backend, optionally renaming them on the way out. Each MCP backend can receive its own set of headers (e.g. trace context, tenant identifiers, per-user auth) without a single route-wide rule.
  • JWT claim forwarding to MCP backends — Project verified JWT claims into outbound headers via MCPRouteOAuth.claimToHeaders, enabling identity-aware tool execution at backend MCP servers without re-authenticating downstream.
  • Exclude / excludeRegex on tool selectors — MCPToolFilter now supports deny patterns (literal exclude and regex excludeRegex) alongside the existing include rules. Useful when a backend exposes more capabilities than a given route should surface.
  • Tool name in access logs and response metadata — Tool invocations now carry the tool name in dynamic metadata (key mcp_tool_name), so per-tool debugging, dashboards, and access-log fields are straightforward to wire up.
  • Per-backend capability tracking — The gateway tracks which MCP server feature flags (tools, prompts, resources, logging, completions) each backend supports and merges them across a route. Capability negotiation now reflects what's actually reachable, so clients don't get told a feature is available when no reachable backend implements it.
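
The per-backend pieces above can be sketched in a single MCPRoute. This is a hedged illustration, not a verbatim schema: the backend name and header names are hypothetical, and the exact field nesting (e.g. the rename key under forwardHeaders) is an assumption based on the feature descriptions:

```yaml
# Hedged sketch of the new v0.6 MCP per-backend fields; exact field
# names and nesting may differ from the released CRD.
apiVersion: aigateway.envoyproxy.io/v1beta1
kind: MCPRoute
metadata:
  name: tools-route
  namespace: default
spec:
  parentRefs:
    - name: ai-gateway
  backendRefs:
    - name: github-mcp                 # hypothetical MCP backend
      forwardHeaders:
        - name: x-tenant-id            # forwarded as-is
        - name: x-request-id
          rename: x-upstream-request-id  # renamed on the way out (field name assumed)
      toolSelector:
        excludeRegex:
          - "^delete_.*"               # deny destructive tools on this route
```

Each backendRef carries its own forwardHeaders and tool filter, so two backends on the same route can receive different headers and expose different tool subsets.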

Authentication and Identity

  • GKE Workload Identity via Application Default Credentials — GCP backends now authenticate using the standard ADC chain when neither credentialsFile nor workloadIdentityFederationConfig is set in the BackendSecurityPolicy. Workloads running on GKE pick up Workload Identity automatically — no static service account JSON secret needed.

Security and Privacy

  • Request and response body redaction — Strip or mask sensitive fields in request and response bodies before they hit logs, traces, or metrics. Lets you keep observability on while meeting privacy and compliance constraints.

Observability

  • OTLP access logging auto-configured by aigw — Standalone aigw wires up OTLP access logging out of the box when an OTLP endpoint is configured (via OTEL_EXPORTER_OTLP_ENDPOINT), removing a manual step from local-dev and demo paths.
  • Default agent-session-id → session.id header mapping — Spans and logs now correlate by session.id automatically when clients send the agent-session-id header, so agent frameworks like Goose get session correlation with zero config. Override or disable via OTEL_AIGW_REQUEST_HEADER_ATTRIBUTES. Metrics never default to session IDs (high cardinality).
  • ReasoningToken cost type — LLMRequestCostType now includes ReasoningToken, so you can budget and bill against thinking tokens separately from input, output, and cache cost types.
  • Response model metadata — Responses now carry the resolved upstream model in metadata, which clients and downstream tools can read to confirm exactly which model served a request (useful when routes use model aliasing or fallback).
  • OTEL attribute count cap removed for large contexts — Removed the OTEL span attribute count limit so long-context requests no longer have parts of their trace silently dropped.

Operations and Extensibility

  • Custom webhook port and host network — The conversion webhook can now bind to a configurable port (controller.mutatingWebhook.port) and run on the host network (controller.hostNetwork), smoothing installs in clusters with restrictive admission webhook networking such as GKE private clusters.
  • Lua filter slot after the AI ExtProc stage — Lua filters can now be attached after the AI ExtProc stage in the standard filter chain, so you can do last-mile request shaping (header rewrites, body tweaks) without writing a custom EnvoyExtensionPolicy.
  • Route-scoped LLM request costs with global defaults — Set GatewayConfig.spec.globalLLMRequestCosts for fleet-wide defaults and override per-route at AIGatewayRoute.spec.llmRequestCosts. Makes per-tenant or per-backend cost tracking straightforward without per-route boilerplate.
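
A sketch of the global-default-plus-override pattern, folding in the new ReasoningToken cost type; the metadata keys and resource names are hypothetical, and the assumption is that route-level entries take precedence over the fleet-wide defaults as described above:

```yaml
# Hedged sketch: fleet-wide cost defaults on GatewayConfig, with a
# per-route override on AIGatewayRoute. Keys and names are illustrative.
apiVersion: aigateway.envoyproxy.io/v1beta1
kind: GatewayConfig
metadata:
  name: fleet-config
  namespace: default
spec:
  globalLLMRequestCosts:
    - metadataKey: llm_input_token
      type: InputToken
    - metadataKey: llm_reasoning_token
      type: ReasoningToken             # new cost type in v0.6
---
apiVersion: aigateway.envoyproxy.io/v1beta1
kind: AIGatewayRoute
metadata:
  name: premium-route
  namespace: default
spec:
  llmRequestCosts:                     # route-scoped entries override the defaults
    - metadataKey: llm_input_token
      type: InputToken
```

Routes that set nothing inherit the GatewayConfig defaults, which keeps per-tenant cost tracking out of per-route boilerplate.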

🔗 API Updates

  • Core CRDs promoted to aigateway.envoyproxy.io/v1beta1 — AIGatewayRoute, AIServiceBackend, BackendSecurityPolicy, GatewayConfig, and MCPRoute are now served at v1beta1, signaling that the core API and MCP routing surface are stable enough for production use. v1alpha1 versions remain registered with deprecation warnings so existing manifests continue to apply during the upgrade window.
  • MCPRouteBackendRef.forwardHeaders — New per-backend list of headers to forward, with optional rename. Replaces the need for a single route-wide header forwarding rule when backends expect different headers.
  • MCPRouteOAuth.claimToHeaders — Configure which verified JWT claims should be projected into outbound headers to MCP backends.
  • MCPToolFilter.exclude / excludeRegex — Tool selectors now support exclusion alongside inclusion, with both literal and regex forms.
  • LLMRequestCostType.ReasoningToken — New cost type for thinking-token usage, complementing the existing input, output, and cache cost types.
  • GatewayConfig.spec.globalLLMRequestCosts — Fleet-wide cost defaults that individual AIGatewayRoute.spec.llmRequestCosts entries can override.
  • Preview: QuotaPolicy API (v1alpha1, no runtime enforcement yet) — New CRD surface for declaring upstream-provider quota policies, laying the groundwork for quota-aware routing. Currently API-only — no controller reconciliation or enforcement is wired up. Track it for future releases; do not rely on it as a working feature today.

🐛 Bug Fixes

  • Webhook cache race during extProc injection — Fixes a race where freshly applied AIGatewayRoute resources could miss extProc injection on first reconcile because the conversion webhook read from a stale cache. Scripted apply-then-curl tests should see fewer flakes.
  • Field ownership preserved on updates — Controllers no longer claim ownership of fields they don't manage during updates. If you co-deploy the AI Gateway controller alongside other operators that touch adjacent fields (e.g. service mesh injectors, policy controllers), expect fewer reconcile churn loops on shared resources.
  • Orphan cleanup for MCPRoute backendrefs — Resources tied to MCPRoute backend references are now cleaned up when the route or reference is removed, fixing a leak that could leave stale config in the cluster.
  • Standalone Envoy startup failures surfaced by aigw — aigw now reports standalone Envoy startup failures cleanly instead of hanging or printing an unhelpful trace, making local dev and CI loops much faster to diagnose.
  • Bedrock Titan embeddings dataplane route — Restored the Envoy route for Titan embeddings in dataplane tests so Titan workloads exercise the full pipeline.
  • Hardened bearer token parsing — Malformed Authorization: Bearer headers used to panic the MCP subject extractor; they now return a clean auth failure and the request falls through to the standard auth failure path.
  • Request context propagation in PostTranslateModify — Kubernetes client calls inside PostTranslateModify now honor request cancellation and deadlines, reducing stuck reconciles when the parent request is canceled.
  • Case-sensitive JSON marshalling and unmarshalling — JSON encoding now consistently honors case, fixing subtle mismatches when round-tripping fields whose names differ only in case (visible previously as occasional 400s on certain provider payloads).
  • Secret rotation propagates to MCPRoute — Updates to a Secret referenced by an MCPBackendRef are now reflected in the live configuration, matching how BackendSecurityPolicy already handled secret updates. Operators rotating MCP backend credentials no longer need to bounce the route.
  • MCP proxy handles compressed Accept-Encoding from upstreams — The MCP proxy now correctly handles compressed Accept-Encoding values from upstream requests, fixing failures when MCP backends advertise gzip or other compression schemes.
  • aigw standalone accepts IP addresses for endpoints — In standalone mode, aigw previously assumed endpoints were hostnames; you can now point OpenAI and MCP env config at IP addresses (e.g. 127.0.0.1), making local dev against loopback addresses work.

📖 Upgrade Guidance

Migrating from filterConfig to GatewayConfig

The filterConfig field on AIGatewayRoute has been removed in v0.6. If you previously configured the external processor (resources, env vars, image overrides) via filterConfig on individual routes, move that configuration to a GatewayConfig resource and reference it from the Gateway.

Before (v0.5):

apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: my-route
spec:
  filterConfig:
    externalProcessor:
      resources:
        requests:
          cpu: "100m"
          memory: "128Mi"

After (v0.6):

apiVersion: aigateway.envoyproxy.io/v1beta1
kind: GatewayConfig
metadata:
  name: my-gateway-config
  namespace: default
spec:
  extProc:
    kubernetes:
      resources:
        requests:
          cpu: "100m"
          memory: "128Mi"
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: ai-gateway
  annotations:
    aigateway.envoyproxy.io/gateway-config: my-gateway-config

Migrating VersionedAPISchema.version to prefix

The deprecated v0.5 behavior of using VersionedAPISchema.version as an endpoint path prefix for OpenAI-schema backends has been removed in v0.6. Use the dedicated prefix field instead.

Before (v0.5):

schema:
  name: OpenAI
  version: /v1beta/openai # legacy: version field overloaded as path prefix

After (v0.6):

schema:
  name: OpenAI
  prefix: /v1beta/openai # explicit prefix field

Adopting v1beta1 APIs

AIGatewayRoute, AIServiceBackend, BackendSecurityPolicy, GatewayConfig, and MCPRoute are now served at aigateway.envoyproxy.io/v1beta1. Existing v1alpha1 manifests continue to apply, but new manifests should target v1beta1 directly:

apiVersion: aigateway.envoyproxy.io/v1beta1
kind: AIGatewayRoute

Switching GCP backends to Workload Identity

If you're running on GKE, drop static service-account keys and let the gateway pick up Application Default Credentials. In your GCP BackendSecurityPolicy, leave credentialsFile and workloadIdentityFederationConfig unset and bind Workload Identity to the controller's service account; no serviceAccountJSON secret is required.
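
A minimal sketch of an ADC-based policy under these assumptions: the project name, region, and targetRefs values are hypothetical, and the gcpCredentials field layout is assumed from the fields named above:

```yaml
# Hedged sketch: GCP backend auth via Application Default Credentials.
# Field names under gcpCredentials are assumptions; names are illustrative.
apiVersion: aigateway.envoyproxy.io/v1beta1
kind: BackendSecurityPolicy
metadata:
  name: gcp-adc
  namespace: default
spec:
  targetRefs:
    - group: aigateway.envoyproxy.io
      kind: AIServiceBackend
      name: vertex-backend             # hypothetical GCP backend
  type: GCPCredentials
  gcpCredentials:
    projectName: my-project            # hypothetical project
    region: us-central1
    # credentialsFile and workloadIdentityFederationConfig are left unset
    # on purpose: the gateway falls back to the ADC chain, which on GKE
    # resolves to Workload Identity bound to the controller's service account.
```

The only cluster-side prerequisite is the standard GKE Workload Identity binding between the Kubernetes service account and a GCP service account.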

📦 Dependency Versions

  • Go 1.26.2 — Updated to Go 1.26.2 to pick up the latest security and performance fixes.
  • Envoy Gateway v1.7.0 — Built on Envoy Gateway v1.7.0 for the newest data plane capabilities and stability fixes.
  • Envoy v1.37 — Leveraging Envoy Proxy v1.37.0 for the latest networking and security features.
  • Gateway API v1.4.1 — Support for Gateway API v1.4.1 specifications.
  • Gateway API Inference Extension v1.0.2 — Continued integration with Gateway API Inference Extension v1.0.2 for stable intelligent endpoint selection.
  • MCP Go SDK 1.4.1 — Updated to modelcontextprotocol/go-sdk v1.4.1 for the latest MCP protocol features and fixes.

🙏 Acknowledgements

We extend our gratitude to all contributors who made this release possible. Special thanks to:

  • The growing community of adopters for their valuable feedback and production insights
  • Everyone who reported bugs, submitted PRs, and participated in design discussions
  • The Envoy Gateway team for their continued collaboration

🔮 What's Next

We're already working on features for future releases:

  • Quota-aware routing — building on the new backend quota policy API to route around rate-limited upstreams automatically
  • Deeper MCP authorization — finer-grained policy across tools, resources, and prompts
  • Expanded provider coverage — additional embeddings, audio, and image generation backends across cloud providers (including Cohere on Bedrock)
  • More efficient large-context handling — continued improvements to streaming, memory use, and tracing for long-context workloads
