🚀 Features
Add ResponseErrors selector to router response (PR #7882)
The ResponseErrors selector in telemetry configurations captures router response errors, enabling you to log errors encountered at the router service layer. This selector enhances logging by allowing you to log only router errors instead of the entire router response body, reducing noise in your telemetry data.
telemetry:
  instrumentation:
    events:
      router:
        router.error:
          attributes:
            "my_attribute":
              response_errors: "$.[0]"
              # Examples: "$.[0].message", "$.[0].locations", "$.[0].extensions", etc.
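The following variant of the configuration records only the message and the extensions of the first router error, as two separate attributes. It uses two of the paths listed in the example's comment. The attribute names error_message and error_extensions are hypothetical, so substitute names that fit your telemetry schema.

```yaml
telemetry:
  instrumentation:
    events:
      router:
        router.error:
          attributes:
            # Hypothetical attribute name: message of the first error only
            "error_message":
              response_errors: "$.[0].message"
            # Hypothetical attribute name: extensions object of the first error
            "error_extensions":
              response_errors: "$.[0].extensions"
```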
By @Aguilarjaf in #7882
🐛 Fixes
_entities Apollo error metrics missing service attribute (PR #8153)
The error counting feature introduced in v2.5.0 caused _entities errors from subgraph fetches to stop reporting a service (subgraph or connector) attribute. As a result, Apollo Studio categorized these errors as originating from the router instead of the service that actually returned them.
The service attribute is now correctly included for _entities errors.
By @rregitsky in #8153
WebSocket connection cleanup for subscriptions (PR #8104)
A regression introduced in v2.5.0 caused WebSocket connections to subgraphs to remain open after all client subscriptions ended. These connections consumed resources unnecessarily and were not cleaned up until a new event was received.
The router now correctly closes WebSocket connections to subgraphs when clients disconnect from subscription streams.
OTLP metrics UpDown counter drift (PR #8174)
When OTLP metrics export was configured with delta temporality, UpDown counters could drift, and their values became inaccurate over time. This happened because UpDown counters were incorrectly exported as deltas instead of cumulative values.
UpDown counters now export as cumulative values, as the OpenTelemetry specification requires.
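The following sketch shows where delta temporality is typically selected. It assumes a temporality option on the OTLP metrics exporter at telemetry.exporters.metrics.otlp. Verify the exact keys against the router's OTLP exporter documentation for your version. With this fix, UpDown counters are exported as cumulative values even when this setting is present.

```yaml
telemetry:
  exporters:
    metrics:
      otlp:
        enabled: true
        # Assumed option name. This is the setting under which UpDown counters drifted.
        # After PR #8174, UpDown counters are still exported as cumulative values.
        temporality: delta
```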
By @BrynCooke in #8174
WebSocket subscription connection_error message handling (Issue #6138)
The router now correctly processes connection_error messages from subgraphs that don't include an id field. Previously, the router ignored these messages because it incorrectly required an id field. According to the graphql-transport-ws specification, connection_error messages only require a payload field.
The id field is now optional for connection_error messages, so underlying error messages propagate to clients when connection failures occur.
By @jeffutter in #8189
Add Helm chart support for deployment annotations (PR #8164)
The Helm chart now supports customizing annotations on the deployment itself using the deploymentAnnotations value. Previously, you could only customize pod annotations with podAnnotations.
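The following minimal values.yaml sketch sets both kinds of annotations. It assumes deploymentAnnotations is a top-level value alongside podAnnotations. The annotation keys and values are hypothetical placeholders.

```yaml
# values.yaml (annotation keys and values below are hypothetical)
deploymentAnnotations:
  # Applied to the Deployment object itself
  example.com/owner: "platform-team"
podAnnotations:
  # Applied to the pods created from the Deployment's pod template
  example.com/scrape: "true"
```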
Uncommon query planning error with interface object types (PR #8109)
An uncommon query planning error has been resolved: "Cannot add selection of field X to selection set of parent type Y that is potentially an interface object type at runtime". The router now handles __typename selections from interface object types correctly, because these selections are benign even when unnecessary.
Connection shutdown race condition during hot reload (PR #8169)
A race condition during hot reload occasionally left connections active instead of terminating them. This has been fixed. Over time, the race could cause out-of-memory errors because multiple pipelines remained active.
Connections that are opening during shutdown now immediately terminate.
By @BrynCooke in #8169
Persisted Query usage reporting for safelisted operation body requests (PR #8168)
Persisted Query metrics now include operations requested by their safelisted operation body. Previously, the router recorded metrics only for operations requested by ID.
📃 Configuration
Separate Apollo telemetry batch processor configurations (PR #8258)
Apollo telemetry configuration now allows separate fine-tuning for metrics and traces batch processors. The configuration has changed from:
telemetry:
  apollo:
    batch_processor:
      scheduled_delay: 5s
      max_export_timeout: 30s
      max_export_batch_size: 512
      max_concurrent_exports: 1
      max_queue_size: 2048
To:
telemetry:
  apollo:
    tracing:
      # Config for Apollo OTLP and Apollo usage report traces
      batch_processor:
        max_export_timeout: 130s
        scheduled_delay: 5s
        max_export_batch_size: 512
        max_concurrent_exports: 1
        max_queue_size: 2048
    metrics:
      # Config for Apollo OTLP metrics.
      otlp:
        batch_processor:
          scheduled_delay: 13s # This does not apply to config gauge metrics, which have a non-configurable scheduled_delay.
          max_export_timeout: 30s
      # Config for Apollo usage report metrics.
      usage_reports:
        batch_processor:
          max_export_timeout: 30s
          scheduled_delay: 5s
          max_queue_size: 2048
If you don't specify these new values, the router uses the old telemetry.apollo.batch_processor configuration. At startup, the router logs the configuration in use in an info-level message.
Promote Subgraph Insights metrics flag to preview (PR #8200)
The subgraph_metrics configuration flag that powers Apollo Studio's Subgraph Insights feature has been promoted from experimental to preview. The flag name has changed from experimental_subgraph_metrics to preview_subgraph_metrics:
telemetry:
  apollo:
    preview_subgraph_metrics: true
By @rregitsky in #8200