🚀 Features
Add support for connector header propagation via YAML config (PR #7152)
Added support for connector header propagation via YAML config. All of the existing header propagation in the Router now works for connectors by using
headers.connector.all to apply rules to all connectors or headers.connector.sources.* to apply rules to specific sources.
Note that if one of these rules conflicts with a header set in your schema, either in @connect or @source, the value in your Router config will
take priority and be treated as an override.
headers:
  connector:
    all: # configuration for all connectors across all subgraphs
      request:
        - insert:
            name: "x-inserted-header"
            value: "hello world!"
        - propagate:
            named: "x-client-header"
    sources:
      connector-graph.random_person_api:
        request:
          - insert:
              name: "x-inserted-header"
              value: "hello world!"
          - propagate:
              named: "x-client-header"
By @andrewmcgivery in #7152
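The override rule described for connector headers can be illustrated with a small sketch. This is not the router's implementation: the function name `resolve_connector_headers` and the dictionary shapes are hypothetical, and only the `insert` and `propagate` rules from the example above are modeled.

```python
def resolve_connector_headers(schema_headers, config_rules, client_headers):
    """Merge headers so that router config rules override schema-defined ones.

    schema_headers: headers declared via @connect / @source (assumed shape: dict)
    config_rules:   list of rules shaped like the YAML `request` entries
    client_headers: headers on the incoming client request
    """
    # Header names are case-insensitive, so normalize to lower case.
    resolved = {name.lower(): value for name, value in schema_headers.items()}
    incoming = {name.lower(): value for name, value in client_headers.items()}

    for rule in config_rules:
        if "insert" in rule:
            # A config rule wins over any value set in the schema.
            resolved[rule["insert"]["name"].lower()] = rule["insert"]["value"]
        elif "propagate" in rule:
            name = rule["propagate"]["named"].lower()
            if name in incoming:
                resolved[name] = incoming[name]
    return resolved


rules = [
    {"insert": {"name": "x-inserted-header", "value": "hello world!"}},
    {"propagate": {"named": "x-client-header"}},
]
print(resolve_connector_headers(
    {"x-inserted-header": "from schema", "x-api-key": "k"},
    rules,
    {"X-Client-Header": "abc"},
))
```

In this sketch the schema value for `x-inserted-header` is replaced by the configured one, while headers that no rule touches are left as the schema defined them.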
Enable configuration auto-migration for minor version bumps (PR #7162)
To facilitate configuration evolution within major versions of the router's lifecycles (e.g., within 2.x.x versions), YAML configuration migrations are applied automatically. To avoid configuration drift and facilitate maintenance, when upgrading to a new major version the migrations from the previous major (e.g., 1.x.x) will not be applied automatically. These will need to be applied with router config upgrade prior to the upgrade. To facilitate major version upgrades, we recommend regularly applying the configuration changes using router config upgrade and committing those to your version control system.
Allow expressions in more locations in Connectors URIs (PR #7220)
Previously, we only allowed expressions in very specific locations in Connectors URIs:
- A path segment, like /users/{$args.id}
- A query parameter's value, like /users?id={$args.id}
Expressions can now be used anywhere in or after the path of the URI.
For example, you can do
@connect(http: {GET: "/users?{$args.filterName}={$args.filterValue}"}).
The result of any expression will always be percent encoded.
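As a rough illustration of what percent encoding an expression result means, the following sketch expands a URI template in Python. It is an approximation only: the helper `expand_uri` is hypothetical, it handles only simple `{$args.name}` expressions, and the exact set of characters the router encodes may differ from `urllib.parse.quote`.

```python
import re
from urllib.parse import quote

# Matches simple expressions of the form {$args.someName} (an assumption for this sketch).
EXPRESSION = re.compile(r"\{\$args\.([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_uri(template, args):
    """Replace each {$args.x} with the percent-encoded value of args["x"]."""
    def encode(match):
        # safe="" encodes every reserved character, including "/" and "&".
        return quote(str(args[match.group(1)]), safe="")

    # Text outside the expressions is copied through unchanged.
    return EXPRESSION.sub(encode, template)


print(expand_uri(
    "/users?{$args.filterName}={$args.filterValue}",
    {"filterName": "name", "filterValue": "Ada Lovelace & co"},
))
# prints /users?name=Ada%20Lovelace%20%26%20co
```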
Note: Parts of this feature are only available when composing with Apollo Federation v2.11 or above (currently in preview).
By @dylan-apollo in #7220
Enables reporting of persisted query usage by PQ ID to Apollo (PR #7166)
This change allows the router to report usage metrics by persisted query ID to Apollo, so that we can show usage stats for PQs.
Instrument coprocessor request with http_request span (Issue #6739)
Coprocessor requests now emit an http_request span. This span can help you understand
latency introduced by the network stack when communicating with the coprocessor.
Coprocessor span attributes are:
- otel.kind: CLIENT
- http.request.method: POST
- server.address: <target address>
- server.port: <target port>
- url.full: <url.full>
- otel.name: <method> <url.full>
- otel.original_name: http_request
Enables reporting for client libraries that send the library name and version information in operation requests. (PR #7264)
Apollo client libraries can send the library name and version information in the extensions key of an operation request. If those values are found in a request the router will include them in the telemetry operation report sent to Apollo.
By @calvincestari in #7264
Add compute job pool spans (PR #7236)
The compute job pool in the router is used to execute CPU intensive work outside of the main I/O worker threads, including GraphQL parsing, query planning, and introspection.
This PR adds spans to jobs that are on this pool to allow users to see when latency is introduced due to
resource contention within the compute job pool.
- compute_job:
  - job.type: (query_parsing|query_planning|introspection)
- compute_job.execution:
  - job.age: P1-P8
  - job.type: (query_parsing|query_planning|introspection)
Jobs are executed highest priority (P8) first. Jobs that are low priority (P1) age over time, eventually executing
at highest priority. The age of a job can be used to diagnose whether it was waiting in the queue because of other
higher-priority jobs also in the queue.
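The aging behavior can be pictured with a minimal sketch. The linear aging rule, the `aging_interval_ms` value, and the job dictionary shape are all assumptions made for illustration; the router's actual scheduling policy is not described in these notes.

```python
def effective_priority(base_priority, waited_ms, aging_interval_ms=100):
    """Raise priority by one level per aging interval (assumed rule), capped at P8."""
    return min(8, base_priority + waited_ms // aging_interval_ms)


def next_job(queue, now_ms):
    """Pick the job with the highest effective priority; ties go to the oldest job."""
    return max(
        queue,
        key=lambda job: (
            effective_priority(job["priority"], now_ms - job["enqueued_ms"]),
            -job["enqueued_ms"],
        ),
    )


queue = [
    {"name": "introspection", "priority": 1, "enqueued_ms": 0},
    {"name": "query_planning", "priority": 8, "enqueued_ms": 650},
]
# Early on, the P8 job wins; after enough waiting, the aged P1 job runs first.
print(next_job(queue, now_ms=660)["name"])
print(next_job(queue, now_ms=700)["name"])
```

Under these assumptions, a job whose reported age is much higher than its starting priority has spent a long time waiting behind other work.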
By @BrynCooke in #7236
JWT authorization supports multiple issuers (Issue #6172)
Allow JWT authorization options to support multiple issuers using the same JWKS.
Configuration change: any issuer currently defined on an authentication.router.jwt.jwks entry needs to be
migrated to an entry in the issuers list. This migration will happen automatically until the next major version of the router. The change can be committed using ./router config upgrade prior to the next major release.
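The shape of this migration can be sketched on a configuration that has already been parsed into a Python dictionary. The function `migrate_jwks_issuers` is hypothetical and only mirrors the change described here; it is not what `router config upgrade` runs internally.

```python
import copy


def migrate_jwks_issuers(config):
    """Return a copy of config where each jwks `issuer` is moved into `issuers`."""
    migrated = copy.deepcopy(config)
    jwks_entries = (
        migrated.get("authentication", {})
        .get("router", {})
        .get("jwt", {})
        .get("jwks", [])
    )
    for entry in jwks_entries:
        if "issuer" in entry:
            issuer = entry.pop("issuer")
            issuers = entry.setdefault("issuers", [])
            if issuer not in issuers:
                issuers.append(issuer)
    return migrated


old = {
    "authentication": {"router": {"jwt": {"jwks": [
        {"url": "https://dev-zzp5enui.us.auth0.com/.well-known/jwks.json",
         "issuer": "https://issuer.one"},
    ]}}}
}
print(migrate_jwks_issuers(old))
```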
For example, the following configuration:
authentication:
  router:
    jwt:
      jwks:
        - url: https://dev-zzp5enui.us.auth0.com/.well-known/jwks.json
          issuer: https://issuer.one
Will be changed to contain an array of issuers rather than a single issuer:
authentication:
  router:
    jwt:
      jwks:
        - url: https://dev-zzp5enui.us.auth0.com/.well-known/jwks.json
          issuers:
            - https://issuer.one
            - https://issuer.two
🐛 Fixes
Fix JWT metrics discrepancy (PR #7258)
This fixes the apollo.router.operations.authentication.jwt counter metric to behave as documented: emitted for every request that uses JWT, with the authentication.jwt.failed attribute set to true or false for failed or successful authentication.
Previously, it was only used for failed authentication.
The apollo.router.operations.jwt counter, which has no attributes and was given a different name by accident, is still emitted only for successful authentication, but it is now deprecated.
By @SimonSapin in #7258
Fix potential telemetry deadlock (PR #7142)
The tracing_subscriber crate uses RwLocks to manage access to a Span's Extensions. Deadlocks are possible when
multiple threads access this lock, including with reentrant locks:
// Thread 1              |  // Thread 2
let _rg1 = lock.read();  |
                         |  // will block
                         |  let _wg = lock.write();
// may deadlock          |
let _rg2 = lock.read();  |
This fix removes an opportunity for reentrant locking while extracting a Datadog identifier.
There is also a potential for deadlocks when the root and active spans' Extensions are acquired at the same time, if
multiple threads are attempting to access those Extensions but in a different order. This fix removes a few cases
where multiple spans' Extensions are acquired at the same time.
By @carodewig in #7142
Check if JWT claim is part of the context before getting the JWT expiration with subscriptions (PR #7069)
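In v2.1.0 we introduced logs for the jwt_expires_in function, which caused unexpectedly chatty logging when using subscriptions. The router now checks whether the JWT claim is part of the context before getting the JWT expiration.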
Parse nested input types and report them (PR #6900)
Fixes a bug where enums that were arguments to nested queries were not being reported.
Add compute job pool metrics (PR #7184)
The compute job pool is used within the router for compute intensive jobs that should not block the Tokio worker threads.
When this pool becomes saturated it is difficult for users to see why so that they can take action.
This change adds new metrics to help users understand how long jobs are waiting to be processed.
New metrics:
- apollo.router.compute_jobs.queue_is_full - A counter of requests rejected because the queue was full.
- apollo.router.compute_jobs.duration - A histogram of time spent in the compute pipeline by the job, including the queue and query planning.
  - job.type: (query_planning, query_parsing, introspection)
  - job.outcome: (executed_ok, executed_error, channel_error, rejected_queue_full, abandoned)
- apollo.router.compute_jobs.queue.wait.duration - A histogram of time spent in the compute queue by the job.
  - job.type: (query_planning, query_parsing, introspection)
- apollo.router.compute_jobs.execution.duration - A histogram of time spent to execute the job (excludes time spent in the queue).
  - job.type: (query_planning, query_parsing, introspection)
- apollo.router.compute_jobs.active_jobs - A gauge of the number of compute jobs being processed in parallel.
  - job.type: (query_planning, query_parsing, introspection)
By @carodewig in #7184
Preserve trailing slashes in Connectors URIs (PR #7220)
Previously, a URI like @connect(http: {GET: "/users/"}) could be normalized to @connect(http: {GET: "/users"}). This
change preserves the trailing slash, which is significant to some web servers.
By @dylan-apollo in #7220
Support @context/@fromContext when using Connectors (PR #7132)
This fixes a bug that dropped the @context and @fromContext directives when introducing a connector.
By @lennyburdette in #7132
telemetry: correctly apply conditions on events (PR #7325)
Fixed an issue where conditional telemetry events weren't being properly evaluated.
This affected both standard events (response, error) and custom telemetry events.
For example, with a configuration like this:
telemetry:
  instrumentation:
    events:
      supergraph:
        request:
          level: info
          condition:
            eq:
              - request_header: apollo-router-log-request
              - testing
        response:
          level: info
          condition:
            eq:
              - request_header: apollo-router-log-request
              - testing
The Router would emit the request event when the header matched, but never emit the response event - even with the same matching header.
This fix ensures that all event conditions are properly evaluated, restoring expected telemetry behavior and making conditional logging work correctly throughout the entire request lifecycle.
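The expected behavior can be sketched as follows: the same `eq` condition on a request header is evaluated for every configured stage, so matching stages are all emitted. The helper names and the dictionary form of the configuration are hypothetical, and only the `eq` / `request_header` combination from the example is modeled.

```python
def resolve(operand, request_headers):
    """Resolve an operand: either a request header lookup or a literal value."""
    if isinstance(operand, dict) and "request_header" in operand:
        return request_headers.get(operand["request_header"].lower())
    return operand


def condition_matches(condition, request_headers):
    """Evaluate an `eq` condition with exactly two operands."""
    left, right = condition["eq"]
    return resolve(left, request_headers) == resolve(right, request_headers)


def emitted_events(events_config, request_headers):
    """Return the stages whose event should be emitted for this request."""
    headers = {name.lower(): value for name, value in request_headers.items()}
    return [
        stage
        for stage, settings in events_config.items()
        if "condition" not in settings or condition_matches(settings["condition"], headers)
    ]


condition = {"eq": [{"request_header": "apollo-router-log-request"}, "testing"]}
supergraph_events = {
    "request": {"level": "info", "condition": condition},
    "response": {"level": "info", "condition": condition},
}
print(emitted_events(supergraph_events, {"Apollo-Router-Log-Request": "testing"}))
```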
By @IvanGoncharov in #7325
Connection shutdown timeout 1.x (PR #7058)
When a connection is closed, we call graceful_shutdown on hyper and then wait for the connection to close.
Hyper 0.x has various issues around shutdown that may result in us waiting for extended periods for the connection to eventually be closed.
This PR introduces a configurable timeout from the termination signal to actual termination, defaulted to 60 seconds. The connection is forcibly terminated after the timeout is reached.
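The pattern of waiting for a graceful close and then forcing termination can be sketched with asyncio. This is an analogy in Python rather than the router's Rust code, and the callables `graceful_close` and `force_close` are hypothetical stand-ins for the connection's shutdown operations.

```python
import asyncio


async def shutdown_connection(graceful_close, force_close, timeout_s):
    """Wait for a graceful close; force termination once the timeout elapses."""
    try:
        await asyncio.wait_for(graceful_close(), timeout=timeout_s)
        return "graceful"
    except asyncio.TimeoutError:
        # The peer did not finish closing in time, so stop waiting.
        force_close()
        return "forced"


async def slow_close():
    # Simulates a connection that takes too long to close.
    await asyncio.sleep(1)


forced = []
result = asyncio.run(shutdown_connection(slow_close, lambda: forced.append(True), 0.05))
print(result, forced)
```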
To configure, set the option in the router's YAML configuration. It accepts human-readable time durations:
supergraph:
  connection_shutdown_timeout: 60s
Note that even after connections have been terminated, the router will still hold onto request pipelines unless early_cancel is set to true, because it keeps trying to complete the in-flight requests.
Users can either set early_cancel to true:
supergraph:
  early_cancel: true
and/or use traffic shaping timeouts:
traffic_shaping:
  router:
    timeout: 60s
By @BrynCooke in #7058
Clarify tracing error messages in coprocessor's stages (PR #6791)
Trace messages in coprocessors previously used the external extensibility namespace. They now use coprocessor in the message instead, for clarity.
Fix crash when an invalid query plan is generated (PR #7214)
When an invalid query plan is generated, the router could panic and crash.
This could happen if there are gaps in the GraphQL validation implementation.
Now, even if there are unresolved gaps, the router will handle it gracefully and reject the request.
By @goto-bus-stop in #7214
Fix Apollo request metadata generation for errors (PR #7021)
- Fixes the Apollo operation ID and name generated for requests that fail due to parse, validation, or invalid operation name errors.
- Updates the error code generated for operations with an invalid operation name from GRAPHQL_VALIDATION_FAILED to GRAPHQL_UNKNOWN_OPERATION_NAME
Enable Integer Error Code Reporting (PR #7226)
Fixes an issue where numeric error codes (e.g. 400, 500) were not properly parsed into a string and thus were not
reported to Apollo error telemetry.
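A minimal sketch of the idea, assuming the code lives under `extensions.code` of a GraphQL error: numeric codes are converted to strings so they are reported rather than dropped. The function name and the handling of other value types are assumptions, not the router's behavior.

```python
def error_code_for_reporting(error):
    """Return the error's extension code as a string, or None if unusable."""
    code = error.get("extensions", {}).get("code")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if code is None or isinstance(code, bool):
        return None
    if isinstance(code, (int, str)):
        return str(code)
    return None


print(error_code_for_reporting({"message": "bad request", "extensions": {"code": 400}}))
print(error_code_for_reporting({"message": "invalid", "extensions": {"code": "GRAPHQL_VALIDATION_FAILED"}}))
```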
By @rregitsky in #7226
Increase compute job pool queue size (PR #7205)
The compute job pool in the router is used to execute CPU intensive work outside of the main I/O worker threads, including GraphQL parsing, query planning, and introspection. When the pool is busy, jobs enter a queue.
We previously set this queue size to 20 (per thread). However, this may be too small on resource constrained environments.
This patch increases the queue size to 1,000 jobs per thread. For reference, in older router versions before the introduction of the compute job worker pool, the equivalent queue size was 1,000.
By @goto-bus-stop in #7205
Relax percent encoding for Connectors (PR #7220)
Characters outside of { } expressions will no longer be percent encoded unless they are completely invalid for a
URI. For example, in a URI like @connect(http: {GET: "/products?filters[category]={$args.category}"}) the
square brackets [ ] will no longer be percent encoded. Any string produced by a dynamic { } expression will still be percent encoded.
By @dylan-apollo in #7220
Preserve data: null when handling coprocessor GraphQL responses which included errors (PR #7141)
Previously, the Router incorrectly swallowed data: null on GraphQL responses returned from a coprocessor.
According to the GraphQL specification:
If an error was raised during the execution that prevented a valid response, the "data" entry in the response should be null.
That means if a coprocessor returned a valid execution error, for example:
{
  "data": null,
  "errors": [{ "message": "Some execution error" }]
}
It was incorrect (and inadvertent) to return the following response to the client:
{
  "errors": [{ "message": "Some execution error" }]
}
This fix ensures compliance with the GraphQL specification in this regard by preserving the complete structure of the response returned from coprocessors.
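The distinction at play is between a `data` key that is absent and one that is present with a null value. The sketch below shows one way to preserve that distinction when copying a response body; the function name is hypothetical and this is not the router's code.

```python
import json


def apply_coprocessor_body(body):
    """Copy `data` and `errors` from a coprocessor body, keeping an explicit null."""
    response = {}
    # Test for key presence, not truthiness: `"data": None` must be kept.
    # A check such as `if body.get("data") is not None` would drop it.
    if "data" in body:
        response["data"] = body["data"]
    if "errors" in body:
        response["errors"] = body["errors"]
    return response


body = json.loads('{"data": null, "errors": [{"message": "Some execution error"}]}')
print(json.dumps(apply_coprocessor_body(body)))
```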
Contributed by @IvanGoncharov in #7141
Helm: Correct default telemetry resource property in ConfigMap (Issue #6104)
The Helm chart was using an outdated value when emitting the telemetry.exporters.metrics.common.resource.service.name values. This has been updated to use the correct (singular) version of resource (rather than the incorrect resources which was used earlier in 1.x's life-cycle).
By @vatsalpatel in #6105
Update Dockerfile exec script to use #!/bin/bash instead of #!/usr/bin/env bash (Issue #3517)
For users of Google Cloud Platform (GCP) Cloud Run, using the router's default Docker image was not possible due to an error that would occur during startup:
"/usr/bin/env: 'bash ': No such file or directory"
To avoid this issue, we've changed the script to use #!/bin/bash instead of #!/usr/bin/env bash, as we use a fixed Linux distribution in Docker which has the Bash binary located in a fixed location.
Remove "setting resource attributes is not allowed" warning (PR #7272)
If Uplink was enabled, Router 2.1.x emitted this warning at startup even when there was no user configuration responsible for the condition:
WARN setting resource attributes is not allowed for Apollo telemetry
The warning is removed entirely.
By @SimonSapin in #7272
📃 Configuration
Customization of "header read timeout" (PR #7262)
This change exposes the server's header read timeout as the server.http.header_read_timeout configuration option.
By default, server.http.header_read_timeout is set to the previously hard-coded 10 seconds. A longer timeout can be configured using the server.http.header_read_timeout option.
server:
  http:
    header_read_timeout: 30s
By @gwardwell in #7262
Fine-grained control over include_subgraph_errors (Issue #6402)
Update include_subgraph_errors with additional configuration options for both global and subgraph levels. This update provides finer control over error messages and extension keys for each subgraph.
For more details, please read subgraph error inclusion.
include_subgraph_errors:
  all:
    redact_message: true
    allow_extensions_keys:
      - code
  subgraphs:
    product:
      redact_message: false # Propagate original error messages
      allow_extensions_keys: # Extend global allow list - `code` and `reason` will be propagated
        - reason
      exclude_global_keys: # Exclude `code` from global allow list - only `reason` will be propagated.
        - code
    account:
      deny_extensions_keys: # Overrides global allow list
        - classification
    review: false # Redact everything.
    # Undefined subgraphs inherit default global settings from `all`
Note: Using a deny_extensions_keys approach carries security risks because any sensitive information not explicitly included in the deny list will be exposed to clients. For better security, prefer redacting everything or using allow_extensions_keys for each subgraph when possible.
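How these options combine can be approximated with a short sketch. It models only the forms shown in the example (a global allow list, per-subgraph `allow_extensions_keys`, `exclude_global_keys`, `deny_extensions_keys`, and `false`); the function names and the placeholder text in `REDACTED_MESSAGE` are assumptions, and the router's real logic may differ in edge cases.

```python
REDACTED_MESSAGE = "Subgraph errors redacted"  # assumed placeholder text


def effective_settings(config, subgraph):
    """Combine the global `all` settings with a subgraph's override."""
    base = config.get("all", {})
    override = config.get("subgraphs", {}).get(subgraph, {})
    if override is False:
        # `false` redacts everything for this subgraph.
        return {"redact_message": True, "allow": set()}
    settings = {
        "redact_message": override.get("redact_message", base.get("redact_message", True))
    }
    if "deny_extensions_keys" in override:
        # A deny list replaces the global allow list for this subgraph.
        settings["deny"] = set(override["deny_extensions_keys"])
    else:
        inherited = set(base.get("allow_extensions_keys", []))
        inherited -= set(override.get("exclude_global_keys", []))
        settings["allow"] = inherited | set(override.get("allow_extensions_keys", []))
    return settings


def filter_error(error, settings):
    """Apply message redaction and extension key filtering to one error."""
    extensions = error.get("extensions", {})
    if "deny" in settings:
        kept = {k: v for k, v in extensions.items() if k not in settings["deny"]}
    else:
        kept = {k: v for k, v in extensions.items() if k in settings["allow"]}
    message = REDACTED_MESSAGE if settings["redact_message"] else error["message"]
    return {"message": message, "extensions": kept}


config = {
    "all": {"redact_message": True, "allow_extensions_keys": ["code"]},
    "subgraphs": {
        "product": {
            "redact_message": False,
            "allow_extensions_keys": ["reason"],
            "exclude_global_keys": ["code"],
        },
        "account": {"deny_extensions_keys": ["classification"]},
        "review": False,
    },
}
error = {"message": "boom", "extensions": {"code": "X", "reason": "Y", "classification": "Z"}}
for name in ("product", "account", "review", "inventory"):
    print(name, filter_error(error, effective_settings(config, name)))
```

In this model the `account` subgraph exposes every extension key except `classification`, which illustrates why the deny list approach is the riskier choice.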
By @Samjin and @BrynCooke in #7164
Add new configurable delivery pathway for high cardinality GraphOS Studio metrics (PR #7138)
This change provides a secondary pathway for new "realtime" GraphOS Studio metrics whose delivery interval is configurable due to their higher cardinality. These metrics will respect telemetry.apollo.batch_processor.scheduled_delay as configured on the realtime path. All other Apollo metrics will maintain the previous hardcoded 60s send interval.
By @rregitsky and @timbotnik in #7138
📚 Documentation
GraphQL error codes that can occur during router execution (PR #7160)
Added documentation for more GraphQL error codes that can occur during router execution, including better differentiation between HTTP status codes and GraphQL error extensions codes.
By @timbotnik in #7160
Update API Gateway tech note (PR #7261)
Updated the Router vs Gateway Tech Note with more details now that we have connectors.
Extended errors preview configuration (PR #7038)
We've introduced documentation for GraphOS extended error reporting.
By @timbotnik in #7038
Add tip about Apollo-Expose-Query-Plan: dry-run to Cache warm-up (PR #6973)
The Cache warm-up documentation now flags the availability of the Apollo-Expose-Query-Plan: dry-run header.