🚀 Features
Add ability to transmit un-redacted errors from federated traces to Apollo Studio
When using subgraphs which are enabled with Apollo Federated Tracing, the error messages within those traces will be redacted by default.
New configuration (tracing.apollo.errors.subgraph.all.redact
, which defaults to true
) enables or disables the redaction mechanism. Similar configuration (tracing.apollo.errors.subgraph.all.send
, which also defaults to true
) enables or disables the entire transmission of the error to Studio.
The error messages returned to the clients are not changed or redacted from their previous behavior.
To enable sending subgraph's federated trace error messages to Studio without redaction, you can set the following configuration:
telemetry:
apollo:
errors:
subgraph:
all:
send: true # (true = Send to Studio, false = Do not send; default: true)
redact: false # (true = Redact full error message, false = Do not redact; default: true)
It is also possible to configure this per-subgraph using a subgraphs
map at the same level as all
in the configuration, much like other sections of the configuration which have subgraph-specific capabilities:
telemetry:
apollo:
errors:
subgraph:
all:
send: true
redact: false # Disable redaction as a default. The `accounts` service enables it below.
subgraphs:
accounts: # Applies to the `accounts` subgraph, overriding the `all` global setting.
redact: true # Redact messages from the `accounts` service.
Introduce response.is_primary
Rhai helper for working with deferred responses (Issue #2935) (Issue #2936)
A new Rhai response.is_primary()
helper has been introduced that returns false
when the current chunk being processed is a deferred response chunk. Put another way, it will be false
if the chunk is a follow-up response to the initial primary response, during the fulfillment of a @defer
'd fragment in a larger operation. The initial response will be is_primary() == true
. This aims to provide the right primitives so users can write more defensible error checking. The introduction of this relates to a bug fix noted in the Fixes section below.
Time-based forced hot-reload for "chaos" testing
For testing purposes, the Router can now artificially be forced to hot-reload (as if the configuration or schema had changed) at a configured time interval. This can help reproduce issues like reload-related memory leaks. We don't recommend using this in any production environment. (If you are compelled to use it in production, please let us know about your use case!)
The new configuration section for this "chaos" testing is (and will likely remain) marked as "experimental":
experimental_chaos:
force_hot_reload: 1m
By @SimonSapin in #2988
Provide helpful console output when using "preview" features, just like "experimental" features
This expands on the existing mechanism that was originally introduced in #2242, which supports the notion of an "experimental" feature, and make it compatible with the notion of "preview" features.
When preview or experimental features are used, an INFO
-level log is emitted during startup to notify of which features are used and shows URLs to their GitHub discussions, for feedback. Additionally, router config experimental
and router config preview
CLI sub-commands list all such features in the current Router version, regardless of which are used in a given configuration file.
For more information about launch stages, please see the documentation here: https://www.apollographql.com/docs/resources/product-launch-stages/
By @o0Ignition0o, @abernix, and @SimonSapin in #2960
Report operationCountByType
counts to Apollo Studio (PR #2979)
This adds the ability for Studio to track operation counts broken down by type of operations (e.g., query
vs mutation
). Previously, we only reported total operation count.
🐛 Fixes
Update to Federation v2.4.2
This update to Federation v2.4.2 fixes a potential bug when an @interfaceObject
type has a @requires
. This might be encountered when an @interfaceObject
type has a field with a @requires
and the query requests that field only for some specific implementations of the corresponding interface. In this case, the generated query plan was sometimes invalid and could result in an invalid query to a subgraph. In the case that the subgraph was an Apollo Server implementation, this lead the subgraph producing an "The _entities resolver tried to load an entity for type X, but no object or interface type of that name was found in the schema"
error.
Fix handling of deferred response errors from Rhai scripts (Issue #2935) (Issue #2936)
If a Rhai script was to error while processing a deferred response (i.e., an operation which uses @defer
) the Router was ignoring the error and returning None
in the stream of results. This had two unfortunate aspects:
- the error was not propagated to the client
- the stream was terminated (silently)
With this fix we now capture the error and still propagate the response to the client. This fix also adds support for the is_primary()
method which may be invoked on both supergraph_service()
and execution_service()
responses. It may be used to avoid implementing exception handling for header interactions and to determine if a response is_primary()
(i.e., first) or not.
e.g.:
if response.is_primary() {
print(`all response headers: `);
} else {
print(`don't try to access headers`);
}
vs
try {
print(`all response headers: `);
}
catch(err) {
if err == "cannot access headers on a deferred response" {
print(`don't try to access headers`);
}
}
Note
This is a minimal example for purposes of illustration which doesn't exhaustively check all error conditions. An exception handler should always handle all error conditions.
Fix incorrectly placed "message" in Rhai JSON-formatted logging (Issue #2777)
This fixes a bug where Rhai logging was incorrectly putting the message of the log into the out
attribute, when serialized as JSON. Previously, the message
field was showing rhai_{{level}}
(i.e., rhai_info
), despite there being a separate level
field in the JSON structure.
The impact of this fix can be seen in this example where we call log_info()
in a Rhai script:
log_info("this is info");
Previously, this would result in a log as follows, with the text of the message set within out
, rather than message
.
{"timestamp":"2023-04-19T07:46:15.483358Z","level":"INFO","message":"rhai_info","out":"this is info"}
After the change, the message is correctly within message
. The level continues to be available at level
. We've also additionally added a target
property which shows the file which produced the error:
{"timestamp":"2023-04-19T07:46:15.483358Z","level":"INFO","message":"this is info","target":"src/rhai_logging.rhai"}
Deferred responses now utilize compression, when requested (Issue #1572)
We previously had to disable compression on deferred responses due to an upstream library bug. To fix this, we've replaced tower-http
's CompressionLayer
with a custom stream transformation. This is necessary because tower-http
uses async-compression
under the hood, which buffers data until the end of the stream, analyzes it, then writes it, ensuring a better compression. However, this is wholly-incompatible with a core concept of the multipart protocol for @defer
, which requires chunks to be sent as soon as possible. To support that, we need to compress chunks independently.
This extracts parts of the codec
module of async-compression
, which so far is not public, and makes a streaming wrapper above it that flushes the compressed data on every response within the stream.
Update the h2
dependency to fix a potential Denial-of-Service (DoS) vulnerability
Proactively addresses the advisory in https://rustsec.org/advisories/RUSTSEC-2023-0034, though we have no evidence that suggests it has been exploited on any Router deployment.
Rate limit errors emitted from OpenTelemetry (Issue #2953)
When a batch span exporter is unable to send accept a span because the buffer is full it will emit an error. These errors can be very frequent and could potentially impact performance. To mitigate this, OpenTelemetry errors are now rate limited to one every ten seconds, per error type.
By @BrynCooke in #2954
Improved messaging when a request is received without an operation (Issue #2941)
The message that is displayed when a request has been sent to the Router without an operation has been improved. This materializes as a developer experience improvement since users (especially those using GraphqL for the first time) might send a request to the Router using a tool that isn't GraphQL-aware, or might just have their API tool of choice misconfigured.
Previously, the message stated "missing query string", but now more helpfully suggests sending either a POST or GET request and specifying the desired operation as the query
parameter (i.e., either in the POST data or in the query string parameters for GET queries).
By @kushal-93 in #2955
Traffic shaping configuration fix for global experimental_enable_http2
We've resolved a case where the experimental_enable_http2
feature wouldn't properly apply when configured with a global configuration.
Huge thanks to @westhechiang, @leggomuhgreggo, @vecchp and @davidvasandani for discovering the issue and finding a reproducible testcase!
By @o0Ignition0o in #2976
Limit the memory usage of the apollo
OpenTelemetry exporter (PR #3006)
We've added a new LRU cache in place of a Vec
for sub-span data to avoid keeping all events for a span in memory, since we don't need it for our computations.