github apollographql/router v1.13.0

20 months ago

🚀 Features

Uplink metrics and improved logging (Issue #2769, Issue #2815, Issue #2816)

To support monitoring, observability, and debugging of Uplink-related behaviors (those which occur as part of Managed Federation), the router now emits clearer log messages and new metrics. The new metrics are:

  • apollo_router_uplink_fetch_duration_seconds_bucket: A histogram of durations with the following attributes:

    • url: The URL that was polled
    • query: SupergraphSdl or Entitlement
    • kind: new, unchanged, http_error, uplink_error, or ignored
    • code: The error code, depending on kind
    • error: The error message
  • apollo_router_uplink_fetch_count_total: A gauge that counts the overall successes (status="success") and failures (status="failure") that occur when communicating with Uplink, without taking fallback into account.

⚠️ The very first poll to Uplink cannot capture metrics because it happens so early in the router's lifecycle that telemetry hasn't yet been set up. We consider this a suitable trade-off and don't want to allow perfect to be the enemy of good.

Here's an example of what these new metrics look like from the Prometheus scraping endpoint:

# HELP apollo_router_uplink_fetch_count_total apollo_router_uplink_fetch_count_total
# TYPE apollo_router_uplink_fetch_count_total gauge
apollo_router_uplink_fetch_count_total{query="SupergraphSdl",service_name="apollo-router",status="success"} 1
# HELP apollo_router_uplink_fetch_duration_seconds apollo_router_uplink_fetch_duration_seconds
# TYPE apollo_router_uplink_fetch_duration_seconds histogram
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/",le="0.001"} 0
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/",le="0.005"} 0
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/",le="0.015"} 0
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/",le="0.05"} 0
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/",le="0.1"} 0
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/",le="0.2"} 0
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/",le="0.3"} 0
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/",le="0.4"} 0
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/",le="0.5"} 1
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/",le="1"} 1
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/",le="5"} 1
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/",le="10"} 1
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/",le="+Inf"} 1
apollo_router_uplink_fetch_duration_seconds_sum{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/"} 0.465257131
apollo_router_uplink_fetch_duration_seconds_count{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/"} 1
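
As a rough illustration of how to read the histogram output above, the following Python sketch parses a shortened copy of the sample lines, then derives the mean fetch duration and the smallest bucket that contains the single observation. The helper names (parse_samples, summarize) are illustrative assumptions and are not part of the router or of any Prometheus client library.

```python
import re

# A shortened copy of the sample scrape output shown above.
LABELS = ('kind="unchanged",query="SupergraphSdl",service_name="apollo-router",'
          'url="https://uplink.api.apollographql.com/"')
METRIC = "apollo_router_uplink_fetch_duration_seconds"
SAMPLE = "\n".join([
    "# TYPE %s histogram" % METRIC,
    '%s_bucket{%s,le="0.4"} 0' % (METRIC, LABELS),
    '%s_bucket{%s,le="0.5"} 1' % (METRIC, LABELS),
    '%s_bucket{%s,le="1"} 1' % (METRIC, LABELS),
    '%s_bucket{%s,le="+Inf"} 1' % (METRIC, LABELS),
    "%s_sum{%s} 0.465257131" % (METRIC, LABELS),
    "%s_count{%s} 1" % (METRIC, LABELS),
])

LINE_RE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Parse labelled sample lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # this sketch ignores samples without labels
        labels = dict(LABEL_RE.findall(match.group("labels")))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def summarize(samples, metric=METRIC):
    """Return (smallest non-empty bucket bound, mean duration in seconds)."""
    buckets, total, count = [], None, None
    for name, labels, value in samples:
        if name == metric + "_bucket":
            buckets.append((float(labels["le"]), value))  # float("+Inf") is valid
        elif name == metric + "_sum":
            total = value
        elif name == metric + "_count":
            count = value
    buckets.sort()
    first_hit = next((bound for bound, seen in buckets if seen >= 1), None)
    mean = total / count if count else None
    return first_hit, mean


if __name__ == "__main__":
    print(summarize(parse_samples(SAMPLE)))
```

Because histogram buckets are cumulative, the first bucket with a non-zero count (le="0.5" here) brackets the one recorded fetch, which agrees with the reported sum of roughly 0.465 seconds.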

By @BrynCooke in #2779, #2817, #2819, #2826

🐛 Fixes

Only process Uplink messages that are deemed to be newer (Issue #2794)

Uplink is backed by multiple cloud providers to ensure high availability. However, this means that there will be periods of time where Uplink endpoints do not agree on what the latest data is. They are eventually consistent.

This has not been a problem for most users, as the default mode of operation for the router is to fall back to the secondary Uplink endpoint if the first fails.

The other mode of operation is round-robin, which is triggered only when setting the APOLLO_UPLINK_ENDPOINTS environment variable. In this mode there is a much higher chance that the router will go back and forth between schema versions due to disagreement between the Apollo Uplink servers or any user-provided proxies set in this variable.

This change introduces two fixes:

  1. The Router will only use the fallback strategy. Uplink endpoints are not strongly consistent, and therefore it is better to always poll a primary source of information if available.
  2. Uplink already handled freshness of schema but now also handles entitlement freshness.
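
The two fixes above can be sketched roughly as follows. This is a simplified Python model and not the router's actual implementation: the UplinkResponse shape, its version field (standing in for whatever ordering marker Uplink really provides), and the function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class UplinkResponse:
    # Hypothetical shape: "version" is an assumed, monotonically increasing
    # marker used here only to illustrate a freshness comparison.
    version: int
    payload: str


def poll_with_fallback(endpoints: List[str],
                       fetch: Callable[[str], UplinkResponse]) -> UplinkResponse:
    """Always try the primary endpoint first; move on only when a fetch fails."""
    last_error: Optional[Exception] = None
    for url in endpoints:
        try:
            return fetch(url)
        except Exception as err:  # a sketch; real code would catch narrower errors
            last_error = err
    raise RuntimeError("all Uplink endpoints failed") from last_error


class FreshnessFilter:
    """Accept a response only if it is newer than the last accepted one."""

    def __init__(self) -> None:
        self.latest: Optional[UplinkResponse] = None

    def accept(self, response: UplinkResponse) -> bool:
        if self.latest is not None and response.version <= self.latest.version:
            return False  # stale or duplicate data from a lagging endpoint
        self.latest = response
        return True
```

With this shape, a lagging secondary endpoint can still serve requests when the primary is down, but an older schema or entitlement it returns is ignored rather than applied.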

Note: We advise against using APOLLO_UPLINK_ENDPOINTS to try to cache uplink responses for high availability purposes. Each request to Uplink currently sends state which limits the usefulness of such a cache.

By @BrynCooke in #2803, #2826, #2846

Distributed caching: Don't send Redis' CLIENT SETNAME (PR #2825)

We no longer send the CLIENT SETNAME command to connected Redis servers. This resolves an incompatibility with some Redis-compatible servers, since not all "Redis-compatible" offerings (like Google Memorystore) actually support every Redis command. We didn't actually need this feature; it was just an option that could be enabled on our Redis client. No Router functionality is impacted.

By @Geal in #2825

Support bare top-level __typename when aliased (Issue #2792)

PR #1762 implemented support for the query { __typename } but it did not work properly if the top-level standalone __typename field was aliased. This now works properly.
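
To make the expected behaviour concrete, here is a minimal Python sketch of how a bare top-level __typename selection maps to response keys once aliases are honoured. The function name and the (alias, field) tuple representation are assumptions for illustration only and do not reflect the router's internals.

```python
def resolve_bare_typename(selections, root_type="Query"):
    """Build the response for a query whose only top-level fields are __typename.

    selections is a list of (alias, field_name) pairs; alias is None when the
    field is not aliased. The response key is the alias when one is present.
    """
    data = {}
    for alias, field_name in selections:
        if field_name != "__typename":
            raise ValueError("this sketch only handles bare __typename fields")
        data[alias or field_name] = root_type
    return {"data": data}


# query { __typename }
print(resolve_bare_typename([(None, "__typename")]))
# query { myAlias: __typename }
print(resolve_bare_typename([("myAlias", "__typename")]))
```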

By @glasser in #2791

Maintain errors set on _entities (Issue #2731)

In their responses, some subgraph implementations do not return errors per entity but instead on the entire path. We now transmit those errors regardless.

By @Geal in #2756

📃 Configuration

Custom OpenTelemetry Datadog exporter mapping (Issue #2228)

This PR fixes the issue with the Datadog exporter not providing meaningful contextual data in the Datadog traces.
There is a known issue where OpenTelemetry is not fully compatible with Datadog.

To fix this, the opentelemetry-datadog crate added custom mapping functions.

Now, when enable_span_mapping is set to true, the Apollo Router will perform the following mapping:

  1. Use the OpenTelemetry span name to set the Datadog span operation name.
  2. Use the OpenTelemetry span attributes to set the Datadog span resource name.

For example:

Let's say we send a query named MyQuery to the Apollo Router. Using the operation's query plan, the Router then sends a query to my-subgraph-name, producing the following trace:

    | apollo_router request                                                                 |
        | apollo_router router                                                              |
            | apollo_router supergraph                                                      |
            | apollo_router query_planning  | apollo_router execution                       |
                                                | apollo_router fetch                       |
                                                    | apollo_router subgraph                |
                                                        | apollo_router subgraph_request    |

As you can see, there is no clear information about the name of the query, the name of the subgraph, or the name of the query sent to the subgraph.

Instead, with this new enable_span_mapping setting set to true, the following trace will be created:

    | request /graphql                                                                                   |
        | router                                                                                         |
            | supergraph MyQuery                                                                         |
                | query_planning MyQuery  | execution                                                    |
                                              | fetch fetch                                              |
                                                  | subgraph my-subgraph-name                            |
                                                      | subgraph_request MyQuery__my-subgraph-name__0    |

All this logic is gated behind the configuration enable_span_mapping which, if set to true, will take the values from the span attributes.
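
The mapping described above can be modelled in a few lines of Python. This is only a sketch: the attribute keys in RESOURCE_ATTRIBUTE_BY_SPAN are assumptions chosen to reproduce the example trace, and the fallback to the span name when an attribute is missing is also an assumption rather than confirmed exporter behaviour.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Assumed lookup table: which span attribute supplies the Datadog resource
# name for each span. These keys are illustrative, not confirmed.
RESOURCE_ATTRIBUTE_BY_SPAN = {
    "request": "http.route",
    "supergraph": "graphql.operation.name",
    "query_planning": "graphql.operation.name",
    "subgraph": "apollo.subgraph.name",
    "subgraph_request": "graphql.operation.name",
}


@dataclass
class OtelSpan:
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)


def map_span(span: OtelSpan) -> Tuple[str, str]:
    """Return (operation_name, resource_name) for a span with mapping enabled."""
    operation_name = span.name  # rule 1: span name becomes the operation name
    attribute_key = RESOURCE_ATTRIBUTE_BY_SPAN.get(span.name)
    # rule 2: a span attribute becomes the resource name; fall back to the
    # span name when no attribute is available (an assumption of this sketch)
    resource_name = span.attributes.get(attribute_key, span.name)
    return operation_name, resource_name


if __name__ == "__main__":
    spans = [
        OtelSpan("supergraph", {"graphql.operation.name": "MyQuery"}),
        OtelSpan("subgraph", {"apollo.subgraph.name": "my-subgraph-name"}),
        OtelSpan("fetch"),
    ]
    for span in spans:
        print(map_span(span))
```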

By @samuelAndalon in #2790

🛠 Maintenance

Migrate xtask CLI parsing from StructOpt to Clap (Issue #2807)

As an internal improvement to our tooling, we've migrated our xtask toolset from StructOpt to Clap, since StructOpt is in maintenance mode.

By @BrynCooke in #2808

Subgraph configuration override (Issue #2426)

We've introduced a new generic wrapper type for subgraph-level configuration, with the following behaviour:

  • If there's a config in all, it applies to all subgraphs. If it is not there, the default values apply.
  • If there's a config in subgraphs for a specific named subgraph:
    • the fields it specifies override the fields specified in all
    • the fields it does not specify use the values provided by all, or default values, if applicable
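
The precedence rules above amount to a layered merge: defaults first, then all, then the named subgraph entry. The Python sketch below illustrates this with a shallow, field-level merge; the function name and the timeout and compression fields are hypothetical and are used only to show the ordering.

```python
from typing import Any, Dict, Optional


def effective_subgraph_config(defaults: Dict[str, Any],
                              all_config: Optional[Dict[str, Any]],
                              subgraphs: Optional[Dict[str, Dict[str, Any]]],
                              name: str) -> Dict[str, Any]:
    """Resolve the configuration that applies to the subgraph called name."""
    merged = dict(defaults)                          # lowest precedence
    merged.update(all_config or {})                  # "all" overrides defaults
    merged.update((subgraphs or {}).get(name, {}))   # per-subgraph fields win
    return merged


if __name__ == "__main__":
    defaults = {"timeout": 30, "compression": None}       # hypothetical fields
    all_config = {"timeout": 10}
    subgraphs = {"products": {"compression": "gzip"}}
    print(effective_subgraph_config(defaults, all_config, subgraphs, "products"))
    print(effective_subgraph_config(defaults, all_config, subgraphs, "reviews"))
```

Here products keeps the timeout from all while overriding compression, and reviews, which has no entry of its own, receives the all values on top of the defaults.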

By @Geal in #2453

Add integration tests for Uplink URLs (Issue #2827)

We've added integration tests to ensure that all Uplink URLs can be contacted and data can be retrieved in an expected format.

We've also changed our URLs to align exactly with Gateway, to simplify our own documentation. Existing Router users do not need to take any action as we support both on our infrastructure.

By @BrynCooke in #2830, #2834

Improve integration test harness (Issue #2809)

Our internal integration test harness has been simplified.

By @BrynCooke in #2810

Use kubeconform to validate the Router's Helm manifest (Issue #1914)

We've had a couple of cases where errors were inadvertently introduced into our Helm charts, each of which required a follow-up fix. So far, we've been relying on manual testing and inspection, but we've reached the point where automation is desired. This change uses kubeconform to ensure that the YAML generated by our Helm manifest is indeed valid. Errors may still be possible, but this should at least prevent basic errors from occurring. This information will be surfaced in our CI checks.

By @garypen in #2835

📚 Documentation

Re-point links going via redirect to their true sources

Some of our documentation links were pointing to pages which have been renamed during routine documentation updates. While the links were not broken (the former links redirected to the new URLs), we've updated them to avoid the extra hop.

By @o0Ignition0o in #2780

Fix coprocessor docs about subgraph URI mutability

The subgraph URI is (and always has been) mutable when responding to the SubgraphRequest stage in a coprocessor.
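
As a hedged illustration, the Python sketch below shows the core of a coprocessor handler that rewrites the subgraph URI for selected subgraphs. The payload field names (stage, serviceName, uri) are assumptions based on the coprocessor protocol and should be checked against the coprocessor documentation; the handler name and the overrides mapping are hypothetical.

```python
def handle_subgraph_request(payload: dict, overrides: dict) -> dict:
    """Return the coprocessor response for a SubgraphRequest payload.

    overrides maps a subgraph name to the URI it should be sent to instead.
    Payloads for other stages are returned unchanged.
    """
    if payload.get("stage") != "SubgraphRequest":
        return payload
    response = dict(payload)  # copy so the incoming payload is not mutated
    new_uri = overrides.get(payload.get("serviceName"))
    if new_uri is not None:
        response["uri"] = new_uri  # the URI is mutable at this stage
    return response


if __name__ == "__main__":
    incoming = {
        "version": 1,
        "stage": "SubgraphRequest",
        "control": "continue",
        "serviceName": "products",
        "uri": "http://products.internal/graphql",
    }
    print(handle_subgraph_request(incoming, {"products": "http://products-canary.internal/graphql"}))
```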

By @lennyburdette in #2801
