Opstrace v2021.08.13

The full set of commits compared to the last release (v2021.07.23) is listed here.

What's new

  • We integrated the opstrace/cortex-operator to manage the lifecycle of Cortex components in Opstrace. For now this is not expected to result in any user-facing behavioral change. However, it paves the way for architectural consolidation and improved robustness.
  • Opstrace now comes with the Loki query frontend (#1140). From the Loki documentation: "One of the most important functions of the query frontend is the ability to split larger queries into smaller ones, execute them in parallel, and stitch the results back together."
  • The CLI now has an info command to inspect the version information of the various components an Opstrace instance is composed of. Thanks to Eric Stroczynski for the contribution (#1047).
  • The Kubernetes Log Integration now supports container logs in both the CRI/containerd format and the dockerd format. A radio button was added to the UI for you to choose between them (#1141).
  • The UI now shows a dashboard for pod metrics from the Kubernetes Metrics Integration (#1077).
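
The split, execute in parallel, and stitch idea quoted above from the Loki documentation can be illustrated with a small sketch. This is not Loki's implementation: the `run` callback, the fixed `step`, and the `(timestamp, line)` entry shape are assumptions made for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Entry = Tuple[int, str]  # (timestamp in seconds, log line); hypothetical shape


def split_range(start: int, end: int, step: int) -> List[Tuple[int, int]]:
    """Split the half-open range [start, end) into sub-ranges of at most `step` seconds."""
    if step <= 0:
        raise ValueError("step must be positive")
    return [(s, min(s + step, end)) for s in range(start, end, step)]


def split_query(
    run: Callable[[int, int], List[Entry]], start: int, end: int, step: int
) -> List[Entry]:
    """Run `run` on each sub-range in parallel and stitch the results in time order."""
    ranges = split_range(start, end, step)
    if not ranges:
        return []
    with ThreadPoolExecutor(max_workers=min(8, len(ranges))) as pool:
        # map() yields results in submission order, so sub-ranges stay chronological.
        parts = list(pool.map(lambda r: run(*r), ranges))
    return [entry for part in parts for entry in part]
```

Here `run` stands in for whatever executes one sub-query against the backend. Because the sub-ranges do not overlap and are concatenated in order, the stitched result matches what a single large query would return, provided each sub-query returns its own entries sorted by time.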

Component version bumps

Security fixes

  • We fixed a vulnerability in the tenant API authenticator that allowed cross-tenant data writes. Exploiting it required holding valid authentication proof for one of the tenants (#1144).
  • We fixed a vulnerability in the UI login where user information was consumed from a non-trustworthy part of the login HTTP request, instead of consuming it from a cryptographically signed artifact.

Fixed and improved

CLI:

  • opstrace create
    • Error handling around GCP service connection creation (between the Opstrace instance VPC and a Cloud SQL instance) was improved. This is to gain insight into how exactly the service connection creation may fail, and for better retrying (#1197).
    • For AWS, we added support for the ap-northeast-3 region and removed support for the cn-* regions because of instabilities and API discrepancies. Please chime in on #1202 if you have opinions on this topic.
  • opstrace destroy
    • GCP: we addressed a problem where DNS managed zones were not properly deleted (#1198).
  • opstrace upgrade
    • The command now exits early when the current and the new versions match (#1225).

UI:

  • Login:
    • The "Access Denied" page is now only shown when there is an actual lack of privilege. Previously, this page was erroneously shown for authentication issues, for transient issues of various kinds, and for internal server errors (#1272).
    • A new "Login Error" view was added for all login errors that are not related to a lack of privilege (#1272).
    • Client-side React state management around login was consolidated. This is to address a set of problems leading to the display of just a white screen upon or during login, a symptom frequently observed by users (#1115).
    • The server-side login routine was made more robust: the JSON Web Key Set fetcher is now more resilient to transient issues (#1224).
  • Error handling improvements landed for:
    • installing and uninstalling an Integration (#1199, #1138, #1088).
    • performing certain HTTP requests (#1188).
    • displaying the hash ring table (#1149).
  • We added ingest URLs to the Getting Started page.
  • The UI now shows build information.
  • A number of WebSocket setup errors shown in the browser console were fixed.
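
As an illustration of what resilience to transient issues can mean for a fetcher such as the JSON Web Key Set fetcher mentioned under Login, here is a minimal retry-with-backoff sketch. It is not the Opstrace implementation: `fetch_with_retry`, its parameters, and the choice of which exceptions count as transient are all assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_retry(
    fetch: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fetch` until it succeeds, doubling the delay after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch()
        except (ConnectionError, TimeoutError):
            # Assumption: only these error types are treated as transient.
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so that the backoff schedule can be checked in tests without waiting. Errors outside the transient set propagate immediately, which keeps genuine failures visible instead of hiding them behind retries.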

Core:

  • A rare error condition was fixed that could cause the controller to be evicted (#1152).
  • The controller log output was enhanced for better debuggability of stuck deployments (#1099).
  • Fixed a condition under which the controller log output could become huge (#1232).

Documentation:

  • We added a new section to the "Configuring Alertmanager" user guide about using the new unified alerting UI for configuring and managing alerts.

Notable changes

  • New Opstrace installations on GCP now use GKE version 1.19 (#1171). Note that Opstrace uses GKE's STABLE release channel and keeps GKE's default of auto-upgrades enabled for the API server as well as for node pools. If you upgrade from an older Opstrace release to this one, you are probably still using GKE 1.18 under the hood. We have prepared a special upgrade path which keeps the Opstrace system log collection compatible with both GKE 1.18 and 1.19. This is non-trivial because the container log format changed between the two versions (see #1264).
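
To make the log format difference concrete: the Docker json-file driver writes one JSON object per line with `log`, `stream`, and `time` keys, while CRI/containerd writes `<timestamp> <stream> <P|F> <message>`. The sketch below tells the two apart. It is only an illustration and not how the Opstrace log collection handles it; the function name and the returned tuple shape are assumptions.

```python
import json
import re
from typing import Optional, Tuple

# CRI/containerd line: "<RFC 3339 timestamp> <stdout|stderr> <P|F> <message>"
CRI_LINE = re.compile(r"^(\S+) (stdout|stderr) ([PF]) (.*)$", re.DOTALL)


def parse_container_log_line(line: str) -> Optional[Tuple[str, str, str, str]]:
    """Return (format, timestamp, stream, message), or None if unrecognized."""
    line = line.rstrip("\n")
    if line.startswith("{"):
        try:
            record = json.loads(line)
            message = record["log"].rstrip("\n")
            return ("docker", record["time"], record["stream"], message)
        except (ValueError, KeyError, TypeError, AttributeError):
            # Not a valid Docker json-file record; fall through to the CRI check.
            pass
    match = CRI_LINE.match(line)
    if match:
        timestamp, stream, _tag, message = match.groups()
        return ("cri", timestamp, stream, message)
    return None
```

The `P`/`F` tag in the CRI format marks partial versus full lines; this sketch ignores it and does not reassemble partial lines.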

Developer experience and QA

This section does not aim for completeness. Yet, we'd like to point out some significant changes around developer experience and testing.

  • CI now enforces prettier --check to pass for a large fraction of our TypeScript code base.
  • The README for test-remote was overhauled.
  • The README for UI development was improved.
  • VSCode workspace settings were consolidated and better documented.
  • UI TypeScript typings underwent a cleanup. Thanks @MoSattler.
  • We added a bunch of automated tests for the UI, for example around folder deletion.