cortexproject/cortex v1.3.0-rc.0 on GitHub

This Cortex release features 125 contributions from 37 different authors. It's yet another great milestone we have reached thanks to the amazing support from our community ❤️ Thanks!

Highlights:

The blocks storage is getting closer to production readiness. In this release we've done several fixes and improvements. In particular, you should be aware of:
- Some CLI flags and YAML config options have been renamed
- The store-gateway service is now mandatory when running the blocks storage
- Introduced support for a live cluster migration from chunks to blocks (and rollback)
- Introduced support to flush blocks on-demand from ingesters
The ruler and alertmanager got several improvements, including but not limited to:
- The ruler now runs in the single binary when Cortex gets started with -target=all
- Introduced new config options to fine-tune the ruler
- Introduced support to load locally stored rules (eg. loaded via Kubernetes config map)
- Multiple alertmanager URLs can now be specified in the ruler; each URL is treated as a separate alertmanager group
- Alertmanager configuration can be persisted to object storage via API
Other changes worth noting:
- Added optional snappy compression support to internal gRPC connections
- Starting from this release we're going to publish .rpm and .deb packages too

Please refer to the changelog below for the full list of changes and improvements.

Changelog

[CHANGE] Replace the metric cortex_alertmanager_configs with cortex_alertmanager_config_invalid exposed by Alertmanager. #2960
[CHANGE] Experimental Delete Series: Change target flag for purger from data-purger to purger. #2777
[CHANGE] Experimental blocks storage: The max concurrent queries against the long-term storage, configured via -experimental.blocks-storage.bucket-store.max-concurrent, is now a limit shared across all tenants and not a per-tenant limit anymore. The default value has changed from 20 to 100 and the following new metrics have been added: #2797
- cortex_bucket_stores_gate_queries_concurrent_max
- cortex_bucket_stores_gate_queries_in_flight
- cortex_bucket_stores_gate_duration_seconds
[CHANGE] Metric cortex_ingester_flush_reasons has been renamed to cortex_ingester_flushing_enqueued_series_total, and new metric cortex_ingester_flushing_dequeued_series_total with outcome label (superset of reason) has been added. #2802, #2818
[CHANGE] Experimental Delete Series: Metric cortex_purger_oldest_pending_delete_request_age_seconds now tracks the age of delete requests from the end of their cancellation period instead of from their creation time. #2806
[CHANGE] Experimental blocks storage: the store-gateway service is required in a Cortex cluster running with the experimental blocks storage. Removed the -experimental.tsdb.store-gateway-enabled CLI flag and store_gateway_enabled YAML config option. The store-gateway is now always enabled when the storage engine is blocks. #2822
[CHANGE] Experimental blocks storage: removed support for -experimental.blocks-storage.bucket-store.max-sample-count flag because the implementation was flawed. To limit the number of samples/chunks processed by a single query you can set -store.query-chunk-limit, which is now supported by the blocks storage too. #2852
[CHANGE] Ingester: Chunks flushed via /flush stay in memory until retention period is reached. This affects cortex_ingester_memory_chunks metric. #2778
[CHANGE] Querier: the error message returned when the query time range exceeds -store.max-query-length has changed from invalid query, length > limit (X > Y) to the query time range exceeds the limit (query length: X, limit: Y). #2826
[CHANGE] Add component label to metrics exposed by chunk, delete and index store clients. #2774
[CHANGE] Querier: when -querier.query-ingesters-within is configured, the time range of the query sent to ingesters is now manipulated to ensure the query start time is not older than 'now - query-ingesters-within'. #2904
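The clamping described in the entry above can be illustrated with a minimal sketch. This is not the Cortex implementation: the function name `ingester_query_start` is hypothetical, and treating a zero duration as "not configured" is an assumption made for the example.

```python
from datetime import datetime, timedelta, timezone


def ingester_query_start(start: datetime, now: datetime,
                         query_ingesters_within: timedelta) -> datetime:
    """Return the start time to use for the query sent to ingesters.

    Hypothetical helper: it only illustrates that the start time is never
    older than now - query_ingesters_within.
    """
    # Assumption: a zero (or negative) duration means the option is not set.
    if query_ingesters_within <= timedelta(0):
        return start
    oldest_allowed = now - query_ingesters_within
    return max(start, oldest_allowed)


now = datetime(2020, 8, 1, 12, 0, tzinfo=timezone.utc)
within = timedelta(hours=12)

# A query starting 24h ago is moved forward to now - 12h.
clamped = ingester_query_start(now - timedelta(hours=24), now, within)
# A query starting 1h ago is left untouched.
unchanged = ingester_query_start(now - timedelta(hours=1), now, within)
```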
[CHANGE] KV: The role label, previously exposed only by the multi KV store client, has been added to the metrics of every KV store client. If the KV store client is not multi, the value of the role label is primary. #2837
[CHANGE] Added the engine label to the metrics exposed by the Prometheus query engine, to distinguish between ruler and querier metrics. #2854
[CHANGE] Added ruler to the single binary when started with -target=all (default). #2854
[CHANGE] Experimental blocks storage: compact head when opening TSDB. This should only affect ingester startup after it was unable to compact head in previous run. #2870
[CHANGE] Metric cortex_overrides_last_reload_successful has been renamed to cortex_runtime_config_last_reload_successful. #2874
[CHANGE] HipChat support has been removed from the alertmanager (because it has been removed from upstream Prometheus too). #2902
[CHANGE] Add constant label name to metric cortex_cache_request_duration_seconds. #2903
[CHANGE] Add user label to metric cortex_query_frontend_queue_length. #2939
[CHANGE] Experimental blocks storage: cleaned up the config and renamed "TSDB" to "blocks storage". #2937
- The storage engine setting value has been changed from tsdb to blocks; this affects -store.engine CLI flag and its respective YAML option.
- The root level YAML config has changed from tsdb to blocks_storage
- The prefix of all CLI flags has changed from -experimental.tsdb. to -experimental.blocks-storage.
- The following settings have been grouped under tsdb property in the YAML config and their CLI flags changed:
  - -experimental.tsdb.dir changed to -experimental.blocks-storage.tsdb.dir
  - -experimental.tsdb.block-ranges-period changed to -experimental.blocks-storage.tsdb.block-ranges-period
  - -experimental.tsdb.retention-period changed to -experimental.blocks-storage.tsdb.retention-period
  - -experimental.tsdb.ship-interval changed to -experimental.blocks-storage.tsdb.ship-interval
  - -experimental.tsdb.ship-concurrency changed to -experimental.blocks-storage.tsdb.ship-concurrency
  - -experimental.tsdb.max-tsdb-opening-concurrency-on-startup changed to -experimental.blocks-storage.tsdb.max-tsdb-opening-concurrency-on-startup
  - -experimental.tsdb.head-compaction-interval changed to -experimental.blocks-storage.tsdb.head-compaction-interval
  - -experimental.tsdb.head-compaction-concurrency changed to -experimental.blocks-storage.tsdb.head-compaction-concurrency
  - -experimental.tsdb.head-compaction-idle-timeout changed to -experimental.blocks-storage.tsdb.head-compaction-idle-timeout
  - -experimental.tsdb.stripe-size changed to -experimental.blocks-storage.tsdb.stripe-size
  - -experimental.tsdb.wal-compression-enabled changed to -experimental.blocks-storage.tsdb.wal-compression-enabled
  - -experimental.tsdb.flush-blocks-on-shutdown changed to -experimental.blocks-storage.tsdb.flush-blocks-on-shutdown
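As a rough aid for updating deployment manifests, the renames listed above can be expressed as a small rewrite function. This is a sketch under stated assumptions, not an official migration tool: the helper name `migrate_flag` is hypothetical, it only handles arguments written in the `-flag=value` form, and it does not know about flags that were removed in this release (such as -experimental.tsdb.store-gateway-enabled).

```python
# Settings that, per the list above, moved under the "tsdb" group.
TSDB_GROUPED = {
    "dir",
    "block-ranges-period",
    "retention-period",
    "ship-interval",
    "ship-concurrency",
    "max-tsdb-opening-concurrency-on-startup",
    "head-compaction-interval",
    "head-compaction-concurrency",
    "head-compaction-idle-timeout",
    "stripe-size",
    "wal-compression-enabled",
    "flush-blocks-on-shutdown",
}

OLD_PREFIX = "-experimental.tsdb."
NEW_PREFIX = "-experimental.blocks-storage."


def migrate_flag(arg: str) -> str:
    """Rewrite one '-flag=value' argument to the new naming (hypothetical helper)."""
    name, sep, value = arg.partition("=")
    # The storage engine value changed from tsdb to blocks.
    if name == "-store.engine" and value == "tsdb":
        return "-store.engine=blocks"
    if name.startswith(OLD_PREFIX):
        setting = name[len(OLD_PREFIX):]
        if setting in TSDB_GROUPED:
            # Listed settings gained an extra "tsdb." segment.
            name = NEW_PREFIX + "tsdb." + setting
        else:
            # Everything else only had its prefix replaced.
            name = NEW_PREFIX + setting
    return name + sep + value


old_args = [
    "-target=all",
    "-store.engine=tsdb",
    "-experimental.tsdb.dir=/data/tsdb",
    "-experimental.tsdb.bucket-store.max-concurrent=100",
]
new_args = [migrate_flag(a) for a in old_args]
```

Review the rewritten arguments by hand before deploying, since removed or otherwise changed flags pass through this sketch with only their prefix replaced.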
[CHANGE] Flags -bigtable.grpc-use-gzip-compression, -ingester.client.grpc-use-gzip-compression, -querier.frontend-client.grpc-use-gzip-compression are now deprecated. #2940
[CHANGE] Limit errors reported by ingester during query-time now return HTTP status code 422. #2941
[FEATURE] Introduced ruler.for-outage-tolerance: the max time to tolerate an outage when restoring the "for" state of an alert. #2783
[FEATURE] Introduced ruler.for-grace-period: the minimum duration between an alert and its restored "for" state. This is maintained only for alerts whose configured "for" time is greater than the grace period. #2783
[FEATURE] Introduced ruler.resend-delay: the minimum amount of time to wait before resending an alert to Alertmanager. #2783
[FEATURE] Ruler: added local filesystem support to store rules (read-only). #2854
[ENHANCEMENT] Upgraded Docker base images to alpine:3.12. #2862
[ENHANCEMENT] Experimental: Querier can now optionally query secondary store. This is specified by using -querier.second-store-engine option, with values chunks or blocks. Standard configuration options for this store are used. Additionally, this querying can be configured to happen only for queries that need data older than -querier.use-second-store-before-time. Default value of zero will always query secondary store. #2747
[ENHANCEMENT] Query-tee: increased the cortex_querytee_request_duration_seconds metric buckets granularity. #2799
[ENHANCEMENT] Query-tee: fail to start if the configured -backend.preferred is unknown. #2799
[ENHANCEMENT] Ruler: Added the following metrics: #2786
- cortex_prometheus_notifications_latency_seconds
- cortex_prometheus_notifications_errors_total
- cortex_prometheus_notifications_sent_total
- cortex_prometheus_notifications_dropped_total
- cortex_prometheus_notifications_queue_length
- cortex_prometheus_notifications_queue_capacity
- cortex_prometheus_notifications_alertmanagers_discovered
[ENHANCEMENT] The behavior of the /ready endpoint was changed for the query frontend to indicate when it is ready to accept queries. This is intended for use by a read path load balancer that would want to wait for the frontend to have attached queriers before including it in the backend. #2733
[ENHANCEMENT] Experimental Delete Series: Add support for deletion of chunks for remaining stores. #2801
[ENHANCEMENT] Add -modules command line flag to list possible values for -target. Also, log warning if given target is internal component. #2752
[ENHANCEMENT] Added -ingester.flush-on-shutdown-with-wal-enabled option to enable chunks flushing even when WAL is enabled. #2780
[ENHANCEMENT] Query-tee: Support for custom API prefix by using -server.path-prefix option. #2814
[ENHANCEMENT] Query-tee: Forward X-Scope-OrgId header to backend, if present in the request. #2815
[ENHANCEMENT] Experimental blocks storage: Added -experimental.blocks-storage.tsdb.head-compaction-idle-timeout option to force compaction of data in memory into a block. #2803
[ENHANCEMENT] Experimental blocks storage: Added support for flushing blocks via /flush, /shutdown (previously these only worked for chunks storage) and by using -experimental.blocks-storage.tsdb.flush-blocks-on-shutdown option. #2794
[ENHANCEMENT] Experimental blocks storage: Added support to enforce max query time range length via -store.max-query-length. #2826
[ENHANCEMENT] Experimental blocks storage: Added support to limit the max number of chunks that can be fetched from the long-term storage while executing a query. The limit is enforced both in the querier and store-gateway, and is configurable via -store.query-chunk-limit. #2852 #2922
[ENHANCEMENT] Ingester: Added new metric cortex_ingester_flush_series_in_progress that reports number of ongoing flush-series operations. Useful when calling /flush handler: if cortex_ingester_flush_queue_length + cortex_ingester_flush_series_in_progress is 0, all flushes are finished. #2778
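The completion check described above (both metrics summing to 0) can be sketched against the text returned by an ingester's metrics endpoint. The parser below is a deliberately simple, hypothetical helper rather than a full Prometheus exposition parser. How the metrics text is fetched is left out, and whether these metrics carry labels is not stated here, so the sketch sums all matching samples.

```python
def metric_total(exposition: str, metric: str) -> float:
    """Sum all samples of `metric` found in Prometheus text exposition output."""
    total = 0.0
    found = False
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        # Sample lines look like: name{labels} value [timestamp]
        name = line.split("{", 1)[0].split(None, 1)[0]
        if name != metric:
            continue
        rest = line.rsplit("}", 1)[-1] if "{" in line else line[len(name):]
        total += float(rest.split()[0])
        found = True
    if not found:
        # Refuse to treat a missing metric as zero.
        raise ValueError(f"metric {metric!r} not found")
    return total


def flushes_finished(exposition: str) -> bool:
    """True when the flush queue is empty and no flush-series operation is running."""
    queued = metric_total(exposition, "cortex_ingester_flush_queue_length")
    running = metric_total(exposition, "cortex_ingester_flush_series_in_progress")
    return queued + running == 0


sample = """\
# TYPE cortex_ingester_flush_queue_length gauge
cortex_ingester_flush_queue_length 0
# TYPE cortex_ingester_flush_series_in_progress gauge
cortex_ingester_flush_series_in_progress 2
"""
still_flushing = not flushes_finished(sample)
```

After calling /flush, a script could poll the metrics output and wait until `flushes_finished` returns True.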
[ENHANCEMENT] Memberlist members can join cluster via SRV records. #2788
[ENHANCEMENT] Added configuration options for chunks s3 client. #2831
- s3.endpoint
- s3.region
- s3.access-key-id
- s3.secret-access-key
- s3.insecure
- s3.sse-encryption
- s3.http.idle-conn-timeout
- s3.http.response-header-timeout
- s3.http.insecure-skip-verify
[ENHANCEMENT] Prometheus upgraded. #2798 #2849 #2867 #2902 #2918
- Optimized labels regex matchers for patterns containing literals (eg. foo.*, .*foo, .*foo.*)
[ENHANCEMENT] Add metric cortex_ruler_config_update_failures_total to Ruler to track failures of loading rules files. #2857
[ENHANCEMENT] Experimental Alertmanager: Alertmanager configuration persisted to object storage using an experimental API that accepts and returns YAML-based Alertmanager configuration. #2768
[ENHANCEMENT] Ruler: -ruler.alertmanager-url now supports multiple URLs. Each URL is treated as a separate Alertmanager group. Support for multiple Alertmanagers in a group can be achieved by using DNS service discovery. #2851
[ENHANCEMENT] Experimental blocks storage: Cortex Flusher now works with blocks engine. Flusher needs to be provided with blocks-engine configuration, existing Flusher flags are not used (they are only relevant for chunks engine). Note that flush errors are only reported via log. #2877
[ENHANCEMENT] Flusher: Added -flusher.exit-after-flush option (defaults to true) to control whether Cortex should stop completely after Flusher has finished its work. #2877
[ENHANCEMENT] Added metrics cortex_config_hash and cortex_runtime_config_hash to expose hash of the currently active config file. #2874
[ENHANCEMENT] Logger: added JSON logging support, configured via the -log.format=json CLI flag or its respective YAML config option. #2386
[ENHANCEMENT] Added new flags -bigtable.grpc-compression, -ingester.client.grpc-compression, -querier.frontend-client.grpc-compression to configure compression used by gRPC. Valid values are gzip, snappy, or empty string (no compression, default). #2940
[ENHANCEMENT] Clarify limitations of the /api/v1/series, /api/v1/labels and /api/v1/label/{name}/values endpoints. #2953
[BUGFIX] Fixed a bug with api/v1/query_range where responses with no data would return null values for result and empty values for resultType. #2962
[BUGFIX] Fixed a bug in the index intersect code causing storage to return more chunks/series than required. #2796
[BUGFIX] Fixed the number of reported keys in the background cache queue. #2764
[BUGFIX] Fix race in processing of headers in sharded queries. #2762
[BUGFIX] Query Frontend: Do not re-split sharded requests around ingester boundaries. #2766
[BUGFIX] Experimental Delete Series: Fixed a problem with cache generation numbers prefixed to cache keys. #2800
[BUGFIX] Ingester: Flushing chunks via /flush endpoint could previously lead to panic, if chunks were already flushed before and then removed from memory during the flush caused by /flush handler. Immediate flush now doesn't cause chunks to be flushed again. Samples received during flush triggered via /flush handler are no longer discarded. #2778
[BUGFIX] Prometheus upgraded. #2849
- Fixed unknown symbol error during head compaction
[BUGFIX] Fix panic when using cassandra as store for both index and delete requests. #2774
[BUGFIX] Experimental Delete Series: Fixed a data race in Purger. #2817
[BUGFIX] KV: Fixed a bug that triggered a panic due to metrics being registered with the same name but different labels when using a multi configured KV client. #2837
[BUGFIX] Query-frontend: Fix passing HTTP Host header if -frontend.downstream-url is configured. #2880
[BUGFIX] Ingester: Improve time-series distribution when -experimental.distributor.user-subring-size is enabled. #2887
[BUGFIX] Set content type to application/x-protobuf for remote_read responses. #2915
[BUGFIX] Fixed ruler and store-gateway instance registration in the ring (when sharding is enabled) when a new instance replaces abruptly terminated one, and the only difference between the two instances is the address. #2954
[BUGFIX] Fixed a silent failure caused by missing chunks and index config: the absence of chunks and index from the schema config was not validated. #2732
[BUGFIX] Fix panic caused by KVs from boltdb being used beyond their life. #2971
[BUGFIX] Experimental blocks storage: /api/v1/series, /api/v1/labels and /api/v1/label/{name}/values only query the TSDB head regardless of the configured -experimental.blocks-storage.tsdb.retention-period. #2974
[BUGFIX] Ingester: Avoid indefinite checkpointing in case of surge in number of series. #2955
