This release contains 131 contributions from 28 authors. Thank you!
Highlights
- We have several exciting features become stable: Shuffle-sharding, querying chunks and blocks store simultaneously, lazy mmap-ing of block indexes, etc.
- Several query and ingest performance improvements!
- Tons of bugfixes and optimisations!
Changelog
- [CHANGE] Alertmanager now removes local files after Alertmanager is no longer running for removed or resharded user. #3910
- [CHANGE] Alertmanager now stores local files in per-tenant folders. Files stored by Alertmanager previously are migrated to new hierarchy. Support for this migration will be removed in Cortex 1.11. #3910
- [CHANGE] Ruler: deprecated
-ruler.storage.*
CLI flags (and their respective YAML config options) in favour of-ruler-storage.*
. The deprecated config will be removed in Cortex 1.11. #3945 - [CHANGE] Alertmanager: deprecated
-alertmanager.storage.*
CLI flags (and their respective YAML config options) in favour of-alertmanager-storage.*
. This change doesn't apply to-alertmanager.storage.path
and-alertmanager.storage.retention
. The deprecated config will be removed in Cortex 1.11. #4002 - [CHANGE] Alertmanager: removed
-cluster.
CLI flags deprecated in Cortex 1.7. The new config options to use are: #3946-alertmanager.cluster.listen-address
instead of-cluster.listen-address
-alertmanager.cluster.advertise-address
instead of-cluster.advertise-address
-alertmanager.cluster.peers
instead of-cluster.peer
-alertmanager.cluster.peer-timeout
instead of-cluster.peer-timeout
- [CHANGE] Blocks storage: removed the config option
-blocks-storage.bucket-store.index-cache.postings-compression-enabled
, which was deprecated in Cortex 1.6. Postings compression is always enabled. #4101 - [CHANGE] Querier: removed the config option
-store.max-look-back-period
, which was deprecated in Cortex 1.6 and was used only by the chunks storage. You should use-querier.max-query-lookback
instead. #4101 - [CHANGE] Query Frontend: removed the config option
-querier.compress-http-responses
, which was deprecated in Cortex 1.6. You should use-api.response-compression-enabled
instead. #4101 - [CHANGE] Runtime-config / overrides: removed the config options
-limits.per-user-override-config
(use-runtime-config.file
) and-limits.per-user-override-period
(use-runtime-config.reload-period
), both deprecated since Cortex 0.6.0. #4112 - [CHANGE] Cortex now fails fast on startup if unable to connect to the ring backend. #4068
- [FEATURE] The following features have been marked as stable: #4101
- Shuffle-sharding
- Querier support for querying chunks and blocks store at the same time
- Tracking of active series and exporting them as metrics (
-ingester.active-series-metrics-enabled
and related flags) - Blocks storage: lazy mmap of block indexes in the store-gateway (
-blocks-storage.bucket-store.index-header-lazy-loading-enabled
) - Ingester: close idle TSDB and remove them from local disk (
-blocks-storage.tsdb.close-idle-tsdb-timeout
)
- [FEATURE] Memberlist: add TLS configuration options for the memberlist transport layer used by the gossip KV store. #4046
- New flags added for memberlist communication:
-memberlist.tls-enabled
-memberlist.tls-cert-path
-memberlist.tls-key-path
-memberlist.tls-ca-path
-memberlist.tls-server-name
-memberlist.tls-insecure-skip-verify
- New flags added for memberlist communication:
- [FEATURE] Ruler: added
local
backend support to the ruler storage configuration under the-ruler-storage.
flag prefix. #3932 - [ENHANCEMENT] Upgraded Docker base images to
alpine:3.13
. #4042 - [ENHANCEMENT] Blocks storage: reduce ingester memory by eliminating series reference cache. #3951
- [ENHANCEMENT] Ruler: optimized
<prefix>/api/v1/rules
and<prefix>/api/v1/alerts
when ruler sharding is enabled. #3916 - [ENHANCEMENT] Ruler: added the following metrics when ruler sharding is enabled: #3916
cortex_ruler_clients
cortex_ruler_client_request_duration_seconds
- [ENHANCEMENT] Alertmanager: Add API endpoint to list all tenant alertmanager configs:
GET /multitenant_alertmanager/configs
. #3529 - [ENHANCEMENT] Ruler: Add API endpoint to list all tenant ruler rule groups:
GET /ruler/rule_groups
. #3529 - [ENHANCEMENT] Query-frontend/scheduler: added querier forget delay (
-query-frontend.querier-forget-delay
and-query-scheduler.querier-forget-delay
) to mitigate the blast radius in the event queriers crash because of a repeatedly sent "query of death" when shuffle-sharding is enabled. #3901 - [ENHANCEMENT] Query-frontend: reduced memory allocations when serializing query response. #3964
- [ENHANCEMENT] Querier / ruler: some optimizations to PromQL query engine. #3934 #3989
- [ENHANCEMENT] Ingester: reduce CPU and memory when an high number of errors are returned by the ingester on the write path with the blocks storage. #3969 #3971 #3973
- [ENHANCEMENT] Distributor: reduce CPU and memory when an high number of errors are returned by the distributor on the write path. #3990
- [ENHANCEMENT] Put metric before label value in the "label value too long" error message. #4018
- [ENHANCEMENT] Allow use of
y|w|d
suffixes for duration related limits and per-tenant limits. #4044 - [ENHANCEMENT] Query-frontend: Small optimization on top of PR #3968 to avoid unnecessary Extents merging. #4026
- [ENHANCEMENT] Add a metric
cortex_compactor_compaction_interval_seconds
for the compaction interval config value. #4040 - [ENHANCEMENT] Ingester: added following per-ingester (instance) experimental limits: max number of series in memory (
-ingester.instance-limits.max-series
), max number of users in memory (-ingester.instance-limits.max-tenants
), max ingestion rate (-ingester.instance-limits.max-ingestion-rate
), and max inflight requests (-ingester.instance-limits.max-inflight-push-requests
). These limits are only used when using blocks storage. Limits can also be configured using runtime-config feature, and current values are exported ascortex_ingester_instance_limits
metric. #3992. - [ENHANCEMENT] Cortex is now built with Go 1.16. #4062
- [ENHANCEMENT] Distributor: added per-distributor experimental limits: max number of inflight requests (
-distributor.instance-limits.max-inflight-push-requests
) and max ingestion rate in samples/sec (-distributor.instance-limits.max-ingestion-rate
). If not set, these two are unlimited. Also added metrics to expose current values (cortex_distributor_inflight_push_requests
,cortex_distributor_ingestion_rate_samples_per_second
) as well as limits (cortex_distributor_instance_limits
with variouslimit
label values). #4071 - [ENHANCEMENT] Ruler: Added
-ruler.enabled-tenants
and-ruler.disabled-tenants
to explicitly enable or disable rules processing for specific tenants. #4074 - [ENHANCEMENT] Block Storage Ingester:
/flush
now accepts two new parameters:tenant
to specify tenant to flush andwait=true
to make call synchronous. Multiple tenants can be specified by repeatingtenant
parameter. If notenant
is specified, all tenants are flushed, as before. #4073 - [ENHANCEMENT] Alertmanager: validate configured
-alertmanager.web.external-url
and fail if ends with/
. #4081 - [ENHANCEMENT] Alertmanager: added
-alertmanager.receivers-firewall.block.cidr-networks
and-alertmanager.receivers-firewall.block.private-addresses
to block specific network addresses in HTTP-based Alertmanager receiver integrations. #4085 - [ENHANCEMENT] Allow configuration of Cassandra's host selection policy. #4069
- [ENHANCEMENT] Store-gateway: retry synching blocks if a per-tenant sync fails. #3975 #4088
- [ENHANCEMENT] Add metric
cortex_tcp_connections
exposing the current number of accepted TCP connections. #4099 - [ENHANCEMENT] Querier: Allow federated queries to run concurrently. #4065
- [ENHANCEMENT] Label Values API call now supports
match[]
parameter when querying blocks on storage (assuming-querier.query-store-for-labels-enabled
is enabled). #4133 - [BUGFIX] Ruler-API: fix bug where
/api/v1/rules/<namespace>/<group_name>
endpoint return400
instead of404
. #4013 - [BUGFIX] Distributor: reverted changes done to rate limiting in #3825. #3948
- [BUGFIX] Ingester: Fix race condition when opening and closing tsdb concurrently. #3959
- [BUGFIX] Querier: streamline tracing spans. #3924
- [BUGFIX] Ruler Storage: ignore objects with empty namespace or group in the name. #3999
- [BUGFIX] Distributor: fix issue causing distributors to not extend the replication set because of failing instances when zone-aware replication is enabled. #3977
- [BUGFIX] Query-frontend: Fix issue where cached entry size keeps increasing when making tiny query repeatedly. #3968
- [BUGFIX] Compactor:
-compactor.blocks-retention-period
now supports weeks (w
) and years (y
). #4027 - [BUGFIX] Querier: returning 422 (instead of 500) when query hits
max_chunks_per_query
limit with block storage, when the limit is hit in the store-gateway. #3937 - [BUGFIX] Ruler: Rule group limit enforcement should now allow the same number of rules in a group as the limit. #3616
- [BUGFIX] Frontend, Query-scheduler: allow querier to notify about shutdown without providing any authentication. #4066
- [BUGFIX] Querier: fixed race condition causing queries to fail right after querier startup with the "empty ring" error. #4068
- [BUGFIX] Compactor: Increment
cortex_compactor_runs_failed_total
if compactor failed compact a single tenant. #4094 - [BUGFIX] Tracing: hot fix to avoid the Jaeger tracing client to indefinitely block the Cortex process shutdown in case the HTTP connection to the tracing backend is blocked. #4134
- [BUGFIX] Forward proper EndsAt from ruler to Alertmanager inline with Prometheus behaviour. #4017
Blocksconvert
- [ENHANCEMENT] Builder: add
-builder.timestamp-tolerance
option which may reduce block size by rounding timestamps to make difference whole seconds. #3891