This Cortex release features 112 contributions from 32 authors and exciting news!
Highlights
- Cortex blocks storage is now GA.
- Cassandra support for the chunks storage is now GA.
- Redis caching backend now supports Redis sentinel and Redis cluster too.
- Introduced shuffle sharding support to store-gateway blocks sharding (blocks storage).
- The ruler and alertmanager got several improvements.
- Last but not least, many enhancements, optimisations and bug fixes.
Please refer to the changelog for the full list of changes and improvements.
Changelog
- [CHANGE] Cassandra backend support is now GA (stable). #3180
- [CHANGE] Blocks storage is now GA (stable). The `-experimental` prefix has been removed from all CLI flags related to the blocks storage (no YAML config changes). #3180
  - `-experimental.blocks-storage.*` flags renamed to `-blocks-storage.*`
  - `-experimental.store-gateway.*` flags renamed to `-store-gateway.*`
  - `-experimental.querier.store-gateway-client.*` flags renamed to `-querier.store-gateway-client.*`
  - `-experimental.querier.store-gateway-addresses` flag renamed to `-querier.store-gateway-addresses`
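
The blocks storage flag renames above are purely mechanical, so an existing argument list can be rewritten with a small script. The following is a minimal sketch, not an official migration tool: the helper names `migrate_flag` and `migrate_args` are hypothetical, the flag suffixes in the usage note are only illustrative, and it assumes flags are written with a single leading dash.

```python
# Prefix renames copied from the changelog entry above (#3180).
# The last entry is a full flag name rather than a wildcard prefix.
RENAMED_PREFIXES = {
    "-experimental.blocks-storage.": "-blocks-storage.",
    "-experimental.store-gateway.": "-store-gateway.",
    "-experimental.querier.store-gateway-client.": "-querier.store-gateway-client.",
    "-experimental.querier.store-gateway-addresses": "-querier.store-gateway-addresses",
}


def migrate_flag(arg):
    """Return arg with a pre-1.4 experimental prefix replaced, if it has one."""
    for old, new in RENAMED_PREFIXES.items():
        if arg.startswith(old):
            return new + arg[len(old):]
    return arg  # Unrelated flags pass through unchanged.


def migrate_args(args):
    """Apply migrate_flag to every element of an argument list."""
    return [migrate_flag(arg) for arg in args]
```

For example, `migrate_args(["-target=all", "-experimental.blocks-storage.backend=s3"])` leaves the first argument untouched and strips the `-experimental` prefix from the second. YAML configuration needs no equivalent step, since the changelog states there are no YAML config changes.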
- [CHANGE] Ingester: Removed deprecated untyped record from chunks WAL. If you are running `v1.0` or below, it is recommended to first upgrade to `v1.1`/`v1.2`/`v1.3` and run it for a day before upgrading to `v1.4` to avoid data loss. #3115
- [CHANGE] Distributor API endpoints are no longer served unless target is set to `distributor` or `all`. #3112
- [CHANGE] Increase the default Cassandra client replication factor to 3. #3007
- [CHANGE] Blocks storage: removed the support to transfer blocks between ingesters on shutdown. When running the Cortex blocks storage, ingesters are expected to run with a persistent disk. The following metrics have been removed: #2996
  - `cortex_ingester_sent_files`
  - `cortex_ingester_received_files`
  - `cortex_ingester_received_bytes_total`
  - `cortex_ingester_sent_bytes_total`
- [CHANGE] The buckets for the `cortex_chunk_store_index_lookups_per_query` metric have been changed to 1, 2, 4, 8, 16. #3021
- [CHANGE] Blocks storage: the `operation` label value `getrange` has changed into `get_range` for the metrics `thanos_store_bucket_cache_operation_requests_total` and `thanos_store_bucket_cache_operation_hits_total`. #3000
- [CHANGE] Experimental Delete Series: `/api/v1/admin/tsdb/delete_series` and `/api/v1/admin/tsdb/cancel_delete_request` purger APIs now return status code `204` instead of `200` for success. #2946
- [CHANGE] Histogram `cortex_memcache_request_duration_seconds` `method` label value changes from `Memcached.Get` to `Memcached.GetBatched` for batched lookups, and is not reported for non-batched lookups (label value `Memcached.GetMulti` remains, and had exactly the same value as `Get` in non-batched lookups). The same change applies to tracing spans. #3046
- [CHANGE] TLS server validation is now enabled by default; a new parameter `tls_insecure_skip_verify` can be set to true to optionally skip validation. #3030
- [CHANGE] `cortex_ruler_config_update_failures_total` has been removed in favor of `cortex_ruler_config_last_reload_successful`. #3056
- [CHANGE] `ruler.evaluation_delay_duration` field in YAML config has been moved and renamed to `limits.ruler_evaluation_delay_duration`. #3098
- [CHANGE] Removed obsolete `results_cache.max_freshness` from YAML config (deprecated since Cortex 1.2). #3145
- [CHANGE] Removed obsolete `-promql.lookback-delta` option (deprecated since Cortex 1.2, replaced with `-querier.lookback-delta`). #3144
- [CHANGE] Cache: added support for Redis Cluster and Redis Sentinel. #2961
  - The following changes have been made in Redis configuration:
    - `-redis.master_name` added
    - `-redis.db` added
    - `-redis.max-active-conns` changed to `-redis.pool-size`
    - `-redis.max-conn-lifetime` changed to `-redis.max-connection-age`
    - `-redis.max-idle-conns` removed
    - `-redis.wait-on-pool-exhaustion` removed
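
The Redis flag changes above mix renames and removals, so a deployment's arguments need both rewriting and pruning. A minimal sketch follows; the function name `migrate_redis_args` is hypothetical, and matching by suffix is an assumption made here so that flags carrying a component-specific prefix in front of `redis.` are also caught. It only renames flags and does not check whether old values still make sense for the new options.

```python
# Suffix renames and removals copied from the changelog entry above (#2961).
RENAMED_SUFFIXES = {
    "redis.max-active-conns": "redis.pool-size",
    "redis.max-conn-lifetime": "redis.max-connection-age",
}
REMOVED_SUFFIXES = ("redis.max-idle-conns", "redis.wait-on-pool-exhaustion")


def migrate_redis_args(args):
    """Return (migrated_args, dropped_flag_names) for a list of CLI arguments."""
    migrated, dropped = [], []
    for arg in args:
        # Split "-flag=value" into its name and value parts; sep is "" if no "=".
        name, sep, value = arg.partition("=")
        if name.endswith(REMOVED_SUFFIXES):
            dropped.append(name)  # Flag no longer exists; report it to the caller.
            continue
        for old, new in RENAMED_SUFFIXES.items():
            if name.endswith(old):
                name = name[: -len(old)] + new
                break
        migrated.append(name + sep + value)
    return migrated, dropped
```

Returning the dropped flag names separately lets the caller log them for review instead of silently discarding configuration.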
- [FEATURE] Logging of the source IP passed along by a reverse proxy is now supported by setting the `-server.log-source-ips-enabled` flag. For non-standard headers, the settings `-server.log-source-ips-header` and `-server.log-source-ips-regex` can be used. #2985
- [FEATURE] Blocks storage: added shuffle sharding support to store-gateway blocks sharding. Added the following additional metrics to store-gateway: #3069
  - `cortex_bucket_stores_tenants_discovered`
  - `cortex_bucket_stores_tenants_synced`
- [FEATURE] Experimental blocksconvert: introduce an experimental tool `blocksconvert` to migrate long-term storage chunks to blocks. #3092 #3122 #3127 #3162
- [ENHANCEMENT] Add support for Azure storage in China, Germany and US Government environments. #2988
- [ENHANCEMENT] Query-tee: added a small tolerance to floating point sample values comparison. #2994
- [ENHANCEMENT] Query-tee: add support for passing requests through to the preferred backend for unregistered routes. #3018
- [ENHANCEMENT] Expose `storage.aws.dynamodb.backoff_config` configuration file field. #3026
- [ENHANCEMENT] Added `cortex_request_message_bytes` and `cortex_response_message_bytes` histograms to track received and sent gRPC message and HTTP request/response sizes. Added `cortex_inflight_requests` gauge to track number of inflight gRPC and HTTP requests. #3064
- [ENHANCEMENT] Publish ruler's ring metrics. #3074
- [ENHANCEMENT] Add config validation to the experimental Alertmanager API. Invalid configs are no longer accepted. #3053
- [ENHANCEMENT] Add "integration" as a label for `cortex_alertmanager_notifications_total` and `cortex_alertmanager_notifications_failed_total` metrics. #3056
- [ENHANCEMENT] Add `cortex_ruler_config_last_reload_successful` and `cortex_ruler_config_last_reload_successful_seconds` to check the status of each user's rule manager. #3056
- [ENHANCEMENT] The configuration validation now fails if an empty YAML node has been set for a root YAML config property. #3080
- [ENHANCEMENT] Memcached dial() calls now have a circuit-breaker to avoid hammering a broken cache. #3051, #3189
- [ENHANCEMENT] `-ruler.evaluation-delay-duration` is now overridable as a per-tenant limit, `ruler_evaluation_delay_duration`. #3098
- [ENHANCEMENT] Add TLS support to etcd client. #3102
- [ENHANCEMENT] When a tenant accesses the Alertmanager UI or its API and a valid `-alertmanager.configs.fallback` is set, it is used to start the manager and avoid failing the request. #3073
- [ENHANCEMENT] Add `DELETE api/v1/rules/{namespace}` to the Ruler. It allows all the rule groups of a namespace to be deleted. #3120
- [ENHANCEMENT] Experimental Delete Series: Retry processing of Delete requests during failures. #2926
- [ENHANCEMENT] Improve performance of QueryStream() in ingesters. #3177
- [ENHANCEMENT] Modules included in the "All" target are now visible in the output of the `-modules` CLI flag. #3155
- [ENHANCEMENT] Added `/debug/fgprof` endpoint to debug a running Cortex process using `fgprof`. This is in addition to the existing `/debug/...` endpoints. #3131
- [ENHANCEMENT] Blocks storage: optimised `/api/v1/series` for blocks storage. #2976
- [BUGFIX] Ruler: when loading rules from "local" storage, check for directory after resolving symlink. #3137
- [BUGFIX] Query-frontend: Fixed rounding for incoming query timestamps, to be 100% Prometheus compatible. #2990
- [BUGFIX] Querier: Merge results from chunks and blocks ingesters when using streaming of results. #3013
- [BUGFIX] Querier: query `/series` from ingesters regardless of the `-querier.query-ingesters-within` setting. #3035
- [BUGFIX] Blocks storage: Ingester is less likely to hit gRPC message size limit when streaming data to queriers. #3015
- [BUGFIX] Blocks storage: fixed memberlist support for the store-gateways and compactors ring used when blocks sharding is enabled. #3058 #3095
- [BUGFIX] Fix configuration for TLS server validation, TLS skip verify was hardcoded to true for all TLS configurations and prevented validation of server certificates. #3030
- [BUGFIX] Fixes the Alertmanager panicking when no `-alertmanager.web.external-url` is provided. #3017
- [BUGFIX] Fixes the registration of the Alertmanager API metrics `cortex_alertmanager_alerts_received_total` and `cortex_alertmanager_alerts_invalid_total`. #3065
- [BUGFIX] Fixes `flag needs an argument: -config.expand-env` error. #3087
- [BUGFIX] An index optimisation actually slows things down when using caching. Moved it to the right location. #2973
- [BUGFIX] Ingester: If push request contained both valid and invalid samples, valid samples were ingested but not stored to WAL of the chunks storage. This has been fixed. #3067
- [BUGFIX] Cassandra: fixed consistency setting in the CQL session when creating the keyspace. #3105
- [BUGFIX] Ruler: Config API would return both the `record` and `alert` in `YAML` response keys even when one of them must be empty. #3120
- [BUGFIX] Index page now uses configured HTTP path prefix when creating links. #3126
- [BUGFIX] Purger: fixed deadlock when reloading of tombstones failed. #3182
- [BUGFIX] Fixed a panic in the flusher job: an error writing chunks to the store would cause "idle" chunks to be flushed, which triggered the panic. #3140
- [BUGFIX] Index page no longer shows links that are not valid for the running Cortex instance. #3133
- [BUGFIX] Configs: prevent template validation from failing when using template functions. #3157
- [BUGFIX] Configuring the S3 URL with an `@` but without username and password doesn't enable the AWS static credentials anymore. #3170
- [BUGFIX] Limit errors on ranged queries (`api/v1/query_range`) no longer return status code `500`; they return `422` instead. #3167
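
Two entries in this changelog alter HTTP status codes that clients may depend on: the purger APIs now answer `204` on success (#2946), and limit errors on `api/v1/query_range` now answer `422` instead of `500` (#3167). The sketch below shows one way a client might adapt. The function name `classify_response`, the category labels, and the retry policy are all assumptions made for illustration; none of them is part of Cortex.

```python
def classify_response(status):
    """Map an HTTP status code to a coarse, client-side handling category."""
    if 200 <= status < 300:
        # Accept any 2xx so that both the old 200 and the new 204 count as success.
        return "success"
    if status == 422:
        # Limit errors on ranged queries now arrive here rather than as 500.
        # Assumed policy: do not retry, the query itself has to change.
        return "rejected"
    if status >= 500:
        # Assumed policy: server-side failures may be transient and can be retried.
        return "retry"
    return "client_error"
```

Checking for the whole 2xx range rather than an exact `200` keeps a client working against Cortex versions on either side of this release.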