Changelog
Cortex
Note: in this version, the blocks storage compactor runs a migration task at startup, which can take many minutes and use a lot of RAM. Turn it off after the first run by setting `-compactor.block-deletion-marks-migration-enabled=false`.
- [CHANGE] FramedSnappy encoding support has been removed from the Push and Remote Read APIs. This means Prometheus 1.6 support has been removed, and the oldest Prometheus version supported by remote write is 1.7. #3682
- [CHANGE] Ruler: removed the flag `-ruler.evaluation-delay-duration-deprecated`, which was deprecated in 1.4.0. Please use the `ruler_evaluation_delay_duration` per-tenant limit instead. #3694
- [CHANGE] Removed the `-<prefix>.grpc-use-gzip-compression` flags, which were deprecated in 1.3.0: #3694
  - `-query-scheduler.grpc-client-config.grpc-use-gzip-compression`: use `-query-scheduler.grpc-client-config.grpc-compression` instead
  - `-frontend.grpc-client-config.grpc-use-gzip-compression`: use `-frontend.grpc-client-config.grpc-compression` instead
  - `-ruler.client.grpc-use-gzip-compression`: use `-ruler.client.grpc-compression` instead
  - `-bigtable.grpc-use-gzip-compression`: use `-bigtable.grpc-compression` instead
  - `-ingester.client.grpc-use-gzip-compression`: use `-ingester.client.grpc-compression` instead
  - `-querier.frontend-client.grpc-use-gzip-compression`: use `-querier.frontend-client.grpc-compression` instead
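For deployments that still pass the removed gzip flags, the mapping above can be applied mechanically. The Go sketch below rewrites one command-line argument at a time. `migrateGzipFlag` is a hypothetical helper, not part of Cortex, and it assumes that an enabled gzip flag corresponds to `gzip` on the replacement `-<prefix>.grpc-compression` flag and a disabled one to an empty value; check the accepted values in the Cortex configuration reference before relying on it.

```go
package main

import (
	"fmt"
	"strings"
)

// Prefixes whose -<prefix>.grpc-use-gzip-compression flag was removed,
// taken from the changelog entry.
var gzipFlagPrefixes = []string{
	"query-scheduler.grpc-client-config",
	"frontend.grpc-client-config",
	"ruler.client",
	"bigtable",
	"ingester.client",
	"querier.frontend-client",
}

// migrateGzipFlag (hypothetical helper) rewrites one removed gzip flag into
// its -<prefix>.grpc-compression replacement. Assumption: an enabled flag maps
// to "gzip" and a disabled one to an empty value (no compression).
// Only the single-dash form is handled; other arguments are returned unchanged.
func migrateGzipFlag(arg string) string {
	for _, prefix := range gzipFlagPrefixes {
		oldFlag := "-" + prefix + ".grpc-use-gzip-compression"
		newFlag := "-" + prefix + ".grpc-compression"
		switch arg {
		case oldFlag, oldFlag + "=true":
			return newFlag + "=gzip"
		case oldFlag + "=false":
			return newFlag + "="
		}
	}
	return arg
}

func main() {
	args := []string{
		"-target=querier",
		"-querier.frontend-client.grpc-use-gzip-compression=true",
		"-ingester.client.grpc-use-gzip-compression=false",
	}
	migrated := make([]string, 0, len(args))
	for _, arg := range args {
		migrated = append(migrated, migrateGzipFlag(arg))
	}
	fmt.Println(strings.Join(migrated, " "))
}
```

Matching whole arguments (rather than substrings) keeps unrelated flags such as `-target=querier` untouched.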
- [CHANGE] Querier: it's no longer required to set `-frontend.query-stats-enabled=true` in the querier to enable query statistics logging in the query-frontend. The flag now only needs to be configured in the query-frontend, which propagates it to the queriers. #3595 #3695
- [CHANGE] Blocks storage: the compactor is now required when running a Cortex cluster with the blocks storage, because it also keeps the bucket index updated. #3583
- [CHANGE] Blocks storage: block deletion marks are now also stored in a per-tenant global `markers/` location, in addition to the block location. At startup, the compactor copies deletion marks from the block location to the global location. This migration is required only once, so you can safely disable it via `-compactor.block-deletion-marks-migration-enabled=false` once the new compactor has successfully started in your cluster. #3583
- [CHANGE] OpenStack Swift: the default value for the `-ruler.storage.swift.container-name` and `-swift.container-name` config options has changed from `cortex` to an empty string. If you were relying on the default value, you should set it back to `cortex`. #3660
- [CHANGE] HA Tracker: the configured replica label is now verified against the label value length limit (`-validation.max-length-label-value`). #3668
- [CHANGE] Distributor: the `extend_writes` field in the YAML configuration has moved from `lifecycler` (inside `ingester_config`) to `distributor_config`. This doesn't affect the command line option `-distributor.extend-writes`, which stays the same. #3719
- [CHANGE] Alertmanager: deprecated the `-cluster.` CLI flags in favor of their `-alertmanager.cluster.` equivalents. The deprecated flags (and their respective YAML config options) are: #3677
  - `-cluster.listen-address` in favor of `-alertmanager.cluster.listen-address`
  - `-cluster.advertise-address` in favor of `-alertmanager.cluster.advertise-address`
  - `-cluster.peer` in favor of `-alertmanager.cluster.peers`
  - `-cluster.peer-timeout` in favor of `-alertmanager.cluster.peer-timeout`
- [CHANGE] Blocks storage: the default value of `-blocks-storage.bucket-store.sync-interval` has been changed from `5m` to `15m`. #3724
- [FEATURE] Querier: queries can be federated across multiple tenants. The tenant IDs involved need to be specified, separated by a `|` character, in the `X-Scope-OrgID` request header. This is an experimental feature, which can be enabled by setting `-tenant-federation.enabled=true` on all Cortex services. #3250
- [FEATURE] Alertmanager: introduced the experimental option `-alertmanager.sharding-enabled` to shard tenants across multiple Alertmanager instances. This feature is still under heavy development and its usage is discouraged. The following new metrics are exported by the Alertmanager: #3664
  - `cortex_alertmanager_ring_check_errors_total`
  - `cortex_alertmanager_sync_configs_total`
  - `cortex_alertmanager_sync_configs_failed_total`
  - `cortex_alertmanager_tenants_discovered`
  - `cortex_alertmanager_tenants_owned`
- [ENHANCEMENT] Allow specifying JAEGER_ENDPOINT instead of sampling server or local agent port. #3682
- [ENHANCEMENT] Blocks storage: introduced a per-tenant bucket index, periodically updated by the compactor, used to avoid full bucket scanning done by queriers, store-gateways and rulers. The bucket index is updated by the compactor during blocks cleanup, on every `-compactor.cleanup-interval`. #3553 #3555 #3561 #3583 #3625 #3711 #3715
- [ENHANCEMENT] Blocks storage: introduced an option `-blocks-storage.bucket-store.bucket-index.enabled` to enable the usage of the bucket index in the querier, store-gateway and ruler. When enabled, the querier, store-gateway and ruler will use the bucket index to find a tenant's blocks instead of running the periodic bucket scan. The following new metrics are exported by the querier and ruler: #3614 #3625
  - `cortex_bucket_index_loads_total`
  - `cortex_bucket_index_load_failures_total`
  - `cortex_bucket_index_load_duration_seconds`
  - `cortex_bucket_index_loaded`
- [ENHANCEMENT] Compactor: exported the following metrics. #3583 #3625
  - `cortex_bucket_blocks_count`: Total number of blocks per tenant in the bucket. Includes blocks marked for deletion, but not partial blocks.
  - `cortex_bucket_blocks_marked_for_deletion_count`: Total number of blocks per tenant marked for deletion in the bucket.
  - `cortex_bucket_blocks_partials_count`: Total number of partial blocks.
  - `cortex_bucket_index_last_successful_update_timestamp_seconds`: Timestamp of the last successful update of a tenant's bucket index.
- [ENHANCEMENT] Ruler: added `cortex_prometheus_last_evaluation_samples` to expose the number of samples generated by a rule group per tenant. #3582
- [ENHANCEMENT] Memberlist: added a status page (`/memberlist`) with available details about the memberlist-based KV store and memberlist cluster. It's also possible to view KV values in Go struct or JSON format, or download them for inspection. #3575
- [ENHANCEMENT] Memberlist: client can now keep a size-bounded buffer with sent and received messages and display them in the admin UI (/memberlist) for troubleshooting. #3581 #3602
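The entry doesn't say how the message buffer is bounded. As a hypothetical illustration only, a buffer capped by total payload size that evicts the oldest messages first could look like the Go sketch below; the `message` and `boundedBuffer` types are invented here and are not Cortex's implementation.

```go
package main

import "fmt"

// message is a hypothetical record of one sent or received memberlist message.
type message struct {
	Direction string // "sent" or "received"
	Payload   []byte
}

// boundedBuffer keeps the most recent messages whose total payload size
// does not exceed maxBytes, evicting the oldest messages first.
type boundedBuffer struct {
	maxBytes  int
	usedBytes int
	messages  []message
}

func newBoundedBuffer(maxBytes int) *boundedBuffer {
	return &boundedBuffer{maxBytes: maxBytes}
}

// add appends m, then drops the oldest entries until the buffer fits again.
// A single message larger than maxBytes is not stored at all.
func (b *boundedBuffer) add(m message) {
	if len(m.Payload) > b.maxBytes {
		return
	}
	b.messages = append(b.messages, m)
	b.usedBytes += len(m.Payload)
	for b.usedBytes > b.maxBytes {
		b.usedBytes -= len(b.messages[0].Payload)
		b.messages = b.messages[1:]
	}
}

func main() {
	buf := newBoundedBuffer(10)
	buf.add(message{"sent", []byte("aaaa")})     // 4 bytes used
	buf.add(message{"received", []byte("bbbb")}) // 8 bytes used
	buf.add(message{"sent", []byte("cccc")})     // 12 bytes, so the oldest is evicted
	fmt.Println(len(buf.messages), buf.usedBytes)
}
```

Bounding by bytes rather than by message count keeps memory predictable when message sizes vary widely; a production version would also need locking if messages are recorded from several goroutines.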
- [ENHANCEMENT] Blocks storage: added block index attributes caching support to the metadata cache. The TTL can be configured via `-blocks-storage.bucket-store.metadata-cache.block-index-attributes-ttl`. #3629
- [ENHANCEMENT] Alertmanager: Add support for Azure blob storage. #3634
- [ENHANCEMENT] Compactor: tenants marked for deletion will now be fully cleaned up after some delay since the deletion of their last block. Cleanup includes removal of remaining marker files (including the tenant deletion mark file) and files under `debug/metas`. #3613
- [ENHANCEMENT] Compactor: retry compaction of a single tenant on failure instead of re-running compaction for all tenants. #3627
- [ENHANCEMENT] Querier: Implement result caching for tenant query federation. #3640
- [ENHANCEMENT] API: Add a `mode` query parameter for the config endpoint: #3645
  - `/config?mode=diff`: Shows the YAML configuration with all values that differ from the defaults.
  - `/config?mode=defaults`: Shows the YAML configuration with all the default values.
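When scripting against the config endpoint, the two `mode` values can be validated before building the URL. The Go sketch below uses the standard `net/url` package; `configURL` and the `http://cortex:8080` address are hypothetical, and only `diff` and `defaults` are accepted because those are the values this entry lists.

```go
package main

import (
	"fmt"
	"net/url"
)

// configURL (hypothetical helper) builds the config endpoint URL for a mode.
// An empty mode yields /config without a mode parameter.
func configURL(baseURL, mode string) (string, error) {
	u, err := url.Parse(baseURL)
	if err != nil {
		return "", err
	}
	u.Path = "/config"
	switch mode {
	case "":
		// No mode parameter.
	case "diff", "defaults":
		u.RawQuery = url.Values{"mode": {mode}}.Encode()
	default:
		return "", fmt.Errorf("unsupported mode %q", mode)
	}
	return u.String(), nil
}

func main() {
	for _, mode := range []string{"diff", "defaults"} {
		u, err := configURL("http://cortex:8080", mode) // hypothetical address
		if err != nil {
			panic(err)
		}
		fmt.Println(u)
	}
}
```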
- [ENHANCEMENT] OpenStack Swift: added the following config options to the OpenStack Swift backend client: #3660
  - Chunks storage: `-swift.auth-version`, `-swift.max-retries`, `-swift.connect-timeout`, `-swift.request-timeout`.
  - Blocks storage: `-blocks-storage.swift.auth-version`, `-blocks-storage.swift.max-retries`, `-blocks-storage.swift.connect-timeout`, `-blocks-storage.swift.request-timeout`.
  - Ruler: `-ruler.storage.swift.auth-version`, `-ruler.storage.swift.max-retries`, `-ruler.storage.swift.connect-timeout`, `-ruler.storage.swift.request-timeout`.
- [ENHANCEMENT] Disabled the in-memory shuffle-sharding subring cache in the store-gateway, ruler and compactor. This should reduce memory utilisation in these services when shuffle-sharding is enabled, without significantly increasing CPU utilisation. #3601
- [ENHANCEMENT] Shuffle sharding: optimised subring generation used by shuffle sharding. #3601
- [ENHANCEMENT] New `/runtime_config` endpoint that returns the defined runtime configuration in YAML format. The returned configuration includes overrides. #3639
- [ENHANCEMENT] Query-frontend: include the name of the parameter that failed validation in the HTTP 400 message. #3703
- [ENHANCEMENT] Cortex now fails to start if the provided runtime config is invalid. #3707
- [ENHANCEMENT] Alertmanager: added flags to customize the cluster configuration: #3667
  - `-alertmanager.cluster.gossip-interval`: the interval between sending gossip messages. Lowering this value (more frequent) propagates gossip messages across the cluster more quickly, at the expense of increased bandwidth usage.
  - `-alertmanager.cluster.push-pull-interval`: the interval between gossip state syncs. Setting this interval lower (more frequent) increases convergence speed across larger clusters, at the expense of increased bandwidth usage.
- [ENHANCEMENT] Distributor: changed the error message returned when a received series has too many labels. The new message format has the series at the end, which plays better with Prometheus logs truncation. #3718
  - From: `sample for '<series>' has <value> label names; limit <value>`
  - To: `series has too many labels (actual: <value>, limit: <value>) series: '<series>'`
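To see why moving the series to the end helps, the Go sketch below formats both messages quoted in this entry and cuts them to a fixed length, the way a long log line might be truncated. The helper names and the 50-byte cut-off are illustrative assumptions, not Cortex code. With the new format the actual and limit counts survive truncation, while with the old format a long series can push them past the cut.

```go
package main

import "fmt"

// oldTooManyLabelsMsg reproduces the previous message format (illustrative only).
func oldTooManyLabelsMsg(series string, actual, limit int) string {
	return fmt.Sprintf("sample for '%s' has %d label names; limit %d", series, actual, limit)
}

// newTooManyLabelsMsg reproduces the new message format (illustrative only).
func newTooManyLabelsMsg(series string, actual, limit int) string {
	return fmt.Sprintf("series has too many labels (actual: %d, limit: %d) series: '%s'", actual, limit, series)
}

// truncate keeps only the first n bytes of s, mimicking a truncated log line.
func truncate(s string, n int) string {
	if len(s) <= n {
		return s
	}
	return s[:n]
}

func main() {
	series := `{__name__="http_requests_total", job="api", instance="10.0.0.1:9090"}`
	fmt.Println(truncate(oldTooManyLabelsMsg(series, 31, 30), 50))
	fmt.Println(truncate(newTooManyLabelsMsg(series, 31, 30), 50))
}
```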
- [ENHANCEMENT] Improve bucket index loader to handle edge case where new tenant has not had blocks uploaded to storage yet. #3717
- [BUGFIX] Allow `-querier.max-query-lookback` to use the `y|w|d` suffix, like the deprecated `-store.max-look-back-period`. #3598
- [BUGFIX] Memberlist: an entry in the ring should no longer reappear after using the "Forget" feature (unless it's still heartbeating). #3603
- [BUGFIX] Ingester: do not close idle TSDBs while blocks shipping is in progress. #3630 #3632
- [BUGFIX] Ingester: correctly update `cortex_ingester_memory_users` and `cortex_ingester_active_series` when a tenant's idle TSDB is closed, when running Cortex with the blocks storage. #3646
- [BUGFIX] Querier: fix a default value incorrectly overriding `-querier.frontend-address` in single-binary mode. #3650
- [BUGFIX] Compactor: delete `deletion-mark.json` last when deleting a block, so that partial blocks without a deletion mark are not left in the bucket if the compactor is interrupted while deleting a block. #3660
- [BUGFIX] Blocks storage: do not clean up a partially uploaded block when the `meta.json` upload fails. Despite the failure to upload `meta.json`, this file may in some cases still appear in the bucket later. By skipping early cleanup, we avoid having corrupted blocks in the storage. #3660
- [BUGFIX] Alertmanager: disable access to `/alertmanager/metrics` (which exposes all Cortex metrics), `/alertmanager/-/reload` and `/alertmanager/debug/*`, which were available to any authenticated user with the Alertmanager enabled. #3678
- [BUGFIX] Query-Frontend: avoid creating many small sub-queries by discarding cache extents under 5 minutes. #3653
- [BUGFIX] Ruler: ensure the stale markers generated for evaluated rules respect the configured `-ruler.evaluation-delay-duration`. This avoids samples with NaN values being persisted with timestamps set ahead of the next rule evaluation. #3687
- [BUGFIX] Alertmanager: don't serve HTTP requests until the Alertmanager has fully started. Serving HTTP requests earlier may result in loss of configuration for the user. #3679
- [BUGFIX] Do not log "failed to load config" if runtime config file is empty. #3706
- [BUGFIX] Do not allow using a runtime config file containing multiple YAML documents. #3706
- [BUGFIX] HA Tracker: don't track an intentionally aborted CAS operation as an error in the `cortex_kv_request_duration_seconds` metric. #3745