cortexproject/cortex v1.7.0 on GitHub

Changelog

Cortex

Note: in this version the blocks storage compactor runs a migration task at startup, which can take many minutes and use a lot of RAM.
Turn it off after the first successful run by setting -compactor.block-deletion-marks-migration-enabled=false.

[CHANGE] FramedSnappy encoding support has been removed from the Push and Remote Read APIs. This means Prometheus 1.6 support has been removed and the oldest Prometheus version supported for remote write is 1.7. #3682
[CHANGE] Ruler: removed the flag -ruler.evaluation-delay-duration-deprecated which was deprecated in 1.4.0. Please use the ruler_evaluation_delay_duration per-tenant limit instead. #3694
[CHANGE] Removed the flags -<prefix>.grpc-use-gzip-compression which were deprecated in 1.3.0: #3694
- -query-scheduler.grpc-client-config.grpc-use-gzip-compression: use -query-scheduler.grpc-client-config.grpc-compression instead
- -frontend.grpc-client-config.grpc-use-gzip-compression: use -frontend.grpc-client-config.grpc-compression instead
- -ruler.client.grpc-use-gzip-compression: use -ruler.client.grpc-compression instead
- -bigtable.grpc-use-gzip-compression: use -bigtable.grpc-compression instead
- -ingester.client.grpc-use-gzip-compression: use -ingester.client.grpc-compression instead
- -querier.frontend-client.grpc-use-gzip-compression: use -querier.frontend-client.grpc-compression instead
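As an illustration of the flag removal above, here is a minimal sketch of a helper that rewrites the removed flags in an argument list. The function name migrate_gzip_flags is hypothetical, and the replacement value gzip is an assumption about what matches the old behaviour; check the -<prefix>.grpc-compression documentation for your component before relying on it.

```python
# Hypothetical helper, not part of Cortex: rewrites the gzip flags removed in 1.7.0.
DEPRECATED_SUFFIX = ".grpc-use-gzip-compression"
REPLACEMENT_SUFFIX = ".grpc-compression"


def migrate_gzip_flags(args):
    """Return a copy of args with the removed gzip flags replaced."""
    migrated = []
    for arg in args:
        name, sep, value = arg.partition("=")
        if not name.endswith(DEPRECATED_SUFFIX):
            # Unrelated flags pass through untouched.
            migrated.append(arg)
            continue
        prefix = name[: -len(DEPRECATED_SUFFIX)]
        # A bare boolean flag (no "=value") is treated as enabled.
        enabled = (not sep) or value.lower() in ("true", "1", "t")
        if enabled:
            # Assumption: "gzip" is the value equivalent to the old boolean flag.
            migrated.append(f"{prefix}{REPLACEMENT_SUFFIX}=gzip")
        # A disabled flag is simply dropped, leaving compression at its default.
    return migrated
```

For example, passing ["-ingester.client.grpc-use-gzip-compression=true"] yields ["-ingester.client.grpc-compression=gzip"].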
[CHANGE] Querier: it's not required to set -frontend.query-stats-enabled=true in the querier anymore to enable query statistics logging in the query-frontend. The flag is now required to be configured only in the query-frontend and it will be propagated to the queriers. #3595 #3695
[CHANGE] Blocks storage: compactor is now required when running a Cortex cluster with the blocks storage, because it also keeps the bucket index updated. #3583
[CHANGE] Blocks storage: block deletion marks are now also stored in a per-tenant global markers/ location, in addition to the block location. At startup, the compactor copies deletion marks from the block location to the global location. This migration is required only once, so you can safely disable it via -compactor.block-deletion-marks-migration-enabled=false after the new compactor has successfully started at least once in your cluster. #3583
[CHANGE] OpenStack Swift: the default value for the -ruler.storage.swift.container-name and -swift.container-name config options has changed from cortex to empty string. If you were relying on the default value, you should set it back to cortex. #3660
[CHANGE] HA Tracker: configured replica label is now verified against label value length limit (-validation.max-length-label-value). #3668
[CHANGE] Distributor: extend_writes field in YAML configuration has moved from lifecycler (inside ingester_config) to distributor_config. This doesn't affect command line option -distributor.extend-writes, which stays the same. #3719
[CHANGE] Alertmanager: Deprecated the -cluster.* CLI flags in favor of their -alertmanager.cluster.* equivalents. The deprecated flags (and their respective YAML config options) are: #3677
- -cluster.listen-address in favor of -alertmanager.cluster.listen-address
- -cluster.advertise-address in favor of -alertmanager.cluster.advertise-address
- -cluster.peer in favor of -alertmanager.cluster.peers
- -cluster.peer-timeout in favor of -alertmanager.cluster.peer-timeout
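The renames listed above can be checked mechanically. The following is a rough sketch, assuming flags are passed as "-name=value" strings; the function name find_deprecated_flags is hypothetical and the mapping is copied from this changelog entry only.

```python
# Mapping taken from the changelog entry above (note that peer becomes peers).
RENAMED_FLAGS = {
    "-cluster.listen-address": "-alertmanager.cluster.listen-address",
    "-cluster.advertise-address": "-alertmanager.cluster.advertise-address",
    "-cluster.peer": "-alertmanager.cluster.peers",
    "-cluster.peer-timeout": "-alertmanager.cluster.peer-timeout",
}


def find_deprecated_flags(args):
    """Return (old, new) pairs for every deprecated flag found in args."""
    found = []
    for arg in args:
        # Compare only the flag name, ignoring any "=value" part.
        name = arg.partition("=")[0]
        if name in RENAMED_FLAGS:
            found.append((name, RENAMED_FLAGS[name]))
    return found
```

Running it over a deployment's argument list gives a checklist of flags to rename before the deprecated ones are removed.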
[CHANGE] Blocks storage: the default value of -blocks-storage.bucket-store.sync-interval has been changed from 5m to 15m. #3724
[FEATURE] Querier: Queries can be federated across multiple tenants. The tenant IDs involved need to be specified, separated by a | character, in the X-Scope-OrgID request header. This is an experimental feature, which can be enabled by setting -tenant-federation.enabled=true on all Cortex services. #3250
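A minimal sketch of how a client might build the federated header described above. The helper name federated_org_id and the tenant IDs team-a and team-b are hypothetical; only the header name and the | separator come from the entry.

```python
def federated_org_id(tenant_ids):
    """Join tenant IDs with '|' for use in the X-Scope-OrgID header."""
    if not tenant_ids:
        raise ValueError("at least one tenant ID is required")
    for tenant in tenant_ids:
        # The separator cannot appear inside an individual tenant ID.
        if "|" in tenant:
            raise ValueError(f"tenant ID {tenant!r} must not contain '|'")
    return "|".join(tenant_ids)


# Example tenant IDs are placeholders.
headers = {"X-Scope-OrgID": federated_org_id(["team-a", "team-b"])}
```

The resulting headers dictionary can be passed to any HTTP client when querying a cluster that has -tenant-federation.enabled=true set.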
[FEATURE] Alertmanager: introduced the experimental option -alertmanager.sharding-enabled to shard tenants across multiple Alertmanager instances. This feature is still under heavy development and its usage is discouraged. The following new metrics are exported by the Alertmanager: #3664
- cortex_alertmanager_ring_check_errors_total
- cortex_alertmanager_sync_configs_total
- cortex_alertmanager_sync_configs_failed_total
- cortex_alertmanager_tenants_discovered
- cortex_alertmanager_tenants_owned
[ENHANCEMENT] Allow specifying JAEGER_ENDPOINT instead of sampling server or local agent port. #3682
[ENHANCEMENT] Blocks storage: introduced a per-tenant bucket index, periodically updated by the compactor, used to avoid full bucket scanning done by queriers, store-gateways and rulers. The bucket index is updated by the compactor during blocks cleanup, on every -compactor.cleanup-interval. #3553 #3555 #3561 #3583 #3625 #3711 #3715
[ENHANCEMENT] Blocks storage: introduced an option -blocks-storage.bucket-store.bucket-index.enabled to enable the usage of the bucket index in the querier, store-gateway and ruler. When enabled, the querier, store-gateway and ruler will use the bucket index to find a tenant's blocks instead of running the periodic bucket scan. The following new metrics are exported by the querier and ruler: #3614 #3625
- cortex_bucket_index_loads_total
- cortex_bucket_index_load_failures_total
- cortex_bucket_index_load_duration_seconds
- cortex_bucket_index_loaded
[ENHANCEMENT] Compactor: exported the following metrics. #3583 #3625
- cortex_bucket_blocks_count: Total number of blocks per tenant in the bucket. Includes blocks marked for deletion, but not partial blocks.
- cortex_bucket_blocks_marked_for_deletion_count: Total number of blocks per tenant marked for deletion in the bucket.
- cortex_bucket_blocks_partials_count: Total number of partial blocks.
- cortex_bucket_index_last_successful_update_timestamp_seconds: Timestamp of the last successful update of a tenant's bucket index.
[ENHANCEMENT] Ruler: Add cortex_prometheus_last_evaluation_samples to expose the number of samples generated by a rule group per tenant. #3582
[ENHANCEMENT] Memberlist: add status page (/memberlist) with available details about memberlist-based KV store and memberlist cluster. It's also possible to view KV values in Go struct or JSON format, or download for inspection. #3575
[ENHANCEMENT] Memberlist: client can now keep a size-bounded buffer with sent and received messages and display them in the admin UI (/memberlist) for troubleshooting. #3581 #3602
[ENHANCEMENT] Blocks storage: added block index attributes caching support to metadata cache. The TTL can be configured via -blocks-storage.bucket-store.metadata-cache.block-index-attributes-ttl. #3629
[ENHANCEMENT] Alertmanager: Add support for Azure blob storage. #3634
[ENHANCEMENT] Compactor: tenants marked for deletion will now be fully cleaned up after some delay following the deletion of the last block. Cleanup includes removal of remaining marker files (including the tenant deletion mark file) and files under debug/metas. #3613
[ENHANCEMENT] Compactor: retry compaction of a single tenant on failure instead of re-running compaction for all tenants. #3627
[ENHANCEMENT] Querier: Implement result caching for tenant query federation. #3640
[ENHANCEMENT] API: Add a mode query parameter for the config endpoint: #3645
- /config?mode=diff: Shows the YAML configuration with all values that differ from the defaults.
- /config?mode=defaults: Shows the YAML configuration with all the default values.
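To show how the mode parameter above fits into a request URL, here is a small sketch. The host localhost:9009, the http scheme and the function name config_url are assumptions for illustration only.

```python
from urllib.parse import urlencode, urlunsplit

# Modes listed in the changelog entry above.
VALID_MODES = ("diff", "defaults")


def config_url(host, mode=None):
    """Build the /config URL, optionally with a mode query parameter."""
    if mode is not None and mode not in VALID_MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    query = urlencode({"mode": mode}) if mode else ""
    # Assumption: the endpoint is served over plain HTTP on the given host.
    return urlunsplit(("http", host, "/config", query, ""))
```

For instance, config_url("localhost:9009", "diff") produces a URL ending in /config?mode=diff, while omitting the mode gives the plain /config endpoint.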
[ENHANCEMENT] OpenStack Swift: added the following config options to OpenStack Swift backend client: #3660
- Chunks storage: -swift.auth-version, -swift.max-retries, -swift.connect-timeout, -swift.request-timeout.
- Blocks storage: -blocks-storage.swift.auth-version, -blocks-storage.swift.max-retries, -blocks-storage.swift.connect-timeout, -blocks-storage.swift.request-timeout.
- Ruler: -ruler.storage.swift.auth-version, -ruler.storage.swift.max-retries, -ruler.storage.swift.connect-timeout, -ruler.storage.swift.request-timeout.
[ENHANCEMENT] Disabled in-memory shuffle-sharding subring cache in the store-gateway, ruler and compactor. This should reduce the memory utilisation in these services when shuffle-sharding is enabled, without introducing a significant increase in CPU utilisation. #3601
[ENHANCEMENT] Shuffle sharding: optimised subring generation used by shuffle sharding. #3601
[ENHANCEMENT] New /runtime_config endpoint that returns the defined runtime configuration in YAML format. The returned configuration includes overrides. #3639
[ENHANCEMENT] Query-frontend: include the name of the parameter that failed validation in the HTTP 400 message. #3703
[ENHANCEMENT] Fail to start up Cortex if the provided runtime config is invalid. #3707
[ENHANCEMENT] Alertmanager: Add flags to customize the cluster configuration: #3667
- -alertmanager.cluster.gossip-interval: The interval between sending gossip messages. By lowering this value (more frequent) gossip messages are propagated across cluster more quickly at the expense of increased bandwidth usage.
- -alertmanager.cluster.push-pull-interval: The interval between gossip state syncs. Setting this interval lower (more frequent) will increase convergence speeds across larger clusters at the expense of increased bandwidth usage.
[ENHANCEMENT] Distributor: change the error message returned when a received series has too many labels. The new message format has the series at the end, which plays better with Prometheus log truncation. #3718
- From: sample for '<series>' has <value> label names; limit <value>
- To: series has too many labels (actual: <value>, limit: <value>) series: '<series>'
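Because the new format puts the counts before the series, they can still be extracted when the tail of the message is cut off. The sketch below assumes both <value> placeholders are rendered as plain integers; the function name parse_too_many_labels and the sample series are hypothetical.

```python
import re

# Matches only the leading part of the new message, so a truncated series is fine.
TOO_MANY_LABELS = re.compile(
    r"series has too many labels \(actual: (?P<actual>\d+), limit: (?P<limit>\d+)\)"
)


def parse_too_many_labels(message):
    """Return (actual, limit) from the new-format message, or None if absent."""
    match = TOO_MANY_LABELS.search(message)
    if match is None:
        return None
    return int(match["actual"]), int(match["limit"])
```

A log line truncated midway through the series, such as "series has too many labels (actual: 35, limit: 30) series: 'up{jo", still yields both numbers.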
[ENHANCEMENT] Improve bucket index loader to handle edge case where new tenant has not had blocks uploaded to storage yet. #3717
[BUGFIX] Allow -querier.max-query-lookback to use the y|w|d suffixes, like the deprecated -store.max-look-back-period. #3598
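For readers unfamiliar with these suffixes, the sketch below converts such a value to seconds. It assumes the Prometheus duration convention (d = 24 hours, w = 7 days, y = 365 days) and handles only a single integer followed by one of the three suffixes; parse_lookback is a hypothetical name, not Cortex's parser.

```python
import re

# Assumption: Prometheus-style units, where a year is always 365 days.
UNIT_SECONDS = {
    "d": 24 * 3600,
    "w": 7 * 24 * 3600,
    "y": 365 * 24 * 3600,
}


def parse_lookback(value):
    """Convert a value such as '2w' into a number of seconds."""
    match = re.fullmatch(r"(\d+)([ywd])", value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    amount, unit = match.groups()
    return int(amount) * UNIT_SECONDS[unit]
```

With these assumptions, "3d" corresponds to 259200 seconds and "2w" to 1209600 seconds.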
[BUGFIX] Memberlist: Entry in the ring should now not appear again after using "Forget" feature (unless it's still heartbeating). #3603
[BUGFIX] Ingester: do not close idle TSDBs while blocks shipping is in progress. #3630 #3632
[BUGFIX] Ingester: correctly update cortex_ingester_memory_users and cortex_ingester_active_series when a tenant's idle TSDB is closed, when running Cortex with the blocks storage. #3646
[BUGFIX] Querier: fix default value incorrectly overriding -querier.frontend-address in single-binary mode. #3650
[BUGFIX] Compactor: delete deletion-mark.json last when deleting a block, so that partial blocks are not left in the bucket without a deletion mark if the compactor is interrupted while deleting a block. #3660
[BUGFIX] Blocks storage: do not clean up a partially uploaded block when the meta.json upload fails. Despite the failure to upload meta.json, this file may in some cases still appear in the bucket later. By skipping early cleanup, we avoid having corrupted blocks in the storage. #3660
[BUGFIX] Alertmanager: disable access to /alertmanager/metrics (which exposes all Cortex metrics), /alertmanager/-/reload and /alertmanager/debug/*, which were available to any authenticated user when the Alertmanager was enabled. #3678
[BUGFIX] Query-Frontend: avoid creating many small sub-queries by discarding cache extents under 5 minutes. #3653
[BUGFIX] Ruler: Ensure the stale markers generated for evaluated rules respect the configured -ruler.evaluation-delay-duration. This avoids issues with NaN samples being persisted with timestamps set ahead of the next rule evaluation. #3687
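The idea behind this fix can be sketched as a simple filter. This is not the query-frontend's actual implementation: the (start_ms, end_ms) tuple representation and the function name drop_small_extents are assumptions made purely for illustration.

```python
# Threshold from the changelog entry: extents under 5 minutes are discarded.
MIN_EXTENT_MS = 5 * 60 * 1000


def drop_small_extents(extents):
    """Keep only cached extents that cover at least five minutes.

    Each extent is a hypothetical (start_ms, end_ms) pair in milliseconds.
    """
    return [
        (start, end)
        for start, end in extents
        if end - start >= MIN_EXTENT_MS
    ]
```

Dropping tiny extents means the gaps around them are fetched as part of one larger sub-query instead of being split into many small ones.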
[BUGFIX] Alertmanager: don't serve HTTP requests until Alertmanager has fully started. Serving HTTP requests earlier may result in loss of configuration for the user. #3679
[BUGFIX] Do not log "failed to load config" if runtime config file is empty. #3706
[BUGFIX] Do not allow using a runtime config file containing multiple YAML documents. #3706
[BUGFIX] HA Tracker: don't track an intentionally aborted CAS operation as an error in the cortex_kv_request_duration_seconds metric. #3745
