cortexproject/cortex v1.2.0 on GitHub

This release has a number of bug fixes and enhancements, most notably:

Memberlist KV client is no longer considered experimental. #2725
3rd-party index and chunk stores using gRPC client/server plugin mechanism (experimental) #2220
Using an invalid flag no longer causes printing of all available flags. #2691 (my favourite change!)

Many thanks to all contributors.

Detailed list of changes:

[CHANGE] Metric cortex_kv_request_duration_seconds now includes the name label to denote which client is being used, as well as the backend label to denote the KV backend implementation in use. #2648
[CHANGE] Experimental Ruler: Rule groups persisted to object storage using the experimental API have an updated object key encoding to better handle special characters. Rule groups previously stored in object storage must be renamed to the new format. #2646
[CHANGE] Query Frontend now uses Round Robin to choose a tenant queue to service next. #2553
[CHANGE] -promql.lookback-delta is now deprecated and has been replaced by -querier.lookback-delta, along with the lookback_delta entry under querier in the config file. -promql.lookback-delta will be removed in v1.4.0. #2604
[CHANGE] Experimental TSDB: removed -experimental.tsdb.bucket-store.binary-index-header-enabled flag. Now the binary index-header is always enabled.
[CHANGE] Experimental TSDB: Renamed index-cache metrics to use original metric names from Thanos, as Cortex is not aggregating them in any way: #2627
- cortex_<service>_blocks_index_cache_items_evicted_total => thanos_store_index_cache_items_evicted_total{name="index-cache"}
- cortex_<service>_blocks_index_cache_items_added_total => thanos_store_index_cache_items_added_total{name="index-cache"}
- cortex_<service>_blocks_index_cache_requests_total => thanos_store_index_cache_requests_total{name="index-cache"}
- cortex_<service>_blocks_index_cache_items_overflowed_total => thanos_store_index_cache_items_overflowed_total{name="index-cache"}
- cortex_<service>_blocks_index_cache_hits_total => thanos_store_index_cache_hits_total{name="index-cache"}
- cortex_<service>_blocks_index_cache_items => thanos_store_index_cache_items{name="index-cache"}
- cortex_<service>_blocks_index_cache_items_size_bytes => thanos_store_index_cache_items_size_bytes{name="index-cache"}
- cortex_<service>_blocks_index_cache_total_size_bytes => thanos_store_index_cache_total_size_bytes{name="index-cache"}
- cortex_<service>_blocks_index_cache_memcached_operations_total => thanos_memcached_operations_total{name="index-cache"}
- cortex_<service>_blocks_index_cache_memcached_operation_failures_total => thanos_memcached_operation_failures_total{name="index-cache"}
- cortex_<service>_blocks_index_cache_memcached_operation_duration_seconds => thanos_memcached_operation_duration_seconds{name="index-cache"}
- cortex_<service>_blocks_index_cache_memcached_operation_skipped_total => thanos_memcached_operation_skipped_total{name="index-cache"}
[CHANGE] Experimental TSDB: Renamed metrics in bucket stores: #2627
- cortex_<service>_blocks_meta_syncs_total => cortex_blocks_meta_syncs_total{component="<service>"}
- cortex_<service>_blocks_meta_sync_failures_total => cortex_blocks_meta_sync_failures_total{component="<service>"}
- cortex_<service>_blocks_meta_sync_duration_seconds => cortex_blocks_meta_sync_duration_seconds{component="<service>"}
- cortex_<service>_blocks_meta_sync_consistency_delay_seconds => cortex_blocks_meta_sync_consistency_delay_seconds{component="<service>"}
- cortex_<service>_blocks_meta_synced => cortex_blocks_meta_synced{component="<service>"}
- cortex_<service>_bucket_store_block_loads_total => cortex_bucket_store_block_loads_total{component="<service>"}
- cortex_<service>_bucket_store_block_load_failures_total => cortex_bucket_store_block_load_failures_total{component="<service>"}
- cortex_<service>_bucket_store_block_drops_total => cortex_bucket_store_block_drops_total{component="<service>"}
- cortex_<service>_bucket_store_block_drop_failures_total => cortex_bucket_store_block_drop_failures_total{component="<service>"}
- cortex_<service>_bucket_store_blocks_loaded => cortex_bucket_store_blocks_loaded{component="<service>"}
- cortex_<service>_bucket_store_series_data_touched => cortex_bucket_store_series_data_touched{component="<service>"}
- cortex_<service>_bucket_store_series_data_fetched => cortex_bucket_store_series_data_fetched{component="<service>"}
- cortex_<service>_bucket_store_series_data_size_touched_bytes => cortex_bucket_store_series_data_size_touched_bytes{component="<service>"}
- cortex_<service>_bucket_store_series_data_size_fetched_bytes => cortex_bucket_store_series_data_size_fetched_bytes{component="<service>"}
- cortex_<service>_bucket_store_series_blocks_queried => cortex_bucket_store_series_blocks_queried{component="<service>"}
- cortex_<service>_bucket_store_series_get_all_duration_seconds => cortex_bucket_store_series_get_all_duration_seconds{component="<service>"}
- cortex_<service>_bucket_store_series_merge_duration_seconds => cortex_bucket_store_series_merge_duration_seconds{component="<service>"}
- cortex_<service>_bucket_store_series_refetches_total => cortex_bucket_store_series_refetches_total{component="<service>"}
- cortex_<service>_bucket_store_series_result_series => cortex_bucket_store_series_result_series{component="<service>"}
- cortex_<service>_bucket_store_cached_postings_compressions_total => cortex_bucket_store_cached_postings_compressions_total{component="<service>"}
- cortex_<service>_bucket_store_cached_postings_compression_errors_total => cortex_bucket_store_cached_postings_compression_errors_total{component="<service>"}
- cortex_<service>_bucket_store_cached_postings_compression_time_seconds => cortex_bucket_store_cached_postings_compression_time_seconds{component="<service>"}
- cortex_<service>_bucket_store_cached_postings_original_size_bytes_total => cortex_bucket_store_cached_postings_original_size_bytes_total{component="<service>"}
- cortex_<service>_bucket_store_cached_postings_compressed_size_bytes_total => cortex_bucket_store_cached_postings_compressed_size_bytes_total{component="<service>"}
- cortex_<service>_blocks_sync_seconds => cortex_bucket_stores_blocks_sync_seconds{component="<service>"}
- cortex_<service>_blocks_last_successful_sync_timestamp_seconds => cortex_bucket_stores_blocks_last_successful_sync_timestamp_seconds{component="<service>"}
[CHANGE] Available command-line flags are printed to stdout, and only when requested via -help. Using an invalid flag no longer causes printing of all available flags. #2691
[CHANGE] Experimental Memberlist ring: randomize gossip node names to avoid conflicts when running multiple clients on the same host, or reusing host names (e.g. pods in a StatefulSet). Node name randomization can be disabled by using -memberlist.randomize-node-name=false. #2715
[CHANGE] Memberlist KV client is no longer considered experimental. #2725
[CHANGE] Experimental Delete Series: Make delete request cancellation duration configurable. #2760
[CHANGE] Removed -store.fullsize-chunks option which was undocumented and unused (it broke ingester hand-overs). #2656
[CHANGE] Queries with no metric name, which previously resulted in HTTP status code 500, now return status code 422 instead. #2571
[FEATURE] TLS config options added for GRPC clients in Querier (Query-frontend client & Ingester client), Ruler, Store Gateway, as well as HTTP client in Config store client. #2502
[FEATURE] The flag -frontend.max-cache-freshness is now supported within the limits overrides, to specify per-tenant max cache freshness values. The corresponding YAML config parameter has been changed from results_cache.max_freshness to limits_config.max_cache_freshness. The legacy YAML config parameter (results_cache.max_freshness) will continue to be supported till Cortex release v1.4.0. #2609
[FEATURE] Experimental gRPC Store: Added support for 3rd-party index and chunk stores using a gRPC client/server plugin mechanism. #2220
[FEATURE] Add -cassandra.table-options flag to customize table options of Cassandra when creating the index or chunk table. #2575
[ENHANCEMENT] Propagate the GOPROXY value when building build-image. This helps when building the code in a network where the default Go proxy is not accessible (e.g. when behind a corporate VPN). #2741
[ENHANCEMENT] Querier: Added metric cortex_querier_request_duration_seconds for all requests to the querier. #2708
[ENHANCEMENT] Cortex is now built with Go 1.14. #2480 #2749 #2753
[ENHANCEMENT] Experimental TSDB: added the following metrics to the ingester: #2580 #2583 #2589 #2654
- cortex_ingester_tsdb_appender_add_duration_seconds
- cortex_ingester_tsdb_appender_commit_duration_seconds
- cortex_ingester_tsdb_refcache_purge_duration_seconds
- cortex_ingester_tsdb_compactions_total
- cortex_ingester_tsdb_compaction_duration_seconds
- cortex_ingester_tsdb_wal_fsync_duration_seconds
- cortex_ingester_tsdb_wal_page_flushes_total
- cortex_ingester_tsdb_wal_completed_pages_total
- cortex_ingester_tsdb_wal_truncations_failed_total
- cortex_ingester_tsdb_wal_truncations_total
- cortex_ingester_tsdb_wal_writes_failed_total
- cortex_ingester_tsdb_checkpoint_deletions_failed_total
- cortex_ingester_tsdb_checkpoint_deletions_total
- cortex_ingester_tsdb_checkpoint_creations_failed_total
- cortex_ingester_tsdb_checkpoint_creations_total
- cortex_ingester_tsdb_wal_truncate_duration_seconds
- cortex_ingester_tsdb_head_active_appenders
- cortex_ingester_tsdb_head_series_not_found_total
- cortex_ingester_tsdb_head_chunks
- cortex_ingester_tsdb_mmap_chunk_corruptions_total
- cortex_ingester_tsdb_head_chunks_created_total
- cortex_ingester_tsdb_head_chunks_removed_total
[ENHANCEMENT] Experimental TSDB: added metrics useful to alert on critical conditions of the blocks storage: #2573
- cortex_compactor_last_successful_run_timestamp_seconds
- cortex_querier_blocks_last_successful_sync_timestamp_seconds (when store-gateway is disabled)
- cortex_querier_blocks_last_successful_scan_timestamp_seconds (when store-gateway is enabled)
- cortex_storegateway_blocks_last_successful_sync_timestamp_seconds
[ENHANCEMENT] Experimental TSDB: added the flag -experimental.tsdb.wal-compression-enabled to allow enabling TSDB WAL compression. #2585
[ENHANCEMENT] Experimental TSDB: Querier and store-gateway components can now use a so-called "caching bucket", which can currently cache fetched chunks in a shared memcached server. #2572
[ENHANCEMENT] Ruler: Automatically remove unhealthy rulers from the ring. #2587
[ENHANCEMENT] Query-tee: added support for the /metadata, /alerts, and /rules endpoints. #2600
[ENHANCEMENT] Query-tee: added support for comparing query results between two different backends. The comparison is disabled by default and can be enabled via -proxy.compare-responses=true. #2611
[ENHANCEMENT] Query-tee: improved the query-tee to not wait for all backend responses before sending back the response to the client. The query-tee now sends back to the client the first successful response, while honoring the -backend.preferred option. #2702
[ENHANCEMENT] Thanos and Prometheus upgraded. #2602 #2604 #2634 #2659 #2686 #2756
- TSDB now holds fewer WAL files after Head Truncation.
- TSDB now memory-maps Head chunks, which reduces memory usage.
[ENHANCEMENT] Experimental TSDB: decoupled blocks deletion from blocks compaction in the compactor, so that blocks deletion is not blocked by a busy compactor. The following metrics have been added: #2623
- cortex_compactor_block_cleanup_started_total
- cortex_compactor_block_cleanup_completed_total
- cortex_compactor_block_cleanup_failed_total
- cortex_compactor_block_cleanup_last_successful_run_timestamp_seconds
[ENHANCEMENT] Experimental TSDB: Use a shared cache for metadata. This is especially useful when running multiple querier and store-gateway components, to reduce the number of object store API calls. #2626 #2640
[ENHANCEMENT] Experimental TSDB: when -querier.query-store-after is configured and running the experimental blocks storage, the time range of the query sent to the store is now manipulated to ensure the query end time is not more recent than 'now - query-store-after'. #2642
[ENHANCEMENT] Experimental TSDB: small performance improvement in concurrent usage of RefCache, used during samples ingestion. #2651
[ENHANCEMENT] The following endpoints now respond appropriately to an Accept header with the value application/json: #2673
- /distributor/all_user_stats
- /distributor/ha_tracker
- /ingester/ring
- /store-gateway/ring
- /compactor/ring
- /ruler/ring
- /services
[ENHANCEMENT] Experimental Cassandra backend: Add -cassandra.num-connections to allow increasing the number of TCP connections to each Cassandra server. #2666
[ENHANCEMENT] Experimental Cassandra backend: Use separate Cassandra clients and connections for reads and writes. #2666
[ENHANCEMENT] Experimental Cassandra backend: Add -cassandra.reconnect-interval to allow specifying the reconnect interval to a Cassandra server that has been marked DOWN by the gocql driver. Also change the default value of the reconnect interval from 60s to 1s. #2687
[ENHANCEMENT] Experimental Cassandra backend: Add option -cassandra.convict-hosts-on-failure=false to not convict a host of being down when a request fails. #2684
[ENHANCEMENT] Experimental TSDB: Applied a jitter to the periodic bucket scans in order to better distribute bucket operations over time and increase the probability of hitting the shared cache (if configured). #2693
[ENHANCEMENT] Experimental TSDB: Series limits per user and per metric now work in TSDB blocks. #2676
[ENHANCEMENT] Experimental Memberlist: Added ability to periodically rejoin the memberlist cluster. #2724
[ENHANCEMENT] Experimental Delete Series: Added the following metrics for monitoring processing of delete requests: #2730
- cortex_purger_load_pending_requests_attempts_total: Number of attempts that were made to load pending requests with status.
- cortex_purger_oldest_pending_delete_request_age_seconds: Age of oldest pending delete request in seconds.
- cortex_purger_pending_delete_requests_count: Count of requests which are in process or are ready to be processed.
[ENHANCEMENT] Experimental TSDB: Improved the compactor to also hard-delete partial blocks with a deletion mark (even if the deletion mark threshold has not been reached). #2751
[ENHANCEMENT] Experimental TSDB: Introduced a consistency check done by the querier to ensure all expected blocks have been queried via the store-gateway. If a block is missing on a store-gateway, the querier retries fetching series from missing blocks up to 3 times. If the consistency check fails once all retries have been exhausted, the query execution fails. The following metrics have been added: #2593 #2630 #2689 #2695
- cortex_querier_blocks_consistency_checks_total
- cortex_querier_blocks_consistency_checks_failed_total
- cortex_querier_storegateway_refetches_per_query
[ENHANCEMENT] Delete requests can now be canceled. #2555
[ENHANCEMENT] Table manager can now provision tables for the delete store. #2546
[BUGFIX] Ruler: Ensure temporary rule files with special characters are properly mapped and cleaned up. #2506
[BUGFIX] Fixes #2411: ensure requests are properly routed to the Prometheus API embedded in the query if -server.path-prefix is set. #2372
[BUGFIX] Experimental TSDB: fixed chunk data corruption when querying back series using the experimental blocks storage. #2400
[BUGFIX] Fixed collection of tracing spans from Thanos components used internally. #2655
[BUGFIX] Experimental TSDB: fixed memory leak in ingesters. #2586
[BUGFIX] QueryFrontend: fixed a situation where HTTP error is ignored and an incorrect status code is set. #2590
[BUGFIX] Ingester: Fix an ingester starting up in the JOINING state and staying there forever. #2565
[BUGFIX] QueryFrontend: fixed a panic (integer divide by zero) in the query-frontend. The query-frontend now requires the -querier.default-evaluation-interval config to be set to the same value as in the querier. #2614
[BUGFIX] Experimental TSDB: when the querier receives a /series request with a time range older than the data stored in the ingester, it now ignores the requested time range and returns known series anyway instead of returning an empty response. This aligns the behaviour with the chunks storage. #2617
[BUGFIX] Cassandra: fixed an edge case leading to an invalid CQL query when querying the index on a Cassandra store. #2639
[BUGFIX] Ingester: increment series per metric when recovering from WAL or transfer. #2674
[BUGFIX] Fixed the Redis error "wrong number of arguments for 'mget' command" that occurred when a query has no chunks to look up from storage. #2700 #2796
[BUGFIX] Ingester: Automatically remove old tmp checkpoints, fixing a potential disk space leak after an ingester crashes. #2726
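
For anyone updating dashboards or alerts after the Experimental TSDB metric renames in #2627, the mapping listed above can be expressed as a small lookup helper. The following is a minimal Python sketch and not part of Cortex: rename_metric and its service argument are hypothetical names, and the service value used in the usage note (querier) is an assumed stand-in for the <service> placeholder.

```python
from typing import Optional

# Label attached to every renamed index-cache metric (from the #2627 list).
INDEX_CACHE_LABEL = '{name="index-cache"}'

# The two metrics that move under the cortex_bucket_stores_ prefix.
BUCKET_STORES_SUFFIXES = {
    "blocks_sync_seconds",
    "blocks_last_successful_sync_timestamp_seconds",
}

MEMCACHED_PREFIX = "blocks_index_cache_memcached_"
INDEX_CACHE_PREFIX = "blocks_index_cache_"


def rename_metric(old_name: str, service: str) -> Optional[str]:
    """Return the v1.2.0 selector for an old metric name, or None if the
    name is not covered by the #2627 rename lists.

    `service` is the value that replaces <service> in the old name
    (hypothetical parameter; the caller must know which service it is).
    """
    prefix = f"cortex_{service}_"
    if not old_name.startswith(prefix):
        return None
    suffix = old_name[len(prefix):]
    component = f'{{component="{service}"}}'

    # Memcached index-cache metrics: thanos_memcached_*{name="index-cache"}.
    # Checked first because they also match the generic index-cache prefix.
    if suffix.startswith(MEMCACHED_PREFIX):
        rest = suffix[len(MEMCACHED_PREFIX):]
        return f"thanos_memcached_{rest}{INDEX_CACHE_LABEL}"

    # Other index-cache metrics: thanos_store_index_cache_*{name="index-cache"}.
    if suffix.startswith(INDEX_CACHE_PREFIX):
        rest = suffix[len(INDEX_CACHE_PREFIX):]
        return f"thanos_store_index_cache_{rest}{INDEX_CACHE_LABEL}"

    # Sync metrics: cortex_bucket_stores_*{component="<service>"}.
    if suffix in BUCKET_STORES_SUFFIXES:
        return f"cortex_bucket_stores_{suffix}{component}"

    # Remaining bucket-store metrics keep their suffix and gain a label.
    if suffix.startswith(("blocks_meta_", "bucket_store_")):
        return f"cortex_{suffix}{component}"

    return None
```

Calling rename_metric("cortex_querier_blocks_meta_synced", "querier") yields cortex_blocks_meta_synced{component="querier"}. Names outside the rename lists return None, so a caller can leave those selectors untouched.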
