Cortex 1.20.0 Release Notes
This release contains 368 contributions from 38 contributors. We also have 14 new contributors. Thank you all for the contributions!
Some notable changes in this release are:
- Prometheus Remote Write 2.0 Support: Experimental support for Prometheus Remote Write 2.0 protocol.
- Parquet Format Support: Experimental Parquet based block storage. A new parquet converter service to convert TSDB blocks to parquet and querier to query parquet files.
- Query Federation with Regex Tenant Resolver: Introduce experimental regex tenant resolver allowing regex patterns in `X-Scope-OrgID` header via `-tenant-federation.regex-matcher-enabled` flag
- gRPC Stream Push between Distributor and Ingester: Experimental feature to use gRPC stream connections for push requests.
- More Native Histogram Support: Out-of-order native histogram ingestion support, per-tenant native histogram ingestion config, native histogram active series metrics and limits
- Resource-Based Monitor and Limiter: `ResourceMonitor` to collect CPU and Heap usage for Cortex and `ResourceBasedLimiter` in Ingesters and StoreGateways to protect the service from incoming requests when hitting limits
- UTF-8 Name: UTF-8 name support via `-name-validation-scheme` flag
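As an illustration of the regex tenant resolver listed above, the following is a minimal Python sketch of how a regex passed in `X-Scope-OrgID` could be expanded into a set of tenant IDs. It is not Cortex's implementation: the function name `resolve_tenants`, the `known_tenants` list (standing in for the users discovered by scanning block storage), and the choice to require a full match are all assumptions made for this example.

```python
import re


def resolve_tenants(org_id_header: str, known_tenants: list[str]) -> list[str]:
    """Return the known tenant IDs that the header's regex matches in full.

    Hypothetical helper: `known_tenants` stands in for the tenants found by
    the periodic block-storage scan. Full-match anchoring is an assumption.
    """
    pattern = re.compile(org_id_header)
    return sorted(t for t in known_tenants if pattern.fullmatch(t))


# Example tenant IDs, invented for illustration only.
known = ["team-a", "team-b", "ops", "team-c-staging"]
print(resolve_tenants("team-.+", known))
# prints ['team-a', 'team-b', 'team-c-staging']
```

A tenant that has not yet been discovered by the scan would simply be absent from `known_tenants`, so it would not appear in the result even if the regex matches its ID.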
What's Changed
- [CHANGE] StoreGateway/Alertmanager: Add default 5s connection timeout on client. #6603
- [CHANGE] Ingester: Remove EnableNativeHistograms config flag and instead gate keep through new per-tenant limit at ingestion. #6718
- [CHANGE] Validate the tenant ID when using a single tenant resolver. #6727
- [CHANGE] Ring: Add zone label to ring_members metric. #6900
- [FEATURE] Distributor: Add an experimental `-distributor.otlp.enable-type-and-unit-labels` flag to add `__type__` and `__unit__` labels for OTLP metrics. #6969
- [FEATURE] Distributor: Add an experimental `-distributor.otlp.allow-delta-temporality` flag to ingest delta temporality otlp metrics. #6934
- [FEATURE] Query Frontend: Add dynamic interval size for query splitting. This is enabled by configuring experimental flags `querier.max-shards-per-query` and/or `querier.max-fetched-data-duration-per-query`. The split interval size is dynamically increased to maintain a number of shards and total duration fetched below the configured values. #6458
- [FEATURE] Querier/Ruler: Add `query_partial_data` and `rules_partial_data` limits to allow queries/rules to be evaluated with data from a single zone, if other zones are not available. #6526
- [FEATURE] Update prometheus alertmanager version to v0.28.0 and add new integration msteamsv2, jira, and rocketchat. #6590
- [FEATURE] Ingester/StoreGateway: Add `ResourceMonitor` module in Cortex, and add `ResourceBasedLimiter` in Ingesters and StoreGateways. #6674
- [FEATURE] Support Prometheus remote write 2.0. #6330
- [FEATURE] Ingester: Support out-of-order native histogram ingestion. It is automatically enabled when `-ingester.out-of-order-time-window > 0` and `-blocks-storage.tsdb.enable-native-histograms=true`. #6626 #6663
- [FEATURE] Ruler: Add support for percentage based sharding for rulers. #6680
- [FEATURE] Ruler: Add support for group labels. #6665
- [FEATURE] Query federation: Introduce a regex tenant resolver to allow regex in `X-Scope-OrgID` value. #6713
  - Add an experimental `tenant-federation.regex-matcher-enabled` flag. If it is enabled, users can pass a regex in `X-Scope-OrgId`, and the matched tenant IDs are automatically included. User discovery is based on scanning block storage, so new users can be queried only after uploading a block (generally 2h).
  - Add an experimental `tenant-federation.user-sync-interval` flag, which specifies how frequently to scan users. The scanned users are used to calculate the matched tenant IDs.
- [FEATURE] Experimental Support Parquet format: Implement parquet converter service to convert a TSDB block into Parquet and Parquet Queryable. #6716 #6743
- [FEATURE] Distributor/Ingester: Implemented experimental feature to use gRPC stream connection for push requests. This can be enabled by setting `-distributor.use-stream-push=true`. #6580
- [FEATURE] Compactor: Add support for percentage based sharding for compactors. #6738
- [FEATURE] Querier: Allow choosing PromQL engine via header `X-PromQL-EngineType`. #6777
- [FEATURE] Querier: Support for configuring query optimizers and enabling XFunctions in the Thanos engine. #6873
- [FEATURE] Query Frontend: Add support for /api/v1/format_query API for formatting queries. #6893
- [FEATURE] Query Frontend: Add support for /api/v1/parse_query API (experimental) to parse a PromQL expression and return it as a JSON-formatted AST (abstract syntax tree). #6978
- [ENHANCEMENT] Upgrade the Prometheus version to 3.6.0 and add a `-name-validation-scheme` flag to support UTF-8. #7040 #7056
- [ENHANCEMENT] Distributor: Emit an error with a 400 status code when empty labels are found before the relabelling or label dropping process. #7052
- [ENHANCEMENT] Parquet Storage: Add support for additional sort columns during Parquet file generation #7003
- [ENHANCEMENT] Modernizes the entire codebase by using go modernize tool. #7005
- [ENHANCEMENT] Overrides Exporter: Expose all fields that can be converted to float64. Also, the label value `max_local_series_per_metric` got renamed to `max_series_per_metric`, and `max_local_series_per_user` got renamed to `max_series_per_user`. #6979
- [ENHANCEMENT] Ingester: Add `cortex_ingester_tsdb_wal_replay_unknown_refs_total` and `cortex_ingester_tsdb_wbl_replay_unknown_refs_total` metrics to track unknown series references during wal/wbl replaying. #6945
- [ENHANCEMENT] Distributor: Introduce a Protobuf model for Prometheus Remote Write 2.0 and a pool to improve performance. #6917
- [ENHANCEMENT] Ruler: Emit an error message when the rule synchronization fails. #6902
- [ENHANCEMENT] Querier: Support snappy and zstd response compression for `-querier.response-compression` flag. #6848
- [ENHANCEMENT] Tenant Federation: Add a # of query result limit logic when the `-tenant-federation.regex-matcher-enabled` is enabled. #6845
- [ENHANCEMENT] Query Frontend: Add a `cortex_slow_queries_total` metric to track # of slow queries per user. #6859
- [ENHANCEMENT] Query Frontend: Change to return 400 when tenant resolving fails. #6715
- [ENHANCEMENT] Querier: Support query parameters to metadata api (/api/v1/metadata) to allow users to limit the metadata returned. Add a `-ingester.return-all-metadata` flag to keep the metadata API running during deployment. Please set this flag to `false` to use the metadata API with the limits later. #6681 #6744
- [ENHANCEMENT] Ingester: Add a `cortex_ingester_active_native_histogram_series` metric to track # of active NH series. #6695
- [ENHANCEMENT] Query Frontend: Add new limit `-frontend.max-query-response-size` for total query response size after decompression in query frontend. #6607
- [ENHANCEMENT] Alertmanager: Add nflog and silences maintenance metrics. #6659
- [ENHANCEMENT] Querier: limit label APIs to query only ingesters if `start` param is not specified. #6618
- [ENHANCEMENT] Alertmanager: Add new limits `-alertmanager.max-silences-count` and `-alertmanager.max-silences-size-bytes` for limiting silences per tenant. #6605
- [ENHANCEMENT] Add `compactor.auto-forget-delay` for compactor to auto forget compactors after X minutes without heartbeat. #6533
- [ENHANCEMENT] StoreGateway: Emit more histogram buckets on the `cortex_querier_storegateway_refetches_per_query` metric. #6570
- [ENHANCEMENT] Querier: Apply bytes limiter to LabelNames and LabelValuesForLabelNames. #6568
- [ENHANCEMENT] Query Frontend: Add a `too_many_tenants` reason label value to `cortex_rejected_queries_total` metric to track the rejected query count due to the # of tenant limits. #6569
- [ENHANCEMENT] Alertmanager: Add receiver validations for msteamsv2 and rocketchat. #6606
- [ENHANCEMENT] Query Frontend: Add a `-frontend.enabled-ruler-query-stats` flag to configure whether to report the query stats log for queries coming from the Ruler. #6504
- [ENHANCEMENT] OTLP: Support otlp metadata ingestion. #6617
- [ENHANCEMENT] AlertManager: Add `keep_instance_in_the_ring_on_shutdown` and `tokens_file_path` configs for alertmanager ring. #6628
- [ENHANCEMENT] Querier: Add metric and enhanced logging for query partial data. #6676
- [ENHANCEMENT] Ingester: Push request should fail when label set is out of order #6746
- [ENHANCEMENT] Querier: Add `querier.ingester-query-max-attempts` to retry on partial data. #6714
- [ENHANCEMENT] Distributor: Add min/max schema validation for Native Histogram. #6766
- [ENHANCEMENT] Ingester: Handle runtime errors in query path #6769
- [ENHANCEMENT] Compactor: Support metadata caching bucket for Cleaner. Can be enabled via `-compactor.cleaner-caching-bucket-enabled` flag. #6778
- [ENHANCEMENT] Distributor: Add ingestion rate limit for Native Histogram. #6794 and #6994
- [ENHANCEMENT] Ingester: Add active series limit specifically for Native Histogram. #6796
- [ENHANCEMENT] Compactor, Store Gateway: Introduce user scanner strategy and user index. #6780
- [ENHANCEMENT] Querier: Support chunks cache for parquet queryable. #6805
- [ENHANCEMENT] Parquet Storage: Add some metrics for parquet blocks and converter. #6809 #6821
- [ENHANCEMENT] Compactor: Optimize cleaner run time. #6815
- [ENHANCEMENT] Parquet Storage: Allow percentage based dynamic shard size for Parquet Converter. #6817
- [ENHANCEMENT] Query Frontend: Enhance the performance of the JSON codec. #6816
- [ENHANCEMENT] Compactor: Emit partition metrics separate from cleaner job. #6827
- [ENHANCEMENT] Metadata Cache: Support inmemory and multi level cache backend. #6829
- [ENHANCEMENT] Store Gateway: Allow to ignore syncing blocks older than certain time using `ignore_blocks_before`. #6830
- [ENHANCEMENT] Distributor: Add native histograms max sample size bytes limit validation. #6834
- [ENHANCEMENT] Querier: Support caching parquet labels file in parquet queryable. #6835
- [ENHANCEMENT] Querier: Support query limits in parquet queryable. #6870
- [ENHANCEMENT] Ingester: Add new metric `cortex_ingester_push_errors_total` to track reasons for ingester request failures. #6901
- [ENHANCEMENT] Ring: Expose `detailed_metrics_enabled` for all rings. Default true. #6926
- [ENHANCEMENT] Parquet Storage: Allow Parquet Queryable to disable fallback to Store Gateway. #6920
- [ENHANCEMENT] Query Frontend: Add a `format_query` and `parse_query` labels value to the `op` label at `cortex_query_frontend_queries_total` metric. #6925 #6990
- [ENHANCEMENT] API: add request ID injection to context to enable tracking requests across downstream services. #6895
- [ENHANCEMENT] gRPC: Add gRPC Channelz monitoring. #6950
- [ENHANCEMENT] Upgrade build image and Go version to 1.24.6. #6970 #6976
- [ENHANCEMENT] Implement versioned transactions for writes to DynamoDB ring. #6986
- [ENHANCEMENT] Add source metadata to requests(api vs ruler) #6947
- [ENHANCEMENT] Add new metric `cortex_discarded_series` and `cortex_discarded_series_per_labelset` to track number of series that have a discarded sample. #6995
- [ENHANCEMENT] Ingester: Add `cortex_ingester_tsdb_head_stale_series` metric to keep track of number of stale series on head. #7071
- [ENHANCEMENT] Expose more Go runtime metrics. #7070
- [ENHANCEMENT] Distributor: Filter out label with empty value. #7069
- [BUGFIX] Ingester: Avoid error or early throttling when READONLY ingesters are present in the ring #6517
- [BUGFIX] Ingester: Fix labelset data race condition. #6573
- [BUGFIX] Compactor: Cleaner should not put deletion marker for blocks with no-compact marker. #6576
- [BUGFIX] Compactor: Cleaner would delete bucket index when there is no block in bucket store. #6577
- [BUGFIX] Querier: Fix marshal native histogram with empty bucket when protobuf codec is enabled. #6595
- [BUGFIX] Query Frontend: Fix samples scanned and peak samples query stats when query hits results cache. #6591
- [BUGFIX] Query Frontend: Fix panic caused by nil pointer dereference. #6609
- [BUGFIX] Ingester: Add check to avoid query 5xx when closing tsdb. #6616
- [BUGFIX] Querier: Fix panic when marshaling QueryResultRequest. #6601
- [BUGFIX] Ingester: Avoid resharding for query when restart readonly ingesters. #6642
- [BUGFIX] Query Frontend: Fix query frontend per `user` metrics clean up. #6698
- [BUGFIX] Add `__markers__` tenant ID validation. #6761
- [BUGFIX] Ring: Fix nil pointer exception when token is shared. #6768
- [BUGFIX] Fix race condition in active user. #6773
- [BUGFIX] Ruler: Prevent counting 2xx and 4XX responses as failed writes. #6785
- [BUGFIX] Ingester: Allow shipper to skip corrupted blocks. #6786
- [BUGFIX] Compactor: Delete the prefix `blocks_meta` from the metadata fetcher metrics. #6832
- [BUGFIX] Store Gateway: Avoid race condition by deduplicating entries in bucket stores user scan. #6863
- [BUGFIX] Runtime-config: Change to check tenant limit validation when loading runtime config only for `all`, `distributor`, `querier`, and `ruler` targets. #6880
- [BUGFIX] Distributor: Fix `/distributor/all_user_stats` api to work during rolling updates on ingesters. #7026
- [BUGFIX] Runtime-config: Fix panic when the runtime config is `null`. #7062
- [BUGFIX] Scheduler: Avoid all queriers reserved for prioritized requests. #7057
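The dynamic interval size for query splitting (#6458) increases the split interval so that the number of shards stays below a configured maximum. The Python sketch below illustrates that idea only; it is not the Cortex algorithm. The function `dynamic_split_interval` and its parameters are hypothetical, `max_shards` plays the role of `querier.max-shards-per-query`, and the sketch ignores the fetched-data-duration limit, vertical sharding, and interval alignment.

```python
import math
from datetime import timedelta


def dynamic_split_interval(
    query_range: timedelta, base_interval: timedelta, max_shards: int
) -> timedelta:
    """Pick the smallest multiple of base_interval that keeps the number of
    splits at or below max_shards (hypothetical helper, for illustration)."""
    if max_shards < 1:
        raise ValueError("max_shards must be at least 1")
    # Number of splits if the base interval were used unchanged.
    base_splits = math.ceil(query_range / base_interval)
    # Smallest integer multiplier n with ceil(base_splits / n) <= max_shards.
    multiplier = max(1, math.ceil(base_splits / max_shards))
    return base_interval * multiplier


# A 30-day query with a 1-day base interval and at most 8 shards.
interval = dynamic_split_interval(timedelta(days=30), timedelta(days=1), 8)
print(interval.days)
# prints 4
```

With a 4-day interval the 30-day range is covered by 8 splits, which meets the limit. A short query that already fits within the limit keeps the base interval unchanged.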
New Contributors
- @dankrzeminski32 made their first contribution in #6636
- @CodingFabian made their first contribution in #6662
- @PaurushGarg made their first contribution in #6718
- @Ajay-Satish-01 made their first contribution in #6819
- @bogdan-st made their first contribution in #6818
- @siddarth2810 made their first contribution in #6891
- @rubywtl made their first contribution in #6884
- @guytet made their first contribution in #6915
- @avleentwilio made their first contribution in #6960
- @EpiJunkie made their first contribution in #6970
- @srlobo made their first contribution in #7007
- @vivekgarg20 made their first contribution in #7023
- @Angith made their first contribution in #7003
- @aclaygray made their first contribution in #6889
Full Changelog: v1.19.1...v1.20.0-rc.0