github grafana/mimir mimir-3.1.0-rc.0
3.1.0-rc.0

Pre-release

This release contains 1433 PRs from 97 authors, including new contributors Alex Weaver, Ali Asghar, Anas, Andy Hay, Bernd Hois, Charbel Mitri, Chris, CR, Dominik Eisenberg, Elsa Adjei, Federico Torres, francoposa, Gerard van Engelen, HuanMeng, imishchuk-tsgs, Jara Suárez de Puga García, Juliette O, Justin Grothe, Kai Udo, Karl Skewes, Karol Chrapek, Kim Nylander, Kyle Fazzari, Lars Lehtonen, Laurent Dufresne, lif, Manas Srivastava, Manuel Alonso, Mariell Hoversholm, Nico Pazos, Nikolai Tikhonov, Olzhas, Ömer Çengel, Pavel Panfilov, psauvage, Q, Rodrigo Kellermann, Sander Ruitenbeek, Satyam Raj, sherinabr, Shouhei, Soya Kubodera, srpvpn, Thimo Soet. Thank you!

Grafana Mimir version 3.1.0-rc.0 release notes

Grafana Labs is excited to announce version 3.1 of Grafana Mimir.

The highlights that follow include the top features, enhancements, and bug fixes in this release. For the complete list of changes, refer to the CHANGELOG.

Features and enhancements

Grafana Mimir version 3.1 includes the following key features and enhancements.

More Kafka options for Ingest storage

Ingest storage now supports additional ways to authenticate with Kafka through the new -ingest-storage.kafka.sasl-mechanism flag, including SCRAM, OAUTHBEARER, and AWS MSK IAM authentication. In addition, new -ingest-storage.kafka.tls* flags allow connecting to Kafka clusters over TLS, including mTLS.

You can also configure multiple Kafka seed brokers via comma-separated values in -ingest-storage.kafka.address and enable rack-aware consumption with -ingest-storage.kafka.client-rack.

Separate ingestion limits by tenant metadata

Distributors can now apply limits separately based on tenant metadata. Operators can use this to enforce separate limits for subsets of write requests that belong to the same tenant, for example, to prioritize some sources of metrics over others.

Clients may pass tenant metadata in the X-Scope-OrgID header using the format tenantID:key1=value1:key2=value2, and operators may define per-metadata overrides in the runtime configuration.
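To illustrate the header format above, the following Python sketch builds and parses an X-Scope-OrgID value that carries tenant metadata. It is not Mimir code. The function names are hypothetical, and the sketch assumes that keys and values contain no colons and that keys contain no equals sign, because these release notes do not describe any escaping rules.

```python
def parse_org_id(header: str) -> tuple[str, dict[str, str]]:
    """Split 'tenantID:key1=value1:key2=value2' into the tenant ID and its metadata."""
    tenant_id, *pairs = header.split(":")
    if not tenant_id:
        raise ValueError("missing tenant ID")
    metadata: dict[str, str] = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")  # split on the first '=' only
        if not sep or not key:
            raise ValueError(f"malformed metadata entry: {pair!r}")
        metadata[key] = value
    return tenant_id, metadata


def format_org_id(tenant_id: str, metadata: dict[str, str]) -> str:
    """Build the header value; keys are sorted only to make the output deterministic."""
    return ":".join([tenant_id] + [f"{k}={v}" for k, v in sorted(metadata.items())])


# A client whose Prometheus traffic should be limited separately might send:
header = format_org_id("team-a", {"source": "prometheus"})
print(header)                # team-a:source=prometheus
print(parse_org_id(header))  # ('team-a', {'source': 'prometheus'})
```

The client only labels its traffic. Which limits apply to a given set of metadata is decided by the per-metadata overrides that operators define in the runtime configuration.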

Mimir Query Engine (MQE) improvements

MQE continues to receive significant optimizations in this release:

  • Support for experimental PromQL extended range selector modifiers smoothed and anchored, enabled with -query-frontend.enabled-promql-extended-range-selectors=smoothed,anchored.
  • Optimization passes for common subexpression elimination, subset selector elimination, projection pushdown, and multi-aggregation without buffering.
  • Improved per-query memory consumption limit enforcement in a variety of scenarios.
  • Experimental support for splitting and caching intermediate results for functions over range vectors in instant queries.
  • Support for the experimental info() PromQL function.
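To illustrate the idea behind common subexpression elimination, the following toy Python sketch hash-conses a PromQL-like expression tree, represented as nested tuples, so that structurally identical subtrees collapse into one node that an engine could evaluate once and share. This is a generic illustration of the technique, not MQE's implementation; the tuple representation and the function name are hypothetical.

```python
def eliminate_common_subexpressions(expr) -> tuple[int, dict[int, tuple]]:
    """Return the root node ID and a table of unique nodes.

    expr is either a leaf string (a selector) or a tuple (operator, *children).
    Structurally identical subtrees receive the same node ID.
    """
    nodes: dict[int, tuple] = {}
    ids_by_key: dict[tuple, int] = {}

    def visit(e) -> int:
        if isinstance(e, str):
            key = ("leaf", e)
        else:
            op, *children = e
            # Children are replaced by their node IDs, so equal subtrees yield equal keys.
            key = (op, *(visit(child) for child in children))
        if key not in ids_by_key:
            ids_by_key[key] = len(nodes)
            nodes[ids_by_key[key]] = key
        return ids_by_key[key]

    return visit(expr), nodes


# sum(rate(errors[5m])) / (sum(rate(errors[5m])) + sum(rate(ok[5m])))
expr = ("/",
        ("sum", ("rate", "errors[5m]")),
        ("+", ("sum", ("rate", "errors[5m]")), ("sum", ("rate", "ok[5m]"))))
root, nodes = eliminate_common_subexpressions(expr)
print(len(nodes))  # 8 unique nodes instead of the 11 in the original tree
```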

Zone-aware memberlist routing

A new experimental zone-aware routing feature for memberlist reduces cross-AZ data transfer by routing gossip messages within the local availability zone when possible. Configure it with -memberlist.zone-aware-routing.* flags.
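The following Python sketch shows one simplified way to prefer same-zone peers when picking gossip targets. It is an illustration under assumptions, not the memberlist implementation. The function name and the peer map are hypothetical, and the sketch ignores how updates still reach other zones, which any real zone-aware scheme has to handle.

```python
import random


def pick_gossip_targets(local_zone, peers, fanout, rng=None):
    """Pick up to `fanout` peers, preferring those in local_zone.

    peers maps peer name -> availability zone (the local node is not included).
    """
    rng = rng or random.Random()
    local = [name for name, zone in peers.items() if zone == local_zone]
    remote = [name for name, zone in peers.items() if zone != local_zone]
    rng.shuffle(local)   # randomize within each group, as gossip normally does
    rng.shuffle(remote)
    # Same-zone peers come first; other zones only fill the remaining slots.
    return (local + remote)[:fanout]


peers = {"a-1": "zone-a", "a-2": "zone-a", "b-1": "zone-b", "c-1": "zone-c"}
print(sorted(pick_gossip_targets("zone-a", peers, fanout=2)))  # ['a-1', 'a-2']
```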

Additional improvements

Grafana Mimir 3.1 also includes:

  • Store-gateways now verify CRC32 checksums for 1 out of every 128 chunks read from object storage and the chunks cache to detect corruption.
  • GCS uploads are now optionally retryable, with configurable max retries per storage backend.
  • The ruler no longer interacts with disk when loading rules. Rule evaluation failures now include a reason label (operator or user) in prometheus_rule_evaluation_failures_total for better error classification.
  • Query blocking via blocked_queries is now stable and no longer experimental, with support for blocking queries exceeding a time range duration (time_range_longer_than) or with steps smaller than a threshold (step_size_shorter_than).
  • The per-tenant postings-for-matchers cache is now stable.
  • Out-of-order ingestion support is now stable, configured via -ingester.out-of-order-time-window.
  • The -alertmanager.utf8-strict-mode-enabled flag is now stable.
  • The query-scheduler now drains the queue before exiting during shutdown.
  • Distributors support zone-aware rate limiting via -distributor.ring.instance-availability-zone: the global ingestion rate limit is divided by the number of zones and then by the number of distributors in the local zone, instead of by the total number of distributors.
  • Default ingest storage configuration now enables concurrency settings for improved throughput.
  • Optional per-tenant max limits for label name and label value requests via max_label_names_limit and max_label_values_limit.
  • Runtime configuration can now be loaded from HTTP URLs in addition to local files via -runtime-config.file. This may reduce configuration propagation times.
  • Blocked queries configuration is now validated at load time.
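The zone-aware distributor rate limiting described in the changelog (the global ingestion rate limit is divided by the number of zones and by the number of distributors in the local zone, rather than by the total number of distributors) can be worked through with a short Python sketch. The function name and the example numbers are hypothetical.

```python
from typing import Optional


def per_distributor_rate_limit(global_limit: float,
                               distributors_per_zone: dict[str, int],
                               local_zone: Optional[str] = None) -> float:
    """Ingestion rate limit enforced locally by a single distributor."""
    if local_zone is None:
        # Without zone awareness: split evenly across all distributors.
        return global_limit / sum(distributors_per_zone.values())
    # Zone-aware: split across zones, then across the distributors in the local zone.
    return global_limit / len(distributors_per_zone) / distributors_per_zone[local_zone]


zones = {"zone-a": 3, "zone-b": 2, "zone-c": 1}
print(per_distributor_rate_limit(300_000, zones))            # 50000.0
print(per_distributor_rate_limit(300_000, zones, "zone-c"))  # 100000.0
```

With zone awareness, the distributors of each zone together enforce the same share of the global limit (100000 samples/s here) regardless of how many distributors that zone runs, whereas the zone-unaware split gives the single distributor in zone-c only 50000.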
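The load-time validation of blocked_queries rules can be sketched in Python as follows. It covers only the two checks named in the changelog: a rule with an empty pattern, and a rule with regex: true whose pattern is not a valid regular expression. The rule dictionaries mirror the YAML field names, but the function name is hypothetical. Python's re syntax also differs from the Go (RE2-style) syntax that Mimir presumably uses, so a pattern accepted here is not guaranteed to be accepted by Mimir.

```python
import re


def validate_blocked_queries(rules: list[dict]) -> list[str]:
    """Return one error message per invalid rule; an empty list means the config is valid."""
    errors = []
    for i, rule in enumerate(rules):
        pattern = rule.get("pattern", "")
        if not pattern:
            errors.append(f"rule {i}: empty pattern")
        elif rule.get("regex", False):
            try:
                re.compile(pattern)
            except re.error as exc:
                errors.append(f"rule {i}: invalid regular expression: {exc}")
    return errors


rules = [
    {"pattern": "up", "regex": False},
    {"pattern": ""},                          # rejected: empty pattern
    {"pattern": "rate(foo[", "regex": True},  # rejected: invalid regular expression
]
for error in validate_blocked_queries(rules):
    print(error)
```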

Important changes

Grafana Mimir 3.1 introduces several updates that change default behavior and configuration. Review these changes before upgrading:

  • Experimental support for disabling ring heartbeats and heartbeat timeouts has been removed.
  • The -target=flusher mode has been removed; use the /ingester/flush HTTP endpoint instead.
  • Uploaded TSDB blocks must now use v2 of the index file format. Store-gateways no longer generate index-headers from v1 index format blocks.
  • Per-step stats are no longer supported when MQE is enabled. The -query-frontend.cache-samples-processed-stats flag is deprecated and has no effect.
  • The -querier.response-streaming-enabled flag has been removed; active series responses are now always streamed.
  • cortex_ingest_storage_writer_buffered_produce_bytes (a Prometheus summary) has been renamed to cortex_ingest_storage_writer_buffered_produce_bytes_distribution. The original name is now used by a new gauge that exports the buffer size.
  • Metric cortex_ingester_owned_target_info_series has been removed.
  • The cost_attribution_labels configuration option has been removed; use cost_attribution_labels_structured instead.
  • -querier.prefer-availability-zone has been renamed to -querier.prefer-availability-zones and now accepts a comma-separated list.
  • The per-query memory consumption limit now considers more sources of memory consumption. As a result, queries that previously succeeded may now fail due to exceeding the memory consumption limit.
  • The following flags have been removed:
    • -distributor.metric-relabeling-enabled
    • -compactor.no-blocks-file-cleanup-enabled
    • -compactor.in-memory-tenant-meta-cache-size
    • -blocks-storage.bucket-store.index-header.eager-loading-startup-enabled
    • *.memcached.dns-ignore-startup-failures
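The following Python sketch illustrates what equal priority across the zones listed in -querier.prefer-availability-zones can mean when choosing which replicas to query first. It is an assumption-laden illustration, not querier code. The function name and the replica tuples are hypothetical, and shuffling the preferred replicas is just one simple way to model equal priority.

```python
import random


def order_replicas(replicas, preferred_zones_flag, rng=None):
    """Order (name, zone) replicas so that those in any preferred zone come first.

    preferred_zones_flag is a comma-separated list, as passed on the command line.
    """
    rng = rng or random.Random()
    preferred = {zone.strip() for zone in preferred_zones_flag.split(",") if zone.strip()}
    first = [r for r in replicas if r[1] in preferred]
    rest = [r for r in replicas if r[1] not in preferred]
    rng.shuffle(first)  # no zone in the list outranks another
    return first + rest


replicas = [("store-gateway-1", "zone-a"), ("store-gateway-2", "zone-b"),
            ("store-gateway-3", "zone-c")]
print(order_replicas(replicas, "zone-a,zone-c")[-1])  # ('store-gateway-2', 'zone-b')
```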

Experimental features

Grafana Mimir 3.1 includes some features that are experimental. Use these features with caution and report any issues that you encounter:

  • New usage-tracker component to enforce series limits before data is ingested.
  • Zone-aware memberlist routing to reduce cross-AZ data transfer.
  • Query planning in query-frontends with distributed execution across queriers.
  • Support in MQE for experimental PromQL extended range selector modifiers (smoothed, anchored).
  • Support in MQE for the experimental info() PromQL function.
  • MQE optimization passes: multi-aggregation, subset selector elimination, common subexpression elimination for range vector expressions.
  • Per-zone store-gateway shard size (-store-gateway.tenant-shard-size-per-zone).
  • Running ingesters with no tokens in the ring when ingest storage is enabled (-ingester.ring.num-tokens=0).
  • Per-sample HA deduplication (-distributor.ha-tracker.per-sample-dedupe).
  • Per-tenant early head compaction for ingesters based on owned series count.
  • Store-gateway excluded zones (-store-gateway.sharding-ring.excluded-zones).
  • Controlling OTLP metric name suffix addition and translation strategy via request headers, gated by -api.otlp-translation-headers-enabled.
  • Memberlist propagation delay tracker (-memberlist.propagation-delay-tracker.enabled).
  • Reporting the number of samples read per query in MQE.
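To show why per-sample HA deduplication matters for mixed-label requests, such as those produced by Prometheus federation, the following Python sketch compares a single per-request decision with a per-series decision. It assumes the HA labels are named cluster and __replica__ and that the elected replica for each cluster is already known. Replica election, replica-label removal, and all function names are hypothetical simplifications.

```python
CLUSTER_LABEL, REPLICA_LABEL = "cluster", "__replica__"  # assumed HA label names


def is_accepted(series: dict[str, str], elected: dict[str, str]) -> bool:
    """Keep series without HA labels, or series sent by the elected replica of their cluster."""
    cluster, replica = series.get(CLUSTER_LABEL), series.get(REPLICA_LABEL)
    if cluster is None or replica is None:
        return True
    # Unknown clusters are accepted here; real election logic is not modeled.
    return elected.get(cluster, replica) == replica


def dedupe_per_request(request, elected):
    """One decision for the whole request, taken from its first series."""
    if request and not is_accepted(request[0], elected):
        return []
    return list(request)


def dedupe_per_sample(request, elected):
    """A separate decision for every series in the request."""
    return [s for s in request if is_accepted(s, elected)]


elected = {"east": "r1", "west": "r1"}
federated = [
    {"__name__": "up", "cluster": "east", "__replica__": "r1"},
    {"__name__": "up", "cluster": "west", "__replica__": "r2"},  # non-elected replica
]
print(len(dedupe_per_request(federated, elected)))  # 2: the west duplicate slips through
print(len(dedupe_per_sample(federated, elected)))   # 1
```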

Bug fixes

For a detailed list of bug fixes, refer to the CHANGELOG.

Helm chart improvements

The Grafana Mimir Helm chart is released independently. Refer to the Grafana Mimir Helm chart documentation.

Changelog

3.1.0-rc.0

Grafana Mimir

  • [CHANGE] Query-frontend: Renamed minimum_step_size filter in blocked_queries configuration to step_size_shorter_than to follow the naming convention of time_range_longer_than. Users with minimum_step_size in their runtime configuration must rename the field. #15081
  • [CHANGE] Query-frontend: blocked_queries configuration is now validated at load time; a configuration error is returned if a rule has an empty pattern, or has regex: true with a pattern that is not a valid regular expression. #14978
  • [CHANGE] Ingester: Changed default value of -include-tenant-id-in-profile-labels from false to true. #13375
  • [CHANGE] Hash ring: removed experimental support for disabling heartbeats (setting -*.ring.heartbeat-period=0) and heartbeat timeouts (setting -*.ring.heartbeat-timeout=0). These configurations are now invalid. #13104
  • [CHANGE] Distributor: removed experimental flag -distributor.metric-relabeling-enabled. #13143
  • [CHANGE] Compactor: removed experimental flag -compactor.no-blocks-file-cleanup-enabled. Cleanup of remaining files when no blocks exist is now always enabled. #13108
  • [CHANGE] Ruler: Add "unknown" alert rule state to alerts and rules on the GET <prometheus-http-prefix>/api/v1/alerts endpoint. Alerts are in the "unknown" state when they haven't yet been evaluated since the ruler started. #13060
  • [CHANGE] All: remove experimental feature that allowed disabling ring heartbeats and timeouts. #13142
  • [CHANGE] Store-gateway: Removed experimental -blocks-storage.bucket-store.index-header.eager-loading-startup-enabled flag. The eager loading feature is now always enabled when lazy loading is enabled. #13126
  • [CHANGE] Compactor: remove experimental -compactor.in-memory-tenant-meta-cache-size. #13131
  • [CHANGE] Distributor: Replaced the per-label-value warning emitted when a label value exceeds the maximum length with an aggregated summary per metric and label name. #13189
  • [CHANGE] Limits: removed the experimental cost_attribution_labels configuration option. Use cost_attribution_labels_structured instead. #13286
  • [CHANGE] Ingester: Renamed cortex_ingest_storage_writer_buffered_produce_bytes metric to cortex_ingest_storage_writer_buffered_produce_bytes_distribution (Prometheus summary), and added cortex_ingest_storage_writer_buffered_produce_bytes metric that exports the buffer size as a Prometheus Gauge. #13414
  • [CHANGE] Querier and query-frontend: Removed support for per-step stats when MQE is enabled. #13582
  • [CHANGE] Querier: Make the experimental enable_delayed_name_removal setting configurable as a per-tenant limit instead of a global flag. #13926
  • [CHANGE] Compactor: Require that uploaded TSDB blocks use v2 of the index file format. #13815
  • [CHANGE] Store-gateway: Remove support for generating index-headers from TSDB blocks that use v1 of the index file format. #13824
  • [CHANGE] Query-frontend: Removed support for calculating the 'cache-adjusted samples processed' query statistic. The -query-frontend.cache-samples-processed-stats CLI flag has been deprecated and will be removed in a future release. Setting it now has no effect. #13582
  • [CHANGE] Querier: Renamed experimental flag -querier.prefer-availability-zone to -querier.prefer-availability-zones and changed it to accept a comma-separated list of availability zones. All zones in the list are given equal priority when querying ingesters and store-gateways. #13756 #13758
  • [CHANGE] Ingester: Stabilize experimental flag -ingest-storage.write-logs-fsync-before-kafka-commit-concurrency to fsync write logs before the offset is committed to Kafka. Remove -ingest-storage.write-logs-fsync-before-kafka-commit-enabled since this is always enabled now. #13591
  • [CHANGE] Query-frontend: Remove experimental flags -query-frontend.shard-active-series-queries, -query-frontend.use-active-series-decoder. These settings are always enabled now. #15091
  • [CHANGE] Ingester: Remove metric cortex_ingester_owned_target_info_series; if needed this is better done as a custom active series tracker. #13831
  • [CHANGE] Querier: Remove experimental flag -querier.response-streaming-enabled; active series responses are now always streamed to query-frontends. #14095 #14114
  • [CHANGE] Store-gateway: Warn when loading index headers based on TSDB blocks that use v1 of the index file format. #13834
  • [CHANGE] Cache: Removed the experimental setting -<prefix>.memcached.dns-ignore-startup-failures, which allowed a failure to discover Memcached servers to be treated as a soft error. Failure to discover Memcached servers is now always a hard error. #14038
  • [CHANGE] Ingester: Removed the -target=flusher mode. If you need to flush ingester data, use the /ingester/flush HTTP endpoint instead. #14032
  • [CHANGE] Limits: Add new limit -validation.max-active-series-additional-custom-trackers (default: 0) to control the maximum number of additional custom trackers per tenant. This limit only applies to active_series_additional_custom_trackers, not to -ingester.active-series-custom-trackers. Set to 0 (the default) to disable the limit. #14226 #14256
  • [CHANGE] Querier and query-frontend: The experimental -querier.mimir-query-engine.enable-common-subexpression-elimination-for-range-vector-expressions-in-instant-queries and -querier.mimir-query-engine.enable-skipping-histogram-decoding flags have been removed. Both were previously enabled by default and now cannot be disabled. #14237
  • [CHANGE] Query-frontend: The per-query memory consumption limit now considers the estimated memory consumed by buffered messages received from queriers but not yet consumed. #14775
  • [CHANGE] Querier: The flag -querier.filter-queryables-enabled is deprecated and will be removed in Mimir 3.3. #14843
  • [CHANGE] Ingest storage: The cortex_ingest_storage_writer_latency_seconds metric now tracks the latency to write an incoming request to all Kafka partitions in a single call, instead of tracking latency individually for each partition. #14898
  • [CHANGE] Ingest storage: deprecated -ingest-storage.kafka.write-clients CLI flag. The flag is now ignored and Mimir always uses a single Kafka write client. The flag will be removed in Mimir 3.3. #14903
  • [CHANGE] Alertmanager: -alertmanager.grafana-alertmanager-idle-grace-period renamed to -alertmanager.strict-initialization-idle-grace-period. #14960
  • [CHANGE] Query-frontend: The per-query memory consumption limit now spans all time-split sub-queries when MQE is enabled rather than applying per split query. #14980
  • [CHANGE] Query-frontend: Rewriting middleware now runs before user-injected middlewares. #15111
  • [FEATURE] Distributor: experimental per-tenant limit -distributor.ha-tracker.per-sample-dedupe (per-tenant ha_tracker_per_sample_dedupe) to evaluate HA deduplication for each timeseries within a write request rather than making a single decision based on the first series. Enables correct behavior for mixed-label requests (e.g. Prometheus federation, metrics proxies) without affecting standard setups that have uniform HA labels within a single request. Disabled by default. #15064
  • [FEATURE] Distributor: add -validation.enforce-out-of-order-window-on-distributor per-tenant option. When enabled and past_grace_period is 0, distributors reject samples older than out_of_order_time_window, matching ingester behavior, without relying on a small past_grace_period. #15090
  • [FEATURE] Runtime config: Support loading configuration from http:// and https:// URLs in addition to local files via -runtime-config.file. Added -runtime-config.http-client-timeout (default 30s) to control the HTTP fetch timeout. Added -runtime-config.http-client-cluster-validation.label (inheritable from -common.client-cluster-validation.label) to send the X-Cluster validation header when fetching from a cluster-validated HTTP endpoint. #15052 #15244
  • [FEATURE] Distributor: Derive limits based on tenant metadata. Supported limits are max_active_series_per_user, ingestion_rate, ingestion_burst_size, ingestion_burst_factor, otel_metric_suffixes_enabled, name_validation_scheme and otel_translation_strategy. #14289
  • [FEATURE] Distributor: add cortex_distributor_request_body_compression_ratio histogram that tracks the compression ratio of write requests. #14232
  • [FEATURE] Distributor: add -distributor.otel-label-name-underscore-sanitization and -distributor.otel-label-name-preserve-underscores that control sanitization of underscores during OTLP translation. #13133
  • [FEATURE] Query-frontends: Automatically adjust features used in query plans generated for remote execution based on what the available queriers support. #13017 #13164 #13544
  • [FEATURE] Memberlist: Add experimental support for zone-aware routing in order to reduce memberlist cross-AZ data transfer. #13129 #13651 #13664
  • [FEATURE] Query-frontend and querier: Add experimental support for performing query planning in query-frontends and distributing portions of the plan to queriers for execution. #13058 #13685 #13800 #14001 #14027
  • [FEATURE] MQE: Add support for experimental extended range selector modifiers smoothed and anchored. You can enable these modifiers with -query-frontend.enabled-promql-extended-range-selectors=smoothed,anchored #13398
  • [FEATURE] MQE: Add support for the experimental PromQL function info. #13443
  • [FEATURE] Querier: Add the -querier.mimir-query-engine.enable-reduce-matchers flag, which enables a new MQE AST optimization pass that eliminates duplicate or redundant matchers in selector expressions. #13178
  • [FEATURE] Continuous test: Add a prometheus2 option to the -tests.write-protocol flag to select Prometheus Remote-Write 2.0 as the protocol. #13659 #13982
  • [FEATURE] Continuous test: Write metrics metadata along with samples. #13659 #13732 #13796
  • [FEATURE] Store-gateway: Add experimental per-zone shard size -store-gateway.tenant-shard-size-per-zone. When set, the total shard size is computed as this value multiplied by the number of zones. This option takes precedence over -store-gateway.tenant-shard-size. #13835
  • [FEATURE] Distributor, Ingester: Add experimental reactive limiter setting -distributor.reactive-limiter.max-limit-factor-decay. #14007
  • [FEATURE] Ingester: Added experimental per-tenant early head compaction. New per-tenant limits -ingester.early-head-compaction-owned-series-threshold and -ingester.early-head-compaction-min-estimated-series-reduction-percentage trigger compaction based on owned series count across all ingesters. #13980 #15056
  • [FEATURE] Ingester: Added experimental support to run ingesters with no tokens in the ring when ingest storage is enabled. You can set -ingester.ring.num-tokens=0 to enable this feature. #14024
  • [FEATURE] Store-gateway: Add -store-gateway.sharding-ring.excluded-zones flag to exclude specific zones from the store-gateway ring. #14120
  • [FEATURE] Ingest storage: Add -ingest-storage.kafka.sasl-mechanism flag supporting more ways to authenticate with Kafka. #14307 #14344 #14540 #14674
  • [FEATURE] Ingest storage: Add -ingest-storage.kafka.tls* flags to connect to Kafka using TLS. #14550
  • [FEATURE] Ingest storage: Add -ingest-storage.ingestion-partition-tenant-write-shard-size to limit the number of partitions used for writes independently from reads, so that the shard size can be safely reduced without losing query coverage during the migration. #14780
  • [FEATURE] MQE: Add experimental support for splitting and caching intermediate results for functions over range vectors in instant queries. #13472 #14479 #14506 #14499 #14517 #14536 #14614 #14645 #14677 #14788
  • [FEATURE] MQE: Add experimental support for reporting the number of samples read per query. #14828 #14839 #14952 #15035 #15045
  • [FEATURE] Compactor: Add -compactor.ooo-split-and-merge-shards per-tenant limit to allow a separate shard count for blocks with the out-of-order external label. #14704
  • [FEATURE] Distributor: add experimental support for controlling OTLP metric name suffix addition and translation strategy via X-Mimir-OTLP-AddSuffixes and X-Mimir-OTLP-TranslationStrategy request headers on the OTLP push path, gated by -api.otlp-translation-headers-enabled (off by default). #14782
  • [ENHANCEMENT] Ingest storage: Default to the more efficient -ingest-storage.kafka.producer-record-version=2 based on Remote-Write 2.0, which reduces Kafka record size and improves write throughput. #15185
  • [ENHANCEMENT] Distributor: Add per-tenant -distributor.active-series-limit-response-code override to configure the HTTP response code returned when rejecting series due to the active series limit. Defaults to 429 (Too Many Requests). Set to 400 (Bad Request) to prevent clients from retrying rejected requests. #14981
  • [ENHANCEMENT] Query-frontend: Add minimum_step_size filter to blocked queries config to reject range queries with a step smaller than the configured threshold. #14885
  • [ENHANCEMENT] Query-frontend: Add support for blocking queries exceeding a time range duration with time_range_longer_than. #14609
  • [ENHANCEMENT] Distributor: Add zone-aware rate limiting via -distributor.ring.instance-availability-zone. When configured, the global ingestion rate limit is divided by the number of zones and the number of distributors in the local zone, instead of the total number of distributors. #14515
  • [ENHANCEMENT] Memberlist: Add experimental propagation delay tracker to measure gossip propagation delay across the memberlist cluster. Enable with -memberlist.propagation-delay-tracker.enabled=true. #14312 #14406
  • [ENHANCEMENT] Memberlist: Add -memberlist.received-messages-queue-size to configure the size of the internal queue for messages received from other nodes. Increasing this value may help to avoid dropping messages when the node is processing a large number of messages from other nodes. #14995
  • [ENHANCEMENT] Compactor: Make the compactor dashboard autoscaling panel work with non-CPU scaling metrics. #15017
  • [ENHANCEMENT] Compactor: Add 0-100% jitter to the first compaction interval to spread compactions when multiple compactors start simultaneously. #14280
  • [ENHANCEMENT] Compactor, Store-gateway: Remove experimental setting -compactor.upload-sparse-index-headers and always upload sparse index-headers. This improves lazy loading performance in the store-gateway. #13089 #13882
  • [ENHANCEMENT] Querier: Reduce memory consumption of queries when samples for a single series are retrieved from multiple ingesters or store-gateways. #13806
  • [ENHANCEMENT] Store-gateway: Verify CRC32 checksums for 1 out of every 128 chunks read from object storage and the chunks cache to detect corruption. #13151
  • [ENHANCEMENT] Ingester: the per-tenant postings for matchers cache is now stable. Use the following configuration options: #13101
    • -blocks-storage.tsdb.head-postings-for-matchers-cache-ttl
    • -blocks-storage.tsdb.head-postings-for-matchers-cache-max-bytes
    • -blocks-storage.tsdb.head-postings-for-matchers-cache-force
    • -blocks-storage.tsdb.block-postings-for-matchers-cache-ttl
    • -blocks-storage.tsdb.block-postings-for-matchers-cache-max-bytes
    • -blocks-storage.tsdb.block-postings-for-matchers-cache-force
  • [ENHANCEMENT] OTLP: De-duplicate target_info samples with conflicting timestamps. #13204
  • [ENHANCEMENT] Query-frontend: Include the number of remote execution requests performed for a request in query stats logs emitted by query-frontends when remote execution is enabled. #13248
  • [ENHANCEMENT] Update Docker base images from alpine:3.22.1 to alpine:3.22.2. #12991
  • [ENHANCEMENT] Compactor, Store-gateway: Add metrics to track performance of in-memory and disk-based metadata caches. #13150
  • [ENHANCEMENT] Ruler: Removed disk interaction when loading rules. #13156
  • [ENHANCEMENT] Ingester: Cost-based index lookup planner accounts for query sharding when estimating cardinality and filter costs. #13374
  • [ENHANCEMENT] GCS: Make uploads optionally retryable. Use the following advanced flags (the *.enable-upload-retries flags default to true): #13226 #13842
    • -alertmanager-storage.gcs.enable-upload-retries
    • -blocks-storage.gcs.enable-upload-retries
    • -common.storage.gcs.enable-upload-retries
    • -ruler-storage.gcs.enable-upload-retries
    • -alertmanager-storage.gcs.max-retries
    • -blocks-storage.gcs.max-retries
    • -common.storage.gcs.max-retries
    • -ruler-storage.gcs.max-retries
  • [ENHANCEMENT] Usage-tracker: Improve first snapshot loading and rehash speed. #13284
  • [ENHANCEMENT] Query-frontend: Return different error messages when experimental functions, aggregations, or extended range selector modifiers are used but not enabled for a tenant. #13398
  • [ENHANCEMENT] Usage-tracker: Improved snapshot loading by doing it in parallel with GOMAXPROCS workers. #13608 #13622
  • [ENHANCEMENT] Usage-tracker, distributor: Make usage-tracker calls asynchronous for users who are far enough from the series limits. #13427
  • [ENHANCEMENT] Usage-tracker: Ensure tenant shards have enough capacity when loading a snapshot. #13607
  • [ENHANCEMENT] Usage-tracker: Limit-aware map growth in tenant shards to avoid excessive memory allocation when tenants grow slightly beyond their limit. #14642
  • [ENHANCEMENT] Ruler: Implemented OperatorControllableErrorClassifier for rule evaluation, allowing differentiation between operator-controllable errors (e.g., storage failures, 5xx errors, rate limiting) and user-controllable errors (e.g., bad queries, validation errors, 4xx errors). This change affects the rule evaluation failure metric prometheus_rule_evaluation_failures_total, which now includes a reason label with values operator or user to distinguish between them. #13313, #13470
  • [ENHANCEMENT] Store-gateway: Added cortex_bucket_store_block_discovery_latency_seconds metric to track time from block creation to discovery by store-gateway. #13489 #13552 #13963
  • [ENHANCEMENT] Alertmanager, distributor, querier, ruler: Added experimental CLI flags to configure a grace period for health checks for connections to other services or other replicas. The default value of 0 preserves the existing behaviour of immediately removing connections that have failed a health check. #13521 #13846
    • -alertmanager.alertmanager-client.health-check-grace-period
    • -distributor.ingester-health-check-grace-period
    • -querier.frontend-client.health-check-grace-period
    • -querier.store-gateway-client.health-check-grace-period
    • -ruler.client.health-check-grace-period
  • [ENHANCEMENT] Query-frontend: Added more efficient encoding and decoding of JSON payloads. #13561
  • [ENHANCEMENT] Querier: Add optional per-tenant max limits for label name and label value requests, max_label_names_limit and max_label_values_limit. #13654
  • [ENHANCEMENT] Usage tracker: loadSnapshot() checks shard emptiness instead of using an explicit first parameter. #13534
  • [ENHANCEMENT] OTLP: Add metric cortex_distributor_otlp_requests_by_content_type_total to track content type (json or proto) of OTLP packets. #13525
  • [ENHANCEMENT] Query-scheduler: Gracefully handle shutdown by draining the queue before exiting. #13603 #13976
  • [ENHANCEMENT] OTLP: Add experimental metric cortex_distributor_otlp_array_lengths to better understand the layout of OTLP packets in practice. #13525
  • [ENHANCEMENT] Ruler: gRPC errors without details are classified as operator errors, and rule evaluation failures (such as duplicate labelsets) are classified as user errors. #13586
  • [ENHANCEMENT] Server: The /metrics endpoint now supports metrics filtering by providing one or more name[] query parameters. #13746
  • [ENHANCEMENT] Distributor: Improved the performance of configuration retrieval in the validation middleware. #13807
  • [ENHANCEMENT] Ingester: Make sharded active-series requests matching all series faster. #13491
  • [ENHANCEMENT] Ingester: New -blocks-storage.tsdb.close-idle-tsdb-when-shipping-disabled flag to enforce closing of idle TSDBs when block shipping is disabled. #13862
  • [ENHANCEMENT] Partitions ring: Add support to forcefully lock a partition state through the web UI. #13811
  • [ENHANCEMENT] Usage-tracker: Serialize metrics gathering to reduce tail latency when running many partitions on a single instance. #13886
  • [ENHANCEMENT] Usage-tracker: Add experimental per-user series created and removed counter metrics, gated behind -usage-tracker.enable-verbose-series-creation-deletion-prometheus-metrics. #14486
  • [ENHANCEMENT] API: The /api/v1/user_limits endpoint is now stable and no longer experimental. #13218
  • [ENHANCEMENT] Ingester: limiting CPU and memory utilized by the read path (-ingester.read-path-cpu-utilization-limit and -ingester.read-path-memory-utilization-limit) is now considered stable. #13167
  • [ENHANCEMENT] Querier: -querier.max-estimated-fetched-chunks-per-query-multiplier is now stable and no longer experimental. #13120
  • [ENHANCEMENT] Alertmanager: UTF-8 strict mode (-alertmanager.utf8-strict-mode-enabled) is now stable and no longer experimental. #13109
  • [ENHANCEMENT] Promote the logger rate-limiting configuration parameters from experimental to stable. #13128
  • [ENHANCEMENT] Ingester: Out-of-order ingestion support is now stable, use -ingester.out-of-order-time-window and -ingester.out-of-order-blocks-external-label-enabled to configure it. #13132
  • [ENHANCEMENT] Ruler: align_evaluation_time_on_interval is now stable and no longer experimental. #13103
  • [ENHANCEMENT] Query-frontend: query blocking (configured with blocked_queries limit) is now stable and no longer experimental. #13107
  • [ENHANCEMENT] Querier: -querier.active-series-results-max-size-bytes is now stable and no longer experimental. #13110
  • [ENHANCEMENT] API: The /api/v1/cardinality/active_series endpoint is now stable and no longer experimental. #13111
  • [ENHANCEMENT] Querier: Default to streaming active series responses to query-frontends via -querier.response-streaming-enabled. #13883
  • [ENHANCEMENT] Store-gateway: Add cortex_bucket_store_blocks_loaded_size_bytes metric to track per-tenant disk utilization. #13891
  • [ENHANCEMENT] Compactor: If compaction fails because the result block would exceed the size limit for its postings offsets table, symbol table, or index, mark input blocks for no-compaction to avoid blocking future compactor runs. #13876 #14466 #14482
  • [ENHANCEMENT] Query-frontend: add support for range() duration expression. #13931
  • [ENHANCEMENT] Add experimental flag -common.instrument-reference-leaks-percentage to instrument gRPC buffers and detect leaked references to them. #13609 #14083
  • [ENHANCEMENT] Querier: Add experimental flag -querier.mimir-query-engine.enable-projection-pushdown to enable an MQE optimization pass for reducing data transferred between queriers and the storage layer. #14006 #14132 #14239 #14241 #14326 #14720 #14751 #14800
  • [ENHANCEMENT] MQE: Default to enabling the "eliminate deduplicate and merge" optimization pass via -querier.mimir-query-engine.enable-eliminate-deduplicate-and-merge. #14172
  • [ENHANCEMENT] Ingester: Reduce likelihood of ingestion being paused while idle TSDB compaction is in progress. #13978
  • [ENHANCEMENT] Ingester: Extend cortex_ingester_tsdb_forced_compactions_in_progress metric to report a value of 1 when there's an idle or forced TSDB head compaction in progress. #13979
  • [ENHANCEMENT] Usage-tracker, distributor: Distributor accumulates batches of series and sends them to usage-tracker in fewer RPCs if '-distributor.usage-tracker-client.use-batched-tracking' is enabled. #13966 #13983
  • [ENHANCEMENT] MQE: Include metric name in histogram_quantile warning/info annotations when delayed name removal is enabled. #13905
  • [ENHANCEMENT] MQE: Add metrics to track step-invariant expression usage and data point reuse savings: cortex_mimir_query_engine_step_invariant_nodes_total and cortex_mimir_query_engine_step_invariant_steps_saved_total. #13911
  • [ENHANCEMENT] MQE: Add explicit error handling for unsupported Prometheus experimental binary operator modifiers fill, fill_left and fill_right. #14107
  • [ENHANCEMENT] MQE: Add experimental support for computing multiple aggregations over the same data without buffering. Enable with -querier.mimir-query-engine.enable-multi-aggregation=true. #14123 #14174
  • [ENHANCEMENT] Querier: Add support for the first phase of using non-opaque gRPC types between queriers and store-gateways per #14264. #14253
  • [ENHANCEMENT] Querier: Optimize querying store-gateways when many of them are in a LEAVING state. #14157
  • [ENHANCEMENT] Memberlist: Add "Size" column to "KV Store" table in the memberlist web page. #14200
  • [ENHANCEMENT] Memberlist: Add experimental configuration option -memberlist.rejoin-seed-nodes to set custom seed nodes used by periodic rejoin (when enabled). #14208
  • [ENHANCEMENT] Ingester: Check label order and uniqueness on ingestion to protect against corruption. #14089
  • [ENHANCEMENT] Ingester: Add experimental file-based Kafka consumer group offset tracking via flag -ingest-storage.kafka.consumer-group-offset-commit-file-enforced. #14110
  • [ENHANCEMENT] Store-gateway: Add "OOO" column to the tenant blocks page to indicate whether each block was created from out-of-order samples. #14283
  • [ENHANCEMENT] Ingester: Optimize ingestion from Kafka in clusters with mixed size tenants. #13924 #13961 #14302
  • [ENHANCEMENT] Querier: Add new config flag querier.enable-delayed-name-removal-prometheus-engine to enable delayed name removal for the Prometheus engine. #14349
  • [ENHANCEMENT] Ingester: Reduce heap usage during streaming chunk queries by releasing series label memory after each batch is sent rather than holding it until chunk streaming completes. #14422
  • [ENHANCEMENT] Ingester: Eliminate 20-minute active series metrics loading period when custom tracker or cost attribution configuration changes. Active series counts are now immediately correct after a config reload. #14537
  • [ENHANCEMENT] Ingester: Export cortex_ingester_active_series_loading gauge metric that is 1 while active series counts are still warming up after ingester startup, and 0 once they are accurate (after IdleTimeout has elapsed). #14783
  • [ENHANCEMENT] Ingester: Allow /ingester/flush endpoint to be called while the ingester is starting up. This is useful during incidents where ingesters are stuck replaying from Kafka because they hit the max series limit, and an operator needs to manually trigger TSDB head compaction to free up in-memory series. #15065
  • [ENHANCEMENT] Ingest storage: Skip kotel tracing hooks for unsampled traces in the franz-go Kafka client, significantly reducing CPU and memory overhead. #14852
  • [ENHANCEMENT] Distributor: Reduced CPU utilization when writing to ingest storage with a large number of partitions by batching all partitions into a single Kafka produce call instead of one per partition. #14898
  • [ENHANCEMENT] Ingest storage: Allow configuring multiple Kafka seed brokers via -ingest-storage.kafka.address (comma-separated). #14328
  • [ENHANCEMENT] MQE: Add experimental support for eliminating selectors that are a subset of another selector. Enable with -querier.mimir-query-engine.enable-subset-selector-elimination=true. #14456 #14457 #14546 #14559 #14561 #14621
  • [ENHANCEMENT] Ingest storage: Add -ingest-storage.kafka.client-rack flag to enable rack awareness. #14434
  • [ENHANCEMENT] Ingester: Add cortex_ingester_queried_blocks_total metric to track TSDB block generations queried. #14572
  • [ENHANCEMENT] Distributor, ingest storage: Add cortex_distributor_received_bytes_total and cortex_ingest_storage_writer_input_bytes_total metrics to measure Remote Write v2 symbols table compression effectiveness. #14453
  • [ENHANCEMENT] Store-gateway: Added cortex_bucket_store_chunk_size_estimate_type_total metric to track how often the store-gateway infers the size of a chunk versus using the default size. #14477
  • [ENHANCEMENT] Block-builder: Expose per-tenant TSDB metrics. #14364 #14699
  • [ENHANCEMENT] Block-builder: Add experimental -block-builder.generate-sparse-index-headers option. Construct and upload sparse index headers to object storage as part of block creation to make the sparse headers available to store-gateways when loading uncompacted blocks. #14494
  • [ENHANCEMENT] Add experimental -http.response-compression-level CLI flag to set the gzip compression level used for compressed HTTP responses. #14586
  • [ENHANCEMENT] Query-frontend: Add support for lookback_delta query parameter for instant and range queries. #14582 #14588
  • [ENHANCEMENT] Query-frontend: Extend query blocking to optionally only apply a blocking rule if the query is an unaligned range query. Set unaligned_range_queries: true to enable. #14643
  • [ENHANCEMENT] Store-gateway: Add experimental flag blocks-storage.bucket-store.partitioner-max-gap-bytes-chunks to specify the gap size for the chunks partitioner. #14649
  • [ENHANCEMENT] Compactor: Add experimental -compactor.first-level-compaction-ooo-wait-period to configure a separate compaction wait period for out-of-order blocks. It's an analogue of -compactor.first-level-compaction-wait-period, which currently ignores out-of-order blocks. #14627
  • [ENHANCEMENT] Block-builder: Add support for experimental -blocks-storage.tsdb.early-head-compaction-min-in-memory-series to force early head compaction when the number of in-memory series reaches the threshold. #14678
  • [ENHANCEMENT] Usage-tracker: Improve performance of TrackSeriesBatch by preprocessing input data. #14702 #14734
  • [ENHANCEMENT] MQE: Improve per-query memory consumption limit enforcement in histogram function evaluations. #14691
  • [ENHANCEMENT] MQE: Improve per-query memory consumption limit enforcement within aggregation operations. #14735
  • [ENHANCEMENT] Usage-tracker: Improve performance by using a special shard grouping algorithm. #14715
  • [ENHANCEMENT] MQE: Support subset selector elimination for expressions where the subset is given by regex selectors. #14732
  • [ENHANCEMENT] API: The activity tracker (if enabled) covers the full request lifecycle and is used on all routes. #14777
  • [ENHANCEMENT] MQE: Add metrics for tracking in-flight query memory consumption: cortex_querier_inflight_query_max_estimated_memory_consumption_limit_bytes, cortex_querier_inflight_query_current_estimated_memory_consumption_bytes, cortex_querier_inflight_query_peak_estimated_memory_consumption_bytes and cortex_querier_inflight_query_sampled_count. #14807
  • [ENHANCEMENT] Query-frontend: Stream JSON encoding directly to the response body to avoid a full-copy allocation of the serialized payload. #14840
  • [ENHANCEMENT] Activity tracker: Added activity_tracker_unfinished_activities_loaded metric to report the number of unfinished activities detected on startup. #14860
  • [ENHANCEMENT] Distributor: Use record validation time as the Kafka record timestamp to reduce rejections among consumers. #14921
  • [ENHANCEMENT] MQE: Add optimisation pass to optimise away expressions that can't produce results such as those containing comparisons with timestamp() due to the query time range or conflicting matchers. #14989 #15014 #15163 #15117
  • [ENHANCEMENT] Distributor: OTLP endpoint now returns partial success (HTTP 200) instead of HTTP 429 when the usage tracker rejects some series due to the active series limit but other series are successfully ingested. The RejectedDataPoints field reports the count of distributor-side rejections (usage tracker filtering). #14789
  • [ENHANCEMENT] MQE: Account for the memory consumption of labels returned by binary operations earlier in the query memory consumption estimate. #15033
  • [ENHANCEMENT] Query-frontend: Log the number of series and samples returned for queries in query stats log lines. #15044
  • [ENHANCEMENT] Querier and query-frontend: When remote execution is enabled, send series metadata in batches, rather than in a single large message. The batch size can be configured with -query-frontend.remote-execution-series-metadata-batch-size. #15047
  • [ENHANCEMENT] Ingest storage: Update the default configuration to enable ingest storage concurrency: #15072
    • -ingest-storage.kafka.fetch-concurrency-max from 0 to 12
    • -ingest-storage.kafka.ingestion-concurrency-max from 0 to 8
    • -ingest-storage.kafka.ingestion-concurrency-queue-capacity from 5 to 3
    • -ingest-storage.kafka.ingestion-concurrency-target-flushes-per-shard from 80 to 40
    • -ingest-storage.kafka.max-buffered-bytes from 100MB to 1GB
  • [ENHANCEMENT] MQE: Enable narrow selectors optimisation and hints passing for and/unless binary operations. #15096
  • [ENHANCEMENT] MQE: Add support for common subexpression elimination and subset selector elimination of range vector selectors in range queries. Enable with -querier.mimir-query-engine.enable-range-query-range-vector-common-subexpression-elimination=true. #15127
  • [ENHANCEMENT] MQE: Use series selected for one side to reduce data selected on the other side in one-to-many and many-to-one binary operations (e.g. group_left and group_right). #15137
  • [ENHANCEMENT] MQE: Reduced per-query memory overhead by no longer holding a reference to the HTTP request for the lifetime of a query. #15251
  • [BUGFIX] Query-frontend: Fixed a memory leak that could occur on some error paths if MQE was enabled. #15251
  • [BUGFIX] Alertmanager: Skip empty/zero config. #15184
  • [BUGFIX] Tracing: Respect OTEL_TRACES_SAMPLER and OTEL_TRACES_SAMPLER_ARG environment variables in NewOTelFromEnv(). Previously, the sampler was always hardcoded to AlwaysSample() when no Jaeger remote sampler was configured, making it impossible to control trace volume through standard OpenTelemetry configuration. #15128
  • [BUGFIX] API: Scope activity tracking middleware to query routes only, preventing it from rejecting write requests that have an unexpected Content-Type header with HTTP 500. #15129
  • [BUGFIX] Ingester: Enforce a minimum 10s delay between TSDB head compaction iterations when an iteration approaches or exceeds the configured -blocks-storage.tsdb.head-compaction-interval, so ingestion is not starved by back-to-back compactions. #15061
  • [BUGFIX] Update to Go v1.25.9. #15030
  • [BUGFIX] Distributor: OTLP partial success responses now correctly populate RejectedDataPoints with the actual count of rejected samples, instead of always reporting 0. In classical architecture, this includes rejected samples propagated from the ingester. #14789
  • [BUGFIX] Distributor: Fix race condition where the usage-tracker partition ring may not be initialized before the distributor service starts, causing a "usage-tracker partition ring is required" error on startup. #14675
  • [BUGFIX] Store-gateway: Fix cortex_bucket_store_series_data_touched{data_type="series", stage="returned"} metric observing negative values when series-for-postings cache is hit and pending matchers filter out some series. #14655
  • [BUGFIX] Mimir: Fix false positive in filesystem path overlap detection when one path is a string prefix of another but not an ancestor directory. #14426
  • [BUGFIX] Build: Fixed config descriptor generation to correctly handle custom field types without CLI flags. #14632
  • [BUGFIX] Query-frontend: Fixed blocked queries tests to use production code path instead of bypassing YAML parsing and canonicalization. #14585
  • [BUGFIX] Distributor: Fix ingestion rate limit error message reporting incorrect burst size when ingestion_burst_factor is configured. #14471
  • [BUGFIX] Mimir: Fix nil pointer dereference when -target is set to an empty string. #14381
  • [BUGFIX] API: Fixed web UI links not respecting -server.path-prefix configuration. #14090
  • [BUGFIX] API: Fixed embedded web UI static assets (CSS, JS, images) returning 404 when -server.path-prefix is configured. #15181
  • [BUGFIX] Distributor: Fix issue where distributors didn't send custom values of native histograms. #13849
  • [BUGFIX] Compactor: Fix potential concurrent map writes. #13053
  • [BUGFIX] Query-frontend: Fix issue where queries sometimes fail with failed to receive query result stream message: rpc error: code = Canceled desc = context canceled if remote execution is enabled. #13084
  • [BUGFIX] Query-frontend: Fix issue where query stats, such as series read, did not include the parameters to the histogram_quantile and histogram_fraction functions if remote execution was enabled. #13084
  • [BUGFIX] Query-frontend: Fix issue where requests that are canceled or time out are sometimes cached if remote execution is enabled. #13098
  • [BUGFIX] Querier: Fix issue where errors are logged as "EOF" when sending results to query-frontends in response to remote execution requests fails. #13099 #13121
  • [BUGFIX] Usage-Tracker: Fix underflow in current limit calculation when series >= limit. #13113
  • [BUGFIX] Querier: Fix issue where a problem sending a response to a query-frontend may cause all other responses from the same querier to the same query-frontend to fail or be delayed. #13123
  • [BUGFIX] Ingester: fix index lookup planning with regular expressions which match empty strings on non-existent labels. #13117
  • [BUGFIX] Memberlist: Fix memberlist initialization when Mimir is executed with -target=memberlist-kv. #13129
  • [BUGFIX] Query-frontend: Fix issue where queriers may receive an rpc error: code = Internal desc = cardinality violation: expected <EOF> for non server-streaming RPCs, but received another message error while sending a query result to a query-frontend if remote execution is enabled. #13147
  • [BUGFIX] Querier: Fix issue where cancelled queries may cause an "error notifying scheduler about finished query" message to be logged. #13186
  • [BUGFIX] Querier: Fix issue where evaluation metrics and logs aren't emitted if remote execution is enabled. #13207
  • [BUGFIX] Query-frontend: Fix issue where queries containing subqueries could fail with slice capacity must be a power of two, but is X if remote execution is enabled. #13211
  • [BUGFIX] Query-frontend: Fix issue where queries containing duplicated shardable expressions would fail with could not materialize query: no registered node materializer for node of type NODE_TYPE_REMOTE_EXEC if running sharding inside MQE is enabled. #13247
  • [BUGFIX] Runtime config: Fix issue where inconsistent map key types (numbers and strings) caused some runtime config files to be silently skipped during loading. #13270
  • [BUGFIX] Store-gateway: Fix how out-of-order blocks are tracked in the cortex_bucket_store_series_blocks_queried metric. #13261
  • [BUGFIX] Cost attribution: Fix panic when metrics are created with invalid labels. #13273
  • [BUGFIX] Distributor: Fix in-flight request counter when the reactive limiter is full. #13406
  • [BUGFIX] Query-frontend: Fix panic when evaluating a sharded avg expression when running sharding inside MQE. #13484
  • [BUGFIX] Query-frontend: Fix incorrect annotation position information when running sharding inside MQE. #13484
  • [BUGFIX] Query-frontend: Fix incorrect query results when evaluating some sharded aggregations with a without clause when running sharding inside MQE. #13484
  • [BUGFIX] Ingester: Fix panic when push and read reactive limiters are enabled with prioritization. #13482
  • [BUGFIX] Usage-tracker: Prevent tracking requests from being handled by partition handlers that are not in the Running state. #13532
  • [BUGFIX] MQE: Fix an issue when applying extra matchers to one side of a binary operation to avoid adding matchers for labels that do not exist. #13499 #13592
  • [BUGFIX] Query-frontend: Fix excessive CPU and memory consumption when running sharding inside MQE. #13580
  • [BUGFIX] Rename cortex_bucket_store_cached_postings_compression_time_seconds, cortex_query_frontend_regexp_matcher_count, and cortex_query_frontend_regexp_matcher_optimized_count to follow naming conventions. #13599
  • [BUGFIX] MQE: Fix issue where the conflicting counter resets during histogram warning could be incorrectly emitted during sharded histogram aggregations. #13623
  • [BUGFIX] Query-frontend: Fix incorrect query results when running sharding inside MQE is enabled and the query contains a subquery eligible for subquery spin-off wrapped in a shardable aggregation. #13619
  • [BUGFIX] Memberlist: Fix occasional nil pointer dereference panics. #13635
  • [BUGFIX] Query-scheduler: Fix issue where queries executed with remote execution could time out rather than fail immediately if the querier evaluating the request crashes after receiving the query from the query-scheduler. #13742
  • [BUGFIX] Query-frontend: Fix silent panic when executing a remote read API request if the request has no matchers. #13745
  • [BUGFIX] Ruler: Fixed -ruler.max-rule-groups-per-tenant-by-namespace to only count rule groups in the specified namespace instead of all namespaces. #13743
  • [BUGFIX] Ruler: Fix parsing of rule expressions with leading newlines. #14947
  • [BUGFIX] Query-frontend: Fix race condition that could sometimes cause unnecessary resharding of queriers if querier shuffle sharding and remote execution are enabled. #13794 #13838
  • [BUGFIX] Query-frontend: Fix step() duration expression returning 1000x larger value. #13920
  • [BUGFIX] Store-gateway: Fix parent-child relationship in LabelNames and LabelValues trace spans. #13932
  • [BUGFIX] MQE: Map remote execution storage errors correctly. #13944
  • [BUGFIX] Ingester: Fix race condition where a new partition could reach the Active partition ring state before its ingester instances reached the Active ring state. #14025
  • [BUGFIX] Ingester: Query all ingesters when shuffle sharding is disabled. #14041
  • [BUGFIX] Query-frontend: Fix issue where per-query memory consumption limit is not enforced. #14086
  • [BUGFIX] Ingester: Fix race condition during shutdown where TSDBs could be closed while appends are still in progress. #14094
  • [BUGFIX] Store-gateway: Fix blocks being incorrectly dropped during shutdown when the store-gateway is terminated while fetching an updated bucket index. #14113
  • [BUGFIX] Ingester: Defensive correctness fix for buffer reference counting in pkg/mimirpb. #14108
  • [BUGFIX] Ingester: Add timeouts to wait for instance state on startup and deferred shutdown of tasks on failure. #14134 #14180
  • [BUGFIX] Distributor: Fix duplicate label validation bypass when label value exceeds length limit and is handled by Truncate or Drop strategy. #14131
  • [BUGFIX] Block-builder-scheduler: Fix bug where data could be skipped when partition is fully consumed at startup but later grows. #14136
  • [BUGFIX] Ingester: Create TSDB directory on startup. #14112
  • [BUGFIX] Querier: Fix strategy used to select partitions to query when some partitions have been Inactive for longer than the lookback period and shuffle sharding is disabled. #14261
  • [BUGFIX] Block-builder-scheduler: Fix data race when reading partition state during pending jobs enqueueing. #14489
  • [BUGFIX] Querier: Fix issue where queries can time out if remote execution is enabled and sending the initial message from queriers to query-frontends fails. #14557
  • [BUGFIX] Querier: Fix issue where different sharded legs of a query could be evaluated with different lookback deltas if different queriers were configured with different default lookback deltas. #14575
  • [BUGFIX] Query-frontend: Fixed partial cache hit returning incomplete data for native histogram series due to incorrect response ordering before merge. #14612
  • [BUGFIX] Update to Go v1.25.8 to address CVE-2026-27142, CVE-2026-27139, CVE-2026-25679, CVE-2026-27138, CVE-2026-27137. #14623
  • [BUGFIX] Distributor: Fix nil pointer panic in WriteRequest.Unmarshal when receiving a Remote Write 2.0 request with zero timeseries. #14698
  • [BUGFIX] MQE: Fix info() incorrectly dropping inner series with no matching info series when a data label matcher matches the empty string. #14819
  • [BUGFIX] MQE: Fix info() emitting un-enriched series when a data label matcher doesn't match the empty string and the info series is unavailable at some timestamps. #14812
  • [BUGFIX] MQE: Fix and/unless binary operations to not pass matchers to the RHS, as doing so can result in incorrect filtering. #14902
  • [BUGFIX] MQE: Fix internal error when executing a subquery with delayed name removal enabled. #14946
  • [BUGFIX] Alertmanager: Fix deadlock when trying to broadcast after stopping a tenant. #14922
  • [BUGFIX] Query-frontend: Fix max total query length limit (-query-frontend.max-total-query-length) not being enforced on instant queries with subqueries or range selectors. #14985
  • [BUGFIX] Compactor: Fix potential goroutine leak when compaction iteration exits early due to errors. #13420
  • [BUGFIX] Query-frontend: Fix bugs with matcher propagation for binary operations where it was not being properly applied within nested expressions and also wrongly propagating internal label matchers. #15110
  • [BUGFIX] Distributor: Cancel DoUntilQuorum in cardinality analysis API when active_series_results_max_size_bytes is breached. #15177
  • [BUGFIX] MQE: Fix issue where queries with step-invariant range vector expressions (e.g. quantile_over_time(scalar(arg), metric[5m] @ 1000)) could return incorrect results. #15192
  • [BUGFIX] MQE: Fix info() function not enriching series when inner series are missing one identifying label (instance/job) but matching info series exist. #14832
  • [BUGFIX] MQE: Fix info() function only retaining one matcher when multiple data label matchers target the same label name. #14832
  • [BUGFIX] MQE: Fix info() function silently overwriting conflicting labels from different info metrics instead of returning an error. #14832
  • [BUGFIX] MQE: Fix info() function incorrectly grouping labels from replaced info series at the same evaluation timestamp due to lookback. #14832
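
The new unaligned_range_queries option for blocked queries (#14643) depends on whether a range query is step-aligned. The following is a minimal Python sketch of that idea, not Mimir's implementation. It assumes a query counts as aligned when both its start and end timestamps (in milliseconds) are multiples of the step, and that instant queries (no step) are never treated as unaligned range queries. The BlockedQueryRule class and rule_blocks function are hypothetical. Pattern matching is also simplified: it skips the canonicalization Mimir applies, and it assumes regex patterns must match the whole query.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlockedQueryRule:
    """Hypothetical, simplified model of one blocked_queries entry."""
    pattern: str
    regex: bool = False
    unaligned_range_queries: bool = False


def is_step_aligned(start_ms: int, end_ms: int, step_ms: int) -> bool:
    """Assumption: a range query is aligned when start and end are multiples of step."""
    if step_ms <= 0:
        raise ValueError("step must be positive")
    return start_ms % step_ms == 0 and end_ms % step_ms == 0


def rule_blocks(rule: BlockedQueryRule, query: str, start_ms: int, end_ms: int,
                step_ms: Optional[int] = None) -> bool:
    """Return True if the rule blocks the query; step_ms is None for instant queries."""
    # Assumption: regex patterns must match the whole query, literal patterns compare exactly.
    if rule.regex:
        matched = re.fullmatch(rule.pattern, query) is not None
    else:
        matched = query == rule.pattern
    if not matched:
        return False
    if rule.unaligned_range_queries:
        # Only block range queries whose start or end is not a multiple of the step.
        return step_ms is not None and not is_step_aligned(start_ms, end_ms, step_ms)
    return True
```

For example, with a 60s step, a matching range query from 1s to 61s is unaligned and would be blocked under this sketch. The same query from 60s to 120s would pass.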

Mixin

  • [CHANGE] Alerts and rules: Replaced _config.base_alerts_range_interval_minutes and _config.recording_rules_range_interval with _config.scrape_interval (default 15s). Instead of configuring a pre-multiplied number of minutes, configure your actual Prometheus scrape interval and the mixin will compute safe rate-function windows automatically (at least 4× the scrape interval). #15174 #15176
  • [CHANGE] Dashboards: Add configuration option dashboards_default_latency_mode to control the default value of the native/classic latency variable (uses 'classic' if unset). #14424
  • [CHANGE] Alerts: Renamed the following alerts to fit within 40 characters: #13363
    • MimirAlertmanagerPartialStateMergeFailing → MimirAlertmanagerStateMergeFailing
    • MimirServerInvalidClusterValidationLabelRequests → MimirServerInvalidClusterLabelRequests
    • MimirClientInvalidClusterValidationLabelRequests → MimirClientInvalidClusterLabelRequests
    • MimirHighGRPCConcurrentStreamsPerConnection → MimirHighGRPCStreamsPerConnection
    • MimirDistributorReachingInflightPushRequestLimit → MimirDistributorInflightRequestsHigh
    • MimirIngesterHasNotShippedBlocks → MimirIngesterNotShippingBlocks
    • MimirIngesterHasNotShippedBlocksSinceStart → MimirIngesterNotShippingBlocksSinceStart
    • MimirIngesterTSDBCheckpointCreationFailed → MimirIngesterTSDBCheckpointCreateFailed
    • MimirIngesterTSDBCheckpointDeletionFailed → MimirIngesterTSDBCheckpointDeleteFailed
    • MimirCompactorHasNotSuccessfullyCleanedUpBlocks → MimirCompactorNotCleaningUpBlocks
    • MimirCompactorHasNotSuccessfullyRunCompaction → MimirCompactorNotRunningCompaction
    • MimirCompactorFailingToBuildSparseIndexHeaders → MimirCompactorBuildingSparseIndexFailed
    • MimirIngesterLastConsumedOffsetCommitFailed → MimirIngesterOffsetCommitFailed
    • MimirIngesterFailedToReadRecordsFromKafka → MimirIngesterKafkaReadFailed
    • MimirStartingIngesterKafkaReceiveDelayIncreasing → MimirStartingIngesterKafkaDelayGrowing
    • MimirIngesterFailsToProcessRecordsFromKafka → MimirIngesterKafkaProcessingFailed
    • MimirIngesterStuckProcessingRecordsFromKafka → MimirIngesterKafkaProcessingStuck
    • MimirStrongConsistencyOffsetNotPropagatedToIngesters → MimirStrongConsistencyOffsetMissing
    • MimirKafkaClientBufferedProduceBytesTooHigh → MimirKafkaClientProduceBufferHigh
  • [CHANGE] Alerts: Replaced MimirCompactorSkippedUnhealthyBlocks with more generic MimirCompactorSkippedBlocks. #13876
  • [CHANGE] Dashboards: Replace usage of container_spec_cpu_quota / container_spec_cpu_period with kube_pod_container_resource_limits for calculation of CPU limits. #14425
  • [CHANGE] Dashboards: The queries used in latency panels no longer convert seconds to milliseconds. The dashboard panels now use "seconds" unit instead of "milliseconds". #14896
  • [ENHANCEMENT] Dashboards: Group compactor compaction-related panels into a single collapsible "Compaction" row. #14784
  • [ENHANCEMENT] Dashboards: Merge CPU and memory panels in the "Compactor resources" dashboard into a single collapsible row. #14866
  • [ENHANCEMENT] Alerts: Add more native histogram versions of alerts using classic histograms. #13814
  • [ENHANCEMENT] Alerts: Improve MimirCompactorNotRunningCompaction alert to be restart-resistant. Added warning severity alerts for early detection (6h threshold) and lowered the since-startup critical duration from 24h to 12h. #14282
  • [ENHANCEMENT] Dashboards: Support native histograms in the Alertmanager, Compactor, Queries, Rollout operator, Reads, RemoteRuler-Reads, Ruler, and Writes dashboards. #13556 #13621 #13629 #13673 #13690 #13678 #13633 #13672
  • [ENHANCEMENT] Alerts: Add MimirFewerIngestersConsumingThanActivePartitions alert. #13159
  • [ENHANCEMENT] Querier and query-frontend: Add alerts for querier ring, which is used when performing query planning in query-frontends and distributing portions of the plan to queriers for execution. #13165
  • [ENHANCEMENT] Alerts: Add MimirBlockBuilderSchedulerNotRunning alert. #13208
  • [ENHANCEMENT] Alerts: Add MimirBlockBuilderPersistentJobFailure alert. #13278
  • [ENHANCEMENT] Dashboards: Update default regular expressions to match multi-zone deployments for query-frontend, querier, distributor and ruler. #13200
  • [ENHANCEMENT] Alerts: Update MimirHighVolumeLevel1BlocksQueried alert to fire on a percentage of the level 1 blocks queried. #13229
  • [ENHANCEMENT] Dashboards: Plot OOMKilled events in the working set memory panels of resources dashboards. #13377
  • [ENHANCEMENT] Dashboards: Add variable to compactor and object store dashboards to switch between classic and native latencies. Use native histogram thanos_objstore_bucket_operation_duration_seconds. #12137
  • [ENHANCEMENT] Recording rules: Add native histogram version of histogram recording rules. #13553
  • [ENHANCEMENT] Alerts: Add MimirMemberlistBridgeZoneUnavailable alert. #13647
  • [ENHANCEMENT] Alerts: Add MimirMemberlistZoneAwareRoutingAutoFailover alert that fires when memberlist zone-aware routing auto-failover triggers due to missing memberlist bridges. #13726
  • [ENHANCEMENT] Dashboards and recording rules: Add usage-tracker rows to writes, writes-networking, writes-resources dashboards if the config.usage_tracker_enabled var is set. Add usage-tracker client latency recording rules. #13639 #13652 #14865
  • [ENHANCEMENT] Recording rules and dashboards: Add stage label to cortex_ingester_queried_series recording rules and filter Queries dashboard "Series per query" panel to show only stage=merged_blocks. #13666
  • [ENHANCEMENT] Dashboards: Add "Owned series" and "Active series" panels to the writes dashboard Headlines row. #13895
  • [ENHANCEMENT] Alerts: Add IncorrectWebhookConfigurationFailurePolicy, BadZoneAwarePodDisruptionBudgetConfiguration and HighNumberInflightZpdbRequests rollout-operator alerts. #13840
  • [ENHANCEMENT] Dashboards: Add additional panels to the rollout-operator dashboard related to the zone aware pod disruption budget controller. #13840
  • [ENHANCEMENT] Dashboards: Sort tooltips in descending order to show the main contributors to a spike or query first. #13827
  • [ENHANCEMENT] Dashboards: Add "By store-gateway disk utilization" panel to the Top Tenants dashboard showing per-tenant disk usage and their shard size. #13917
  • [ENHANCEMENT] Dashboards: Add panels showing the distribution of estimated query memory consumption and rate of fallback to Prometheus' query engine in query-frontends to the Queries dashboard. #14029
  • [ENHANCEMENT] Dashboards: Add "Forced TSDB head compactions in progress" panel to "Mimir / Writes" dashboard. #14248
  • [ENHANCEMENT] Dashboards: Improve "Last successful run per-compactor replica" table in the compactor dashboard to show time since process start for compactors that haven't completed their first run yet. #14285
  • [ENHANCEMENT] Alerts: Add MimirUsageTrackerSnapshotUploadFailing and MimirUsageTrackerSnapshotDownloadFailing alerts to detect usage-tracker snapshot upload/download failures. #14778
  • [ENHANCEMENT] Alerts: Add dashboard_url annotations to Prometheus alerts. #14458
  • [ENHANCEMENT] Dashboards: Change the "Rules" panel in the "Mimir / Reads resources" dashboard to use a stacked visualization. #14707
  • [ENHANCEMENT] Dashboards: Split the "All series" panel in the Tenants dashboard into "Active series" and "Owned & in-memory series" panels, and add the active series limit. #14648 #14771
  • [ENHANCEMENT] Dashboards: Add "In memory series" panel to experimental "Mimir / Block-builder" dashboard. #14700
  • [ENHANCEMENT] Dashboards: Unify object store rows into a single collapsible row across Alertmanager, Compactor, Reads, and Ruler dashboards. #14850
  • [ENHANCEMENT] Alerts: Make MimirInconsistentRuntimeConfig alert less flaky when performing multiple configuration changes in a row in a large Kubernetes cluster. #14743 #14933 #15051 #15257
  • [ENHANCEMENT] Alerts: Suppress MimirRingMembersMismatch alert during ingester rollouts. The alert now uses an unless clause to avoid false positives when ingester statefulsets are being updated. #14895
  • [ENHANCEMENT] Recording rules: add a low-cardinality recorded version of usage_tracker_active_series. #14901
  • [ENHANCEMENT] Alerts: Fix MimirSchedulerQueriesStuck false positives by only looking for cases where the number of enqueued queries doesn't decrease. #14943 #15193
  • [ENHANCEMENT] Dashboards: Add ephemeral storage panels to "Resources" dashboards. #14999
  • [ENHANCEMENT] Dashboards: Add disk utilization panels to experimental Block-builder dashboard. #15029
  • [BUGFIX] Dashboards: Fix compactor dashboard to exclude instances without the last successful run metric in the "Last successful run per-compactor replica" table. #14784
  • [BUGFIX] Dashboards: Fix issue where throughput dashboard panels would group all gRPC requests that resulted in a status containing an underscore into one series with no name. #13184
  • [BUGFIX] Dashboards: Filter out 0s from max_series limit on Writes Resources > Ingester > In-memory series panel. #13419
  • [BUGFIX] Dashboards: Fix issue where the "Tenant gateway requests" panels on Tenants dashboard would show data from all components. #13940
  • [BUGFIX] Dashboards: Fix issue where the MQE-related dashboard panels on the Queries dashboard would show data from both queriers and query-frontends, instead of just queriers. #14029
  • [BUGFIX] Alerts: Fix alert definitions with short range vector selectors that did not respect the configured base_alerts_range_interval_minutes. #15083
  • [BUGFIX] Dashboards: Fix mixin build failure when singleBinary is true. #15108
  • [BUGFIX] Alerts: Fix alert for store-gateway object storage operation failures to alert based on percentage of failed operations, not raw number of them. #15196
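
The new _config.scrape_interval setting (#15174 #15176) replaces a pre-multiplied minutes value with the real scrape interval, and the mixin derives rate-function windows of at least 4× that interval. The following is a minimal Python sketch of the arithmetic only. The actual mixin is written in Jsonnet and may round or format windows differently. The rate_window helper and the single-unit duration parser are hypothetical.

```python
import re

# Seconds per unit for the single-unit Prometheus-style durations this sketch accepts.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration_seconds(duration: str) -> int:
    """Parse a positive single-unit duration such as "15s" or "1m" into whole seconds."""
    match = re.fullmatch(r"(\d+)([smh])", duration)
    if match is None or int(match.group(1)) == 0:
        raise ValueError(f"unsupported duration: {duration!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]


def format_duration(seconds: int) -> str:
    """Render whole seconds using the largest unit that divides them evenly."""
    for unit, size in (("h", 3600), ("m", 60)):
        if seconds % size == 0:
            return f"{seconds // size}{unit}"
    return f"{seconds}s"


def rate_window(scrape_interval: str, multiplier: int = 4) -> str:
    """Return a rate() window spanning `multiplier` scrape intervals (4x by default)."""
    if multiplier < 1:
        raise ValueError("multiplier must be at least 1")
    return format_duration(parse_duration_seconds(scrape_interval) * multiplier)
```

With the default scrape interval of 15s, rate_window("15s") yields "1m". A 30s scrape interval yields "2m" under the same 4× rule.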

Jsonnet

  • [CHANGE] Renamed the following configuration parameters to add the _per_zone suffix, to better reflect that these values apply per zone in multi-zone deployments: #13632
    • autoscaling_querier_min_replicas → autoscaling_querier_min_replicas_per_zone
    • autoscaling_querier_max_replicas → autoscaling_querier_max_replicas_per_zone
    • autoscaling_query_frontend_min_replicas → autoscaling_query_frontend_min_replicas_per_zone
    • autoscaling_query_frontend_max_replicas → autoscaling_query_frontend_max_replicas_per_zone
    • autoscaling_ruler_min_replicas → autoscaling_ruler_min_replicas_per_zone
    • autoscaling_ruler_max_replicas → autoscaling_ruler_max_replicas_per_zone
    • autoscaling_ruler_querier_min_replicas → autoscaling_ruler_querier_min_replicas_per_zone
    • autoscaling_ruler_querier_max_replicas → autoscaling_ruler_querier_max_replicas_per_zone
    • autoscaling_ruler_query_frontend_min_replicas → autoscaling_ruler_query_frontend_min_replicas_per_zone
    • autoscaling_ruler_query_frontend_max_replicas → autoscaling_ruler_query_frontend_max_replicas_per_zone
  • [CHANGE] Store-gateway: The store-gateway disk class now honors the one configured via $._config.store_gateway_data_disk_class and doesn't replace fast with fast-dont-retain. #13152
  • [CHANGE] Rollout-operator: Vendor jsonnet from rollout-operator repository. #13245 #13317 #13793 #13799 #13840 #14240 #14463 #14854 #14900
  • [CHANGE] Ruler: Set default memory ballast to 1GiB to reduce GC pressure during startup. #13376
  • [CHANGE] Zone pod disruption budget: Remove multi_zone_zpdb_enabled and replace it with multi_zone_ingester_zpdb_enabled and multi_zone_store_gateway_zpdb_enabled, so that the zone pod disruption budget can be enabled selectively on a per-component basis. #13813
  • [CHANGE] Reduced dynamic replication factor when running store-gateways with replication factor set to a value higher than 3. #14304
  • [CHANGE] Disable ingester ring tokens by default when ingest storage architecture is enabled. #14613
  • [CHANGE] Querier: Set ignoreNullValues to false by default for KEDA ScaledObject to prevent autoscaling down when there is no data returned from scaling metrics. #14641
  • [CHANGE] Ingester: Change default ingestion concurrency configuration used by ingest storage architecture, to maximize throughput when consuming records from Kafka. #14668
  • [CHANGE] Memberlist: when the multi-zone memberlist bridge is enabled (multi_zone_memberlist_bridge_enabled), Mimir components now use memberlist-bridge pods as seed nodes by default, instead of the shared gossip ring service. This reduces inter-AZ data transfer. The new memberlist_bridge_seed_nodes_enabled configuration option can be used to disable this behavior. #14994
  • [CHANGE] Ruler remote evaluation: Split the ruler-query-frontend service into a ClusterIP service (for HTTP load balancing) and a headless service (for gRPC client-side load balancing by rulers). The ruler now connects to the headless service. #15001
  • [CHANGE] Memberlist bridge: Changed default value of memberlist_bridge_replicas_per_zone from 2 to 3. #14667
  • [FEATURE] Add multi-zone support for read path components (memcached, querier, query-frontend, query-scheduler, ruler, and ruler remote evaluation stack). Add multi-AZ support for ingester and store-gateway multi-zone deployments. Add memberlist-bridge support for zone-aware memberlist routing. #13559 #13628 #13636 #13915 #14260 #14301
  • [FEATURE] Add deletion protection support for ingesters and store-gateways StatefulSet. It can be enabled by setting ingester_deletion_protection_enabled and store_gateway_deletion_protection_enabled in the _config block. #13819
  • [FEATURE] Shuffle sharding: Add the following configuration options to enable the experimental per-zone store-gateway shard size: #13908 #13941
    • $._config.shuffle_sharding.store_gateway_shard_size_per_zone_enabled
    • $._config.shuffle_sharding.store_gateway_shard_size_per_zone_defaults_enabled (takes precedence over store_gateway_shard_size_per_zone_enabled)
    • $._config.shuffle_sharding.store_gateway_shard_size_per_zone_overrides_enabled (takes precedence over store_gateway_shard_size_per_zone_enabled)
  • [FEATURE] Ruler: Add $._config.multi_zone_ruler_balanced_autoscaling_enabled option to ensure equally balanced replica counts across ruler zones in multi-AZ deployments by using aggregate metrics for autoscaling. #14198
  • [FEATURE] Add query_engine_range_vector_splitting_enabled configuration option to enable experimental range vector splitting with memcached cache. #14435
  • [FEATURE] Store-gateway: Add the ability to autoscale store-gateways based on disk usage when automated downscale is enabled. #15019
    • $._config.autoscaling_store_gateway_enabled
    • $._config.autoscaling_store_gateway_disk_usage_threshold
    • $._config.autoscaling_store_gateway_min_replicas_per_zone
    • $._config.autoscaling_store_gateway_max_replicas_per_zone
  • [ENHANCEMENT] Ruler querier and query-frontend: Add support for newly-introduced querier ring, which is used when performing query planning in query-frontends and distributing portions of the plan to queriers for execution. #13017
  • [ENHANCEMENT] Ingester: Increase $._config.ingester_tsdb_head_early_compaction_min_in_memory_series default when Mimir is running with the ingest storage architecture. #13450
  • [ENHANCEMENT] Memberlist bridge: Add memberlist_bridge_replicas_per_zone configuration option (default: 2). #13727
  • [ENHANCEMENT] Update the list of OTel resource attributes used for tracing. #13469
  • [ENHANCEMENT] Ingester: Set -ingester.partition-ring.delete-inactive-partition-after based on -querier.query-ingesters-within. #13550
  • [ENHANCEMENT] Add an extra, experimental KEDA ScaledObject trigger to prevent down-scaling during OOM kills, if the memory trigger is disabled and $._config.autoscaling_oom_protection_enabled is true. #13509
  • [ENHANCEMENT] Multi-zone: Make config validation exclusions configurable via multi_zone_config_validation_excluded_args and multi_zone_config_validation_excluded_env_vars, and add validation for multi-zone distributor deployments. #13728
  • [ENHANCEMENT] Overrides-exporter: Include query configuration so that query limit defaults are reported accurately. #13850
  • [ENHANCEMENT] Expose pod termination grace period for alertmanagers, ingesters, query-frontends, rulers and store-gateways. #13852
  • [ENHANCEMENT] Store-gateways configured in a multi-zone deployment only scale up once all replicas in the preceding zones are ready. #13879
  • [ENHANCEMENT] Multi-zone: Add config options to enable multi-zone (virtual zones) and multi-AZ deployments for all write and read path components respectively: #13906
    • multi_zone_write_path_enabled
    • multi_zone_read_path_enabled
    • multi_zone_read_path_multi_az_enabled
  • [ENHANCEMENT] Overrides-exporter: Add overrides_exporter_exported_limits config option to specify the limits exposed by the exporter. The default list of limits has not been changed compared to the previous version. #13912
  • [ENHANCEMENT] Ingester: Add ingester_priority_class config option to customise the ingester priority class. By default, no explicit priority class is configured, and the Kubernetes default class is used. #14093
  • [ENHANCEMENT] Store-gateway: Add config options to enable store-gateway multi-AZ deployments on a per-zone basis. #14111
    • multi_zone_store_gateway_zone_a_multi_az_enabled
    • multi_zone_store_gateway_zone_b_multi_az_enabled
    • multi_zone_store_gateway_zone_c_multi_az_enabled
  • [ENHANCEMENT] Querier: Add autoscaling_querier_ignore_null_values option to set KEDA ignoreNullValues for querier autoscaling metrics. #14101
  • [ENHANCEMENT] Multi-zone: Add config validation for -querier.prefer-availability-zones flag on querier and ruler-querier deployments. #14539
  • [ENHANCEMENT] Distributor: render the experimental -distributor.max-active-series-per-user flag on distributor if $._config.limits.max_active_series_per_user is set. #14636
  • [ENHANCEMENT] Ingester: Add $._config.ingest_storage_set_client_rack to pass -ingest-storage.kafka.client-rack when zone-aware replication is enabled. #14654
  • [ENHANCEMENT] Ingester: Add $._config.multi_zone_ingester_multi_az_zone_(a|b|c)_enabled to simplify migrations not using a temporary zone-c. #15000
  • [BUGFIX] Ingester: Fix $._config.ingest_storage_ingester_autoscaling_max_owned_series_threshold default value, to compute it based on the configured $._config.ingester_instance_limits.max_series. #13448
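For operators migrating a Jsonnet configuration, the renames in #13632 and the zone pod disruption budget split in #13813 can be expressed as a simple mapping. The Python helper below is a hypothetical sketch, not part of Mimir's tooling. It assumes the top-level $._config fields are available as a flat dict, and its zone PDB step assumes the old multi_zone_zpdb_enabled flag applied to both ingesters and store-gateways.

```python
# Hypothetical migration helper for flattened top-level $._config fields.
COMPONENTS = (
    "querier",
    "query_frontend",
    "ruler",
    "ruler_querier",
    "ruler_query_frontend",
)

# Old name -> new name for the ten autoscaling parameters renamed in #13632.
RENAMES = {
    f"autoscaling_{c}_{bound}_replicas": f"autoscaling_{c}_{bound}_replicas_per_zone"
    for c in COMPONENTS
    for bound in ("min", "max")
}


def migrate_config(config: dict) -> dict:
    """Return a copy of `config` using the 3.1 parameter names."""
    migrated = {}
    for key, value in config.items():
        new_key = RENAMES.get(key, key)
        if new_key != key and new_key in config:
            # Both names are set: keep the explicitly set new name, drop the old one.
            continue
        migrated[new_key] = value

    # #13813 split multi_zone_zpdb_enabled into per-component flags.
    # Assumption: the old flag covered both ingesters and store-gateways,
    # so its value is copied to each new flag unless that flag is already set.
    if "multi_zone_zpdb_enabled" in migrated:
        enabled = migrated.pop("multi_zone_zpdb_enabled")
        migrated.setdefault("multi_zone_ingester_zpdb_enabled", enabled)
        migrated.setdefault("multi_zone_store_gateway_zpdb_enabled", enabled)
    return migrated
```

Review the resulting values by hand before applying them, in particular if you only want the zone pod disruption budget on one of the two components.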
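These notes don't describe how the store-gateway disk-usage trigger (#15019) computes a replica count. As a rough mental model only, the sketch below assumes the Kubernetes HPA proportional rule that KEDA builds on. It expresses both disk usage and autoscaling_store_gateway_disk_usage_threshold as fractions of disk capacity (the real unit of that option is an assumption here) and clamps the result to the per-zone min/max options.

```python
import math


def desired_store_gateway_replicas(
    current_replicas: int,
    disk_usage_ratio: float,
    threshold: float,
    min_per_zone: int,
    max_per_zone: int,
) -> int:
    """HPA-style estimate of store-gateway replicas per zone (illustrative only).

    Assumes desired = ceil(current * usage / threshold), the Kubernetes HPA
    formula, and ignores HPA's default 10% tolerance and stabilization windows.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    desired = math.ceil(current_replicas * disk_usage_ratio / threshold)
    # Clamp to the configured per-zone bounds.
    return max(min_per_zone, min(max_per_zone, desired))
```

For example, 4 replicas at 75% average disk usage with a 50% threshold yield 6 replicas when the bounds are 3 to 12.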

Documentation

  • [ENHANCEMENT] Runbook: Add section on "Ring Failures" to MimirCompactorNotRunningCompaction runbook. #14391
  • [ENHANCEMENT] Add Azure object store workload identity example configuration. #13135
  • [ENHANCEMENT] Ruler: clarify that internal distributor applies to both operational modes. #13300
  • [ENHANCEMENT] Native histograms: Set expectations on querying classic histograms versus NHCBs. #13689
  • [ENHANCEMENT] Add a scenario to the MimirCompactorNotRunningCompaction runbook. #13874
  • [ENHANCEMENT] Document how ingesters calculate partition ID from ring's instance ID in ingest storage. #13903
  • [ENHANCEMENT] Add AWS profile authentication example to mark-blocks tool documentation and add centralized section in runbooks with examples for all cloud providers. #14281
  • [BUGFIX] Distributor: Fix type error in multi-zone distributor container constructor's env map. #14403
  • [BUGFIX] Native histograms: Fix PromQL query example for histogram_fraction to filter NaN results when there are no observations. #14433
  • [BUGFIX] OTLP: Exponential histograms over OTLP are not experimental. #14437
  • [ENHANCEMENT] Kafka: Document that Apache Kafka and Confluent Kafka require message.max.bytes=16000000 to support Mimir's default producer record size. #14875
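You can check the Kafka requirement from #14875 before rollout with a small pre-flight test, sketched here in Python. The sketch assumes a server.properties-style broker config with key=value lines. It treats an unset message.max.bytes as insufficient, because the Kafka broker default is well below 16000000 bytes. Topic-level overrides (max.message.bytes) are not checked.

```python
# Value from the Mimir documentation note (#14875).
REQUIRED_MESSAGE_MAX_BYTES = 16_000_000


def parse_properties(text: str) -> dict:
    """Minimal parser for key=value broker properties (no escapes, ':' separators, or continuations)."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue  # Skip blank lines and comments.
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def broker_supports_mimir_records(text: str) -> bool:
    """True if message.max.bytes is explicitly set to at least the required size."""
    value = parse_properties(text).get("message.max.bytes")
    return value is not None and int(value) >= REQUIRED_MESSAGE_MAX_BYTES
```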

Tools

  • [FEATURE] mimir-tool: Add validate alerts-file command that performs checks on alert files defined as YAML. #14043
  • [FEATURE] mimir-tool: Add partition-ring add-partition and partition-ring remove-partition commands. #14265
  • [FEATURE] mimir-tool: Add partition-ring add-owner and partition-ring remove-owner commands. #14462
  • [FEATURE] tsdb-index-header: Add tool to inspect the content of a block's index or index-header. #13738 #14279 #14944
  • [FEATURE] tsdb-chunks, tsdb-print-chunk: When printing samples, include the start time (ST) in the output. #14337
  • [FEATURE] kafkatool: Add create-topic command to create a Kafka topic with a specified number of partitions. #14639
  • [FEATURE] kafkatool: Add list-topics command to list all Kafka topics and their partition counts. #14639
  • [ENHANCEMENT] mimir-tool: Add __ignore_usage__="" label selector to queries used in analyze prometheus command, so that Adaptive Metrics' recommendations service ignores them. #14474
  • [ENHANCEMENT] mimir-tool: Add TLS client flags (--tls-ca-path, --tls-cert-path, --tls-key-path, --tls-server-name, --tls-insecure-skip-verify) to the remote-read subcommands so they can talk to an endpoint protected by mTLS or a private CA. #15132
  • [ENHANCEMENT] copyblocks: Support resolving S3 credentials from the environment (IAM roles for service accounts, ECS task roles, and EC2 instance metadata) when -s3.<source|destination>.access-key-id and -s3.<source|destination>.secret-access-key are omitted. #15075
  • [BUGFIX] mimir-tool-action: Fix base image of the GitHub action. #13303
  • [BUGFIX] mimir-tool: do not fail on $latency_metrics dashboard variable, documented for native histograms migrations. #13526
  • [BUGFIX] kafkatool: Fix kafkatool dump print to support RW2 records. #13848
  • [BUGFIX] mimir-tool-action: Fix special character handling in NAMESPACES input. #14247
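The following Python analogue illustrates what the new remote-read TLS flags (#15132) typically control. It uses the standard ssl module. The mapping from flags to parameters is an assumption for illustration; mimir-tool itself is written in Go and configures TLS its own way.

```python
import ssl
from typing import Optional


def build_client_tls_context(
    ca_path: Optional[str] = None,      # roughly --tls-ca-path
    cert_path: Optional[str] = None,    # roughly --tls-cert-path
    key_path: Optional[str] = None,     # roughly --tls-key-path
    insecure_skip_verify: bool = False,  # roughly --tls-insecure-skip-verify
) -> ssl.SSLContext:
    """Build a client-side TLS context for talking to an mTLS or private-CA endpoint."""
    # Trust the private CA if given, otherwise the system trust store.
    ctx = ssl.create_default_context(cafile=ca_path)
    if cert_path:
        # Present a client certificate (mTLS).
        ctx.load_cert_chain(certfile=cert_path, keyfile=key_path)
    if insecure_skip_verify:
        # check_hostname must be disabled before verify_mode can be CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx
```

In this analogue, --tls-server-name corresponds to the server_hostname argument of ctx.wrap_socket(sock, server_hostname=...), which is used for SNI and certificate hostname checks.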
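Adding __ignore_usage__="" (#14474) leaves results unchanged for series that lack that label. In PromQL, a matcher on an empty value also selects series that do not have the label at all. The hypothetical Python helper below shows the rewrite on a single simple vector selector. It is a string-level sketch, not a PromQL parser, and not how mimir-tool implements it.

```python
IGNORE_MATCHER = '__ignore_usage__=""'


def add_ignore_usage(selector: str) -> str:
    """Append the matcher to one simple selector such as 'up' or 'up{job="api"}'.

    Hypothetical helper: nested expressions, functions, and range or offset
    modifiers are out of scope.
    """
    selector = selector.strip()
    if "{" not in selector:
        # Bare metric name: add a new matcher block.
        return f"{selector}{{{IGNORE_MATCHER}}}"
    name, _, rest = selector.partition("{")
    # Drop the closing brace and any trailing comma before appending.
    matchers = rest.rstrip()[:-1].strip().rstrip(",")
    if not matchers:
        return f"{name}{{{IGNORE_MATCHER}}}"
    return f"{name}{{{matchers},{IGNORE_MATCHER}}}"
```

For example, add_ignore_usage('up{job="api"}') returns 'up{job="api",__ignore_usage__=""}'.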

Query-tee

  • [CHANGE] Added /api/v1/read as a registered route. #13227
  • [CHANGE] Added cluster validation label configuration -query-tee.client-cluster-validation.label. If set, query-tee will set X-Cluster header before forwarding the request to both primary and secondary backends. #13302
  • [CHANGE] Make HTTP and gRPC server options configurable through the same dskit server flags and config block as Mimir. This begins the deprecation cycle for query-tee's server.http-service-address, server.http-service-port, server.grpc-service-address, and server.grpc-service-port flags. #13328 #13355 #13360
  • [ENHANCEMENT] Add /ready endpoint that returns HTTP 200 when the proxy is running. #14478
  • [BUGFIX] Fix bug where query-tee can panic if forwarding a request fails. #14015
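With -query-tee.client-cluster-validation.label (#13302), query-tee applies the same cluster validation header to both backends. The minimal Python sketch below uses urllib.request.Request to build, but not send, the per-backend requests. The backend URLs and the function name are hypothetical.

```python
from typing import List, Optional
from urllib.request import Request


def build_backend_requests(
    path: str, backends: List[str], cluster_label: Optional[str]
) -> List[Request]:
    """Build one request per backend, tagging each with the same X-Cluster header."""
    requests = []
    for base in backends:
        req = Request(base.rstrip("/") + path, method="GET")
        if cluster_label:
            # Same header for primary and secondary backends.
            req.add_header("X-Cluster", cluster_label)
        requests.append(req)
    return requests
```

urllib stores the header name as X-cluster. HTTP header names are case-insensitive, so the backends still receive the same header.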

All changes in this release: mimir-3.0.6...mimir-3.1.0-rc.0
