This release contains 177 PRs from 43 authors, including new contributors Bartosz Cisek, dggmsa, gmintoco, Ihor Urazov, James Ross, Jean-Philippe Quéméner, Jon Gutschon, l3ioo, lpugoy, Nicolás Pazos, Oscar, Reto Kupferschmid, ying-jeanne. Thank you!
Grafana Mimir version 2.7.1 release notes
Grafana Labs is excited to announce version 2.7.1 of Grafana Mimir.
The highlights that follow include the top features, enhancements, and bugfixes in this release. For the complete list of changes, see the changelog.
Note: During the release process, version 2.7.0 was tagged too early, before completing the release checklist and production testing. Release 2.7.1 doesn't include any code changes since 2.7.0, but now has proper release notes, published documentation, and has been fully tested in our production environment.
Features and enhancements
- Store-gateway streaming enabled by default: The new default value of 5000 for -blocks-storage.bucket-store.batch-series-size enables store-gateway streaming in the default configuration. This means that series are loaded from object storage in batches rather than buffering them all in memory before returning to the querier. Enabling streaming can reduce memory utilization peaks in the store-gateway.
- Store-gateway index header reader no longer uses mmap by default: Along with streaming enabled in the store-gateway, this change contributes to more efficient memory usage. See the Important changes section for more details.
- Support for keep_firing_for option to ruler configuration: This new option determines the amount of time an alert should keep firing while the ruler expression doesn't return results.
- More efficient chunks fetching and caching: Enable with the new experimental feature flag -blocks-storage.bucket-store.chunks-cache.fine-grained-chunks-caching-enabled=true. This should reduce CPU, memory utilization, and receive bandwidth of a store-gateway.
- Experimental query sharding improvements: A new configuration parameter, -query-frontend.query-sharding-target-series-per-shard, allows query sharding to take into account cardinality of similar requests executed previously when computing the maximum number of shards to use. If you want to try it out, we recommend starting with a value of 2500.
- Experimental support for native histogram ingestion: Native histograms can now be ingested. The new per-tenant limit -ingester.native-histograms-ingestion-enabled controls whether native histograms are stored or ignored. The support for querying native histograms is not complete yet and it's expected to be available in the next release.
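To make the keep_firing_for behavior described in the list above more concrete, here is a minimal Python sketch of the idea. It is a simplified model, not the ruler's actual implementation: the function name still_firing and its parameters are hypothetical, the alert is assumed to be firing already, and the pending state is left out.

```python
from typing import Optional


def still_firing(now: float, missing_since: Optional[float], keep_firing_for: float) -> bool:
    """Simplified model of keep_firing_for for an alert that is already firing.

    now:             current evaluation time, in seconds.
    missing_since:   evaluation time at which the rule expression first stopped
                     returning a result, or None if it still returns one.
    keep_firing_for: grace period in seconds (hypothetical parameter that
                     mirrors the rule option of the same name).
    """
    if missing_since is None:
        # The expression still returns a result, so the alert stays firing.
        return True
    # The expression returns nothing: keep firing only while the grace
    # period has not yet elapsed.
    return (now - missing_since) < keep_firing_for
```

With a grace period of 300 seconds in this model, an alert whose expression stopped returning results at t=100 is still reported as firing at t=200, but not at t=500.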
Alertmanager improvements
- New metrics: The following upstream metrics are now exposed:
  - cortex_alertmanager_dispatcher_aggregation_groups
  - cortex_alertmanager_dispatcher_alert_processing_duration_seconds
Helm chart improvements
The Grafana Mimir and Grafana Enterprise Metrics Helm chart is now released independently. See the Grafana Mimir Helm chart documentation.
Important changes
In Grafana Mimir 2.7, the default values of the following configuration options have changed:
- -blocks-storage.bucket-store.batch-series-size is now enabled by default with a value of 5000.
- -ruler.evaluation-delay-duration has changed from 0 to 1m.
In Grafana Mimir 2.7, the following configuration options are now deprecated:
- -blocks-storage.bucket-store.chunks-cache.subrange-size since there's no benefit to changing the default of 16000.
- -blocks-storage.bucket-store.consistency-delay has been deprecated and will be removed in Mimir 2.9.
- -compactor.consistency-delay has been deprecated and will be removed in Mimir 2.9.
- -ingester.ring.readiness-check-ring-health has been deprecated and will be removed in Mimir 2.9.
In Grafana Mimir 2.7, the following options, metrics, and labels have been removed:
- Experimental support for ephemeral storage introduced in Mimir 2.6.0 has been removed.
  - Following options are no longer available:
    - -blocks-storage.ephemeral-tsdb.*
    - -distributor.ephemeral-series-enabled
    - -distributor.ephemeral-series-matchers
    - -ingester.max-ephemeral-series-per-user
    - -ingester.instance-limits.max-ephemeral-series
  - The following metrics have been removed:
    - cortex_ingester_ephemeral_series
    - cortex_ingester_ephemeral_series_created_total
    - cortex_ingester_ephemeral_series_removed_total
    - cortex_ingester_ingested_ephemeral_samples_total
    - cortex_ingester_ingested_ephemeral_samples_failures_total
    - cortex_ingester_memory_ephemeral_users
    - cortex_ingester_queries_ephemeral_total
    - cortex_ingester_queried_ephemeral_samples
    - cortex_ingester_queried_ephemeral_series
  - Additionally, querying using the {__mimir_storage__="ephemeral"} selector no longer works. All label values with the ephemeral- prefix within the reason label of the cortex_discarded_samples_total metric are no longer available.
- The store-gateway default index header reader no longer uses mmap and the mmap-based index header reader has been removed. The following flags have been changed:
  - -blocks-storage.bucket-store.index-header.map-populate-enabled has been removed
  - -blocks-storage.bucket-store.index-header.stream-reader-enabled has been removed
  - -blocks-storage.bucket-store.index-header.stream-reader-max-idle-file-handles has been renamed to -blocks-storage.bucket-store.index-header.max-idle-file-handles, and the corresponding configuration file option has been renamed from stream_reader_max_idle_file_handles to max_idle_file_handles
Bug fixes
- Store-gateway: return Canceled rather than Aborted or Internal error when the calling querier cancels a label names or values request, and return Internal if processing the request fails for another reason. PR 4061
- Querier: track canceled requests with status code 499 in the metrics instead of 503 or 422. PR 4099
- Ingester: compact out-of-order data during /ingester/flush or when TSDB is idle. PR 4180
- Ingester: conversion of global limits max-series-per-user, max-series-per-metric, max-metadata-per-user and max-metadata-per-metric into corresponding local limits now takes into account the number of ingesters in each zone. PR 4238
- Ingester: track cortex_ingester_memory_series metric consistently with cortex_ingester_memory_series_created_total and cortex_ingester_memory_series_removed_total. PR 4312
- Querier: fixed a bug which was incorrectly matching series with regular expression label matchers with begin/end anchors in the middle of the regular expression. PR 4340
Changelog
2.7.1
Grafana Mimir
- [CHANGE] Ingester: the configuration parameter -ingester.ring.readiness-check-ring-health has been deprecated and will be removed in Mimir 2.9. #4422
- [CHANGE] Ruler: changed default value of -ruler.evaluation-delay-duration option from 0 to 1m. #4250
- [CHANGE] Querier: Errors with status code 422 coming from the store-gateway are propagated and not converted to the consistency check error anymore. #4100
- [CHANGE] Store-gateway: When a query hits max_fetched_chunks_per_query and max_fetched_series_per_query limits, an error with the status code 422 is created and returned. #4056
- [CHANGE] Packaging: Migrate FPM packaging solution to NFPM. Rationalize packages dependencies and add package for all binaries. #3911
- [CHANGE] Store-gateway: Deprecate flag -blocks-storage.bucket-store.chunks-cache.subrange-size since there's no benefit to changing the default of 16000. #4135
- [CHANGE] Experimental support for ephemeral storage introduced in Mimir 2.6.0 has been removed. The following options are no longer available: #4252
  - -blocks-storage.ephemeral-tsdb.*
  - -distributor.ephemeral-series-enabled
  - -distributor.ephemeral-series-matchers
  - -ingester.max-ephemeral-series-per-user
  - -ingester.instance-limits.max-ephemeral-series
  Querying using the {__mimir_storage__="ephemeral"} selector no longer works. All label values with the ephemeral- prefix in the reason label of the cortex_discarded_samples_total metric are no longer available. The following metrics have been removed:
  - cortex_ingester_ephemeral_series
  - cortex_ingester_ephemeral_series_created_total
  - cortex_ingester_ephemeral_series_removed_total
  - cortex_ingester_ingested_ephemeral_samples_total
  - cortex_ingester_ingested_ephemeral_samples_failures_total
  - cortex_ingester_memory_ephemeral_users
  - cortex_ingester_queries_ephemeral_total
  - cortex_ingester_queried_ephemeral_samples
  - cortex_ingester_queried_ephemeral_series
- [CHANGE] Store-gateway: use mmap-less index-header reader by default and remove mmap-based index header reader. The following flags have changed: #4280
  - -blocks-storage.bucket-store.index-header.map-populate-enabled has been removed
  - -blocks-storage.bucket-store.index-header.stream-reader-enabled has been removed
  - -blocks-storage.bucket-store.index-header.stream-reader-max-idle-file-handles has been renamed to -blocks-storage.bucket-store.index-header.max-idle-file-handles, and the corresponding configuration file option has been renamed from stream_reader_max_idle_file_handles to max_idle_file_handles
- [CHANGE] Store-gateway: the streaming store-gateway is now enabled by default. The new default setting for -blocks-storage.bucket-store.batch-series-size is 5000. #4330
- [CHANGE] Compactor: the configuration parameter -compactor.consistency-delay has been deprecated and will be removed in Mimir 2.9. #4409
- [CHANGE] Store-gateway: the configuration parameter -blocks-storage.bucket-store.consistency-delay has been deprecated and will be removed in Mimir 2.9. #4409
- [FEATURE] Ruler: added keep_firing_for support to alerting rules. #4099
- [FEATURE] Distributor, ingester: ingestion of native histograms. The new per-tenant limit -ingester.native-histograms-ingestion-enabled controls whether native histograms are stored or ignored. #4159
- [FEATURE] Query-frontend: Introduce experimental -query-frontend.query-sharding-target-series-per-shard to allow query sharding to take into account cardinality of similar requests executed previously. This feature uses the same cache that's used for results caching. #4121 #4177 #4188 #4254
- [ENHANCEMENT] Go: update go to 1.20.1. #4266
- [ENHANCEMENT] Ingester: added out_of_order_blocks_external_label_enabled shipper option to label out-of-order blocks before shipping them to cloud storage. #4182 #4297
- [ENHANCEMENT] Ruler: introduced concurrency when loading per-tenant rules configuration. This improvement is expected to speed up the ruler start up time in a Mimir cluster with a large number of tenants. #4258
- [ENHANCEMENT] Compactor: Add reason label to cortex_compactor_runs_failed_total. The value can be shutdown or error. #4012
- [ENHANCEMENT] Store-gateway: enforce max_fetched_series_per_query. #4056
- [ENHANCEMENT] Query-frontend: Disambiguate logs for failed queries. #4067
- [ENHANCEMENT] Query-frontend: log caller user agent in query stats logs. #4093
- [ENHANCEMENT] Store-gateway: add data_type label with values on cortex_bucket_store_partitioner_extended_ranges_total, cortex_bucket_store_partitioner_expanded_ranges_total, cortex_bucket_store_partitioner_requested_ranges_total, cortex_bucket_store_partitioner_expanded_bytes_total, cortex_bucket_store_partitioner_requested_bytes_total for postings, series, and chunks. #4095
- [ENHANCEMENT] Store-gateway: Reduce memory allocation rate when loading TSDB chunks from Memcached. #4074
- [ENHANCEMENT] Query-frontend: track cortex_frontend_query_response_codec_duration_seconds and cortex_frontend_query_response_codec_payload_bytes metrics to measure the time taken and bytes read / written while encoding and decoding query result payloads. #4110
- [ENHANCEMENT] Alertmanager: expose additional upstream metrics cortex_alertmanager_dispatcher_aggregation_groups, cortex_alertmanager_dispatcher_alert_processing_duration_seconds. #4151
- [ENHANCEMENT] Querier and query-frontend: add experimental, more performant protobuf internal query result response format enabled with -query-frontend.query-result-response-format=protobuf. #4153
- [ENHANCEMENT] Store-gateway: use more efficient chunks fetching and caching. This should reduce CPU, memory utilization, and receive bandwidth of a store-gateway. Enable with -blocks-storage.bucket-store.chunks-cache.fine-grained-chunks-caching-enabled=true. #4163 #4174 #4227
- [ENHANCEMENT] Query-frontend: Wait for in-flight queries to finish before shutting down. #4073 #4170
- [ENHANCEMENT] Store-gateway: added encode and other stage to cortex_bucket_store_series_request_stage_duration_seconds metric. #4179
- [ENHANCEMENT] Ingester: log state of TSDB when shipping or forced compaction can't be done due to unexpected state of TSDB. #4211
- [ENHANCEMENT] Update Docker base images from alpine:3.17.1 to alpine:3.17.2. #4240
- [ENHANCEMENT] Store-gateway: add a stage label to the metrics cortex_bucket_store_series_data_fetched, cortex_bucket_store_series_data_size_fetched_bytes, cortex_bucket_store_series_data_touched, cortex_bucket_store_series_data_size_touched_bytes. This label only applies to data_type="chunks". For fetched metrics with data_type="chunks" the stage label has 2 values: fetched - the chunks or bytes that were fetched from the cache or the object store, refetched - the chunks or bytes that had to be refetched from the cache or the object store because their size was underestimated during the first fetch. For touched metrics with data_type="chunks" the stage label has 2 values: processed - the chunks or bytes that were read from the fetched chunks or bytes and were processed in memory, returned - the chunks or bytes that were selected from the processed bytes to satisfy the query. #4227 #4316
- [ENHANCEMENT] Compactor: improve the partial block check related to compactor.partial-block-deletion-delay to potentially issue fewer requests to object storage. #4246
- [ENHANCEMENT] Memcached: added -*.memcached.min-idle-connections-headroom-percentage support to configure the minimum number of idle connections to keep open as a percentage (0-100) of the number of recently used idle connections. This feature is disabled when set to a negative value (default), which means idle connections are kept open indefinitely. #4249
- [ENHANCEMENT] Querier and store-gateway: optimized regular expression label matchers with case insensitive alternate operator. #4340 #4357
- [ENHANCEMENT] Compactor: added the experimental flag -compactor.block-upload.block-validation-enabled with the default true to configure whether block validation occurs on backfilled blocks. #3411
- [ENHANCEMENT] Ingester: apply a jitter to the first TSDB head compaction interval configured via -blocks-storage.tsdb.head-compaction-interval. Subsequent checks will happen at the configured interval. This should help to spread the TSDB head compaction among different ingesters over the configured interval. #4364
- [ENHANCEMENT] Ingester: the maximum accepted value for -blocks-storage.tsdb.head-compaction-interval has been increased from 5m to 15m. #4364
- [BUGFIX] Store-gateway: return Canceled rather than Aborted or Internal error when the calling querier cancels a label names or values request, and return Internal if processing the request fails for another reason. #4061
- [BUGFIX] Querier: track canceled requests with status code 499 in the metrics instead of 503 or 422. #4099
- [BUGFIX] Ingester: compact out-of-order data during /ingester/flush or when TSDB is idle. #4180
- [BUGFIX] Ingester: conversion of global limits max-series-per-user, max-series-per-metric, max-metadata-per-user and max-metadata-per-metric into corresponding local limits now takes into account the number of ingesters in each zone. #4238
- [BUGFIX] Ingester: track cortex_ingester_memory_series metric consistently with cortex_ingester_memory_series_created_total and cortex_ingester_memory_series_removed_total. #4312
- [BUGFIX] Querier: fixed a bug which was incorrectly matching series with regular expression label matchers with begin/end anchors in the middle of the regular expression. #4340
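The jitter on the first TSDB head compaction interval (#4364 in the list above) can be pictured with a short Python sketch. This is an illustration under assumptions, not the ingester's code: the function name compaction_delays is hypothetical, and drawing the first delay uniformly from zero up to the configured interval is an assumed jitter range that the changelog does not specify.

```python
import itertools
import random
from typing import Iterator, Optional


def compaction_delays(interval_s: float, rng: Optional[random.Random] = None) -> Iterator[float]:
    """Yield the wait time before each head compaction check, in seconds."""
    rng = rng or random.Random()
    # First wait: a random point within one interval (assumed jitter range),
    # so that ingesters started together do not all compact at once.
    yield rng.uniform(0, interval_s)
    # Every later wait is exactly the configured interval.
    while True:
        yield interval_s


if __name__ == "__main__":
    # Show the first three waits for a 60 second interval.
    print(list(itertools.islice(compaction_delays(60.0), 3)))
```

Because only the first wait is randomized, each ingester keeps its own offset afterwards, which spreads compactions across the interval without changing how often they run.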
Mixin
- [CHANGE] Move auto-scaling panel rows down beneath logical network path in Reads and Writes dashboards. #4049
- [CHANGE] Make distributor auto-scaling metric panels show desired number of replicas. #4218
- [CHANGE] Alerts: The alert MimirMemcachedRequestErrors has been renamed to MimirCacheRequestErrors. #4242
- [ENHANCEMENT] Alerts: Added MimirAutoscalerKedaFailing alert firing when a KEDA scaler is failing. #4045
- [ENHANCEMENT] Add auto-scaling panels to ruler dashboard. #4046
- [ENHANCEMENT] Add gateway auto-scaling panels to Reads and Writes dashboards. #4049 #4216
- [ENHANCEMENT] Dashboards: distinguish between label names and label values queries. #4065
- [ENHANCEMENT] Add query-frontend and ruler-query-frontend auto-scaling panels to Reads and Ruler dashboards. #4199
- [BUGFIX] Alerts: Fixed MimirAutoscalerNotActive to not fire if scaling metric does not exist, to avoid false positives on scaled objects with 0 min replicas. #4045
- [BUGFIX] Alerts: MimirCompactorHasNotSuccessfullyRunCompaction is no longer triggered by frequent compactor restarts. #4012
- [BUGFIX] Tenants dashboard: Correctly show the ruler-query-scheduler queue size. #4152
Jsonnet
- [CHANGE] Create the query-frontend-discovery service only when Mimir is deployed in microservice mode without query-scheduler. #4353
- [CHANGE] Add results cache backend config to ruler-query-frontend configuration to allow cache reuse for cardinality-estimation based sharding. #4257
- [ENHANCEMENT] Add support for ruler auto-scaling. #4046
- [ENHANCEMENT] Add optional weight param to newQuerierScaledObject and newRulerQuerierScaledObject to allow running multiple querier deployments on different node types. #4141
- [ENHANCEMENT] Add support for query-frontend and ruler-query-frontend auto-scaling. #4199
- [BUGFIX] Shuffle sharding: when applying user class limits, honor the minimum shard size configured in $._config.shuffle_sharding.*. #4363
Mimirtool
- [FEATURE] Added keep_firing_for support to rules configuration. #4099
- [ENHANCEMENT] Add -tls-insecure-skip-verify to rules, alertmanager and backfill commands. #4162
Query-tee
- [CHANGE] Increase default value of -backend.read-timeout to 150s, to accommodate default querier and query frontend timeout of 120s. #4262
- [ENHANCEMENT] Log errors that occur while performing requests to compare two endpoints. #4262
- [ENHANCEMENT] When comparing two responses that both contain an error, only consider the comparison failed if the errors differ. Previously, if either response contained an error, the comparison always failed, even if both responses contained the same error. #4262
- [ENHANCEMENT] Include the value of the X-Scope-OrgID header when logging a comparison failure. #4262
- [BUGFIX] Parameters (expression, time range etc.) for a query request where the parameters are in the HTTP request body rather than in the URL are now logged correctly when responses differ. #4265
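The changed comparison rule for responses that both contain an error (#4262 in the list above) can be summarized with a small Python sketch. It is a simplified model rather than query-tee's implementation: the Response type, its fields, and the comparison_failed function are hypothetical, and an exact body comparison stands in for whatever result comparison query-tee actually performs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    body: str = ""
    error: Optional[str] = None  # hypothetical: None means the request succeeded


def comparison_failed(a: Response, b: Response) -> bool:
    """Return True if the two backend responses should be reported as different."""
    if a.error is not None and b.error is not None:
        # Both backends failed: only differing errors count as a failure.
        return a.error != b.error
    if a.error is not None or b.error is not None:
        # Exactly one backend failed, so the responses differ.
        return True
    # Neither failed: compare the payloads (exact match in this sketch).
    return a.body != b.body
```

Under the previous behavior described above, the first branch would have returned True unconditionally.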
Documentation
- [ENHANCEMENT] Add guide on alternative migration method for Thanos to Mimir #3554
- [ENHANCEMENT] Restore "Migrate from Cortex" for Jsonnet. #3929
- [ENHANCEMENT] Document migration from microservices to read-write deployment mode. #3951
- [ENHANCEMENT] Do not error when there is nothing to commit as part of a publish #4058
- [ENHANCEMENT] Explain how to run Mimir locally using docker-compose #4079
- [ENHANCEMENT] Docs: use long flag names in runbook commands. #4088
- [ENHANCEMENT] Clarify how ingester replication happens. #4101
- [ENHANCEMENT] Improvements to the Get Started guide. #4315
- [BUGFIX] Added indentation to Azure and SWIFT backend definition. #4263
Tools
- [ENHANCEMENT] Adapt tsdb-print-chunk for native histograms. #4186
- [ENHANCEMENT] Adapt tsdb-index-health for blocks containing native histograms. #4186
- [ENHANCEMENT] Adapt tsdb-chunks tool to handle native histograms. #4186
All changes in this release: mimir-2.6.0...mimir-2.7.1