This release contains 210 PRs from 53 authors, including new contributors Abdurrahman J. Allawala, Ashray Jain, Cyrill N, Daniel Barnes, Dave, David van der Spek, day4me, Devin Trejo, Dmitriy Okladin, Gabriel Santos, inbarpatashnik, Johannes Tandler, Julien Girard, KingJ, Miller, Rafał Boniecki, Raphael Ferreira, Raúl Marín, Ruslan Kovalov, Shagit Ziganshin, shanmugara, Wilfried ROSET. Thank you!
Grafana Mimir version 2.8.0-rc.0 release notes
Grafana Labs is excited to announce version 2.8 of Grafana Mimir.
The highlights that follow include the top features, enhancements, and bugfixes in this release. For the complete list of changes, see the changelog.
Features and enhancements
- Changed default value of block storage retention period: the default value for `-blocks-storage.tsdb.retention-period` was `24h` and is now `13h`.
- Query-frontend cached results now contain a timestamp: this allows Mimir to check whether cached results are still valid based on the TTL currently configured for the tenant. Results cached by a previous Mimir version are used until they expire from the cache, which can take up to 7 days. If you need the per-tenant TTL to take effect sooner, flush the results cache manually.
- Experimental support for using Redis as cache: Mimir can now use Redis for caching results, chunks, index, and metadata.
- Experimental support for fetching secrets from Vault for TLS configuration.
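
The timestamp-based validity check described for query-frontend cached results can be pictured with a minimal sketch. It is not Mimir's implementation: the function name, its parameters, and the handling of entries without a timestamp are assumptions made for illustration, based only on the description above.

```python
from datetime import datetime, timedelta, timezone


def is_cached_result_valid(cached_at, tenant_ttl, now=None):
    """Return True if a cached result may still be served.

    cached_at  -- timestamp stored with the cached result, or None for an
                  entry written by an older version without a timestamp
    tenant_ttl -- TTL currently configured for the tenant
    (Both names are hypothetical, chosen for this sketch.)
    """
    if cached_at is None:
        # Assumption: entries without a timestamp are used until the cache
        # itself expires them, as the note says for results cached by a
        # previous version.
        return True
    now = now or datetime.now(timezone.utc)
    return now - cached_at <= tenant_ttl


now = datetime(2023, 4, 1, tzinfo=timezone.utc)
three_days_old = now - timedelta(days=3)
print(is_cached_result_valid(three_days_old, timedelta(days=7), now))
print(is_cached_result_valid(three_days_old, timedelta(days=1), now))
```

With a 7-day TTL the three-day-old entry is still usable; lowering the tenant's TTL to one day makes the same entry stale without flushing the cache.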
Helm chart improvements
The Grafana Mimir and Grafana Enterprise Metrics Helm chart is now released independently. See the Grafana Mimir Helm chart documentation.
Important changes
In Grafana Mimir 2.8 we have removed the following previously deprecated or experimental configuration options or metrics.
The following metrics have been removed: `cortex_bucket_store_series_get_all_duration_seconds`, `cortex_bucket_store_series_merge_duration_seconds`, `cortex_ingester_tsdb_wal_replay_duration_seconds`.
The following configuration options are deprecated and will be removed in Grafana Mimir 2.10:
- The CLI flag `-blocks-storage.tsdb.max-tsdb-opening-concurrency-on-startup` and its respective YAML configuration option `tsdb.max_tsdb_opening_concurrency_on_startup`.
The following experimental options and features are now stable:
- Use protobuf internal query result payload format by default.
Bug fixes
- Querier: Streaming remote read will now continue to return multiple chunks per frame after the first frame. PR 4423
- Query-frontend: don't retry queries which error inside PromQL. PR 4643
- Store-gateway & query-frontend: report more consistent statistics for fetched index bytes. PR 4671
- Native histograms: fix how IsFloatHistogram determines if mimirpb.Histogram is a float histogram. PR 4706
- Query-frontend: fix query sharding for native histograms. PR 4666
Changelog
2.8.0-rc.0
Grafana Mimir
- [CHANGE] Ingester: changed experimental CLI flag from `-out-of-order-blocks-external-label-enabled` to `-ingester.out-of-order-blocks-external-label-enabled`. #4440
- [CHANGE] Store-gateway: The following metrics have been removed: #4332
  - `cortex_bucket_store_series_get_all_duration_seconds`
  - `cortex_bucket_store_series_merge_duration_seconds`
- [CHANGE] Ingester: changed default value of `-blocks-storage.tsdb.retention-period` from `24h` to `13h`. If you're running Mimir with a custom configuration and you're overriding `-querier.query-store-after` to a value greater than the default `12h` then you should increase `-blocks-storage.tsdb.retention-period` accordingly. #4382
- [CHANGE] Ingester: the configuration parameter `-blocks-storage.tsdb.max-tsdb-opening-concurrency-on-startup` has been deprecated and will be removed in Mimir 2.10. #4445
- [CHANGE] Query-frontend: Cached results now contain timestamp which allows Mimir to check if cached results are still valid based on current TTL configured for tenant. Results cached by previous Mimir version are used until they expire from cache, which can take up to 7 days. If you need to use per-tenant TTL sooner, please flush results cache manually. #4439
- [CHANGE] Ingester: the `cortex_ingester_tsdb_wal_replay_duration_seconds` metric has been removed. #4465
- [CHANGE] Query-frontend and ruler: use protobuf internal query result payload format by default. This feature is no longer considered experimental. #4557 #4709
- [CHANGE] Ruler: reject creating federated rule groups while tenant federation is disabled. Previously the rule groups would be silently dropped during bucket sync. #4555
- [CHANGE] Compactor: the `/api/v1/upload/block/{block}/finish` endpoint now returns a `429` status code when the compactor has reached the limit specified by `-compactor.max-block-upload-validation-concurrency`. #4598
- [CHANGE] Compactor: when starting a block upload the maximum byte size of the block metadata provided in the request body is now limited to 1 MiB. If this limit is exceeded a `413` status code is returned. #4683
- [CHANGE] Store-gateway: cache key format for expanded postings has changed. This will invalidate the expanded postings in the index cache when deployed. #4667
- [FEATURE] Cache: Introduce experimental support for using Redis for results, chunks, index, and metadata caches. #4371
- [FEATURE] Vault: Introduce experimental integration with Vault to fetch secrets used to configure TLS for clients. Server TLS secrets will still be read from a file. `tls-ca-path`, `tls-cert-path` and `tls-key-path` will denote the path in Vault for the following CLI flags when `-vault.enabled` is true: #4446
  - `-distributor.ha-tracker.etcd.*`
  - `-distributor.ring.etcd.*`
  - `-distributor.forwarding.grpc-client.*`
  - `-querier.store-gateway-client.*`
  - `-ingester.client.*`
  - `-ingester.ring.etcd.*`
  - `-querier.frontend-client.*`
  - `-query-frontend.grpc-client-config.*`
  - `-query-frontend.results-cache.redis.*`
  - `-blocks-storage.bucket-store.index-cache.redis.*`
  - `-blocks-storage.bucket-store.chunks-cache.redis.*`
  - `-blocks-storage.bucket-store.metadata-cache.redis.*`
  - `-compactor.ring.etcd.*`
  - `-store-gateway.sharding-ring.etcd.*`
  - `-ruler.client.*`
  - `-ruler.alertmanager-client.*`
  - `-ruler.ring.etcd.*`
  - `-ruler.query-frontend.grpc-client-config.*`
  - `-alertmanager.sharding-ring.etcd.*`
  - `-alertmanager.alertmanager-client.*`
  - `-memberlist.*`
  - `-query-scheduler.grpc-client-config.*`
  - `-query-scheduler.ring.etcd.*`
  - `-overrides-exporter.ring.etcd.*`
- [FEATURE] Distributor, ingester, querier, query-frontend, store-gateway: add experimental support for native histograms. Requires that the experimental protobuf query result response format is enabled by `-query-frontend.query-result-response-format=protobuf` on the query frontend. #4286 #4352 #4354 #4376 #4377 #4387 #4396 #4425 #4442 #4494 #4512 #4513 #4526
- [FEATURE] Added `-<prefix>.s3.storage-class` flag to configure the S3 storage class for objects written to S3 buckets. #4300
- [FEATURE] Add `freebsd` to the target OS when generating binaries for a Mimir release. #4654
- [FEATURE] Ingester: Add `prepare-shutdown` endpoint which can be used as part of Kubernetes scale down automations. #4718
- [ENHANCEMENT] Add timezone information to Alpine Docker images. #4583
- [ENHANCEMENT] Ruler: sync rules when the ruler is JOINING the ring instead of ACTIVE, in order to reduce missed rule iterations during ruler restarts. #4451
- [ENHANCEMENT] Allow defining the service name used for tracing via the `JAEGER_SERVICE_NAME` environment variable. #4394
- [ENHANCEMENT] Querier and query-frontend: add experimental, more performant protobuf query result response format enabled with `-query-frontend.query-result-response-format=protobuf`. #4304 #4318 #4375
- [ENHANCEMENT] Compactor: added experimental configuration parameter `-compactor.first-level-compaction-wait-period`, to configure how long the compactor should wait before compacting 1st level blocks (uploaded by ingesters). This configuration option reduces the chances that the compactor begins compacting blocks before all ingesters have uploaded their blocks to the storage. #4401
- [ENHANCEMENT] Store-gateway: use more efficient chunks fetching and caching. #4255
- [ENHANCEMENT] Query-frontend and ruler: add experimental, more performant protobuf internal query result response format enabled with `-ruler.query-frontend.query-result-response-format=protobuf`. #4331
- [ENHANCEMENT] Ruler: increased tolerance for missed iterations on alerts, reducing the chances of flapping firing alerts during ruler restarts. #4432
- [ENHANCEMENT] Optimized `.*` and `.+` regular expression label matchers. #4432
- [ENHANCEMENT] Optimized regular expression label matchers with alternates (e.g. `a|b|c`). #4647
- [ENHANCEMENT] Added an in-memory cache for regular expression matchers, to avoid parsing and compiling the same expression multiple times when used in recurring queries. #4633
- [ENHANCEMENT] Query-frontend: results cache TTL is now configurable by using `-query-frontend.results-cache-ttl` and `-query-frontend.results-cache-ttl-for-out-of-order-time-window` options. These values can also be specified per tenant. Default values are unchanged (7 days and 10 minutes respectively). #4385
- [ENHANCEMENT] Ingester: added advanced configuration parameter `-blocks-storage.tsdb.wal-replay-concurrency` representing the maximum number of CPUs used during WAL replay. #4445
- [ENHANCEMENT] Ingester: added metric `cortex_ingester_tsdb_open_duration_seconds_total` to measure the total time it takes to open all existing TSDBs. The time tracked by this metric also includes the TSDBs' WAL replay duration. #4465
- [ENHANCEMENT] Store-gateway: use streaming implementation for LabelNames RPC. The batch size for streaming is controlled by `-blocks-storage.bucket-store.batch-series-size`. #4464
- [ENHANCEMENT] Memcached: Add support for TLS or mTLS connections to cache servers. #4535
- [ENHANCEMENT] Compactor: blocks index files are now validated for correctness for blocks uploaded via the TSDB block upload feature. #4503
- [ENHANCEMENT] Compactor: block chunks and segment files are now validated for correctness for blocks uploaded via the TSDB block upload feature. #4549
- [ENHANCEMENT] Ingester: added configuration options to configure the "postings for matchers" cache of each compacted block queried from ingesters: #4561
  - `-blocks-storage.tsdb.block-postings-for-matchers-cache-ttl`
  - `-blocks-storage.tsdb.block-postings-for-matchers-cache-size`
  - `-blocks-storage.tsdb.block-postings-for-matchers-cache-force`
- [ENHANCEMENT] Compactor: validation of blocks uploaded via the TSDB block upload feature is now configurable on a per tenant basis: #4585
  - `-compactor.block-upload-validation-enabled` has been added, `compactor_block_upload_validation_enabled` can be used to override per tenant
  - `-compactor.block-upload.block-validation-enabled` was the previous global flag and has been removed
- [ENHANCEMENT] TSDB Block Upload: block upload validation concurrency can now be limited with `-compactor.max-block-upload-validation-concurrency`. #4598
- [ENHANCEMENT] OTLP: Add support for converting OTel exponential histograms to Prometheus native histograms. The ingestion of native histograms must be enabled, please set `-ingester.native-histograms-ingestion-enabled` to `true`. #4063 #4639
- [ENHANCEMENT] Query-frontend: add metric `cortex_query_fetched_index_bytes_total` to measure TSDB index bytes fetched to execute a query. #4597
- [ENHANCEMENT] Query-frontend: add experimental limit to enforce a max query expression size in bytes via `-query-frontend.max-query-expression-size-bytes` or `max_query_expression_size_bytes`. #4604
- [ENHANCEMENT] Query-tee: improve message logged when comparing responses and one response contains a non-JSON payload. #4588
- [ENHANCEMENT] Distributor: add ability to set per-distributor limits via `distributor_limits` block in runtime configuration in addition to the existing configuration. #4619
- [ENHANCEMENT] Querier: reduce peak memory consumption for queries that touch a large number of chunks. #4625
- [ENHANCEMENT] Query-frontend: added experimental `-query-frontend.query-sharding-max-regexp-size-bytes` limit to query-frontend. When set to a value greater than 0, query-frontend disables query sharding for any query with a regexp matcher longer than the configured limit. #4632
- [ENHANCEMENT] Store-gateway: include statistics from LabelValues and LabelNames calls in `cortex_bucket_store_series*` metrics. #4673
- [ENHANCEMENT] Query-frontend: improve readability of distributed tracing spans. #4656
- [ENHANCEMENT] Update Docker base images from `alpine:3.17.2` to `alpine:3.17.3`. #4685
- [ENHANCEMENT] Querier: improve performance when shuffle sharding is enabled and the shard size is large. #4711
- [ENHANCEMENT] Ingester: improve performance when Active Series Tracker is in use. #4717
- [ENHANCEMENT] Store-gateway: optionally select `-blocks-storage.bucket-store.series-selection-strategy`, which can limit the impact of large posting lists (when many series share the same label name and value). #4667 #4695 #4698
- [ENHANCEMENT] Querier: cache the converted float histogram from the chunk iterator, so there is no need to look up the chunk every time to get the converted float histogram. #4684
- [BUGFIX] Querier: Streaming remote read will now continue to return multiple chunks per frame after the first frame. #4423
- [BUGFIX] Store-gateway: the values for `stage="processed"` for the metrics `cortex_bucket_store_series_data_touched` and `cortex_bucket_store_series_data_size_touched_bytes` when using fine-grained chunks caching now report the correct values of chunks held in memory. #4449
- [BUGFIX] Compactor: fixed reporting a compaction error when compactor is correctly shut down while populating blocks. #4580
- [BUGFIX] OTLP: Do not drop exemplars of the OTLP Monotonic Sum metric. #4063
- [BUGFIX] Packaging: flag `/etc/default/mimir` and `/etc/sysconfig/mimir` as config to prevent overwrite. #4587
- [BUGFIX] Query-frontend: don't retry queries which error inside PromQL. #4643
- [BUGFIX] Store-gateway & query-frontend: report more consistent statistics for fetched index bytes. #4671
- [BUGFIX] Native histograms: fix how IsFloatHistogram determines if mimirpb.Histogram is a float histogram. #4706
- [BUGFIX] Query-frontend: fix query sharding for native histograms. #4666
- [BUGFIX] Ring status page: fixed the owned tokens percentage value displayed. #4730
- [BUGFIX] Querier: fixed chunk iterator that can return sample with wrong timestamp. #4450
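
The behaviour described for `-query-frontend.query-sharding-max-regexp-size-bytes` (#4632) can be illustrated with a small sketch. It models only that one rule as stated in the entry; the function name and its arguments are hypothetical, and Mimir's other sharding conditions are not represented.

```python
def sharding_allowed_by_regexp_limit(regexp_matchers, max_regexp_size_bytes):
    """Apply the rule from the changelog entry to a query's regexp matchers.

    regexp_matchers       -- regular expression strings used by the query's
                             label matchers (hypothetical input shape)
    max_regexp_size_bytes -- configured limit; 0 or less means "no limit"
    """
    if max_regexp_size_bytes <= 0:
        # The limit only applies when set to a value greater than 0.
        return True
    # Sharding is disabled if any regexp matcher is longer than the limit.
    return all(
        len(expr.encode("utf-8")) <= max_regexp_size_bytes
        for expr in regexp_matchers
    )


print(sharding_allowed_by_regexp_limit(["a|b|c"], 0))
print(sharding_allowed_by_regexp_limit(["a|b|c", "x" * 10], 5))
```

Measuring the matcher in UTF-8 bytes is an assumption that follows from the flag name ending in `-size-bytes`.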
Mixin
- [ENHANCEMENT] Queries: Display data touched per sec in bytes instead of number of items. #4492
- [ENHANCEMENT] `_config.job_names.<job>` values can now be arrays of regular expressions in addition to a single string. Strings are still supported and behave as before. #4543
- [ENHANCEMENT] Queries dashboard: remove mention of store-gateway "streaming enabled" in panels because store-gateway only supports streaming series since Mimir 2.7. #4569
- [ENHANCEMENT] Ruler: Add panel description for Read QPS panel in Ruler dashboard to explain values when in remote ruler mode. #4675
- [BUGFIX] Ruler dashboard: show data for reads from ingesters. #4543
- [BUGFIX] Pod selector regex for deployments: change `(.*-mimir-)` to `(.*mimir-)`. #4603
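
To see what the pod selector change in #4603 means in practice, the two prefix patterns can be compared directly. This is only a sketch: the workload names are made up, and how the mixin embeds the prefix in its full selector is not shown in these notes, so only the prefix group itself is tested here.

```python
import re

# The two prefix patterns quoted in the changelog entry.
OLD_PREFIX = r"(.*-mimir-)"
NEW_PREFIX = r"(.*mimir-)"


def prefix_matches(prefix, workload_name):
    """Return True if the prefix pattern matches at the start of the name."""
    return re.match(prefix, workload_name) is not None


# Hypothetical workload names, with and without a release-name prefix.
for name in ("prod-mimir-distributor", "mimir-distributor"):
    print(name, prefix_matches(OLD_PREFIX, name), prefix_matches(NEW_PREFIX, name))
```

The old pattern requires something ending in `-` before `mimir-`, so a name that starts directly with `mimir-` is not matched; the new pattern accepts both forms.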
Jsonnet
- [CHANGE] Ruler: changed ruler deployment max surge from `0` to `50%`, and max unavailable from `1` to `0`. #4381
- [CHANGE] Memcached connections parameters `-blocks-storage.bucket-store.index-cache.memcached.max-idle-connections`, `-blocks-storage.bucket-store.chunks-cache.memcached.max-idle-connections` and `-blocks-storage.bucket-store.metadata-cache.memcached.max-idle-connections` settings are now configured based on `max-get-multi-concurrency` and `max-async-concurrency`. #4591
- [CHANGE] Add support to use external Redis as cache. Following are some changes in the jsonnet config: #4386 #4640
  - Renamed `memcached_*_enabled` config options to `cache_*_enabled`
  - Renamed `memcached_*_max_item_size_mb` config options to `cache_*_max_item_size_mb`
  - Added `cache_*_backend` config options
- [CHANGE] Store-gateway StatefulSets with disabled multi-zone deployment are also unregistered from the ring on shutdown. This eliminated resharding during rollouts, at the cost of extra effort during scaling down store-gateways. For more information see Scaling down store-gateways. #4713
- [ENHANCEMENT] Alertmanager: add `alertmanager_data_disk_size` and `alertmanager_data_disk_class` configuration options, by default no storage class is set. #4389
- [ENHANCEMENT] Update `rollout-operator` to `v0.4.0`. #4524
- [ENHANCEMENT] Update memcached to `memcached:1.6.19-alpine`. #4581
- [ENHANCEMENT] Add support for mTLS connections to Memcached servers. #4553
- [ENHANCEMENT] Update the `memcached-exporter` to `v0.11.2`. #4570
- [ENHANCEMENT] Autoscaling: Add `autoscaling_query_frontend_memory_target_utilization`, `autoscaling_ruler_query_frontend_memory_target_utilization`, and `autoscaling_ruler_memory_target_utilization` configuration options, for controlling the corresponding autoscaler memory thresholds. Each has a default of 1, i.e. 100%. #4612
- [ENHANCEMENT] Distributor: add ability to set per-distributor limits via `distributor_instance_limits` using runtime configuration. #4627
- [BUGFIX] Add missing query sharding settings for user_24M and user_32M plans. #4374
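
For anyone carrying jsonnet overrides across the `memcached_*` to `cache_*` rename (#4386, #4640), the mapping can be sketched as a key rewrite. This is an illustration, not a supported migration tool: a plain Python dict stands in for the jsonnet `_config` object, and the option names in the example are placeholders that merely fit the wildcards from the entry.

```python
import re

# Rename rules taken from the two "Renamed" items in the changelog entry.
_RENAMES = [
    (re.compile(r"^memcached_(.+)_enabled$"), r"cache_\1_enabled"),
    (re.compile(r"^memcached_(.+)_max_item_size_mb$"), r"cache_\1_max_item_size_mb"),
]


def rename_cache_options(config):
    """Return a copy of config with the old option names rewritten."""
    renamed = {}
    for key, value in config.items():
        new_key = key
        for pattern, replacement in _RENAMES:
            if pattern.match(key):
                new_key = pattern.sub(replacement, key)
                break
        renamed[new_key] = value
    return renamed


# Placeholder option names; check your own overrides for the real ones.
old = {
    "memcached_example_enabled": True,
    "memcached_example_max_item_size_mb": 1,
    "unrelated_option": "kept as-is",
}
print(rename_cache_options(old))
```

The new `cache_*_backend` options are additions rather than renames, so the sketch leaves them out.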
Mimirtool
- [ENHANCEMENT] Backfill: mimirtool will now sleep and retry if it receives a 429 response while trying to finish an upload due to validation concurrency limits. #4598
- [ENHANCEMENT] The `gauge` panel type is now supported in `mimirtool analyze dashboard`. #4679
- [ENHANCEMENT] Set a `User-Agent` header on requests to Mimir or Prometheus servers. #4700
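
The sleep-and-retry behaviour described for backfill (#4598) follows a common pattern, sketched below. This is not mimirtool's code: the `finish_upload` callable, the attempt limit, and the delay are assumptions, since the notes do not describe the actual retry policy.

```python
import time


def finish_upload_with_retry(finish_upload, max_attempts=5, delay_seconds=1.0,
                             sleep=time.sleep):
    """Call finish_upload() until it stops returning 429 or attempts run out.

    finish_upload -- hypothetical callable that performs the "finish" request
                     and returns the HTTP status code
    sleep         -- injectable so the example can run without waiting
    """
    status = None
    for attempt in range(1, max_attempts + 1):
        status = finish_upload()
        if status != 429:
            return status
        if attempt < max_attempts:
            # The server is at its validation concurrency limit: back off.
            sleep(delay_seconds)
    return status


# Simulated server that is busy twice before accepting the request.
responses = iter([429, 429, 200])
waits = []
result = finish_upload_with_retry(lambda: next(responses), sleep=waits.append)
print(result, waits)
```

A fixed delay keeps the sketch short; a real client might prefer exponential backoff.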
Mimir Continuous Test
- [FEATURE] Allow continuous testing of native histograms as well by enabling the flag `-tests.write-read-series-test.histogram-samples-enabled`. The metrics exposed by the tool will now have a new label called `type` with possible values of `float`, `histogram_float_counter`, `histogram_float_gauge`, `histogram_int_counter`, `histogram_int_gauge`, the list of metrics impacted: #4457
  - `mimir_continuous_test_writes_total`
  - `mimir_continuous_test_writes_failed_total`
  - `mimir_continuous_test_queries_total`
  - `mimir_continuous_test_queries_failed_total`
  - `mimir_continuous_test_query_result_checks_total`
  - `mimir_continuous_test_query_result_checks_failed_total`
- [ENHANCEMENT] Added a new metric `mimir_continuous_test_build_info` that reports version information, similar to the existing `cortex_build_info` metric exposed by other Mimir components. #4712
- [ENHANCEMENT] Add coherency for the selected ranges and instants of test queries. #4704
Documentation
- [CHANGE] Clarify what deprecation means in the lifecycle of configuration parameters. #4499
- [CHANGE] Update compactor `split-groups` and `split-and-merge-shards` recommendation on component page. #4623
- [FEATURE] Add instructions about how to configure native histograms. #4527
- [ENHANCEMENT] Runbook for MimirCompactorHasNotSuccessfullyRunCompaction extended to include scenario where compaction has fallen behind. #4609
- [ENHANCEMENT] Add explanation for QPS values for reads in remote ruler mode and writes generally, to the Ruler dashboard page. #4629
- [ENHANCEMENT] Expand zone-aware replication page to cover single physical availability zone deployments. #4631
- [FEATURE] Add instructions to use puppet module. #4610
Tools
- [ENHANCEMENT] tsdb-index: iteration over index is now faster when any equal matcher is supplied. #4515
All changes in this release: mimir-2.7.1...mimir-2.8.0-rc.0