This release contains 593 PRs from 66 authors, including new contributors Adrian Berger, Albert Kerr, Alexander Davis, Alyssa Wada, Aofei Sheng, Bailhache Pierre, Bradley, David Stevens, Davin Kevin, Dennis Haney, Felipe Ferreira, Jeongseup, Nicholas Kress, Paul Farver, Pooya, Rajguru, Sephia Laureencia, Sviat Loginov, Taehyun Kim, Taylor C, Tito Lins, Willem Gillis, William Travis Holton, William Wernert, Yuri Tseretyan. Thank you!
Grafana Mimir version 2.14.0-rc.0 release notes
Grafana Labs is excited to announce version 2.14 of Grafana Mimir.
The highlights that follow include the top features, enhancements, and bug fixes in this release.
For the complete list of changes, refer to the CHANGELOG.
Features and enhancements
The streaming of chunks from store-gateways to queriers is now enabled by default. This reduces the memory usage in queriers. This was an experimental feature since Mimir 2.10, and is now considered stable.
Compactor adds a new cortex_compactor_disk_out_of_space_errors_total counter metric that tracks how many times a compaction fails due to the compactor being out of disk.
The distributor now replies with the Retry-After header on retryable errors by default. This protects Mimir from clients, including Prometheus, that default to retrying very quickly, making recovering from an outage easier. The feature was originally added as experimental in Mimir 2.11.
Incoming OTLP requests were previously size-limited with the distributor's -distributor.max-recv-msg-size configuration. The distributor has a new -distributor.max-otlp-request-size configuration for limiting OTLP requests. The default value is 100 MiB.
Ingesters can be marked as read-only as part of their downscaling procedure. The new prepare-instance-ring-downscale endpoint updates the read-only status of an ingester in the ring.
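To illustrate how a remote-write client might cooperate with the Retry-After header described above, here is a minimal sketch. The function name, the set of status codes treated as retryable, and the fallback backoff parameters are assumptions made for this example; they are not taken from Mimir or Prometheus.

```python
# Status codes treated as retryable in this sketch (an assumption, not taken
# from the release notes).
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def retry_delay(status, headers, attempt, base=1.0, cap=60.0):
    """Return the number of seconds to wait before retrying, or None to give up.

    `attempt` is the zero-based count of retries already made.
    """
    if status not in RETRYABLE_STATUS:
        return None
    # Header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    raw = normalized.get("retry-after")
    if raw is not None:
        try:
            return max(0.0, float(raw))
        except ValueError:
            pass  # HTTP-date values are not handled here; fall through.
    # No usable hint from the server: capped exponential backoff.
    return min(cap, base * (2 ** attempt))
```

A client that prefers the server-provided delay over its own fast retry loop gives the cluster room to recover, which is the effect the header is meant to have.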
Important changes
In Grafana Mimir 2.14, the following behavior has changed:
When running a remote read request, the querier honors the time range specified in the read hints.
The default inactivity timeout of active series in ingesters, controlled by the -ingester.active-series-metrics-idle-timeout configuration, is increased from 10m to 20m.
The following features of the store-gateway are changed: -blocks-storage.bucket-store.max-concurrent-queue-timeout is set to five seconds; -blocks-storage.bucket-store.index-header.lazy-loading-concurrency-queue-timeout is set to five seconds; -blocks-storage.bucket-store.max-concurrent is set to 200.
The experimental support for Redis caching is now deprecated and set to be removed in the next major release. Users are encouraged to switch to Memcached.
The following deprecated configuration options were removed in this release:
- The -ingester.return-only-grpc-errors option in the ingester
- The -ingester.client.circuit-breaker.* options in the ingester
- The -ingester.limit-inflight-requests-using-grpc-method-limiter option in the ingester
- The -ingester.client.report-grpc-codes-in-instrumentation-label-enabled option in the distributor and ruler
- The -distributor.limit-inflight-requests-using-grpc-method-limiter option in the distributor
- The -distributor.enable-otlp-metadata-storage option in the distributor
- The -ruler.drain-notification-queue-on-shutdown option in the ruler
- The -querier.max-query-into-future option in the querier
- The -querier.prefer-streaming-chunks-from-store-gateways option in the querier and the store-gateway
- The -query-scheduler.use-multi-algorithm-query-queue option in the query-scheduler
- The YAML configuration frontend.align_queries_with_step in the query-frontend
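The idle-timeout change listed above affects how many series an ingester reports as active. The following is a conceptual sketch only, not the ingester's implementation: it assumes a series counts as active when its most recent sample arrived within the idle timeout, and the function name and sample data are hypothetical.

```python
from datetime import datetime, timedelta


def count_active_series(last_sample_times, now, idle_timeout):
    """Count series whose latest sample is no older than idle_timeout."""
    return sum(1 for ts in last_sample_times.values() if now - ts <= idle_timeout)


now = datetime(2024, 1, 1, 12, 0, 0)
# Hypothetical series and the time each one last received a sample.
last_seen = {
    "series_a": now - timedelta(minutes=5),
    "series_b": now - timedelta(minutes=15),
    "series_c": now - timedelta(minutes=25),
}

with_old_default = count_active_series(last_seen, now, timedelta(minutes=10))
with_new_default = count_active_series(last_seen, now, timedelta(minutes=20))
```

Under these assumptions, a series that last received a sample 15 minutes ago is counted with the new 20m default but not with the old 10m default, so reported active-series numbers can rise after the upgrade without any change in traffic.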
Experimental features
Grafana Mimir 2.14 includes some features that are experimental and disabled by default.
Use these features with caution and report any issues that you encounter:
The ingester added an experimental -ingester.ignore-ooo-exemplars configuration. When set, out-of-order exemplars are no longer reported to the remote write client.
The querier supports the experimental limitk() and limit_ratio() PromQL functions. This feature is disabled by default, but you can enable it with the -querier.promql-experimental-functions-enabled=true setting in the query-frontend and the querier.
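Once the experimental functions are enabled, queries that use them go through the usual Prometheus-compatible query API. The sketch below only builds request URLs and sends nothing. The base address is hypothetical, and the descriptions of what limitk() and limit_ratio() select follow the upstream Prometheus documentation as best understood, so treat them as assumptions.

```python
from urllib.parse import urlencode


def instant_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus-compatible HTTP API."""
    query_string = urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/prometheus/api/v1/query?{query_string}"


# Hypothetical Mimir address; replace with your own query-frontend endpoint.
BASE_URL = "http://mimir.example:8080"

# Keep at most 5 of the series selected by `up`.
limitk_url = instant_query_url(BASE_URL, "limitk(5, up)")

# Keep roughly 10% of the series selected by `up`.
limit_ratio_url = instant_query_url(BASE_URL, "limit_ratio(0.1, up)")
```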
Bug fixes
- Alertmanager: fix configuration validation gap around unreferenced templates.
- Alertmanager: fix goroutine leak when stored configuration fails to apply and there is no existing tenant alertmanager.
- Alertmanager: fix receiver firewall to detect 0.0.0.0 and IPv6 interface-local multicast address as local addresses.
- Alertmanager: fix per-tenant silence limits not reloaded during runtime.
- Alertmanager: fix bugs in silences that could cause an existing silence to expire/be deleted when updating the silence fails. This could happen when the updated silence was invalid or exceeded limits.
- Alertmanager: fix help message for utf-8-strict-mode.
- Compactor: fix a race condition between different compactor replicas that may cause a deleted block to be referenced as non-deleted in the bucket index.
- Configuration: multi-line environment variables are flattened during injection to be compatible with YAML syntax.
- HA Tracker: store correct timestamp for the last-received request from the elected replica.
- Ingester: fix the sporadic "not found" error causing an internal server error if label names are queried with matchers during head compaction.
- Ingester, store-gateway: fix case insensitive regular expressions not correctly matching some Unicode characters.
- Ingester: fixed timestamp reported in the "the sample has been rejected because its timestamp is too old" error when the write request contains only histograms.
- Query-frontend: fix -querier.max-query-lookback and -compactor.blocks-retention-period enforcement in query-frontend when one of the two is not set.
- Query-frontend: "query stats" log includes the actual status_code when the request fails due to an error occurring in the query-frontend itself.
- Query-frontend: ensure that internal errors result in an HTTP 500 response code instead of a 422 response code.
- Query-frontend: return annotations generated during evaluation of sharded queries.
- Query-scheduler: fix a panic in request queueing.
- Querier: fix the issue where "context canceled" is logged for trace spans for requests to store-gateways that return no series when chunks streaming is enabled.
- Querier: fix issue where queries can return incorrect results if a single store-gateway returns overlapping chunks for a series.
- Querier: do not return "grpc: the client connection is closing" errors as HTTP 499.
- Querier: fix issue where some native histogram-related warnings were not emitted when rate() was used over native histograms.
- Querier: fix invalid query results when multiple chunks are merged.
- Querier: support optional start and end times on /prometheus/api/v1/labels, /prometheus/api/v1/label/<label>/values, and /prometheus/api/v1/series when max_query_into_future: 0.
- Querier: fix issue where both recently compacted blocks and their source blocks can be skipped during querying if store-gateways are restarting.
- Ruler: add support for draining any outstanding alert notifications before shutting down. Enable this setting with the -ruler.drain-notification-queue-on-shutdown=true CLI flag.
- Store-gateway: fixed a case where, on a quick subsequent restart, the previous lazy-loaded index header snapshot was overwritten by a partially loaded one.
- Store-gateway: store sparse index headers atomically to disk.
- Ruler: map invalid org-id errors to the 400 status code.
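The receiver firewall fix listed above concerns which addresses count as local. A rough sketch of that kind of check, using only Python's standard ipaddress module, is shown below. It is not Alertmanager's code: the function name is hypothetical, and treating loopback and link-local addresses as local is an assumption added for completeness.

```python
import ipaddress

# IPv6 interface-local multicast scope (ff01::/16).
INTERFACE_LOCAL_MULTICAST = ipaddress.ip_network("ff01::/16")


def is_local_address(host):
    """Return True if `host` (an IP literal) should be treated as local."""
    addr = ipaddress.ip_address(host)
    # 0.0.0.0 and :: are the unspecified addresses.
    if addr.is_unspecified or addr.is_loopback or addr.is_link_local:
        return True
    # Only IPv6 addresses can fall inside the interface-local multicast range.
    return addr.version == 6 and addr in INTERFACE_LOCAL_MULTICAST
```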
Helm chart improvements
The Grafana Mimir and Grafana Enterprise Metrics Helm charts are released independently.
Refer to the Grafana Mimir Helm chart documentation.
Changelog
2.14.0-rc.0
Grafana Mimir
- [CHANGE] Update minimal supported version of Go to 1.22. #9134
- [CHANGE] Store-gateway / querier: enable streaming chunks from store-gateways to queriers by default. #6646
- [CHANGE] Querier: honor the start/end time range specified in the read hints when executing a remote read request. #8431
- [CHANGE] Querier: return only samples within the queried start/end time range when executing a remote read request using "SAMPLES" mode. Previously, samples outside of the range could have been returned. Samples outside of the queried time range may still be returned when executing a remote read request using "STREAMED_XOR_CHUNKS" mode. #8463
- [CHANGE] Querier: Set minimum for -querier.max-concurrent to four to prevent queue starvation with querier-worker queue prioritization algorithm; values below the minimum four are ignored and set to the minimum. #9054
- [CHANGE] Store-gateway: enabled -blocks-storage.bucket-store.max-concurrent-queue-timeout by default with a timeout of 5 seconds. #8496
- [CHANGE] Store-gateway: enabled -blocks-storage.bucket-store.index-header.lazy-loading-concurrency-queue-timeout by default with a timeout of 5 seconds. #8667
- [CHANGE] Distributor: Incoming OTLP requests were previously size-limited by using limit from -distributor.max-recv-msg-size option. We have added option -distributor.max-otlp-request-size for limiting OTLP requests, with default value of 100 MiB. #8574
- [CHANGE] Distributor: remove metric cortex_distributor_sample_delay_seconds. #8698
- [CHANGE] Query-frontend: Remove deprecated frontend.align_queries_with_step YAML configuration. The configuration option has been moved to per-tenant and default limits since Mimir 2.12. #8733 #8735
- [CHANGE] Store-gateway: Change default of -blocks-storage.bucket-store.max-concurrent to 200. #8768
- [CHANGE] Added new metric cortex_compactor_disk_out_of_space_errors_total which counts how many times a compaction failed due to the compactor being out of disk, alert if there is a single increase. #8237 #8278
- [CHANGE] Store-gateway: Remove experimental parameter -blocks-storage.bucket-store.series-selection-strategy. The default strategy is now worst-case. #8702
- [CHANGE] Store-gateway: Rename -blocks-storage.bucket-store.series-selection-strategies.worst-case-series-preference to -blocks-storage.bucket-store.series-fetch-preference and promote to stable. #8702
- [CHANGE] Querier, store-gateway: remove deprecated -querier.prefer-streaming-chunks-from-store-gateways=true. Streaming from store-gateways is now always enabled. #8696
- [CHANGE] Ingester: remove deprecated -ingester.return-only-grpc-errors. #8699 #8828
- [CHANGE] Distributor, ruler: remove deprecated -ingester.client.report-grpc-codes-in-instrumentation-label-enabled. #8700
- [CHANGE] Ingester client: experimental support for client-side circuit breakers, their configuration options (-ingester.client.circuit-breaker.*) and metrics (cortex_ingester_client_circuit_breaker_results_total, cortex_ingester_client_circuit_breaker_transitions_total) were removed. #8802
- [CHANGE] Ingester: circuit breakers do not open in case of per-instance limit errors anymore. Opening can be triggered only in case of push and pull requests exceeding the configured duration. #8854
- [CHANGE] Query-frontend: Return 413 Request Entity Too Large if a response shard for an /active_series request is too large. #8861
- [CHANGE] Distributor: Promote replying with Retry-After header on retryable errors to stable and set -distributor.retry-after-header.enabled=true by default. #8694
- [CHANGE] Distributor: Replace -distributor.retry-after-header.max-backoff-exponent and -distributor.retry-after-header.base-seconds with -distributor.retry-after-header.min-backoff and -distributor.retry-after-header.max-backoff for easier configuration. #8694
- [CHANGE] Ingester: increase the default inactivity timeout of active series (-ingester.active-series-metrics-idle-timeout) from 10m to 20m. #8975
- [CHANGE] Distributor: Remove -distributor.enable-otlp-metadata-storage flag, which was deprecated in version 2.12. #9069
- [CHANGE] Ruler: Removed -ruler.drain-notification-queue-on-shutdown option, which is now enabled by default. #9115
- [CHANGE] Querier: allow wrapping errors with context errors only when the former actually correspond to context.Canceled and context.DeadlineExceeded. #9175
- [CHANGE] Query-scheduler: Remove the experimental -query-scheduler.use-multi-algorithm-query-queue flag. The new multi-algorithm tree queue is always used for the scheduler. #9210
- [CHANGE] Distributor: reject incoming requests until the distributor service has started. #9317
- [CHANGE] Ingester, Distributor: Remove deprecated -ingester.limit-inflight-requests-using-grpc-method-limiter and -distributor.limit-inflight-requests-using-grpc-method-limiter. The feature was deprecated and enabled by default in Mimir 2.12. #9407
- [CHANGE] Querier: Remove deprecated -querier.max-query-into-future. The feature was deprecated in Mimir 2.12. #9407
- [FEATURE] Alertmanager: Added -alertmanager.log-parsing-label-matchers to control logging when parsing label matchers. This flag is intended to be used with -alertmanager.utf8-strict-mode-enabled to validate UTF-8 strict mode is working as intended. The default value is false. #9173
- [FEATURE] Alertmanager: Added -alertmanager.utf8-migration-logging-enabled to enable logging of tenant configurations that are incompatible with UTF-8 strict mode. The default value is false. #9174
- [FEATURE] Querier: add experimental streaming PromQL engine, enabled with -querier.query-engine=mimir. #8422 #8430 #8454 #8455 #8360 #8490 #8508 #8577 #8660 #8671 #8677 #8747 #8850 #8872 #8838 #8911 #8909 #8923 #8924 #8925 #8932 #8933 #8934 #8962 #8986 #8993 #8995 #9008 #9017 #9018 #9019 #9120 #9121 #9136 #9139 #9140 #9145 #9191 #9192 #9194 #9196 #9201 #9212 #9225 #9260 #9272 #9277 #9278 #9280 #9281 #9342 #9343 #9371
- [FEATURE] Experimental Kafka-based ingest storage. #6888 #6894 #6929 #6940 #6951 #6974 #6982 #7029 #7030 #7091 #7142 #7147 #7148 #7153 #7160 #7193 #7349 #7376 #7388 #7391 #7393 #7394 #7402 #7404 #7423 #7424 #7437 #7486 #7503 #7508 #7540 #7621 #7682 #7685 #7694 #7695 #7696 #7697 #7701 #7733 #7734 #7741 #7752 #7838 #7851 #7871 #7877 #7880 #7882 #7887 #7891 #7925 #7955 #7967 #8031 #8063 #8077 #8088 #8135 #8176 #8184 #8194 #8216 #8217 #8222 #8233 #8503 #8542 #8579 #8657 #8686 #8688 #8703 #8706 #8708 #8738 #8750 #8778 #8808 #8809 #8841 #8842 #8845 #8853 #8886 #8988
  - What it is:
    - When the new ingest storage architecture is enabled, distributors write incoming write requests to a Kafka-compatible backend, and the ingesters asynchronously replay ingested data from Kafka. In this architecture, the write and read path are de-coupled through a Kafka-compatible backend. The write path and Kafka load is a function of the incoming write traffic, the read path load is a function of received queries. Whatever the load on the read path, it doesn't affect the write path.
  - New configuration options:
    - -ingest-storage.enabled
    - -ingest-storage.kafka.*: configures Kafka-compatible backend and how clients interact with it.
    - -ingest-storage.ingestion-partition-tenant-shard-size: configures the per-tenant shuffle-sharding shard size used by partitions ring.
    - -ingest-storage.read-consistency: configures the default read consistency.
    - -ingest-storage.migration.distributor-send-to-ingesters-enabled: enabled tee-ing writes to classic ingesters and Kafka, used during a live migration to the new ingest storage architecture.
    - -ingester.partition-ring.*: configures partitions ring backend.
- [FEATURE] Querier: added support for limitk() and limit_ratio() experimental PromQL functions. Experimental functions are disabled by default, but can be enabled setting -querier.promql-experimental-functions-enabled=true in the query-frontend and querier. #8632
- [FEATURE] Querier: experimental support for X-Mimir-Chunk-Info-Logger header that triggers logging information about TSDB chunks loaded from ingesters and store-gateways in the querier. The header should contain the comma separated list of labels for which their value will be included in the logs. #8599
- [FEATURE] Query frontend: added new query pruning middleware to enable pruning dead code (eg. expressions that cannot produce any results) and simplifying expressions (eg. expressions that can be evaluated immediately) in queries. #9086
- [FEATURE] Ruler: added experimental configuration, -ruler.rule-evaluation-write-enabled, to disable writing the result of rule evaluation to ingesters. This feature can be used for testing purposes. #9060
- [FEATURE] Ingester: added experimental configuration -ingester.ignore-ooo-exemplars. When set to true, out of order exemplars are no longer reported to the remote write client. #9151
- [ENHANCEMENT] Compactor: Add cortex_compactor_compaction_job_duration_seconds and cortex_compactor_compaction_job_blocks histogram metrics to track duration of individual compaction jobs and number of blocks per job. #8371
- [ENHANCEMENT] Rules: Added per namespace max rules per rule group limit. The maximum number of rules per rule groups for all namespaces continues to be configured by -ruler.max-rules-per-rule-group, but now, this can be superseded by the new -ruler.max-rules-per-rule-group-by-namespace option on a per namespace basis. This new limit can be overridden using the overrides mechanism to be applied per-tenant. #8378
- [ENHANCEMENT] Rules: Added per namespace max rule groups per tenant limit. The maximum number of rule groups per rule tenant for all namespaces continues to be configured by -ruler.max-rule-groups-per-tenant, but now, this can be superseded by the new -ruler.max-rule-groups-per-tenant-by-namespace option on a per namespace basis. This new limit can be overridden using the overrides mechanism to be applied per-tenant. #8425
- [ENHANCEMENT] Ruler: Added support to protect rules namespaces from modification. The -ruler.protected-namespaces flag can be used to specify namespaces that are protected from rule modifications. The header X-Mimir-Ruler-Override-Namespace-Protection can be used to override the protection. #8444
- [ENHANCEMENT] Query-frontend: be able to block remote read queries via the per tenant runtime override blocked_queries. #8372 #8415
- [ENHANCEMENT] Query-frontend: added remote_read to op supported label values for the cortex_query_frontend_queries_total metric. #8412
- [ENHANCEMENT] Query-frontend: log the overall length and start, end time offset from current time for remote read requests. The start and end times are calculated as the minimum and maximum times of the individual queries in the remote read request. #8404
- [ENHANCEMENT] Storage Provider: Added option -<prefix>.s3.dualstack-enabled that allows disabling S3 client from resolving AWS S3 endpoint into dual-stack IPv4/IPv6 endpoint. Defaults to true. #8405
- [ENHANCEMENT] HA Tracker: Added reporting of most recent elected replica change via cortex_ha_tracker_last_election_timestamp_seconds gauge, logging, and a new column in the HA Tracker status page. #8507
- [ENHANCEMENT] Use sd_notify to send events to systemd at start and stop of mimir services. The default systemd mimir.service config now waits for those events with a configurable timeout; the TimeoutStartSec default is 3 min to handle long start times (for example, the store-gateway). #8220 #8555 #8658
- [ENHANCEMENT] Alertmanager: Reloading config and templates no longer needs to hit the disk. #4967
- [ENHANCEMENT] Compactor: Added experimental -compactor.in-memory-tenant-meta-cache-size option to set size of in-memory cache (in number of items) for parsed meta.json files. This can help when a tenant has many meta.json files and their parsing before each compaction cycle is using a lot of CPU time. #8544
- [ENHANCEMENT] Distributor: Interrupt OTLP write request translation when context is canceled or has timed out. #8524
- [ENHANCEMENT] Ingester, store-gateway: optimised regular expression matching for patterns like 1.*|2.*|3.*|...|1000.*. #8632
- [ENHANCEMENT] Query-frontend: Add header_cache_control to query stats. #8590
- [ENHANCEMENT] Query-scheduler: Introduce query-scheduler.use-multi-algorithm-query-queue, which allows use of an experimental queue structure, with no change in external queue behavior. #7873
- [ENHANCEMENT] Query-scheduler: Improve CPU/memory performance of experimental query-scheduler. #8871
- [ENHANCEMENT] Expose a new s3.trace.enabled configuration option to enable detailed logging of operations against S3-compatible object stores. #8690
- [ENHANCEMENT] memberlist: locally-generated messages (e.g. ring updates) are sent to gossip network before forwarded messages. Introduced -memberlist.broadcast-timeout-for-local-updates-on-shutdown option to modify how long to wait until queue with locally-generated messages is empty when shutting down. Previously this was hard-coded to 10s, and wait included all messages (locally-generated and forwarded). Now it defaults to 10s, 0 means no timeout. Increasing this value may help to avoid problem when ring updates on shutdown are not propagated to other nodes, and ring entry is left in a wrong state. #8761
- [ENHANCEMENT] Querier: allow using both raw numbers of seconds and duration literals in queries where previously only one or the other was permitted. For example, predict_linear now accepts a duration literal (eg. predict_linear(..., 4h)), and range vector selectors now accept a number of seconds (eg. rate(metric[2])). #8780
- [ENHANCEMENT] Ruler: Add ruler.max-independent-rule-evaluation-concurrency to allow independent rules of a tenant to be run concurrently. You can control the amount of concurrency per tenant via the -ruler.max-independent-rule-evaluation-concurrency-per-tenant limit. Use a -ruler.max-independent-rule-evaluation-concurrency value of 0 to disable the feature for all tenants. By default, this feature is disabled. A rule is eligible for concurrency as long as it doesn't depend on any other rules, doesn't have any other rules that depend on it, and has a total rule group runtime that exceeds 50% of its interval by default. The threshold can be adjusted with -ruler.independent-rule-evaluation-concurrency-min-duration-percentage. #8146 #8858 #8880 #8884
  - This work introduces the following metrics:
    - cortex_ruler_independent_rule_evaluation_concurrency_slots_in_use
    - cortex_ruler_independent_rule_evaluation_concurrency_attempts_started_total
    - cortex_ruler_independent_rule_evaluation_concurrency_attempts_incomplete_total
    - cortex_ruler_independent_rule_evaluation_concurrency_attempts_completed_total
- [ENHANCEMENT] Expose a new s3.session-token configuration option to enable using temporary security credentials. #8952
- [ENHANCEMENT] Add HA deduplication features to the mimir-microservices-mode development environment. #9012
- [ENHANCEMENT] Remove experimental -query-frontend.additional-query-queue-dimensions-enabled and -query-scheduler.additional-query-queue-dimensions-enabled. Mimir now always includes "query components" as a queue dimension. #8984 #9135
- [ENHANCEMENT] Add a new ingester endpoint to prepare instances to downscale. #8956
- [ENHANCEMENT] Query-scheduler: Add query-scheduler.prioritize-query-components which, when enabled, will primarily prioritize dequeuing fairly across queue components, and secondarily prioritize dequeuing fairly across tenants. When disabled, tenant fairness is primarily prioritized. query-scheduler.use-multi-algorithm-query-queue must be enabled in order to use this flag. #9016 #9071
- [ENHANCEMENT] Update runtime configuration to read gzip-compressed files with .gz extension. #9074
- [ENHANCEMENT] Ingester: add cortex_lifecycler_read_only metric which is set to 1 when ingester's lifecycler is set to read-only mode. #9095
- [ENHANCEMENT] Add a new field, encode_time_seconds to query stats log messages, to record the amount of time it takes the query-frontend to encode a response. This does not include any serialization time for downstream components. #9062
- [ENHANCEMENT] OTLP: If the flag -distributor.otel-created-timestamp-zero-ingestion-enabled is true, OTel start timestamps are converted to Prometheus zero samples to mark series start. #9131
- [ENHANCEMENT] Querier: attach logs emitted during query consistency check to trace span for query. #9213
- [ENHANCEMENT] Query-scheduler: Experimental -query-scheduler.prioritize-query-components flag enables the querier-worker queue priority algorithm to take precedence over tenant rotation when dequeuing requests. #9220
- [ENHANCEMENT] Add application credential arguments for Openstack Swift storage backend. #9181
- [BUGFIX] Ruler: add support for draining any outstanding alert notifications before shutting down. This can be enabled with the -ruler.drain-notification-queue-on-shutdown=true CLI flag. #8346
- [BUGFIX] Query-frontend: fix -querier.max-query-lookback enforcement when -compactor.blocks-retention-period is not set, and vice versa. #8388
- [BUGFIX] Ingester: fix sporadic "not found" error causing an internal server error if label names are queried with matchers during head compaction. #8391
- [BUGFIX] Ingester, store-gateway: fix case insensitive regular expressions not matching correctly some Unicode characters. #8391
- [BUGFIX] Query-frontend: "query stats" log now includes the actual status_code when the request fails due to an error occurring in the query-frontend itself. #8407
- [BUGFIX] Store-gateway: fixed a case where, on a quick subsequent restart, the previous lazy-loaded index header snapshot was overwritten by a partially loaded one. #8281
- [BUGFIX] Ingester: fixed timestamp reported in the "the sample has been rejected because its timestamp is too old" error when the write request contains only histograms. #8462
- [BUGFIX] Store-gateway: store sparse index headers atomically to disk. #8485
- [BUGFIX] Query scheduler: fix a panic in request queueing. #8451
- [BUGFIX] Querier: fix issue where "context canceled" is logged for trace spans for requests to store-gateways that return no series when chunks streaming is enabled. #8510
- [BUGFIX] Alertmanager: Fix per-tenant silence limits not reloaded during runtime. #8456
- [BUGFIX] Alertmanager: Fixes a number of bugs in silences which could cause an existing silence to be deleted/expired when updating the silence failed. This could happen when the replacing silence was invalid or exceeded limits. #8525
- [BUGFIX] Alertmanager: Fix help message for utf-8-strict-mode. #8572
- [BUGFIX] Query-frontend: Ensure that internal errors result in an HTTP 500 response code instead of 422. #8595 #8666
- [BUGFIX] Configuration: Multi-line environment variables are flattened during injection to be compatible with YAML syntax.
- [BUGFIX] Querier: fix issue where queries can return incorrect results if a single store-gateway returns overlapping chunks for a series. #8827
- [BUGFIX] HA Tracker: store correct timestamp for last received request from elected replica. #8821
- [BUGFIX] Querier: do not return "grpc: the client connection is closing" errors as HTTP 499. #8865 #8888
- [BUGFIX] Compactor: fix a race condition between different compactor replicas that may cause a deleted block to be still referenced as non-deleted in the bucket index. #8905
- [BUGFIX] Querier: fix issue where some native histogram-related warnings were not emitted when rate() was used over native histograms. #8918
- [BUGFIX] Ruler: map invalid org-id errors to 400 status code. #8935
- [BUGFIX] Querier: Fix invalid query results when multiple chunks are being merged. #8992
- [BUGFIX] Query-frontend: return annotations generated during evaluation of sharded queries. #9138
- [BUGFIX] Querier: Support optional start and end times on /prometheus/api/v1/labels, /prometheus/api/v1/label/<label>/values, and /prometheus/api/v1/series when max_query_into_future: 0. #9129
- [BUGFIX] Alertmanager: Fix config validation gap around unreferenced templates. #9207
- [BUGFIX] Alertmanager: Fix goroutine leak when stored config fails to apply and there is no existing tenant alertmanager #9211
- [BUGFIX] Querier: fix issue where both recently compacted blocks and their source blocks can be skipped during querying if store-gateways are restarting. #9224
- [BUGFIX] Alertmanager: fix receiver firewall to detect 0.0.0.0 and IPv6 interface-local multicast address as local addresses. #9308
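The eligibility rule for independent rule evaluation concurrency in the ruler entry above (#8146) can be read as a simple predicate. The sketch below is one interpretation under stated assumptions, not the ruler's code: the RuleInfo type and its field names are hypothetical, and the comparison is strict because the entry says the runtime must exceed the threshold.

```python
from dataclasses import dataclass


@dataclass
class RuleInfo:
    """Hypothetical summary of a rule and the rule group it belongs to."""
    depends_on_others: bool        # the rule reads the output of another rule
    has_dependents: bool           # another rule reads this rule's output
    group_runtime_seconds: float   # total runtime of the rule group
    group_interval_seconds: float  # evaluation interval of the rule group


def eligible_for_concurrency(rule, min_duration_percentage=50.0):
    """Apply the three conditions described in the changelog entry."""
    if rule.depends_on_others or rule.has_dependents:
        return False
    threshold = rule.group_interval_seconds * min_duration_percentage / 100.0
    return rule.group_runtime_seconds > threshold
```

With the default 50% threshold, an independent rule in a group that takes 40s of a 60s interval qualifies, while the same rule in a group that takes 20s does not.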
Mixin
- [CHANGE] Dashboards: set default auto-refresh rate to 5m. #8758
- [ENHANCEMENT] Dashboards: allow switching between using classic or native histograms in dashboards.
  - Overview dashboard: status, read/write latency and queries/ingestion per sec panels, cortex_request_duration_seconds metric. #7674 #8502 #8791
  - Writes dashboard: cortex_request_duration_seconds metric. #8757 #8791
  - Reads dashboard: cortex_request_duration_seconds metric. #8752
  - Rollout progress dashboard: cortex_request_duration_seconds metric. #8779
  - Alertmanager dashboard: cortex_request_duration_seconds metric. #8792
  - Ruler dashboard: cortex_request_duration_seconds metric. #8795
  - Queries dashboard: cortex_request_duration_seconds metric. #8800
  - Remote ruler reads dashboard: cortex_request_duration_seconds metric. #8801
- [ENHANCEMENT] Alerts: MimirRunningIngesterReceiveDelayTooHigh alert has been tuned to be more reactive to high receive delay. #8538
- [ENHANCEMENT] Dashboards: improve end-to-end latency and strong read consistency panels when experimental ingest storage is enabled. #8543 #8830
- [ENHANCEMENT] Dashboards: Add panels for monitoring ingester autoscaling when not using ingest-storage. These panels are disabled by default, but can be enabled using the autoscaling.ingester.enabled: true config option. #8484
- [ENHANCEMENT] Dashboards: Add panels for monitoring store-gateway autoscaling. These panels are disabled by default, but can be enabled using the autoscaling.store_gateway.enabled: true config option. #8824
- [ENHANCEMENT] Dashboards: add panels to show writes to experimental ingest storage backend in the "Mimir / Ruler" dashboard, when _config.show_ingest_storage_panels is enabled. #8732
- [ENHANCEMENT] Dashboards: show all series in tooltips on time series dashboard panels. #8748
- [ENHANCEMENT] Dashboards: add compactor autoscaling panels to "Mimir / Compactor" dashboard. The panels are disabled by default, but can be enabled setting _config.autoscaling.compactor.enabled to true. #8777
- [ENHANCEMENT] Alerts: added MimirKafkaClientBufferedProduceBytesTooHigh alert. #8763
- [ENHANCEMENT] Dashboards: added "Kafka produced records / sec" panel to "Mimir / Writes" dashboard. #8763
- [ENHANCEMENT] Alerts: added MimirStrongConsistencyOffsetNotPropagatedToIngesters alert, and rename MimirIngesterFailsEnforceStrongConsistencyOnReadPath alert to MimirStrongConsistencyEnforcementFailed. #8831
- [ENHANCEMENT] Dashboards: remove "All" option for namespace dropdown in dashboards. #8829
- [ENHANCEMENT] Dashboards: add Kafka end-to-end latency outliers panel in the "Mimir / Writes" dashboard. #8948
- [ENHANCEMENT] Dashboards: add "Out-of-order samples appended" panel to "Mimir / Tenants" dashboard. #8939
- [ENHANCEMENT] Alerts: RequestErrors and RulerRemoteEvaluationFailing have been enriched with a native histogram version. #9004
- [ENHANCEMENT] Dashboards: add 'Read path' selector to 'Mimir / Queries' dashboard. #8878
- [ENHANCEMENT] Dashboards: add annotation indicating active series are being reloaded to 'Mimir / Tenants' dashboard. #9257
- [ENHANCEMENT] Dashboards: limit results on the 'Failed evaluations rate' panel of the 'Mimir / Tenants' dashboard to 50 to avoid crashing the page when there are many failing groups. #9262
- [FEATURE] Alerts: add MimirGossipMembersEndpointsOutOfSync alert. #9347
- [BUGFIX] Dashboards: fix "current replicas" in autoscaling panels when HPA is not active. #8566
- [BUGFIX] Alerts: do not fire MimirRingMembersMismatch during the migration to experimental ingest storage. #8727
- [BUGFIX] Dashboards: avoid over-counting of ingesters metrics when migrating to experimental ingest storage. #9170
- [BUGFIX] Dashboards: fix
job_prefix
not utilized injobSelector
. #9155
Jsonnet
- [CHANGE] Changed the following config options when the experimental ingest storage is enabled: #8874
  - `ingest_storage_ingester_autoscaling_min_replicas` changed to `ingest_storage_ingester_autoscaling_min_replicas_per_zone`
  - `ingest_storage_ingester_autoscaling_max_replicas` changed to `ingest_storage_ingester_autoscaling_max_replicas_per_zone`
- [CHANGE] Changed the overrides configmap generation to remove any field with `null` value. #9116
- [CHANGE] `$.replicaTemplate` function now takes replicas and labelSelector parameter. #9248
- [CHANGE] Renamed `ingest_storage_ingester_autoscaling_replica_template_custom_resource_definition_enabled` to `replica_template_custom_resource_definition_enabled`. #9248
- [FEATURE] Add support for automatically deleting compactor, store-gateway, ingester and read-write mode backend PVCs when the corresponding StatefulSet is scaled down. #8382 #8736
- [FEATURE] Automatically set GOMAXPROCS on ingesters. #9273
- [ENHANCEMENT] Added the following config options to set the number of partition ingester replicas when migrating to experimental ingest storage. #8517
  - `ingest_storage_migration_partition_ingester_zone_a_replicas`
  - `ingest_storage_migration_partition_ingester_zone_b_replicas`
  - `ingest_storage_migration_partition_ingester_zone_c_replicas`
- [ENHANCEMENT] Distributor: increase `-distributor.remote-timeout` when the experimental ingest storage is enabled. #8518
- [ENHANCEMENT] Memcached: Update to Memcached 1.6.28 and memcached-exporter 0.14.4. #8557
- [ENHANCEMENT] Rollout-operator: Allow the rollout-operator to be used as Kubernetes statefulset webhook to enable `no-downscale` and `prepare-downscale` annotations to be used on ingesters or store-gateways. #8743
- [ENHANCEMENT] Do not deploy ingester-zone-c when experimental ingest storage is enabled and `ingest_storage_ingester_zones` is configured to `2`. #8776
- [ENHANCEMENT] Added the config option `ingest_storage_migration_classic_ingesters_no_scale_down_delay` to disable the downscale delay on classic ingesters when migrating to experimental ingest storage. #8775 #8873
- [ENHANCEMENT] Configure experimental ingest storage on query-frontend too when enabled. #8843
- [ENHANCEMENT] Allow to override Kafka client ID on a per-component basis. #9026
- [ENHANCEMENT] Rollout-operator's access to ReplicaTemplate is now configured via config option `rollout_operator_replica_template_access_enabled`. #9252
- [ENHANCEMENT] Added support for a new way of downscaling ingesters, using rollout-operator's resource-mirroring feature and read-only mode of ingesters. This can be enabled by using the `ingester_automated_downscale_v2_enabled` config option. This is mutually exclusive with both `ingester_automated_downscale_enabled` (previous downscale mode) and `ingest_storage_ingester_autoscaling_enabled` (autoscaling for ingest-storage).
- [ENHANCEMENT] Update rollout-operator to `v0.19.1`. #9388
- [BUGFIX] Added missing node affinity matchers to write component. #8910
Mimirtool
- [CHANGE] Analyze Rules: Count recording rules used in rules group as used. #6133
- [CHANGE] Remove deprecated `--rule-files` flag in favor of CLI arguments for the following commands: #8701
  - `mimirtool rules load`
  - `mimirtool rules sync`
  - `mimirtool rules diff`
  - `mimirtool rules check`
  - `mimirtool rules prepare`
- [ENHANCEMENT] Remote read and backfill now supports the experimental native histograms. #9156
Mimir Continuous Test
- [CHANGE] Use test metrics that do not pass through 0 to make identifying incorrect results easier. #8630
- [CHANGE] Allowed authentication to Mimir using both Tenant ID and basic/bearer auth. #9038
- [FEATURE] Experimental support for the `-tests.send-chunks-debugging-header` boolean flag to send the `X-Mimir-Chunk-Info-Logger: series_id` header with queries. #8599
- [ENHANCEMENT] Include human-friendly timestamps in diffs logged when a test fails. #8630
- [ENHANCEMENT] Add histograms to measure latency of read and write requests. #8583
- [ENHANCEMENT] Log successful test runs in addition to failed test runs. #8817
- [ENHANCEMENT] Series emitted by continuous-test now distribute more uniformly across ingesters. #9218 #9243
- [ENHANCEMENT] Configure `User-Agent` header for the Mimir client via `-tests.client.user-agent`. #9338
- [BUGFIX] Initialize test result metrics to 0 at startup so that alerts can correctly identify the first failure after startup. #8630
Query-tee
- [CHANGE] If a preferred backend is configured, then query-tee always returns its response, regardless of the response status code. Previously, query-tee would only return the response from the preferred backend if it did not have a 5xx status code. #8634
- [ENHANCEMENT] Emit trace spans from query-tee. #8419
- [ENHANCEMENT] Log trace ID (if present) with all log messages written while processing a request. #8419
- [ENHANCEMENT] Log user agent when processing a request. #8419
- [ENHANCEMENT] Add `time` parameter to proxied instant queries if it is not included in the incoming request. This is optional but enabled by default, and can be disabled with `-proxy.add-missing-time-parameter-to-instant-queries=false`. #8419
- [ENHANCEMENT] Add support for sending only a proportion of requests to all backends, with the remainder only sent to the preferred backend. The default behaviour is to send all requests to all backends. This can be configured with `-proxy.secondary-backends-request-proportion`. #8532
- [ENHANCEMENT] Check annotations emitted by both backends are the same when comparing responses from two backends. #8660
- [ENHANCEMENT] Compare native histograms in query results when comparing results between two backends. #8724
- [ENHANCEMENT] Don't consider responses to be different during response comparison if both backends' responses contain different series, but all samples are within the recent sample window. #8749 #8894
- [ENHANCEMENT] When the expected and actual response for a matrix series is different, the full set of samples for that series from both backends will now be logged. #8947
- [ENHANCEMENT] Wait up to `-server.graceful-shutdown-timeout` for inflight requests to finish when shutting down, rather than immediately terminating inflight requests on shutdown. #8985
- [ENHANCEMENT] Optionally consider equivalent error messages the same when comparing responses. Enabled by default, disable with `-proxy.require-exact-error-match=true`. #9143 #9350 #9366
- [BUGFIX] Ensure any errors encountered while forwarding a request to a backend (e.g. DNS resolution failures) are logged. #8419
- [BUGFIX] The comparison of the results should not fail when either side contains extra samples from within SkipRecentSamples duration. #8920
- [BUGFIX] When `-proxy.compare-skip-recent-samples` is enabled, compare sample timestamps with the time the query requests were made, rather than the time at which the comparison is occurring. #9416
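
The last fix in the list above (#9416) can be illustrated with a minimal sketch. This is not query-tee's implementation; the function name, its parameters, and the two-minute window are hypothetical and chosen only for illustration. It shows why the reference point matters: a sample that was recent when the query ran may no longer look recent by the time the comparison happens.

```python
from datetime import datetime, timedelta, timezone


def is_recent_sample(sample_time, reference_time, skip_window):
    """Return True if the sample falls inside the skip window before reference_time."""
    return sample_time > reference_time - skip_window


# Hypothetical values, chosen only for illustration.
request_time = datetime(2024, 9, 1, 12, 0, 0, tzinfo=timezone.utc)
comparison_time = request_time + timedelta(minutes=5)  # the comparison runs later
skip_window = timedelta(minutes=2)
sample_time = request_time - timedelta(seconds=30)

# Anchored to the request time, the sample is inside the window and is skipped.
skipped_with_request_time = is_recent_sample(sample_time, request_time, skip_window)

# Anchored to the later comparison time, the same sample falls outside the window
# and would be compared even though it was recent when the query ran.
skipped_with_comparison_time = is_recent_sample(sample_time, comparison_time, skip_window)
```

With these values the first check is true and the second is false, which is the mismatch the fix addresses.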
Documentation
- [ENHANCEMENT] Specify in which component the configuration flags `-compactor.blocks-retention-period`, `-querier.max-query-lookback`, `-query-frontend.max-total-query-length`, `-query-frontend.max-query-expression-size-bytes` are applied and that they are applied to remote read as well. #8433
- [ENHANCEMENT] Provide more detailed recommendations on how to migrate from classic to native histograms. #8864
- [ENHANCEMENT] Clarify that `{namespace}` and `{groupName}` path segments in the ruler config API should be URL-escaped. #8969
- [ENHANCEMENT] Include stalled compactor network drive information in runbooks. #9297
- [ENHANCEMENT] Document `/ingester/prepare-partition-downscale` and `/ingester/prepare-instance-ring-downscale` endpoints. #9132
- [ENHANCEMENT] Describe read-only mode of ingesters in component documentation. #9132
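
As a hedged illustration of the URL-escaping note above (#8969), the sketch below builds a ruler config API path with percent-encoded `{namespace}` and `{groupName}` segments. The `/prometheus/config/v1/rules` prefix and the helper name are assumptions made for this example; check the ruler API reference for the paths your deployment exposes.

```python
from urllib.parse import quote


def ruler_group_path(namespace, group_name, prefix="/prometheus/config/v1/rules"):
    """Build a rule group path with both segments percent-encoded."""
    # safe="" makes quote() escape "/" as well, so a slash inside a name
    # is not mistaken for a path separator.
    return f"{prefix}/{quote(namespace, safe='')}/{quote(group_name, safe='')}"


# Hypothetical namespace and group names containing a space and a slash.
path = ruler_group_path("team a/prod", "cpu alerts")
print(path)
```

Passing `safe=""` is the important detail: by default `quote()` leaves `/` unescaped, which would split a namespace such as `team a/prod` into two path segments.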
Tools
- [CHANGE] `wal-reader`: Renamed `-series-entries` to `-print-series`. Renamed `-print-series-with-samples` to `-print-samples`. #8568
- [FEATURE] `query-bucket-index`: add new tool to query a bucket index file and print the blocks that would be used for a given query time range. #8818
- [FEATURE] `kafkatool`: add new CLI tool to operate Kafka. Supported commands: #9000
  - `brokers list-leaders-by-partition`
  - `consumer-group commit-offset`
  - `consumer-group copy-offset`
  - `consumer-group list-offsets`
  - `create-partitions`
- [ENHANCEMENT] `wal-reader`: References to unknown series from Samples, Exemplars, histogram or tombstones records are now always logged. #8568
- [ENHANCEMENT] `tsdb-series`: added `-stats` option to print min/max time of chunks, total number of samples and DPM for each series. #8420
- [ENHANCEMENT] `tsdb-print-chunk`: print counter reset information for native histograms. #8812
- [ENHANCEMENT] `grpcurl-query-ingesters`: print counter reset information for native histograms. #8820
- [ENHANCEMENT] `grpcurl-query-ingesters`: concurrently query ingesters. #9102
- [ENHANCEMENT] `grpcurl-query-ingesters`: sort series and chunks in output. #9180
- [ENHANCEMENT] `grpcurl-query-ingesters`: print full chunk timestamps, not just time component. #9180
- [ENHANCEMENT] `tsdb-series`: Added `-json` option to generate JSON output for easier post-processing. #8844
- [ENHANCEMENT] `tsdb-series`: Added `-min-time` and `-max-time` options to filter samples that are used for computing data-points per minute. #8844
- [ENHANCEMENT] `mimir-rules-action`: Added new input to support matching target namespaces by regex. #9244
- [ENHANCEMENT] `mimir-rules-action`: Added new inputs to support ignoring namespaces and ignoring namespaces by regex. #9258 #9324
All changes in this release: mimir-2.13.0...mimir-2.14.0-rc.0