Fixes
- Head conversion now uses a non-exclusive (shared) lock. Conversion is a long-running operation that writes to disk, but it only reads the rotated head, so queries can run in parallel with it.
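A minimal Go sketch of the idea follows, assuming a hypothetical Head type guarded by a sync.RWMutex. Conversion and queries both take the shared (read) lock, so neither blocks the other, while any writer would still need the exclusive lock. The copy stands in for the real encoding and disk writes.

```go
package main

import (
	"fmt"
	"sync"
)

// Head is a hypothetical stand-in for a rotated (read-only) head.
type Head struct {
	mu      sync.RWMutex
	samples []float64
}

// Convert simulates the long-running conversion of a rotated head into a
// persistent block. It only reads the head, so a shared lock suffices.
func (h *Head) Convert() []float64 {
	h.mu.RLock()
	defer h.mu.RUnlock()
	out := make([]float64, len(h.samples))
	copy(out, h.samples) // stands in for encoding and writing to disk
	return out
}

// Query reads the head under the same shared lock, so it can run while
// Convert is in progress.
func (h *Head) Query() float64 {
	h.mu.RLock()
	defer h.mu.RUnlock()
	var sum float64
	for _, v := range h.samples {
		sum += v
	}
	return sum
}

func main() {
	h := &Head{samples: []float64{1, 2, 3}}
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); _ = h.Convert() }()
	go func() { defer wg.Done(); fmt.Println(h.Query()) }()
	wg.Wait()
}
```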
Features
- Added the feature flag head_default_number_of_shards to adjust the number of head shards (default: 2). Increasing the number of shards improves write performance but may slightly slow down reads and increase memory consumption. This feature flag is temporary and will be replaced by automatic shard count calculation in the future.
- Introduced a two-stage process for selecting series by matchers. The first stage resolves the regular expression against the index's prefix trees; it executes quickly but must hold locks on the index. The second stage performs the posting operations, which are resource-intensive because they decode data and run set operations on series IDs. Separating the stages reduces write-locking time and increases read parallelism, because posting operations can work on lightweight snapshot states without blocking appends.
- Implemented optimistic non-exclusive relabeling locks for data updates. Since new series appear infrequently, if all data in an append operation is already cached by relabeling, that stage locks neither the series container nor the indexes. An exclusive lock is taken only when new data must be added. This mechanism works only when intra-shard parallelization is enabled (it is disabled by default).
- Added a mechanism for executing tasks on a specific shard instead of all shards. This capability is essential for upcoming performance improvements.
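To illustrate the trade-off behind head_default_number_of_shards, here is a hedged Go sketch; ShardedHead, NewShardedHead and the FNV-based shard selection are assumptions for illustration, not the project's code. Appends lock only the shard that owns a series, so more shards means less write contention, while a read such as Count has to visit every shard.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// shard holds one partition of the series set with its own lock, so
// appends to different shards do not contend with each other.
type shard struct {
	mu     sync.Mutex
	series map[string][]float64
}

// ShardedHead is a hypothetical head split into a fixed number of shards.
type ShardedHead struct {
	shards []*shard
}

// NewShardedHead creates n shards; n plays the role of
// head_default_number_of_shards in this sketch.
func NewShardedHead(n int) *ShardedHead {
	if n < 1 {
		n = 1
	}
	h := &ShardedHead{shards: make([]*shard, n)}
	for i := range h.shards {
		h.shards[i] = &shard{series: make(map[string][]float64)}
	}
	return h
}

// shardFor picks a shard by hashing the series' label string.
func (h *ShardedHead) shardFor(labels string) *shard {
	f := fnv.New64a()
	_, _ = f.Write([]byte(labels))
	return h.shards[f.Sum64()%uint64(len(h.shards))]
}

// Append locks only the shard that owns the series.
func (h *ShardedHead) Append(labels string, v float64) {
	s := h.shardFor(labels)
	s.mu.Lock()
	s.series[labels] = append(s.series[labels], v)
	s.mu.Unlock()
}

// Count must visit every shard, which is why more shards can make
// reads slightly slower.
func (h *ShardedHead) Count() int {
	n := 0
	for _, s := range h.shards {
		s.mu.Lock()
		n += len(s.series)
		s.mu.Unlock()
	}
	return n
}

func main() {
	h := NewShardedHead(4)
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			h.Append(fmt.Sprintf(`{job="api",instance="%d"}`, i), float64(i))
		}(i)
	}
	wg.Wait()
	fmt.Println(h.Count()) // 8 distinct series
}
```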
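A hedged Go sketch of optimistic locking for relabeling, assuming a hypothetical RelabelCache from label strings to series IDs. The fast path takes only a shared lock and succeeds when every series in the batch is already cached; otherwise it falls back to the exclusive lock and re-checks each entry, since another writer may have inserted it in between. The intra-shard parallelization switch is not modeled here.

```go
package main

import (
	"fmt"
	"sync"
)

// RelabelCache is a hypothetical cache from a series' label string to its ID.
type RelabelCache struct {
	mu        sync.RWMutex
	ids       map[string]uint32
	next      uint32
	exclusive int // how many times the slow (exclusive) path ran
}

func NewRelabelCache() *RelabelCache {
	return &RelabelCache{ids: make(map[string]uint32)}
}

// Resolve maps every series in an append batch to an ID.
func (c *RelabelCache) Resolve(batch []string) []uint32 {
	out := make([]uint32, len(batch))

	// Optimistic fast path: a shared lock is enough if everything is cached.
	c.mu.RLock()
	allCached := true
	for i, ls := range batch {
		id, ok := c.ids[ls]
		if !ok {
			allCached = false
			break
		}
		out[i] = id
	}
	c.mu.RUnlock()
	if allCached {
		return out
	}

	// Slow path: exclusive lock, re-checking each entry because another
	// writer may have added it after RUnlock.
	c.mu.Lock()
	defer c.mu.Unlock()
	c.exclusive++
	for i, ls := range batch {
		id, ok := c.ids[ls]
		if !ok {
			id = c.next
			c.next++
			c.ids[ls] = id
		}
		out[i] = id
	}
	return out
}

func main() {
	c := NewRelabelCache()
	first := c.Resolve([]string{`{job="api"}`, `{job="db"}`})
	second := c.Resolve([]string{`{job="db"}`, `{job="api"}`})
	fmt.Println(first, second, c.exclusive) // [0 1] [1 0] 1
}
```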
Enhancements
- Added metrics tracking the waiting time for locks and head rotations. These metrics improve observability of internal delays and contention, enabling better diagnostics and tuning opportunities.
- Moved lock management inside task execution, so that, depending on the task type, locks are taken only where needed rather than for the entire task duration. This can slightly improve performance when intra-shard parallelization is enabled by reducing unnecessary lock hold time.
- Small performance fixes for byte-slice-to-string conversions in several parts of the code. Some of these conversions were unsafe, and all of them were suboptimal.
- Eliminated head allocations in the original Prometheus TSDB. It is used only as the historical block querier and compactor, so there is no need to allocate any buffers in its head.
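A hedged Go sketch of moving locking into task execution: instead of the executor holding a shard lock for the whole task, the task receives a hypothetical withLock helper and takes a read or write lock only around the section that touches shared state. Shard, LockKind and run are illustrative names, not the project's API.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Shard is a hypothetical shard whose state is guarded by an RWMutex.
type Shard struct {
	mu     sync.RWMutex
	series map[string]int
}

// LockKind lets a task say which lock a critical section needs.
type LockKind int

const (
	ReadLock LockKind = iota
	WriteLock
)

// Task receives a helper that holds the shard lock only while fn runs.
type Task func(withLock func(kind LockKind, fn func()))

// run executes t without holding any lock itself.
func run(s *Shard, t Task) {
	t(func(kind LockKind, fn func()) {
		if kind == WriteLock {
			s.mu.Lock()
			defer s.mu.Unlock() // released when this helper returns
		} else {
			s.mu.RLock()
			defer s.mu.RUnlock()
		}
		fn()
	})
}

func main() {
	s := &Shard{series: map[string]int{}}
	run(s, func(withLock func(LockKind, func())) {
		// Unlocked part: parse input without touching shared state.
		names := strings.Fields("up up cpu")
		// Locked part: only the update itself holds the write lock.
		withLock(WriteLock, func() {
			for _, n := range names {
				s.series[n]++
			}
		})
	})
	var total int
	run(s, func(withLock func(LockKind, func())) {
		withLock(ReadLock, func() { total = len(s.series) })
	})
	fmt.Println(s.series["up"], total) // 2 2
}
```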
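The release note does not say which conversions were changed, so the following Go sketch (Go 1.20+ for unsafe.String and unsafe.SliceData) only illustrates the usual options: string(b) is always safe but copies; a zero-copy view via unsafe.String is correct only while the underlying bytes are never modified; and indexing a map with m[string(b)] avoids the allocation without unsafe code, because the gc compiler optimizes that pattern. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"unsafe"
)

// copyString is always safe: string(b) allocates and copies the bytes.
func copyString(b []byte) string {
	return string(b)
}

// viewString is zero-copy: the string shares b's memory, so b must never
// be modified afterwards, or the "immutable" string would change.
func viewString(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(unsafe.SliceData(b), len(b))
}

// lookup indexes a map with string(b); the gc compiler does not allocate a
// temporary string for this pattern, so no unsafe code is needed.
func lookup(m map[string]int, b []byte) (int, bool) {
	v, ok := m[string(b)]
	return v, ok
}

func main() {
	label := []byte("job")
	ids := map[string]int{"job": 7}
	v, ok := lookup(ids, label)
	fmt.Println(copyString(label), viewString(label), v, ok) // job job 7 true
}
```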