Breaking changes
- After this rollout the distributors will use a new API endpoint on the ingesters to push spans. Please roll out all ingesters before rolling the distributors to prevent downtime. During this period the ingesters will use considerably more resources and should be scaled up (or incoming traffic should be heavily throttled). Once all distributors and ingesters have rolled, performance will return to normal. Internally we have observed ~1.5x CPU load on the ingesters during the rollout. #1227 (@joe-elliott)
- Querier options related to search have moved under a search block: #1350 (@joe-elliott)
    querier:
      search_query_timeout: 30s
      search_external_endpoints: []
      search_prefer_self: 2
  becomes
    querier:
      search:
        query_timeout: 30s
        prefer_self: 2
        external_endpoints: []
- Dropped tempo-search-retention-duration parameter on the vulture. #1297 (@joe-elliott)
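For anyone updating configs in bulk, the querier rename above can be sketched as a small transformation on an already-parsed config. This is only an illustration: it assumes the config has been loaded into a plain Python dict, and migrate_querier and RENAMES are hypothetical names, not part of Tempo.

```python
# Old flat querier key -> new key under querier.search.
# The key names are taken from the breaking-change note; everything else
# here (function name, dict-based config) is an assumption for illustration.
RENAMES = {
    "search_query_timeout": "query_timeout",
    "search_external_endpoints": "external_endpoints",
    "search_prefer_self": "prefer_self",
}


def migrate_querier(config):
    """Return a copy of a parsed config dict with querier search options nested."""
    querier = dict(config.get("querier", {}))
    search = dict(querier.get("search", {}))
    for old, new in RENAMES.items():
        if old in querier:
            value = querier.pop(old)
            # If the new key is already set, keep it and drop the old one.
            search.setdefault(new, value)
    if search:
        querier["search"] = search
    migrated = dict(config)
    if querier:
        migrated["querier"] = querier
    return migrated
```

The input dict is left unmodified, so the old and new layouts can be compared before anything is written back.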
New Features and Enhancements
- [FEATURE] Added metrics-generator: an optional component to generate metrics from ingested traces #1282 (@mapno, @kvrhdn)
- [ENHANCEMENT] v2 object encoding added. This encoding adds a start/end timestamp to every record to reduce proto marshalling and increase search speed. #1227 (@joe-elliott)
- [ENHANCEMENT] Allow the compaction cycle to be configurable with a default of 30 seconds #1335 (@willdot)
- [ENHANCEMENT] Add new config options for setting GCS metadata on new objects #1368 (@zalegrala)
- [ENHANCEMENT] Add new scaling alerts to the tempo-mixin #1292 (@mapno)
- [ENHANCEMENT] Improve serverless handler error messages #1305 (@joe-elliott)
- [ENHANCEMENT] Added a configuration option search_prefer_self to allow the queriers to do some work while also leveraging serverless in search. #1307 (@joe-elliott)
- [ENHANCEMENT] Make trace combination/compaction more efficient #1291 (@mdisibio)
- [ENHANCEMENT] Add Content-Type headers to query-frontend paths #1306 (@wperron)
- [ENHANCEMENT] Partially persist traces that exceed max_bytes_per_trace during compaction #1317 (@joe-elliott)
- [ENHANCEMENT] Make search respect per tenant max_bytes_per_trace and added skippedTraces to returned search metrics. #1318 (@joe-elliott)
- [ENHANCEMENT] Added tenant ID (instance ID) to trace too large message. #1385 (@cristiangsp)
- [ENHANCEMENT] Add a startTime and endTime parameter to the Trace by ID Tempo Query API to improve query performance #1388 (@sagarwala, @bikashmishra100, @ashwinidulams)
- [ENHANCEMENT] Add hedging to queries to external endpoints. #1350 (@joe-elliott)
  New config options and defaults:
    querier:
      search:
        external_hedge_requests_at: 5s
        external_hedge_requests_up_to: 3
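To illustrate what request hedging means for the external_hedge_requests_at and external_hedge_requests_up_to options above: if an attempt has not answered within the hedge delay, another attempt is started, and the first successful answer wins. The following is a minimal sketch of that idea, not Tempo's implementation. It assumes that up_to counts the total number of attempts including the first, and hedged_call and make_request are hypothetical names.

```python
import asyncio


async def hedged_call(make_request, hedge_at, up_to):
    """Run make_request(attempt) with hedging and return the first success.

    hedge_at: seconds to wait before starting an additional attempt.
    up_to:    maximum number of attempts in total (assumed to include the first).
    """
    if up_to < 1:
        raise ValueError("up_to must be at least 1")
    pending = set()
    attempts = 0
    try:
        while True:
            if attempts < up_to:
                pending.add(asyncio.create_task(make_request(attempts)))
                attempts += 1
            # Only use the hedge delay while more attempts are still allowed.
            timeout = hedge_at if attempts < up_to else None
            done, pending = await asyncio.wait(
                pending, timeout=timeout, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                if task.exception() is None:
                    return task.result()
            if not pending and attempts >= up_to:
                raise RuntimeError("all hedged attempts failed")
    finally:
        # Cancel attempts that lost the race so they do not keep running.
        for task in pending:
            task.cancel()
```

With the defaults listed above, a call of this shape would start a second attempt after 5s and a third after 10s, and never more than three in total.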
Bug Fixes
- [BUGFIX] Correct issue where Azure "Blob Not Found" errors were sometimes not handled correctly #1390 (@joe-elliott)
- [BUGFIX] Enable compaction and retention in Tanka single-binary #1352 (@irizzant)
- [BUGFIX] Fixed issue where the query-frontend did not log request details when a request was cancelled #1136 (@adityapwr)
- [BUGFIX] Update OTLP port in examples (docker-compose & kubernetes) from legacy ports (55680/55681) to new ports (4317/4318) #1294 (@mapno)
- [BUGFIX] Fixes min/max time on blocks to be based on span times instead of ingestion time. #1314 (@joe-elliott)
  - Includes new configuration option to restrict the amount of slack around now to update the block start/end time. #1332 (@joe-elliott)
      storage:
        trace:
          wal:
            ingestion_time_range_slack: 2m0s
  - Includes a new metric to determine how often this range is exceeded: tempo_warnings_total{reason="outside_ingestion_time_slack"}
- [BUGFIX] Prevent data race / ingester crash during searching by trace id by using xxhash instance as a local variable. #1387 (@bikashmishra100, @sagarwala, @ashwinidulams)
- [BUGFIX] Fix spurious "failed to mark block compacted during retention" errors #1372 (@mdisibio)
- [BUGFIX] Fix error message "Writer is closed" by resetting compression writer correctly on the error path. #1379 (@annanay25)
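The ingestion_time_range_slack option mentioned under #1332 bounds how far span times may be from the current time before they are used for a block's start/end. A minimal sketch of one way such a check can work follows. It assumes that out-of-range timestamps are replaced with the current time; clamp_to_slack is a hypothetical name and this is not Tempo's actual code.

```python
def clamp_to_slack(start, end, now, slack):
    """Restrict a block time range to [now - slack, now + slack].

    All arguments are Unix timestamps or durations in seconds.
    Returns (start, end, warned). warned marks the point where a counter
    such as tempo_warnings_total{reason="outside_ingestion_time_slack"}
    would be incremented.
    """
    warned = False
    if start < now - slack:
        # Assumed fallback: use the ingestion time instead of the span time.
        start, warned = now, True
    if end > now + slack:
        end, warned = now, True
    return start, end, warned
```

With the default of 2m0s (120 seconds), a span claiming to have started an hour ago would not move the block start back by an hour.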
Other Changes
- [CHANGE] Vulture now exercises search at any point during the block retention to test full backend search. #1297 (@joe-elliott)
- [CHANGE] Updated storage.trace.pool.queue_depth default from 200->10000. #1345 (@joe-elliott)
- [CHANGE] Updated flags -storage.trace.azure.storage-account-name and -storage.trace.s3.access_key to no longer be considered secrets #1356 (@simonswine)