github grafana/tempo v3.0.0-rc.1

pre-release8 hours ago

Tempo v3.0.0-rc.1 Release Notes

Tempo 3.0 is a major release candidate focused on the new ingest/write architecture, removal of deprecated 2.x components, migration tooling, TraceQL metrics improvements, and live-store/block-builder correctness and observability fixes.

Breaking Changes

  • Remove duplicate "compaction" prefix from CompactorConfig CLI flags. Affected flags: compaction.block-retention, compaction.max-objects-per-block, compaction.max-block-bytes, compaction.compaction-window by @electron0zero in #6909
  • Enable RetryInfo by default. distributor.retry_after_on_resource_exhausted now defaults to 5s (was 0) so OTLP clients receive a retry hint on ResourceExhausted errors by @electron0zero in #7088
    Set to 0 to disable cluster-wide, or set the per-tenant override ingestion.retry_info_enabled: false to disable for a single tenant.
  • Centralize block and WAL config: block_builder and live_store now always use storage.trace.block settings; per-module block config fields are removed by @stoewer in #6647
  • Remove Opencensus receiver by @javiermolinar in #6523
  • Remove legacy mem-ballast-size-mbs cli flag by @orkhan-huseyn in #6403
  • tempo-cli: Support relative time (now, now-1h) for start/end args and standardize on RFC3339 in all commands by @electron0zero in #6458
    query search command no longer accepts timestamps without timezone (e.g. 2024-01-01T00:00:00), use RFC3339 (e.g. 2024-01-01T00:00:00Z) or relative time instead.
  • Consolidate read configuration for recent data cutoff. query_frontend.search.query_ingesters_until is removed in favor of only query_frontend.search.query_backend_after by @mapno in #6507
  • Remove deprecated querier.query_live_store config. This field must be removed from configs on upgrade by @javiermolinar in #7048
  • Optimize TraceQL AST by rewriting conditions on the same attribute to their array equivalent by @stoewer in #6353
    Slightly changes the array matching semantics of != and !~ operators and introduces stricter rules for regex literals.
  • Remove partition ring livestore config by @javiermolinar in #6981
  • Remove ingester module by @javiermolinar in #6959
  • Remove ingest.enabled config by @javiermolinar in #6873
  • Disable legacy (flat, unscoped) overrides by default. Tempo will refuse to start if legacy overrides are detected. Set enable_legacy_overrides: true or -config.enable-legacy-overrides=true to opt back in temporarily. Legacy overrides will be removed in a future release by @electron0zero in #6741
  • Remove remaining app ingester config by @javiermolinar in #6667
  • Remove span-metrics leftovers and lazy-init generator clients by @javiermolinar in #6618
  • Decommission livestore MetricsGenerator query service by @javiermolinar in #6615
  • Remove metrics-generator localblocks processor and related local block storage plumbing by @javiermolinar in #6555
  • Remove ingesters by @javiermolinar in #6504
  • Remove ingesters and compactor alerts by @javiermolinar in #6369
  • Removed v2 block encoding and compactor component by @joe-elliott in #6273
    This includes the removal of the following CLI commands which were v2 specific: list block, list index, view index, gen index, gen bloom.
  • SpanMetricsSummary is removed and querier code simplified by @javiermolinar in #6496 and #6510
  • Sets the all target to be 3.0 compatible and removes the scalable-single-binary target by @joe-elliott in #6283
  • Clean up enterprise jsonnet by @javiermolinar in #6505

Changes

  • Stop publishing 32-bit ARM binary archives. Release artifacts continue to include amd64 and arm64 binaries by @javiermolinar in #7106
  • Upgrade Tempo to Go 1.26.0 by @stoewer in #6443
  • Allow duplicate dimensions for span metrics and service graphs. This is a valid use case if using different instrumentation libraries, with spans having "deployment.environment" and others "deployment_environment", for example by @carles-grafana in #6288
  • Update default max duration for TraceQL metrics queries up to one day by @javiermolinar in #6285
  • Set TraceQL query metrics checks by default in Vulture by @javiermolinar in #6275
  • Make Tempo single-binary example use the local backend by @javiermolinar in #7033
  • Bump ingestion limits by @javiermolinar in #7034
  • TraceQL metrics - change default step intervals to align with new vParquet5 timestamp columns by @mdisibio in #6413
  • Remove all traces of ingesters from the dashboards by @javiermolinar in #6352
  • jsonnet: Add emptyDir data volume to block-builder StatefulSet by @mapno in #6648
  • Add quick checks to tempo mixin runbook by @javiermolinar in #6696
  • Deprecate metrics-generator no-local-blocks by @javiermolinar in #6707
  • Own local block and partition ring helpers by @javiermolinar in #6808
  • Track invalid trace and span id discards by @javiermolinar in #6799
  • Deprecate query_frontend.rf1_after and query all blocks regardless of replication factor for non-metrics paths. Simplifies 2.x to 3.0 migration by @mapno in #6969
  • Flush blocks to backend storage from the Live store in single binary mode by @javiermolinar in #6941
  • Remove stale config from the examples by @javiermolinar in #6980
  • tempo-cli: Rewrite migrate overrides-config and add migrate overrides-per-tenant command to help migrate legacy flat overrides to the new scoped format by @electron0zero in #6793
  • Decouple livestore from metrics-generator by @javiermolinar in #6506 and #6535
  • Expose otlp http and grpc ports for Docker examples by @javiermolinar in #6296

Features

  • Add span profiling support via otelpyroscope. Enable with span_profiling: true (or -span-profiling CLI flag) to attach pprof labels to OTel spans by @simonswine in #7063
  • Add tempo-cli migrate config command for migrating Tempo 2.x configs to 3.0 by @mapno in #6982
  • jsonnet: Add KEDA-based horizontal pod autoscaling support for microservices deployment by @mapno in #6970
  • Add automemlimit support for automatic GOMEMLIMIT configuration. Enable with memory.automemlimit_enabled: true by @oleg-kozlyuk-grafana in #6313
  • Support comparison operators in TraceQL Metrics queries by @ruslan-mikhailov in #6474
  • metrics-generator: Add span filtering to service graphs through filter_policies by @javiermolinar in #6453
  • Add new include_any filter policy for spanmetrics filter by @javiermolinar in #6392
  • Add span_multiplier_key to overrides. This allows tenants to specify the attribute key used for span multiplier values to compensate for head-based sampling by @carles-grafana in #6260
  • metrics-generator: Add per-label limiter to control cardinality by @electron0zero in #6414
    Adds max_cardinality_per_label per tenant override and new metrics to estimate per label cardinality demand estimate.
  • Add an extension mechanism for per-tenant overrides by @stoewer in #6758
  • Extend TraceRedactor interface to support hiding complete traces via ErrTraceHidden by @stoewer in #6811
  • Single-binary mode: push distributor local ingest directly to live-store and metrics-generator without Kafka by @javiermolinar in #6729

Enhancements

  • Support OR conditions for tag name and tag value autocomplete (search tags v2) by @ie-pham in #6827
  • Expose MinIO retry settings via S3 config by @rwhitty in #6561
  • Reduce default livestore WAL size and align query defaults: max_block_duration 1m to 30s, max_block_bytes 100MiB to 50MiB, complete_block_timeout 1h to 20m, metrics query_backend_after 30m to 15m by @zhxiaogg in #6974
  • Enable native histogram emission for all promauto-registered histograms, including tempo_request_duration_seconds. Both classic and native formats are emitted simultaneously; existing scrapers are unaffected by @zalegrala in #6910
  • tempo-cli: Add --header flag to query api commands for custom headers by @Nouuu in #6768
  • tempo-cli: add redact command to submit trace redaction jobs to the backend scheduler by @zalegrala in #6832
  • Block builder: deduplicate spans within traces during block creation and track removed duplicates via tempo_block_builder_spans_deduped_total metric by @zhxiaogg in #6539
  • metrics-generator: Support extracting span multiplier from W3C tracestate OTel probability sampling threshold via enable_tracestate_span_multiplier config option by @csmarchbanks in #6684
  • Add new alerts and runbooks entries by @javiermolinar in #6276
  • Double the maximum number of dedicated string columns in vParquet5 and update tempo-cli to determine the optimum number for the data by @mdisibio in #6282
  • TraceQL metrics - experimental faster read path for most metrics queries, accessible behind the query hint spanonly_fetch=true when unsafe_query_hints is enabled by @mdisibio in #6359
  • TraceQL metrics - add new per-tenant override to opt-in or opt-out of the new experimental faster read path for most metrics queries by @mdisibio in #6849
  • Vulture: extend data consistency checks to include more strings, integers, and blobs, at resource/span/event scopes, and perform deeper trace content check by @mdisibio in #6731
  • Improve attribute truncating observability by @javiermolinar in #6400
  • Log truncated oversized attributes by @carles-grafana in #6467
  • livestore: make trace_too_large log line an insight by @carles-grafana in #6371
  • Remove live-store partition owner from ring on shutdown to prevent stale owner entries by @oleg-kozlyuk-grafana in #6409
  • Improved live store readiness check and added readiness_target_lag and readiness_max_wait config parameters. Live store will now - if readiness_target_lag is set - not report /ready until Kafka lag is brought under the specified value by @oleg-kozlyuk-grafana and @ruslan-mikhailov in #6238 and #6405
  • Expose a new histogram metric to track the jobs per query distribution by @javiermolinar in #6343
  • Do deep validation for filter policies in user configurable overrides API by @electron0zero in #6407
  • Allow span_name_sanitization to be set via user-configurable overrides API by @Logiraptor in #6411
  • Add fail_on_high_lag parameter to allow live-store to fail if it is lagged by @ruslan-mikhailov and @carles-grafana in #6363, #6567 and #7066
  • Add support for per-tenant left-padding of trace IDs by @mapno in #6489
  • Add new metric for generator ring size: tempo_distributor_metrics_generator_tenant_ring_size by @zalegrala in #5686
  • Remove explicit runtime.GC() calls in vParquet5 compactor/block creation and CLI by @oleg-kozlyuk-grafana in #6603
  • Reduce allocations in extendReuseSlice growth path during WAL writes and block creation by @mapno in #6863
  • Implemented anti-affinity for pods in same livestore zone by @zhxiaogg in #6757
  • Livestore: skipped WAL complete op during shutdown by @zhxiaogg in #6839
  • Add metric to track livestore block cut reasons by @zhxiaogg in #6922
  • Enable async parquet read mode for WAL completion path by @zhxiaogg in #6967
  • metrics-generator: add leave_consumer_group_on_shutdown to send LeaveGroup on shutdown for immediate partition reassignment instead of waiting for session timeout by @zalegrala in #6575

Bugfixes

  • Fix tempo-vulture ignoring -tempo-push-tls flag in normal operating mode by @zachfi in #6976
  • livestore: check readiness before lag for SearchRecent and QueryRange queries by @zhxiaogg in #6911
  • Fix integer overflow in query parameters by using strconv.ParseUint instead of strconv.Atoi/strconv.ParseInt for unsigned integer fields by @ricardbejarano in #6612
  • Fix live-store SearchTagValuesV2 disk cache never being populated on complete blocks by @mapno in #6858
  • Fix dedicated columns fallback in block_builder and live_store to use storage.trace.block.parquet_dedicated_columns when not set via overrides by @stoewer in #6647
  • Force live-store to rehydrate from Kafka lookback period when local data is missing (e.g. PVC wipe, new node) instead of resuming from the committed consumer group offset by @oleg-kozlyuk-grafana in #6428
  • fix: reload span_name_sanitization overrides during runtime by @electron0zero in #6435
  • fix: live store honor the config options for block and WAL versions by @mdisibio in #6509
  • fix: block builder honor the global storage block config for block and WAL versions by @Harry-kp in #6532
  • fix: normalize allowlist headers when building the allowlist map by @javiermolinar in #6481
  • fix: bug related to dedicated column filtering by @stoewer in #6586
  • fix: compactor deduped spans metric uses wrong type (gauge instead of counter) by @bejaratommy in #6576
  • metrics-generator: Fix active-series counter underflow in local series limiter when overflow series are deleted by @carles-grafana in #6568
  • fix: skip per-label limiter and sanitizer for target_info and host_info metrics in metrics-generator by @electron0zero in #6660
  • fix(traceql): err on division by zero by @Proximyst in #6580
  • fix(traceql): stop intPow from hanging by @Proximyst in #6581
  • fix(traceql): Fix incorrect search results for some queries on new blob columns by @mdisibio in #6815
  • fix(vparquet5) Fix buffer-reuse bug where event attributes in dedicated columns could be persisted on additional spans and events by @mdisibio in #6914
  • fix: race condition where remove_owner_on_shutdown flag was set too late — after context cancellation already triggered the lifecycler's shutdown, causing the partition owner to remain in the ring by @oleg-kozlyuk-grafana in #6693
  • Return 400 instead of 500 when query_range or query_instant requests have unparseable start/end parameters by @ruslan-mikhailov in #6694
  • fix: correct block-builder fetch metrics to use counters instead of gauges by @WinterCabbage in #6578
  • Log tenant on receiver push errors by @javiermolinar in #6780
  • Fix race conditions in WAL block by @ruslan-mikhailov in #6773
  • metrics-generator: Fix target_info being skipped when resource attributes have empty values by @carles-grafana in #6774
  • metrics-generator: Drain old series on metric replacement to prevent limiter leak and permanent overflow by @carles-grafana in #6653
  • live-store: fixed unsuccessful deregistering from membership/partition rings during shutdown by @zhxiaogg in #6848
  • fix: respect context cancellation when reading WAL block iterator by @zhxiaogg in #6928
  • Complete lifecycler shutdown on errors by @javiermolinar in #6906
  • livestore: fix concurrent WAL writes from periodic and shutdown flushes by @zhxiaogg in #6972
  • live-store: fix race conditions for tag values endpoint by @ruslan-mikhailov in #7000
  • live-store: correct backoff duration calculation by @ruslan-mikhailov in #6999
  • vulture: fix for recent traces when query_end_cutoff is enabled by @ruslan-mikhailov in #7018
  • Fix live-store producing WAL blocks exceeding max_block_bytes when flushing large batches of idle traces by @ruslan-mikhailov in #6971
  • live-store: skip lookback replay when partition is Inactive during scaling down by @zhxiaogg in #7101

New Contributors

Thanks to the following first-time contributors:

Full Changelog: v2.10.0-rc.0...v3.0.0-rc.1

Don't miss a new tempo release

NewReleases is sending notifications on new releases.