Tempo v3.0.0

Tempo 3.0 is a major release that completes the transition to the new ingest/write architecture, removes deprecated 2.x components, graduates TraceQL metrics to general availability, and adds migration tooling for Tempo 2.x users.

This release contains breaking configuration and deployment changes. Review the migration guide before upgrading, especially if you use legacy ingesters, legacy overrides, v2 blocks, OpenCensus, localblocks, or custom live-store/query-frontend settings.

Highlights

  • New ingest/write architecture replaces legacy ingesters.
  • TraceQL metrics is generally available; alerting on TraceQL metrics and the faster read path remain experimental.
  • vParquet5 improvements and faster metrics query paths.
  • Expanded metrics-generator controls for cardinality management.
  • Trace redaction support.
  • Span profiling support via otelpyroscope.
  • Live-store and block-builder correctness and observability fixes.

Features

  • Make individual AST transformations skippable via config and query hints by @stoewer in #7012
  • Add span profiling support via otelpyroscope. Enable with span_profiling: true or -span-profiling CLI flag to attach pprof labels to OTel spans by @simonswine in #7063
  • Add tempo-cli migrate config command for migrating Tempo 2.x configs to 3.0 by @mapno in #6982
  • jsonnet: Add KEDA-based horizontal pod autoscaling support for microservices deployment by @mapno in #6970
  • Add automemlimit support for automatic GOMEMLIMIT configuration. Enable with memory.automemlimit_enabled: true by @oleg-kozlyuk-grafana in #6313
  • Support comparison operators in TraceQL Metrics queries by @ruslan-mikhailov in #6474
  • metrics-generator: Add span filtering to service graphs through filter_policies by @javiermolinar in #6453
  • Add new include_any filter policy for spanmetrics filter by @javiermolinar in #6392
  • Add span_multiplier_key to overrides. This allows tenants to specify the attribute key used for span multiplier values to compensate for head-based sampling by @carles-grafana in #6260
  • metrics-generator: Add per-label limiter to control cardinality by @electron0zero in #6414
    • Adds max_cardinality_per_label per tenant override and new metrics to estimate per-label cardinality demand.
  • Add an extension mechanism for per-tenant overrides by @stoewer in #6758
  • Extend TraceRedactor interface to support hiding complete traces via ErrTraceHidden by @stoewer in #6811
  • Single-binary mode: push distributor local ingest directly to live-store and metrics-generator without Kafka by @javiermolinar in #6729
  • Add experimental drain limiter / span name sanitization to the metrics generator to reduce metrics cardinality by clustering similar span names by @Logiraptor in #6098
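
The per-label limiter above caps how many distinct values a single label may contribute. The sketch below illustrates that general idea only; it is not Tempo's implementation. The class name, the exact-set tracking, and the `__overflow__` placeholder are assumptions made for illustration; only the `max_cardinality_per_label` name comes from these notes.

```python
from collections import defaultdict

# Hypothetical placeholder; these notes do not state what Tempo emits on overflow.
OVERFLOW_VALUE = "__overflow__"


class PerLabelLimiter:
    """Caps the number of distinct values admitted for each label name."""

    def __init__(self, max_cardinality_per_label: int):
        self.max_cardinality = max_cardinality_per_label
        self.seen = defaultdict(set)  # label name -> distinct values admitted so far

    def limit(self, labels: dict) -> dict:
        """Return a copy of labels with over-limit values replaced by the placeholder."""
        out = {}
        for name, value in labels.items():
            values = self.seen[name]
            if value in values:
                out[name] = value  # already admitted
            elif len(values) < self.max_cardinality:
                values.add(value)  # still under the cap: admit the new value
                out[name] = value
            else:
                out[name] = OVERFLOW_VALUE  # cap reached: collapse the value
        return out


limiter = PerLabelLimiter(max_cardinality_per_label=2)
series = [
    limiter.limit({"service": "api", "span_name": f"GET /users/{i}"})
    for i in range(4)
]
```

In this sketch the low-cardinality `service` label passes through untouched, while the third and fourth distinct `span_name` values collapse into the placeholder, so one noisy label cannot grow the series count without bound.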

Enhancements

  • Query-frontend: split streamed search and metrics responses into smaller gRPC packets for better default client compatibility, and fix final streaming updates after metrics series limits are reached by @mdisibio in #6607 / #7239
  • live-store: lock-free block reads via atomic.Pointer[blockSnapshot]; block deletion is two-phase and crash-safe by @zhxiaogg in #7132
  • Support OR conditions for tag name and tag value autocomplete (search tags v2) by @ie-pham in #6827
  • Expose MinIO retry settings via S3 config by @rwhitty in #6561
  • Reduce default livestore WAL size and align query defaults: max_block_duration 1m to 30s, max_block_bytes 100MiB to 50MiB, complete_block_timeout 1h to 20m, metrics query_backend_after 30m to 15m by @zhxiaogg in #6974
  • Enable native histogram emission for all promauto-registered histograms, including tempo_request_duration_seconds. Both classic and native formats are emitted simultaneously; existing scrapers are unaffected by @zalegrala in #6910
  • tempo-cli: Add --header flag to query api commands for custom headers by @Nouuu in #6768
  • tempo-cli: add redact command to submit trace redaction jobs to the backend scheduler by @zalegrala in #6832
  • Block builder: deduplicate spans within traces during block creation and track removed duplicates via tempo_block_builder_spans_deduped_total metric by @zhxiaogg in #6539
  • metrics-generator: Support extracting span multiplier from W3C tracestate OTel probability sampling threshold via enable_tracestate_span_multiplier config option by @csmarchbanks in #6684
  • Add new alerts and runbook entries by @javiermolinar in #6276
  • Double the maximum number of dedicated string columns in vParquet5 and update tempo-cli to determine the optimum number for the data by @mdisibio in #6282
  • TraceQL metrics: experimental faster read path for most metrics queries, accessible behind the query hint spanonly_fetch=true when unsafe_query_hints is enabled by @mdisibio in #6359
  • TraceQL metrics: add new per-tenant override to opt in or opt out of the new experimental faster read path for most metrics queries by @mdisibio in #6849
  • Vulture: extend data consistency checks to include more strings, integers, and blobs, at resource/span/event scopes, and perform deeper trace content checks by @mdisibio in #6731
  • Improve attribute truncating observability by @javiermolinar in #6400
  • Log truncated oversized attributes by @carles-grafana in #6467
  • livestore: make trace_too_large log line an insight by @carles-grafana in #6371
  • Remove live-store partition owner from ring on shutdown to prevent stale owner entries by @oleg-kozlyuk-grafana in #6409
  • Improved the live-store readiness check and added the readiness_target_lag and readiness_max_wait config parameters. When readiness_target_lag is set, live-store does not report /ready until Kafka lag drops below the specified value by @oleg-kozlyuk-grafana and @ruslan-mikhailov in #6238 and #6405
  • Expose a new histogram metric to track the jobs per query distribution by @javiermolinar in #6343
  • Do deep validation for filter policies in user configurable overrides API by @electron0zero in #6407
  • Allow span_name_sanitization to be set via user-configurable overrides API by @Logiraptor in #6411
  • Add fail_on_high_lag parameter to allow live-store to fail queries when it is lagging by @ruslan-mikhailov and @carles-grafana in #6363, #6567, and #7066
  • Add support for per-tenant left-padding of trace IDs by @mapno in #6489
  • Add new metric for generator ring size: tempo_distributor_metrics_generator_tenant_ring_size by @zalegrala in #5686
  • Remove explicit runtime.GC() calls in vParquet5 compactor/block creation and CLI by @oleg-kozlyuk-grafana in #6603
  • Reduce allocations in extendReuseSlice growth path during WAL writes and block creation by @mapno in #6863
  • Implemented anti-affinity for pods in same livestore zone by @zhxiaogg in #6757
  • Livestore: skipped WAL complete op during shutdown by @zhxiaogg in #6839
  • Add metric to track livestore block cut reasons by @zhxiaogg in #6922
  • Enable async parquet read mode for WAL completion path by @zhxiaogg in #6967
  • metrics-generator: add leave_consumer_group_on_shutdown to send LeaveGroup on shutdown for immediate partition reassignment instead of waiting for session timeout by @zalegrala in #6575
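
The tracestate-based span multiplier mentioned above relies on the OpenTelemetry probability sampling threshold. The sketch below shows how a multiplier can be derived from a `th` value as defined in that specification: the threshold is 1 to 14 lowercase hex digits with trailing zeros omitted, and the sampling probability is (2^56 - threshold) / 2^56. The function name is hypothetical, and this is not necessarily how Tempo implements `enable_tracestate_span_multiplier`.

```python
from typing import Optional

MAX_THRESHOLD = 1 << 56  # thresholds are 56-bit values in the OTel sampling spec
HEX_DIGITS = set("0123456789abcdef")


def span_multiplier_from_tracestate(tracestate: str) -> Optional[float]:
    """Return 1 / sampling probability from the 'ot' member's th field, or None."""
    for member in tracestate.split(","):
        key, sep, value = member.strip().partition("=")
        if key != "ot" or not sep:
            continue
        for field in value.split(";"):
            name, sep, th = field.partition(":")
            if name != "th" or not sep:
                continue
            if not 1 <= len(th) <= 14 or not set(th) <= HEX_DIGITS:
                return None  # malformed threshold
            # Trailing zeros are omitted on the wire, so pad back to 14 hex digits.
            threshold = int(th.ljust(14, "0"), 16)
            return MAX_THRESHOLD / (MAX_THRESHOLD - threshold)
    return None  # no usable threshold present


half = span_multiplier_from_tracestate("ot=th:8")
quarter = span_multiplier_from_tracestate("vendor=x,ot=rv:abcdef01234567;th:c")
```

A threshold of `8` rejects half of the ID space, so each surviving span stands for two; `c` rejects three quarters, so each surviving span stands for four.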

Breaking Changes

Tempo 3.0 includes several operator-impacting removals and default changes. Review the migration guide before upgrading.

  • The legacy ingester-based write path has been removed. Deployments must migrate to the 3.0 ingest/write architecture.
  • Removed v2 block encoding and the legacy v2 compactor component by @joe-elliott in #6273
    • Removed v2-specific CLI commands: list block, list index, view index, gen index, and gen bloom.
  • Centralized block and WAL config: block_builder and live_store now always use storage.trace.block settings; per-module block config fields are removed by @stoewer in #6647
  • Recent data queries now guarantee complete results by failing when an instance is lagging. Defaults query_frontend.query_end_cutoff to 30s and live_store.fail_on_high_lag to true by @mapno in #7210 / #7232
  • Disabled legacy (flat, unscoped) overrides by default. Tempo refuses to start if legacy overrides are detected. Set enable_legacy_overrides: true or -config.enable-legacy-overrides=true to opt back in temporarily. Legacy overrides will be removed in a future release by @electron0zero in #6741
  • User-configurable overrides config metrics_generator.processors no longer merges with runtime overrides. metrics_generator.processors now takes precedence over runtime overrides, matching every other config in user-configurable overrides. Setting processors: [] disables all processors for the tenant by @electron0zero in #7176 / #7185
  • Enabled RetryInfo by default. distributor.retry_after_on_resource_exhausted now defaults to 5s (was 0) so OTLP clients receive a retry hint on ResourceExhausted errors by @electron0zero in #7088
    • Set to 0 to disable cluster-wide, or set the per-tenant override ingestion.retry_info_enabled: false to disable for a single tenant.
  • Removed duplicate compaction prefix from CompactorConfig CLI flags by @electron0zero in #6909
    • compaction.compaction.block-retention → compaction.block-retention
    • compaction.compaction.max-objects-per-block → compaction.max-objects-per-block
    • compaction.compaction.max-block-bytes → compaction.max-block-bytes
    • compaction.compaction.compaction-window → compaction.compaction-window
  • Removed OpenCensus receiver by @javiermolinar in #6523
  • Removed legacy mem-ballast-size-mbs CLI flag by @orkhan-huseyn in #6403
  • tempo-cli: Support relative time (now, now-1h) for start/end args and standardize on RFC3339 in all commands by @electron0zero in #6458
    • query search command no longer accepts timestamps without timezone, for example 2024-01-01T00:00:00; use RFC3339, for example 2024-01-01T00:00:00Z, or relative time instead.
  • Consolidated read configuration for recent data cutoff. query_frontend.search.query_ingesters_until is removed in favor of only query_frontend.search.query_backend_after by @mapno in #6507
  • Removed deprecated querier.query_live_store config. This field must be removed from configs on upgrade by @javiermolinar in #7048
  • Optimized TraceQL AST by rewriting conditions on the same attribute to their array equivalent by @stoewer in #6353
    • Slightly changes the array matching semantics of != and !~ operators and introduces stricter rules for regex literals.
  • Removed span-metrics leftovers and lazy-init generator clients by @javiermolinar in #6618
  • Decommissioned livestore MetricsGenerator query service by @javiermolinar in #6615
  • Removed metrics-generator localblocks processor and related local block storage plumbing by @javiermolinar in #6555
  • SpanMetricsSummary is removed and querier code simplified by @javiermolinar in #6496 and #6510
  • Set the all target to be 3.0-compatible and removed the scalable-single-binary target by @joe-elliott in #6283
  • Cleaned up enterprise jsonnet by @javiermolinar in #6505
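
Two of the breaking changes above are mechanical enough to check in a script before upgrading: the renamed compaction CLI flags and the stricter timestamp format for tempo-cli. The sketch below is a hypothetical helper, not part of tempo-cli. The function names are invented for illustration, the flag syntax `-name=value` is an assumption, and the timestamp check is an approximation rather than a full RFC3339 validator.

```python
from datetime import datetime

# Old -> new flag names, copied from the breaking changes listed above.
RENAMED_FLAGS = {
    "compaction.compaction.block-retention": "compaction.block-retention",
    "compaction.compaction.max-objects-per-block": "compaction.max-objects-per-block",
    "compaction.compaction.max-block-bytes": "compaction.max-block-bytes",
    "compaction.compaction.compaction-window": "compaction.compaction-window",
}


def rewrite_flag(arg: str) -> str:
    """Rewrite one argument of the assumed form '-name=value' or '--name=value'."""
    dashes = arg[: len(arg) - len(arg.lstrip("-"))]
    name, sep, value = arg[len(dashes):].partition("=")
    if not dashes or name not in RENAMED_FLAGS:
        return arg  # not a renamed flag: leave untouched
    return f"{dashes}{RENAMED_FLAGS[name]}{sep}{value}"


def has_timezone(timestamp: str) -> bool:
    """True if the timestamp parses and carries a UTC offset or a 'Z' suffix."""
    # datetime.fromisoformat accepts a trailing 'Z' only on Python 3.11+, so normalize it.
    normalized = timestamp[:-1] + "+00:00" if timestamp.endswith("Z") else timestamp
    try:
        return datetime.fromisoformat(normalized).tzinfo is not None
    except ValueError:
        return False


old_args = ["-target=all", "-compaction.compaction.block-retention=336h"]
new_args = [rewrite_flag(a) for a in old_args]
```

Running `has_timezone` over the start/end arguments of existing scripts flags values such as 2024-01-01T00:00:00, which the query search command no longer accepts.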

Changes

  • Stop publishing 32-bit ARM binary archives. Release artifacts continue to include amd64 and arm64 binaries by @javiermolinar in #7106
  • Upgrade Tempo to Go 1.26.2 by @stoewer in #6443
  • Allow duplicate dimensions for span metrics and service graphs. This is valid when different instrumentation libraries are in use, for example when some spans carry deployment.environment and others deployment_environment, by @carles-grafana in #6288
  • Update default max duration for TraceQL metrics queries up to one day by @javiermolinar in #6285
  • Set TraceQL query metrics checks by default in Vulture by @javiermolinar in #6275
  • Make Tempo single-binary example use the local backend by @javiermolinar in #7033
  • Bump ingestion limits by @javiermolinar in #7034
  • Bumped release examples and generated operations config to grafana/tempo:3.0.0 by @javiermolinar in #7291
  • TraceQL metrics: change default step intervals to align with new vParquet5 timestamp columns by @mdisibio in #6413
  • Remove all traces of ingesters from the dashboards by @javiermolinar in #6352
  • jsonnet: Add emptyDir data volume to block-builder StatefulSet by @mapno in #6648
  • Add quick checks to tempo mixin runbook by @javiermolinar in #6696
  • Deprecate metrics-generator no-local-blocks by @javiermolinar in #6707
  • Own local block and partition ring helpers by @javiermolinar in #6808
  • Track invalid trace and span id discards by @javiermolinar in #6799
  • Deprecate query_frontend.rf1_after and query all blocks regardless of replication factor for non-metrics paths. Simplifies 2.x to 3.0 migration by @mapno in #6969
  • Flush blocks to backend storage from the Live store in single binary mode by @javiermolinar in #6941
  • Remove stale config from the examples by @javiermolinar in #6980
  • tempo-cli: Rewrite migrate overrides-config and add migrate overrides-per-tenant command to help migrate legacy flat overrides to the new scoped format by @electron0zero in #6793
  • Decouple livestore from metrics-generator by @javiermolinar in #6506 and #6535
  • Expose OTLP HTTP and gRPC ports for Docker examples by @javiermolinar in #6296

Security fixes

  • Updated golang.org/x/net to v0.55.0 by @renovate-sh-app[bot] in #7267
  • Updated golang.org/x/crypto to v0.52.0 by @renovate-sh-app[bot] in #7266
  • Updated github.com/apache/thrift to v0.23.0 by @renovate-sh-app[bot] in #7265
  • Fixed division by zero error in TraceQL expressions by @Proximyst in #6580
  • Fixed intPow hanging for certain inputs by @Proximyst in #6581
  • Fixed integer overflow in query parameters by using strconv.ParseUint instead of strconv.Atoi/strconv.ParseInt for unsigned integer fields by @ricardbejarano in #6612
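
The integer overflow fix above replaces signed parsing with strconv.ParseUint for unsigned fields. The sketch below mirrors the two properties that Go function enforces, no sign prefix and a range check against the bit size, in Python for illustration. The function name and the default of 32 bits are assumptions; the notes do not say which bit sizes Tempo uses.

```python
def parse_uint(text: str, bits: int = 32) -> int:
    """Parse a base-10 unsigned integer, rejecting signs and out-of-range values."""
    # isdigit() alone would accept non-ASCII digits, so require ASCII as well.
    if not text.isascii() or not text.isdigit():
        raise ValueError(f"not an unsigned integer: {text!r}")
    value = int(text)
    if value >= 1 << bits:
        raise ValueError(f"value out of range for uint{bits}: {text!r}")
    return value


limit = parse_uint("4294967295")  # largest value that fits in 32 bits
```

A request parameter such as -1 or 4294967296 raises ValueError here instead of wrapping around or being silently accepted as a signed value.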

Bugfixes

  • Fixed live-store panic in legacy tag-based search against vParquet5 blocks when matching attributes stored in integer dedicated columns by @mdisibio in #7135
  • Fixed livestore panic handling in iterateBlocks per-block paths so malformed parquet values return an error instead of crashing the process by @zhxiaogg in #7134
  • Fixed tempo-vulture ignoring -tempo-push-tls flag in normal operating mode by @zachfi in #6976
  • livestore: check readiness before lag for SearchRecent and QueryRange queries by @zhxiaogg in #6911
  • Fix live-store SearchTagValuesV2 disk cache never being populated on complete blocks by @mapno in #6858
  • Fix dedicated columns fallback in block_builder and live_store to use storage.trace.block.parquet_dedicated_columns when not set via overrides by @stoewer in #6647
  • Force live-store to rehydrate from Kafka lookback period when local data is missing, such as PVC wipe or new node, instead of resuming from the committed consumer group offset by @oleg-kozlyuk-grafana in #6428
  • Fix reload of span_name_sanitization overrides during runtime by @electron0zero in #6435
  • live store honors the config options for block and WAL versions by @mdisibio in #6509
  • block builder honors the global storage block config for block and WAL versions by @Harry-kp in #6532
  • Normalize allowlist headers when building the allowlist map by @javiermolinar in #6481
  • Fix bug related to dedicated column filtering by @stoewer in #6586
  • Fix compactor deduped spans metric using the wrong type, gauge instead of counter, by @bejaratommy in #6576
  • metrics-generator: Fix active-series counter underflow in local series limiter when overflow series are deleted by @carles-grafana in #6568
  • Skip per-label limiter and sanitizer for target_info and host_info metrics in metrics-generator by @electron0zero in #6660
  • Fix incorrect search results for some queries on new blob columns by @mdisibio in #6815
  • Fix vParquet5 buffer-reuse bug where event attributes in dedicated columns could be persisted on additional spans and events by @mdisibio in #6914
  • Fix race condition where remove_owner_on_shutdown flag was set too late, causing the partition owner to remain in the ring by @oleg-kozlyuk-grafana in #6693
  • Return 400 instead of 500 when query_range or query_instant requests have unparseable start/end parameters by @ruslan-mikhailov in #6694
  • Correct block-builder fetch metrics to use counters instead of gauges by @WinterCabbage in #6578
  • Log tenant on receiver push errors by @javiermolinar in #6780
  • Fix race conditions in WAL block by @ruslan-mikhailov in #6773
  • metrics-generator: Fix target_info being skipped when resource attributes have empty values by @carles-grafana in #6774
  • metrics-generator: Drain old series on metric replacement to prevent limiter leak and permanent overflow by @carles-grafana in #6653
  • live-store: fixed unsuccessful deregistering from membership/partition rings during shutdown by @zhxiaogg in #6848
  • Respect context cancellation when reading WAL block iterator by @zhxiaogg in #6928
  • Complete lifecycler shutdown on errors by @javiermolinar in #6906
  • livestore: fix concurrent WAL writes from periodic and shutdown flushes by @zhxiaogg in #6972
  • live-store: fix race conditions for tag values endpoint by @ruslan-mikhailov in #7000
  • live-store: correct backoff duration calculation by @ruslan-mikhailov in #6999
  • vulture: fix for recent traces when query_end_cutoff is enabled by @ruslan-mikhailov in #7018
  • Fix live-store producing WAL blocks exceeding max_block_bytes when flushing large batches of idle traces by @ruslan-mikhailov in #6971
  • live-store: skip lookback replay when partition is Inactive during scaling down by @zhxiaogg in #7101

Full Changelog: v2.10.5...v3.0.0