Tempo 3.0 is a major release that completes the transition to the new ingest/write architecture, removes deprecated 2.x components, graduates TraceQL metrics to general availability, and adds migration tooling for Tempo 2.x users.
This release contains breaking configuration and deployment changes. Review the migration guide before upgrading, especially if you use legacy ingesters, legacy overrides, v2 blocks, OpenCensus, localblocks, or custom live-store/query-frontend settings.
Highlights
- New ingest/write architecture replaces legacy ingesters.
- TraceQL metrics is generally available; alerting on TraceQL metrics and the faster read path remain experimental.
- vParquet5 improvements and faster metrics query paths.
- Expanded metrics-generator controls for cardinality management.
- Trace redaction support.
- Span profiling support via otelpyroscope.
- Live-store and block-builder correctness and observability fixes.
Useful links:
- Release notes: https://grafana.com/docs/tempo/latest/release-notes/v3-0/
- Migration guide: https://grafana.com/docs/tempo/latest/set-up-for-tracing/setup-tempo/migrate-to-3/
- Upgrade guide: https://grafana.com/docs/tempo/latest/set-up-for-tracing/setup-tempo/upgrade/
- Changelog: https://github.com/grafana/tempo/blob/v3.0.0/CHANGELOG.md#v300
Features
- Make individual AST transformations skippable via config and query hints by @stoewer in #7012
- Add span profiling support via otelpyroscope. Enable with `span_profiling: true` or `-span-profiling` CLI flag to attach pprof labels to OTel spans by @simonswine in #7063
- Add `tempo-cli migrate config` command for migrating Tempo 2.x configs to 3.0 by @mapno in #6982
- jsonnet: Add KEDA-based horizontal pod autoscaling support for microservices deployment by @mapno in #6970
- Add automemlimit support for automatic GOMEMLIMIT configuration. Enable with `memory.automemlimit_enabled: true` by @oleg-kozlyuk-grafana in #6313
- Support comparison operators in TraceQL Metrics queries by @ruslan-mikhailov in #6474
- metrics-generator: Add span filtering to service graphs through `filter_policies` by @javiermolinar in #6453
- Add new `include_any` filter policy for spanmetrics filter by @javiermolinar in #6392
- Add `span_multiplier_key` to overrides. This allows tenants to specify the attribute key used for span multiplier values to compensate for head-based sampling by @carles-grafana in #6260
- metrics-generator: Add per-label limiter to control cardinality by @electron0zero in #6414
  - Adds `max_cardinality_per_label` per tenant override and new metrics to estimate per-label cardinality demand.
- Add an extension mechanism for per-tenant overrides by @stoewer in #6758
- Extend `TraceRedactor` interface to support hiding complete traces via `ErrTraceHidden` by @stoewer in #6811
- Single-binary mode: push distributor local ingest directly to live-store and metrics-generator without Kafka by @javiermolinar in #6729
- Add experimental drain limiter / span name sanitization to the metrics generator to reduce metrics cardinality by clustering similar span names by @Logiraptor in #6098
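To illustrate what the `span_multiplier_key` override is for, here is a minimal sketch of how a per-span multiplier can compensate for head-based sampling when counting spans. It is not Tempo's implementation: the function name, the attribute key `sampling.multiplier`, and the fallback to `1` for missing or invalid values are all assumptions made for this example.

```python
import math


def weighted_span_count(spans, multiplier_key):
    """Estimate the pre-sampling span count from sampled spans.

    `spans` is a list of attribute dicts. Each span counts as the value of
    its `multiplier_key` attribute. Falling back to 1 for missing or
    invalid values is an assumption of this sketch, not documented behavior.
    """
    total = 0.0
    for attrs in spans:
        raw = attrs.get(multiplier_key, 1)
        try:
            multiplier = float(raw)
        except (TypeError, ValueError):
            multiplier = 1.0
        if not (math.isfinite(multiplier) and multiplier > 0):
            multiplier = 1.0
        total += multiplier
    return total


# Two spans kept by a 10% head sampler plus one span without the attribute.
spans = [
    {"sampling.multiplier": 10},  # hypothetical attribute key
    {"sampling.multiplier": 10},
    {},
]
estimated = weighted_span_count(spans, "sampling.multiplier")
```

With a 10% sampler, each kept span stands in for roughly ten original spans, which is why the multiplier is applied per span rather than once to the final total.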
Enhancements
- Query-frontend: split streamed search and metrics responses into smaller gRPC packets for better default client compatibility, and fix final streaming updates after metrics series limits are reached by @mdisibio in #6607 / #7239
- live-store: lock-free block reads via `atomic.Pointer[blockSnapshot]`; block deletion is two-phase and crash-safe by @zhxiaogg in #7132
- Support OR conditions for tag name and tag value autocomplete (search tags v2) by @ie-pham in #6827
- Expose MinIO retry settings via S3 config by @rwhitty in #6561
- Reduce default livestore WAL size and align query defaults: `max_block_duration` 1m to 30s, `max_block_bytes` 100MiB to 50MiB, `complete_block_timeout` 1h to 20m, metrics `query_backend_after` 30m to 15m by @zhxiaogg in #6974
- Enable native histogram emission for all promauto-registered histograms, including `tempo_request_duration_seconds`. Both classic and native formats are emitted simultaneously; existing scrapers are unaffected by @zalegrala in #6910
- tempo-cli: Add `--header` flag to query api commands for custom headers by @Nouuu in #6768
- tempo-cli: add `redact` command to submit trace redaction jobs to the backend scheduler by @zalegrala in #6832
- Block builder: deduplicate spans within traces during block creation and track removed duplicates via `tempo_block_builder_spans_deduped_total` metric by @zhxiaogg in #6539
- metrics-generator: Support extracting span multiplier from W3C tracestate OTel probability sampling threshold via `enable_tracestate_span_multiplier` config option by @csmarchbanks in #6684
- Add new alerts and runbook entries by @javiermolinar in #6276
- Double the maximum number of dedicated string columns in vParquet5 and update tempo-cli to determine the optimum number for the data by @mdisibio in #6282
- TraceQL metrics: experimental faster read path for most metrics queries, accessible behind the query hint `spanonly_fetch=true` when `unsafe_query_hints` is enabled by @mdisibio in #6359
- TraceQL metrics: add new per-tenant override to opt in or opt out of the new experimental faster read path for most metrics queries by @mdisibio in #6849
- Vulture: extend data consistency checks to include more strings, integers, and blobs, at resource/span/event scopes, and perform deeper trace content checks by @mdisibio in #6731
- Improve attribute truncating observability by @javiermolinar in #6400
- Log truncated oversized attributes by @carles-grafana in #6467
- livestore: make `trace_too_large` log line an insight by @carles-grafana in #6371
- Remove live-store partition owner from ring on shutdown to prevent stale owner entries by @oleg-kozlyuk-grafana in #6409
- Improved live store readiness check and added `readiness_target_lag` and `readiness_max_wait` config parameters. Live store will now, if `readiness_target_lag` is set, not report `/ready` until Kafka lag is brought under the specified value by @oleg-kozlyuk-grafana and @ruslan-mikhailov in #6238 and #6405
- Expose a new histogram metric to track the jobs per query distribution by @javiermolinar in #6343
- Do deep validation for filter policies in user configurable overrides API by @electron0zero in #6407
- Allow `span_name_sanitization` to be set via user-configurable overrides API by @Logiraptor in #6411
- Add `fail_on_high_lag` parameter to allow live-store to fail if it is lagged by @ruslan-mikhailov and @carles-grafana in #6363, #6567, and #7066
- Add support for per-tenant left-padding of trace IDs by @mapno in #6489
- Add new metric for generator ring size: `tempo_distributor_metrics_generator_tenant_ring_size` by @zalegrala in #5686
- Remove explicit `runtime.GC()` calls in vParquet5 compactor/block creation and CLI by @oleg-kozlyuk-grafana in #6603
- Reduce allocations in `extendReuseSlice` growth path during WAL writes and block creation by @mapno in #6863
- Implemented anti-affinity for pods in same livestore zone by @zhxiaogg in #6757
- Livestore: skipped WAL complete op during shutdown by @zhxiaogg in #6839
- Add metric to track livestore block cut reasons by @zhxiaogg in #6922
- Enable async parquet read mode for WAL completion path by @zhxiaogg in #6967
- metrics-generator: add `leave_consumer_group_on_shutdown` to send LeaveGroup on shutdown for immediate partition reassignment instead of waiting for session timeout by @zalegrala in #6575
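For readers unfamiliar with the tracestate-based span multiplier mentioned above (`enable_tracestate_span_multiplier`), the following sketch shows how a multiplier can be derived from the OpenTelemetry probability sampling threshold (`th`) carried in the `ot` tracestate entry. It follows the OpenTelemetry convention, where the threshold is 1 to 14 hex digits, right-padded with zeros to a 56-bit rejection threshold. It is not taken from Tempo's code: the function names are hypothetical, and accepting uppercase hex digits is a leniency of this example.

```python
import string

MAX_THRESHOLD = 1 << 56  # thresholds are 56-bit values (14 hex digits)


def threshold_to_multiplier(raw):
    """Convert a `th` value such as "c" into an adjusted count such as 4.0."""
    if not 1 <= len(raw) <= 14 or any(c not in string.hexdigits for c in raw):
        return None
    # Trailing zeros are omitted on the wire, so pad on the right.
    threshold = int(raw.ljust(14, "0"), 16)
    # Sampling probability is (2^56 - T) / 2^56; the multiplier is its inverse.
    return MAX_THRESHOLD / (MAX_THRESHOLD - threshold)


def multiplier_from_tracestate(tracestate):
    """Return the multiplier encoded in `ot=th:...`, or None if absent or invalid."""
    for member in tracestate.split(","):
        key, sep, value = member.strip().partition("=")
        if key != "ot" or not sep:
            continue
        for field in value.split(";"):
            name, _, raw = field.partition(":")
            if name == "th":
                return threshold_to_multiplier(raw)
    return None


half = multiplier_from_tracestate("ot=th:8")                    # 50% sampling
quarter = multiplier_from_tracestate("vendor=x,ot=th:c;rv:ab")  # 25% sampling
```

A threshold of `8` rejects half of the trace ID space, so each kept span represents two; `c` rejects three quarters, so each kept span represents four.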
Breaking Changes
Tempo 3.0 includes several operator-impacting removals and default changes. Review the migration guide before upgrading.
- The legacy ingester-based write path has been removed. Deployments must migrate to the 3.0 ingest/write architecture.
- Removed ingesters and the ingester module by @javiermolinar in #6504 / #6959
- Removed remaining app ingester config and `ingest.enabled` by @javiermolinar in #6667 / #6873
- Removed partition ring livestore config by @javiermolinar in #6981
- Removed ingester and compactor alerts by @javiermolinar in #6369
- Removed v2 block encoding and the legacy v2 compactor component by @joe-elliott in #6273
  - Removed v2-specific CLI commands: `list block`, `list index`, `view index`, `gen index`, and `gen bloom`.
- Centralized block and WAL config: `block_builder` and `live_store` now always use `storage.trace.block` settings; per-module block config fields are removed by @stoewer in #6647
- Recent data queries now guarantee complete results by failing when an instance is lagging. Defaults `query_frontend.query_end_cutoff` to `30s` and `live_store.fail_on_high_lag` to `true` by @mapno in #7210 / #7232
- Disabled legacy (flat, unscoped) overrides by default. Tempo refuses to start if legacy overrides are detected. Set `enable_legacy_overrides: true` or `-config.enable-legacy-overrides=true` to opt back in temporarily. Legacy overrides will be removed in a future release by @electron0zero in #6741
- User-configurable overrides config `metrics_generator.processors` no longer merges with runtime overrides. `metrics_generator.processors` now takes precedence over runtime overrides, matching every other config in user-configurable overrides. Setting `processors: []` disables all processors for the tenant by @electron0zero in #7176 / #7185
- Enabled RetryInfo by default. `distributor.retry_after_on_resource_exhausted` now defaults to `5s` (was `0`) so OTLP clients receive a retry hint on ResourceExhausted errors by @electron0zero in #7088
  - Set to `0` to disable cluster-wide, or set the per-tenant override `ingestion.retry_info_enabled: false` to disable for a single tenant.
- Removed duplicate `compaction` prefix from CompactorConfig CLI flags by @electron0zero in #6909
  - `compaction.compaction.block-retention` → `compaction.block-retention`
  - `compaction.compaction.max-objects-per-block` → `compaction.max-objects-per-block`
  - `compaction.compaction.max-block-bytes` → `compaction.max-block-bytes`
  - `compaction.compaction.compaction-window` → `compaction.compaction-window`
- Removed OpenCensus receiver by @javiermolinar in #6523
- Removed legacy `mem-ballast-size-mbs` CLI flag by @orkhan-huseyn in #6403
- tempo-cli: Support relative time (`now`, `now-1h`) for start/end args and standardize on RFC3339 in all commands by @electron0zero in #6458
  - `query search` command no longer accepts timestamps without timezone, for example `2024-01-01T00:00:00`; use RFC3339, for example `2024-01-01T00:00:00Z`, or relative time instead.
- Consolidated read configuration for recent data cutoff. `query_frontend.search.query_ingesters_until` is removed in favor of only `query_frontend.search.query_backend_after` by @mapno in #6507
- Removed deprecated `querier.query_live_store` config. This field must be removed from configs on upgrade by @javiermolinar in #7048
- Optimized TraceQL AST by rewriting conditions on the same attribute to their array equivalent by @stoewer in #6353
  - Slightly changes the array matching semantics of `!=` and `!~` operators and introduces stricter rules for regex literals.
- Removed span-metrics leftovers and lazy-init generator clients by @javiermolinar in #6618
- Decommissioned livestore MetricsGenerator query service by @javiermolinar in #6615
- Removed metrics-generator localblocks processor and related local block storage plumbing by @javiermolinar in #6555
- `SpanMetricsSummary` is removed and querier code simplified by @javiermolinar in #6496 and #6510
- Set the `all` target to be 3.0-compatible and removed the `scalable-single-binary` target by @joe-elliott in #6283
- Cleaned up enterprise jsonnet by @javiermolinar in #6505
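The CompactorConfig flag renames listed above are mechanical, so deployment arguments can be rewritten with a small script. The sketch below is one possible helper, not part of Tempo or `tempo-cli`: the function name `rewrite_args` is hypothetical, and the mapping contains only the four flags named in these notes.

```python
import re

# Old flag name -> new flag name, copied from the breaking-changes list.
RENAMED_FLAGS = {
    "compaction.compaction.block-retention": "compaction.block-retention",
    "compaction.compaction.max-objects-per-block": "compaction.max-objects-per-block",
    "compaction.compaction.max-block-bytes": "compaction.max-block-bytes",
    "compaction.compaction.compaction-window": "compaction.compaction-window",
}

# Matches "-name", "--name", "-name=value", and "--name=value".
_FLAG = re.compile(r"^(--?)([^=]+)(=.*)?$", re.DOTALL)


def rewrite_args(args):
    """Return a copy of `args` with the renamed compaction flags updated."""
    rewritten = []
    for arg in args:
        match = _FLAG.match(arg)
        if match and match.group(2) in RENAMED_FLAGS:
            dashes, name, value = match.group(1), match.group(2), match.group(3) or ""
            rewritten.append(f"{dashes}{RENAMED_FLAGS[name]}{value}")
        else:
            # Unknown flags and positional values pass through unchanged.
            rewritten.append(arg)
    return rewritten


old_args = ["-target=all", "-compaction.compaction.block-retention=336h"]
new_args = rewrite_args(old_args)
```

Only exact flag names are rewritten, so flags that merely share the `compaction.` prefix are left alone.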
Changes
- Stop publishing 32-bit ARM binary archives. Release artifacts continue to include amd64 and arm64 binaries by @javiermolinar in #7106
- Upgrade Tempo to Go 1.26.2 by @stoewer in #6443
- Allow duplicate dimensions for span metrics and service graphs. This is a valid use case if using different instrumentation libraries, with spans having `deployment.environment` and others `deployment_environment`, for example by @carles-grafana in #6288
- Update default max duration for TraceQL metrics queries up to one day by @javiermolinar in #6285
- Set TraceQL query metrics checks by default in Vulture by @javiermolinar in #6275
- Make Tempo single-binary example use the local backend by @javiermolinar in #7033
- Bump ingestion limits by @javiermolinar in #7034
- Bumped release examples and generated operations config to `grafana/tempo:3.0.0` by @javiermolinar in #7291
- TraceQL metrics: change default step intervals to align with new vParquet5 timestamp columns by @mdisibio in #6413
- Remove all traces of ingesters from the dashboards by @javiermolinar in #6352
- jsonnet: Add emptyDir data volume to block-builder StatefulSet by @mapno in #6648
- Add quick checks to tempo mixin runbook by @javiermolinar in #6696
- Deprecate metrics-generator no-local-blocks by @javiermolinar in #6707
- Own local block and partition ring helpers by @javiermolinar in #6808
- Track invalid trace and span id discards by @javiermolinar in #6799
- Deprecate `query_frontend.rf1_after` and query all blocks regardless of replication factor for non-metrics paths. Simplifies 2.x to 3.0 migration by @mapno in #6969
- Flush blocks to backend storage from the Live store in single binary mode by @javiermolinar in #6941
- Remove stale config from the examples by @javiermolinar in #6980
- tempo-cli: Rewrite `migrate overrides-config` and add `migrate overrides-per-tenant` command to help migrate legacy flat overrides to the new scoped format by @electron0zero in #6793
- Decouple livestore from metrics-generator by @javiermolinar in #6506 and #6535
- Expose OTLP HTTP and gRPC ports for Docker examples by @javiermolinar in #6296
Security fixes
- Updated `golang.org/x/net` to v0.55.0 by @renovate-sh-app[bot] in #7267
- Updated `golang.org/x/crypto` to v0.52.0 by @renovate-sh-app[bot] in #7266
- Updated `github.com/apache/thrift` to v0.23.0 by @renovate-sh-app[bot] in #7265
- Fixed division by zero error in TraceQL expressions by @Proximyst in #6580
- Fixed `intPow` hanging for certain inputs by @Proximyst in #6581
- Fixed integer overflow in query parameters by using `strconv.ParseUint` instead of `strconv.Atoi`/`strconv.ParseInt` for unsigned integer fields by @ricardbejarano in #6612
Bugfixes
- Fixed live-store panic in legacy tag-based search against vParquet5 blocks when matching attributes stored in integer dedicated columns by @mdisibio in #7135
- Fixed livestore panic handling in `iterateBlocks` per-block paths so malformed parquet values return an error instead of crashing the process by @zhxiaogg in #7134
- Fixed tempo-vulture ignoring `-tempo-push-tls` flag in normal operating mode by @zachfi in #6976
- livestore: check readiness before lag for SearchRecent and QueryRange queries by @zhxiaogg in #6911
- Fix live-store SearchTagValuesV2 disk cache never being populated on complete blocks by @mapno in #6858
- Fix dedicated columns fallback in `block_builder` and `live_store` to use `storage.trace.block.parquet_dedicated_columns` when not set via overrides by @stoewer in #6647
- Force live-store to rehydrate from Kafka lookback period when local data is missing, such as PVC wipe or new node, instead of resuming from the committed consumer group offset by @oleg-kozlyuk-grafana in #6428
- Fix reload of `span_name_sanitization` overrides during runtime by @electron0zero in #6435
- live store honors the config options for block and WAL versions by @mdisibio in #6509
- block builder honors the global storage block config for block and WAL versions by @Harry-kp in #6532
- Normalize allowlist headers when building the allowlist map by @javiermolinar in #6481
- Fix bug related to dedicated column filtering by @stoewer in #6586
- Fix compactor deduped spans metric using the wrong type, gauge instead of counter, by @bejaratommy in #6576
- metrics-generator: Fix active-series counter underflow in local series limiter when overflow series are deleted by @carles-grafana in #6568
- Skip per-label limiter and sanitizer for target_info and host_info metrics in metrics-generator by @electron0zero in #6660
- Fix incorrect search results for some queries on new blob columns by @mdisibio in #6815
- Fix vParquet5 buffer-reuse bug where event attributes in dedicated columns could be persisted on additional spans and events by @mdisibio in #6914
- Fix race condition where `remove_owner_on_shutdown` flag was set too late, causing the partition owner to remain in the ring by @oleg-kozlyuk-grafana in #6693
- Return 400 instead of 500 when query_range or query_instant requests have unparseable start/end parameters by @ruslan-mikhailov in #6694
- Correct block-builder fetch metrics to use counters instead of gauges by @WinterCabbage in #6578
- Log tenant on receiver push errors by @javiermolinar in #6780
- Fix race conditions in WAL block by @ruslan-mikhailov in #6773
- metrics-generator: Fix target_info being skipped when resource attributes have empty values by @carles-grafana in #6774
- metrics-generator: Drain old series on metric replacement to prevent limiter leak and permanent overflow by @carles-grafana in #6653
- live-store: fixed unsuccessful deregistering from membership/partition rings during shutdown by @zhxiaogg in #6848
- Respect context cancellation when reading WAL block iterator by @zhxiaogg in #6928
- Complete lifecycler shutdown on errors by @javiermolinar in #6906
- livestore: fix concurrent WAL writes from periodic and shutdown flushes by @zhxiaogg in #6972
- live-store: fix race conditions for tag values endpoint by @ruslan-mikhailov in #7000
- live-store: correct backoff duration calculation by @ruslan-mikhailov in #6999
- vulture: fix for recent traces when `query_end_cutoff` is enabled by @ruslan-mikhailov in #7018
- Fix live-store producing WAL blocks exceeding `max_block_bytes` when flushing large batches of idle traces by @ruslan-mikhailov in #6971
- live-store: skip lookback replay when partition is Inactive during scaling down by @zhxiaogg in #7101
New Contributors
Thanks to the following first-time contributors:
- @evan361425 made their first contribution in #5968
- @mihaelmiklec made their first contribution in #6442
- @Harry-kp made their first contribution in #6532
- @bejaratommy made their first contribution in #6576
- @jasuade made their first contribution in #6610
- @antonio-mazzini made their first contribution in #6609
- @orkhan-huseyn made their first contribution in #6403
- @ricardbejarano made their first contribution in #6612
- @rwhitty made their first contribution in #6561
- @WinterCabbage made their first contribution in #6578
- @csmarchbanks made their first contribution in #6684
- @gounthar made their first contribution in #6756
- @Nouuu made their first contribution in #6768
- @EoinTrial made their first contribution in #6905
- @sethmccombs made their first contribution in #7108
Full Changelog: v2.10.5...v3.0.0