Apache Druid 36.0.0 contains over 189 new features, bug fixes, performance enhancements, documentation improvements, and additional test coverage from 34 contributors.
See the complete set of changes for additional details, including bug fixes.
Review the upgrade notes before you upgrade to Druid 36.0.0.
If you are upgrading across multiple versions, see the Upgrade notes page, which lists upgrade notes for the most recent Druid versions.
Important features, changes, and deprecations
This section contains important information about new and existing features.
Functional area and related changes
This section contains detailed release notes separated by areas.
Druid operator
Druid Operator is a Kubernetes controller that manages the lifecycle of your Druid clusters. The operator simplifies the management of Druid clusters with its custom logic that is configurable through Kubernetes CRDs.
Cost-based autoscaling for streaming ingestion
Druid now supports cost-based autoscaling for streaming ingestion, which optimizes the task count by balancing lag reduction against resource efficiency. This autoscaling strategy uses the following formula:
totalCost = lagWeight × lagRecoveryTime + idleWeight × idlenessCost
which accounts for the time to clear the backlog and compute time:
lagRecoveryTime = aggregateLag / (taskCount × avgProcessingRate) — time to clear backlog
idlenessCost = taskCount × taskDuration × predictedIdleRatio — wasted compute time
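The formula above can be sketched in a few lines of Python. This is an illustration only, not the autoscaler's implementation: the class and function names are hypothetical, the weights and the per-candidate predictedIdleRatio values are supplied by the caller because these notes don't say how Druid derives them, and the units (records, seconds) are assumptions.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class CostInputs:
    """Inputs shared by every candidate task count (names are hypothetical)."""
    aggregate_lag: float        # total backlog, assumed to be in records
    avg_processing_rate: float  # records per second handled by one task (assumed unit)
    task_duration: float        # task duration, assumed to be in seconds
    lag_weight: float           # weight applied to lagRecoveryTime
    idle_weight: float          # weight applied to idlenessCost


def total_cost(task_count: int, predicted_idle_ratio: float, inputs: CostInputs) -> float:
    """Evaluate totalCost for a single candidate task count."""
    # lagRecoveryTime = aggregateLag / (taskCount x avgProcessingRate)
    lag_recovery_time = inputs.aggregate_lag / (task_count * inputs.avg_processing_rate)
    # idlenessCost = taskCount x taskDuration x predictedIdleRatio
    idleness_cost = task_count * inputs.task_duration * predicted_idle_ratio
    return inputs.lag_weight * lag_recovery_time + inputs.idle_weight * idleness_cost


def cheapest_task_count(idle_ratio_by_count: Mapping[int, float], inputs: CostInputs) -> int:
    """Return the candidate task count with the lowest total cost."""
    return min(
        idle_ratio_by_count,
        key=lambda count: total_cost(count, idle_ratio_by_count[count], inputs),
    )
```

Adding tasks shrinks the lag term but grows the idleness term, so the cheapest candidate depends on how the two weights are set relative to each other.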
Kubernetes client mode (experimental)
The kubernetes-overlord-extensions extension now includes an experimental Kubernetes client mode. The new mode uses fabric8 SharedInformers to cache Kubernetes metadata, which greatly reduces API traffic between the Overlord and the Kubernetes control plane. You can try out this feature using the following config:
druid.indexer.runner.useK8sSharedInformers=true
cgroup v2 support
cgroup v2 is now supported, and all cgroup metrics now emit cgroupversion to identify which version is being used.
The following monitors automatically switch to v2 if v2 is detected: CgroupCpuMonitor, CgroupCpuSetMonitor, CgroupDiskMonitor, and MemoryMonitor. CpuAcctDeltaMonitor fails gracefully if v2 is detected.
Additionally, CgroupV2CpuMonitor now also emits cgroup/cpu/shares and cgroup/cpu/cores_quota.
Query reports for Dart
Dart now supports query reports for running and recently completed queries. The reports can be fetched from the /druid/v2/sql/queries/<sqlQueryId>/reports endpoint.
The format of the response is a JSON object with two keys, "query" and "report". The "query" key is the same info that is available from the existing /druid/v2/sql/queries endpoint. The "report" key is a report map including an MSQ report.
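As a rough sketch of how a client might work with this endpoint, the Python helpers below build the reports URL and split a response body into its two documented keys. The helper names and the base URL in the usage example are assumptions, and the fields inside the sample body are placeholders rather than the real report schema.

```python
import json
from typing import Any, Dict, Tuple
from urllib.parse import quote


def dart_report_url(base_url: str, sql_query_id: str) -> str:
    """Build the reports URL for one Dart query (base_url is an assumption)."""
    encoded_id = quote(sql_query_id, safe="")
    return f"{base_url.rstrip('/')}/druid/v2/sql/queries/{encoded_id}/reports"


def split_report_response(body: str) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return the "query" and "report" objects from a response body."""
    payload = json.loads(body)
    return payload["query"], payload["report"]


if __name__ == "__main__":
    print(dart_report_url("http://localhost:8888", "abc-123"))
    # Placeholder body: only the two top-level keys come from the text above.
    query, report = split_report_response('{"query": {}, "report": {}}')
    print(query, report)
```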
You can control the retention behavior for reports using the following configs:
- druid.msq.dart.controller.maxRetainedReportCount: Max number of reports that are retained. The default is 0, meaning no reports are retained.
- druid.msq.dart.controller.maxRetainedReportDuration: How long reports are retained, in ISO 8601 duration format. The default is PT0S, meaning time-based expiration is turned off.
New segment format
The new version 10 segment format improves upon version 9. It is off by default and not compatible with older segment format versions.
Set druid.indexer.task.buildV10=true to make segments in the new format.
If you downgrade, you must reindex your data with a supported segment format version.
You can use the bin/dump-segment tool to view segment metadata. The tool outputs serialized JSON.
Web console
New info available in the web console
The web console now includes information about the number of available processors and the total memory (in binary bytes).
This information is also available through the sys.servers table.
Other web console improvements
- Added tracking for inactive workers for MSQ execution stages #18768
- Added a refresh button for JSON views and stage viewers #18768
- You can now define ARRAY type parameters in the query view #18586
- Changed system table queries to now automatically use the native engine #18857
- Improved time charts to support multiple measures #18701
Ingestion
- Added support for AWS InternalError code retries #18720
- Improved ingestion to be more resilient. Ingestion tasks no longer fail if the task log upload fails with an exception #18748
- Improved how Druid handles situations where data doesn't match the expected type #18878
- Improved JSON ingestion so that Druid can compute JSON values directly from dictionary or index structures, allowing ingestion to skip persisting raw JSON data entirely. This reduces on-disk storage size #18589
- You can now choose between full dictionary-based indexing and nulls-only indexing for long/double fields in a nested column #18722
SQL-based ingestion
Additional ingestion configurations
You can now use the following configs to control how your data gets ingested and stored:
- maxInputFilesPerWorker: Controls the maximum number of input files or segments per worker.
- maxPartitions: Controls the maximum number of output partitions for any single stage, which affects how many segments are generated during ingestion.
Other SQL-based ingestion improvements
- Added maxRowsInMemory to replace rowsInMemory. rowsInMemory now functions as an alternate way to provide that config and is ignored if maxRowsInMemory is specified. Previously, only rowsInMemory existed #18832
Streaming ingestion
Record offset and partition
You can now ingest the record offset (offsetColumnName) and partition (partitionColumnName) using the KafkaInputFormat. Their default names are kafka.offset and kafka.partition, respectively.
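For illustration, the Python sketch below assembles an inputFormat fragment that sets both column names and serializes it to JSON. The helper name is hypothetical, the JSON value format is an assumption chosen for the example, and a real supervisor spec needs more fields than are shown here.

```python
import json
from typing import Any, Dict


def kafka_input_format(
    offset_column: str = "kafka.offset",
    partition_column: str = "kafka.partition",
) -> Dict[str, Any]:
    """Build a Kafka inputFormat fragment with offset and partition columns."""
    return {
        "type": "kafka",
        # Assumption for this example: record values are JSON.
        "valueFormat": {"type": "json"},
        "offsetColumnName": offset_column,
        "partitionColumnName": partition_column,
    }


if __name__ == "__main__":
    # Rename the columns to avoid dots in column names (a stylistic choice).
    fragment = kafka_input_format("kafka_offset", "kafka_partition")
    print(json.dumps(fragment, indent=2))
```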
Other streaming ingestion improvements
- Improved supervisors so that they can't kill tasks while the supervisor is stopping #18767
- Improved the lag-based autoscaler for streaming ingestion #18745
- Improved the SeekableStream supervisor autoscaler to wait for tasks to complete before attempting subsequent scale operations. This helps prevent duplicate supervisor history entries #18715
Querying
Other querying improvements
- Improved the user experience for invalid regex_exp queries. An error gets returned now #18762
Cluster management
Dynamic capacity for Kubernetes-based deployments
Druid can now dynamically tune the task runner capacity.
Include the capacity field in a POST API call to /druid/indexer/v1/k8s/taskrunner/executionconfig. Setting a value this way overrides druid.indexer.runner.capacity.
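A minimal Python sketch of such a call follows. It only constructs the request and leaves sending it to the caller. The function name and the Overlord address are assumptions, and the body contains only the capacity field, since the text above doesn't say whether other fields are required.

```python
import json
from urllib.request import Request

EXECUTION_CONFIG_PATH = "/druid/indexer/v1/k8s/taskrunner/executionconfig"


def build_capacity_request(overlord_url: str, capacity: int) -> Request:
    """Build a POST request that sets the task runner capacity."""
    # Assumption: a body with only "capacity" is accepted by the endpoint.
    body = json.dumps({"capacity": capacity}).encode("utf-8")
    return Request(
        overlord_url.rstrip("/") + EXECUTION_CONFIG_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_capacity_request("http://localhost:8081", 20)
    print(request.get_method(), request.full_url)
    # To send it: urllib.request.urlopen(request), plus any auth your cluster needs.
```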
Server properties table
The sys.server_properties table exposes the runtime properties configured for each Druid server. Each row represents a single property key-value pair associated with a specific server.
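One way to inspect the table is through the SQL API. The Python sketch below builds a request for the /druid/v2/sql endpoint. The function name and base URL are assumptions, and the query selects every column because the column names aren't listed above.

```python
import json
from urllib.request import Request

SERVER_PROPERTIES_SQL = "SELECT * FROM sys.server_properties"


def build_sql_request(base_url: str, sql: str) -> Request:
    """Build a POST request for the Druid SQL API."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return Request(
        base_url.rstrip("/") + "/druid/v2/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_sql_request("http://localhost:8888", SERVER_PROPERTIES_SQL)
    print(request.full_url, request.data.decode("utf-8"))
```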
Other cluster management improvements
- Added quality of service filtering for the Overlord so that health check threads don't get blocked #18033
Data management
Other data management improvements
- Added the
mostFragmentedFirst compaction policy that prioritizes intervals with the most small uncompacted segments #18802
- Improved how segment files get deleted to prevent partial segment files from remaining in the event of a failure during a delete operation #18696
- Improved compaction so that it identifies multi-value dimensions for dimension schemas that can produce them #18760
- Improved the reliability when pulling segments #18821
Metrics and monitoring
Task metrics
All task metrics now emit the following dimensions: taskId, dataSource, taskType, groupId, and id. Note that id is emitted for backwards compatibility. It will be removed in favor of the taskId dimension in a future release.
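Downstream consumers that still read id can prepare for its removal by preferring taskId and falling back to id only when taskId is missing. The Python sketch below shows that pattern. It assumes metric events arrive as flat dictionaries, and the function name is hypothetical.

```python
from typing import Any, Mapping, Optional


def task_id_of(event: Mapping[str, Any]) -> Optional[str]:
    """Return the task ID from a metric event, preferring taskId over id."""
    task_id = event.get("taskId")
    if task_id is not None:
        return task_id
    # Fallback for events that only carry the legacy id dimension.
    return event.get("id")


if __name__ == "__main__":
    print(task_id_of({"metric": "ingest/merge/time", "taskId": "task_a", "id": "task_a"}))
    print(task_id_of({"metric": "ingest/merge/time", "id": "task_b"}))
```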
Ingestion metrics
The following metrics for streaming and batch tasks now emit the actual values instead of 0: ingest/merge/time, ingest/merge/cpu, and ingest/persists/cpu.
statsd metrics
The following metrics have been added to the default list for statsd:
- task/action/run/time
- task/status/queue/count
- task/status/updated/count
- ingest/handoff/time
Other metrics and monitoring improvements
- Added reason dimension to ingest/events/thrownAway metric. This allows for increased observability on why certain events are being logically excluded from ingest #188855
- Added logging for all handlers for a stage before they start or stop, which can help you understand execution order #18662
- Added new Jetty thread pool metrics to capture request-serving thread statistics: jetty/threadPool/utilized, jetty/threadPool/ready, and jetty/threadPool/utilizationRate #18883
- Added tier and priority dimensions to the segments/max metric #18890
- Added GroupByStatsMonitor, which includes dataSource and taskId dimensions for metrics emitted on peons #18711
- Added task/waiting/time metric, which measures the time it takes for a task to be placed onto the task runner for scheduling and running #18735
- Added the supervisorId to streaming task metrics to help clarify situations where multiple supervisors ingest data into a single datasource #18803
- Added StorageMonitor to druid.monitoring.monitors to measure storage and virtual storage usage by segment cache #18742
- Druid now logs the following:
- Changed Prometheus emitter TTL tracking to consider all label value combinations instead of just the metric name. Labels aren't tracked when the TTL isn't set #18718 #18689
- Changed lifecycle stop() to be logged at the info level to match start() #18640
- Changed the trigger for metrics emission so that metrics get emitted any time a task completes #18766
- Improved the metrics emitter so that it emits metrics for all task completions #18766
Extensions
SpectatorHistogram extension
- Added SPECTATOR_COUNT and SPECTATOR_PERCENTILE SQL functions #18885
- Improved the performance of the SpectatorHistogram extension through vectorization #18813
Upgrade notes and incompatible changes
Upgrade notes
MSQ controller tasks
When upgrading from Druid 30 or earlier, MSQ query_controller tasks can fail during a rolling update because new counters were added that are not backwards compatible with these older versions. To work around this, do one of the following: retry any failed queries after the update completes, set includeAllCounters to false in the query context for any MSQ jobs that need to run during the rolling update, or upgrade to a Druid version from 31 through 35 first before upgrading to Druid 36.
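For the query context option, the Python sketch below builds a request body for the SQL-based ingestion task endpoint (/druid/v2/sql/task) with includeAllCounters turned off. The function name and the placeholder statement are illustrative and not taken from these notes.

```python
import json


def msq_task_payload(sql: str, include_all_counters: bool = False) -> str:
    """Serialize an MSQ task request body with includeAllCounters set."""
    payload = {
        "query": sql,
        "context": {"includeAllCounters": include_all_counters},
    }
    return json.dumps(payload)


if __name__ == "__main__":
    # Placeholder statement; substitute the ingestion query you need to run.
    print(msq_task_payload("REPLACE INTO my_table OVERWRITE ALL SELECT ..."))
```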
Segment format
Druid 36.0.0 supports segment format version 10. Previous versions of Druid don't support version 10. If you downgrade, you must reindex your data with a supported segment format version.
Deprecated metrics
Monitors on peons that previously emitted the id dimension from JettyMonitor, OshiSysMonitor, JvmMonitor, JvmCpuMonitor, JvmThreadsMonitor and SysMonitor to represent the task ID are deprecated and will be removed in a future release. Use the taskId dimension instead.
Removed metrics
The following obsolete metrics have been removed:
- segment/cost/raw #18846
- segment/cost/normalized #18846
- segment/cost/normalization #18846
- task/action/log/time #18649
Developer notes
Segment file interfaces
New SegmentFileBuilder and SegmentFileMapper interfaces have been defined to replace direct usages of FileSmoosher and SmooshedFileMapper to abstract the segment building and reading process.
The main developer-visible changes for extension writers with custom column implementations are the following:
- The writeTo method of the Serializer interface now accepts a SegmentFileBuilder instead of a FileSmoosher.
- The ColumnBuilder method getFileMapper now returns a SegmentFileMapper instead of SmooshedFileMapper.
Extensions which do not provide custom column implementations should not be impacted by these changes.
Other developer improvements
- Added the ability to override the default Kafka image for testing #18739
- Changed fastDecompressor to safeDecompressor #18930
- Extensions can now provide query kit implementations #18875
- Removed version overrides in individual pom files. For a full list, see the pull request #18708
Dependency updates
The following dependencies have had their versions bumped:
- org.apache.logging.log4j:log4j-core from 2.22.1 to 2.25.3 #18874
- org.mozilla:rhino from 1.7.14 to 1.7.14.1 #18868
- net.java.dev.jna and net.java.dev.jna versions from 5.13.0 to 5.18.1 for Oshi monitor #18848
- com.github.oshi:oshi-core from 6.4.4 to 6.9.1 #18839
- bcpkix-jdk18on from 1.78.1 to 1.79 #18834
- org.eclipse.jetty from 12.0.25 to 12.0.30 #18773
- hamcrest from 1.3 to 2.2 #18708
- org.apache.commons:commons-lang3 from 3.18.0 to 3.19.0 #18695
- org.apache.maven.plugins:maven-shade-plugin from 3.5.0 to 3.6.1
- com.netflix.spectator from 1.7.0 to 1.9.0 #18887
- org.bouncycastle:bcpkix-jdk18on from 1.79 to 1.81 to resolve SONATYPE-2025-001911
Credits
@317brian
@abhishekrb19
@AdheipSingh
@aho135
@Akshat-Jain
@amaechler
@ashwintumma23
@capistrant
@cecemei
@clintropolis
@cryptoe
@dependabot[bot]
@EdwinIngJ
@Fly-Style
@GabrielCWT
@gargvishesh
@gianm
@GWphua
@hfukada
@inponomarev
@jtuglu1
@keoliva
@kfaraz
@kgyrtkirk
@Pankaj260100
@Rasnar
@rohangarg
@Shiyang-Zhao
@TessaIO
@tinnou
@uds5501
@vogievetsky
@vtlim
@writer-jill