opensearch-project/OpenSearch 3.7.0 on GitHub

Version 3.7.0 Release Notes

Compatible with OpenSearch and OpenSearch Dashboards version 3.7.0

Features

Add dynamic properties support for pattern-based field definitions without cluster state mapping updates (#20816)
Add pluggable data format engine with DataFormatAwareEngine for multi-format indexing (#21181)
Add Lucene engine implementation for pluggable data formats (#21299)
Add merge support for Parquet data format plugin via streaming k-way merge sort (#21079)
Add directory and IndexInput layers for WritableWarm tiered storage (#21178)
Add server-side implementation for tiering status APIs (GetTieringStatus and ListTieringStatus) (#21220)
Add server-side implementation for HotToWarm, WarmToHot, and CancelTiering APIs (#21295)
Add prefetch settings and stored fields prefetch for WritableWarm tiered storage (#21285)
Add slow logs, per-query metrics, and migration metrics for WritableWarm tiered storage (#21332)
Add module wiring and integration tests for WritableWarm tiered storage (#21427)
Add tiered object storage crate for warm node file routing (#21204)
Add event-driven scheduler and stage execution for analytics engine (#21242)
Add coordinator-side DataFusion reduce with streaming Arrow batches (#21356)
Add distributed aggregation with partial/final mode for analytics engine (#21457)
Add distributed join planning and execution for analytics engine (#21639)
Add PPL append command support with multi-child stage runtime for Union (#21474)
Add PPL dedup command support via ROW_NUMBER window function (#21622)
Add PPL eventstats and streamstats window function support (#21734)
Add PPL top and rare command support via window functions (#21593)
Add PPL parse command with regex mode via Rust UDFs (#21573)
Add PPL rex command with sed and extract modes (#21550)
Add PPL spath command with auto-extract mode via json_extract_all UDF (#21664)
Add 7 PPL JSON scalar functions to analytics engine route (#21513)
Add 23 PPL datetime scalar functions to analytics engine route (#21556)
Add 14 additional PPL datetime functions (Wave A) including strftime, date_format, maketime (#21582)
Add 30+ PPL math scalar functions to analytics engine (#21520)
Add PPL string scalar functions to analytics engine (18 functions) (#21543)
Add PPL conditional functions (coalesce, isempty, isblank, case, if, ifnull) to analytics engine (#21643)
Add PPL conversion scalar functions (num, auto, memk, rmcomma, dur2sec, ctime, mktime) to analytics engine (#21628)
Add PPL cryptographic functions (md5, sha1, sha2, crc32) to analytics engine (#21611)
Add PPL array constructor and 8 multivalue functions to analytics engine (#21554)
Add PPL bucketing scalars (span_bucket, width_bucket, minspan_bucket, range_bucket) (#21621)
Add PPL TAKE, FIRST, LAST, LIST, VALUES aggregate functions (#21731)
Add Lucene filter delegation from DataFusion for full-text search predicates (#21555)
Add performance delegation to Lucene for selective filter predicates (#21701)
Add native Arrow transport path with zero-copy transfer for stream transport (#21253)
Stream Arrow batches on data-node fragment execution path (#21418)
Add support for extra_fields outside _source indexing for improved vector ingestion throughput (#20635)
Add gRPC support for Min, Max, and Terms aggregations (#21205)
Add partition strategy setting for flexible shard-to-partition mapping in pull-based ingestion (#21165)
Add SplitToFieldsProcessor for distributing split values to target fields (#21216)
Add native memory based admission control for transport request throttling (#21191)
Add native memory search backpressure for off-heap query cancellation (#21647)
Add unified native allocator framework for Arrow allocations with elastic rebalancing (#21703)
Add on-demand jemalloc heap profiling support via JMX CLI tool (#21599)
Add search.max_buckets to workload group settings for per-tenant bucket limits (#21721)
Add additional search settings and override_request_values to workload management groups (#21523)
Add hunspell dictionary hot-reload support via _refresh_search_analyzers API (#21559)
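
The streaming k-way merge sort named in the Parquet merge entry (#21079) can be illustrated generically. The sketch below is not the plugin's implementation: the function name `k_way_merge` and the use of plain integer runs are assumptions made for illustration. It only shows the general technique of merging several already-sorted inputs lazily through a min-heap, so that only one element per input is held in memory at a time.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

_EXHAUSTED = object()  # sentinel marking an exhausted run


def k_way_merge(runs: List[Iterable[int]]) -> Iterator[int]:
    """Lazily merge already-sorted runs into a single sorted stream.

    Hypothetical helper for illustration; assumes each run is sorted ascending.
    """
    iterators = [iter(run) for run in runs]
    heap: List[Tuple[int, int]] = []

    # Seed the heap with the first element of every non-empty run.
    for idx, it in enumerate(iterators):
        first = next(it, _EXHAUSTED)
        if first is not _EXHAUSTED:
            heap.append((first, idx))
    heapq.heapify(heap)

    # Repeatedly emit the smallest head and refill from the run it came from.
    while heap:
        value, idx = heapq.heappop(heap)
        yield value
        nxt = next(iterators[idx], _EXHAUSTED)
        if nxt is not _EXHAUSTED:
            heapq.heappush(heap, (nxt, idx))


if __name__ == "__main__":
    print(list(k_way_merge([[1, 4, 7], [2, 5], [3, 6, 8]])))
```

With k inputs and n total elements, this approach costs O(n log k) comparisons and keeps at most k elements in the heap.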

Enhancements

Add adaptive query budget for DataFusion engine with bounded memory and improved throughput (#21695)
Add DynamicLimitPool for runtime memory pool limit changes in DataFusion (#21286)
Add configurable coordinator buffer limit for per-query Arrow allocator (#21726)
Add CPU task cancellation for DataFusion queries (#21560)
Add IO task cancellation support for DataFusion queries (#21531)
Add DataFusion logical and physical plan logging at DEBUG level (#21646)
Add dynamic settings for indexed query execution path (#21522)
Add dedicated analytics_scheduler thread pool to prevent coordinator deadlock (#21771)
Add dedicated analytics_reduce thread pool for coordinator reduce drains (#21800)
Add native memory stats and task cancellation stats to node stats API (#21637)
Add current_application_duration_ms to cluster state download stats in node stats API (#20922)
Add segments and segment stats support for DataFormatAwareEngine (#21696)
Add DataFormat-aware NRT replication engine and remote-store wiring (#21311)
Add DataFormat-aware shallow snapshot v2 support (#21742)
Add DataFormat-aware read-only engine for warm primaries with tiering service improvements (#21720)
Add dynamic mapping support for pluggable data formats (#21444)
Add delete execution engine abstraction for DataFormatAwareEngine (#21313)
Add cluster-scope defaults for pluggable dataformat settings (#21435)
Add indexing support for metadata fields in pluggable data formats (#21585)
Add Lucene merge support for pluggable data format composite engine (#21422)
Add composite merge handler and merge policy for data-format-aware engine (#21128)
Add sort-on-refresh for composite engine with cross-format row-ID consistency (#21468)
Add warm+format directory wiring with per-format tiered directory routing (#21361)
Add block cache SPI and Foyer plugin for warm nodes (#21530)
Add REST API paths for block cache prune and detailed file cache stats (#21705)
Add cancellation checkpoints in field data loading and aggregation paths (#21318)
Add queryTimeout to IndexSearcher for KNN vector search timeout enforcement (#21316)
Add index-level authorization to analytics engine via ActionFilter dispatch (#21789)
Add /_analytics/ppl/_explain endpoint with stage profiling (#21660)
Add relevance function support (match_phrase, multi_match, query_string, etc.) to analytics engine (#21562)
Add relevance functions optional parameter support and new functions (wildcard_query, query, match_all) (#21661)
Add filter pushdown rules and Calcite rule metrics for profiling (#21684)
Add per-column encoding and compression configuration for Parquet data format (#21665)
Avoid repeated encoding and compression for sort column writes in Parquet (#21464)
Add pipeline execution metrics to PollingIngestStats for pull-based ingestion (#21024)
Add batching for persistent task cluster service to reduce cluster manager load (#21245)
Refactor BitsetFilterCache to node-level cache with configurable size limit (#21179)
Skip zone awareness when auto_expand_replicas is set to all (#21217)
Relax field-level meta validation constraints to allow any number of entries with string values (#20578)
Deprecate boolean constructor of FetchSourceContext in favor of static constants (#21235)
Add validation and deprecation warnings for ambiguous _source filtering (#21203)
Speed up Painless Script Engine initialization by ~10% (#21463)
Fix accumulation of file sizes when multiple files share the same extension in segment stats (#21000)
Improve native memory admission control precision with auto-derived budget and JVM non-heap subtraction (#21749)
Tighten DataFusion memory guard with RSS-based hard guard to prevent OOM under concurrent load (#21814)
Support indices_boost_2 array format for gRPC search (#21300)
Add configurable Kafka metadata timeout for pull-based ingestion (#21425)
Expose tokio-metrics as DataFusion plugin stats (#21303)
Add Lucene FFM callbacks to task resource tracking (#21610)
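
The idea behind cancellation checkpoints in long-running loops (#21318) can be sketched in general terms. This is an illustrative example only, not OpenSearch code: `TaskCancelled`, `sum_with_checkpoints`, and the `check_every` parameter are hypothetical names. It shows a loop that polls a shared cancellation flag at fixed intervals, so a cancelled task stops promptly without paying the cost of a check on every iteration.

```python
import threading
from typing import Iterable


class TaskCancelled(Exception):
    """Raised when the loop observes that cancellation was requested."""


def sum_with_checkpoints(
    values: Iterable[int],
    cancelled: threading.Event,
    check_every: int = 1024,
) -> int:
    """Sum values, checking the cancellation flag every `check_every` items."""
    total = 0
    for i, value in enumerate(values):
        # Checkpoint: poll the flag only periodically to keep overhead low.
        if i % check_every == 0 and cancelled.is_set():
            raise TaskCancelled(f"cancelled after {i} items")
        total += value
    return total


if __name__ == "__main__":
    flag = threading.Event()
    print(sum_with_checkpoints(range(10), flag))
```

A smaller `check_every` makes cancellation more responsive at the cost of more frequent flag checks.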

Bug Fixes

Fix YAML parser corrupting string values that resemble booleans after Jackson 3.x migration (#21294)
Fix map_unmapped_fields_as_text lost after dynamic mapping update in PercolatorFieldMapper (#21301)
Fix O(n²) removeAll in remote translog metadata cleanup causing CPU spikes (#21350)
Fix Rounding.isUTC() to recognize UTC timezone aliases for date histogram optimization (#21221)
Fix NPE in QueryPhaseResultConsumer when all shards fail (#21158)
Fix bulk request hang when index is deleted during primary phase (#21305)
Fix deadlock between engineMutex and writeLock during index close and engine reset (#21404)
Fix FlightOutboundHandler clearing caller's ThreadContext (#21167)
Fix IndicesRequestCacheCleanupIT flakiness by removing too-short assertBusy timeouts (#21494)
Fix negative fielddata stats by guarding against stale removals after shard reallocation (#21667)
Fix half_float ingest writing wrong fp16 bit pattern in Parquet (#21783)
Fix StringView buffer bloat in DataFusion stream_next FFI export causing 435x data amplification (#21753)
Fix Utf8View/Utf8 schema mismatch panic in indexed parquet path (#21826)
Fix memory leak in transport-reactor-netty4 plugin with persistent connections (#21788)
Fix ExitablePostingsEnum to extend FilterPostingsEnum for proper delegation (#21558)
Fix local recovery from flush for DataFormatAwareEngine (#21553)
Fix safe-commit info and replication checksum for DFA shards (#21787)
Fix DFA recovery failures: file-handle leak and reset-path crash (#21759)
Handle null scripted metric combine results (#21534)
Demote "No resource usage stats available for node" log from WARN to DEBUG (#21638)
Fix pull-based ingestion document mapper usage to reflect mapping updates (#21183)
Fix pull-based ingestion consumer factory to be stateless and prevent race conditions (#21652)
Fix pull-based ingestion multi-threaded writer batchStartPointer computation (#21697)
Fix Netty4Http3ServerTransport to use configured HeaderVerifier and Decompressor instances (#21281)
Convert varchar to str in analytics engine Project operations to fix DataFusion type errors (#21794)
Fix microsecond() function and add timestamp lower-bound validation in analytics engine (#21793)
Enforce write blocks for DFA hot-to-warm tiering to survive DiskThresholdMonitor removal (#21828)
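
For readers checking half-precision values such as those involved in the half_float fix (#21783), the following sketch shows how to inspect an IEEE 754 binary16 bit pattern. It is independent of the OpenSearch code base: the helper names `half_float_bits` and `half_float_from_bits` are hypothetical, and the example relies only on the standard library's `struct` module, whose `e` format code denotes a 16-bit float.

```python
import struct


def half_float_bits(value: float) -> int:
    """Return the IEEE 754 binary16 bit pattern of `value` as an unsigned int."""
    # "<e" packs a little-endian half-precision float; "<H" reads it back
    # as an unsigned 16-bit integer so the raw bits can be inspected.
    return struct.unpack("<H", struct.pack("<e", value))[0]


def half_float_from_bits(bits: int) -> float:
    """Decode an unsigned 16-bit pattern back into a Python float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


if __name__ == "__main__":
    print(hex(half_float_bits(1.0)))     # 0x3c00
    print(half_float_from_bits(0xC000))  # -2.0
```

Comparing the written bits against a reference encoder like this is one way to spot an incorrect fp16 pattern in stored data.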

Maintenance

Bump Netty to 4.2.14.Final (#21772)
Update Jackson to 2.21.3 / 3.1.3 (#21493)
Update ASM to 9.10 (#21764)
Update OpenTelemetry to 1.62.0 and SemConv to 1.41.0 (#21595)
Update Project Reactor to 3.8.5 and Reactor Netty to 1.3.5 (#21226)
Update bundled JDK to JDK 25.0.3 (#21353)
Update log4j2 to 2.25.4 (#21416)
Update httpclient5 to 5.6.1 (#21441)
Bump commons-configuration2 from 2.14.0 to 2.15.0 (#21806)
Bump org.apache.commons:commons-configuration2 from 2.13.0 to 2.14.0 (#21213)
Bump com.google.protobuf Gradle plugin from 0.9.6 to 0.10.0 (#21291)
Bump org.apache.hadoop:hadoop-minicluster from 3.4.2 to 3.5.0 (#21138)
Bump org.codehaus.woodstox:stax2-api from 4.2.2 to 4.3.0 (#21137)
Bump org.jline:jline from 4.0.0 to 4.0.14 (#21471)
Bump org.jsoup:jsoup from 1.22.1 to 1.22.2 (#21290)
Bump com.nimbusds:nimbus-jose-jwt from 10.8 to 10.9 (#21214)
Remove Unsafe class injection from Java agent (#21542)
Replace mimalloc with jemalloc as global allocator for native sandbox plugins (#21497)
Upgrade DataFusion to v53 and Arrow to v58 (#21590)
Pin GitHub Actions to commit SHAs for supply chain security (#21808)
Update FIPS bootstrap check to use OpenSearch env var instead of BouncyCastle system property (#21415)