github opensearch-project/OpenSearch 3.7.0

Version 3.7.0 Release Notes

Compatible with OpenSearch and OpenSearch Dashboards version 3.7.0

Features

  • Add dynamic properties support for pattern-based field definitions without cluster state mapping updates (#20816)
  • Add pluggable data format engine with DataFormatAwareEngine for multi-format indexing (#21181)
  • Add Lucene engine implementation for pluggable data formats (#21299)
  • Add merge support for Parquet data format plugin via streaming k-way merge sort (#21079)
  • Add directory and IndexInput layers for WritableWarm tiered storage (#21178)
  • Add server-side implementation for tiering status APIs (GetTieringStatus and ListTieringStatus) (#21220)
  • Add server-side implementation for HotToWarm, WarmToHot, and CancelTiering APIs (#21295)
  • Add prefetch settings and stored fields prefetch for WritableWarm tiered storage (#21285)
  • Add slow logs, per-query metrics, and migration metrics for WritableWarm tiered storage (#21332)
  • Add module wiring and integration tests for WritableWarm tiered storage (#21427)
  • Add tiered object storage crate for warm node file routing (#21204)
  • Add event-driven scheduler and stage execution for analytics engine (#21242)
  • Add coordinator-side DataFusion reduce with streaming Arrow batches (#21356)
  • Add distributed aggregation with partial/final mode for analytics engine (#21457)
  • Add distributed join planning and execution for analytics engine (#21639)
  • Add PPL append command support with multi-child stage runtime for Union (#21474)
  • Add PPL dedup command support via ROW_NUMBER window function (#21622)
  • Add PPL eventstats and streamstats window function support (#21734)
  • Add PPL top and rare command support via window functions (#21593)
  • Add PPL parse command with regex mode via Rust UDFs (#21573)
  • Add PPL rex command with sed and extract modes (#21550)
  • Add PPL spath command with auto-extract mode via json_extract_all UDF (#21664)
  • Add 7 PPL JSON scalar functions to analytics engine route (#21513)
  • Add 23 PPL datetime scalar functions to analytics engine route (#21556)
  • Add 14 additional PPL datetime functions (Wave A) including strftime, date_format, maketime (#21582)
  • Add 30+ PPL math scalar functions to analytics engine (#21520)
  • Add 18 PPL string scalar functions to analytics engine (#21543)
  • Add PPL conditional functions (coalesce, isempty, isblank, case, if, ifnull) to analytics engine (#21643)
  • Add PPL conversion scalar functions (num, auto, memk, rmcomma, dur2sec, ctime, mktime) to analytics engine (#21628)
  • Add PPL cryptographic functions (md5, sha1, sha2, crc32) to analytics engine (#21611)
  • Add PPL array constructor and 8 multivalue functions to analytics engine (#21554)
  • Add PPL bucketing scalars (span_bucket, width_bucket, minspan_bucket, range_bucket) (#21621)
  • Add PPL TAKE, FIRST, LAST, LIST, VALUES aggregate functions (#21731)
  • Add Lucene filter delegation from DataFusion for full-text search predicates (#21555)
  • Add performance delegation to Lucene for selective filter predicates (#21701)
  • Add native Arrow transport path with zero-copy transfer for stream transport (#21253)
  • Stream Arrow batches on data-node fragment execution path (#21418)
  • Add support for extra_fields outside _source indexing for improved vector ingestion throughput (#20635)
  • Add gRPC support for Min, Max, and Terms aggregations (#21205)
  • Add partition strategy setting for flexible shard-to-partition mapping in pull-based ingestion (#21165)
  • Add SplitToFieldsProcessor for distributing split values to target fields (#21216)
  • Add native memory based admission control for transport request throttling (#21191)
  • Add native memory search backpressure for off-heap query cancellation (#21647)
  • Add unified native allocator framework for Arrow allocations with elastic rebalancing (#21703)
  • Add on-demand jemalloc heap profiling support via JMX CLI tool (#21599)
  • Add search.max_buckets to workload group settings for per-tenant bucket limits (#21721)
  • Add additional search settings and override_request_values to workload management groups (#21523)
  • Add hunspell dictionary hot-reload support via _refresh_search_analyzers API (#21559)
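
The Parquet merge entry (#21079) refers to a streaming k-way merge sort. As a rough illustration of that general technique only, and not the plugin's actual code, the sketch below merges several already-sorted runs lazily with a min-heap. The function name `k_way_merge` and the use of plain integers as sort keys are assumptions made for this example.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple


def k_way_merge(runs: List[Iterable[int]]) -> Iterator[int]:
    """Lazily merge already-sorted runs into one sorted stream.

    Hypothetical helper for illustration; only one pending value per
    run is held in memory at a time.
    """
    iterators = [iter(run) for run in runs]
    heap: List[Tuple[int, int]] = []

    # Seed the heap with the first value of every non-empty run.
    for idx, it in enumerate(iterators):
        first = next(it, None)
        if first is not None:
            heap.append((first, idx))
    heapq.heapify(heap)

    # Repeatedly emit the smallest pending value, then refill the heap
    # from the run that value came from.
    while heap:
        value, idx = heapq.heappop(heap)
        yield value
        following = next(iterators[idx], None)
        if following is not None:
            heapq.heappush(heap, (following, idx))


merged = list(k_way_merge([[1, 4, 7], [2, 5], [3, 6, 8]]))
```

With k runs and n total values, each value passes through the heap once, so the merge costs O(n log k) comparisons while keeping only k values buffered.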

Enhancements

  • Add adaptive query budget for DataFusion engine with bounded memory and improved throughput (#21695)
  • Add DynamicLimitPool for runtime memory pool limit changes in DataFusion (#21286)
  • Add configurable coordinator buffer limit for per-query Arrow allocator (#21726)
  • Add CPU task cancellation for DataFusion queries (#21560)
  • Add IO task cancellation support for DataFusion queries (#21531)
  • Add DataFusion logical and physical plan logging at DEBUG level (#21646)
  • Add dynamic settings for indexed query execution path (#21522)
  • Add dedicated analytics_scheduler thread pool to prevent coordinator deadlock (#21771)
  • Add dedicated analytics_reduce thread pool for coordinator reduce drains (#21800)
  • Add native memory stats and task cancellation stats to node stats API (#21637)
  • Add current_application_duration_ms to cluster state download stats in node stats API (#20922)
  • Add segments and segment stats support for DataFormatAwareEngine (#21696)
  • Add DataFormat-aware NRT replication engine and remote-store wiring (#21311)
  • Add DataFormat-aware shallow snapshot v2 support (#21742)
  • Add DataFormat-aware read-only engine for warm primaries with tiering service improvements (#21720)
  • Add dynamic mapping support for pluggable data formats (#21444)
  • Add delete execution engine abstraction for DataFormatAwareEngine (#21313)
  • Add cluster-scope defaults for pluggable dataformat settings (#21435)
  • Add indexing support for metadata fields in pluggable data formats (#21585)
  • Add Lucene merge support for pluggable data format composite engine (#21422)
  • Add composite merge handler and merge policy for data-format-aware engine (#21128)
  • Add sort-on-refresh for composite engine with cross-format row-ID consistency (#21468)
  • Add warm+format directory wiring with per-format tiered directory routing (#21361)
  • Add block cache SPI and Foyer plugin for warm nodes (#21530)
  • Add REST API paths for block cache prune and detailed file cache stats (#21705)
  • Add cancellation checkpoints in field data loading and aggregation paths (#21318)
  • Add queryTimeout to IndexSearcher for KNN vector search timeout enforcement (#21316)
  • Add index-level authorization to analytics engine via ActionFilter dispatch (#21789)
  • Add /_analytics/ppl/_explain endpoint with stage profiling (#21660)
  • Add relevance function support (match_phrase, multi_match, query_string, etc.) to analytics engine (#21562)
  • Add relevance functions optional parameter support and new functions (wildcard_query, query, match_all) (#21661)
  • Add filter pushdown rules and Calcite rule metrics for profiling (#21684)
  • Add per-column encoding and compression configuration for Parquet data format (#21665)
  • Avoid repeated encoding and compression for sort column writes in Parquet (#21464)
  • Add pipeline execution metrics to PollingIngestStats for pull-based ingestion (#21024)
  • Add batching for persistent task cluster service to reduce cluster manager load (#21245)
  • Refactor BitsetFilterCache to node-level cache with configurable size limit (#21179)
  • Skip zone awareness when auto_expand_replicas is set to all (#21217)
  • Relax field-level meta validation constraints to allow any number of entries with string values (#20578)
  • Deprecate boolean constructor of FetchSourceContext in favor of static constants (#21235)
  • Add validation and deprecation warnings for ambiguous _source filtering (#21203)
  • Speed up Painless Script Engine initialization by ~10% (#21463)
  • Fix accumulation of file sizes when multiple files share the same extension in segment stats (#21000)
  • Improve native memory admission control precision with auto-derived budget and JVM non-heap subtraction (#21749)
  • Tighten DataFusion memory guard with RSS-based hard guard to prevent OOM under concurrent load (#21814)
  • Support indices_boost_2 array format for gRPC search (#21300)
  • Add configurable Kafka metadata timeout for pull-based ingestion (#21425)
  • Expose tokio-metrics as DataFusion plugin stats (#21303)
  • Add Lucene FFM callbacks to task resource tracking (#21610)

Bug Fixes

  • Fix YAML parser corrupting string values that resemble booleans after Jackson 3.x migration (#21294)
  • Fix map_unmapped_fields_as_text lost after dynamic mapping update in PercolatorFieldMapper (#21301)
  • Fix O(n²) removeAll in remote translog metadata cleanup causing CPU spikes (#21350)
  • Fix Rounding.isUTC() to recognize UTC timezone aliases for date histogram optimization (#21221)
  • Fix NPE in QueryPhaseResultConsumer when all shards fail (#21158)
  • Fix bulk request hang when index is deleted during primary phase (#21305)
  • Fix deadlock between engineMutex and writeLock during index close and engine reset (#21404)
  • Fix FlightOutboundHandler clearing caller's ThreadContext (#21167)
  • Fix IndicesRequestCacheCleanupIT flakiness by removing too-short assertBusy timeouts (#21494)
  • Fix negative fielddata stats by guarding against stale removals after shard reallocation (#21667)
  • Fix half_float ingest writing wrong fp16 bit pattern in Parquet (#21783)
  • Fix StringView buffer bloat in DataFusion stream_next FFI export causing 435x data amplification (#21753)
  • Fix Utf8View/Utf8 schema mismatch panic in indexed parquet path (#21826)
  • Fix memory leak in transport-reactor-netty4 plugin with persistent connections (#21788)
  • Fix ExitablePostingsEnum to extend FilterPostingsEnum for proper delegation (#21558)
  • Fix local recovery from flush for DataFormatAwareEngine (#21553)
  • Fix safe-commit info and replication checksum for DFA shards (#21787)
  • Fix DFA recovery failures: file-handle leak and reset-path crash (#21759)
  • Handle null scripted metric combine results (#21534)
  • Demote "No resource usage stats available for node" log from WARN to DEBUG (#21638)
  • Fix pull-based ingestion document mapper usage to reflect mapping updates (#21183)
  • Fix pull-based ingestion consumer factory to be stateless and prevent race conditions (#21652)
  • Fix pull-based ingestion multi-threaded writer batchStartPointer computation (#21697)
  • Fix Netty4Http3ServerTransport to use configured HeaderVerifier and Decompressor instances (#21281)
  • Convert varchar to str in analytics engine Project operations to fix DataFusion type errors (#21794)
  • Fix microsecond() function and add timestamp lower-bound validation in analytics engine (#21793)
  • Enforce write blocks for DFA hot-to-warm tiering to survive DiskThresholdMonitor removal (#21828)
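
The half_float fix (#21783) concerns the fp16 bit pattern written to Parquet. The release notes do not describe the bug itself, so the sketch below only shows what a correct IEEE 754 binary16 bit pattern looks like, using Python's standard `struct` module. The helper names `fp16_bits` and `fp16_from_bits` are hypothetical and exist only for this example.

```python
import struct


def fp16_bits(value: float) -> int:
    """Return the 16-bit IEEE 754 binary16 pattern for value."""
    # "<e" packs a little-endian half-precision float; "<H" reads the
    # same two bytes back as an unsigned 16-bit integer.
    (bits,) = struct.unpack("<H", struct.pack("<e", value))
    return bits


def fp16_from_bits(bits: int) -> float:
    """Decode a 16-bit pattern back into a Python float."""
    (value,) = struct.unpack("<e", struct.pack("<H", bits))
    return value


one = fp16_bits(1.0)             # sign 0, exponent 01111, mantissa 0
largest = fp16_bits(65504.0)     # largest finite half-precision value
half = fp16_from_bits(0x3800)    # decodes to 0.5
```

A round trip through helpers like these is one way to spot-check values read back from a half_float column.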

Maintenance

  • Bump Netty to 4.2.14.Final (#21772)
  • Update Jackson to 2.21.3 / 3.1.3 (#21493)
  • Update ASM to 9.10 (#21764)
  • Update OpenTelemetry to 1.62.0 and SemConv to 1.41.0 (#21595)
  • Update Project Reactor to 3.8.5 and Reactor Netty to 1.3.5 (#21226)
  • Update bundled JDK to JDK 25.0.3 (#21353)
  • Update log4j2 to 2.25.4 (#21416)
  • Update httpclient5 to 5.6.1 (#21441)
  • Bump commons-configuration2 from 2.14.0 to 2.15.0 (#21806)
  • Bump org.apache.commons:commons-configuration2 from 2.13.0 to 2.14.0 (#21213)
  • Bump com.google.protobuf from 0.9.6 to 0.10.0 (#21291)
  • Bump org.apache.hadoop:hadoop-minicluster from 3.4.2 to 3.5.0 (#21138)
  • Bump org.codehaus.woodstox:stax2-api from 4.2.2 to 4.3.0 (#21137)
  • Bump org.jline:jline from 4.0.0 to 4.0.14 (#21471)
  • Bump org.jsoup:jsoup from 1.22.1 to 1.22.2 (#21290)
  • Bump com.nimbusds:nimbus-jose-jwt from 10.8 to 10.9 (#21214)
  • Remove Unsafe class injection from Java agent (#21542)
  • Replace mimalloc with jemalloc as global allocator for native sandbox plugins (#21497)
  • Upgrade DataFusion to v53 and Arrow to v58 (#21590)
  • Pin GitHub Actions to commit SHAs for supply chain security (#21808)
  • Update FIPS bootstrap check to use OpenSearch env var instead of BouncyCastle system property (#21415)
