github apache/arrow-rs 58.2.0
arrow 58.2.0


Changelog

58.2.0 (2026-04-28)


Implemented enhancements:

  • Expose ColumnCloseResult on ArrowColumnChunk #9774 [parquet]
  • Expose FFI data structures fields #9771 [arrow]
  • short-circuit last predicate in RowFilter when with_limit(N) is set #9765 [parquet]
  • vectorise dict-index bounds check #9747 [parquet]
  • Refactor RleEncoder::flush_bit_packed_run #9734 [parquet]
  • Add benchmark for cast from/to decimals #9728 [arrow]
  • Add a security policy for arrow-rs #9727 [parquet] [arrow] [arrow-flight]
  • Support FixedSizeList in arrow-json reader #9714 [arrow]
  • [Variant] Add VariantArrayBuilder::append_nulls API #9684
  • [Json] RunEndEncoded decoder optimization #9645 [arrow]
  • [Variant] variant_get(..., List<_>) non-Struct types support #9615
  • [Variant] Add unshredded Struct fast-path for variant_get(..., Struct) #9596
  • Allow setting custom line terminator for CSV writer #9571 [arrow]
  • [Variant] Align cast logic for variant_get to cast kernel for numeric/bool types #9564 [arrow]
  • ci: use ubuntu-slim where applicable #9536
  • Publicly export arrow_string::Predicate and its methods? #9480
  • Don't create CompressionContext when no compression is selected [IPC] #9463 [arrow]
  • Parquet: Raw level buffering causes unbounded memory growth for sparse columns #9446 [parquet]
  • Parallel Parquet Reading #9381 [parquet]
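Illustration for #9765: the idea behind short-circuiting the last predicate when a limit is set can be sketched with the standard library alone. This is not the parquet crate's code. `filter_with_limit` is a hypothetical helper that works on a plain slice rather than on `RowFilter` batches, and it only shows why evaluation can stop once N rows have matched.

```rust
/// Hypothetical helper (not part of arrow-rs): applies `predicate` to `rows`
/// in order and stops as soon as `limit` rows have matched.
///
/// Returns the indices of the selected rows and the number of rows the
/// predicate was actually evaluated on.
fn filter_with_limit<T, F>(rows: &[T], predicate: F, limit: usize) -> (Vec<usize>, usize)
where
    F: Fn(&T) -> bool,
{
    let mut selected = Vec::with_capacity(limit.min(rows.len()));
    let mut evaluated = 0;

    for (i, row) in rows.iter().enumerate() {
        // Short-circuit: once the limit is satisfied, later rows cannot
        // change the result, so the predicate is not evaluated on them.
        if selected.len() == limit {
            break;
        }
        evaluated += 1;
        if predicate(row) {
            selected.push(i);
        }
    }

    (selected, evaluated)
}
```

With the rows `[1, 8, 3, 9, 10, 2, 11]`, the predicate `x > 5` and a limit of 2, the sketch selects indices 1 and 3 after evaluating only the first four rows.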

Fixed bugs:

  • [Variant] unshred_variant panics on malformed bytes despite returning Result #9740
  • RecordBatch::normalize() does not propagate top level null bitmap into the results #9732 [arrow]
  • Incorrect accounting in DictEncoder::estimated_memory_size #9719 [parquet]
  • arrow-ipc writer does not comply with spec for empty variable-size arrays #9716 [arrow]
  • Panic when reading corrupt parquet file with truncated data instead of ParquetError #9705 [parquet]
  • NOTICE.txt is inaccurate #9703 [arrow]
  • Unnecessary dependency on regex crate #9672
  • [arrow-avro] Avro reader produces incorrect results when reader schema and writer schema differ #9655 [arrow]
  • parquet docs are broken on docs.rs #9649
  • [Parquet] ArrowWriter with CDC panics on nested ListArrays #9637 [parquet] [arrow] [arrow-flight]
  • Use release KEYS file for verification instead of dev KEYS #9603
  • IPC reader: handling of dictionaries with only null values #9595 [arrow]
  • Parquet RleDecoder::get_batch_with_dict panics on oob dictionary indices #9434 [parquet]
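Illustration for #9434 and #9705: several fixes in this list replace a panic on malformed input with a returned error. A minimal std-only sketch of that pattern for dictionary lookups follows. `DecodeError` and `gather_checked` are hypothetical names chosen for this example. They are not the crate's `ParquetError` or the real `RleDecoder` API.

```rust
/// Hypothetical error type for this sketch (the parquet crate uses its own
/// `ParquetError`).
#[derive(Debug, PartialEq)]
enum DecodeError {
    IndexOutOfBounds { index: u32, dict_len: usize },
}

/// Hypothetical helper: looks up every index in `dict`, returning an error
/// for the first out-of-bounds index instead of panicking.
fn gather_checked<T: Copy>(dict: &[T], indices: &[u32]) -> Result<Vec<T>, DecodeError> {
    indices
        .iter()
        .map(|&index| {
            // `slice::get` returns `None` for an out-of-bounds index, which
            // is mapped to an error rather than indexing with `dict[i]`.
            dict.get(index as usize)
                .copied()
                .ok_or(DecodeError::IndexOutOfBounds {
                    index,
                    dict_len: dict.len(),
                })
        })
        .collect()
}
```

A corrupt file that references index 3 of a three-entry dictionary then surfaces as `Err(DecodeError::IndexOutOfBounds { index: 3, dict_len: 3 })`, which the caller can report.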

Documentation updates:

Performance improvements:

  • parquet: avoid decode and heap allocation on terminal skip in DeltaBitPackDecoder #9784 [parquet]
  • parquet: O(1) skip for bw=0 miniblocks in DeltaBitPackDecoder #9783 [parquet]
  • Remove per-message flush overhead in Arrow IPC writer #9762 [arrow]
  • Support GenericListViewArray::new_unchecked and refactor ListView json decoder #9646 [arrow]
  • Support nested REE in arrow-ord partition function #9640 [arrow]
  • [Parquet] Remove the BIT_PACKED encoder #9635 [parquet]
  • Pre-reserve output capacity in ByteView/ByteArray dictionary decoding #9587 [parquet]
  • Fuse RLE decoding and view gathering for StringView dictionary decoding #9582 [parquet]
  • Use branchless index clamping and add get_batch_direct to RleDecoder #9581 [parquet]
  • Reduce per-byte overhead in VLQ integer decoding #9580 [parquet]
  • feat(parquet): batch RLE runs in level encoder via scan-ahead #9830 [parquet] (HippoBaro)
  • fix: lazy-init zstd compression contexts to avoid unnecessary FFI calls #9808 [arrow] (mbutrovich)
  • parquet: O(1) skip for bw=0 miniblocks in DeltaBitPackDecoder #9786 [parquet] (sahuagin)
  • chore: add benchmark for row filters with LIMIT short-circuit #9767 [parquet] (haohuaijin)
  • Push LIMIT / OFFSET into the last RowFilter predicate and skip unused row groups #9766 [parquet] (haohuaijin)
  • feat(ipc): Remove per-message flush in IPC writer hot path #9763 [arrow] (pchintar)
  • perf(parquet): Defer fixed length byte array buffer alloc and skip zero-batch init #9756 [parquet] (lyang24)
  • feat(parquet): batch consecutive null/empty rows in write_list #9752 [parquet] (HippoBaro)
  • Remove len field from buffer builder #9750 [arrow] (cetra3)
  • perf(parquet): Vectorize dict-index bounds check in RleDecoder::get_batch_with_dict (up to -7.9%) #9746 [parquet] (Dandandan)
  • feat(parquet): precompute offset_index_disabled at build-time #9724 [parquet] (HippoBaro)
  • [Parquet] Improve dictionary decoder by unrolling loops #9662 [parquet] (Dandandan)
  • [Json] Use partition and take in RunEndEncoded decoder #9658 [arrow] (liamzwbao)
  • Improve take performance on List arrays #9643 [arrow] (AdamGS)
  • [Json] Replace ArrayData with typed Array construction in json-reader #9497 [arrow] (liamzwbao)
  • feat(parquet): stream-encode definition/repetition levels incrementally #9447 [parquet] (HippoBaro)
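Illustration for #9783 / #9786: in delta bit-packed encoding each value is the previous value plus the block's minimum delta plus a bit-packed remainder. When a miniblock's bit width is 0, every remainder is 0, so skipping n values reduces to one multiplication. The sketch below shows that arithmetic using only the standard library. `Miniblock`, `skip_zero_width` and `skip_naive` are hypothetical names, and the use of wrapping arithmetic is an assumption of this sketch. It is not the `DeltaBitPackDecoder` implementation.

```rust
/// Hypothetical miniblock state for this sketch.
struct Miniblock {
    min_delta: i64,
    bit_width: u8,
    remaining: usize,
}

/// O(1) skip: valid only when `bit_width == 0`, i.e. every delta in the
/// miniblock equals `min_delta`. Returns the new last value, or `None` if
/// the fast path does not apply.
fn skip_zero_width(last_value: i64, block: &mut Miniblock, n: usize) -> Option<i64> {
    if block.bit_width != 0 || n > block.remaining {
        return None;
    }
    block.remaining -= n;
    // n identical deltas collapse into a single multiply-add.
    Some(last_value.wrapping_add(block.min_delta.wrapping_mul(n as i64)))
}

/// Reference O(n) version used to check the fast path: applies the delta
/// once per skipped value.
fn skip_naive(last_value: i64, min_delta: i64, n: usize) -> i64 {
    let mut value = last_value;
    for _ in 0..n {
        value = value.wrapping_add(min_delta);
    }
    value
}
```

Starting from a last value of 100 with a minimum delta of 3, skipping 10 values yields 130 on both paths. The fast path does not depend on n, while the reference loop runs n times.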

Closed issues:

  • Incorrect buffer skipping for V4 Union types in IPC skip_field #9828 [arrow]
  • Replace wildcard match in skip_field with explicit DataType handling #9821 [arrow]
  • Column projection misalignment for ListView / LargeListView in IPC reader #9805 [arrow]
  • Avoid panic on malformed compressed buffer prefix in IPC #9801 [arrow]
  • DeltaByteArrayDecoder panics on invalid prefix lengths #9796 [parquet]
  • Use NullBufferBuilder when reading json #9781 [arrow]
  • Perfectly shredded arrays with top-level null values lose nullability when typed_value is extracted #9701
  • [Parquet Metadata] API to determine page-index presence separately from page-index load #9693
  • Union cast is incorrect for duplicate field names #9664 [arrow]
  • List and ListView are missing take benchmarks #9627 [arrow]
  • Support RunEndEncoded arrays in comparison kernels (eq, lt, etc.) #9620 [arrow]
  • variant_get should follow JSONpath semantics #9606
  • GenericByteViewArray: support finding total length of all strings #9435 [arrow]

Merged pull requests:

* This Changelog was automatically generated by github_changelog_generator
