Changelog
58.2.0 (2026-04-28)
Implemented enhancements:
- Expose ColumnCloseResult on ArrowColumnChunk #9774 [parquet]
- Expose FFI data structures fields #9771 [arrow]
- short-circuit last predicate in RowFilter when with_limit(N) is set #9765 [parquet]
- vectorise dict-index bounds check #9747 [parquet]
- Refactor RleEncoder::flush_bit_packed_run #9734 [parquet]
- Add benchmark for cast from/to decimals #9728 [arrow]
- Add a security policy for arrow-rs #9727 [parquet] [arrow] [arrow-flight]
- Support FixedSizeList in arrow-json reader #9714 [arrow]
- [Variant] Add VariantArrayBuilder::append_nulls API #9684
- [Json] RunEndEncoded decoder optimization #9645 [arrow]
- [Variant] variant_get(..., List<_>) non-Struct types support #9615
- [Variant] Add unshredded Struct fast-path for variant_get(..., Struct) #9596
- Allow setting custom line terminator for CSV writer #9571 [arrow]
- [Variant] Align cast logic for variant_get to cast kernel for numeric/bool types #9564 [arrow]
- ci: use ubuntu-slim where applicable #9536
- Publicly export arrow_string::Predicate and its methods? #9480
- Don't create CompressionContext when no compression is selected [IPC] #9463 [arrow]
- Parquet: Raw level buffering causes unbounded memory growth for sparse columns #9446 [parquet]
- Parallel Parquet Reading #9381 [parquet]
Fixed bugs:
- [Variant] unshred_variant panics on malformed bytes despite returning Result #9740
- RecordBatch::normalize() does not propagate top level null bitmap into the results #9732 [arrow]
- Incorrect accounting in DictEncoder::estimated_memory_size #9719 [parquet]
- arrow-ipc writer does not comply with spec for empty variable-size arrays #9716 [arrow]
- Panic when reading corrupt parquet file with truncated data instead of ParquetError #9705 [parquet]
- NOTICE.txt is inaccurate #9703 [arrow]
- Unnecessary dependency on regex crate #9672
- [arrow-avro] Avro reader produces incorrect results when reader schema and writer schema differ #9655 [arrow]
- parquet docs are broken on docs.rs #9649
- [Parquet] ArrowWriter with CDC panics on nested ListArrays #9637 [parquet] [arrow] [arrow-flight]
- Use release KEYS file for verification instead of dev KEYS #9603
- IPC reader: handling of dictionaries with only null values #9595 [arrow]
- Parquet RleDecoder::get_batch_with_dict panics on oob dictionary indices #9434 [parquet]
Documentation updates:
- docs(variant): link VariantArray doc to official Parquet Variant extension type #9779 (mcharrel)
- Document Security Policy #9730 [parquet] [arrow] [arrow-flight] (alamb)
- Docs: add example of how to read parquet row groups in parallel #9396 [parquet] (alamb)
Performance improvements:
- parquet: avoid decode and heap allocation on terminal skip in DeltaBitPackDecoder #9784 [parquet]
- parquet: O(1) skip for bw=0 miniblocks in DeltaBitPackDecoder #9783 [parquet]
- Remove per-message flush overhead in Arrow IPC writer #9762 [arrow]
- Support GenericListViewArray::new_unchecked and refactor ListView json decoder #9646 [arrow]
- Support nested REE in arrow-ord partition function #9640 [arrow]
- [Parquet] Remove the BIT_PACKED encoder #9635 [parquet]
- Pre-reserve output capacity in ByteView/ByteArray dictionary decoding #9587 [parquet]
- Fuse RLE decoding and view gathering for StringView dictionary decoding #9582 [parquet]
- Use branchless index clamping and add get_batch_direct to RleDecoder #9581 [parquet]
- Reduce per-byte overhead in VLQ integer decoding #9580 [parquet]
- feat(parquet): batch RLE runs in level encoder via scan-ahead #9830 [parquet] (HippoBaro)
- fix: lazy-init zstd compression contexts to avoid unnecessary FFI calls #9808 [arrow] (mbutrovich)
- parquet: O(1) skip for bw=0 miniblocks in DeltaBitPackDecoder #9786 [parquet] (sahuagin)
- chore: add benchmark for row filters with LIMIT short-circuit #9767 [parquet] (haohuaijin)
- Push LIMIT/OFFSET into the last RowFilter predicate and skip unused row groups #9766 [parquet] (haohuaijin)
- feat(ipc): Remove per-message flush in IPC writer hot path #9763 [arrow] (pchintar)
- perf(parquet): Defer fixed length byte array buffer alloc and skip zero-batch init #9756 [parquet] (lyang24)
- feat(parquet): batch consecutive null/empty rows in write_list #9752 [parquet] (HippoBaro)
- Remove len field from buffer builder #9750 [arrow] (cetra3)
- perf(parquet): Vectorize dict-index bounds check in RleDecoder::get_batch_with_dict (up to -7.9%) #9746 [parquet] (Dandandan)
- feat(parquet): precompute offset_index_disabled at build-time #9724 [parquet] (HippoBaro)
- [Parquet] Improve dictionary decoder by unrolling loops #9662 [parquet] (Dandandan)
- [Json] Use partition and take in RunEndEncoded decoder #9658 [arrow] (liamzwbao)
- Improve take performance on List arrays #9643 [arrow] (AdamGS)
- [Json] Replace ArrayData with typed Array construction in json-reader #9497 [arrow] (liamzwbao)
- feat(parquet): stream-encode definition/repetition levels incrementally #9447 [parquet] (HippoBaro)
Closed issues:
- Incorrect buffer skipping for V4 Union types in IPC skip_field #9828 [arrow]
- Replace wildcard match in skip_field with explicit DataType handling #9821 [arrow]
- Column projection misalignment for ListView / LargeListView in IPC reader #9805 [arrow]
- Avoid panic on malformed compressed buffer prefix in IPC #9801 [arrow]
- DeltaByteArrayDecoder panics on invalid prefix lengths #9796 [parquet]
- Use NullBufferBuilder when reading json #9781 [arrow]
- Perfectly shredded arrays with top-level null values lose nullability when typed_value is extracted #9701
- [Parquet Metadata] API to determine page-index presence separately from page-index load #9693
- Union cast is incorrect for duplicate field names #9664 [arrow]
- List and ListView are missing take benchmarks #9627 [arrow]
- Support RunEndEncoded arrays in comparison kernels (eq, lt, etc.) #9620 [arrow]
- variant_get should follow JSONpath semantics #9606
- GenericByteViewArray: support finding total length of all strings #9435 [arrow]
Merged pull requests:
- support length() on Run-end encoding arrays #9838 [arrow] (Rich-T-kid)
- fix(ipc): correct skip_field handling for V4 Union #9829 [arrow] (pchintar)
- fix(ipc): replace wildcard in skip_field with explicit DataType handling #9822 [arrow] (pchintar)
- Prevent buffer builder length overflow in MutableBuffer::extend_zeros #9820 [arrow] (alamb)
- Prevent repeat slice length overflow #9819 [arrow] (alamb)
- Prevent BitChunks length overflow #9818 [arrow] (alamb)
- Prevent Rows row index overflow #9817 [arrow] (alamb)
- Prevent ArrayData validation length overflow #9816 [arrow] (alamb)
- [Json] Remove arrow-data dependency from arrow-json #9812 [arrow] (liamzwbao)
- Replace BooleanBufferBuilder with NullBufferBuilder in arrow-json if applicable #9811 [arrow] (liamzwbao)
- refactor(ipc): derive Default for CompressionContext #9809 [arrow] (mbutrovich)
- fix(ipc): reader misalignment when skipping ListView / LargeListView columns #9806 [arrow] (pchintar)
- fix(ipc): Avoid panic on malformed compressed buffer prefix #9802 [arrow] (pchintar)
- parquet: fix panic in DeltaByteArrayDecoder on invalid prefix lengths #9797 [parquet] (pchintar)
- feat(parquet): fuse level encoding with counting and histogram updates #9795 [parquet] (HippoBaro)
- Expose ColumnCloseResult on ArrowColumnChunk #9773 [parquet] (leoyvens)
- feat: make FFI structs fields pub #9772 [arrow] (ashdnazg)
- chore: Refine the error message for List to non List cast #9757 [arrow] (comphead)
- refactor(parquet): replace magic 8 literals with named constants #9751 [parquet] (HippoBaro)
- feat(ipc): add with_skip_validation to StreamDecoder #9749 [arrow] (pantShrey)
- remove panics in unshred variant #9741 (friendlymatthew)
- Add benchmark for ListView interleave #9738 [arrow] (vegarsti)
- arrow-arith: fix 'occured' -> 'occurred' in arity.rs comments #9736 [arrow] (SAY-5)
- Refactor RleEncoder::flush_bit_packed_run to make flow clearer #9735 [parquet] (etseidl)
- Fix RecordBatch::normalize() null bitmap bug and add StructArray::flatten() #9733 [arrow] (sqd)
- Add benchmark for cast from/to decimals #9729 [arrow] (klion26)
- refactor(arrow-avro): use Decoder::flush_block in async reader #9726 [arrow] (mzabaluev)
- fix: ParquetError when reading corrupt parquet file with truncated data instead of Panic #9725 [parquet] (xuzifu666)
- feat(parquet): add wide-schema writer overhead benchmark #9723 [parquet] (HippoBaro)
- fix: correct accounting in DictEncoder::estimated_memory_size, Interner::estimated_memory_size #9720 [parquet] (mzabaluev)
- arrow-ipc: Write 0 offset buffer for length-0 variable-size arrays #9717 [arrow] (atwam)
- [Json] Support FixedSizeList in json decoder #9715 [arrow] (liamzwbao)
- chore(deps): bump actions/upload-pages-artifact from 4 to 5 #9713 (dependabot[bot])
- Fix clippy warning in fixed_size_binary_array.rs #9712 [arrow] (AdamGS)
- feat: add has_non_empty_nulls helper function in OffsetBuffer #9711 [arrow] (rluvaton)
- chore(deps): bump pytest from 7.2.0 to 9.0.3 in /parquet/pytest #9706 [parquet] (dependabot[bot])
- Fedora license audit #9704 [arrow] (michel-slm)
- [Variant] Take top-level nulls into consideration when extracting perfectly shredded children #9702 (AdamGS)
- feat(parquet): add push_decoder benchmark for PushBuffers overhead #9696 [parquet] (HippoBaro)
- Add mutable bitwise operations to BooleanArray and NullBuffer::union_many #9692 [arrow] (mbutrovich)
- chore(deps): update hashbrown requirement from 0.16.0 to 0.17.0 #9691 [parquet] [arrow] (dependabot[bot])
- chore(deps): bump actions/github-script from 8 to 9 #9690 (dependabot[bot])
- minor: Re-enable CDC bench #9686 [parquet] (etseidl)
- [Variant] Add VariantArrayBuilder::append_nulls API #9685 (sdf-jkl)
- feat(parquet): add struct-column writer benchmarks #9679 [parquet] (HippoBaro)
- [Arrow] Add API to check if Field has a valid ExtensionType #9677 [parquet] [arrow] (sdf-jkl)
- [Variant] variant_get should follow JSONPath semantics for Field path element #9676 (sdf-jkl)
- ParquetMetaDataPushDecoder API to clear all buffered ranges #9673 [parquet] (nathanb9)
- Fix union cast incorrectness for duplicate field names #9666 [arrow] (friendlymatthew)
- chore: re-export MAX_INLINE_VIEW_LEN from arrow_data #9665 [arrow] (rluvaton)
- No longer allow BIT_PACKED level encoding in Parquet writer #9656 [parquet] (etseidl)
- feat(parquet): add sparse-column writer benchmarks #9654 [parquet] (HippoBaro)
- Support GenericListViewArray::new_unchecked and refactor ListView json decoder #9648 [arrow] (liamzwbao)
- [Json] Add json reader benchmarks for ListView #9647 [arrow] (liamzwbao)
- fix(parquet): fix CDC panic on nested ListArrays with null entries #9644 [parquet] (kszucs)
- Add a test for reading nested REE data in json #9634 [arrow] (alamb)
- [Variant] Fix variant_get to return List<T> instead of List<Struct> #9631 (liamzwbao)
- ci: use ubuntu-slim runner for lightweight CI jobs #9630 (CuteChuanChuan)
- Add bloom filter folding to automatically size SBBF filters #9628 [parquet] (adriangb)
- Add List and ListView take benchmarks #9626 [arrow] (AdamGS)
- ParquetPushDecoder API to clear all buffered ranges #9624 [parquet] (nathanb9)
- fix: handle missing dictionary batch for null-only columns in IPC reader #9623 [arrow] (joaquinhuigomez)
- Fix MutableBuffer::clear #9622 [parquet] [arrow] (Rafferty97)
- feat[arrow-ord]: support REE comparisons #9621 [arrow] (asubiotto)
- chore(deps): update sha2 requirement from 0.10 to 0.11 #9618 [arrow] (dependabot[bot])
- Expose option to set line terminator for CSV writer #9617 [arrow] (svranesevic)
- [Json] Add json reader benchmarks for Map and REE #9616 [arrow] (liamzwbao)
- deps: fix object_store breakage for 0.13.2 #9612 (mzabaluev-flarion)
- [Variant] Support Binary/LargeBinary children #9610 (AdamGS)
- fix: use writer types in Skipper for resolved named record types #9605 [arrow] (ariel-miculas)
- feat(parquet): derive PartialEq and Eq for CdcOptions #9602 [parquet] (kszucs)
- Add finish_preserve_values to ArrayBuilder trait #9601 [arrow] (adamreichold)
- [Variant] extend shredded null handling for arrays #9599 (sdf-jkl)
- [Variant] Add unshredded Struct fast-path for variant_get(..., Struct) #9597 (sdf-jkl)
- Pre-reserve output capacity in ByteView/ByteArray dictionary decoding #9590 [parquet] (Dandandan)
- [Variant] Align cast logic for variant_get to cast kernel for numeric/bool types #9563 [arrow] (klion26)
- Add support to cast from UnionArray #9544 [arrow] (friendlymatthew)
- Support ListView codec in arrow-json #9503 [arrow] (liamzwbao)
* This Changelog was automatically generated by github_changelog_generator