Changelog
57.1.0 (2025-11-20)
Implemented enhancements:
- Eliminate bound checks in filter kernels #8865 [arrow]
- Respect page index policy option for ParquetObjectReader when it's not skip #8856 [parquet]
- Speed up collect_bool and remove `unsafe` #8848 [arrow]
- Error reading parquet FileMetaData with empty lists encoded as element-type=0 #8826 [parquet]
- ValueStatistics methods can't be used from generic context in external crate #8823 [parquet]
- Custom Pretty-Printing Implementation for Column when Formatting Record Batches #8821 [arrow]
- Parquet-concat: supports bloom filter and page index #8804 [parquet]
- [Parquet] virtual row group number support #8800
- [Variant] Enforce shredded-type validation in `shred_variant` #8795 [arrow]
- Simplify decision logic to call `FilterBuilder::optimize` or not #8781 [arrow]
- [Variant] Add variant to arrow for DataType::{Binary, LargeBinary, BinaryView} #8767 [arrow]
- Provide algorithm that allows zipping arrays whose values are not prealigned #8752 [arrow]
- [Parquet] ParquetMetadataReader decodes too much metadata under point-get scenario #8751 [parquet]
- `arrow-json` supports encoding binary arrays, but not decoding #8736 [arrow]
- Allow `FilterPredicate` instances to be reused for RecordBatches #8692 [arrow]
- ArrowJsonBatch::from_batch is incomplete #8684 [arrow]
- parquet-layout: More info about layout including footer size, page index, bloom filter? #8682 [parquet]
- Rewrite `ParquetRecordBatchStream` (async API) in terms of the PushDecoder #8677 [parquet]
- [JSON] Add encoding for binary view #8674 [arrow]
- Refactor arrow-cast decimal casting to unify the rescale logic used in Parquet variant casts #8670 [arrow]
- [Variant] Support Uuid/`FixedSizeBinary(16)` shredding #8665
- [Parquet] There should be an encoding counter to know how many encodings the repo supports in total #8662 [parquet]
- Improve `parse_data_type` for `List`, `ListView`, `LargeList`, `LargeListView`, `FixedSizeList`, `Union`, `Map`, `RunEndEncoded`. #8648 [arrow]
- [Variant] Support variant to arrow primitive support null/time/decimal_* #8637
- Return error from `RleDecoder::reset` rather than panic #8632 [parquet]
- Add bitwise ops on `BooleanBufferBuilder` and `MutableBuffer` that mutate directly the buffer #8618 [arrow]
- [Variant] Add variant_to_arrow Utf-8, LargeUtf8, Utf8View types support #8567 [arrow]
Fixed bugs:
- Regression: Parsing `List(Int64)` results in nullable list in 57.0.0 and a non-nullable list in 57.1.0 #8883
- Regression: FixedSizeList data type parsing fails on 57.1.0 #8880
- (dyn ArrayFormatterFactory + 'static) can't be safely shared between threads #8875
- RowNumber reader has wrong row group ordering #8864 [parquet]
- `ThriftMetadataWriter::write_column_indexes` cannot handle a `ColumnIndexMetaData::NONE` #8815 [parquet]
- "Archery test With other arrows" Integration test failing on main: #8813 [arrow]
- [Parquet] Writing in 57.0.0 seems 10% slower than 56.0.0 #8783 [parquet]
- Parquet reader cannot handle files with unknown logical types #8776 [parquet]
- zip now treats nulls as false in provided mask regardless of the underlying bit value #8721 [arrow]
- [avro] Incorrect version in crates.io landing page #8691 [arrow]
- Array: ViewType gc() has bug when array sum length exceeds i32::MAX #8681 [arrow]
- Parquet 56: encounter `error: item_reader def levels are None` when reading nested field with row filter #8657 [parquet]
- Degenerate and non-nullable `FixedSizeListArray`s are not handled #8623 [arrow]
- [Parquet] Performance Degradation with RowFilter on Unsorted Columns due to Fragmented ReadPlan #8565 [parquet]
Documentation updates:
- docs: Add example for creating a `MutableBuffer` from `Buffer` #8853 [arrow] (alamb)
- docs: Add examples for creating MutableBuffer from Vec #8852 [arrow] (alamb)
- Improve ParquetDecoder docs #8802 [parquet] (alamb)
- Update docs for zero copy conversion of ScalarBuffer #8772 [arrow] (alamb)
- Add example to convert `PrimitiveArray` to a `Vec` #8771 [arrow] (alamb)
- docs: Add links for arrow-avro #8770 [arrow] (alamb)
- [Parquet] Minor: Update comments in page decompressor #8764 [parquet] (alamb)
- Document limitations of the `arrow_integration_test` crate #8738 [arrow] (phil-opp)
- docs: Add link to the Arrow implementation status page #8732 [arrow] (alamb)
- docs: Update Parquet readme implementation status #8731 [parquet] (alamb)
Performance improvements:
- `RowConverter::from_binary` should opportunistically take ownership of the buffer #8685 [arrow]
- Speed up filter some more (up to 2x) #8868 [arrow] (Dandandan)
- Speed up `collect_bool` and remove `unsafe`, optimize `take_bits`, `take_native` for null values #8849 [arrow] (Dandandan)
- Change `BooleanBuffer::append_packed_range` to use `apply_bitwise_binary_op` #8812 [arrow] (alamb)
- [Parquet] Avoid copying `LogicalType` in `ColumnOrder::get_sort_order`, deprecate `get_logical_type` #8789 [parquet] (alamb)
- perf: Speed up Parquet file writing (10%, back to speed of 56) #8786 [parquet] (etseidl)
- perf: override `ArrayIter` default impl for `nth`, `nth_back`, `last` and `count` #8785 [arrow] (rluvaton)
- [Parquet] Reduce one copy in `SerializedPageReader` #8745 [parquet] (XiangpengHao)
- Small optimization in Parquet varint decoder #8742 [parquet] (etseidl)
- perf: override `count`, `nth`, `nth_back`, `last` and `max` for BitIterator #8696 [arrow] (rluvaton)
- Add `FilterPredicate::filter_record_batch` #8693 [arrow] (pepijnve)
- perf: zero-copy path in `RowConverter::from_binary` #8686 [arrow] (mzabaluev)
- perf: add optimized zip implementation for scalars #8653 [arrow] (rluvaton)
- feat: add `apply_unary_op` and `apply_binary_op` bitwise operations #8619 [arrow] (rluvaton)
- [Parquet] Optimize the performance in record reader #8607 [parquet] (hhhizzz)
Closed issues:
- Variant to NullType conversion ignores strict casting #8810
- Unify display representation for `Field` #8784
- Misleading configuration name: skip_arrow_metadata #8780
- Inconsistent display for types with Metadata #8761 [arrow]
- Internal `arrow-integration-test` crate is linked from `arrow` docs #8739 [arrow]
- Add benchmark for RunEndEncoded casting #8709 [arrow]
- [Variant] Support `VariantArray::value` to return a `Result<Variant>` #8672
Merged pull requests:
- Fix regression caused by changes in Display for DataType - display (`List(non-null Int64)` instead of `List(nullable Int64)`) #8890 [parquet] [arrow] (etseidl)
- Support parsing for old style FixedSizeList #8882 [arrow] (alamb)
- Make ArrayFormatterFactory Send + Sync and add a test #8878 [arrow] (tobixdev)
- Make `ArrowReaderOptions::with_virtual_columns` error rather than panic on invalid input #8867 [parquet] (alamb)
- Fix errors when reading nested Lists with pushdown predicates. #8866 [parquet] (alamb)
- Fix `RowNumberReader` when not all row groups are selected #8863 [parquet] (vustef)
- Respect page index policy option for ParquetObjectReader when it's not skip #8857 [parquet] (zhuqi-lucas)
- build(deps): update apache-avro requirement from 0.20.0 to 0.21.0 #8832 [arrow] (dependabot[bot])
- Allow Users to Provide Custom `ArrayFormatter`s when Pretty-Printing Record Batches #8829 [arrow] (tobixdev)
- Allow reading of improperly constructed empty lists in Parquet metadata #8827 [parquet] (etseidl)
- [Variant] Fix cast logic for Variant to Arrow for DataType::Null #8825 (klion26)
- remove T: ParquetValueType bound on ValueStatistics #8824 [parquet] (pmarks)
- build(deps): update lz4_flex requirement from 0.11 to 0.12 #8820 [parquet] [arrow] (dependabot[bot])
- Fix bug in handling of empty Parquet page index structures #8817 [parquet] (etseidl)
- Parquet-concat: supports page index and bloom filter #8811 [parquet] (mapleFU)
- [Doc] Correct `ListArray` documentation #8803 [arrow] (liamzwbao)
- [Parquet] Add additional docs for `ArrowReaderOptions` and `ArrowReaderMetadata` #8798 [parquet] (alamb)
- [Variant] Enforce shredded-type validation in `shred_variant` #8796 (liamzwbao)
- Add `VariantPath::is_empty` #8791 (friendlymatthew)
- Add FilterBuilder::is_optimize_beneficial #8782 [arrow] (pepijnve)
- [Parquet] Allow reading of files with unknown logical types #8777 [parquet] (etseidl)
- bench: add `ArrayIter` benchmarks #8774 [arrow] (rluvaton)
- Update Rust toolchain to 1.91 #8769 [parquet] [arrow] (mbrobbel)
- [Variant] Add variant to arrow for `DataType::{Binary/LargeBinary/BinaryView}` #8768 [arrow] (klion26)
- feat: parse `DataType::Union`, `DataType::Map`, `DataType::RunEndEncoded` #8765 [arrow] (dqkqd)
- Add options to control various aspects of Parquet metadata decoding #8763 [parquet] (etseidl)
- feat: Ensure consistent metadata display for data types #8760 [arrow] (mhilton)
- Clean up predicate_cache tests #8755 [parquet] (alamb)
- refactor `test_cache_projection_excludes_nested_columns` to use high level APIs #8754 [parquet] (alamb)
- Add `merge` and `merge_n` kernels #8753 [arrow] (pepijnve)
- Fix lint in arrow-flight by updating assert_cmd after it upgraded #8741 [arrow] [arrow-flight] (vegarsti)
- Remove link to internal `arrow-integration-test` crate from main `arrow` crate #8740 [arrow] (phil-opp)
- Implement hex decoding of JSON strings to binary arrays #8737 [arrow] (phil-opp)
- [Parquet] Adaptive Parquet Predicate Pushdown #8733 [parquet] (hhhizzz)
- [Parquet] Return error from `RleDecoder::reload` rather than panic #8729 [parquet] (liamzwbao)
- fix: `ArrayIter` does not report size hint correctly after advancing from the iterator back #8728 [arrow] (rluvaton)
- perf: Use Vec::with_capacity in cast_to_run_end_encoded #8726 [arrow] (vegarsti)
- [Variant] Fix the index of an item in VariantArray in a unit test #8725 (martin-g)
- build(deps): bump actions/download-artifact from 5 to 6 #8720 (dependabot[bot])
- [Variant] Add try_value/value for VariantArray #8719 (klion26)
- General virtual columns support + row numbers as a first use-case #8715 [parquet] (vustef)
- feat: Parquet-layout add Index and Footer info #8712 [parquet] (mapleFU)
- fix: `zip` now treats nulls as false in provided mask regardless of the underlying bit value #8711 [arrow] (rluvaton)
- Add benchmark for casting to RunEndEncoded (REE) #8710 [arrow] (vegarsti)
- [Minor]: Document visibility for enums produced by Thrift macros #8706 [parquet] (etseidl)
- Update `arrow-avro` `README.md` version to 57 #8695 [arrow] (jecsand838)
- Fix: ViewType gc on huge batch would produce bad output #8694 [arrow] (mapleFU)
- Refactor arrow-cast decimal casting to unify the rescale logic used in Parquet variant casts #8689 [arrow] (liamzwbao)
- check bit width to avoid panic in DeltaBitPackDecoder #8688 [parquet] (rambleraptor)
- [thrift-remodel] Use `thrift_enum` macro for `ConvertedType` #8680 [parquet] (etseidl)
- [JSON] Map key supports utf8 view #8679 [arrow] (mapleFU)
- [JSON] Add encoding for binary view #8675 [arrow] (mapleFU)
- [Parquet] Account for FileDecryptor in ParquetMetaData heap size calculation #8671 [parquet] (adamreeve)
- chore: update `OffsetBuffer::from_lengths(std::iter::repeat_n(<val>, <repeat>));` with `OffsetBuffer::from_repeated_length(<val>, <repeat>);` #8669 [arrow] (rluvaton)
- [Variant] Support `shred_variant` for Uuids #8666 (friendlymatthew)
- [Variant] Remove `create_test_variant_array` helper method #8664 (friendlymatthew)
- [parquet] Adding counting method in thrift_enum macro to support ENCODING_SLOTS #8663 [parquet] (hhhizzz)
- chore: add test case of RowSelection::trim #8660 [parquet] (lichuang)
- feat: add `new_repeated` to `ByteArray` #8659 [arrow] (rluvaton)
- perf: add `repeat_slice_n_times` to `MutableBuffer` #8658 [arrow] (rluvaton)
- perf: add optimized function to create offset with same length #8656 [arrow] (rluvaton)
- [Variant] `rescale_decimal` followup #8655 [arrow] (liamzwbao)
- feat: parse DataType `List`, `ListView`, `LargeList`, `LargeListView`, `FixedSizeList` #8649 [arrow] (dqkqd)
- Support more operations on ListView #8645 [arrow] (a10y)
- [Variant] Implement primitive type access for null/time/decimal* #8638 (klion26)
- [Variant] refactor: Split builder.rs into several smaller files #8635 (Weijun-H)
- add `try_new_with_length` constructor to `FixedSizeList` #8624 [arrow] (connortsui20)
- Change some panics to errors in parquet decoder #8602 [parquet] (rambleraptor)
- Support `variant_to_arrow` for utf8 #8600 [arrow] (sdf-jkl)
- Cast support for RunEndEncoded arrays #8589 [arrow] (vegarsti)
* This Changelog was automatically generated by github_changelog_generator