Changelog
54.1.0 (2025-01-29)
Implemented enhancements:
- Create GitHub releases automatically on tagging #7041
- Add required methods to access inner builder for NullBufferBuilder #7002 [arrow]
- Re-export NullBufferBuilder in the arrow crate #6975 [arrow]
- arrow-string function should support binary input as well #6923 [arrow]
- MMap support for IPC files #6709 [arrow]
- fix: mark (Large)ListView as nested and support in equal data type #6995 [arrow] (rluvaton)
- Expose min/max values for Decimal128/256 and improve docs #6992 [arrow] (alamb)
- [Parquet] Improve speed of dictionary encoding NaN float values #6953 [parquet] (adamreeve)
- Optimize BooleanBufferBuilder for non-nullable columns #6973 [arrow]
- arrow::compute::concat should merge dictionary type when concatenating list of dictionaries #6888 [arrow]
- Improve error message for unsupported cast between struct and other types #6724 [arrow]
- implement regexp_match, regexp_scalar_match and regexp_array_match for StringViewArray #6717 [arrow]
- Speed up Parquet utf8 validation #6667 [parquet]
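The NaN dictionary-encoding entries (#6952, #6953) are easier to follow with one pitfall in mind: under IEEE 754 equality NaN is not equal to itself, so an encoder that looks values up by float equality never finds a NaN it has already stored. The std-only Rust sketch below illustrates the general technique under that assumption; it is not the parquet crate's implementation, and the F64DictEncoder type is hypothetical. It keys entries by each value's bit pattern (f64::to_bits), so repeated NaNs share one dictionary entry.

```rust
use std::collections::HashMap;

/// Hypothetical dictionary encoder for f64 values (illustration only,
/// not the parquet crate's interner). Entries are keyed by the IEEE 754
/// bit pattern, so identical NaNs map to a single dictionary entry.
/// Note: 0.0 and -0.0 have different bit patterns and get separate entries.
struct F64DictEncoder {
    index: HashMap<u64, u32>,
    dictionary: Vec<f64>,
}

impl F64DictEncoder {
    fn new() -> Self {
        Self { index: HashMap::new(), dictionary: Vec::new() }
    }

    /// Returns the dictionary key for `v`, appending `v` if it is new.
    fn encode(&mut self, v: f64) -> u32 {
        let next = self.dictionary.len() as u32;
        let key = *self.index.entry(v.to_bits()).or_insert(next);
        if key == next {
            // First time this bit pattern was seen: store the value.
            self.dictionary.push(v);
        }
        key
    }
}

fn main() {
    let mut enc = F64DictEncoder::new();
    let keys: Vec<u32> = [1.5, f64::NAN, 1.5, f64::NAN]
        .iter()
        .map(|&v| enc.encode(v))
        .collect();
    // Two distinct entries: 1.5 and one shared NaN.
    println!("keys = {:?}, dictionary size = {}", keys, enc.dictionary.len());
    // prints: keys = [0, 1, 0, 1], dictionary size = 2
}
```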
Fixed bugs:
- Regression: Concatenating sliced ListArrays is broken #7034
- PrimitiveDictionaryBuilder with specific value data type and capacity #7011 [arrow]
- Arrow IPC Writer Panics for sliced nested arrays #6997 [arrow]
- RecordBatch with no columns cannot be roundtripped through Parquet #6988 [parquet]
- StringView: Using the Interleave kernel (and potentially others) results in many repeated buffers in variadic_buffers #6780 [arrow]
- fix prefetch of page index #6999 [parquet] (adriangb)
- fix: Parquet column writer Dictionary(_, Decimal128) and Dictionary(_, Decimal256) #6987 [parquet] (korowa)
- Writing floating point values containing NaN to Parquet is slow when using dictionary encoding #6952 [parquet] [arrow]
- Public API using private types: Buffer::from_bytes takes unexported Bytes #6754 [parquet] [arrow] [arrow-flight]
- Some MSRVs are inaccurate #6741 [parquet] [arrow] [arrow-flight]
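Several fixes in this release (#7034, #6997) concern sliced nested arrays. A sliced list array keeps referring to its original child values, so its first offset is generally not zero; code that concatenates or serializes such arrays has to rebase the offsets and copy only the referenced child range. The std-only sketch below illustrates that technique with a hypothetical ListSlice type (i32 offsets plus i32 child values); it is not the arrow crate's concat or IPC writer code.

```rust
/// Hypothetical minimal list layout: `offsets` has `len + 1` entries that
/// index into `values`. A sliced list may start at a non-zero offset.
struct ListSlice<'a> {
    offsets: &'a [i32],
    values: &'a [i32],
}

/// Concatenates list slices into fresh (offsets, values) buffers.
/// Each input must have at least one offset entry.
fn concat_lists(inputs: &[ListSlice<'_>]) -> (Vec<i32>, Vec<i32>) {
    let mut offsets = vec![0i32];
    let mut values = Vec::new();
    for list in inputs {
        let first = list.offsets[0];
        let last = list.offsets[list.offsets.len() - 1];
        // Copy only the child values this slice actually references.
        values.extend_from_slice(&list.values[first as usize..last as usize]);
        // Rebase each offset relative to `first`, then shift it by the
        // number of values already written to the output.
        let base = offsets[offsets.len() - 1];
        offsets.extend(list.offsets[1..].iter().map(|&o| base + (o - first)));
    }
    (offsets, values)
}

fn main() {
    // Values [1..=6] with offsets [0, 2, 3, 6] hold [1, 2], [3], [4, 5, 6];
    // slicing off the first list leaves offsets [2, 3, 6].
    let values = [1, 2, 3, 4, 5, 6];
    let sliced = ListSlice { offsets: &[2, 3, 6], values: &values };
    let other = ListSlice { offsets: &[0, 1], values: &[9] };
    let (offsets, out) = concat_lists(&[sliced, other]);
    println!("offsets = {:?}, values = {:?}", offsets, out);
    // prints: offsets = [0, 1, 4, 5], values = [3, 4, 5, 6, 9]
}
```

Copying the sliced offsets [2, 3, 6] verbatim, without subtracting the first offset, would make the output lists point past the copied values, which is the class of bug the entries above describe.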
Documentation updates:
- docs: add to bit slice iterator docs that the start value is inclusive and end value is exclusive #7022 [arrow] (rluvaton)
- Fix duplicate link references in README #7020 (Jefffrey)
- Enhance ListViewArray related docs #7007 [arrow] (Jefffrey)
- Document data type support and examples to predicates *like, starts_with, ends_with, contains #7003 [arrow] (alamb)
- Minor: improve documentation on timezone representations #7000 [arrow] (alamb)
- Add additional documentation for UTC representation of timestamps #6994 [arrow] (Abdullahsab3)
- Improve ParquetRecordBatchStreamBuilder docs / examples #6948 [parquet] (alamb)
- Document the ParquetRecordBatchStream buffering #6947 [parquet] (alamb)
- Minor: improve zip kernel docs, add examples #6928 [arrow] (alamb)
- Add doctest example for Buffer::from_bytes #6920 [arrow] (kylebarron)
- [object store] Add planned object_store release schedule to crate readme #6904 (alamb)
- Avoid panics? #6737 [parquet]
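#7022 documents that the bit slice iterator's ranges are half-open: the start value is inclusive and the end value is exclusive. To illustrate that convention only (a std-only sketch, not arrow-buffer's BitSliceIterator), the hypothetical set_bit_slices function below scans an LSB-first bitmap, as used for Arrow validity buffers, and returns each run of set bits as a (start, end) pair.

```rust
/// Hypothetical helper: returns runs of set bits as half-open ranges.
/// `start` is the first set bit of a run (inclusive); `end` is one past
/// the last set bit (exclusive). Bits are read LSB-first within each byte.
fn set_bit_slices(bitmap: &[u8], len: usize) -> Vec<(usize, usize)> {
    let mut slices = Vec::new();
    let mut run_start: Option<usize> = None;
    for i in 0..len {
        let is_set = (bitmap[i / 8] >> (i % 8)) & 1 == 1;
        match (is_set, run_start) {
            // A run begins at the first set bit.
            (true, None) => run_start = Some(i),
            // A run ends at the first unset bit, which is excluded.
            (false, Some(start)) => {
                slices.push((start, i));
                run_start = None;
            }
            _ => {}
        }
    }
    // A run that reaches the end of the bitmap ends at `len` (exclusive).
    if let Some(start) = run_start {
        slices.push((start, len));
    }
    slices
}

fn main() {
    // 0b0110_1110: bits 1, 2, 3, 5 and 6 are set.
    println!("{:?}", set_bit_slices(&[0b0110_1110], 8));
    // prints: [(1, 4), (5, 7)]
}
```

With half-open ranges, end - start is the run length, and a run can be passed directly to slicing such as values[start..end].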
Merged pull requests:
- Create GitHub releases automatically on tagging #7042 (kou)
- Fix concat for sliced ListArrays #7037 [arrow] (alamb)
- Minor: Clarify NullBufferBuilder::new capacity parameter #7016 [arrow] (alamb)
- Add is_valid and truncate methods to NullBufferBuilder #7013 [arrow] (Chen-Yuan-Lai)
- fix: use the values builder capacity for the hash map in PrimitiveDictionaryBuilder::new_from_builders #7012 [arrow] (rluvaton)
- Refactor ipc reading code into methods on ArrayReader #7006 [arrow] (alamb)
- Minor: make it clear Predicate is crate private #7001 [arrow] (alamb)
- fix: Panic on reencoding offsets in arrow-ipc with sliced nested arrays #6998 [arrow] (HawaiianSpork)
- Add check for empty schema in parquet::schema::types::from_thrift_helper #6990 [parquet] (etseidl)
- Add example reading data from an mmapped IPC file #6986 [arrow] (alamb)
- Improve arrow-ipc documentation #6983 [arrow] (alamb)
- Add simdutf8 feature to make simdutf8 optional, consolidate check_valid_utf8 #6979 [parquet] (alamb)
- Export NullBufferBuilder along with BooleanBufferBuilder in arrow crate #6976 [arrow] (alamb)
- Minor: improve the documentation of NullBuffer and BooleanBuffer #6974 [arrow] (alamb)
- Simplify Validation/Alignment APIs of ArrayDataBuilder: validate and align #6966 [arrow] (alamb)
- Fix WASM CI for Rust 1.84 release #6963 (alamb)
- [Parquet] Add benchmark and test for writing NaNs to Parquet #6955 [parquet] [arrow] (adamreeve)
- Add peek_next_page_offset to SerializedPageReader #6945 [parquet] (XiangpengHao)
- Improve Buffer documentation, deprecate Buffer::from_bytes, add From<Bytes> and From<bytes::Bytes> impls #6939 [parquet] [arrow] [arrow-flight] (alamb)
- minor: fix test and remove println in tests #6935 [arrow] (himadripal)
- Document how to use Extend for generic methods on ArrayBuilders #6932 [arrow] (wiedld)
- [Parquet] Add projection utility functions #6931 [parquet] (XiangpengHao)
- [Parquet] Reuse buffer in ByteViewArrayDecoderPlain #6930 [parquet] (XiangpengHao)
- Support Binary arrays in starts_with, ends_with and contains #6926 [arrow] (rluvaton)
- Improve the error message for casting between struct and non-struct types #6919 [arrow] (takaebato)
- Fix error message typos with Parquet compression #6918 [parquet] (orf)
- Expose arrow-schema methods, for use when writing parquet outside of ArrowWriter #6916 [parquet] (wiedld)
- feat(arrow-ord): support boolean in rank and add tests for sorting lists of booleans #6912 [arrow] (rluvaton)
- chore(arrow-ord): move can_rank to the rank file #6910 [arrow] (rluvaton)
- feat(parquet): Add next_row_group API for ParquetRecordBatchStream #6907 [parquet] (Xuanwo)
- feat(arrow-select): concat kernel will merge dictionary values for list of dictionaries #6893 [arrow] (rluvaton)
- add extend_dictionary in dictionary builder for improved performance #6875 [arrow] (rluvaton)
- [arrow-string] Implement string view support for regexp_match #6849 [arrow] (tlm365)
- Add support for StringView / BinaryView in interleave kernel #6779 [arrow] (onursatici)
- RecordBatch normalization (flattening) #6758 [arrow] (ngli-me)
- Convert some panics that happen on invalid parquet files to error results #6738 [parquet] (jp0317)
- Faster parquet utf8 validation using simdutf8 #6668 [parquet] (Dandandan)
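#6979 makes the simdutf8 dependency optional behind a Cargo feature, presumably with a standard-library fallback when the feature is off. The sketch below shows that feature-gating pattern only; the check_valid_utf8 signature and the String error type are assumptions for illustration, not the parquet crate's actual function. Without the simdutf8 feature it compiles against std alone.

```rust
/// Hypothetical sketch: with the `simdutf8` feature enabled, validation is
/// delegated to the simdutf8 crate's fast pass/fail API.
#[cfg(feature = "simdutf8")]
fn check_valid_utf8(bytes: &[u8]) -> Result<(), String> {
    simdutf8::basic::from_utf8(bytes)
        .map(|_| ())
        .map_err(|_| "invalid UTF-8".to_string())
}

/// Hypothetical fallback: without the feature, use the standard library,
/// whose error also reports where the invalid sequence starts.
#[cfg(not(feature = "simdutf8"))]
fn check_valid_utf8(bytes: &[u8]) -> Result<(), String> {
    std::str::from_utf8(bytes)
        .map(|_| ())
        .map_err(|e| format!("invalid UTF-8: {e}"))
}

fn main() {
    println!("{:?}", check_valid_utf8("héllo".as_bytes())); // Ok(())
    println!("{:?}", check_valid_utf8(&[0xff, 0xfe]).is_err()); // true
}
```

Keeping both definitions behind mutually exclusive cfg attributes lets callers use a single name whichever feature set is compiled.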
* This Changelog was automatically generated by github_changelog_generator