A new release is here, adding a number of new features and improvements to arrow2. Thank you to everyone that contributed to it!
This release adds support for a new format, the "record" JSON format, contributed by @AnIrishDuck; a new trait, TryExtendFromSelf, to efficiently concatenate an array into an existing mutable array; and multiple performance improvements by @sundy-li and @ritchie46. Finally, there is a new API, OffsetsBuffer and Offsets, proposed by @ritchie46, that allows creating variable-sized arrays without having to check their offsets.
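To illustrate why a dedicated offsets type removes the need for repeated checks, here is a minimal, self-contained sketch. It is not the arrow2 implementation: `CheckedOffsets`, `OffsetsError`, and `value` are hypothetical names invented for this example, and the sketch assumes 32-bit offsets. The idea shown is that the invariants (non-empty, non-negative start, monotonically non-decreasing) are established once in the constructor and preserved by every mutation, so downstream code can trust them.

```rust
use std::convert::TryFrom;

/// Hypothetical error type for this sketch (not part of arrow2).
#[derive(Debug, PartialEq)]
pub enum OffsetsError {
    Empty,
    NegativeStart,
    NotMonotonic,
    Overflow,
}

/// Hypothetical stand-in for a validated offsets container.
/// Invariant: non-empty, first offset >= 0, monotonically non-decreasing.
#[derive(Debug, Clone, PartialEq)]
pub struct CheckedOffsets(Vec<i32>);

impl CheckedOffsets {
    /// Zero items: a single offset `0`.
    pub fn new() -> Self {
        Self(vec![0])
    }

    /// Validates arbitrary offsets once, at construction.
    pub fn try_from_vec(offsets: Vec<i32>) -> Result<Self, OffsetsError> {
        match offsets.first() {
            None => return Err(OffsetsError::Empty),
            Some(first) if *first < 0 => return Err(OffsetsError::NegativeStart),
            _ => {}
        }
        if offsets.windows(2).any(|w| w[0] > w[1]) {
            return Err(OffsetsError::NotMonotonic);
        }
        Ok(Self(offsets))
    }

    /// Appends one item of `length` values; the invariant holds by construction.
    pub fn try_push(&mut self, length: usize) -> Result<(), OffsetsError> {
        let length = i32::try_from(length).map_err(|_| OffsetsError::Overflow)?;
        let last = *self.0.last().expect("always holds at least one offset");
        let next = last.checked_add(length).ok_or(OffsetsError::Overflow)?;
        self.0.push(next);
        Ok(())
    }

    /// Number of items described by the offsets.
    pub fn len(&self) -> usize {
        self.0.len() - 1
    }

    /// Start and end of item `i` in the values buffer, if `i` is in bounds.
    pub fn range(&self, i: usize) -> Option<(usize, usize)> {
        let start = *self.0.get(i)? as usize;
        let end = *self.0.get(i.checked_add(1)?)? as usize;
        Some((start, end))
    }

    pub fn as_slice(&self) -> &[i32] {
        &self.0
    }
}

/// Reads item `i` of a variable-sized string column without re-validating offsets.
pub fn value<'a>(values: &'a str, offsets: &CheckedOffsets, i: usize) -> Option<&'a str> {
    let (start, end) = offsets.range(i)?;
    values.get(start..end)
}
```

Pushing lengths 2, 0, and 3 yields the offsets `[0, 2, 2, 5]`, which split a values buffer such as `"hiyou"` into `"hi"`, `""`, and `"you"`. Invalid input such as `[0, 3, 1]` is rejected once by `try_from_vec` rather than at every use.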
This release also features contributions from a number of first-time contributors:
- @benesch made their first contribution in #1271
- @RinChanNOWWW made their first contribution in #1287
- @datapythonista made their first contribution in #1290
- @sandflee made their first contribution in #1286
- @Samrose-Ahmed made their first contribution in #1279
- @jondo2010 made their first contribution in #1300
- @cyr made their first contribution in #1318
- @universalmind303 made their first contribution in #1321
Thank you all for the great work this year, and happy festivities!
Breaking changes:
- Added values' capacity to MutableBinaryArray::reserve #1277
- Removed from_data from all arrays #1328 (jorgecarleitao)
- Added Offsets and OffsetsBuffer #1316 (jorgecarleitao)
- Bumped parquet2 dependency #1304 (ritchie46)
- Added data_pagesize_limit to write parquet pages #1303 (sundy-li)
- Bumped arrow-format to 0.8 #1298 (Xuanwo)
- Improved iterators #1270 (jorgecarleitao)
New features:
- Added TryExtendFromSelf #1278 (jorgecarleitao)
- Added support for JSON ser/de records layout #1275 (AnIrishDuck)
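As a rough illustration of what concatenating one array into an existing mutable array involves, the sketch below defines a simplified trait and a toy string container. Everything here is an assumption made for the example: the method name and signature are modeled on the trait's name rather than copied from arrow2, and `ToyMutableUtf8` and `Overflow` are hypothetical types. The point shown is that the values are appended in bulk and the incoming offsets are shifted by the current values length, instead of pushing items one by one.

```rust
use std::convert::TryFrom;

/// Hypothetical error: the combined values no longer fit in 32-bit offsets.
#[derive(Debug, PartialEq)]
pub struct Overflow;

/// Simplified stand-in for the trait; the signature is an assumption.
pub trait TryExtendFromSelf {
    fn try_extend_from_self(&mut self, other: &Self) -> Result<(), Overflow>;
}

/// Toy mutable string array: a values buffer plus offsets starting at 0.
#[derive(Debug)]
pub struct ToyMutableUtf8 {
    offsets: Vec<i32>,
    values: Vec<u8>,
}

impl ToyMutableUtf8 {
    pub fn new() -> Self {
        Self { offsets: vec![0], values: Vec::new() }
    }

    /// Appends a single string, failing if the offsets would overflow.
    pub fn push(&mut self, s: &str) -> Result<(), Overflow> {
        let end = i32::try_from(self.values.len() + s.len()).map_err(|_| Overflow)?;
        self.values.extend_from_slice(s.as_bytes());
        self.offsets.push(end);
        Ok(())
    }

    pub fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    pub fn get(&self, i: usize) -> Option<&str> {
        let start = *self.offsets.get(i)? as usize;
        let end = *self.offsets.get(i.checked_add(1)?)? as usize;
        std::str::from_utf8(self.values.get(start..end)?).ok()
    }
}

impl TryExtendFromSelf for ToyMutableUtf8 {
    fn try_extend_from_self(&mut self, other: &Self) -> Result<(), Overflow> {
        // Check that the combined length fits before mutating anything.
        let shift = i32::try_from(self.values.len()).map_err(|_| Overflow)?;
        let other_last = *other.offsets.last().expect("offsets always hold at least 0");
        shift.checked_add(other_last).ok_or(Overflow)?;

        // Bulk-copy the values, then shift the incoming offsets (skipping the leading 0).
        self.values.extend_from_slice(&other.values);
        self.offsets.extend(other.offsets[1..].iter().map(|o| o + shift));
        Ok(())
    }
}
```

Extending an array holding `"ab"` with one holding `""` and `"cde"` leaves three items readable through `get`, with the second array's offsets shifted by 2.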
Fixed bugs:
- Parquet writes all values of sliced arrays? #1323
- Avro schema: Invalid record names #1269
- Fixed writing nested/sliced arrays to parquet #1326 (ritchie46)
- Fixed failing to accept dictionary full of nulls #1312 (ritchie46)
- Added support for Extension types in ffi #1300 (jondo2010)
- Fixed error in memory usage of sliced binary/list/utf8arrays #1293 (ritchie46)
- Fixed descending ordering when specify nulls first #1286 (sandflee)
- Added avro record names when converting arrow schema to avro #1279 (Samrose-Ahmed)
Enhancements:
- Fixed clippy #1336 (jorgecarleitao)
- Improved UnionArray #1331 (jorgecarleitao)
- Bumped json-deserializer version #1321 (universalmind303)
- Removed flushing during arrow IPC writing to improve performance when using a buffered writer #1318 (cyr)
- Improved performance of check_indexes #1313 (ritchie46)
- Improved performance of checking offsets (~-64-73%) #1305 (ritchie46)
- Added reserve to pushable containers in parquet extend_from_decoder #1301 (ritchie46)
- Optimized slicing #1285 (jorgecarleitao)
- Improved ZipValidity iterators #1284 (ritchie46)
- Added MutableBinaryValuesArray #1276 (jorgecarleitao)
Documentation updates:
- Fixed link from the API to the guide #1290 (datapythonista)