A new version (0.13) is now available on crates.io! 🎉🎉🎉🎉
This is another large release of arrow2. Among the many, many changes (see below), it is worth noting:
- Added copy-on-write API to perform operations in place, improving performance of expressions like `(a + b) * 2` by a factor of 2-10x
- Added support to read from Apache ORC format
- Added support for projection and limit pushdown when reading from Arrow IPC format
- Added support for `f16`
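To illustrate why copy-on-write helps an expression such as `(a + b) * 2`, here is a minimal sketch that uses only the standard library's `Arc`. It is not arrow2's implementation: the function names `unary_assign_cow` and `binary_assign_cow` are hypothetical, and a plain `Arc<Vec<i32>>` stands in for an array buffer. The idea is that a uniquely owned buffer can be mutated in place, while a shared one is cloned first.

```rust
use std::sync::Arc;

/// Applies `op` to every value. The allocation is reused when `values` is the
/// only handle to it; otherwise the data is cloned before being modified.
/// Returns `true` if the update happened in place.
fn unary_assign_cow(values: &mut Arc<Vec<i32>>, op: impl Fn(i32) -> i32) -> bool {
    // `get_mut` succeeds only when no other handle points at the data.
    let in_place = Arc::get_mut(values).is_some();
    // `make_mut` clones the vector only if it is shared.
    for v in Arc::make_mut(values).iter_mut() {
        *v = op(*v);
    }
    in_place
}

/// Combines `lhs` with `rhs` element-wise and stores the result in `lhs`,
/// with the same copy-on-write behaviour as `unary_assign_cow`.
fn binary_assign_cow(
    lhs: &mut Arc<Vec<i32>>,
    rhs: &[i32],
    op: impl Fn(i32, i32) -> i32,
) -> bool {
    assert_eq!(lhs.len(), rhs.len(), "operands must have equal length");
    let in_place = Arc::get_mut(lhs).is_some();
    for (l, r) in Arc::make_mut(lhs).iter_mut().zip(rhs) {
        *l = op(*l, *r);
    }
    in_place
}
```

With a uniquely owned `a`, the expression becomes `binary_assign_cow(&mut a, &b, |x, y| x + y)` followed by `unary_assign_cow(&mut a, |x| x * 2)`, and neither step allocates a new buffer in this sketch.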
Thank you to the numerous contributors, both via PRs and issues, whose work resulted in this fantastic release 🙇
Breaking changes:
- Made `nested` argument of `array_to_pages` non-owning #1174
- Replaced `Result` by `panic` in boolean comparison #1159 (jorgecarleitao)
- Improved dictionary invariants #1137 (jorgecarleitao)
- Change signature of PrimitiveScalar::value to return reference #1129 (ncpenke)
- Removed need to pass encodings by value #1123 (ritchie46)
- Removed unused `NativeType::to_ne_bytes` #1112 (jorgecarleitao)
- Avoid clone in `with_validity` #1104 (jorgecarleitao)
- Reduced need of `unsafe` in FFI #1100 (jorgecarleitao)
- Removed `Buffer::into_mut` and `make_mut` functions #1089 (jorgecarleitao)
- Renamed `Bitmap::null_count` to `Bitmap::unset_bits` #1087 (jorgecarleitao)
- Made `chunk_size` optional in parquet's `column_iter_to_arrays` #1055 (jorgecarleitao)
- Migrated from `Arc<dyn Array>` to `Box<dyn Array>` #1042 (jorgecarleitao)
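Moving from `Arc<dyn Array>` to `Box<dyn Array>` raises the question of how a boxed trait object is cloned, since `Clone` cannot be a supertrait of an object-safe trait. The sketch below shows one common pattern, using a hypothetical `Column` trait and `Int32Column` type rather than arrow2's own `Array`: the trait exposes a method that returns a boxed copy, and `Clone` is implemented for the box on top of it.

```rust
/// Hypothetical stand-in for a dynamically typed array.
trait Column {
    fn len(&self) -> usize;
    /// Returns a boxed copy of this column.
    fn to_boxed(&self) -> Box<dyn Column>;
}

#[derive(Clone)]
struct Int32Column(Vec<i32>);

impl Column for Int32Column {
    fn len(&self) -> usize {
        self.0.len()
    }

    fn to_boxed(&self) -> Box<dyn Column> {
        // In this sketch the copy is deep: the whole vector is duplicated.
        Box::new(self.clone())
    }
}

/// Lets `Box<dyn Column>` (and containers of it) be cloned like any value.
impl Clone for Box<dyn Column> {
    fn clone(&self) -> Self {
        (**self).to_boxed()
    }
}
```

A `Vec<Box<dyn Column>>` can then be cloned with `.clone()`, and a uniquely owned box can be mutated without the reference-count check that an `Arc` would require.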
New features:
- Added support to read ORC #1189 (jorgecarleitao)
- Added support for limit pushdown to IPC reading #1135 (jorgecarleitao)
- Added support to write and read Intervals from and to parquet #1122 (jorgecarleitao)
- Added support to write `FixedSizeBinary` to Avro #1118 (jorgecarleitao)
- Added support for projections in reading IPC streams #1097 (joshuataylor)
- Added support to write parquet `_metadata` sidecar #1063 (jorgecarleitao)
- Added cow APIs (2x-10x vs non-cow) #1061 (jorgecarleitao)
- Added support to read and write f16 #1051 (jorgecarleitao)
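As a rough illustration of what projection and limit pushdown mean for a reader, the sketch below materializes only the requested columns and stops once enough rows have been produced. It is a simplified in-memory model, not arrow2's IPC reader: the `Batch` alias and `read_with_pushdown` function are hypothetical, and columns are plain `Vec<i64>`.

```rust
/// A batch stored column by column (hypothetical stand-in for a chunk).
type Batch = Vec<Vec<i64>>;

/// Reads `batches`, keeping only the columns listed in `projection` (in the
/// order given) and at most `limit` rows in total.
fn read_with_pushdown(
    batches: &[Batch],
    projection: Option<&[usize]>,
    limit: Option<usize>,
) -> Vec<Batch> {
    let mut remaining = limit.unwrap_or(usize::MAX);
    let mut out = Vec::new();
    for batch in batches {
        if remaining == 0 {
            // Limit pushdown: later batches are never touched.
            break;
        }
        let rows = batch.first().map_or(0, |column| column.len());
        let take = rows.min(remaining);
        let indices: Vec<usize> = match projection {
            Some(p) => p.to_vec(),
            None => (0..batch.len()).collect(),
        };
        // Projection pushdown: unselected columns are never copied.
        let projected: Batch = indices
            .iter()
            .map(|&i| batch[i][..take].to_vec())
            .collect();
        remaining -= take;
        out.push(projected);
    }
    out
}
```

Calling it with a projection of `[2, 0]` and a limit of 4 on two three-row batches yields all three rows of columns 2 and 0 from the first batch and a single row from the second.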
Fixed bugs:
- Fixed "not implemented" error when reading plain, after-dict pages for fix-len-binary from parquet #1192 (jorgecarleitao)
- Fixed error in decoding nested multi-page columns from parquet #1188 (jorgecarleitao)
- Fixed error in counting items in nested parquet #1182 (jorgecarleitao)
- Fixed reading stats from int96 parquet #1181 (jorgecarleitao)
- Fixed limit pushdown in parquet #1180 (jorgecarleitao)
- Use `FnOnce` for `PrimitiveArray::apply_validity` #1176 (ritchie46)
- Release memory on predicate with 0% selectivity #1163 (ritchie46)
- Fixed error in reading `Struct<List<...>>` from parquet #1150 (jorgecarleitao)
- Fixed IPC projection #1149 (ritchie46)
- Fixed casting dictionary keys #1143 (ritchie46)
- Fixed reading arrays from parquet with required children #1140 (jorgecarleitao)
- Fixed panic in deserializing nested statistics #1139 (jorgecarleitao)
- Aligned name of `FixedSizeBinaryArray::values_iter` #1117 (jorgecarleitao)
- Fixed error in `FixedSizeListArray::new_null` #1114 (jorgecarleitao)
- Fixed panic in writing dictionaries to parquet #1113 (jorgecarleitao)
- Fixed error in reading chunked parquet #1108 (jorgecarleitao)
- Raise error when invalid fields are passed to flight #1093 (jorgecarleitao)
- Made IPC projection not sort projection #1082 (jorgecarleitao)
- Fixed error in chunked_mut bitmap #1081 (jorgecarleitao)
- Fixed panic in bitmap assign_mut #1078 (ritchie46)
- Panic-free read of IPC files #1075 (jorgecarleitao)
- Bumped parquet2 (minor) requirement #1071 (jorgecarleitao)
- Fixed divide by zero on reading empty row group #1062 (jorgecarleitao)
- Fixed missing validation of number of encodings passed when writing to parquet #1057 (jorgecarleitao)
Enhancements:
- Improved performance of reading Binary from parquet #1190 (ritchie46)
- Bumped to latest nightly #1186 (gyscos)
- Improved error message #1179 (jorgecarleitao)
- Added support to read and write nested dictionaries to parquet #1175 (jorgecarleitao)
- Added `MutableUtf8Array::into_data` #1170 (ritchie46)
- Added `Default` for `Utf8Array` #1169 (ritchie46)
- fix(parquet): allow to read other logical types from parquet #1168 (sundy-li)
- fix(parquet): enforce to use ParquetTimeUnit::Nanoseconds for PhysicalType::Int96 #1167 (sundy-li)
- Added constructor `MutableFixedSizeListArray::new_from` #1161 (hohav)
- Removed unneeded `Default` constraint #1157 (hohav)
- Improved checks to safety invariants in FFI #1154 (jorgecarleitao)
- Removed un-needed indirection #1153 (jorgecarleitao)
- Soften generic constraint of `Buffer` #1152 (sundy-li)
- Use ahash by default #1148 (ritchie46)
- Reduced bound checks #1142 (ritchie46)
- Moved `Bytes` to own crate #1141 (jorgecarleitao)
- Fixed clippy for 1.62 #1134 (Xuanwo)
- Cleaned example #1130 (jorgecarleitao)
- Removed `O(N)` clone in writing CSV #1128 (jorgecarleitao)
- Avoid zeroed allocation in reading avro #1127 (jorgecarleitao)
- Reduced allocations of reading bitmaps from IPC #1126 (jorgecarleitao)
- Improved performance of reading from IPC #1125 (jorgecarleitao)
- Improved parquet read performance #1124 (jorgecarleitao)
- Optimized write nulls to Avro #1119 (jorgecarleitao)
- Made `row_group::get_field_columns` public #1110 (ritchie46)
- Removed some panics reading invalid parquet files #1106 (jorgecarleitao)
- Reduced reallocations when reading from IPC (`~12%`) #1105 (ritchie46)
- Exposed utilities in `io::flight` #1094 (jorgecarleitao)
- Accept decoding parquet's `i64` into `u32` written by `pyarrow` #1090 (jorgecarleitao)
- Simplified code #1088 (jorgecarleitao)
- Removed unnecessary allocation in `assign_ops` #1085 (jorgecarleitao)
- Replaced some macros by generics #1084 (jorgecarleitao)
- Improved performance of `Bitmap::make_mut` with offset #1079 (jorgecarleitao)
- Implemented `Default` for `PrimitiveArray` #1073 (ritchie46)
- Expose share counts in `Buffer` #1072 (ritchie46)
- Added `compute::arity_assign` #1070 (jorgecarleitao)
- Improved performance in lexical write (~5%) #1067 (ritchie46)
- Added cast to/from `Null` from/to every type #1066 (jorgecarleitao)
- Prevent unneeded offset check #1059 (ritchie46)
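Several of the entries above concern avoiding allocations while reading. One general way to achieve this, sketched below with the standard library only, is to reuse a single scratch buffer across messages and to fill it without zeroing it first. This is an illustration of the technique, not the change made in any of the listed PRs: the `read_messages` function and the 4-byte little-endian length prefix are assumptions for the example, not the Arrow IPC framing.

```rust
use std::io::{self, Read};

/// Reads length-prefixed messages from `reader`, passing each to `on_message`.
/// One scratch buffer is reused for all messages, so it only grows when a
/// message is larger than any seen before. Returns the number of messages.
fn read_messages<R: Read>(
    reader: &mut R,
    mut on_message: impl FnMut(&[u8]),
) -> io::Result<usize> {
    let mut scratch: Vec<u8> = Vec::new();
    let mut len_bytes = [0u8; 4];
    let mut count = 0;
    loop {
        // In this sketch, running out of bytes at a prefix ends the stream.
        match reader.read_exact(&mut len_bytes) {
            Ok(()) => {}
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => break,
            Err(e) => return Err(e),
        }
        let len = u32::from_le_bytes(len_bytes) as usize;

        // Keep the capacity, drop the old contents, and append the new bytes
        // directly instead of zero-filling the buffer first.
        scratch.clear();
        let read = reader.by_ref().take(len as u64).read_to_end(&mut scratch)?;
        if read != len {
            return Err(io::Error::new(
                io::ErrorKind::UnexpectedEof,
                "message shorter than its length prefix",
            ));
        }
        on_message(&scratch);
        count += 1;
    }
    Ok(count)
}
```

The callback receives a borrowed slice, so a caller that needs to keep a message must copy it; callers that decode on the fly pay for no per-message allocation.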
Documentation updates:
- Fixed parquet write example #1193 (rajasekarv)
- Improved docs #1164 (jorgecarleitao)
- Minor cleanup of internal namings #1160 (jorgecarleitao)
- Added example reading Avro produced by Kafka #1151 (jorgecarleitao)
- Updated license wording #1138 (jorgecarleitao)
- Fixed wrong package name in examples #1133 (Xuanwo)
- Improved example #1131 (jorgecarleitao)
- Added more tests #1111 (jorgecarleitao)
- Improved examples #1109 (jorgecarleitao)
- Improved internal docs #1107 (jorgecarleitao)
- Added notes about creating parquet files and submodules in the development documentation #1096 (joshuataylor)
- Improved docs for `BooleanArray` #1083 (jorgecarleitao)
- Added missing link to guide #1065 (jorgecarleitao)
- Improve Docs Readability #1054 (ryanrussell)
Testing updates:
- Temporary skip decimal256 integration tests #1198 (jorgecarleitao)
- Simplified code #1183 (jorgecarleitao)
- Made kafka schema_id `u32` in example #1162 (jorgecarleitao)
- Added more tests #1158 (jorgecarleitao)
- Bumped MIRI #1156 (jorgecarleitao)
- Simplified code in flight integration tests #1136 (jorgecarleitao)
- Added more tests for nested parquet #1121 (jorgecarleitao)
- Added more tests for reading and writing CSV #1120 (jorgecarleitao)
- Added test for scalar division #1115 (jorgecarleitao)
- Added more tests #1103 (jorgecarleitao)
- Enabled more integration tests with pyarrow #1102 (jorgecarleitao)
- Simplified `Bytes` (internal) #1099 (jorgecarleitao)
- Updated patch to arrow integration tests #1068 (jorgecarleitao)
- Added more tests #1064 (jorgecarleitao)