Another release of arrow2 is here!
Besides API improvements to reading IPC and Parquet, there are two main new features: the ability to memory map Arrow files (see https://jorgecarleitao.github.io/arrow2/v0.14.0/guide/io/ipc_mmap.html) and support for decimal 256.
The following people made their first contribution to the crate:
- @daniel-martinez-maqueda-sap made their first contribution in #1204
- @AnIrishDuck made their first contribution in #1211
- @samkaufman made their first contribution in #1213
- @teymour-aldridge made their first contribution in #1225
- @poga made their first contribution in #1234
- @knil-sama made their first contribution in #1237
Thank you everyone for all the issues, PRs and ideas!
Breaking changes:
- Removed `Count` (parquet statistics) #1217 (jorgecarleitao)
- Exposed parquet indexed page filtering to `FileReader` #1216 (jorgecarleitao)
- Simpler IPC API #1208 (jorgecarleitao)
- Migrated Avro code to avro-schema repo #1199 (jorgecarleitao)
- Added support for decimal 256 #1194 (jorgecarleitao)
New features:
- Added support for decoding delta-length-encoded binary (parquet) #1228 (jorgecarleitao)
- Added support to read and write Parquet's delta-bitpacked (integer encoding) #1226 (jorgecarleitao)
- Added support for parquet sidecar to `FileReader` #1215 (jorgecarleitao)
- Write 64bit aligned IPC files #1201 (jorgecarleitao)
- Added support to mmap IPC format #1197 (jorgecarleitao)
- Added `MutableStructArray` #1196 (hohav)
Fixed bugs:
- Stack overflow in parquet RowGroupReader with groups_filter #1206
- Fixed comparison and validity kernels #1243 (ritchie46)
- Fixed reading nested stats #1240 (jorgecarleitao)
- `FileSink` now closes the underlying writer. #1213 (samkaufman)
- Fixed JSON infer order #1212 (jorgecarleitao)
- Fixed StackOverflow in skipping many parquet row groups #1210 (jorgecarleitao)
- Fix escaped like wildcards #1204 (daniel-martinez-maqueda-sap)
- Removed println :( #1203 (jorgecarleitao)
Enhancements:
- Added schema to FileReader #1246 (jorgecarleitao)
- Simpler nested parquet read #1241 (jorgecarleitao)
- Removed unneeded code #1229 (jorgecarleitao)
- Improved `MutableStruct::push` #1223 (hohav)
- Reduced binary size #1221 (jorgecarleitao)
- Added utf8 <> binary cast #1220 (jorgecarleitao)
- Split parquet compression backend features #1207 (ritchie46)
- Improved API of `mmap` #1205 (ritchie46)
- Added `MutableArray::reserve` #1202 (jorgecarleitao)
- Delayed dict #1185 (jorgecarleitao)
Documentation updates:
- Fixed guide and improved examples #1247 (jorgecarleitao)
- Added documentation on parquet compatibility under `TimeUnit`. #1238 (TurnOfACard)
- Fixed typo in error message for impl StructArray #1237 (knil-sama)
- Fixed incorrect command in doc for generating ORC files #1234 (poga)
- Improved github page generation #1233 (jorgecarleitao)
- Fix a typo in the docs #1225 (teymour-aldridge)
- Fix some doc links/typos #1211 (AnIrishDuck)
Testing updates:
- Fixed clippy warnings #1227 (jorgecarleitao)
- Updated integration test #1214 (jorgecarleitao)