A new release is here! 🎉🎉🎉🎉 This release has four major improvements:
- It is now backed by std's `Vec`, thus making it
  - zero-copy with the rest of Rust's ecosystem
  - use less `unsafe`
  - more ergonomic
  - faster to compile
  - (no difference in performance)
- It now supports reading from, and writing to, Apache Avro, both `sync` and `async`
- flatbuffers dependency was replaced by `planus`, a re-implementation of the flatbuffers specification in Rust (you should check out that project, awesome work by @kristoff3r and @TethysSvensson)
  - lower risk of unsoundness
  - easier-to-maintain code base
- Improved security and general maintenance:
  - Made most of the crate `#[forbid(unsafe)]`
  - significantly reduced the use of `unsafe` via the `bytemuck` dependency
  - made most of parsing of Arrow IPC `panic`-free, to reduce risks of DoS from untrusted data
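To illustrate what "zero-copy" means for a `Vec`-backed container, here is a minimal sketch using only the standard library. `SharedBuffer` is a hypothetical stand-in, not this crate's actual buffer type; it only shows that taking ownership of a `Vec` can reuse its heap allocation instead of copying the values.

```rust
use std::sync::Arc;

/// Hypothetical immutable, cheaply clonable buffer (an assumption for this
/// example, not the crate's real type).
#[derive(Clone)]
struct SharedBuffer<T> {
    data: Arc<Vec<T>>,
}

impl<T> From<Vec<T>> for SharedBuffer<T> {
    fn from(values: Vec<T>) -> Self {
        // Moving the Vec into the Arc moves only its small header
        // (pointer, length, capacity); the heap allocation is reused.
        Self { data: Arc::new(values) }
    }
}

impl<T> SharedBuffer<T> {
    fn as_slice(&self) -> &[T] {
        &self.data
    }
}

fn main() {
    let values = vec![1i32, 2, 3];
    let ptr_before = values.as_ptr();

    let buffer = SharedBuffer::from(values);
    let shared = buffer.clone(); // bumps a reference count, copies no values

    // Same heap allocation before and after the conversion.
    assert_eq!(buffer.as_slice().as_ptr(), ptr_before);
    assert_eq!(shared.as_slice(), &[1, 2, 3]);
}
```

The pointer comparison is the point of the sketch: a container that owns a `Vec` can accept data produced elsewhere in the ecosystem without a memcopy.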
A big thanks to all contributors (listed below) and our users for all the dedication, hard work, and patience. 🙇
Breaking changes:
- Added number of rows read in CSV inference #765 (jorgecarleitao)
- Refactored `nullif` #753 (jorgecarleitao)
- Migrated to latest parquet2 #752 (jorgecarleitao)
- Replace flatbuffers dependency by Planus #732 (jorgecarleitao)
- Simplified `Schema` and `Field` #728 (jorgecarleitao)
- Replaced `RecordBatch` by `Chunk` #717 (jorgecarleitao)
- Removed `Option` from fields' metadata #715 (jorgecarleitao)
- Moved dict_id to IPC-specific IO #713 (jorgecarleitao)
- Moved is_ordered from `Field` to `DataType::Dictionary` #711 (jorgecarleitao)
- Refactored JSON writing (5-10x) #709 (jorgecarleitao)
- Made Avro read API use `Block` and `CompressedBlock` #698 (jorgecarleitao)
- Simplified most traits #696 (jorgecarleitao)
- Replaced `Display` by `Debug` for `Array` #694 (jorgecarleitao)
- Replaced `MutableBuffer` by `std::Vec` #693 (jorgecarleitao)
- Simplified `Utf8Scalar` and `BinaryScalar` #660 (jorgecarleitao)
- Simplified Primitive and Boolean scalar #648 (jorgecarleitao)
New features:
- Add `and_scalar` and `or_scalar` for boolean_kleene #662
- Add `lower` and `upper` support for string #635
- Added support to cast decimal #761 (jorgecarleitao)
- Added support to deserialize JSON (!= NDJSON) #758 (jorgecarleitao)
- Added support to infer nested json structs #750 (jorgecarleitao)
- Added support to compare intervals #746 (jorgecarleitao)
- Added `any` and `all` kernel #739 (ritchie46)
- Added support to write Avro async #736 (jorgecarleitao)
- Added support to write interval to Avro #734 (jorgecarleitao)
- Added `and_scalar` and `or_scalar` for boolean kleene #723 (silathdiir)
- Added `and_scalar` and `or_scalar` for boolean #707 (silathdiir)
- Refactored JSON read to split IO-bounded from CPU-bounded tasks #706 (jorgecarleitao)
- Added more conversions from parquet #701 (jorgecarleitao)
- Added support for compressed Avro write #699 (jorgecarleitao)
- Added support to write to Avro #690 (jorgecarleitao)
- Added dynamic version of negation #685 (jorgecarleitao)
- Added support to read dictionary-encoded required parquet pages #683 (mdrach)
- Added `upper` #664 (Xuanwo)
- Added `lower` #641 (Xuanwo)
- Added support for `async` read of Avro #620 (jorgecarleitao)
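As a rough illustration of the Kleene (three-valued) logic behind kernels such as `and_scalar` and `or_scalar`, the sketch below models a nullable boolean as `Option<bool>`, with `None` standing for null. The function names and the slice-based signatures are assumptions made for this example, not the crate's API.

```rust
/// Kleene AND: `false` wins over an unknown (null) operand.
fn kleene_and(lhs: Option<bool>, rhs: Option<bool>) -> Option<bool> {
    match (lhs, rhs) {
        (Some(false), _) | (_, Some(false)) => Some(false),
        (Some(true), Some(true)) => Some(true),
        _ => None,
    }
}

/// Kleene OR: `true` wins over an unknown (null) operand.
fn kleene_or(lhs: Option<bool>, rhs: Option<bool>) -> Option<bool> {
    match (lhs, rhs) {
        (Some(true), _) | (_, Some(true)) => Some(true),
        (Some(false), Some(false)) => Some(false),
        _ => None,
    }
}

/// Hypothetical scalar variant: combines every slot of a column with one value.
fn and_scalar(values: &[Option<bool>], scalar: Option<bool>) -> Vec<Option<bool>> {
    values.iter().map(|&v| kleene_and(v, scalar)).collect()
}

/// Hypothetical scalar variant of Kleene OR.
fn or_scalar(values: &[Option<bool>], scalar: Option<bool>) -> Vec<Option<bool>> {
    values.iter().map(|&v| kleene_or(v, scalar)).collect()
}

fn main() {
    let column = [Some(true), Some(false), None];

    // A null scalar only leaves slots defined where the other operand decides.
    assert_eq!(and_scalar(&column, None), vec![None, Some(false), None]);
    assert_eq!(or_scalar(&column, None), vec![Some(true), None, None]);

    // A `false` scalar makes AND fully defined, even for the null slot.
    assert_eq!(and_scalar(&column, Some(false)), vec![Some(false); 3]);
}
```

The difference from non-Kleene boolean kernels is visible in the null slot: under Kleene logic, `null AND false` is `false` rather than null.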
Fixed bugs:
- Pyarrow and Arrow2 don't agree on Timestamp resolution #700
- Writing compressed dictionary in parquet corrupts the files #667
- Replaced assert by error in IPC read #748 (jorgecarleitao)
- Made all panics in IPC read errors #722 (jorgecarleitao)
- Fixed error in compare booleans #721 (jorgecarleitao)
- Fixed error in dispatching scalar arithmetics #682 (jorgecarleitao)
- Fixed error in reading negative decimals from parquet #679 (mdrach)
- Made IPC reader less restrictive #678 (jorgecarleitao)
- Fixed error in trait constraint in compute #665 (jorgecarleitao)
- Fixed performance regression of CSV reading #657 (jorgecarleitao)
- Fixed filter of predicate with validity #653 (ritchie46)
- Made `Scalar: Send+Sync` #644 (jorgecarleitao)
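The "replaced assert by error" and "made all panics errors" items follow a general pattern for parsing untrusted bytes: return a `Result` instead of asserting or indexing directly. The sketch below shows that pattern on a made-up length-prefixed frame; `read_frame`, `ReadError`, and the frame layout are hypothetical and are not the Arrow IPC format or this crate's API.

```rust
/// Hypothetical error type for this example.
#[derive(Debug, PartialEq)]
enum ReadError {
    OutOfSpec(String),
}

/// Reads a little-endian `u32` length prefix followed by that many bytes.
/// Every bounds check yields an error rather than a panic.
fn read_frame(data: &[u8]) -> Result<&[u8], ReadError> {
    // `get` returns None instead of panicking when the input is too short.
    let prefix = data
        .get(..4)
        .ok_or_else(|| ReadError::OutOfSpec("missing length prefix".to_string()))?;
    let length = u32::from_le_bytes([prefix[0], prefix[1], prefix[2], prefix[3]]) as usize;

    // Guard against overflow on platforms with a narrow `usize`.
    let end = 4usize
        .checked_add(length)
        .ok_or_else(|| ReadError::OutOfSpec("length overflows".to_string()))?;

    data.get(4..end)
        .ok_or_else(|| ReadError::OutOfSpec("declared length exceeds input".to_string()))
}

fn main() {
    // Well-formed frame: length 2, then two payload bytes (extra byte ignored).
    assert_eq!(read_frame(&[2, 0, 0, 0, 7, 8, 9]), Ok(&[7u8, 8][..]));

    // Malformed inputs produce errors the caller can handle.
    assert!(read_frame(&[9, 0, 0, 0, 1]).is_err());
    assert!(read_frame(&[1, 0]).is_err());
}
```

With this shape, a malicious or truncated input results in an `Err` that a server can log and reject, rather than aborting the process.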
Enhancements:
- Feature: JSON IO? #712
- Simplified code #760 (jorgecarleitao)
- Added iterator of values of `FixedBinaryArray` #757 (jorgecarleitao)
- Remove un-needed `unsafe` #756 (jorgecarleitao)
- Replaced un-needed `unsafe` #755 (jorgecarleitao)
- Made IO `#[forbid(unsafe)]` #749 (jorgecarleitao)
- Improved reading nullable Avro arrays #727 (Igosuki)
- Allow to create primitive array by vec without extra memcopy #710 (sundy-li)
- Removed requirement of `use Array` to access primitives' `data_type` #697 (jorgecarleitao)
- Cleaned up trait usage and added forbid_unsafe to parts #695 (jorgecarleitao)
- Migrated from `avro-rs` to `avro-schema` #692 (jorgecarleitao)
- Added `MutablePrimitiveArray::extend_constant` #689 (jorgecarleitao)
- Do not write validity without nulls in IPC #688 (jorgecarleitao)
- DRY code via macro #681 (jorgecarleitao)
- Made `dyn Array` and `Scalar` usable in `#[derive(PartialEq)]` #680 (jorgecarleitao)
- Made IPC ZSTD-compressed consumable by pyarrow #675 (jorgecarleitao)
- Simplified trait bounds in arithmetics #671 (jorgecarleitao)
- Improved performance of reading utf8 required from parquet (-15%) #670 (jorgecarleitao)
- Avoid double utf8 checks on MutableUtf8 -> Utf8 #655 (jorgecarleitao)
- Made `Buffer::offset` public #652 (ritchie46)
- Improved performance in cast Primitive to Binary/String (2x) #646 (sundy-li)
- Made `Filter: Send+Sync` #645 (jorgecarleitao)
- Made API to create field accept `String` #643 (jorgecarleitao)
Documentation updates:
- Fixed clippy (coming from 1.58) #763 (jorgecarleitao)
- Described how to run part of the tests #762 (jorgecarleitao)
- Improved README #735 (jorgecarleitao)
- clarify boolean value in DataType::Dictionary #718 (ritchie46)
- readme typo #687 (max-sixty)
- Added example to read parquet in parallel with rayon #658 (jorgecarleitao)
- Added documentation to `Bitmap::as_slice` #654 (ritchie46)
Testing updates:
- Improved json tests #742 (jorgecarleitao)
- Added integration tests for writing compressed parquet #740 (jorgecarleitao)
- Updated patch for integration test #731 (jorgecarleitao)
- Added cargo check to benchmarks #730 (sundy-li)
- More tests to CSV writing #724 (jorgecarleitao)
- Added integration tests for other compressions with parquet from pyarrow #674 (jorgecarleitao)
- Bumped nightly in CI #672 (jorgecarleitao)
- Invalidate caches from CI. #656 (jorgecarleitao)