🏆 Highlights
- Rename list namespace accessor from `.arr` to `.list` (#8999)
- `Array` (backed by `arrow::FixedSizeList`) datatype (#8943)
⚠️ Breaking changes
- propagate null in equality comparisons (#9053)
- formalize implode -> explode relation (#9038)
- consistently return list of date/datetime from lazy date_range (#8513)
- Rename list namespace accessor from `.arr` to `.list` (#8999)
- disallow time zones other than those in zoneinfo.available_timezones() (#8993)
- remove window expression magic (#8992)
- raise error when sorted flag not set (#8994)
- in Series constructor, if inputs are time-zone-aware datetimes, convert to UTC (#8881)
- parse offset-naive date time strings as Timestamp(time_unit), offset-aware datetime strings as Timestamp(time_unit, "UTC"), and remove the utc argument (#8714)
- Remove deprecated tz_aware argument (#8696)
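
The two time-zone changes above (#8993 and #8881) can be illustrated with the Python standard library alone. The sketch below does not call Polars, and `is_supported_zone` and `to_utc` are hypothetical helper names introduced only for illustration. It mirrors the described behaviour under those assumptions: a zone name is accepted only if it appears in `zoneinfo.available_timezones()`, and a time-zone-aware datetime is normalized to UTC.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import available_timezones


def is_supported_zone(name: str) -> bool:
    """Hypothetical helper: True if `name` is a key in the local IANA database."""
    return name in available_timezones()


def to_utc(dt: datetime) -> datetime:
    """Hypothetical helper: convert a time-zone-aware datetime to UTC."""
    if dt.tzinfo is None:
        raise ValueError("expected a time-zone-aware datetime")
    # astimezone keeps the same instant and only changes the attached zone.
    return dt.astimezone(timezone.utc)


# A fixed +02:00 offset, so the example does not depend on installed tz data.
plus_two = timezone(timedelta(hours=2))
local = datetime(2023, 5, 1, 12, 0, tzinfo=plus_two)
as_utc = to_utc(local)  # 10:00 UTC, the same instant as 12:00+02:00
```

The set returned by `available_timezones()` depends on the time-zone data installed on the machine, so a name accepted on one system may be rejected on another.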
🚀 Performance improvements
- speed up write_csv for time-zone-aware columns (#9093)
- parallelize rolling_window group materialization (#9095)
- elide hot loop in hash joins (#9075)
- improve list explode perf (#8974)
- Improve explodes: `offsets_to_indexes` performance (#8964)
- avoid quadratic `exclude` behaviour when selecting against dtypes and/or wildcards (#8953)
- use simd-json for all json parsing (#8922)
- improve `json_extract` (#8858)
- add optimizer passes and change initial order (#8811)
- fused multiply sub / sub multiply (#8799)
- improve parallel work distribution of sort expression `~4x` (#8775)
- change default row-group size (#8758)
✨ Enhancements
- conversion from `Utf8` to `Decimal`. (#9090)
- default to checking sortedness in groupby_rolling… (#9063)
- propagate null in equality comparisons (#9053)
- implement apply for rolling/dynamic_groupby (#9049)
- implement strategy=nearest for join_asof (#9024)
- arr.sum expression (#9041)
- formalize implode -> explode relation (#9038)
- add array namespace and min/max expression (#9032)
- improve error message on row-wise overflow (#9021)
- properly apply slice at UNION level (#9018)
- consistently return list of date/datetime from lazy date_range (#8513)
- disallow time zones other than those in zoneinfo.available_timezones() (#8993)
- raise error when sorted flag not set (#8994)
- in Series constructor, if inputs are time-zone-aware datetimes, convert to UTC (#8881)
- parse offset-naive date time strings as Timestamp(time_unit), offset-aware datetime strings as Timestamp(time_unit, "UTC"), and remove the utc argument (#8714)
- error on invalid sortby expr (#8986)
- Pushdown `is_in` to pyarrow dataset (#8930)
- `Array` (backed by `arrow::FixedSizeList`) datatype (#8943)
- multiple enhancements for `SQLContext` (#8944)
- add sql UNION, UNION ALL & UNION DISTINCT (#8936)
- add sql compound identifiers (#8934)
- add sql EXCLUDE (#8913)
- add sql CASE (#8911)
- add sql EXPLAIN (#8897)
- improve `json_extract` (#8858)
- add support for sql DISTINCT ON (#8824)
- add LazyFrame `null_count` (#8837)
- check categorical cache on transpose (#8836)
- add support for `OFFSET` keyword in SQL queries (#8833)
- add a new `time_range` utility function (#8776)
- Add hint to use _saturating on overflow (#8805)
- support boolean addition (#8778)
- improved detail in several error messages (#8747)
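
To make the `strategy=nearest` entry for join_asof (#9024) concrete, here is a minimal pure-Python sketch of what a "nearest" as-of match means: for each left key, pick the right key with the smallest absolute distance. It does not use Polars. `asof_nearest` is a hypothetical name, and the tie-breaking rule (the earlier key wins) is a choice made for this sketch, not something stated in these notes.

```python
from bisect import bisect_left
from typing import List, Optional


def asof_nearest(sorted_keys: List[float], target: float) -> Optional[int]:
    """Return the index of the key closest to `target`, or None if empty.

    `sorted_keys` must be sorted ascending. On a tie the earlier key is
    chosen (an assumption of this sketch).
    """
    if not sorted_keys:
        return None
    pos = bisect_left(sorted_keys, target)
    if pos == 0:
        return 0  # target is at or before the first key
    if pos == len(sorted_keys):
        return len(sorted_keys) - 1  # target is after the last key
    before, after = sorted_keys[pos - 1], sorted_keys[pos]
    return pos - 1 if target - before <= after - target else pos


right_keys = [1, 5, 10]
# Index of the nearest right key for each left key.
matches = [asof_nearest(right_keys, t) for t in [0, 4, 7, 8, 12]]
```

Unlike a backward or forward as-of match, a nearest match can look in both directions, which is why left keys outside the range of the right keys still find a partner here.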
🐞 Bug fixes
- groupby_rolling was returning incorrect results when offset was positive (#9082)
- fix null/empty in List::take_unchecked (#9074)
- repeat by (#9023)
- raise in to_datetime/strptime if format contains hour but not minute directive (#9044)
- propagate nulls in broadcasting of order comparisons (#9050)
- fix apply with passed date/datetime return_dtype (#9035)
- raise error on invalid aggregation (#9013)
- fix fused arithmetic in window functions (#9012)
- JoinBuilder::force_parallel is modifying allow_parallel (#8617)
- Fix erroneous warning in `hist` (#8982)
- respect rechunk in parquet (#8935)
- Simplify offsets_to_indexes, fix empty offsets edge cases (#8920)
- sql qualified wildcards (#8916)
- don't check sortedness in asof by (#8906)
- check for object type in csv writer (#8894)
- window function with filtered groups (#8880)
- parse offset-aware strings as UTC in read_csv when try_parse_dates=True (#8864)
- free buffer, but not its contents (#8848)
- improve agg expr field types (#8834)
- sql `BETWEEN` bounds should be inclusive (#8818)
- sort cached window groups (#8813)
- check null data before take (#8812)
- fix broadcasting on integer bitwise (#8798)
- correct aggregation of overlapping groups (#8794)
- modify join error (#8768)
- don't parallelize sort within rayon job (#8774)
- fix deadlock in cache and improve parallelism/work… (#8765)
- check offset before doing owned mutation (#8760)
- validate data on successful deserialization (#8757)
- improve supertype coercion of functions (#8755)
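
The `BETWEEN` fix (#8818) concerns whether the endpoints of the range are included. As a reference point for the inclusive semantics, the sketch below runs the same kind of predicate through the standard library's `sqlite3` module. This is not Polars' `SQLContext`, and the table `t` and column `x` are made up for the example.

```python
import sqlite3

# In-memory database with the integers 1..5 in a single column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1, 6)])

rows = conn.execute(
    "SELECT x FROM t WHERE x BETWEEN 2 AND 4 ORDER BY x"
).fetchall()
between_values = [r[0] for r in rows]  # both endpoints, 2 and 4, are kept
conn.close()
```

With inclusive bounds, `x BETWEEN 2 AND 4` is equivalent to `x >= 2 AND x <= 4`.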
🛠️ Other improvements
- use concrete type for time zones (#9076)
- factor add_month out of add_impl_month_week_or_day (#9066)
- remove unnecessary timezone trait usage, use concrete type (#9065)
- Fix broken links (#9072)
- bump sqlparser version (#9043)
- move list namespace functions to separate module (#9040)
- Clean up `arange`/`date_range`/`time_range` (#9027)
- Rename list namespace accessor from `.arr` to `.list` (#8999)
- replace pattern match with unwrap (#9000)
- remove window expression magic (#8992)
- Remove deprecated tz_aware argument (#8696)
- simplify `take_every` (#8971)
- add readmes to all sub crates (#8770)
- refactor(rust); improve arithmetic reuse and don't allocate on binary… (#8781)
- accumulate windows flag during translation (#8773)
Thank you to all our contributors for making this release possible!
@CloseChoice, @MarcoGorelli, @alexander-beedie, @avimallu, @cbowdon, @charliegallop, @chitralverma, @jonashaag, @kpberry, @mcrumiller, @petar-savov, @raymead, @ritchie46, @sorhawell, @stinodego, @tim-habitat, @uchiiii and @universalmind303