🚀 Performance improvements
- Collapse expanded filters in eager (#20493)
- Remove predicate from `IR::DataFrame` (#20492)
- Use different binview dedup strategy depending on chunks ratio (#20451)
- Generalize the `arg_sort` fast path onto `Column` (#20437)
- Dedup binviews up front (#20449)
- Re-enable common subplan elim for new-streaming engine (#20443)
- Don't collect all LHS arrays in gather (#20441)
- Remove prepare_series for gather kernels (#20439)
- Don't always take all data buffers when gathering views (#20435)
✨ Enhancements
- Add `Int128` IO support for csv & ipc (#20535)
- Support arbitrary expressions in 'join_where' (#20525)
- Allow use of Python types in `cs.by_dtype` and `col` (#20491)
- Add an "include_file_paths" parameter to `read_excel` and `read_ods` (#20476)
- Allow more join lossless casting (#20474)
- Accept more generic `Iterable[bool]` in Series.filter (#20431)
- Allow loading data from multiple Excel/ODS workbooks and worksheets (#20465)
🐞 Bug fixes
- Output index type instead of u32 for `sum_horizontal` with boolean inputs (#20531)
- Fix more global categorical issues (#20547)
- Update eager join doctest on multiple columns (#20542)
- Revert categorical unique code (#20540)
- Add `unique` fast path for empty categoricals (#20536)
- Fix various `Int128` operations (#20515)
- Fix global cat unique (#20524)
- Fix union (#20523)
- Fix rolling aggregations for various integer types (#20512)
- Ensure `ignore_nulls` is respected in horizontal sum/mean (#20469)
- Fix incorrectly added sorted flag after append for lexically ordered categorical series (#20414)
- More `Int128` testing and related fixes (#20494)
- Validate column names in `unique()` for empty DataFrames (#20411)
- Implement `list.min` and `list.max` for `list[i128]` (#20488)
- Decimal from physical in horizontal min/max and shift (#20487)
- Don't remove sort if first/last strategy is set in unique (#20481)
- Fix join literal behavior (#20477)
- Validate asof join by args in IR resolving phase (#20473)
- Fix `align_frames` with single row panicking (#20466)
- Allow multiple column sort for Decimal (#20452)
- Fix mode panicking for String dtype (#20458)
- Return correct schema for `sum_horizontal` with boolean dtype (#20459)
- Fix return type for `add_business_days`, `millennium`, `century` and `combine` methods in `Series.dt` namespace (#20436)
📖 Documentation
- Fix typo in `DataFrame.cast` (#20532)
- Fix flaky doctests (#20516)
- Add examples for bitwise expressions (#20503)
- Clarify the join pre-condition of `join_asof` (#20509)
- Fix `Expr.all` description of Kleene logic (#20409)
🛠️ Other improvements
- Increase categorical test coverage (#20514)
- Report wheel sizes (#20541)
- Add tests for `floor/ceil` on integers (#20479)
- Expose and rewrite 'can_pre_agg' (#20450)
- Skip test on windows; kuzu import segfaults (#20463)
- Add a `TypeCheckRule` to the optimizer (#20425)
Thank you to all our contributors for making this release possible!
@Biswas-N, @IndexSeek, @Prathamesh-Ghatole, @Terrigible, @alexander-beedie, @brifitz, @coastalwhite, @dependabot, @dependabot[bot], @jqnatividad, @lukemanley, @mcrumiller, @orlp, @ritchie46 and @siddharth-vi