🚀 Performance improvements
- change top_k algorithm (#7718)
- runtime SIMD target detection for `min/max/sum` and impl SIMD `mean` ~2-5x (#7702)
- implement top-k optimization (#7678)
- ooc-sort dump in thread local if IO-thread is full. (#7668)
- use perfect hash table for ooc partitioning (#7653)
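
For context on the top-k entries above: selecting the k largest values does not require sorting the whole input. The sketch below shows that general idea with Python's standard-library `heapq`. It is not the algorithm from #7718 or #7678, and the function name and the `descending` parameter are hypothetical, chosen only for illustration.

```python
import heapq


def top_k(values, k, descending=True):
    """Return the k largest (or smallest) values without a full sort.

    Illustrative only: `top_k` and `descending` are hypothetical names,
    not the signature of any library function.
    """
    if k <= 0:
        return []
    if descending:
        # Heap-based selection of the k largest items, largest first.
        return heapq.nlargest(k, values)
    # Heap-based selection of the k smallest items, smallest first.
    return heapq.nsmallest(k, values)


print(top_k([5, 1, 9, 3, 7], 2))
print(top_k([5, 1, 9, 3, 7], 2, descending=False))
```

For small k relative to the input size, this kind of selection does less work than sorting everything and slicing.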
✨ Enhancements
- add dt.datetime, dt.date, dt.time (#7735)
- new "row_totals" parameter for `write_excel` that adds a row-wise total column using structured references (#7751)
- More ergonomic args for `min/max` (#7742)
- More ergonomic args for `concat_list` (#7745)
- add `Series.hist` (#7727)
- add `qcut` (#7724)
- add `maintain_order` option to `Series.cut` (#7723)
- create series with only none list with specific dtype (#7722)
- add `maintain_order` in `arr.unique` (#7721)
- `DataFrame.top_k` / `LazyFrame.top_k` (#7720)
- clearer error message when replace_time_zone encounters ambiguous or non-existent datetimes (#7685)
- include `set_fmt_float` value in `Config` load/save state (#7696)
- raise on descending date_range arguments (#7671)
- include `add` operator-equivalent expression (#7667)
- add expression method equivalents for existing math/logical operators (#7660)
- add `is_leap_year` to temporal expressions (#7618)
- full out-of-core support for streaming groupby (#7630)
- clearer error message when creating duration string without integer (#7648)
- allow `scan_csv` to take a list of column names in a `new_columns` param (#7642)
- out-of-core `groupby/unique` of groupby on integer keys (#7604)
- allow set and/or frozenset as input to `is_in` expressions (#7613)
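
As a reminder of what a leap-year check such as the `is_leap_year` entry above has to decide, here is a minimal standard-library sketch of the Gregorian rule. It assumes nothing about how the temporal expression is implemented; the helper below is a hypothetical stand-alone function, not part of the library.

```python
import calendar
from datetime import date


def is_leap_year(d: date) -> bool:
    """Gregorian leap-year rule applied to a date (hypothetical helper)."""
    year = d.year
    # Divisible by 4, except century years that are not divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Cross-check against the standard library for a few representative years.
for y in (1900, 2000, 2023, 2024):
    assert is_leap_year(date(y, 1, 1)) == calendar.isleap(y)
```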
🐞 Bug fixes
- make zip_with_same_type obligatory (#7761)
- fix melt projection pushdown node (#7752)
- fix predicate pushdown for 'unique' first/last (#7749)
- fix null propagation (#7748)
- fix init from pandas Series that has no dtype and is empty (or contains only null values) (#7716)
- avoid ambiguous time error when passing python Datetime to DataFrame constructor (#7711)
- Fix inferring CSV schema when skip_rows_after_heade… (#7701)
- fix race condition in null handling of window fast… (#7695)
- address `Series` init regression from list of `np.arange` objects (#7692)
- improve error message if unavailable lazy module is queried for `__version__` attribute (#7680)
- fix reversed non-existent file error msg (#7657) (#7673)
- respect time zone in groupby_rolling with negative offset (#7664)
- fix empty case str.replace (#7662)
- allow for list of datetimes with timezone(timedelta!=0) in Series constructor (#7645)
- respect time zone in rolling_* functions (#7643)
- fix schema of decimal type reads (#7652)
- detect deltalake version in show_versions (#7622)
- respect time zone in offset_by (#7626)
- fix boolean `Series` init with integer 1/0 values (#7619)
- respect time zone in dt.round (#7611)
🛠️ Other improvements
- Display full argument names in __repr__ for Datetime a… (#7736)
- add `Expr.pipe` API docs link (#7734)
- Add sort_by example taking one row per group (#7712)
- Clean up a few type hints/imports (#7687)
- Move `wrap_x` utils to `utils` module (#7672)
- Reduce number of polars.internals imports (#7628)
- Remove duplicate column from Expr.sort example (#7684)
- Move `expr` parsing to utils (#7661)
- Eliminate function re-exports through `internals` (#7650)
- Move last functionality out of `internals` (#7649)
- More internals cleanup (#7638)
- Update lockfile (#7637)
- fix and improve type hints and function names (#7609)
- remove additional logic from scan delta (#7605)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @borchero, @chitralverma, @didriksg, @ghuls, @jakob-keller, @minimav, @ritchie46, @stinodego, @universalmind303 and @zundertj