🏆 Highlights
- Excel export support via new `write_excel` IO method (#7251)
- out of core sort on multiple columns (#7244)
🚀 Performance improvements
- improve batched csv readers perf and memory perf (#7329)
- use inlined strings for field and schema (#7272)
- reuse groups in binary expressions (#7202)
✨ Enhancements
- support creation of sparklines when exporting Excel tables (#7333)
- support sqlalchemy/pandas backed `write_database` (#7322)
- add adbc database reader and writer (`DataFrame.write_database`) (#7318)
- make `expr.apply` streamable in selection context (#7316)
- More ergonomic `unnest` args (#7310)
- initial working version of Decimal Series (#7220)
- Support explicit Binary dtype in constructor (#7305)
- implement serde for literal datetime and series (#7301)
- improve error message if mmap fails in ipc (#7300)
- add multi-threaded apply (#7277)
- add support for serializing categoricals to json (#7276)
- Add Expr.arg_true (#7056)
- don't require pyarrow for initialising Series with Python datetimes (#7273)
- Excel export support via new `write_excel` IO method (#7251)
- deprecate `describe_(optimized)_plan` in favor of `explain` (#7264)
- enable min-max skipping for binary in parquet, enable min-max skipping for `is_in` exprs (#7169)
- out of core sort on multiple columns (#7244)
- support nulls_last for multi-column sort (#7242)
- allow optimizations flags in describe_plan (#7233)
- implement row encoding for boolean and binary (#7218)
- allow passing utc=True when parsing time-zone-naive date strings (#7203)
- Add `**named_exprs` input for `struct` (#7208)
- add sql "ARRAY_AGG" (#7204)
🐞 Bug fixes
- fix offset in threading apply (#7330)
- fix projection pushdown on join with unused join key (#7326)
- raise error on time -> datetime cast (#7325)
- raise error if output of 'apply' cannot be determined (#7317)
- make `pl.struct` mappable (#7299)
- err on duplicate with_column names (#7296)
- don't panic on `str.parse_int` (#7072)
- improve concat_list with empty list error message (#7236)
- fix groupby_dynamic's binning when index_column is time-zone-aware (#7278)
- fix preservation of microseconds when converting Python datetime (#7271)
- fix us precision of datetime to anyvalue conversion (#7268)
- no panic on empty cross join (#7266)
- raise error on ambiguous filter predicates (#7265)
- handle concat_list with first lit value (#7235)
- respect schema in DataFrame initialisation for time-zone-aware datetime (#7240)
- ensure `every` type is properly normalised (for `groupby_dynamic` and `groupby_rolling`) (#7238)
- add test of median function in lazy mode (#7224)
- don't lose precision in pl.date_range due to floating point arithmetic (#7229)
- Conversion of negative timedelta to polars duration (#7209)
- ensure parametric testing `cols=int` definition respects `allowed_dtypes` (#7213)
🛠️ Other improvements
- Fix `read/write_database` tests (#7327)
- Rename `scan_ds` to `scan_pyarrow_dataset` (#7320)
- don't run tests that write to disk by default (#7321)
- rename `read_sql` to `read_database` (#7315)
- Address `git2` vulnerability (#7309)
- Correctly deprecate `DataFrame.pearson_corr` (#7307)
- Skip `write_excel` doctests (#7306)
- Run `pytest-xdist` with worksteal (#7304)
- Rename pearson_corr & spearman_rank_corr (#7014)
- refactor(python) Split `io` module per type (#7295)
- Move `_html` module to dataframe module (#7256)
- Enable `strict` for ruff `TCH` lints (#7234)
- better select on map_dict dtype (#7217)
- add warning of mmap to ipc docstring (#7216)
- exit non-zero on fix from ruff (#7215)
- ensure that `DataFrame` and `LazyFrame` init params don't diverge (#7214)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @aldanor, @alexander-beedie, @coinflip112, @csko, @dependabot, @dependabot[bot], @ghuls, @josemasar, @josh, @mslapek, @nrebena, @ozgrakkurt, @papparapa, @ptiza, @rben01, @ritchie46, @sorhawell, @stinodego, @universalmind303, @xyning and @zundertj