🚀 Performance improvements
- Provide a fallback skip batch predicate for constant batches (#21477)
- Parallelize the passing in new streaming multiscan (#21430)
✨ Enhancements
- Add lossy decoding to `read_csv` for non-utf8 encodings (#21433)
- Add `DataFrame.write_iceberg` (#15018)
- Add 'nulls_equal' parameter to `is_in` (#21426)
- Improve numeric stability of `rolling_{std, var, cov, corr}` (#21528)
- IR Serde cross-filter (#21488)
- Give priority to pycapsule interface in from_dataframe (#21377)
- Support writing `Time` type in json (#21454)
- Activate all optimizations in sinks (#21462)
- Add `AssertionError` variant to `PolarsError` in `polars-error` (#21460)
- Pass filter to inner readers in multiscan new streaming (#21436)
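
To illustrate what "lossy decoding" means for a non-UTF-8 encoding, here is a minimal sketch that uses only the Python standard library. It does not call Polars; the helper name `read_csv_lossy` and the sample bytes are hypothetical, and the exact option Polars exposes for this is not shown here. The idea is that bytes which are invalid in the chosen encoding become U+FFFD instead of raising an error.

```python
import csv
import io


def read_csv_lossy(data: bytes, encoding: str) -> list[list[str]]:
    """Hypothetical helper: decode with replacement, then parse as CSV.

    Invalid byte sequences become U+FFFD (the Unicode replacement
    character) instead of raising UnicodeDecodeError.
    """
    text = data.decode(encoding, errors="replace")
    return list(csv.reader(io.StringIO(text)))


# Byte 0x81 has no mapping in Python's cp1252 codec, so strict decoding fails.
raw = b"city,country\nK\x81ln,DE\n"

try:
    raw.decode("cp1252")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

rows = read_csv_lossy(raw, "cp1252")
print(strict_failed)  # True
print(rows)           # [['city', 'country'], ['K\ufffdln', 'DE']]
```

The trade-off is that the file loads, but the replaced characters cannot be recovered afterwards.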
🐞 Bug fixes
- Categorical min/max panicking when string cache is enabled (#21552)
- Don't encode IPC record batch twice (#21525)
- Respect rewriting flag in Node rewriter (#21516)
- Correct skip batch predicate for partial statistics (#21502)
- Make the Parquet Sink properly phase aware (#21499)
- Don't divide by zero in partitioned group-by (#21498)
- Create new linearizer between rowwise new streaming sink phases (#21490)
- Don't drop rows in sinks between new streaming phases (#21489)
- Incorrect lazy schema for `Expr.list.diff` (#21484)
- Give priority to pycapsule interface in from_dataframe (#21377)
- Duration Series arithmetic operations (#21425)
- Fix unwrap None panic when filtering delta with missing columns (#21453)
- Use stable sort for rolling-groupby (#21444)
- Throw exception if dataframe is too large to be compatible with Excel (#20900)
- Address regression with `read_excel` not handling URL paths correctly (#21428)
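
For context on the Excel size check (#20900): an Excel worksheet holds at most 1,048,576 rows and 16,384 columns. The sketch below shows one way such a guard could look. The function name, the `include_header` parameter, and the error message are assumptions for illustration, not the check Polars actually performs.

```python
# Worksheet limits of the .xlsx format.
EXCEL_MAX_ROWS = 1_048_576
EXCEL_MAX_COLS = 16_384


def check_excel_compatible(n_rows: int, n_cols: int, include_header: bool = True) -> None:
    """Hypothetical guard: raise if a frame of this shape cannot fit in one sheet."""
    # A header row, if written, consumes one of the available rows.
    total_rows = n_rows + (1 if include_header else 0)
    if total_rows > EXCEL_MAX_ROWS or n_cols > EXCEL_MAX_COLS:
        raise ValueError(
            f"frame of shape ({n_rows}, {n_cols}) exceeds the Excel limit of "
            f"{EXCEL_MAX_ROWS} rows by {EXCEL_MAX_COLS} columns"
        )


check_excel_compatible(1_000, 20)  # fits, returns None

try:
    check_excel_compatible(2_000_000, 20)
    too_large_rejected = False
except ValueError:
    too_large_rejected = True

print(too_large_rejected)  # True
```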
📖 Documentation
- Fix typo (#21554)
- Correct typos and grammar in Python docstrings (#21524)
- Move llm page under misc (#21550)
- Polars Cloud docs (#21548)
- Add LazyFrame.remote docs entry (#21529)
- Specify that the key column must be sorted in ascending order in `merge_sorted` (#21501)
- Add Polars & LLMs page to the user guide (#21218)
- Mention that `statistics=True` doesn't enable all statistics in `sink_parquet()` (#21434)
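
The ascending-order requirement documented for `merge_sorted` (#21501) is typical of any two-way merge. As an analogy only, the sketch below uses the standard library's `heapq.merge`, not Polars, to show what happens when the inputs are sorted in the wrong direction. Whether Polars behaves identically on unsorted input is an assumption this sketch does not verify.

```python
import heapq

# Both inputs ascending: the merge yields a fully sorted result.
ascending = list(heapq.merge([1, 3, 5], [2, 4, 6]))

# Both inputs descending: the merge still runs, but the result is not sorted,
# because the algorithm only ever compares the current heads of each input.
descending = list(heapq.merge([5, 3, 1], [6, 4, 2]))

print(ascending)   # [1, 2, 3, 4, 5, 6]
print(descending)  # [5, 3, 1, 6, 4, 2]
```

No error is raised in the second case, which is why the sort-order precondition is worth stating in the docs.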
🛠️ Other improvements
- Don't take ownership of IRplan in new streaming engine (#21551)
- Refactor code for re-use by streaming NDJSON source (#21520)
- Simplify the phase handling of new streaming sinks (#21530)
- Improve IPC sink node parallelism (#21505)
- Use tikv-jemallocator (#21486)
- Rename 'join_nulls' parameter to 'nulls_equal' in join functions (#21507)
- Move rolling to polars-compute (#21503)
- Remove Growable in favor of ArrayBuilder (#21500)
- Introduce a Sink Node trait in the new streaming engine (#21458)
- Add test for rolling stability sort (#21456)
- Add test for empty `.is_in` predicate filter (#21455)
- Test for unique length on multiple columns (#21418)
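
The renamed 'nulls_equal' join parameter (#21507) controls whether null keys match each other. The pure-Python sketch below models that semantics with `None` standing in for null. The `inner_join` helper is hypothetical and is not how Polars implements joins; it assumes the default is that nulls never match.

```python
def inner_join(left, right, nulls_equal=False):
    """Hypothetical nested-loop inner join over (key, value) tuples.

    None models a null key. With nulls_equal=False, a null key never
    matches anything; with nulls_equal=True, null matches null.
    """
    out = []
    for lk, lv in left:
        for rk, rv in right:
            if lk is None or rk is None:
                match = nulls_equal and lk is None and rk is None
            else:
                match = lk == rk
            if match:
                out.append((lk, lv, rv))
    return out


left = [(1, "a"), (None, "b")]
right = [(1, "x"), (None, "y")]

default_rows = inner_join(left, right)
null_rows = inner_join(left, right, nulls_equal=True)

print(default_rows)  # [(1, 'a', 'x')]
print(null_rows)     # [(1, 'a', 'x'), (None, 'b', 'y')]
```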
Thank you to all our contributors for making this release possible!
@Kevin-Patyk, @MarcoGorelli, @Matt711, @alexander-beedie, @banflam, @braaannigan, @coastalwhite, @dependabot[bot], @etiennebacher, @ghuls, @kevinjqliu, @lukemanley, @mcrumiller, @nameexhaustion, @orlp, @ritchie46, @stijnherfst and @thomasjpfan