💥 Breaking changes
- Remove, deprecate or change eager `Expr`s to be lazy compatible (#24027)
🚀 Performance improvements
- Always simplify order requirements in IR (#24192)
- Basic de-duplication of filter expressions (#24220)
- Cache the IR in `pipe_with_schema` (#24213)
- Lower `arg_where` natively to streaming engine (#24088)
- Lower Expr.shift to streaming engine (#24106)
- Lower order-preserving groupby to streaming engine (#24053)
✨ Enhancements
- Allow pl.Expr.log to take in an expression (#24226)
- Add caching to user credential providers (#23789)
- Expose `mkdir` parameter on `write_parquet` (#24239)
- Implement diff() in streaming engine (#24189)
- Enable Expr.diff(n) for negative n (#24200)
- Allow upcasting null-typed columns to nested column types in scans (#24185)
- Log pyarrow predicate conversion result in sensitive verbose logs (#24186)
- Drop PyArrow requirement for `write_database` with the ADBC engine (#24136)
- Add a deprecation warning for pl.Series.shift(Null) (#24114)
- Improve Debug formatting of DataType (#24056)
- Add `LazyFrame.pipe_with_schema` (#24075)
- Catch additional temporal attributes in `BytecodeParser` function analysis (#24076)
- Add `cum_*` as native streaming nodes (#23977)
- Add peak_{min,max} support for booleans (#24068)
- Add `DataFrame.map_columns` for eager evaluation (#23821)
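As a rough illustration of the negative-`n` entry above (#24200), the sketch below is a pure-Python stand-in, not Polars code. It assumes `diff(n)` is defined as `x - x.shift(n)`, so a negative `n` compares each element with a later one and leaves the trailing slots null. The helper name `diff` and the sample values are illustrative only.

```python
from typing import Optional, Sequence


def diff(values: Sequence[Optional[int]], n: int = 1) -> list[Optional[int]]:
    """Stand-in for a shift-based diff: out[i] = values[i] - values[i - n].

    Positions whose partner index falls outside the sequence (or is None)
    produce None, mirroring a null result.
    """
    out: list[Optional[int]] = []
    for i, current in enumerate(values):
        j = i - n  # for negative n this points at a later element
        if 0 <= j < len(values) and current is not None and values[j] is not None:
            out.append(current - values[j])
        else:
            out.append(None)
    return out


forward = diff([1, 3, 6, 10], n=1)    # each value minus the previous one
backward = diff([1, 3, 6, 10], n=-1)  # each value minus the next one
print(forward)
print(backward)
```

Under this assumption, the nulls appear at the start of the output for positive `n` and at the end for negative `n`.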
🐞 Bug fixes
- Correct size limits for Decimal cast (#24252)
- Unordered unions in check order observing pass (#24253)
- Fix dtype for `slice` on `Literal` in agg context (#24137)
- Fix incorrect `filter(lit(True))` when scanning hive (#24237)
- In-memory group_by on 128-bit integers (#24242)
- Fix panic in `gather` inside groupby with invalid indices (#24182)
- Release the GIL in map_groups (#24225)
- Remove extra explode in `LazyGroupBy.{head,tail}` (#24221)
- Fix panic in polars cloud CSV scan (#24197)
- Fix panic when loading categorical columns from IO plugin (#24205)
- Fix credential provider did not auto-init on partition sinks (#24188)
- Fix engine type for `concat_list` on `AggScalar` `implode` (#24160)
- Rolling_mean handle centered weights with len(values) < window_size (#24158)
- Reading `is_in` predicate for Parquet plain strings (#24184)
- Support native DuckDB connection in read_database (#24177)
- Make PyCategories pickleable (#24170)
- Remove unused unsound function `to_mutable_slice` (#24173)
- PyO3 extension types giving compat_level errors (#24166)
- Allow non-elementwise by in top_k (#24164)
- Fix `sort_by` for `group_by_dynamic` context (#24152)
- Input-independent length aggregations in streaming (#24153)
- Release GIL when iterating df in to_arrow (#24151)
- Respect non-elementwise join_where conditions (#24135)
- Fix mismatched pytest test collection error (#24133)
- Resolve schema mismatch for div on Boolean (#24111)
- Fix from_repr parsing of negative durations (#24115)
- Make `group_by`/`partition_by` iterator keys `tuple[Any, ...]` to enable tuple-unpacking (#24113)
- Keep name when doing empty group-aware aggregation (#24098)
- Implode instead of `reshape_list` (#24078)
- Rolling mean with weights incorrect when min_samples < window_size (#23485)
- Allow `merge_sorted` for all types (#24077)
- Include datatypes in `row_encode` expression (#24074)
- Include UDF materialized type in serialization (#24073)
- Correct `.rolling()` output type for non-aggregations (#24072)
- Correct planner output schema for `join_asof` (#24071)
- Correct output for `fold` and `reduce` (#24069)
- Expr.meta.output_name for struct fields (#24064)
- Ensure upcast operations on `pl.Date` default to microsecond precision (#23981)
- Add peak_{min,max} support for booleans (#24068)
- Planner output type for `mean` with strange input type (#24052)
- Remove, deprecate or change eager `Expr`s to be lazy compatible (#24027)
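To show what the `tuple[Any, ...]` key annotation (#24113) enables, here is a minimal pure-Python sketch. `iter_groups` is a hypothetical helper standing in for a grouped iterator; it is not the Polars implementation, and the sample rows are made up. The point is only that a key typed as a variable-length tuple can be unpacked directly in the `for` target.

```python
from collections import defaultdict
from typing import Any, Iterator

Row = dict[str, Any]


def iter_groups(rows: list[Row], *by: str) -> Iterator[tuple[tuple[Any, ...], list[Row]]]:
    """Hypothetical stand-in: yield (key_tuple, rows_in_group) per distinct key."""
    groups: dict[tuple[Any, ...], list[Row]] = defaultdict(list)
    for row in rows:
        # The key is always a tuple, even when grouping by a single column.
        groups[tuple(row[col] for col in by)].append(row)
    yield from groups.items()


rows = [
    {"city": "Oslo", "year": 2024, "sales": 10},
    {"city": "Oslo", "year": 2025, "sales": 12},
    {"city": "Rome", "year": 2024, "sales": 7},
]

# One grouping column: unpack the 1-tuple key with a trailing comma.
counts = {city: len(group) for (city,), group in iter_groups(rows, "city")}

# Two grouping columns: unpack both key parts at once.
totals = {
    (city, year): sum(r["sales"] for r in group)
    for (city, year), group in iter_groups(rows, "city", "year")
}
print(counts)
print(totals)
```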
📖 Documentation
- Fix formatting of Series.value_counts examples (#24245)
- Add hint to use `DataFrame/Series` constructors in `from_arrow` docstring (#22942)
- Update GPU un/supported features (#24195)
- Add `DataFrame.map_columns` to API (#24128)
- Update multiple pages in the Polars Cloud user guide (#23661)
- Fix `str.find_many()` docstring example (#24092)
📦 Build system
- Drop binary support for macos_x86-64 (#24257)
🛠️ Other improvements
- Remove unnecessary parentheses (#24244)
- Make non-nested shift{,_and_fill} ops generic (#24224)
- Remove unused `Wrap` (#24214)
- Allow upcasting null-typed columns to nested column types in scans (#24185)
- Automatically label a few more types of PR (#24147)
- Update toolchain (#24156)
- Add `order_sensitive` property for `AExpr` (#24116)
- Mark more tests as not possible on cloud (#24103)
- Turn `AggExpr::Count` from tuple to struct (#24096)
- Mark tests that may fail in cloud (#24067)
- Extend read database tests to capture more ADBC functionality (#24002)
- Make CI perf failures more lenient (#24066)
- Fix hive partition string encoding in CI by upgrading `deltalake` (#24018)
- Make tests with sinks run on cloud again (#24048)
Thank you to all our contributors for making this release possible!
@Kevin-Patyk, @agossard, @alexander-beedie, @aparna2198, @borchero, @coastalwhite, @deanm0000, @dsprenkels, @henryharbeck, @jjurm, @kdn36, @math-hiyoko, @mcrumiller, @mroeschke, @nameexhaustion, @orlp, @r-brink, @ritchie46, @stijnherfst, @vdrn and @wence-