💥 Breaking changes
- Remove, deprecate or change eager Exprs to be lazy compatible (#24027)
🚀 Performance improvements
- Native streaming int_range with len or count (#24280)
- Lower arg_unique natively to the streaming engine (#24279)
- Move unordering optimization to end (#24286)
- Do ordering simplification step after common sub-plan elimination (#24269)
- Always simplify order requirements in IR (#24192)
- Basic de-duplication of filter expressions (#24220)
- Cache the IR in pipe_with_schema (#24213)
- Lower arg_where natively to streaming engine (#24088)
- Lower Expr.shift to streaming engine (#24106)
- Lower order-preserving groupby to streaming engine (#24053)
✨ Enhancements
- Add CSE for custom io sources using pointer for hashing (#24297)
- Allow pl.Expr.log to take in an expression (#24226)
- Add caching to user credential providers (#23789)
- Expose mkdir parameter on write_parquet (#24239)
- Implement diff() in streaming engine (#24189)
- Enable Expr.diff(n) for negative n (#24200)
- Allow upcasting null-typed columns to nested column types in scans (#24185)
- Log pyarrow predicate conversion result in sensitive verbose logs (#24186)
- Drop PyArrow requirement for write_database with the ADBC engine (#24136)
- Add a deprecation warning for pl.Series.shift(Null) (#24114)
- Improve Debug formatting of DataType (#24056)
- Add LazyFrame.pipe_with_schema (#24075)
- Catch additional temporal attributes in BytecodeParser function analysis (#24076)
- Add cum_* as native streaming nodes (#23977)
- Add peak_{min,max} support for booleans (#24068)
- Add DataFrame.map_columns for eager evaluation (#23821)
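
To make the Expr.diff(n) entry (#24200) concrete, here is a plain-Python sketch of what a lagged difference computes for positive and negative n. It is not Polars code. `diff_reference` is a hypothetical helper, and the sketch assumes that a negative n follows the same `x[i] - x[i - n]` rule as a positive n, with a missing value wherever the neighbour falls outside the input.

```python
from typing import Optional, Sequence


def diff_reference(values: Sequence[float], n: int = 1) -> list[Optional[float]]:
    """Reference semantics for a lagged difference (hypothetical helper).

    Each output element is values[i] - values[i - n]. When i - n is out of
    range, the element is None. This is an assumption about how negative n
    behaves, not a copy of the library implementation.
    """
    out: list[Optional[float]] = []
    for i, v in enumerate(values):
        j = i - n  # index of the element being subtracted
        out.append(v - values[j] if 0 <= j < len(values) else None)
    return out


data = [10, 11, 15, 12]
forward = diff_reference(data, 1)    # compare each value with the previous one
backward = diff_reference(data, -1)  # compare each value with the next one
print(forward)
print(backward)
```

With a negative n the missing values end up at the tail of the output instead of the head, because the comparison looks ahead.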
🐞 Bug fixes
- Invalid conversion from non-bit numpy bools (#24312)
- Make dt.epoch('s') serializable (#24302)
- Make Expr.rechunk serializable (#24303)
- Schema mismatch for 'log' operation (#24300)
- Incorrect first/last aggregate in streaming engine (#24289)
- Fix group offsets in sliced groups (#24274)
- Panic in inexact date(time) conversion (#24268)
- Keep DSL cache after serialization and deserialization (#24265)
- Sanitize and warn about eval usage (#24262)
- Correct incorrect default in from_pandas overload for include_index (#24258)
- Unique with keep="none" in new optimization pass (#24261)
- Correct size limits for Decimal cast (#24252)
- Unordered unions in check order observing pass (#24253)
- Fix dtype for slice on Literal in agg context (#24137)
- Fix incorrect filter(lit(True)) when scanning hive (#24237)
- In-memory group_by on 128-bit integers (#24242)
- Fix panic in gather inside groupby with invalid indices (#24182)
- Release the GIL in map_groups (#24225)
- Remove extra explode in LazyGroupBy.{head,tail} (#24221)
- Fix panic in polars cloud CSV scan (#24197)
- Fix panic when loading categorical columns from IO plugin (#24205)
- Fix credential provider did not auto-init on partition sinks (#24188)
- Fix engine type for concat_list on AggScalar implode (#24160)
- Rolling_mean handle centered weights with len(values) < window_size (#24158)
- Reading is_in predicate for Parquet plain strings (#24184)
- Support native DuckDB connection in read_database (#24177)
- Make PyCategories pickleable (#24170)
- Remove unused unsound function to_mutable_slice (#24173)
- PyO3 extension types giving compat_level errors (#24166)
- Allow non-elementwise by in top_k (#24164)
- Fix sort_by for group_by_dynamic context (#24152)
- Input-independent length aggregations in streaming (#24153)
- Release GIL when iterating df in to_arrow (#24151)
- Respect non-elementwise join_where conditions (#24135)
- Fix mismatched pytest test collection error (#24133)
- Resolve schema mismatch for div on Boolean (#24111)
- Fix from_repr parsing of negative durations (#24115)
- Make group_by/partition_by iterator keys tuple[Any, ...] to enable tuple-unpacking (#24113)
- Keep name when doing empty group-aware aggregation (#24098)
- Implode instead of reshape_list (#24078)
- Rolling mean with weights incorrect when min_samples < window_size (#23485)
- Allow merge_sorted for all types (#24077)
- Include datatypes in row_encode expression (#24074)
- Include UDF materialized type in serialization (#24073)
- Correct .rolling() output type for non-aggregations (#24072)
- Correct planner output schema for join_asof (#24071)
- Correct output for fold and reduce (#24069)
- Expr.meta.output_name for struct fields (#24064)
- Ensure upcast operations on pl.Date default to microsecond precision (#23981)
- Add peak_{min,max} support for booleans (#24068)
- Planner output type for mean with strange input type (#24052)
- Remove, deprecate or change eager Exprs to be lazy compatible (#24027)
📖 Documentation
- Fix few typos (#24305)
- Add missing reference to LazyFrame.pipe_with_schema() on the website (#24285)
- Automatically register doctest.ELLIPSIS so we don't have to add the inline directive each time (#24146)
- Update categorical comparison documentation in user guide (#24249)
- Add missing references for Series.rolling_*_by methods (#24254)
- Fix formatting of Series.value_counts examples (#24245)
- Add hint to use DataFrame/Series constructors in from_arrow docstring (#22942)
- Update GPU un/supported features (#24195)
- Add DataFrame.map_columns to API (#24128)
- Update multiple pages in the Polars Cloud user guide (#23661)
- Fix str.find_many() docstring example (#24092)
📦 Build system
🛠️ Other improvements
- Remove PDS-H code (#24301)
- Get ready for even more cloud tests (#24292)
- Add tests for slices with caches (#24288)
- Readd ordering tests (#24284)
- Fix Makefile venv path (#24251)
- Remove unnecessary parentheses (#24244)
- Make non-nested shift{,_and_fill} ops generic (#24224)
- Remove unused Wrap (#24214)
- Allow upcasting null-typed columns to nested column types in scans (#24185)
- Automatically label a few more types of PR (#24147)
- Update toolchain (#24156)
- Add order_sensitive property for AExpr (#24116)
- Mark more tests as not possible on cloud (#24103)
- Turn AggExpr::Count from tuple to struct (#24096)
- Mark tests that may fail in cloud (#24067)
- Extend read database tests to capture more ADBC functionality (#24002)
- Make CI perf failures more lenient (#24066)
- Fix hive partition string encoding in CI by upgrading deltalake (#24018)
- Make tests with sinks run on cloud again (#24048)
Thank you to all our contributors for making this release possible!
@Kevin-Patyk, @MarcoGorelli, @NeejWeej, @agossard, @alexander-beedie, @aparna2198, @borchero, @coastalwhite, @deanm0000, @dsprenkels, @eitsupi, @etiennebacher, @gab23r, @henryharbeck, @jjurm, @kdn36, @math-hiyoko, @mcrumiller, @mroeschke, @nameexhaustion, @orlp, @r-brink, @ritchie46, @stijnherfst, @vdrn and @wence-