🏆 Highlights
- Add streaming support for grouped AsOf join (#27293)
⚠️ Deprecations
- Deprecate support for dataframe interchange protocol (#27214)
🚀 Performance improvements
- Create IR slice from expr slice pushdown (#27200)
- Add streaming support for grouped AsOf join (#27293)
- Avoid unnecessary rechunk when sorting already sorted DataFrame (#27264)
- Lower basic over() to streaming primitives (#27303)
- Lower `drop_{nulls,nans}` in streaming `group_by` aggregations (#27296)
- Lower `entropy` to streaming reductions (#27174)
- Add native streaming `interpolate` (#27185)
- Streaming `strptime` with `format=None` (#27056)
- Lower `skew`/`kurtosis` to streaming aggregations (#27176)
- Post apply pyarrow filter in Polars' engine instead of pyarrow (#27192)
- Optimize `drop_nulls().{first,last}()` to `{first,last}(ignore_nulls=True)` (#27187)
- Always process pyarrow scan in batches (#27183)
- Make `cut` output `Enum` and mark as elementwise (#27173)
- Remove unused expression sorts (#27075)
- Use delta stats for mixed hive and non-hive predicate pushdown (#27102)
- Take into account size per row in join sampling (#27098)
- Streaming is_first_distinct and unique(maintain_order=True) (#27052)
- Streaming `cov` and `corr` (#27008)
- Add sorted unique node to streaming engine (#26990)
- Ensure Expr.append is lowered in streaming engine (#27022)
- Collapse consecutive Sort nodes (#26965)
- Drop `maintain_order=True` requirement in `sink_delta` (#27007)
✨ Enhancements
- Add `ignore_nulls` to `{list,arr}.{any,all}` (#27186)
- Lock-free memory manager with spill-to-disk and fully OOC multiplexer (#26774)
- Add `is_unique` to list/array dtypes (#27290)
- Streaming pyarrow datasets sources (#27230)
- Add `pl.merge_sorted` operating on multiple frames (#27014)
- Allow `group_by()` without key exprs (#27141)
- Change default scan/read_lines column name from "lines" to "line" (#27122)
- Make unnest() effective on all columns by default (#27029)
- Collapse consecutive Sort nodes (#26965)
🐞 Bug fixes
- Update `groups` to correct length for `Implode` (#27282)
- Fix scan_csv missing_columns='insert' overwrote existing data with NULLs (#27297)
- Raise on non-numeric inputs in `pl.int_ranges` (#27294)
- Fix always-true filter conversion to Iceberg filter (#27119)
- Do not skip nulls when enumerating over rows in grouped AsOf join (#27275)
- Fix `pivot` dropping data for null `on` values (#27273)
- Resolve multiple files deadlock in CSV async reader (#27073)
- Widen decimal precision on sum aggregation (#27270)
- Correct lf.remote type (#27261)
- Default `LazyFrame.map_batches` to no optimizations (#27262)
- Extend `StructEval` schema context in `StackOptimizer` (#27243)
- Preserve nulls when casting from all-null `Series` to `Struct` (#27241)
- Fix `scan_delta` filter on empty dataframe (#27244)
- Prevent `DataFrame` creation panic on `list[struct]` with heterogenous types (#27217)
- Named aggregation `__structify` was being ignored (#27148)
- Skip `null` group entries when collecting AsOf-by groups (#27215)
- Fix panic with empty order_by in over expression (#27088)
- Write field ID from `sink_parquet` (#27196)
- Fix statistics for Null columns in Parquet (#27021)
- Do not prune sort nodes containing slice with dyn predicate (#27140)
- Correct grouped `Binary` `arg_min`/`arg_max` and `String` single-element arg indices (#27172)
- Resolve multiple files deadlock in NDJSON async reader (#27204)
- Overflow panic in interpolate nearest (#27205)
- Using checked arithmetic in `int96_to_i64_ns` to prevent overflow panic (#27129)
- Don't trigger csv fast count if predicate is pushed down (#27190)
- Support all integer dtypes for Series index assignment (#27188)
- Streaming sort by-expressions were lowered incorrectly (#27158)
- Replace multiprocessing.dummy.Pool with ThreadPoolExecutor (#27175)
- Reset IO metrics instead of consuming (#27156)
- Output SVG if output_path ends with '.svg' in show_graph (#27144)
- Skip extension types for min/max in describe (#27120)
- Address a potential overflow in `from_epoch` scaling (#27118)
- Fix incorrect IO metrics on multi-phase streaming execution (#27123)
- Use delta stats for mixed hive and non-hive predicate pushdown (#27102)
- Make the files used in docs available locally (#27121)
- Apply scalar bound in `clip` when the Series bound contains nulls (#27087)
- Ignore `ddof` parameter in `rolling_corr` and deprecate (#27104)
- Preserve casts for horizontal ops with untyped literals (#27011)
- Reject invalid input to `sql_expr` (#27084)
- Ensure SQL `COUNT(<lit>)` expressions return the correct value (#27085)
- Regression in replace_strict for enums (#27066)
- Make `test_group_by_arg_max_boolean_26978` non-flaky for `max_by` ties (#27048)
- Null count for aggregated list inside count aggregation (#27032)
- Panic in streaming MergeSortedNode (#27024)
- Prevent panic in `transpose()` with mixed List and non-List columns (#27038)
- Set sorted flag for Boolean and Time (#27035)
- Missing `src/` subdirectory to CI Python docs step (#27025)
- Resolve stack overflow on `merge_sorted` and `union` (#27018)
- Make `pl.DataFrame.fill_null` work on columns with `Null` dtype (#27020)
- Fix repeated word typos in comments (#26917)
- Covariance with constant is zero, not NaN (#27015)
- Don't remove `set_sorted` in projection pushdown (#27006)
- Infer nulls when df create from empty-struct (#26991)
- Correct suggestion in multi-expr filter error (#27003)
- Implement `agg_arg_min`/`agg_arg_max` for `boolean` data type (#26997)
- Ensure `sample()` respects the global set seed (#26992)
📖 Documentation
- Add documentation for openlineage on-premises (#27334)
- Release page (#27335)
- Update uv pip install polars-on-premises cmd (#27330)
- Fix outdated `LazyGroupBy.map_groups` docstring (#27292)
- Add `deny_anonymous_users` to scheduler config (#27287)
- Slurm documentation (#27259)
- Add link to concepts in index.md (#27077)
- Add docs entry for `merge_sorted` (#27224)
- Fix typo (#27212)
- Make the files used in docs available locally (#27121)
- Put first-time contribution requirements in its own linkable section (#27113)
- Add missing docstrings for Expr.struct.__getitem__ and Series.__setitem__ (#27092)
- Normalise `Series` docstring whitespace indents (#27082)
- Change Polars Cloud API to 0.6.0 (#27005)
- Improve `write_parquet` docstring for `use_pyarrow` (#26988)
📦 Build system
- Really do not install pyiceberg-core 0.9.0 (#27017)
🛠️ Other improvements
- Add regression test for instantiating polars DataFrame from pandas Timestamp (#27332)
- Bump Python Polars version (#27315)
- Resolve bad instantiations in test_iceberg (#27314)
- Sink DSL and callback for Iceberg (#27258)
- Wait for morsel consumption in merge_sorted streaming node (#27288)
- Use more precise internal typing (pt. iii) (#27232)
- Mark `scan_ipc` cache arguments as deprecated (#27216)
- Consolidate reordered compare functions (#27229)
- Fix `test_dtype_concat_3735` not actually iterating through numeric dtypes (#27178)
- Remove dead code in `test_scan_lines` (#27213)
- Move/genericize `_balanced_reduce` to Python utils (#27100)
- Remove unused attributes (#27191)
- Avoid unnecessary recompilation due to changing env vars (#27166)
- Update nightly Rust compiler version (#27145)
- Simplify pyarrow scan and process in batches (#26982)
- Make internal typing more precise (part ii) (#27117)
- Add None & Dataframe to FrameInitTypes (#27126)
- Remove unused expression sorts (#27075)
- Improve internal typing ahead of using `ty`/`pyrefly` (#27050)
- Add explicit `ResourceWarning` coverage (#27083)
- Add sinked paths callback (#26995)
- Pin maturin due to compile time regression (#27062)
- Missing `src/` subdirectory to CI Python docs step (#27025)
- Really do not install pyiceberg-core 0.9.0 (#27017)
- Naming for named scopes (#26999)
- Enable hypothesis tests when `POLARS_AUTO_NEW_STREAMING=1` (#26818)
- Fix CI by excluding missing wheel version of pyiceberg (#27001)
- Remove indirection in calling python scans (#26981)
- Polars versions (#26980)
Thank you to all our contributors for making this release possible!
@0xRozier, @EndPositive, @HCYT, @Kevin-Patyk, @MarcoGorelli, @NeejWeej, @RedZapdos123, @TNieuwdorp, @abhidotsh, @alexander-beedie, @andyjessen, @azimafroozeh, @borchero, @carnarez, @coastalwhite, @debnathshoham, @dpinol, @dsprenkels, @dydev012, @farouk-01, @gab23r, @gautamvarmadatla, @joaquinhuigomez, @kdn36, @nameexhaustion, @orlp, @ritchie46, @wence-, @xenzh, @yangsong97 and @yonatan-genai