pola-rs/polars py-1.40.0 on GitHub

🏆 Highlights

Add streaming support for grouped AsOf join (#27293)
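
For readers unfamiliar with the operation, a grouped as-of join matches each left row to the nearest right row within the same group (in Polars, the `by` argument of `join_asof`). The sketch below is a plain-Python illustration of the backward strategy only. It is not the streaming implementation from #27293, and the function name, the row-as-dict layout, and the sample data are assumptions made for the example.

```python
from bisect import bisect_right
from collections import defaultdict


def grouped_asof_join(left, right, on, by, value):
    """Backward as-of join within groups (illustrative, not Polars' code)."""
    # Bucket the right-hand rows by group, each bucket sorted by the `on` key.
    buckets = defaultdict(list)
    for row in sorted(right, key=lambda r: r[on]):
        buckets[row[by]].append(row)
    keys = {group: [r[on] for r in rows] for group, rows in buckets.items()}

    joined = []
    for row in left:
        group = row[by]
        # Index of the last right row in this group with key <= the left key.
        idx = bisect_right(keys.get(group, []), row[on]) - 1
        match = buckets[group][idx][value] if idx >= 0 else None
        joined.append({**row, value: match})
    return joined


# Hypothetical sample data: trades matched to the latest earlier quote per symbol.
trades = [
    {"sym": "A", "t": 3},
    {"sym": "B", "t": 3},
    {"sym": "A", "t": 1},
]
quotes = [
    {"sym": "A", "t": 2, "bid": 10},
    {"sym": "B", "t": 4, "bid": 20},
    {"sym": "A", "t": 3, "bid": 11},
]
result = grouped_asof_join(trades, quotes, on="t", by="sym", value="bid")
```

Rows with no earlier quote in their group get `None`, which mirrors the null a left-style as-of join would produce.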

⚠️ Deprecations

Deprecate support for dataframe interchange protocol (#27214)

🚀 Performance improvements

Create IR slice from expr slice pushdown (#27200)
Add streaming support for grouped AsOf join (#27293)
Avoid unnecessary rechunk when sorting already sorted DataFrame (#27264)
Lower basic over() to streaming primitives (#27303)
Lower drop_{nulls,nans} in streaming group_by aggregations (#27296)
Lower entropy to streaming reductions (#27174)
Add native streaming interpolate (#27185)
Streaming strptime with format=None (#27056)
Lower skew / kurtosis to streaming aggregations (#27176)
Post apply pyarrow filter in Polars' engine instead of pyarrow (#27192)
Optimize drop_nulls().{first,last}() to {first,last}(ignore_nulls=True) (#27187)
Always process pyarrow scan in batches (#27183)
Make cut output Enum and mark as elementwise (#27173)
Remove unused expression sorts (#27075)
Use delta stats for mixed hive and non-hive predicate pushdown (#27102)
Take into account size per row in join sampling (#27098)
Streaming is_first_distinct and unique(maintain_order=True) (#27052)
Streaming cov and corr (#27008)
Add sorted unique node to streaming engine (#26990)
Ensure Expr.append is lowered in streaming engine (#27022)
Collapse consecutive Sort nodes (#26965)
Drop maintain_order=True requirement in sink_delta (#27007)
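
Several entries above lower statistics such as cov, corr, skew, kurtosis, and entropy to streaming reductions. As a rough illustration of what that requires, the sketch below keeps a small mergeable state for sample covariance, so chunks can be reduced independently and combined afterwards. It is a textbook online algorithm written in plain Python under assumed names (`CovState`, `update`, `merge`, `finalize`); it is not taken from the Polars source.

```python
from dataclasses import dataclass


@dataclass
class CovState:
    """Mergeable partial state for covariance (illustrative only)."""
    n: int = 0
    mean_x: float = 0.0
    mean_y: float = 0.0
    c: float = 0.0  # running sum of (x - mean_x) * (y - mean_y)

    def update(self, x, y):
        # Single-pass update; uses the old mean_x and the new mean_y.
        self.n += 1
        dx = x - self.mean_x
        self.mean_x += dx / self.n
        self.mean_y += (y - self.mean_y) / self.n
        self.c += dx * (y - self.mean_y)

    def merge(self, other):
        # Combine two partial states computed on disjoint chunks.
        if other.n == 0:
            return CovState(self.n, self.mean_x, self.mean_y, self.c)
        if self.n == 0:
            return CovState(other.n, other.mean_x, other.mean_y, other.c)
        n = self.n + other.n
        dx = other.mean_x - self.mean_x
        dy = other.mean_y - self.mean_y
        c = self.c + other.c + dx * dy * self.n * other.n / n
        return CovState(
            n,
            self.mean_x + dx * other.n / n,
            self.mean_y + dy * other.n / n,
            c,
        )

    def finalize(self, ddof=1):
        return self.c / (self.n - ddof) if self.n > ddof else None


def reduce_chunk(xs, ys):
    state = CovState()
    for x, y in zip(xs, ys):
        state.update(x, y)
    return state


# Two chunks reduced separately, then merged, as a streaming engine might do.
left = reduce_chunk([1.0, 2.0], [2.0, 4.0])
right = reduce_chunk([3.0, 4.0], [6.0, 9.0])
cov = left.merge(right).finalize()
```

Because each chunk only contributes a four-field state, memory use stays constant regardless of input size, which is the property a streaming reduction needs.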

✨ Enhancements

Add ignore_nulls to {list,arr}.{any,all} (#27186)
Lock-free memory manager with spill-to-disk and fully OOC multiplexer (#26774)
Add is_unique to list/array dtypes (#27290)
Streaming pyarrow datasets sources (#27230)
Add pl.merge_sorted operating on multiple frames (#27014)
Allow group_by() without key exprs (#27141)
Change default scan/read_lines column name from "lines" to "line" (#27122)
Make unnest() effective on all columns by default (#27029)
Collapse consecutive Sort nodes (#26965)
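
To make the multi-frame merge_sorted entry (#27014) concrete, the sketch below shows the underlying idea, a k-way merge of inputs that are each already sorted on a key, using only the Python standard library. The helper name `merge_sorted_rows` and the list-of-dicts representation are assumptions for illustration; the actual `pl.merge_sorted` signature should be checked against the Polars API reference.

```python
import heapq


def merge_sorted_rows(frames, key):
    """Merge any number of row lists that are each sorted by `key`."""
    # heapq.merge is lazy and stable: ties keep the order of the inputs.
    return list(heapq.merge(*frames, key=lambda row: row[key]))


# Hypothetical inputs, each already sorted by "t".
a = [{"t": 1, "src": "a"}, {"t": 4, "src": "a"}]
b = [{"t": 2, "src": "b"}, {"t": 4, "src": "b"}]
c = [{"t": 0, "src": "c"}, {"t": 3, "src": "c"}]

merged = merge_sorted_rows([a, b, c], key="t")
```

The merge never re-sorts the data; it only interleaves rows, which is why the inputs must be sorted beforehand.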

🐞 Bug fixes

Update groups to correct length for Implode (#27282)
Fix scan_csv missing_columns='insert' overwriting existing data with NULLs (#27297)
Raise on non-numeric inputs in pl.int_ranges (#27294)
Fix always-true filter conversion to Iceberg filter (#27119)
Do not skip nulls when enumerating over rows in grouped AsOf join (#27275)
Fix pivot dropping data for null on values (#27273)
Resolve multiple files deadlock in CSV async reader (#27073)
Widen decimal precision on sum aggregation (#27270)
Correct lf.remote type (#27261)
Default LazyFrame.map_batches to no optimizations (#27262)
Extend StructEval schema context in StackOptimizer (#27243)
Preserve nulls when casting from all-null Series to Struct (#27241)
Fix scan_delta filter on empty dataframe (#27244)
Prevent DataFrame creation panic on list[struct] with heterogenous types (#27217)
Named aggregation __structify was being ignored (#27148)
Skip null group entries when collecting AsOf-by groups (#27215)
Fix panic with empty order_by in over expression (#27088)
Write field ID from sink_parquet (#27196)
Fix statistics for Null columns in Parquet (#27021)
Do not prune sort nodes containing slice with dyn predicate (#27140)
Correct grouped Binary arg_min/arg_max and String single-element arg indices (#27172)
Resolve multiple files deadlock in NDJSON async reader (#27204)
Overflow panic in interpolate nearest (#27205)
Using checked arithmetic in int96_to_i64_ns to prevent overflow panic (#27129)
Don't trigger csv fast count if predicate is pushed down (#27190)
Support all integer dtypes for Series index assignment (#27188)
Streaming sort by-expressions were lowered incorrectly (#27158)
Replace multiprocessing.dummy.Pool with ThreadPoolExecutor (#27175)
Reset IO metrics instead of consuming (#27156)
Output SVG if output_path ends with '.svg' in show_graph (#27144)
Skip extension types for min/max in describe (#27120)
Address a potential overflow in from_epoch scaling (#27118)
Fix incorrect IO metrics on multi-phase streaming execution (#27123)
Use delta stats for mixed hive and non-hive predicate pushdown (#27102)
Make the files used in docs available locally (#27121)
Apply scalar bound in clip when the Series bound contains nulls (#27087)
Ignore ddof parameter in rolling_corr and deprecate (#27104)
Preserve casts for horizontal ops with untyped literals (#27011)
Reject invalid input to sql_expr (#27084)
Ensure SQL COUNT(<lit>) expressions return the correct value (#27085)
Regression in replace_strict for enums (#27066)
Make test_group_by_arg_max_boolean_26978 non-flaky for max_by ties (#27048)
Null count for aggregated list inside count aggregation (#27032)
Panic in streaming MergeSortedNode (#27024)
Prevent panic in transpose() with mixed List and non-List columns (#27038)
Set sorted flag for Boolean and Time (#27035)
Missing src/ subdirectory to CI Python docs step (#27025)
Resolve stack overflow on merge_sorted and union (#27018)
Make pl.DataFrame.fill_null work on columns with Null dtype (#27020)
Fix repeated word typos in comments (#26917)
Covariance with constant is zero, not NaN (#27015)
Don't remove set_sorted in projection pushdown (#27006)
Infer nulls when creating a DataFrame from an empty struct (#26991)
Correct suggestion in multi-expr filter error (#27003)
Implement agg_arg_min/agg_arg_max for boolean data type (#26997)
Ensure sample() respects the global set seed (#26992)
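
A few of the fixes above replace unchecked integer arithmetic with checked arithmetic, for example in int96_to_i64_ns (#27129). The sketch below illustrates the idea in Python for the legacy Parquet INT96 timestamp layout (8 bytes of nanoseconds within the day followed by a 4-byte Julian day, little-endian). Returning `None` when the result does not fit in a signed 64-bit integer is an assumption made for the example, and the function name is hypothetical. The release notes do not say what Polars returns in that case, only that it no longer panics.

```python
import struct

JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588
NANOS_PER_DAY = 86_400 * 1_000_000_000
I64_MIN, I64_MAX = -(2**63), 2**63 - 1


def int96_to_i64_ns_checked(raw):
    """Convert a 12-byte INT96 timestamp to nanoseconds since the Unix epoch.

    Returns None instead of overflowing (assumed behaviour for this sketch).
    """
    if len(raw) != 12:
        raise ValueError("INT96 values are exactly 12 bytes")
    nanos_of_day, julian_day = struct.unpack("<QI", raw)
    # Python ints do not overflow, so the range check is done explicitly.
    total = (julian_day - JULIAN_DAY_OF_UNIX_EPOCH) * NANOS_PER_DAY + nanos_of_day
    return total if I64_MIN <= total <= I64_MAX else None


epoch = int96_to_i64_ns_checked(struct.pack("<QI", 0, 2_440_588))
next_day = int96_to_i64_ns_checked(struct.pack("<QI", 1, 2_440_589))
too_far = int96_to_i64_ns_checked(struct.pack("<QI", 0, 2**32 - 1))
```

In a language with fixed-width integers the same check would use checked multiply and add operations rather than a range comparison after the fact.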

📖 Documentation

Add documentation for openlineage on-premises (#27334)
Release page (#27335)
Update uv pip install polars-on-premises cmd (#27330)
Fix outdated LazyGroupBy.map_groups docstring (#27292)
Add deny_anonymous_users to scheduler config (#27287)
Slurm documentation (#27259)
Add link to concepts in index.md (#27077)
Add docs entry for merge_sorted (#27224)
Fix typo (#27212)
Make the files used in docs available locally (#27121)
Put first-time contribution requirements in its own linkable section (#27113)
Add missing docstrings for Expr.struct.__getitem__ and Series.__setitem__ (#27092)
Normalise Series docstring whitespace indents (#27082)
Change Polars Cloud API to 0.6.0 (#27005)
Improve write_parquet docstring for use_pyarrow (#26988)

📦 Build system

Really do not install pyiceberg-core 0.9.0 (#27017)

🛠️ Other improvements

Add regression test for instantiating polars DataFrame from pandas Timestamp (#27332)
Bump Python Polars version (#27315)
Resolve bad instantiations in test_iceberg (#27314)
Sink DSL and callback for Iceberg (#27258)
Wait for morsel consumption in merge_sorted streaming node (#27288)
Use more precise internal typing (pt. iii) (#27232)
Mark scan_ipc cache arguments as deprecated (#27216)
Consolidate reordered compare functions (#27229)
Fix test_dtype_concat_3735 not actually iterating through numeric dtypes (#27178)
Remove dead code in test_scan_lines (#27213)
Move/genericize _balanced_reduce to Python utils (#27100)
Remove unused attributes (#27191)
Avoid unnecessary recompilation due to changing env vars (#27166)
Update nightly Rust compiler version (#27145)
Simplify pyarrow scan and process in batches (#26982)
Make internal typing more precise (part ii) (#27117)
Add None & Dataframe to FrameInitTypes (#27126)
Remove unused expression sorts (#27075)
Improve internal typing ahead of using ty / pyrefly (#27050)
Add explicit ResourceWarning coverage (#27083)
Add sinked paths callback (#26995)
Pin maturin due to compile time regression (#27062)
Missing src/ subdirectory to CI Python docs step (#27025)
Really do not install pyiceberg-core 0.9.0 (#27017)
Naming for named scopes (#26999)
Enable hypothesis tests when POLARS_AUTO_NEW_STREAMING=1 (#26818)
Fix CI by excluding missing wheel version of pyiceberg (#27001)
Remove indirection in calling python scans (#26981)
Polars versions (#26980)
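
The entry on moving `_balanced_reduce` to Python utils (#27100) refers to an internal helper whose signature is not given here. As a hedged illustration of the general technique the name suggests, the sketch below combines items pairwise in rounds, so the resulting expression tree has logarithmic depth instead of the linear depth produced by a left fold. The name `balanced_reduce` and its behaviour are assumptions for this example, not the Polars helper.

```python
from functools import reduce


def balanced_reduce(func, items):
    """Reduce `items` with `func` as a balanced tree (illustrative sketch)."""
    items = list(items)
    if not items:
        raise ValueError("balanced_reduce() of empty sequence")
    while len(items) > 1:
        # Combine neighbours pairwise; an odd trailing item is carried over.
        paired = [func(items[i], items[i + 1]) for i in range(0, len(items) - 1, 2)]
        if len(items) % 2:
            paired.append(items[-1])
        items = paired
    return items[0]


def combine(a, b):
    # Track (tree depth, concatenated payload) to compare nesting depth.
    return (max(a[0], b[0]) + 1, a[1] + b[1])


leaves = [(0, [i]) for i in range(8)]
balanced_depth, balanced_order = balanced_reduce(combine, leaves)
linear_depth, linear_order = reduce(combine, leaves)
```

With eight leaves the balanced version nests three levels deep while the left fold nests seven, which matters when the combined objects are query plans that are later traversed recursively.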

Thank you to all our contributors for making this release possible!
@0xRozier, @EndPositive, @HCYT, @Kevin-Patyk, @MarcoGorelli, @NeejWeej, @RedZapdos123, @TNieuwdorp, @abhidotsh, @alexander-beedie, @andyjessen, @azimafroozeh, @borchero, @carnarez, @coastalwhite, @debnathshoham, @dpinol, @dsprenkels, @dydev012, @farouk-01, @gab23r, @gautamvarmadatla, @joaquinhuigomez, @kdn36, @nameexhaustion, @orlp, @ritchie46, @wence-, @xenzh, @yangsong97 and @yonatan-genai
