pola-rs/polars py-1.35.0-beta.1 on GitHub

🚀 Performance improvements

Address group_by_dynamic slowness in sparse data (#24916)
Push filters to PyIceberg (#24910)
Native filter/drop_nulls/drop_nans in group-by context (#24897)
Implement cumulative_eval using the group-by engine (#24889)
Prevent generation of copies of DataFrames in DslPlan serialization (#24852)
Implement native null_count, any and all group-by aggregations (#24859)
Speed up reverse in group-by context (#24855)
Prune unused categorical values when exporting to arrow/parquet/IPC/pickle (#24829)
Don't check duplicates on streaming simple projection in release mode (#24830)
Lower approx_n_unique to the streaming engine (#24821)
Duration/interval string parsing optimisation (2-5x faster) (#24771)
Use native reducer for first/last on Decimals, Categoricals and Enums (#24786)
Implement indexed method for BitMapIter::nth (#24766)
Pushdown slices on plans within unions (#24735)

✨ Enhancements

Add environment variable to roundtrip empty struct in Parquet (#24914)
Fast-count for scan_iceberg().select(len()) (#24602)
Add glob parameter to scan_ipc (#24898)
Prevent generation of copies of DataFrames in DslPlan serialization (#24852)
Add list.agg and arr.agg (#24790)
Implement {Expr,Series}.rolling_rank() (#24776)
Don't require PyArrow for read_database_uri if ADBC engine version supports PyCapsule interface (#24029)
Make Series init consistent with DataFrame init for string values declared with temporal dtype (#24785)
Support MergeSorted in CSPE (#24805)
Duration/interval string parsing optimisation (2-5x faster) (#24771)
Recursively apply CSPE (#24798)
Add streaming engine per-node metrics (#24788)
Add arr.eval (#24472)
Drop PyArrow requirement for non-batched usage of read_database with the ADBC engine and support iter_batches with the ADBC engine (#24180)
Improve rolling_(sum|mean) accuracy (#24743)
Add separator to {Data,Lazy}Frame.unnest (#24716)
Add union() function for unordered concatenation (#24298)
Add name.replace to the set of column rename options (#17942)
Support np.ndarray -> AnyValue conversion (#24748)
Allow duration strings with leading "+" (#24737)
Drop now-unnecessary post-init "schema_overrides" cast on DataFrame load from list of dicts (#24739)
Add support for UInt128 to pyo3-polars (#24731)

🐞 Bug fixes

Properly release the GIL for read_parquet_metadata (#24922)
Broadcast partition_by columns in over expression (#24874)
Clear index cache on stacked df.filter expressions (#24870)
Fix 'explode' mapping strategy on scalar value (#24861)
Fix repeated with_row_index() after scan() silently ignored (#24866)
Correctly return min and max for enums in groupby aggregation (#24808)
Refactor BinaryExpr in group_by dispatch logic (#24548)
Fix aggstate for gather (#24857)
Keep scalars for length preserving functions in group_by (#24819)
Have range feature depend on dtype-array feature (#24853)
Fix duplicate select panic (#24836)
Fix inconsistent list.sum() result type with None values (#24476)
Fix division by zero in Expr.dt.truncate (#24832)
Fix potential deadlock in __arrow_c_stream__ (#24831)
Allow double aggregations in group-by contexts (#24823)
Fix Series.shrink_dtype for i128/u128 (#24833)
Fix dtype in EvalExpr (#24650)
Allow aggregations on AggState::LiteralScalar (#24820)
Dispatch to group_aware for fallible expressions with masked out elements (#24815)
Fix error for arr.sum() on small integer Array dtypes containing nulls (#24478)
Fix regression on write_database() to Snowflake due to unsupported string view type (#24622)
Fix XOR not following Kleene logic when one side is unit-length (#24810)
Make Series init consistent with DataFrame init for string values declared with temporal dtype (#24785)
Fix incorrect precision in Series.str.to_decimal (#24804)
Use overlapping instead of rolling (#24787)
Fix iterable on dynamic_group_by and rolling object (#24740)
Use Kahan summation for in-memory groupby sum/mean (#24774)
Release GIL in PythonScan predicate evaluation (#24779)
Fix type error in bitmask::nth_set_bit_u64 (#24775)
Add Expr.sign for Decimal datatype (#24717)
Correct str.replace with missing pattern (#24768)
Ensure schema_overrides is respected when loading iterable row data (#24721)
Support decimal_comma on Decimal type in write_csv (#24718)
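
One of the fixes above switches in-memory group-by sum/mean to Kahan summation (#24774). For readers unfamiliar with the technique, the following is a minimal sketch of textbook Kahan (compensated) summation in pure Python. It is not the Polars implementation, which lives in Rust; the function names `naive_sum` and `kahan_sum` are hypothetical and exist only for this illustration.

```python
import math


def naive_sum(values):
    """Plain left-to-right float accumulation (hypothetical helper)."""
    total = 0.0
    for x in values:
        total += x
    return total


def kahan_sum(values):
    """Textbook Kahan summation (hypothetical helper, not Polars code).

    A running compensation term captures the low-order bits that are
    lost each time a small value is added to a large running total.
    """
    total = 0.0
    compensation = 0.0
    for x in values:
        y = x - compensation            # apply the correction from the previous step
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was lost (with opposite sign)
        total = t
    return total


values = [0.1] * 1_000_000
reference = math.fsum(values)  # correctly rounded reference sum

naive_error = abs(naive_sum(values) - reference)
kahan_error = abs(kahan_sum(values) - reference)
print(f"naive error: {naive_error:.3e}")
print(f"kahan error: {kahan_error:.3e}")
```

On inputs like this, the compensated sum typically stays much closer to the `math.fsum` reference than the naive loop does, which is the kind of accuracy gain the fix is aiming for.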

📖 Documentation

Add partitioning examples for sink_* methods (#24918)
Add more {unique,value}_counts examples (#24927)
Indent the versionchanged (#24783)
Relax fsspec wording (#24881)
Add pl.field into the api docs (#24846)
Fix duplicated article in SECURITY.md (#24762)
Document output name determination in when/then/otherwise (#24746)
Specify that precision=None becomes 38 for Decimal (#24742)
Mention polars[rt64] and polars[rtcompat] instead of u64-idx and lts-cpu (#24749)
Fix source mapping (#24736)

📦 Build system

Update pyo3 and numpy crates to version 0.26 (#24760)

🛠️ Other improvements

Re-use iterators in set_ operations (#24850)
Remove GroupByPartitioned and dispatch to streaming engine (#24903)
Turn element() into {A,}Expr::Element (#24885)
Pass ScanOptions to new_from_ipc (#24893)
Update tests to be index type agnostic (#24891)
Unset Context in Window expression (#24875)
Fix failing delta test (#24867)
Move FunctionExpr dispatch from plan to expr (#24839)
Fix SQL test giving wrong error message (#24835)
Consolidate dtype paths in ApplyExpr (#24825)
Add days_in_month to documentation (#24822)
Enable ruff D417 lint (#24814)
Turn pl.format into proper elementwise expression (#24811)
Fix remote benchmark by no longer saving builds (#24812)
Refactor ApplyExpr in group_by context on multiple inputs (#24520)
IR text plan graph generator (#24733)
Temporarily pin pydantic to fix CI (#24797)
Extend and rename rolling groups to overlapping (#24577)
Refactor DataType proptest strategies (#24763)
Add union to documentation (#24769)

Thank you to all our contributors for making this release possible!
@JakubValtar, @Kevin-Patyk, @MarcoGorelli, @Object905, @alexander-beedie, @borchero, @cmdlineluser, @coastalwhite, @craigalodon, @dsprenkels, @eitsupi, @etrotta, @henryharbeck, @jordanosborn, @kdn36, @math-hiyoko, @nameexhaustion, @orlp, @pavelzw, @ritchie46, @thomasjpfan and @williambdean
