🏆 Highlights
- Stabilize decimal (#25020)
🚀 Performance improvements
- Bump foldhash to 0.2.0 and hashbrown to 0.16.0 (#25014)
- Lower `unique` to native group-by and speed up `n_unique` in group-by context (#24976)
- Better parallelize `take{_slice,}_unchecked` (#24980)
- Implement native `skew` and `kurtosis` in group-by context (#24961)
- Use native group-by aggregations for `bitwise_*` operations (#24935)
- Address `group_by_dynamic` slowness in sparse data (#24916)
- Push filters to PyIceberg (#24910)
- Native `filter/drop_nulls/drop_nans` in group-by context (#24897)
- Implement `cumulative_eval` using the group-by engine (#24889)
- Prevent generation of copies of `Dataframe`s in `DslPlan` serialization (#24852)
- Implement native `null_count`, `any` and `all` group-by aggregations (#24859)
- Speed up `reverse` in group-by context (#24855)
- Prune unused categorical values when exporting to arrow/parquet/IPC/pickle (#24829)
- Don't check duplicates on streaming simple projection in release mode (#24830)
- Lower approx_n_unique to the streaming engine (#24821)
- Duration/interval string parsing optimisation (2-5x faster) (#24771)
- Use native reducer for `first/last` on Decimals, Categoricals and Enums (#24786)
- Implement indexed method for `BitMapIter::nth` (#24766)
- Pushdown slices on plans within unions (#24735)
✨ Enhancements
- Stabilize decimal (#25020)
- Support `ewm_mean()` in streaming engine (#25003)
- Improve row-count estimates (#24996)
- Remove filtered scan paths in IR when possible (#24974)
- Introduce remote Polars MCP server (#24977)
- Allow local scans on polars cloud (configurable) (#24962)
- Add `Expr.item` to strictly extract a single value from an expression (#24888)
- Add environment variable to roundtrip empty struct in Parquet (#24914)
- Fast-count for `scan_iceberg().select(len())` (#24602)
- Add `glob` parameter to `scan_ipc` (#24898)
- Prevent generation of copies of `Dataframe`s in `DslPlan` serialization (#24852)
- Add `list.agg` and `arr.agg` (#24790)
- Implement `{Expr,Series}.rolling_rank()` (#24776)
- Don't require PyArrow for `read_database_uri` if ADBC engine version supports PyCapsule interface (#24029)
- Make `Series` init consistent with `DataFrame` init for string values declared with temporal dtype (#24785)
- Support MergeSorted in CSPE (#24805)
- Duration/interval string parsing optimisation (2-5x faster) (#24771)
- Recursively apply CSPE (#24798)
- Add streaming engine per-node metrics (#24788)
- Add `arr.eval` (#24472)
- Drop PyArrow requirement for non-batched usage of `read_database` with the ADBC engine and support `iter_batches` with the ADBC engine (#24180)
- Improve rolling_(sum|mean) accuracy (#24743)
- Add `separator` to `{Data,Lazy}Frame.unnest` (#24716)
- Add `union()` function for unordered concatenation (#24298)
- Add `name.replace` to the set of column rename options (#17942)
- Support `np.ndarray -> AnyValue` conversion (#24748)
- Allow duration strings with leading "+" (#24737)
- Drop now-unnecessary post-init "schema_overrides" cast on `DataFrame` load from list of dicts (#24739)
- Add support for UInt128 to pyo3-polars (#24731)
🐞 Bug fixes
- Re-enable CPU feature check before import (#25010)
- Implement `read_excel` workaround for fastexcel/calamine issue loading a column subset from a named table (#25012)
- Correctness `any(ignore_nulls)` and OOB in `all` (#25005)
- Streaming any/all with ignore_nulls=False (#25008)
- Fix incorrect `join_asof` on a casted expression (#25006)
- Optimize memory on rolling groups in `ApplyExpr` (#24709)
- Fallback `Pyarrow` scan to in-memory engine (#24991)
- Make `Operator::swap_operands` return correct operators for `Plus`, `Minus`, `Multiply` and `Divide` (#24997)
- Capitalize letters after numbers in to_titlecase (#24993)
- Preserve null values in `pct_change` (#24952)
- Raise length mismatch on `over` with sliced groups (#24887)
- Check duplicate name in transpose (#24956)
- Follow Kleene logic in `any/all` for group-by (#24940)
- Do not optimize cross join to iejoin if order maintaining (#24950)
- Fix typing of `scan_parquet` partially unknown (#24928)
- Properly release the GIL for `read_parquet_metadata` (#24922)
- Broadcast `partition_by` columns in `over` expression (#24874)
- Clear index cache on stacked `df.filter` expressions (#24870)
- Fix 'explode' mapping strategy on scalar value (#24861)
- Fix repeated `with_row_index()` after `scan()` silently ignored (#24866)
- Correctly return min and max for enums in groupby aggregation (#24808)
- Refactor `BinaryExpr` in `group_by` dispatch logic (#24548)
- Fix aggstate for `gather` (#24857)
- Keep scalars for length preserving functions in `group_by` (#24819)
- Have `range` feature depend on `dtype-array` feature (#24853)
- Fix duplicate select panic (#24836)
- Inconsistency of list.sum() result type with None values (#24476)
- Division by zero in Expr.dt.truncate (#24832)
- Potential deadlock in __arrow_c_stream__ (#24831)
- Allow double aggregations in group-by contexts (#24823)
- Series.shrink_dtype for i128/u128 (#24833)
- Fix dtype in `EvalExpr` (#24650)
- Allow aggregations on `AggState::LiteralScalar` (#24820)
- Dispatch to `group_aware` for fallible expressions with masked out elements (#24815)
- Fix error for `arr.sum()` on small integer Array dtypes containing nulls (#24478)
- Fix regression on `write_database()` to Snowflake due to unsupported string view type (#24622)
- Fix XOR did not follow kleene when one side is unit-length (#24810)
- Make `Series` init consistent with `DataFrame` init for string values declared with temporal dtype (#24785)
- Incorrect precision in Series.str.to_decimal (#24804)
- Use `overlapping` instead of `rolling` (#24787)
- Fix iterable on `dynamic_group_by` and `rolling` object (#24740)
- Use Kahan summation for in-memory groupby sum/mean (#24774)
- Release GIL in PythonScan predicate evaluation (#24779)
- Type error in `bitmask::nth_set_bit_u64` (#24775)
- Add `Expr.sign` for `Decimal` datatype (#24717)
- Correct `str.replace` with missing pattern (#24768)
- Ensure `schema_overrides` is respected when loading iterable row data (#24721)
- Support `decimal_comma` on `Decimal` type in `write_csv` (#24718)
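
For readers unfamiliar with the technique named in "Use Kahan summation for in-memory groupby sum/mean (#24774)", the following is a minimal pure-Python sketch of compensated (Kahan) summation. It is illustrative only and is not the Polars implementation; the function names `naive_sum` and `kahan_sum` are hypothetical and do not appear in the release.

```python
import math


def naive_sum(values):
    """Plain left-to-right float accumulation (no compensation)."""
    total = 0.0
    for x in values:
        total += x
    return total


def kahan_sum(values):
    """Kahan (compensated) summation: carry the rounding error forward."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for x in values:
        y = x - compensation            # re-inject the previously lost bits
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was just lost
        total = t
    return total


# One large value followed by many values too small to register individually.
xs = [1.0] + [1e-16] * 1000

print(naive_sum(xs))   # the small terms are rounded away one by one
print(kahan_sum(xs))   # the compensation term preserves their contribution
print(math.fsum(xs))   # correctly rounded reference from the standard library
```

The explicit loop in `naive_sum` is deliberate: recent CPython versions already apply compensation inside the built-in `sum()` for floats, so calling `sum()` would hide the effect being demonstrated.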
📖 Documentation
- Introduce remote Polars MCP server (#24977)
- Add `{arr,list}.agg` API references (#24970)
- Support LLM in docs (#24958)
- Update Cloud docs with correct fn argument order (#24939)
- Update `name.replace` examples (#24941)
- Add i128 and u128 features to user guide (#24938)
- Add partitioning examples for `sink_*` methods (#24918)
- Add more `{unique,value}_counts` examples (#24927)
- Indent the versionchanged (#24783)
- Relax fsspec wording (#24881)
- Add `pl.field` into the api docs (#24846)
- Fix duplicated article in SECURITY.md (#24762)
- Document output name determination in when/then/otherwise (#24746)
- Specify that precision=None becomes 38 for Decimal (#24742)
- Mention polars[rt64] and polars[rtcompat] instead of u64-idx and lts-cpu (#24749)
- Fix source mapping (#24736)
📦 Build system
- Ensure `build_feature_flags.py` is included in artifact (#25024)
- Update pyo3 and numpy crates to version 0.26 (#24760)
🛠️ Other improvements
- Fix benchmark ci (#25019)
- Fix non-deterministic test (#25009)
- Fix makefile arch detection (#25011)
- Make `LazyFrame.set_sorted` into a `FunctionIR::Hint` (#24981)
- Remove symbolic links (#24982)
- Deprecate `Expr.agg_groups()` and `pl.groups()` (#24919)
- Dispatch to no-op rayon thread-pool from streaming (#24957)
- Unpin pydantic (#24955)
- Ensure safety of scan fast-count IR lowering in streaming (#24953)
- Re-use iterators in `set_` operations (#24850)
- Remove `GroupByPartitioned` and dispatch to streaming engine (#24903)
- Turn `element()` into `{A,}Expr::Element` (#24885)
- Pass `ScanOptions` to `new_from_ipc` (#24893)
- Update tests to be index type agnostic (#24891)
- Unset `Context` in `Window` expression (#24875)
- Fix failing delta test (#24867)
- Move `FunctionExpr` dispatch from `plan` to `expr` (#24839)
- Fix SQL test giving wrong error message (#24835)
- Consolidate dtype paths in `ApplyExpr` (#24825)
- Add `days_in_month` to documentation (#24822)
- Enable ruff D417 lint (#24814)
- Turn `pl.format` into proper elementwise expression (#24811)
- Fix remote benchmark by no-longer saving builds (#24812)
- Refactor `ApplyExpr` in `group_by` context on multiple inputs (#24520)
- IR text plan graph generator (#24733)
- Temporarily pin pydantic to fix CI (#24797)
- Extend and rename `rolling` groups to `overlapping` (#24577)
- Refactor `DataType` `proptest` strategies (#24763)
- Add `union` to documentation (#24769)
Thank you to all our contributors for making this release possible!
@EndPositive, @EnricoMi, @JakubValtar, @Kevin-Patyk, @MarcoGorelli, @Object905, @alexander-beedie, @borchero, @carnarez, @cmdlineluser, @coastalwhite, @craigalodon, @dsprenkels, @eitsupi, @etrotta, @henryharbeck, @jordanosborn, @kdn36, @math-hiyoko, @mjanssen, @nameexhaustion, @orlp, @pavelzw, @r-brink, @ritchie46, @thomasjpfan and @williambdean