🚀 Performance improvements
- Address `group_by_dynamic` slowness in sparse data (#24916)
- Push filters to PyIceberg (#24910)
- Native `filter/drop_nulls/drop_nans` in group-by context (#24897)
- Implement `cumulative_eval` using the group-by engine (#24889)
- Prevent generation of copies of `DataFrame`s in `DslPlan` serialization (#24852)
- Implement native `null_count`, `any` and `all` group-by aggregations (#24859)
- Speed up `reverse` in group-by context (#24855)
- Prune unused categorical values when exporting to arrow/parquet/IPC/pickle (#24829)
- Don't check duplicates on streaming simple projection in release mode (#24830)
- Lower approx_n_unique to the streaming engine (#24821)
- Duration/interval string parsing optimisation (2-5x faster) (#24771)
- Use native reducer for `first/last` on Decimals, Categoricals and Enums (#24786)
- Implement indexed method for `BitMapIter::nth` (#24766)
- Pushdown slices on plans within unions (#24735)
✨ Enhancements
- Add environment variable to roundtrip empty struct in Parquet (#24914)
- Fast-count for `scan_iceberg().select(len())` (#24602)
- Add `glob` parameter to `scan_ipc` (#24898)
- Prevent generation of copies of `DataFrame`s in `DslPlan` serialization (#24852)
- Add `list.agg` and `arr.agg` (#24790)
- Implement `{Expr,Series}.rolling_rank()` (#24776)
- Don't require PyArrow for `read_database_uri` if ADBC engine version supports PyCapsule interface (#24029)
- Make `Series` init consistent with `DataFrame` init for string values declared with temporal dtype (#24785)
- Support MergeSorted in CSPE (#24805)
- Duration/interval string parsing optimisation (2-5x faster) (#24771)
- Recursively apply CSPE (#24798)
- Add streaming engine per-node metrics (#24788)
- Add `arr.eval` (#24472)
- Drop PyArrow requirement for non-batched usage of `read_database` with the ADBC engine and support `iter_batches` with the ADBC engine (#24180)
- Improve rolling_(sum|mean) accuracy (#24743)
- Add `separator` to `{Data,Lazy}Frame.unnest` (#24716)
- Add `union()` function for unordered concatenation (#24298)
- Add `name.replace` to the set of column rename options (#17942)
- Support `np.ndarray -> AnyValue` conversion (#24748)
- Allow duration strings with leading "+" (#24737)
- Drop now-unnecessary post-init "schema_overrides" cast on `DataFrame` load from list of dicts (#24739)
- Add support for UInt128 to pyo3-polars (#24731)
🐞 Bug fixes
- Properly release the GIL for `read_parquet_metadata` (#24922)
- Broadcast `partition_by` columns in `over` expression (#24874)
- Clear index cache on stacked `df.filter` expressions (#24870)
- Fix 'explode' mapping strategy on scalar value (#24861)
- Fix repeated `with_row_index()` after `scan()` silently ignored (#24866)
- Correctly return min and max for enums in groupby aggregation (#24808)
- Refactor `BinaryExpr` in `group_by` dispatch logic (#24548)
- Fix aggstate for `gather` (#24857)
- Keep scalars for length preserving functions in `group_by` (#24819)
- Have `range` feature depend on `dtype-array` feature (#24853)
- Fix duplicate select panic (#24836)
- Inconsistency of list.sum() result type with None values (#24476)
- Division by zero in Expr.dt.truncate (#24832)
- Potential deadlock in __arrow_c_stream__ (#24831)
- Allow double aggregations in group-by contexts (#24823)
- Series.shrink_dtype for i128/u128 (#24833)
- Fix dtype in `EvalExpr` (#24650)
- Allow aggregations on `AggState::LiteralScalar` (#24820)
- Dispatch to `group_aware` for fallible expressions with masked out elements (#24815)
- Fix error for `arr.sum()` on small integer Array dtypes containing nulls (#24478)
- Fix regression on `write_database()` to Snowflake due to unsupported string view type (#24622)
- Fix XOR did not follow kleene when one side is unit-length (#24810)
- Make `Series` init consistent with `DataFrame` init for string values declared with temporal dtype (#24785)
- Incorrect precision in Series.str.to_decimal (#24804)
- Use `overlapping` instead of `rolling` (#24787)
- Fix iterable on `dynamic_group_by` and `rolling` object (#24740)
- Use Kahan summation for in-memory groupby sum/mean (#24774)
- Release GIL in PythonScan predicate evaluation (#24779)
- Type error in `bitmask::nth_set_bit_u64` (#24775)
- Add `Expr.sign` for `Decimal` datatype (#24717)
- Correct `str.replace` with missing pattern (#24768)
- Ensure `schema_overrides` is respected when loading iterable row data (#24721)
- Support `decimal_comma` on `Decimal` type in `write_csv` (#24718)
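
For readers unfamiliar with the Kahan summation mentioned in #24774, the following is a minimal, standard-library-only sketch of the general technique. It is not the Polars implementation (which lives in Rust), and the function names `naive_sum` and `kahan_sum` are hypothetical, chosen only for illustration. The idea is to carry a running compensation term that recovers the low-order bits lost when a small value is added to a large running total.

```python
import math


def naive_sum(values):
    """Plain left-to-right float accumulation (explicit loop, so no
    compensation is applied behind the scenes)."""
    total = 0.0
    for x in values:
        total += x
    return total


def kahan_sum(values):
    """Compensated (Kahan) summation: illustrative sketch only."""
    total = 0.0
    compensation = 0.0  # running estimate of the rounding error in `total`
    for x in values:
        y = x - compensation            # apply the correction from the previous step
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was just lost
        total = t
    return total


# One large value followed by many values too small to register individually.
values = [1.0] + [1e-16] * 100_000

reference = math.fsum(values)  # correctly rounded reference sum
naive = naive_sum(values)
kahan = kahan_sum(values)

print(f"naive error: {abs(naive - reference):.3e}")
print(f"kahan error: {abs(kahan - reference):.3e}")
```

In this input every `1e-16` is below half the floating-point spacing at `1.0`, so the naive loop never moves off `1.0`, while the compensated loop accumulates the small terms and stays close to the reference.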
📖 Documentation
- Add partitioning examples for `sink_*` methods (#24918)
- Add more `{unique,value}_counts` examples (#24927)
- Indent the versionchanged (#24783)
- Relax fsspec wording (#24881)
- Add `pl.field` into the api docs (#24846)
- Fix duplicated article in SECURITY.md (#24762)
- Document output name determination in when/then/otherwise (#24746)
- Specify that precision=None becomes 38 for Decimal (#24742)
- Mention polars[rt64] and polars[rtcompat] instead of u64-idx and lts-cpu (#24749)
- Fix source mapping (#24736)
📦 Build system
- Update pyo3 and numpy crates to version 0.26 (#24760)
🛠️ Other improvements
- Re-use iterators in `set_` operations (#24850)
- Remove `GroupByPartitioned` and dispatch to streaming engine (#24903)
- Turn `element()` into `{A,}Expr::Element` (#24885)
- Pass `ScanOptions` to `new_from_ipc` (#24893)
- Update tests to be index type agnostic (#24891)
- Unset `Context` in `Window` expression (#24875)
- Fix failing delta test (#24867)
- Move `FunctionExpr` dispatch from `plan` to `expr` (#24839)
- Fix SQL test giving wrong error message (#24835)
- Consolidate dtype paths in `ApplyExpr` (#24825)
- Add `days_in_month` to documentation (#24822)
- Enable ruff D417 lint (#24814)
- Turn `pl.format` into proper elementwise expression (#24811)
- Fix remote benchmark by no-longer saving builds (#24812)
- Refactor `ApplyExpr` in `group_by` context on multiple inputs (#24520)
- IR text plan graph generator (#24733)
- Temporarily pin pydantic to fix CI (#24797)
- Extend and rename `rolling` groups to `overlapping` (#24577)
- Refactor `DataType` `proptest` strategies (#24763)
- Add `union` to documentation (#24769)
Thank you to all our contributors for making this release possible!
@JakubValtar, @Kevin-Patyk, @MarcoGorelli, @Object905, @alexander-beedie, @borchero, @cmdlineluser, @coastalwhite, @craigalodon, @dsprenkels, @eitsupi, @etrotta, @henryharbeck, @jordanosborn, @kdn36, @math-hiyoko, @nameexhaustion, @orlp, @pavelzw, @ritchie46, @thomasjpfan and @williambdean