github pola-rs/polars py-1.35.0-beta.1
Python Polars 1.35.0-beta.1

pre-release

🚀 Performance improvements

  • Address group_by_dynamic slowness in sparse data (#24916)
  • Push filters to PyIceberg (#24910)
  • Native filter/drop_nulls/drop_nans in group-by context (#24897)
  • Implement cumulative_eval using the group-by engine (#24889)
  • Prevent generation of copies of DataFrames in DslPlan serialization (#24852)
  • Implement native null_count, any and all group-by aggregations (#24859)
  • Speed up reverse in group-by context (#24855)
  • Prune unused categorical values when exporting to arrow/parquet/IPC/pickle (#24829)
  • Don't check duplicates on streaming simple projection in release mode (#24830)
  • Lower approx_n_unique to the streaming engine (#24821)
  • Duration/interval string parsing optimisation (2-5x faster) (#24771)
  • Use native reducer for first/last on Decimals, Categoricals and Enums (#24786)
  • Implement indexed method for BitMapIter::nth (#24766)
  • Pushdown slices on plans within unions (#24735)
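
Conceptually, pruning unused categorical values on export (#24829) is a form of dictionary compaction: categories that no row references are dropped and the surviving codes are renumbered. The following is a minimal sketch of that idea on plain Python lists. `prune_categories` and the list-of-codes representation are hypothetical and do not reflect how Polars stores categoricals internally.

```python
from typing import Optional


def prune_categories(
    categories: list[str], codes: list[Optional[int]]
) -> tuple[list[str], list[Optional[int]]]:
    """Drop categories no code references and remap codes (hypothetical helper).

    Nulls are represented as None and pass through unchanged.
    """
    # Distinct codes actually in use, sorted so the original category order is kept.
    used = sorted({c for c in codes if c is not None})
    # Map each old code to its position in the compacted dictionary.
    remap = {old: new for new, old in enumerate(used)}
    pruned = [categories[old] for old in used]
    new_codes = [None if c is None else remap[c] for c in codes]
    return pruned, new_codes


print(prune_categories(["a", "b", "c", "d"], [3, 0, None, 3]))
# (['a', 'd'], [1, 0, None, 1])
```

Only the dictionary shrinks. Each row still decodes to the same string value.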

✨ Enhancements

  • Add environment variable to roundtrip empty struct in Parquet (#24914)
  • Fast-count for scan_iceberg().select(len()) (#24602)
  • Add glob parameter to scan_ipc (#24898)
  • Add list.agg and arr.agg (#24790)
  • Implement {Expr,Series}.rolling_rank() (#24776)
  • Don't require PyArrow for read_database_uri if ADBC engine version supports PyCapsule interface (#24029)
  • Make Series init consistent with DataFrame init for string values declared with temporal dtype (#24785)
  • Support MergeSorted in CSPE (#24805)
  • Recursively apply CSPE (#24798)
  • Add streaming engine per-node metrics (#24788)
  • Add arr.eval (#24472)
  • Drop PyArrow requirement for non-batched usage of read_database with the ADBC engine and support iter_batches with the ADBC engine (#24180)
  • Improve rolling_(sum|mean) accuracy (#24743)
  • Add separator to {Data,Lazy}Frame.unnest (#24716)
  • Add union() function for unordered concatenation (#24298)
  • Add name.replace to the set of column rename options (#17942)
  • Support np.ndarray -> AnyValue conversion (#24748)
  • Allow duration strings with leading "+" (#24737)
  • Drop now-unnecessary post-init "schema_overrides" cast on DataFrame load from list of dicts (#24739)
  • Add support for UInt128 to pyo3-polars (#24731)
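
Two items in this release concern the compact strings Polars accepts for durations and intervals, such as "1h30m" or "-2mo": faster parsing (#24771) and acceptance of a leading "+" (#24737). The hypothetical `parse_duration` helper below is only a rough sketch of how such strings break down into signed per-unit counts. It is not Polars' parser. Its unit list (ns, us, ms, s, m, h, d, w, mo, q, y, i) is an assumption modeled on the suffixes in the Polars duration-string documentation.

```python
import re

# Assumed unit suffixes; "ms" and "mo" come before "m" so the regex tries them first.
_UNIT = r"(?:ns|us|ms|mo|s|m|h|d|w|q|y|i)"
_TOKEN = re.compile(rf"(\d+)({_UNIT})")
_FULL = re.compile(rf"([+-]?)((?:\d+{_UNIT})+)")


def parse_duration(text: str) -> dict[str, int]:
    """Split e.g. "+1h30m" or "-2mo" into signed per-unit counts (hypothetical helper)."""
    m = _FULL.fullmatch(text)
    if m is None:
        raise ValueError(f"invalid duration string: {text!r}")
    sign = -1 if m.group(1) == "-" else 1  # an explicit "+" behaves like no sign
    counts: dict[str, int] = {}
    for number, unit in _TOKEN.findall(m.group(2)):
        counts[unit] = counts.get(unit, 0) + sign * int(number)
    return counts


print(parse_duration("+3d12h"))  # {'d': 3, 'h': 12}
```

The sketch stops at tokenizing. Turning the counts into a calendar-aware duration (months and years have variable length) is outside its scope.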

🐞 Bug fixes

  • Properly release the GIL for read_parquet_metadata (#24922)
  • Broadcast partition_by columns in over expression (#24874)
  • Clear index cache on stacked df.filter expressions (#24870)
  • Fix 'explode' mapping strategy on scalar value (#24861)
  • Fix repeated with_row_index() after scan() being silently ignored (#24866)
  • Correctly return min and max for enums in groupby aggregation (#24808)
  • Refactor BinaryExpr in group_by dispatch logic (#24548)
  • Fix aggstate for gather (#24857)
  • Keep scalars for length preserving functions in group_by (#24819)
  • Have range feature depend on dtype-array feature (#24853)
  • Fix duplicate select panic (#24836)
  • Inconsistency of list.sum() result type with None values (#24476)
  • Division by zero in Expr.dt.truncate (#24832)
  • Potential deadlock in __arrow_c_stream__ (#24831)
  • Allow double aggregations in group-by contexts (#24823)
  • Series.shrink_dtype for i128/u128 (#24833)
  • Fix dtype in EvalExpr (#24650)
  • Allow aggregations on AggState::LiteralScalar (#24820)
  • Dispatch to group_aware for fallible expressions with masked out elements (#24815)
  • Fix error for arr.sum() on small integer Array dtypes containing nulls (#24478)
  • Fix regression on write_database() to Snowflake due to unsupported string view type (#24622)
  • Fix XOR not following Kleene logic when one side is unit-length (#24810)
  • Incorrect precision in Series.str.to_decimal (#24804)
  • Use overlapping instead of rolling (#24787)
  • Fix iterable on dynamic_group_by and rolling object (#24740)
  • Use Kahan summation for in-memory groupby sum/mean (#24774)
  • Release GIL in PythonScan predicate evaluation (#24779)
  • Type error in bitmask::nth_set_bit_u64 (#24775)
  • Add Expr.sign for Decimal datatype (#24717)
  • Correct str.replace with missing pattern (#24768)
  • Ensure schema_overrides is respected when loading iterable row data (#24721)
  • Support decimal_comma on Decimal type in write_csv (#24718)
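
Kahan (compensated) summation, which #24774 applies to in-memory group-by sum and mean, keeps a running correction term for the low-order bits that each floating-point addition rounds away. The following sketch shows the textbook algorithm in plain Python for a single group. It illustrates the technique only and is not Polars' Rust implementation. `kahan_sum`, `kahan_mean` and `naive_sum` are hypothetical helper names.

```python
import math
from typing import Iterable


def naive_sum(values: Iterable[float]) -> float:
    """Plain left-to-right accumulation, used as the uncompensated baseline."""
    total = 0.0
    for x in values:
        total += x
    return total


def kahan_sum(values: Iterable[float]) -> float:
    """Kahan (compensated) summation of an iterable of floats."""
    total = 0.0
    compensation = 0.0  # rounding error of the previous addition
    for x in values:
        y = x - compensation  # correct the next term by the previous error
        t = total + y  # low-order bits of y may be rounded away here
        compensation = (t - total) - y  # what was actually added minus what was intended
        total = t
    return total


def kahan_mean(values: Iterable[float]) -> float:
    """Mean of one group via the compensated sum; raises ZeroDivisionError if empty."""
    vals = list(values)
    return kahan_sum(vals) / len(vals)


values = [0.1] * 1000
# Uncompensated vs. compensated vs. math.fsum (an exactly rounded reference).
print(naive_sum(values), kahan_sum(values), math.fsum(values))
```

The explicit `naive_sum` loop serves as the baseline because Python 3.12 and later already use compensated (Neumaier) summation in the built-in `sum()` for floats. Plain Kahan can still lose accuracy when large terms cancel each other. Neumaier's variant handles that case.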

📖 Documentation

  • Add partitioning examples for sink_* methods (#24918)
  • Add more {unique,value}_counts examples (#24927)
  • Indent the versionchanged (#24783)
  • Relax fsspec wording (#24881)
  • Add pl.field into the api docs (#24846)
  • Fix duplicated article in SECURITY.md (#24762)
  • Document output name determination in when/then/otherwise (#24746)
  • Specify that precision=None becomes 38 for Decimal (#24742)
  • Mention polars[rt64] and polars[rtcompat] instead of u64-idx and lts-cpu (#24749)
  • Fix source mapping (#24736)

📦 Build system

  • Update pyo3 and numpy crates to version 0.26 (#24760)

🛠️ Other improvements

  • Re-use iterators in set_ operations (#24850)
  • Remove GroupByPartitioned and dispatch to streaming engine (#24903)
  • Turn element() into {A,}Expr::Element (#24885)
  • Pass ScanOptions to new_from_ipc (#24893)
  • Update tests to be index type agnostic (#24891)
  • Unset Context in Window expression (#24875)
  • Fix failing delta test (#24867)
  • Move FunctionExpr dispatch from plan to expr (#24839)
  • Fix SQL test giving wrong error message (#24835)
  • Consolidate dtype paths in ApplyExpr (#24825)
  • Add days_in_month to documentation (#24822)
  • Enable ruff D417 lint (#24814)
  • Turn pl.format into proper elementwise expression (#24811)
  • Fix remote benchmark by no-longer saving builds (#24812)
  • Refactor ApplyExpr in group_by context on multiple inputs (#24520)
  • IR text plan graph generator (#24733)
  • Temporarily pin pydantic to fix CI (#24797)
  • Extend and rename rolling groups to overlapping (#24577)
  • Refactor DataType proptest strategies (#24763)
  • Add union to documentation (#24769)

Thank you to all our contributors for making this release possible!
@JakubValtar, @Kevin-Patyk, @MarcoGorelli, @Object905, @alexander-beedie, @borchero, @cmdlineluser, @coastalwhite, @craigalodon, @dsprenkels, @eitsupi, @etrotta, @henryharbeck, @jordanosborn, @kdn36, @math-hiyoko, @nameexhaustion, @orlp, @pavelzw, @ritchie46, @thomasjpfan and @williambdean
