github pola-rs/polars py-1.40.0
Python Polars 1.40.0

one day ago

🏆 Highlights

  • Add streaming support for grouped AsOf join (#27293)

⚠️ Deprecations

  • Deprecate support for dataframe interchange protocol (#27214)

🚀 Performance improvements

  • Create IR slice from expr slice pushdown (#27200)
  • Add streaming support for grouped AsOf join (#27293)
  • Avoid unnecessary rechunk when sorting already sorted DataFrame (#27264)
  • Lower basic over() to streaming primitives (#27303)
  • Lower drop_{nulls,nans} in streaming group_by aggregations (#27296)
  • Lower entropy to streaming reductions (#27174)
  • Add native streaming interpolate (#27185)
  • Streaming strptime with format=None (#27056)
  • Lower skew / kurtosis to streaming aggregations (#27176)
  • Post apply pyarrow filter in Polars' engine instead of pyarrow (#27192)
  • Optimize drop_nulls().{first,last}() to {first,last}(ignore_nulls=True) (#27187)
  • Always process pyarrow scan in batches (#27183)
  • Make cut output Enum and mark as elementwise (#27173)
  • Remove unused expression sorts (#27075)
  • Use delta stats for mixed hive and non-hive predicate pushdown (#27102)
  • Take into account size per row in join sampling (#27098)
  • Streaming is_first_distinct and unique(maintain_order=True) (#27052)
  • Streaming cov and corr (#27008)
  • Add sorted unique node to streaming engine (#26990)
  • Ensure Expr.append is lowered in streaming engine (#27022)
  • Collapse consecutive Sort nodes (#26965)
  • Drop maintain_order=True requirement in sink_delta (#27007)

✨ Enhancements

  • Add ignore_nulls to {list,arr}.{any,all} (#27186)
  • Lock-free memory manager with spill-to-disk and fully OOC multiplexer (#26774)
  • Add is_unique to list/array dtypes (#27290)
  • Streaming pyarrow datasets sources (#27230)
  • Add pl.merge_sorted operating on multiple frames (#27014)
  • Allow group_by() without key exprs (#27141)
  • Change default scan/read_lines column name from "lines" to "line" (#27122)
  • Make unnest() effective on all columns by default (#27029)
  • Collapse consecutive Sort nodes (#26965)

🐞 Bug fixes

  • Update groups to correct length for Implode (#27282)
  • Fix scan_csv missing_columns='insert' overwriting existing data with NULLs (#27297)
  • Raise on non-numeric inputs in pl.int_ranges (#27294)
  • Fix always-true filter conversion to Iceberg filter (#27119)
  • Do not skip nulls when enumerating over rows in grouped AsOf join (#27275)
  • Fix pivot dropping data for null on values (#27273)
  • Resolve multiple files deadlock in CSV async reader (#27073)
  • Widen decimal precision on sum aggregation (#27270)
  • Correct lf.remote type (#27261)
  • Default LazyFrame.map_batches to no optimizations (#27262)
  • Extend StructEval schema context in StackOptimizer (#27243)
  • Preserve nulls when casting from all-null Series to Struct (#27241)
  • Fix scan_delta filter on empty dataframe (#27244)
  • Prevent DataFrame creation panic on list[struct] with heterogeneous types (#27217)
  • Named aggregation __structify was being ignored (#27148)
  • Skip null group entries when collecting AsOf-by groups (#27215)
  • Fix panic with empty order_by in over expression (#27088)
  • Write field ID from sink_parquet (#27196)
  • Fix statistics for Null columns in Parquet (#27021)
  • Do not prune sort nodes containing slice with dyn predicate (#27140)
  • Correct grouped Binary arg_min/arg_max and String single-element arg indices (#27172)
  • Resolve multiple files deadlock in NDJSON async reader (#27204)
  • Overflow panic in interpolate nearest (#27205)
  • Use checked arithmetic in int96_to_i64_ns to prevent overflow panic (#27129)
  • Don't trigger csv fast count if predicate is pushed down (#27190)
  • Support all integer dtypes for Series index assignment (#27188)
  • Streaming sort by-expressions were lowered incorrectly (#27158)
  • Replace multiprocessing.dummy.Pool with ThreadPoolExecutor (#27175)
  • Reset IO metrics instead of consuming (#27156)
  • Output SVG if output_path ends with '.svg' in show_graph (#27144)
  • Skip extension types for min/max in describe (#27120)
  • Address a potential overflow in from_epoch scaling (#27118)
  • Fix incorrect IO metrics on multi-phase streaming execution (#27123)
  • Use delta stats for mixed hive and non-hive predicate pushdown (#27102)
  • Make the files used in docs available locally (#27121)
  • Apply scalar bound in clip when the Series bound contains nulls (#27087)
  • Ignore ddof parameter in rolling_corr and deprecate (#27104)
  • Preserve casts for horizontal ops with untyped literals (#27011)
  • Reject invalid input to sql_expr (#27084)
  • Ensure SQL COUNT(<lit>) expressions return the correct value (#27085)
  • Regression in replace_strict for enums (#27066)
  • Make test_group_by_arg_max_boolean_26978 non-flaky for max_by ties (#27048)
  • Null count for aggregated list inside count aggregation (#27032)
  • Panic in streaming MergeSortedNode (#27024)
  • Prevent panic in transpose() with mixed List and non-List columns (#27038)
  • Set sorted flag for Boolean and Time (#27035)
  • Missing src/ subdirectory to CI Python docs step (#27025)
  • Resolve stack overflow on merge_sorted and union (#27018)
  • Make pl.DataFrame.fill_null work on columns with Null dtype (#27020)
  • Fix repeated word typos in comments (#26917)
  • Covariance with constant is zero, not NaN (#27015)
  • Don't remove set_sorted in projection pushdown (#27006)
  • Infer nulls when creating a DataFrame from an empty struct (#26991)
  • Correct suggestion in multi-expr filter error (#27003)
  • Implement agg_arg_min/agg_arg_max for boolean data type (#26997)
  • Ensure sample() respects the global set seed (#26992)

📖 Documentation

  • Add documentation for openlineage on-premises (#27334)
  • Release page (#27335)
  • Update uv pip install polars-on-premises cmd (#27330)
  • Fix outdated LazyGroupBy.map_groups docstring (#27292)
  • Add deny_anonymous_users to scheduler config (#27287)
  • Slurm documentation (#27259)
  • Add link to concepts in index.md (#27077)
  • Add docs entry for merge_sorted (#27224)
  • Fix typo (#27212)
  • Make the files used in docs available locally (#27121)
  • Put first-time contribution requirements in its own linkable section (#27113)
  • Add missing docstrings for Expr.struct.__getitem__ and Series.__setitem__ (#27092)
  • Normalise Series docstring whitespace indents (#27082)
  • Change Polars Cloud API to 0.6.0 (#27005)
  • Improve write_parquet docstring for use_pyarrow (#26988)

📦 Build system

  • Really do not install pyiceberg-core 0.9.0 (#27017)

🛠️ Other improvements

  • Add regression test for instantiating polars DataFrame from pandas Timestamp (#27332)
  • Bump Python Polars version (#27315)
  • Resolve bad instantiations in test_iceberg (#27314)
  • Sink DSL and callback for Iceberg (#27258)
  • Wait for morsel consumption in merge_sorted streaming node (#27288)
  • Use more precise internal typing (pt. iii) (#27232)
  • Mark scan_ipc cache arguments as deprecated (#27216)
  • Consolidate reordered compare functions (#27229)
  • Fix test_dtype_concat_3735 not actually iterating through numeric dtypes (#27178)
  • Remove dead code in test_scan_lines (#27213)
  • Move/genericize _balanced_reduce to Python utils (#27100)
  • Remove unused attributes (#27191)
  • Avoid unnecessary recompilation due to changing env vars (#27166)
  • Update nightly Rust compiler version (#27145)
  • Simplify pyarrow scan and process in batches (#26982)
  • Make internal typing more precise (part ii) (#27117)
  • Add None & Dataframe to FrameInitTypes (#27126)
  • Remove unused expression sorts (#27075)
  • Improve internal typing ahead of using ty / pyrefly (#27050)
  • Add explicit ResourceWarning coverage (#27083)
  • Add sinked paths callback (#26995)
  • Pin maturin due to compile time regression (#27062)
  • Missing src/ subdirectory to CI Python docs step (#27025)
  • Really do not install pyiceberg-core 0.9.0 (#27017)
  • Naming for named scopes (#26999)
  • Enable hypothesis tests when POLARS_AUTO_NEW_STREAMING=1 (#26818)
  • Fix CI by excluding missing wheel version of pyiceberg (#27001)
  • Remove indirection in calling python scans (#26981)
  • Polars versions (#26980)

Thank you to all our contributors for making this release possible!
@0xRozier, @EndPositive, @HCYT, @Kevin-Patyk, @MarcoGorelli, @NeejWeej, @RedZapdos123, @TNieuwdorp, @abhidotsh, @alexander-beedie, @andyjessen, @azimafroozeh, @borchero, @carnarez, @coastalwhite, @debnathshoham, @dpinol, @dsprenkels, @dydev012, @farouk-01, @gab23r, @gautamvarmadatla, @joaquinhuigomez, @kdn36, @nameexhaustion, @orlp, @ritchie46, @wence-, @xenzh, @yangsong97 and @yonatan-genai
