github pola-rs/polars py-1.41.0
Python Polars 1.41.0

6 hours ago

🏆 Highlights

  • Add LazyFrame.gather (#27501)
  • Nested common subplan elimination (#27340)
  • Stabilize streaming engine (#27497)
  • Speed up parquet metadata decode with hand-written Thrift (#27427)

⚠️ Deprecations

  • Deprecate the StringCache (#27580)

🚀 Performance improvements

  • Dispatch {list,arr}.{unique,n_unique,reverse} to group_by engine (#27278)
  • Improve in-memory grouped non-null count (#27702)
  • Factor shared conjuncts out of OR-of-ANDs predicates (#27627)
  • Skip downloading IPC batches exceeding slice bounds (#27683)
  • Avoid materializing broadcast list in list.shift (#27628)
  • Optimize json_decode Datetime string parsing (#27559)
  • Speed up to_numpy C-order via cache-blocked transpose (#27522)
  • Optimize select(len()) for non-strict horizontal concat (#27516)
  • Pushdown slices to inputs on left/right/full join (#27508)
  • Don't infer CSV schema if schema is set (#27507)
  • Nested common subplan elimination (#27340)
  • Make is_in row-group pruning precise on null-containing haystacks (#27495)
  • Don't do fused-multiply-add on scalars (#27479)
  • List full fast path (#27477)
  • Make is_in row-group pruning precise on multi-value lists (#27475)
  • Add streaming GatherNode (#27465)
  • Lower non-elementwise FunctionExprIR to ColumnarFunctionNode (#27462)
  • Speed up parquet metadata decode with hand-written Thrift (#27427)
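
The "Factor shared conjuncts out of OR-of-ANDs predicates" entry refers to the Boolean rewrite (A AND B) OR (A AND C) => A AND (B OR C). The sketch below illustrates that rewrite in plain Python. It is not Polars' optimizer code: the function name and the representation of each AND branch as a set of predicate strings are assumptions made for illustration only.

```python
from functools import reduce


def factor_shared_conjuncts(disjuncts):
    """Split an OR-of-ANDs predicate into shared and per-branch conjuncts.

    `disjuncts` is a list of sets; each set holds the conjuncts of one AND
    branch. Returns (shared, remainders) so that the original predicate is
    equivalent to: AND(shared) AND OR(AND(r) for r in remainders).
    An empty remainder means that branch is fully covered by `shared`, so the
    OR part is trivially true.
    """
    if not disjuncts:
        return frozenset(), []
    branches = [frozenset(d) for d in disjuncts]
    # Conjuncts present in every branch can be hoisted out of the OR.
    shared = reduce(lambda a, b: a & b, branches)
    remainders = [b - shared for b in branches]
    return shared, remainders


# (a > 1 AND b == 2) OR (a > 1 AND c < 3)
shared, rest = factor_shared_conjuncts(
    [{"a > 1", "b == 2"}, {"a > 1", "c < 3"}]
)
```

Here `shared` holds `a > 1` and `rest` holds the two leftover branches. A hoisted conjunct stands on its own as a top-level filter, which is typically easier for a query engine to push down toward the scan than a predicate buried inside an OR.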

✨ Enhancements

  • Use true division for the / operator in Polars SQL (#27391)
  • Add Rust backend for Expr.has_nulls (#27590)
  • Stabilize float16 (#27607)
  • Add Expr.is_empty (#27583)
  • Add support for the SQL FILTER clause for aggregate functions, and STRING_AGG (#27564)
  • Make parquet FileMetadata prunable for IR-plan dispatch (#27535)
  • Broadcast scalar input for list.slice (#27487)
  • Add LazyFrame.gather (#27501)
  • Add null_on_oob in {Expr/Series}.gather (#27327)
  • Stabilize streaming engine (#27497)
  • Process batched arr.eval on overflow boundaries (#27496)
  • Process batched list.eval on overflow boundaries (#27483)
  • Print SLICED UNION in LazyFrame explain (#27467)
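
As a rough illustration of what a `null_on_oob` option on a gather operation suggests, the plain-Python sketch below returns a null (`None`) for out-of-bounds indices instead of raising. This is an assumption based on the parameter name, not a description of the Polars implementation: the helper `gather` here is hypothetical, works on Python lists, and ignores negative indices for simplicity. Consult the Polars API reference for the actual semantics.

```python
def gather(values, indices, null_on_oob=False):
    """Pick `values[i]` for each i in `indices` (hypothetical helper).

    With null_on_oob=True, an index outside [0, len(values)) yields None.
    With null_on_oob=False, such an index raises IndexError.
    """
    n = len(values)
    out = []
    for i in indices:
        if 0 <= i < n:
            out.append(values[i])
        elif null_on_oob:
            out.append(None)  # out of bounds becomes a null
        else:
            raise IndexError(f"index {i} is out of bounds for length {n}")
    return out


picked = gather([10, 20, 30], [0, 5, 2], null_on_oob=True)
# picked == [10, None, 30]
```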

🐞 Bug fixes

  • Panic in scan of empty IPC with slice (#27708)
  • Persist object_store rebuild state in cache (#27707)
  • Sort flag on GroupsType only applies to first element (#27684)
  • Invalid unwrap_unchecked when length isn't exact (#27685)
  • Don't unwrap channel send in streaming join_asof (#27688)
  • Fix merge_sorted panic when List in frame (#27568)
  • Put AsOf join buffered Morsels back at the front of the deque if we cannot process them right now (#27658)
  • Fix skip_batches logic for NaN (#27673)
  • Raise TypeError when calling next() directly on GroupBy objects (#27562)
  • Data type comparison for extension types (#27632)
  • Share last-morsel split budget across files in streaming multi-scan (#27630)
  • Bytes scalars were not being broadcast in dataframe constructor (#27621)
  • Reset the sort-options in Series::is_sorted() after row-encoding columns (#27614)
  • Rayon deadlock with re-entrant io sources (#27600)
  • Don't push negative-offset slices through HConcat (#27570)
  • Logic error in streaming is_empty (#27602)
  • Fix incorrect CSE with large is_in literal (#27575)
  • AnonymousFunction can qualify as SQL aggregator (#26986)
  • Fix CSPE panic in cloud (#27594)
  • Set merge-join streaming node to Finished if its sending port is Done (#27572)
  • Widen decimal precision on sum aggregation at runtime (#27579)
  • Fix str.to_time raising unnecessarily when input is all nulls (#27574)
  • Prevent panic when switching from one extension dtype to another (#27566)
  • Fix DataFrame.write_database(..., if_table_exists="append", engine="adbc") not handling missing tables correctly (#26913)
  • Ensure json_decode doesn't fail for Date and Time string deserialization (#27554)
  • Incorrect RUSTFLAGS passing in Makefile (#27555)
  • Fix panic on reading IPC with 0-row compressed bitmap (#27551)
  • Set HEAD_RESPONSE_SIZE_ESTIMATE to 0 (#27548)
  • Fix lazy horizontal concat not raising on mismatching heights after projection pushdown (#27506)
  • Prevent join panic when suffix="" and coalesce=True (#27376)
  • Do not make a FastCount for csv if pre_slice is set (#27536)
  • Support duplicate names in over (#27544)
  • Reassign sequence numbers when distributing input morsels in streaming AsOf join node (#27538)
  • Do not reverse dataframes when sorting with all-null key columns (#27517)
  • Incorrect length check on streaming zip (#27505)
  • Remove invalid type annotation Sequence[int] from DataFrame.__setitem__ key (#27355)
  • Respect nulls_last for descending over(order_by) in group_by().agg() (#27486)
  • Fix perf regression in scan_csv select(len()) when collected on streaming engine (#27504)
  • Harden extend strictness (#27476)
  • Prevent deadlock when using to_arrow() in a multithreaded context (#27472)
  • Do not flatten sliced union (#27466)
  • Prevent deadlock when using to_pandas() in multithreaded context (#27451)
  • Struct rechunk bug and add Series::with_validity (#27446)
  • Handle column indexing in read_parquet/read_csv with pyarrow reader (#27397)
  • Export enum as ordered dictionary to arrow (#27432)
  • Ensure sample() respects shuffle=False (#27248)
  • Return empty DataFrame from concat_list with lit and empty column (#27305)
  • Read parquet MAP columns without LogicalType annotation (#27404)
  • Raise DuplicateError on parquet files with duplicate column names (#27399)
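
To see why a fix such as "Don't push negative-offset slices through HConcat" matters, consider a horizontal concat whose inputs have different heights. The plain-Python sketch below is one plausible illustration, not the Polars code path: `hconcat` and `slice_rows` are hypothetical helpers, and shorter inputs are assumed to be padded with nulls. A negative offset counts from the end of the concatenated result, so applying it to each input separately can select different rows.

```python
from itertools import zip_longest


def hconcat(*columns):
    """Zip columns into rows, padding shorter columns with None."""
    return list(zip_longest(*columns))


def slice_rows(rows, offset, length):
    """Slice with a possibly negative offset (counted from the end)."""
    start = offset if offset >= 0 else max(len(rows) + offset, 0)
    return rows[start:start + length]


a = [1, 2, 3, 4]
b = [10, 20]

# Slice applied after the concat: the last two rows of the padded result.
correct = slice_rows(hconcat(a, b), -2, 2)

# Slice pushed into each input first: each input's own last two rows.
pushed = hconcat(slice_rows(a, -2, 2), slice_rows(b, -2, 2))
```

In this example `correct` is `[(3, None), (4, None)]` while `pushed` is `[(3, 10), (4, 20)]`, so the pushed-down plan would return wrong data.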

📖 Documentation

  • Document Expr.list.__getitem__ (#27689)
  • Add cloudpickle requirement (#27703)
  • Clarify from_arrow schema ordering (#27493)
  • Fix a typo in join_asof docstring (#27682)
  • Clarify schema column order (#27681)
  • Document horizontal string concatenation (#27542)
  • Document all valid engine options on LazyFrame collect/sink/explain methods (#27374)
  • Orchestration docs check (#27605)
  • Drop redundant Pattern 2 from Dagster integration page (#27581)
  • Update to remove Dockerhub PAT references (#27582)
  • Modernize Dagster integration example for Polars Cloud (#27560)
  • Use Polars random seed in sample example (#27537)
  • Clarify full join description (#27530)
  • Make expressions operations RNG deterministic (#27494)
  • Document struct field order (#27492)
  • Improve over:order_by description (#27520)
  • Clarify join output columns (#27449)
  • Document null propagation in pl.format (#27447)
  • Document gzip support in read_csv (#27434)
  • Add See Also sections for datetime docstrings (#27316)
  • Polars On-Prem release (#27439)
  • Rename to Polars On-Prem (#27435)
  • Clarify null handling in unique operations (#27431)
  • Document write_ipc buffer behavior with file=None (#27430)

📦 Build system

  • Also split debug info in debug-release (#27609)
  • Use split-debuginfo on linux (#27608)
  • Bump deltalake to 1.5.1 in CI (#27387)

🛠️ Other improvements

  • Remove redundant DSL::AGG::Unique (#27718)
  • Harden against async blocking deadlocks (#27653)
  • Print Python traceback when POLARS_TIMEOUT_MS is exceeded (#27657)
  • Format missed in previous PR (#27700)
  • Bump pytest and remove codspeed (#27686)
  • Remove client-side allow_local_scans option for prepare_cloud_plan (#27663)
  • Remove superfluous test (#27676)
  • Cleanup streaming flags (#27671)
  • Expose unordered concatenation in python visitor (#27666)
  • Bump deltalake and fix CI (#27660)
  • Add impl IntoAExprBuilder for ExprIR (#27656)
  • Split _expand_selector_dicts into multiple functions so return type is simple and accurate (#27618)
  • Update object_store patch repo (#27650)
  • Match NumPy signature in DataFrame.__array__ and Series.__array__ (#27634)
  • Add ImageVersion to rust-cache key (#27626)
  • Run Pyrefly on tests (#27459)
  • Fix is_empty test (#27597)
  • Fix tz type difference pandas assert, take 2 (#27596)
  • Fix CSPE panic in cloud (#27594)
  • Fix tz type difference pandas assert (#27593)
  • Add contributing note about conventional comments (#27543)
  • Nested common subplan elimination (#27340)
  • Deduplicate interns (#27470)
  • Fix merge conflict in ColumnarFunction (#27464)
  • Keep the schema ordered in scan projection pushdown (#27429)
  • Remove unused type: ignore statements (#27360)
  • Remove redundant PhysNodeKind::AsOfJoin::{left_right}_by fields (#27400)
  • Resolve type-ignores in udfs.py (#27341)
  • Bump rustls-webpki (#27382)

Thank you to all our contributors for making this release possible!
@0guban0v, @EndPositive, @JakubValtar, @Jesse-Bakker, @Kevin-Patyk, @Liyixin95, @MarcoGorelli, @NedJWestern, @Shoeboxam, @SuryaSunil1326, @TNieuwdorp, @alexander-beedie, @aryansri05, @ashler-herrick, @azimafroozeh, @carnarez, @coastalwhite, @dependabot[bot], @dsprenkels, @gab23r, @gautamvarmadatla, @ilya-pevzner, @jonathansergio, @junnythemarksman, @kdn36, @lun3x, @nameexhaustion, @orlp, @pablogsal, @ritchie46, @uurl, @waamm, @wence-, @wmoss and @xronocode
