pola-rs/polars py-1.41.0 on GitHub

🏆 Highlights

Add LazyFrame.gather (#27501)
Nested common subplan elimination (#27340)
Stabilize streaming engine (#27497)
Speed up parquet metadata decode with hand-written Thrift (#27427)

⚠️ Deprecations

Deprecate the StringCache (#27580)

🚀 Performance improvements

Dispatch {list,arr}.{unique,n_unique,reverse} to group_by engine (#27278)
Improve in-memory grouped non-null count (#27702)
Factor shared conjuncts out of OR-of-ANDs predicates (#27627)
Skip downloading IPC batches exceeding slice bounds (#27683)
Avoid materializing broadcast list in list.shift (#27628)
Optimise json_decode Datetime string parsing (#27559)
Speed up to_numpy C-order via cache-blocked transpose (#27522)
Optimize select(len()) for non-strict horizontal concat (#27516)
Pushdown slices to inputs on left/right/full join (#27508)
Don't infer CSV schema if schema is set (#27507)
Nested common subplan elimination (#27340)
Make is_in row-group pruning precise on null-containing haystacks (#27495)
Don't do fused-multiply-add on scalars (#27479)
List full fast path (#27477)
Make is_in row-group pruning precise on multi-value lists (#27475)
Add streaming GatherNode (#27465)
Lower non-elementwise FunctionExprIR to ColumnarFunctionNode (#27462)
Speed up parquet metadata decode with hand-written Thrift (#27427)
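
The "Factor shared conjuncts out of OR-of-ANDs predicates" entry (#27627) refers to the Boolean identity (a AND b) OR (a AND c) = a AND (b OR c). The sketch below illustrates that identity in plain Python and is not the Polars optimizer code. The function names and the list-of-lists predicate representation are assumptions made for this example only.

```python
from functools import reduce
from itertools import product


def factor_shared_conjuncts(branches):
    """Split an OR-of-ANDs into (shared conjuncts, residual branches).

    `branches` is a list of AND-branches, each a list of conjunct names.
    This representation is hypothetical, chosen only for illustration.
    """
    if not branches:
        return [], []
    # Conjuncts present in every branch can be hoisted out of the OR.
    shared = reduce(set.intersection, (set(b) for b in branches))
    # Keep the order in which shared conjuncts appear in the first branch.
    shared_ordered = [c for c in dict.fromkeys(branches[0]) if c in shared]
    residual = [[c for c in b if c not in shared] for b in branches]
    return shared_ordered, residual


def eval_or_of_ands(branches, env):
    """Evaluate the original predicate under a truth assignment `env`."""
    return any(all(env[c] for c in b) for b in branches)


def eval_factored(shared, residual, env):
    """Evaluate shared AND (residual_1 OR residual_2 OR ...)."""
    return all(env[c] for c in shared) and any(
        all(env[c] for c in b) for b in residual
    )


branches = [["a", "b"], ["a", "c"]]
shared, residual = factor_shared_conjuncts(branches)

# Check equivalence over every truth assignment of a, b, c.
equivalent = all(
    eval_or_of_ands(branches, dict(zip("abc", bits)))
    == eval_factored(shared, residual, dict(zip("abc", bits)))
    for bits in product([False, True], repeat=3)
)
```

One plausible benefit of the hoisted form is that the shared conjunct is evaluated once and becomes a standalone filter that a planner can push down on its own. Whether Polars uses it in exactly that way is not stated in the release notes.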

✨ Enhancements

Use true division for the / operator in Polars SQL (#27391)
Add Rust backend for Expr.has_nulls (#27590)
Stabilize float16 (#27607)
Add Expr.is_empty (#27583)
Add support for the SQL FILTER clause for aggregate functions, and STRING_AGG (#27564)
Make parquet FileMetadata prunable for IR-plan dispatch (#27535)
Broadcast scalar input for list.slice (#27487)
Add LazyFrame.gather (#27501)
Add null_on_oob in {Expr/Series}.gather (#27327)
Stabilize streaming engine (#27497)
Process batched arr.eval on overflow boundaries (#27496)
Process batched list.eval on overflow boundaries (#27483)
Print SLICED UNION in LazyFrame explain (#27467)
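
The null_on_oob flag added to {Expr/Series}.gather (#27327) suggests that out-of-bounds indices can produce nulls instead of raising. The following is a plain-Python sketch of that presumed behaviour, not the Polars implementation. The function below is hypothetical, and both the default value of the flag and the handling of negative indices (counted from the end) are assumptions.

```python
def gather(values, indices, null_on_oob=False):
    """Pick values[i] for each i in indices.

    Hypothetical stand-in for a gather operation: with null_on_oob=True an
    out-of-bounds index yields None, otherwise it raises IndexError.
    """
    n = len(values)
    out = []
    for i in indices:
        if -n <= i < n:
            # In bounds; negative indices count from the end (assumption).
            out.append(values[i])
        elif null_on_oob:
            out.append(None)
        else:
            raise IndexError(f"index {i} is out of bounds for length {n}")
    return out


picked = gather([10, 20, 30], [0, 2, 5], null_on_oob=True)
```

With null_on_oob left at False in this sketch, the same call raises IndexError for the index 5.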

🐞 Bug fixes

Panic in scan of empty IPC with slice (#27708)
Persist object_store rebuild state in cache (#27707)
Sort flag on GroupsType only applies to first element (#27684)
Invalid unwrap_unchecked when length isn't exact (#27685)
Don't unwrap channel send in streaming join_asof (#27688)
Fix merge_sorted panic when List in frame (#27568)
Put AsOf join buffered Morsels back at the front of the deque if we cannot process them right now (#27658)
Fix skip_batches logic for NaN (#27673)
Raise TypeError when calling next() directly on GroupBy objects (#27562)
Data type comparison for extension types (#27632)
Share last-morsel split budget across files in streaming multi-scan (#27630)
Bytes scalars were not being broadcast in dataframe constructor (#27621)
Reset the sort-options in Series::is_sorted() after row-encoding columns (#27614)
Rayon deadlock with re-entrant io sources (#27600)
Don't push negative-offset slices through HConcat (#27570)
Logic error in streaming is_empty (#27602)
Fix incorrect CSE with large is_in literal (#27575)
AnonymousFunction can qualify as SQL aggregator (#26986)
Fix CSPE panic in cloud (#27594)
Set merge-join streaming node to Finished if its sending port is Done (#27572)
Widen decimal precision on sum aggregation at runtime (#27579)
Fix str.to_time raising unnecessarily when input is all nulls (#27574)
Prevent panic when switching from one extension dtype to another (#27566)
Fix DataFrame.write_database(..., if_table_exists="append", engine="adbc") not handling missing tables correctly (#26913)
Ensure json_decode doesn't fail for Date and Time string deserialization (#27554)
Incorrect RUSTFLAGS passing in Makefile (#27555)
Fix panic on reading IPC with 0-row compressed bitmap (#27551)
Set HEAD_RESPONSE_SIZE_ESTIMATE to 0 (#27548)
Fix lazy horizontal concat not raising on mismatching heights after projection pushdown (#27506)
Prevent join panic when suffix="" and coalesce=True (#27376)
Do not make a FastCount for csv if pre_slice is set (#27536)
Support duplicate names in over (#27544)
Reassign sequence numbers when distributing input morsels in streaming AsOf join node (#27538)
Do not reverse dataframes when sorting with all-null key columns (#27517)
Incorrect length check on streaming zip (#27505)
Remove invalid type annotation Sequence[int] from DataFrame.__setitem__ key (#27355)
Respect nulls_last for descending over(order_by) in group_by().agg() (#27486)
Fix perf regression in scan_csv select(len()) when collected on streaming engine (#27504)
Harden extend strictness (#27476)
Prevent deadlock when using to_arrow() in a multithreaded context (#27472)
Do not flatten sliced union (#27466)
Prevent deadlock when using to_pandas() in multithreaded context (#27451)
Struct rechunk bug and add Series::with_validity (#27446)
Handle column indexing in read_parquet/read_csv with pyarrow reader (#27397)
Export enum as ordered dictionary to arrow (#27432)
Ensure sample() respects shuffle=False (#27248)
Return empty DataFrame from concat_list with lit and empty column (#27305)
Read parquet MAP columns without LogicalType annotation (#27404)
Raise DuplicateError on parquet files with duplicate column names (#27399)

📖 Documentation

Document Expr.list.__getitem__ (#27689)
Add cloudpickle requirement (#27703)
Clarify from_arrow schema ordering (#27493)
Fix a typo in join_asof docstring (#27682)
Clarify schema column order (#27681)
Document horizontal string concatenation (#27542)
Document all valid engine options on LazyFrame collect/sink/explain methods (#27374)
Orchestration docs check (#27605)
Drop redundant Pattern 2 from Dagster integration page (#27581)
Update to remove Dockerhub PAT references (#27582)
Modernize Dagster integration example for Polars Cloud (#27560)
Use Polars random seed in sample example (#27537)
Clarify full join description (#27530)
Make expressions operations RNG deterministic (#27494)
Document struct field order (#27492)
Improve over:order_by description (#27520)
Clarify join output columns (#27449)
Document null propagation in pl.format (#27447)
Document gzip support in read_csv (#27434)
Add See Also sections for datetime docstrings (#27316)
Polars On-Prem release (#27439)
Rename to Polars On-Prem (#27435)
Clarify null handling in unique operations (#27431)
Document write_ipc buffer behavior with file=None (#27430)

📦 Build system

Also split debug info in debug-release (#27609)
Use split-debuginfo on linux (#27608)
Bump deltalake to 1.5.1 in CI (#27387)

🛠️ Other improvements

Remove redundant DSL::AGG::Unique (#27718)
Harden against async blocking deadlocks (#27653)
Print Python traceback when POLARS_TIMEOUT_MS is exceeded (#27657)
Format missed in previous PR (#27700)
Bump pytest and remove codspeed (#27686)
Remove client-side allow_local_scans option for prepare_cloud_plan (#27663)
Remove superfluous test (#27676)
Cleanup streaming flags (#27671)
Expose unordered concatenation in python visitor (#27666)
Bump deltalake and fix CI (#27660)
Add impl IntoAExprBuilder for ExprIR (#27656)
Split _expand_selector_dicts into multiple functions so return type is simple and accurate (#27618)
Update object_store patch repo (#27650)
Match NumPy signature in DataFrame.__array__ and Series.__array__ (#27634)
Add ImageVersion to rust-cache key (#27626)
Run Pyrefly on tests (#27459)
Fix is_empty test (#27597)
Fix tz type difference pandas assert, take 2 (#27596)
Fix CSPE panic in cloud (#27594)
Fix tz type difference pandas assert (#27593)
Add contributing note about conventional comments (#27543)
Nested common subplan elimination (#27340)
Deduplicate interns (#27470)
Fix merge conflict in ColumnarFunction (#27464)
Keep the schema ordered in scan projection pushdown (#27429)
Remove unused type: ignore statements (#27360)
Remove redundant PhysNodeKind::AsOfJoin::{left_right}_by fields (#27400)
Resolve type-ignores in udfs.py (#27341)
Bump rustls-webpki (#27382)

Thank you to all our contributors for making this release possible!
@0guban0v, @EndPositive, @JakubValtar, @Jesse-Bakker, @Kevin-Patyk, @Liyixin95, @MarcoGorelli, @NedJWestern, @Shoeboxam, @SuryaSunil1326, @TNieuwdorp, @alexander-beedie, @aryansri05, @ashler-herrick, @azimafroozeh, @carnarez, @coastalwhite, @dependabot[bot], @dsprenkels, @gab23r, @gautamvarmadatla, @ilya-pevzner, @jonathansergio, @junnythemarksman, @kdn36, @lun3x, @nameexhaustion, @orlp, @pablogsal, @ritchie46, @uurl, @waamm, @wence-, @wmoss and @xronocode
