🏆 Highlights
- Add LazyFrame.gather (#27501)
- Nested common subplan elimination (#27340)
- Stabilize streaming engine (#27497)
- Speed up parquet metadata decode with hand-written Thrift (#27427)
⚠️ Deprecations
- Deprecate the StringCache (#27580)
🚀 Performance improvements
- Dispatch `{list,arr}.{unique,n_unique,reverse}` to group_by engine (#27278)
- Improve in-memory grouped non-null count (#27702)
- Factor shared conjuncts out of OR-of-ANDs predicates (#27627)
- Skip downloading IPC batches exceeding slice bounds (#27683)
- Avoid materializing broadcast list in `list.shift` (#27628)
- Optimise `json_decode` Datetime string parsing (#27559)
- Speed up `to_numpy` C-order via cache-blocked transpose (#27522)
- Optimize `select(len())` for non-strict horizontal concat (#27516)
- Pushdown slices to inputs on left/right/full join (#27508)
- Don't infer CSV schema if schema is set (#27507)
- Nested common subplan elimination (#27340)
- Make `is_in` row-group pruning precise on null-containing haystacks (#27495)
- Don't do fused-multiply-add on scalars (#27479)
- List full fast path (#27477)
- Make `is_in` row-group pruning precise on multi-value lists (#27475)
- Add streaming GatherNode (#27465)
- Lower non-elementwise FunctionExprIR to ColumnarFunctionNode (#27462)
- Speed up parquet metadata decode with hand-written Thrift (#27427)
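
To illustrate the "Factor shared conjuncts out of OR-of-ANDs predicates" item above (#27627), here is a minimal pure-Python sketch of the underlying rewrite, `(A AND B) OR (A AND C)` into `A AND (B OR C)`. It is not the Polars implementation: the function name `factor_shared_conjuncts` and the frozenset-of-labels representation are assumptions made for this example only.

```python
def factor_shared_conjuncts(branches):
    """Factor atoms common to every AND-branch out of an OR-of-ANDs predicate.

    `branches` is a non-empty list of frozensets; each frozenset is one
    AND-branch and its elements are opaque atom labels (illustrative only).
    Returns (shared, residual): the predicate is equivalent to
    AND(shared) AND OR(AND(r) for r in residual).
    """
    if not branches:
        raise ValueError("expected at least one AND-branch")
    # Atoms present in every branch can be hoisted in front of the OR.
    shared = frozenset.intersection(*branches)
    residual = [branch - shared for branch in branches]
    # If any branch consists only of shared atoms, the OR is always true
    # whenever the shared atoms hold, so no residual predicate is needed.
    if any(not r for r in residual):
        residual = []
    return shared, residual


# Hypothetical predicate: (a > 1 AND b == 2) OR (a > 1 AND c < 3)
predicate = [
    frozenset({"a > 1", "b == 2"}),
    frozenset({"a > 1", "c < 3"}),
]
shared, residual = factor_shared_conjuncts(predicate)
```

Here `shared` holds the single atom `a > 1`, which a query engine could then evaluate once or push down on its own, while `residual` keeps the two remaining branches for the OR.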
✨ Enhancements
- Use true division for the `/` operator in Polars SQL (#27391)
- Add Rust backend for Expr.has_nulls (#27590)
- Stabilize float16 (#27607)
- Add Expr.is_empty (#27583)
- Add support for the SQL `FILTER` clause for aggregate functions, and `STRING_AGG` (#27564)
- Make parquet `FileMetadata` prunable for IR-plan dispatch (#27535)
- Broadcast scalar input for `list.slice` (#27487)
- Add LazyFrame.gather (#27501)
- Add `null_on_oob` in {Expr/Series}.gather (#27327)
- Stabilize streaming engine (#27497)
- Process batched `arr.eval` on overflow boundaries (#27496)
- Process batched `list.eval` on overflow boundaries (#27483)
- Print `SLICED UNION` in LazyFrame explain (#27467)
🐞 Bug fixes
- Panic in scan of empty IPC with slice (#27708)
- Persist object_store rebuild state in cache (#27707)
- Sort flag on GroupsType only applies to first element (#27684)
- Invalid unwrap_unchecked when length isn't exact (#27685)
- Don't unwrap channel send in streaming join_asof (#27688)
- Fix `merge_sorted` panic when List in frame (#27568)
- Put AsOf join buffered Morsels back at the front of the deque if we cannot process them right now (#27658)
- Fix skip_batches logic for NaN (#27673)
- Raise `TypeError` when calling `next()` directly on `GroupBy` objects (#27562)
- Data type comparison for extension types (#27632)
- Share last-morsel split budget across files in streaming multi-scan (#27630)
- Bytes scalars were not being broadcast in dataframe constructor (#27621)
- Reset the sort-options in `Series::is_sorted()` after row-encoding columns (#27614)
- Rayon deadlock with re-entrant io sources (#27600)
- Don't push negative-offset slices through `HConcat` (#27570)
- Logic error in streaming is_empty (#27602)
- Fix incorrect CSE with large is_in literal (#27575)
- AnonymousFunction can qualify as SQL aggregator (#26986)
- Fix CSPE panic in cloud (#27594)
- Set merge-join streaming node to `Finished` if its sending port is `Done` (#27572)
- Widen decimal precision on sum aggregation at runtime (#27579)
- Fix `str.to_time` raising unnecessarily when input was all nulls (#27574)
- Prevent panic when switching from one extension dtype to another (#27566)
- Fix `DataFrame.write_database(..., if_table_exists="append", engine="adbc")` not handling missing tables correctly (#26913)
- Ensure `json_decode` doesn't fail for Date and Time string deserialization (#27554)
- Incorrect RUSTFLAGS passing in Makefile (#27555)
- Fix panic on reading IPC with 0-row compressed bitmap (#27551)
- Set HEAD_RESPONSE_SIZE_ESTIMATE to 0 (#27548)
- Fix lazy horizontal concat not raising on mismatching heights after projection pushdown (#27506)
- Prevent join panic when `suffix=""` and `coalesce=True` (#27376)
- Do not make a `FastCount` for csv if `pre_slice` is set (#27536)
- Support duplicate names in `over` (#27544)
- Reassign sequence numbers when distributing input morsels in streaming AsOf join node (#27538)
- Do not reverse dataframes when sorting with all-null key columns (#27517)
- Incorrect length check on streaming zip (#27505)
- Remove invalid type annotation `Sequence[int]` from `DataFrame.__setitem__` key (#27355)
- Respect `nulls_last` for descending over(order_by) in `group_by().agg()` (#27486)
- Fix perf regression in `scan_csv` `select(len())` when collected on streaming engine (#27504)
- Harden extend strictness (#27476)
- Prevent deadlock when using `to_arrow()` in a multithreaded context (#27472)
- Do not flatten sliced union (#27466)
- Prevent deadlock when using `to_pandas()` in multithreaded context (#27451)
- Struct rechunk bug and add Series::with_validity (#27446)
- Handle column indexing in `read_parquet`/`read_csv` with pyarrow reader (#27397)
- Export enum as ordered dictionary to arrow (#27432)
- Ensure `sample()` respects `shuffle=False` (#27248)
- Return empty `DataFrame` from `concat_list` with `lit` and empty column (#27305)
- Read parquet `MAP` columns without `LogicalType` annotation (#27404)
- Raise `DuplicateError` on parquet files with duplicate column names (#27399)
📖 Documentation
- Document Expr.list.__getitem__ (#27689)
- Add cloudpickle requirement (#27703)
- Clarify from_arrow schema ordering (#27493)
- Fix a typo in `join_asof` docstring (#27682)
- Clarify schema column order (#27681)
- Document horizontal string concatenation (#27542)
- Document all valid `engine` options on LazyFrame collect/sink/explain methods (#27374)
- Orchestration docs check (#27605)
- Drop redundant Pattern 2 from Dagster integration page (#27581)
- Update to remove Dockerhub PAT references (#27582)
- Modernize Dagster integration example for Polars Cloud (#27560)
- Use Polars random seed in sample example (#27537)
- Clarify full join description (#27530)
- Make expressions operations RNG deterministic (#27494)
- Document struct field order (#27492)
- Improve `over`:`order_by` description (#27520)
- Clarify join output columns (#27449)
- Document null propagation in pl.format (#27447)
- Document gzip support in read_csv (#27434)
- Add See Also sections for datetime docstrings (#27316)
- Polars On-Prem release (#27439)
- Rename to Polars On-Prem (#27435)
- Clarify null handling in unique operations (#27431)
- Document `write_ipc` buffer behavior with `file=None` (#27430)
📦 Build system
- Also split debug info in debug-release (#27609)
- Use split-debuginfo on linux (#27608)
- Bump deltalake to 1.5.1 in CI (#27387)
🛠️ Other improvements
- Remove redundant DSL::AGG::Unique (#27718)
- Harden against async blocking deadlocks (#27653)
- Print Python traceback when POLARS_TIMEOUT_MS is exceeded (#27657)
- Format missed in previous PR (#27700)
- Bump pytest and remove codspeed (#27686)
- Remove client-side `allow_local_scans` option for `prepare_cloud_plan` (#27663)
- Remove superfluous test (#27676)
- Cleanup streaming flags (#27671)
- Expose unordered concatenation in python visitor (#27666)
- Bump `deltalake` and fix CI (#27660)
- Add `impl IntoAExprBuilder` for `ExprIR` (#27656)
- Split `_expand_selector_dicts` into multiple functions so return type is simple and accurate (#27618)
- Update object_store patch repo (#27650)
- Match NumPy signature in `DataFrame.__array__` and `Series.__array__` (#27634)
- Add ImageVersion to rust-cache key (#27626)
- Run Pyrefly on tests (#27459)
- Fix is_empty test (#27597)
- Fix tz type difference pandas assert, take 2 (#27596)
- Fix CSPE panic in cloud (#27594)
- Fix tz type difference pandas assert (#27593)
- Add contributing note about conventional comments (#27543)
- Nested common subplan elimination (#27340)
- Deduplicate interns (#27470)
- Fix merge conflict in ColumnarFunction (#27464)
- Keep the schema ordered in scan projection pushdown (#27429)
- Remove unused `type: ignore` statements (#27360)
- Remove redundant `PhysNodeKind::AsOfJoin::{left_right}_by` fields (#27400)
- Resolve type-ignores in `udfs.py` (#27341)
- Bump rustls-webpki (#27382)
Thank you to all our contributors for making this release possible!
@0guban0v, @EndPositive, @JakubValtar, @Jesse-Bakker, @Kevin-Patyk, @Liyixin95, @MarcoGorelli, @NedJWestern, @Shoeboxam, @SuryaSunil1326, @TNieuwdorp, @alexander-beedie, @aryansri05, @ashler-herrick, @azimafroozeh, @carnarez, @coastalwhite, @dependabot[bot], @dsprenkels, @gab23r, @gautamvarmadatla, @ilya-pevzner, @jonathansergio, @junnythemarksman, @kdn36, @lun3x, @nameexhaustion, @orlp, @pablogsal, @ritchie46, @uurl, @waamm, @wence-, @wmoss and @xronocode