pola-rs/polars rs-0.54.4 on GitHub

🏆 Highlights

Add LazyFrame.gather (#27501)
Nested common subplan elimination (#27340)
Stabilize streaming engine (#27497)
Speed up parquet metadata decode with hand-written Thrift (#27427)
Add streaming support for grouped AsOf join (#27293)

🚀 Performance improvements

Eliminate filters with contradictory predicates (#27775)
Update to new jemalloc (#27797)
Do not materialize ScalarColumn in Column split_at (#27782)
Avoid materializing broadcast in array.shift (#27740)
Avoid materializing broadcast list in list.sample(n) and list.sample(frac) (#27679)
Adaptive size dispatch to hashset or radix sort + capacity-aware reset in agg_n_unique (#27719)
Dispatch {list,arr}.{unique,n_unique,reverse} to group_by engine (#27278)
Improve in-memory grouped non-null count (#27702)
Factor shared conjuncts out of OR-of-ANDs predicates (#27627)
Skip downloading IPC batches exceeding slice bounds (#27683)
Faster Series::is_sorted for logical / non-primitive types (#27567)
Avoid materializing broadcast list in list.shift (#27628)
Optimise json_decode Datetime string parsing (#27559)
Speed up to_numpy C-order via cache-blocked transpose (#27522)
Optimize select(len()) for non-strict horizontal concat (#27516)
Pushdown slices to inputs on left/right/full join (#27508)
Don't infer CSV schema if schema is set (#27507)
Nested common subplan elimination (#27340)
Make is_in row-group pruning precise on null-containing haystacks (#27495)
Don't do fused-multiply-add on scalars (#27479)
List full fast path (#27477)
Make is_in row-group pruning precise on multi-value lists (#27475)
Add streaming GatherNode (#27465)
Lower non-elementwise FunctionExprIR to ColumnarFunctionNode (#27462)
Speed up parquet metadata decode with hand-written Thrift (#27427)
Skip validity mask processing in __array_ufunc__ when no inputs have nulls (#27358)
Create IR slice from expr slice pushdown (#27200)
Add streaming support for grouped AsOf join (#27293)
Avoid unnecessary rechunk when sorting already sorted DataFrame (#27264)
Lower basic over() to streaming primitives (#27303)
Lower drop_{nulls,nans} in streaming group_by aggregations (#27296)
Lower entropy to streaming reductions (#27174)
Add native streaming interpolate (#27185)
Streaming strptime with format=None (#27056)
Lower skew / kurtosis to streaming aggregations (#27176)
Post apply pyarrow filter in Polars' engine instead of pyarrow (#27192)
Optimize drop_nulls().{first,last}() to {first,last}(ignore_nulls=True) (#27187)
Always process pyarrow scan in batches (#27183)
Make cut output Enum and mark as elementwise (#27173)
Remove unused expression sorts (#27075)
Use delta stats for mixed hive and non-hive predicate pushdown (#27102)
Take into account size per row in join sampling (#27098)
Streaming is_first_distinct and unique(maintain_order=True) (#27052)
Streaming cov and corr (#27008)
Add sorted unique node to streaming engine (#26990)
Ensure Expr.append is lowered in streaming engine (#27022)
Collapse consecutive Sort nodes (#26965)
Drop maintain_order=True requirement in sink_delta (#27007)
Lower index_of to streaming engine (#26923)
Streaming native backward_fill (#26967)
Native streaming forward_fill (#26922)
Drop unused filter column above cache (#26955)
Optimize .replace() from a single value (#26948)
Add a streaming range-join (#26790)
Lower arg_{min,max} to streaming engine (#26845)
Additional IR slice pushdown after filter pushdown (#26815)
Streaming first/last on Enum through physical (#26783)
Fast filter for scalar predicates (#26745)
Allow SimpleProjection in streaming engine to rename (#26709)
Streaming cloud download for scan_csv (#26637)
Drop columns only needed for predicates after the predicate is applied (#26703)
Run projection pushdown after predicate pushdown (#26688)
Comparison literal downcasting (#26663)
Add dynamic predicates for TopK (#26495)
Increase minimum default parquet row group prefetch to 8 (#26632)
Partial predicate conversion to PyArrow (#26567)
Streaming cloud download for scan_ndjson / scan_lines (#26563)
Grab GIL fewer times during Object join materialization (#26587)
Improve CSV and NDJSON cloud sink performance (#26545)
Tune cloud writer performance (#26518)
Allow parallel InMemorySinks in streaming engine (#26501)
Add streaming AsOf join node (#26398)
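
Note on "Factor shared conjuncts out of OR-of-ANDs predicates (#27627)": the entry above describes rewriting a predicate of the form (A AND B) OR (A AND C) into A AND (B OR C), so that the shared part A can be evaluated or pushed down once. The sketch below illustrates the idea in plain Python on opaque string labels. It is not the Polars optimizer code; the function name and the set-based representation are assumptions made for illustration only.

```python
from functools import reduce


def factor_shared_conjuncts(branches):
    """Split an OR-of-ANDs predicate into (shared, residual_branches).

    `branches` is a list of sets; each set holds the conjuncts of one OR
    branch as opaque labels. The result reads as:
        AND(shared) AND OR(AND(r) for r in residual_branches)
    An empty residual list means the OR part is trivially true.
    """
    if not branches:
        return frozenset(), []
    frozen = [frozenset(b) for b in branches]
    # Conjuncts that appear in every OR branch can be hoisted out.
    shared = reduce(lambda a, b: a & b, frozen)
    residual = [b - shared for b in frozen]
    # If one branch consists only of shared conjuncts, then
    # A OR (A AND B) == A, so nothing remains to be OR-ed.
    if any(len(r) == 0 for r in residual):
        residual = []
    return shared, residual


shared, residual = factor_shared_conjuncts(
    [{"a > 1", "b == 2"}, {"a > 1", "c < 3"}]
)
```

Here `shared` holds the single conjunct `a > 1`, and `residual` holds the two leftover branches `b == 2` and `c < 3`.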

✨ Enhancements

Expose fixed-size rolling window expressions in Python visitor (#27108)
Expose IR::Scan hive parts in the python node visitor (#27829)
Expose IRFunctionExpr::DynamicPred in the python visitor (#27616)
Fix SchemaError using lazy HConcat->Sink (#27770)
Add pinning and queuing logic to polars-ooc (#27791)
Add tiered multi-file parquet metadata resolver (#27720)
Cache and shuffle DNS for cloud object_store (#27659)
Update to new jemalloc (#27797)
Allow deeper expressions (#27768)
Add is_inherently_nondeterministic helper for AExpr (#27687)
Use true division for the / operator in Polars SQL (#27391)
Add Rust backend for Expr.has_nulls (#27590)
Add block_in_place to Polars' async executor (#27612)
Stabilize float16 (#27607)
Add Expr.is_empty (#27583)
Add support for the SQL FILTER clause for aggregate functions, and STRING_AGG (#27564)
Make parquet FileMetadata prunable for IR-plan dispatch (#27535)
Broadcast scalar input for list.slice (#27487)
Add LazyFrame.gather (#27501)
Add null_on_oob in {Expr/Series}.gather (#27327)
Stabilize streaming engine (#27497)
Process batched arr.eval on overflow boundaries (#27496)
Process batched list.eval on overflow boundaries (#27483)
Print SLICED UNION in LazyFrame explain (#27467)
Cargo deny (#27363)
Add maintain_order parameter to merge_sorted (#27263)
Add ignore_nulls to {list,arr}.{any,all} (#27186)
Lock-free memory manager with spill-to-disk and fully OOC multiplexer (#26774)
Add is_unique to list/array dtypes (#27290)
Add pl.merge_sorted operating on multiple frames (#27014)
Add fast_alloc feature flag, remove default_alloc (#27206)
Add a GPU slot to OptFlags so we can control CSE (similar to streaming) (#27026)
Allow group_by() without key exprs (#27141)
Collapse consecutive Sort nodes (#26965)
Use UUIDv7 for sink_iceberg directory name generation (#26958)
Truncate large binary/utf8 Parquet statistics values (#26764)
Error if PartitionBy path provider returns absolute path that does not begin with base path, or contains '..' (#26894)
Support Delta deletion vectors in scan_delta (#26867)
Support Decimal32/64 in scan_parquet (#26941)
Support casting Duration to String in ISO 8601 format (#26860)
Add a streaming range-join (#26790)
Support Expr for holidays in business day calculations (#26193)
Parameter for pivot to always include value column name (#26730)
Raise error in .collect_schema() when arr.get() is out-of-bounds (#26866)
Extend Expr.reinterpret to all numeric types of the same size (#26401)
Add missing_columns parameter to scan_csv (#26787)
Clear no-op scan projections (#26858)
Support nested datatypes for {min,max}_by (#26849)
Support SQL ARRAY init from typed literals (#26622)
Accept table identifier string in scan_iceberg() (#26826)
Add a convenience make fresh command to the Makefile (#26809)
Add unstable LazyFrame.sink_iceberg (#26799)
Add maintain order argument on implode (#26782)
Implement predicate pushdown for aliased groupby keys (#26597)
Speed up casting primitive to bool by at least 2x (#26823)
Enable rowgroup skipping for float columns (#26805)
Add expression context to errors (#26716)
Add Decimal support for product reduction (#26725)
Support all Iceberg V2 arrow types in sink_parquet arrow_schema parameter (#26669)
Re-work behavior of arrow_schema parameter on sink_parquet (#26621)
Add contains_dtype() method for Schema (#26661)
Implement truncate as a "to_zero" rounding mode (#26677)
Expose AExpr::Rolling in the python visitor (#26715)
More generic streaming GroupBy lowering (#26696)
Add basic MemoryManager to track buffered dataframes for out-of-core support later (#26443)
Add truncate Expression for numeric values (#26666)
Better error messages for hex literal conversion issues in the SQL interface (#26657)
Add SQL support for LPAD and RPAD string functions (#26631)
Support SQL "FROM-first" SELECT query syntax (#26598)
Speed up any() and all() for nulls (#26615)
Bump Chrono to 0.4.24, enabling stricter parsing of %.3f/%.6f/%.9f specifiers (#26075)
Expose unstable assert_schema_equal in py-polars (#24869)
Allow parsing of compact ISO 8601 strings (#24629)
Streaming cloud download for scan_ndjson / scan_lines (#26563)
Configuration to cast integers to floats in cast_options for scan_parquet (#26492)
Add escaping to quotes and newlines when reading JSON object into string (#26578)
Standardise on RFC-5545 when doing datetime arithmetic on timezone-aware datetimes (#26425)
Support sas_token in Azure credential provider (#26565)
Expose HConcat options in the python node visitor (#26551)
Relax SQL requirement for derived tables and subqueries to have aliases (#26543)
Add polars-config and pl.Config.reload_env_vars() (#26524)
Record path for object store error raised from sinks (#26541)
Use CRC64NVME for checksum in aws sinks (#26522)
Add get() for binary Series (#26514)
Add streaming AsOf join node (#26398)
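
Note on "Implement truncate as a "to_zero" rounding mode (#26677)": rounding toward zero differs from flooring only for negative values. The plain-Python sketch below shows the distinction. The helper name `truncate` and its `decimals` parameter are hypothetical and are not taken from the Polars API.

```python
import math


def truncate(value: float, decimals: int = 0) -> float:
    """Round `value` toward zero, keeping `decimals` fractional digits.

    Hypothetical helper for illustration; not the Polars implementation.
    """
    scale = 10 ** decimals
    # math.trunc drops the fractional part, i.e. rounds toward zero.
    return math.trunc(value * scale) / scale


positive = truncate(2.7)         # rounds down toward zero
negative = truncate(-2.7)        # rounds up toward zero
floored = math.floor(-2.7)       # flooring moves away from zero instead
```

For positive inputs truncation and flooring agree; for `-2.7` truncation gives `-2.0` while flooring gives `-3`.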

🐞 Bug fixes

Fix skip_batches not handling negation of bool dtype with None values (#27452)
Use block_in_place_on for calls which can come from executor thread (#27855)
Mismatch in max_threads -> pipeline configuration (#27854)
Keep maintain_order on sliced unique (#27852)
Fix SchemaError using lazy HConcat->Sink (#27770)
Fix incorrect projection height when selecting only literals (#27825)
Fix rolling aggregations with window_size=0 (#27812)
Select with expr slice and len gave incorrect len (#27824)
Prevent import panic when environment variable set to unexpected value (#27831)
Broken link to AI Policy corrected (#27793)
Update to new jemalloc (#27797)
Swap PlHashMap for PlIndexMap to make Multiplexer insertion order stable (#27785)
Compare length for inline slice as usize (#27779)
Raise length mismatch in multiple sort_by in group_by (#27772)
Respect min_samples for rolling_by ops with nulls (#27706)
Fix memory usage regression affecting TPCH Q22 (#27758)
Add POLARS_ALLOW_NESTED_CSPE env var and make nested CSPE opt-in (#27765)
Post-apply residual pyarrow predicates (#27764)
Fix loss of precision for smaller floating types (#27662) (#27732)
Filter at scan dropped in CSPE filter pushdown (#27763)
Fix portstate assertion error on is_in (#27757)
Fix incorrect when/then after forward fill / reverse in groupby (#27745)
Accept empty Thrift list encoded as bare 0x00 byte in parquet metadata (#27754)
Stabilize object store credentialprovider cache key (#27712)
Panic in scan of empty IPC with slice (#27708)
Persist object_store rebuild state in cache (#27707)
Sort flag on GroupsType only applies to first element (#27684)
Invalid unwrap_unchecked when length isn't exact (#27685)
Logic error in async executor block_in_place (#27698)
Don't unwrap channel send in streaming join_asof (#27688)
Fix merge_sorted panic when List in frame (#27568)
Put AsOf join buffered Morsels back at the front of the deque if we cannot process them right now (#27658)
Fix FixedRingBuffer allocation provenance (#27669)
Fix skip_batches logic for NaN (#27673)
Raise TypeError when calling next() directly on GroupBy objects (#27562)
Data type comparison for extension types (#27632)
Fix filter_scan_ir usize integer underflow (#27633)
Share last-morsel split budget across files in streaming multi-scan (#27630)
Reset the sort-options in Series::is_sorted() after row-encoding columns (#27614)
Rayon deadlock with re-entrant io sources (#27600)
Don't push negative-offset slices through HConcat (#27570)
Logic error in streaming is_empty (#27602)
Fix incorrect CSE with large is_in literal (#27575)
AnonymousFunction can qualify as SQL aggregator (#26986)
Fix CSPE panic in cloud (#27594)
Set merge-join streaming node to Finished if its sending port is Done (#27572)
Widen decimal precision on sum aggregation at runtime (#27579)
Fix str.to_time raising unnecessarily when input is all nulls (#27574)
Prevent panic when switching from one extension dtype to another (#27566)
Ensure json_decode doesn't fail for Date and Time string deserialization (#27554)
Incorrect RUSTFLAGS passing in Makefile (#27555)
Avoid panic on open-ended slice (#27550)
Fix panic on reading IPC with 0-row compressed bitmap (#27551)
Set HEAD_RESPONSE_SIZE_ESTIMATE to 0 (#27548)
Fix lazy horizontal concat not raising on mismatching heights after projection pushdown (#27506)
Prevent join panic when suffix="" and coalesce=True (#27376)
Do not make a FastCount for csv if pre_slice is set (#27536)
Support duplicate names in over (#27544)
Reassign sequence numbers when distributing input morsels in streaming AsOf join node (#27538)
Do not reverse dataframes when sorting with all-null key columns (#27517)
Incorrect length check on streaming zip (#27505)
Respect nulls_last for descending over(order_by) in group_by().agg() (#27486)
Fix perf regression in scan_csv select(len()) when collected on streaming engine (#27504)
Harden extend strictness (#27476)
Prevent deadlock when using to_arrow() in a multithreaded context (#27472)
Rebalance deep merge_sorted chains (#27065)
Do not flatten sliced union (#27466)
Prevent deadlock when using to_pandas() in multithreaded context (#27451)
Struct rechunk bug and add Series::with_validity (#27446)
Handle column indexing in read_parquet/read_csv with pyarrow reader (#27397)
Export enum as ordered dictionary to arrow (#27432)
Ensure index column is sorted in streaming rolling aggs (#27234)
Ensure sample() respects shuffle=False (#27248)
Return empty DataFrame from concat_list with lit and empty column (#27305)
Read parquet MAP columns without LogicalType annotation (#27404)
Raise DuplicateError on parquet files with duplicate column names (#27399)
Honor having predicate in GroupBy iter (#27370)
Use the physical dtype for NumUnorderedImplodeReducer arrow ListArray (#27375)
Address bug in reduce_balanced for certain input length lists affecting pl.concat (#27352)
Ensure list.sample() allows fraction > 1 when with_replacement=True (#27350)
Ensure append() errors when upcast=False (#27346)
Always rechunk sorts, prune sorts even in eager execution (#27356)
Update groups to correct length for Implode (#27282)
Fix scan_csv missing_columns='insert' overwriting existing data with NULLs (#27297)
Raise on non-numeric inputs in pl.int_ranges (#27294)
Do not skip nulls when enumerating over rows in grouped AsOf join (#27275)
Fix pivot dropping data for null on values (#27273)
Resolve multiple files deadlock in CSV async reader (#27073)
Widen decimal precision on sum aggregation (#27270)
Correct lf.remote type (#27261)
Extend StructEval schema context in StackOptimizer (#27243)
Prevent panic when casting Array to extension type with same inner type (#27220)
Preserve nulls when casting from all-null Series to Struct (#27241)
Off-by-one in lp.with_inputs length assertion (#27209)
Fix scan_delta filter on empty dataframe (#27244)
Prevent DataFrame creation panic on list[struct] with heterogenous types (#27217)
Skip null group entries when collecting AsOf-by groups (#27215)
Fix panic with empty order_by in over expression (#27088)
Write field ID from sink_parquet (#27196)
Fix statistics for Null columns in Parquet (#27021)
Do not prune sort nodes containing slice with dyn predicate (#27140)
Correct grouped Binary arg_min/arg_max and String single-element arg indices (#27172)
Fix scalar handling in str.replace during streaming (#27182)
Resolve multiple files deadlock in NDJSON async reader (#27204)
Overflow panic in interpolate nearest (#27205)
Using checked arithmetic in int96_to_i64_ns to prevent overflow panic (#27129)
Don't trigger csv fast count if predicate is pushed down (#27190)
Streaming sort by-expressions were lowered incorrectly (#27158)
Reset IO metrics instead of consuming (#27156)
Output SVG if output_path ends with '.svg' in show_graph (#27144)
Skip extension types for min/max in describe (#27120)
Fix incorrect IO metrics on multi-phase streaming execution (#27123)
Use delta stats for mixed hive and non-hive predicate pushdown (#27102)
Make the files used in docs available locally (#27121)
Apply scalar bound in clip when the Series bound contains nulls (#27087)
Ignore ddof parameter in rolling_corr and deprecate (#27104)
Preserve casts for horizontal ops with untyped literals (#27011)
Reject invalid input to sql_expr (#27084)
Ensure SQL COUNT(<lit>) expressions return the correct value (#27085)
Regression in replace_strict for enums (#27066)
Make test_group_by_arg_max_boolean_26978 non-flaky for max_by ties (#27048)
Null count for aggregated list inside count aggregation (#27032)
Panic in streaming MergeSortedNode (#27024)
Prevent panic in transpose() with mixed List and non-List columns (#27038)
Set sorted flag for Boolean and Time (#27035)
Missing src/ subdirectory to CI Python docs step (#27025)
Resolve stack overflow on merge_sorted and union (#27018)
Make pl.DataFrame.fill_null work on columns with Null dtype (#27020)
Fix initial MutableBooleanArray::extend_constant(count, None) calls (#26813)
Fix repeated word typos in comments (#26917)
Covariance with constant is zero, not NaN (#27015)
Don't remove set_sorted in projection pushdown (#27006)
Infer nulls when creating a DataFrame from an empty struct (#26991)
Correct suggestion in multi-expr filter error (#27003)
Implement agg_arg_min/agg_arg_max for boolean data type (#26997)
Raise error instead of panic for unsupported pivot aggregate (#26863)
Validate fraction is between 0.0 and 1.0 in list.sample (#26964)
Informative error for multi-quantile in group_by (#26957)
Raise for duplicate columns in over() (#26968)
Preserve height when unnesting empty struct columns (#26947)
Support Decimal32/64 in scan_parquet (#26941)
Follow-up on streaming range-join PR (#26944)
Fix ColumnNotFound due to projection between filter/cache in CSPE (#26946)
Fix panic on upsample() with group_by parameter on empty DataFrame (#26936)
Fix the loop bounds in BitmapBuilder::extend_each_repeated_from_slice_unchecked (#26928)
Default engine as streaming for collect_batches (#26932)
Set stricter maintain_order in test_schema_row_index_cse (#26931)
Fix error passing Series of dates to business functions (#26927)
Propagate null in min_by / max_by for all-null by groups (#26919)
Fix panic on lazy concat->filter->slice with CSPE (#26907)
Handle empty rolling windows in streaming engine (#26903)
Prevent Boolean arithmetic with integer literals producing Unknown type in streaming engine (#26878)
Fix corrupted slashes when sinking to partitioned S3 from Windows (#26889)
Remove outdated warning about List columns in unique() (#26295) (#26890)
Restore pyarrow predicate conversion for is_in (#26811)
Release GIL before df.to_ndarray() to avoid deadlock (#26832)
Fix panic on CSV count_rows with FORCE_ASYNC (#26883)
Add scalar comparisons for UInt128 series (#26886)
Fix shape error not raised for 0 width inputs with non-0 height for streaming horizontal concat (#26877)
Fix streaming zip-broadcast node not raising shape mismatch on empty recv from ready port (#26871)
Fix incorrect output list.eval with scalar expr, fix panic on list.agg with nulls (#26868)
Incorrect arg_sort with descending+limit (#26839)
Raise error in .collect_schema() when arr.get() is out-of-bounds (#26866)
Return ComputeError instead of panicking in map_groups UDF (#26665)
Issue PerformanceWarning in LazyFrame.__contains__ (#26734)
Segfault in JoinExec on deep plan (#26796)
Fix unary expressions on literal in over context (#26827)
Fix {min,max}_by in streaming engine for Boolean full {min,max} value column (#26848)
Fix debug panic on clip with nan bound (#26854)
Support grouped {arg_,}{min,max} for Categoricals (#26856)
Throw an error if a string is passed to LazyFrame.pivot on_columns (#26852)
Preserve input float precision in rolling_cov() and rolling_corr() with mixed input types (#26820)
Preserve row count when converting zero-column DataFrame via arrow PyCapsule interface (#26835)
Prevent infinite recursion in streaming group_by fallback (#26801)
Use RowEncodingContext::Struct when determining D::Struct encoded item len (#26817)
Incorrectly applied CSE on different map_batches functions (#26822)
Fix duplicated query execution on todo panic when combining collect(engine='streaming') with POLARS_AUTO_NEW_STREAMING (#26792)
Prevent predicate pushdown across Sort with baked-in slice (#26804)
Fix panic on lazy sink_parquet created in pipe_with_schema (#26784)
Support {column_name} and {index} placeholders in pl.format string (#26771)
Do not use merge-join if nulls_last is unknown (#26778)
Normalize float zeros in Parquet column statistics (#26776)
Fix out-of-bounds for positive offset in windowed rolling (#26724)
Raise error when .get() is out-of-bounds in group by context (#26752)
Boolean bitwise_xor aggregation inverted when column contains nulls (#26749)
Parameter nulls_last was ignored in over (#26718)
Allow missing time in inexact strptime (#26714)
Return NaN when using corr() with a literal and expr (#26697)
Allow strict horizontal concat with empty df (#26345)
Fix PoisonError panic caused by reentrant usage of file cache (#26627)
Return null for int values exceeding 128-bit range with strict=False (#26674)
Incorrect boolean min/max with nulls (#26671)
Slice-slice pushdown for n_rows (#26673)
Resolve panic in Enum struct slicing (#26643)
Fix CSPE for group_by.map_groups (#26640)
Remove non-existent parameter from SQLContext typing overloads (#26658)
Replace panic with error when sorting object dtype columns (#26601)
Fix to_pandas() on empty enum Series not preserving enum dictionary (#26610)
Rounding behaviour for f32 values with "HalfAwayFromZero" mode (#26624)
Correct arg_(min|max) for scalar columns (#26609)
Use monkeypatch.chdir in test_sink_path_slicing_utf8_boundaries_26324 (#26616)
Materialize unknown scalar int/float literals in collect_dtype() (#26595)
Return error when by= is nested type in min_by / max_by (#26593)
Fix assert_frame_not_equal() not raising on dtype mismatch (#26590)
Respect SQL semantics for cumulative functions mapped via OVER clause (#26570)
Fix incorrect multiplexer output ordering on source token stop request (#26561)
Fix PyIceberg filter on boolean column (#26550)
Fix *_range exprs incorrectly marked as row separable (#26549)
Set dictionary_page_offset when dictionary encoding is used and point data_page_offset to the first data page (#26542)
Prevent GPU engine panic on SinkMultiple nodes (#26537)
Move query parameters to request body when retrieving Unity Catalog temporary credentials (#26539)
Implement PhysicalExpr for MinBy/MaxBy nodes (#26506)
Refactor row-encoding logic in IR join lowering into separate function (#26512)
Correctly check for path extensions (#26513)
Change AsOf join to be based on TotalOrd (#26497)
Correctly raise error on failing nested strict casts (#26499)
Prevent invalid type casts in replace_strict() (#26453)
Return null when dividing literals by 0 (#26343)
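
Note on "Covariance with constant is zero, not NaN (#27015)" and "Return NaN when using corr() with a literal and expr (#26697)": the two results follow directly from the textbook definitions. Every deviation of a constant from its mean is zero, so the covariance sum is zero, while the correlation divides by a zero standard deviation and is therefore undefined. The sketch below reproduces this arithmetic in plain Python. The function names are illustrative and this is not the Polars implementation.

```python
import math


def sample_cov(xs, ys, ddof=1):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("inputs must have the same length")
    if n - ddof <= 0:
        return float("nan")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - ddof)


def pearson_corr(xs, ys):
    """Pearson correlation; NaN when either input has zero variance."""
    std_x = math.sqrt(sample_cov(xs, xs))
    std_y = math.sqrt(sample_cov(ys, ys))
    if std_x == 0.0 or std_y == 0.0:
        return float("nan")
    return sample_cov(xs, ys) / (std_x * std_y)


values = [1.0, 2.0, 3.0]
constant = [5.0, 5.0, 5.0]
cov_with_constant = sample_cov(values, constant)     # every term is zero
corr_with_constant = pearson_corr(values, constant)  # undefined, so NaN
```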

📖 Documentation

Bump to patched version (#27851)
Replace Typeform sign-up URL with new enterprise link (#27838)
Correct wrong head call (#27848)
Add Polars On-Prem 0.5.0 release (#27849)
Correct onprem license helm values (#27847)
Update connecting Polars Cloud to AWS documentation (#27823)
Broken link to AI Policy corrected (#27793)
Add release dates to the On-Prem releases page (#27787)
Improve on-prem docs (#27788)
Add query profiler video to On-Prem user guide (#27786)
Add EKS/AKS/GKE guides (#27774)
Sync from Polars Cloud (#27751)
Document Expr.list.__getitem__ (#27689)
Add cloudpickle requirement (#27703)
Clarify from_arrow schema ordering (#27493)
Clarify schema column order (#27681)
Update DataFrame construction docs for Column (#27541)
Document all valid engine options on LazyFrame collect/sink/explain methods (#27374)
Drop redundant Pattern 2 from Dagster integration page (#27581)
Update to remove Dockerhub PAT references (#27582)
Modernize Dagster integration example for Polars Cloud (#27560)
Use Polars random seed in sample example (#27537)
Make expressions operations RNG deterministic (#27494)
Document struct field order (#27492)
Add See Also sections for datetime docstrings (#27316)
Polars On-Prem release (#27439)
Rename to Polars On-Prem (#27435)
Split out openlineage docs into guide and configuration (#27371)
Add explanation on the observatory sqlite db file (#27354)
Add documentation for openlineage on-premises (#27334)
Release page (#27335)
Update uv pip install polars-on-premises cmd (#27330)
Fix outdated LazyGroupBy.map_groups docstring (#27292)
Add deny_anonymous_users to scheduler config (#27287)
Slurm documentation (#27259)
Add link to concepts in index.md (#27077)
Add docs entry for merge_sorted (#27224)
Fix typo (#27212)
Make the files used in docs available locally (#27121)
Put first-time contribution requirements in its own linkable section (#27113)
Change Polars Cloud API to 0.6.0 (#27005)
Query Profiler addition to User Guide (#26623)
Add documentation for on_columns for LazyFrame pivot (#26859)
Mention ComputeContexts create ephemeral environments by default and hint at re-use (#26692)
Remove confusing join validation note (#26795)
Fix broken AI policy link (#26728)
Create Polars Cloud Glossary (#26690)
Additional SQL documentation (#26662)
Include invalidate_caches in bisect instructions (#26641)
Add git bisect guide to contributing docs (#26634)
Updated Airflow orchestration documentation (#26585)
Improve SQL docs for EXTRACT and DATE_PART functions (#26575)
Remove reference to MutableStructArray in module doc (#26557)
Fix docstring for bitwise_count_zeros method (#26519)
Add get() for binary Series (#26514)

📦 Build system

Also split debug info in debug-release (#27609)
Use split-debuginfo on linux (#27608)
Bump deltalake to 1.5.1 in CI (#27387)
Really do not install pyiceberg-core 0.9.0 (#27017)
Bump up numpy and pyo3 to 0.28 (#26743)

🛠️ Other improvements

Add statistics to spill contexts (#27859)
Include license file in polars-ooc crate (#27864)
Changes needed for Rust 0.54.x (#27853)
Use Vec instead of PlHashMap for ProjectionInfo.map (#27856)
Reduce codegen-units (#27835)
Deduplicate thrift field-walk loops (#27790)
Harden against async blocking deadlocks (take 2) (#27767)
Added jlumbroso/free-disk-space cleaning action where relevant (#27769)
Update runtime edition to 2024 (#27746)
Remove redundant DSL::AGG::Unique (#27718)
Harden against async blocking deadlocks (#27653)
Print Python traceback when POLARS_TIMEOUT_MS is exceeded (#27657)
Remove last global static mut (#27704)
Remove unused equal_element code (#27701)
Remove unused suspect AsRef impl (#27699)
Remove Box<dyn Iterator> IntoIterator for ChunkedArray (#27697)
Remove trailing semicolons in fmt macros (#27705)
Add dynamic slice to unoptimized dispatch (#27693)
Format missed in previous PR (#27700)
Bump pytest and remove codspeed (#27686)
Store record batch row counts in custom Polars IPC metadata field (#27549)
Remove client-side allow_local_scans option for prepare_cloud_plan (#27663)
Remove superfluous test (#27676)
Cleanup streaming flags (#27671)
Expose unordered concatenation in python visitor (#27666)
Bump deltalake and fix CI (#27660)
Add impl IntoAExprBuilder for ExprIR (#27656)
Update object_store patch repo (#27650)
Bump up thiserror (#27648)
Move async executor and primitives to polars-async (#27629)
Add ImageVersion to rust-cache key (#27626)
Rename POOL to RAYON (#27606)
Use first_non_null for strptime infer (#27577)
Add arg mapper to unoptimized dispatch (#27599)
Fix is_empty test (#27597)
Fix tz type difference pandas assert, take 2 (#27596)
Fix CSPE panic in cloud (#27594)
Fix tz type difference pandas assert (#27593)
Add contributing note about conventional comments (#27543)
Add AnonymousColumnsUdf to UnoptimizedOperation (#27513)
Move Quantile to FunctionIRExpr (#27498)
Nested common subplan elimination (#27340)
Remove old projection pushdown code (#27499)
Refactored projection pushdown with cache handling (#27422)
Refactor CSPE (#27425)
Deduplicate interns (#27470)
Fix merge conflict in ColumnarFunction (#27464)
Schema per port for PhysNode (#27302)
Keep the schema ordered in scan projection pushdown (#27429)
Remove redundant PhysNodeKind::AsOfJoin::{left_right}_by fields (#27400)
Bump apache-avro version (#27419)
Bump rustls-webpki (#27382)
Disable debug symbols in macos coverage tests (#27361)
Cargo deny (#27363)
Add generic tree traversal with edge value propagation (#27249)
Bump Python Polars version (#27315)
Utility for identifying expr projection heights (#27198)
Sink DSL and callback for Iceberg (#27258)
Wait for morsel consumption in merge_sorted streaming node (#27288)
Mark scan_ipc cache arguments as deprecated (#27216)
Consolidate reordered compare functions (#27229)
Add zip_eq to itertools (#27210)
Remove unused attributes (#27191)
Avoid unnecessary recompilation due to changing env vars (#27166)
Update nightly Rust compiler version (#27145)
Simplify pyarrow scan and process in batches (#26982)
Make internal typing more precise (part ii) (#27117)
Remove unused expression sorts (#27075)
Add memory usage tracking to global allocator (#27103)
Add sinked paths callback (#26995)
Pin maturin due to compile time regression (#27062)
Missing src/ subdirectory to CI Python docs step (#27025)
Really do not install pyiceberg-core 0.9.0 (#27017)
Naming for named scopes (#26999)
Enable hypothesis tests when POLARS_AUTO_NEW_STREAMING=1 (#26818)
Fix CI by excluding missing wheel version of pyiceberg (#27001)
Replace clippy::never_loop with break on named scopes (#26983)
Remove indirection in calling python scans (#26981)
Polars versions (#26980)
Polars version (#26971)
Set stricter maintain_order in test_schema_row_index_cse (#26931)
Bump build deps used in ARM64 Windows release pipeline (#26892)
Use large linux-arm runner for release (#26898)
Ensure .gitignore and .typos.toml exclude "_polars_runtime*" directories (#26842)
Additional IR slice pushdown after filter pushdown (#26815)
Add private _expand_paths scan function (#26798)
Change Expr sortedness container to AExprSorted and add nulls_last to PyExpr.set_sorted() (#26781)
Move stop_and_buffer_pipe_contents into joins/utils.rs (#26810)
Replace iejoin is_supported_type macro with a closure in predicate_pushdown/join.rs (#26812)
Fix first-time contributor auto-label (#26794)
Move Series arrow export code from into.rs to arrow_export (#26775)
Automatically add first-contribution label (#26780)
Make contributing policy more strict (#26772)
Add unused argument warning to ruff rules (#26720)
Move shared streaming CSV/NDJSON code into shared mod (#26742)
Undo pub removal of to_dyn_object_store (#26722)
Remove unused proptest.rs DataFrame file (#26676)
Add test for predicate before join (#26705)
Fix file cache debug assertion failure (#26695)
Put physical_plan join formatting code into a separate function (#26691)
Remove PlanCallback from sql (#26686)
Add dtype visitor (#26628)
Bump Rust nightly compiler version (#26379)
Remove unused problematic ArrayFromIter (#26639)
Move more boolean code to polars_compute, reusing kernels (#26636)
Move ? to assignment site and use extend() in StructEvalExpr (#26635)
Cleanup assert_schema_equal (#26596)
Replace some env var reading by polars-config (#26607)
Use monkeypatch.chdir in test_sink_path_slicing_utf8_boundaries_26324 (#26616)
Remove string allocation from polars_err!(Variant: "str") (#26579)
Add wrapper for clippy so it continues on warnings (#26527)
Add Buffer::split_at / Buffer::split_off (#26583)
Use LazyFrame.clear to clear sql (#26562)
Update docs (#26560)
Add backtrace coloring (#26544)
Evaluate sql process_except_intersect during IR (#26516)
Reformat LICENSE (#26532)
Add a pipeline in which we test with POLARS_IDEAL_MORSEL_SIZE=4 (#26420)
Remove test_file and have tests create test.parquet in tmp_path (#26525)
Refactor row-encoding logic in IR join lowering into separate function (#26512)
Fix mypy pyiceberg expression errors (#26523)
Make nix flake mostly work (#26517)
Switch to custom cloud writer with IO sink metrics (#26494)
Remove Default on DataType (#26511)
Propagate object-store error information (#26406)
Have parameterized series rechunk() if not allow_chunks (#26504)
Remove dead code (RevMapping) (#26508)
Rename Arena get_many_mut to get_disjoint_mut (#26491)

Thank you to all our contributors for making this release possible!
@0guban0v, @0xRozier, @BJohnBraddock, @BitWeaverDev, @ButteryPaws, @EndPositive, @HCYT, @JakubValtar, @Jesse-Bakker, @Kevin-Patyk, @Liyixin95, @MarcoGorelli, @Matt711, @NathanHu725, @NedJWestern, @NeejWeej, @NicoOhR, @RedZapdos123, @RenzoMXD, @Shoeboxam, @SuryaSunil1326, @TNieuwdorp, @Voultapher, @WaffleLapkin, @abhidotsh, @abishop1990, @alexander-beedie, @andyjessen, @ankane, @aryansri05, @ashler-herrick, @azimafroozeh, @borchero, @boris324, @cBournhonesque, @carnarez, @coastalwhite, @daizutabi, @debnathshoham, @dependabot[bot], @dpinol, @dsprenkels, @dydev012, @erandagan, @etiennebacher, @farouk-01, @florianvazelle, @gab23r, @gautamvarmadatla, @henryharbeck, @hutch3232, @ilya-pevzner, @itamarst, @jberg5, @joaquinhuigomez, @johalnes, @jonasdedden, @jonathanchang31, @jonathansergio, @jorenham, @junnythemarksman, @kanenorman, @kdn36, @leudz, @lukas-reining, @lun3x, @moktamd, @mqqz, @mroeschke, @mzjp2, @nameexhaustion, @nicholaslegrand102, @ohmdelta, @orlp, @pablogsal, @pragun-ananda, @qxzcode, @ritchie46, @spock-yh, @stakeswky, @tmimmanuel, @tolleybot, @toreerdmann, @toroleapinc, @uurl, @veeceey, @waamm, @wence-, @wmoss, @xenzh, @xronocode, @yangsong97, @yonatan-genai and @yuuuxt
