github pola-rs/polars rs-0.54.4
Rust Polars 0.54.4

🏆 Highlights

  • Add LazyFrame.gather (#27501)
  • Nested common subplan elimination (#27340)
  • Stabilize streaming engine (#27497)
  • Speed up parquet metadata decode with hand-written Thrift (#27427)
  • Add streaming support for grouped AsOf join (#27293)
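
As a rough illustration of what common subplan elimination looks for, the sketch below counts repeated subtrees in a toy query plan. The nested-tuple plan encoding and the `count_subplans` helper are hypothetical and made up for this example; they are not Polars' IR or optimizer code.

```python
from collections import Counter

def count_subplans(plan, counts=None):
    """Count how often each subtree occurs in a plan given as nested tuples."""
    if counts is None:
        counts = Counter()
    counts[plan] += 1
    _op, *args = plan
    for arg in args:
        # Only tuples are child plans; strings are plain operator arguments.
        if isinstance(arg, tuple):
            count_subplans(arg, counts)
    return counts

# Toy plan: both join inputs read from the same filtered scan.
scan = ("scan", "data.parquet")
filtered = ("filter", "x > 0", scan)
plan = ("join", ("select", "a", filtered), ("select", "b", filtered))

# Subplans that occur more than once are candidates for computing once and reusing.
shared = [p for p, n in count_subplans(plan).items() if n > 1]
```

Here both `filtered` and the `scan` nested inside it show up as shared; an optimizer would typically cache the largest shared subtree rather than each nested piece separately.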

🚀 Performance improvements

  • Eliminate filters with contradictory predicates (#27775)
  • Update to new jemalloc (#27797)
  • Do not materialize ScalarColumn in Column split_at (#27782)
  • Avoid materializing broadcast in array.shift (#27740)
  • Avoid materializing broadcast list in list.sample(n) and list.sample(frac) (#27679)
  • Adaptive size dispatch to hashset or radix sort + capacity-aware reset in agg_n_unique (#27719)
  • Dispatch {list,arr}.{unique,n_unique,reverse} to group_by engine (#27278)
  • Improve in-memory grouped non-null count (#27702)
  • Factor shared conjuncts out of OR-of-ANDs predicates (#27627)
  • Skip downloading IPC batches exceeding slice bounds (#27683)
  • Faster Series::is_sorted for logical / non-primitive types (#27567)
  • Avoid materializing broadcast list in list.shift (#27628)
  • Optimise json_decode Datetime string parsing (#27559)
  • Speed up to_numpy C-order via cache-blocked transpose (#27522)
  • Optimize select(len()) for non-strict horizontal concat (#27516)
  • Pushdown slices to inputs on left/right/full join (#27508)
  • Don't infer CSV schema if schema is set (#27507)
  • Nested common subplan elimination (#27340)
  • Make is_in row-group pruning precise on null-containing haystacks (#27495)
  • Don't do fused-multiply-add on scalars (#27479)
  • List full fast path (#27477)
  • Make is_in row-group pruning precise on multi-value lists (#27475)
  • Add streaming GatherNode (#27465)
  • Lower non-elementwise FunctionExprIR to ColumnarFunctionNode (#27462)
  • Speed up parquet metadata decode with hand-written Thrift (#27427)
  • Skip validity mask processing in __array_ufunc__ when no inputs have nulls (#27358)
  • Create IR slice from expr slice pushdown (#27200)
  • Add streaming support for grouped AsOf join (#27293)
  • Avoid unnecessary rechunk when sorting already sorted DataFrame (#27264)
  • Lower basic over() to streaming primitives (#27303)
  • Lower drop_{nulls,nans} in streaming group_by aggregations (#27296)
  • Lower entropy to streaming reductions (#27174)
  • Add native streaming interpolate (#27185)
  • Streaming strptime with format=None (#27056)
  • Lower skew / kurtosis to streaming aggregations (#27176)
  • Post apply pyarrow filter in Polars' engine instead of pyarrow (#27192)
  • Optimize drop_nulls().{first,last}() to {first,last}(ignore_nulls=True) (#27187)
  • Always process pyarrow scan in batches (#27183)
  • Make cut output Enum and mark as elementwise (#27173)
  • Remove unused expression sorts (#27075)
  • Use delta stats for mixed hive and non-hive predicate pushdown (#27102)
  • Take into account size per row in join sampling (#27098)
  • Streaming is_first_distinct and unique(maintain_order=True) (#27052)
  • Streaming cov and corr (#27008)
  • Add sorted unique node to streaming engine (#26990)
  • Ensure Expr.append is lowered in streaming engine (#27022)
  • Collapse consecutive Sort nodes (#26965)
  • Drop maintain_order=True requirement in sink_delta (#27007)
  • Lower index_of to streaming engine (#26923)
  • Streaming native backward_fill (#26967)
  • Native streaming forward_fill (#26922)
  • Drop unused filter column above cache (#26955)
  • Optimize .replace() from a single value (#26948)
  • Add a streaming range-join (#26790)
  • Lower arg_{min,max} to streaming engine (#26845)
  • Additional IR slice pushdown after filter pushdown (#26815)
  • Streaming first/last on Enum through physical (#26783)
  • Fast filter for scalar predicates (#26745)
  • Allow SimpleProjection in streaming engine to rename (#26709)
  • Streaming cloud download for scan_csv (#26637)
  • Drop columns only needed for predicates after the predicate is applied (#26703)
  • Run projection pushdown after predicate pushdown (#26688)
  • Comparison literal downcasting (#26663)
  • Add dynamic predicates for TopK (#26495)
  • Increase minimum default parquet row group prefetch to 8 (#26632)
  • Partial predicate conversion to PyArrow (#26567)
  • Streaming cloud download for scan_ndjson / scan_lines (#26563)
  • Grab GIL fewer times during Object join materialization (#26587)
  • Improve CSV and NDJSON cloud sink performance (#26545)
  • Tune cloud writer performance (#26518)
  • Allow parallel InMemorySinks in streaming engine (#26501)
  • Add streaming AsOf join node (#26398)
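
The "Factor shared conjuncts out of OR-of-ANDs predicates" entry refers to the boolean identity (a AND b) OR (a AND c) = a AND (b OR c). The sketch below shows that rewrite on named boolean terms; the set-based representation and helper names are assumptions for illustration only, not how Polars represents predicates.

```python
def factor_shared_conjuncts(branches):
    """Split an OR-of-ANDs into (shared terms, per-branch remainders)."""
    sets = [frozenset(branch) for branch in branches]
    shared = frozenset.intersection(*sets)
    remainders = [s - shared for s in sets]
    # An empty remainder means that branch equals `shared` alone,
    # so the OR over the remainders is always true and can be dropped.
    if any(not r for r in remainders):
        remainders = []
    return shared, remainders

def eval_original(branches, env):
    return any(all(env[name] for name in branch) for branch in branches)

def eval_factored(shared, remainders, env):
    shared_ok = all(env[name] for name in shared)
    rest_ok = (not remainders) or any(all(env[n] for n in r) for r in remainders)
    return shared_ok and rest_ok

branches = [("a", "b"), ("a", "c")]
shared, remainders = factor_shared_conjuncts(branches)
```

Once factored, the shared term `a` can be checked once (or pushed down on its own) instead of being evaluated in every branch.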

✨ Enhancements

  • Expose fixed-size rolling window expressions in Python visitor (#27108)
  • Expose IR::Scan hive parts in the python node visitor (#27829)
  • Expose IRFunctionExpr::DynamicPred in the python visitor (#27616)
  • Fix SchemaError using lazy HConcat->Sink (#27770)
  • Add pinning and queuing logic to polars-ooc (#27791)
  • Add tiered multi-file parquet metadata resolver (#27720)
  • Cache and shuffle DNS for cloud object_store (#27659)
  • Update to new jemalloc (#27797)
  • Allow deeper expressions (#27768)
  • Add is_inherently_nondeterministic helper for AExpr (#27687)
  • Use true division for the / operator in Polars SQL (#27391)
  • Add Rust backend for Expr.has_nulls (#27590)
  • Add block_in_place to Polars' async executor (#27612)
  • Stabilize float16 (#27607)
  • Add Expr.is_empty (#27583)
  • Add support for the SQL FILTER clause for aggregate functions, and STRING_AGG (#27564)
  • Make parquet FileMetadata prunable for IR-plan dispatch (#27535)
  • Broadcast scalar input for list.slice (#27487)
  • Add LazyFrame.gather (#27501)
  • Add null_on_oob in {Expr/Series}.gather (#27327)
  • Stabilize streaming engine (#27497)
  • Process batched arr.eval on overflow boundaries (#27496)
  • Process batched list.eval on overflow boundaries (#27483)
  • Print SLICED UNION in LazyFrame explain (#27467)
  • Cargo deny (#27363)
  • Add maintain_order parameter to merge_sorted (#27263)
  • Add ignore_nulls to {list,arr}.{any,all} (#27186)
  • Lock-free memory manager with spill-to-disk and fully OOC multiplexer (#26774)
  • Add is_unique to list/array dtypes (#27290)
  • Add pl.merge_sorted operating on multiple frames (#27014)
  • Add fast_alloc feature flag, remove default_alloc (#27206)
  • Add a GPU slot to OptFlags so we can control CSE (similar to streaming) (#27026)
  • Allow group_by() without key exprs (#27141)
  • Collapse consecutive Sort nodes (#26965)
  • Use UUIDv7 for sink_iceberg directory name generation (#26958)
  • Truncate large binary/utf8 Parquet statistics values (#26764)
  • Error if PartitionBy path provider returns absolute path that does not begin with base path, or contains '..' (#26894)
  • Support Delta deletion vectors in scan_delta (#26867)
  • Support Decimal32/64 in scan_parquet (#26941)
  • Support casting Duration to String in ISO 8601 format (#26860)
  • Add a streaming range-join (#26790)
  • Support Expr for holidays in business day calculations (#26193)
  • Parameter for pivot to always include value column name (#26730)
  • Raise error in .collect_schema() when arr.get() is out-of-bounds (#26866)
  • Extend Expr.reinterpret to all numeric types of the same size (#26401)
  • Add missing_columns parameter to scan_csv (#26787)
  • Clear no-op scan projections (#26858)
  • Support nested datatypes for {min,max}_by (#26849)
  • Support SQL ARRAY init from typed literals (#26622)
  • Accept table identifier string in scan_iceberg() (#26826)
  • Add a convenience make fresh command to the Makefile (#26809)
  • Add unstable LazyFrame.sink_iceberg (#26799)
  • Add maintain_order argument to implode (#26782)
  • Implement predicate pushdown for aliased groupby keys (#26597)
  • Speed up casting primitive to bool by at least 2x (#26823)
  • Enable rowgroup skipping for float columns (#26805)
  • Add expression context to errors (#26716)
  • Add Decimal support for product reduction (#26725)
  • Support all Iceberg V2 arrow types in sink_parquet arrow_schema parameter (#26669)
  • Re-work behavior of arrow_schema parameter on sink_parquet (#26621)
  • Add contains_dtype() method for Schema (#26661)
  • Implement truncate as a "to_zero" rounding mode (#26677)
  • Expose AExpr::Rolling in the python visitor (#26715)
  • More generic streaming GroupBy lowering (#26696)
  • Add basic MemoryManager to track buffered dataframes for out-of-core support later (#26443)
  • Add truncate Expression for numeric values (#26666)
  • Better error messages for hex literal conversion issues in the SQL interface (#26657)
  • Add SQL support for LPAD and RPAD string functions (#26631)
  • Support SQL "FROM-first" SELECT query syntax (#26598)
  • Speed up any() and all() for nulls (#26615)
  • Bump Chrono to 0.4.24, enabling stricter parsing of %.3f/%.6f/%.9f specifiers (#26075)
  • Expose unstable assert_schema_equal in py-polars (#24869)
  • Allow parsing of compact ISO 8601 strings (#24629)
  • Streaming cloud download for scan_ndjson / scan_lines (#26563)
  • Configuration to cast integers to floats in cast_options for scan_parquet (#26492)
  • Add escaping to quotes and newlines when reading JSON object into string (#26578)
  • Standardise on RFC-5545 when doing datetime arithmetic on timezone-aware datetimes (#26425)
  • Support sas_token in Azure credential provider (#26565)
  • Expose HConcat options in the python node visitor (#26551)
  • Relax SQL requirement for derived tables and subqueries to have aliases (#26543)
  • Add polars-config and pl.Config.reload_env_vars() (#26524)
  • Record path for object store error raised from sinks (#26541)
  • Use CRC64NVME for checksum in aws sinks (#26522)
  • Add get() for binary Series (#26514)
  • Add streaming AsOf join node (#26398)
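
For readers unfamiliar with "to_zero" rounding (the truncate entries above), the sketch below contrasts rounding toward zero with flooring, using Python's standard `decimal` module. The `round_to` helper is hypothetical and only illustrates the rounding semantics; it does not use or mirror the Polars API.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_FLOOR

def round_to(value: str, decimals: int, mode: str) -> Decimal:
    """Round a decimal string to `decimals` places using the given mode."""
    quantum = Decimal(1).scaleb(-decimals)  # e.g. decimals=2 gives 0.01
    return Decimal(value).quantize(quantum, rounding=mode)

# ROUND_DOWN rounds toward zero, i.e. it simply drops the extra digits.
truncated = round_to("-2.789", 2, ROUND_DOWN)
# ROUND_FLOOR rounds toward negative infinity instead.
floored = round_to("-2.789", 2, ROUND_FLOOR)
```

The two modes agree for positive inputs and differ only for negative ones, which is why truncation is usually described as a separate rounding mode.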

🐞 Bug fixes

  • Fix skip_batches not handling negation of bool dtype with None values (#27452)
  • Use block_in_place_on for calls which can come from executor thread (#27855)
  • Mismatch in max_threads -> pipeline configuration (#27854)
  • Keep maintain_order on sliced unique (#27852)
  • Fix SchemaError using lazy HConcat->Sink (#27770)
  • Fix incorrect projection height when selecting only literals (#27825)
  • Fix rolling aggregations with window_size=0 (#27812)
  • Select with expr slice and len gave incorrect len (#27824)
  • Prevent import panic when environment variable set to unexpected value (#27831)
  • Broken link to AI Policy corrected (#27793)
  • Update to new jemalloc (#27797)
  • Swap PlHashMap for PlIndexMap to make Multiplexer insertion order stable (#27785)
  • Compare length for inline slice as usize (#27779)
  • Raise length mismatch in multiple sort_by in group_by (#27772)
  • Respect min_samples for rolling_by ops with nulls (#27706)
  • Fix memory usage regression affecting TPCH Q22 (#27758)
  • Add POLARS_ALLOW_NESTED_CSPE env var and make nested CSPE opt-in (#27765)
  • Post-apply residual pyarrow predicates (#27764)
  • Fix loss of precision for smaller floating types (#27662) (#27732)
  • Filter at scan dropped in CSPE filter pushdown (#27763)
  • Fix portstate assertion error on is_in (#27757)
  • Fix incorrect when/then after forward fill / reverse in groupby (#27745)
  • Accept empty Thrift list encoded as bare 0x00 byte in parquet metadata (#27754)
  • Stabilize object store credentialprovider cache key (#27712)
  • Panic in scan of empty IPC with slice (#27708)
  • Persist object_store rebuild state in cache (#27707)
  • Sort flag on GroupsType only applies to first element (#27684)
  • Invalid unwrap_unchecked when length isn't exact (#27685)
  • Logic error in async executor block_in_place (#27698)
  • Don't unwrap channel send in streaming join_asof (#27688)
  • Fix merge_sorted panic when List in frame (#27568)
  • Put AsOf join buffered Morsels back at the front of the deque if we cannot process them right now (#27658)
  • Fix FixedRingBuffer allocation provenance (#27669)
  • Fix skip_batches logic for NaN (#27673)
  • Raise TypeError when calling next() directly on GroupBy objects (#27562)
  • Data type comparison for extension types (#27632)
  • Fix filter_scan_ir usize integer underflow (#27633)
  • Share last-morsel split budget across files in streaming multi-scan (#27630)
  • Reset the sort-options in Series::is_sorted() after row-encoding columns (#27614)
  • Rayon deadlock with re-entrant io sources (#27600)
  • Don't push negative-offset slices through HConcat (#27570)
  • Logic error in streaming is_empty (#27602)
  • Fix incorrect CSE with large is_in literal (#27575)
  • AnonymousFunction can qualify as SQL aggregator (#26986)
  • Fix CSPE panic in cloud (#27594)
  • Set merge-join streaming node to Finished if its sending port is Done (#27572)
  • Widen decimal precision on sum aggregation at runtime (#27579)
  • Fix str.to_time raising unnecessarily when input is all nulls (#27574)
  • Prevent panic when switching from one extension dtype to another (#27566)
  • Ensure json_decode doesn't fail for Date and Time string deserialization (#27554)
  • Incorrect RUSTFLAGS passing in Makefile (#27555)
  • Avoid panic on open-ended slice (#27550)
  • Fix panic on reading IPC with 0-row compressed bitmap (#27551)
  • Set HEAD_RESPONSE_SIZE_ESTIMATE to 0 (#27548)
  • Fix lazy horizontal concat not raising on mismatching heights after projection pushdown (#27506)
  • Prevent join panic when suffix="" and coalesce=True (#27376)
  • Do not make a FastCount for csv if pre_slice is set (#27536)
  • Support duplicate names in over (#27544)
  • Reassign sequence numbers when distributing input morsels in streaming AsOf join node (#27538)
  • Do not reverse dataframes when sorting with all-null key columns (#27517)
  • Incorrect length check on streaming zip (#27505)
  • Respect nulls_last for descending over(order_by) in group_by().agg() (#27486)
  • Fix perf regression in scan_csv select(len()) when collected on streaming engine (#27504)
  • Harden extend strictness (#27476)
  • Prevent deadlock when using to_arrow() in a multithreaded context (#27472)
  • Rebalance deep merge_sorted chains (#27065)
  • Do not flatten sliced union (#27466)
  • Prevent deadlock when using to_pandas() in multithreaded context (#27451)
  • Struct rechunk bug and add Series::with_validity (#27446)
  • Handle column indexing in read_parquet/read_csv with pyarrow reader (#27397)
  • Export enum as ordered dictionary to arrow (#27432)
  • Ensure index column is sorted in streaming rolling aggs (#27234)
  • Ensure sample() respects shuffle=False (#27248)
  • Return empty DataFrame from concat_list with lit and empty column (#27305)
  • Read parquet MAP columns without LogicalType annotation (#27404)
  • Raise DuplicateError on parquet files with duplicate column names (#27399)
  • Honor having predicate in GroupBy iter (#27370)
  • Use the physical dtype for NumUnorderedImplodeReducer arrow ListArray (#27375)
  • Address bug in reduce_balanced for certain input length lists affecting pl.concat (#27352)
  • Ensure list.sample() allows fraction > 1 when with_replacement=True (#27350)
  • Ensure append() errors when upcast=False (#27346)
  • Always rechunk sorts, prune sorts even in eager execution (#27356)
  • Update groups to correct length for Implode (#27282)
  • Fix scan_csv missing_columns='insert' overwriting existing data with NULLs (#27297)
  • Raise on non-numeric inputs in pl.int_ranges (#27294)
  • Do not skip nulls when enumerating over rows in grouped AsOf join (#27275)
  • Fix pivot dropping data for null on values (#27273)
  • Resolve multiple files deadlock in CSV async reader (#27073)
  • Widen decimal precision on sum aggregation (#27270)
  • Correct lf.remote type (#27261)
  • Extend StructEval schema context in StackOptimizer (#27243)
  • Prevent panic when casting Array to extension type with same inner type (#27220)
  • Preserve nulls when casting from all-null Series to Struct (#27241)
  • Off-by-one in lp.with_inputs length assertion (#27209)
  • Fix scan_delta filter on empty dataframe (#27244)
  • Prevent DataFrame creation panic on list[struct] with heterogeneous types (#27217)
  • Skip null group entries when collecting AsOf-by groups (#27215)
  • Fix panic with empty order_by in over expression (#27088)
  • Write field ID from sink_parquet (#27196)
  • Fix statistics for Null columns in Parquet (#27021)
  • Do not prune sort nodes containing slice with dyn predicate (#27140)
  • Correct grouped Binary arg_min/arg_max and String single-element arg indices (#27172)
  • Fix scalar handling in str.replace during streaming (#27182)
  • Resolve multiple files deadlock in NDJSON async reader (#27204)
  • Overflow panic in interpolate nearest (#27205)
  • Use checked arithmetic in int96_to_i64_ns to prevent overflow panic (#27129)
  • Don't trigger csv fast count if predicate is pushed down (#27190)
  • Streaming sort by-expressions were lowered incorrectly (#27158)
  • Reset IO metrics instead of consuming (#27156)
  • Output SVG if output_path ends with '.svg' in show_graph (#27144)
  • Skip extension types for min/max in describe (#27120)
  • Fix incorrect IO metrics on multi-phase streaming execution (#27123)
  • Use delta stats for mixed hive and non-hive predicate pushdown (#27102)
  • Make the files used in docs available locally (#27121)
  • Apply scalar bound in clip when the Series bound contains nulls (#27087)
  • Ignore ddof parameter in rolling_corr and deprecate (#27104)
  • Preserve casts for horizontal ops with untyped literals (#27011)
  • Reject invalid input to sql_expr (#27084)
  • Ensure SQL COUNT(<lit>) expressions return the correct value (#27085)
  • Regression in replace_strict for enums (#27066)
  • Make test_group_by_arg_max_boolean_26978 non-flaky for max_by ties (#27048)
  • Null count for aggregated list inside count aggregation (#27032)
  • Panic in streaming MergeSortedNode (#27024)
  • Prevent panic in transpose() with mixed List and non-List columns (#27038)
  • Set sorted flag for Boolean and Time (#27035)
  • Missing src/ subdirectory to CI Python docs step (#27025)
  • Resolve stack overflow on merge_sorted and union (#27018)
  • Make pl.DataFrame.fill_null work on columns with Null dtype (#27020)
  • Fix initial MutableBooleanArray::extend_constant(count, None) calls (#26813)
  • Fix repeated word typos in comments (#26917)
  • Covariance with constant is zero, not NaN (#27015)
  • Don't remove set_sorted in projection pushdown (#27006)
  • Infer nulls when creating a DataFrame from an empty struct (#26991)
  • Correct suggestion in multi-expr filter error (#27003)
  • Implement agg_arg_min/agg_arg_max for boolean data type (#26997)
  • Raise error instead of panic for unsupported pivot aggregate (#26863)
  • Validate fraction is between 0.0 and 1.0 in list.sample (#26964)
  • Informative error for multi-quantile in group_by (#26957)
  • Raise for duplicate columns in over() (#26968)
  • Preserve height when unnesting empty struct columns (#26947)
  • Support Decimal32/64 in scan_parquet (#26941)
  • Follow-up on streaming range-join PR (#26944)
  • Fix ColumnNotFound due to projection between filter/cache in CSPE (#26946)
  • Fix panic on upsample() with group_by parameter on empty DataFrame (#26936)
  • Fix the loop bounds in BitmapBuilder::extend_each_repeated_from_slice_unchecked (#26928)
  • Default engine as streaming for collect_batches (#26932)
  • Set stricter maintain_order in test_schema_row_index_cse (#26931)
  • Fix error passing Series of dates to business functions (#26927)
  • Propagate null in min_by / max_by for all-null by groups (#26919)
  • Fix panic on lazy concat->filter->slice with CSPE (#26907)
  • Handle empty rolling windows in streaming engine (#26903)
  • Prevent Boolean arithmetic with integer literals producing Unknown type in streaming engine (#26878)
  • Fix corrupted slashes when sinking to partitioned S3 from Windows (#26889)
  • Remove outdated warning about List columns in unique() (#26295) (#26890)
  • Restore pyarrow predicate conversion for is_in (#26811)
  • Release GIL before df.to_ndarray() to avoid deadlock (#26832)
  • Fix panic on CSV count_rows with FORCE_ASYNC (#26883)
  • Add scalar comparisons for UInt128 series (#26886)
  • Fix shape error not raised for 0 width inputs with non-0 height for streaming horizontal concat (#26877)
  • Fix streaming zip-broadcast node not raising shape mismatch on empty recv from ready port (#26871)
  • Fix incorrect output of list.eval with scalar expr, fix panic on list.agg with nulls (#26868)
  • Incorrect arg_sort with descending+limit (#26839)
  • Raise error in .collect_schema() when arr.get() is out-of-bounds (#26866)
  • Return ComputeError instead of panicking in map_groups UDF (#26665)
  • Issue PerformanceWarning in LazyFrame.__contains__ (#26734)
  • Segfault in JoinExec on deep plan (#26796)
  • Fix unary expressions on literal in over context (#26827)
  • Fix {min,max}_by in streaming engine for Boolean full {min,max} value column (#26848)
  • Fix debug panic on clip with nan bound (#26854)
  • Support grouped {arg_,}_{min,max} for Categoricals (#26856)
  • Throw an error if a string is passed to LazyFrame.pivot on_columns (#26852)
  • Preserve input float precision in rolling_cov() and rolling_corr() with mixed input types (#26820)
  • Preserve row count when converting zero-column DataFrame via arrow PyCapsule interface (#26835)
  • Prevent infinite recursion in streaming group_by fallback (#26801)
  • Use RowEncodingContext::Struct when determining D::Struct encoded item len (#26817)
  • Incorrectly applied CSE on different map_batches functions (#26822)
  • Fix duplicated query execution on todo panic when combining collect(engine='streaming') with POLARS_AUTO_NEW_STREAMING (#26792)
  • Prevent predicate pushdown across Sort with baked-in slice (#26804)
  • Fix panic on lazy sink_parquet created in pipe_with_schema (#26784)
  • Support {column_name} and {index} placeholders in pl.format string (#26771)
  • Do not use merge-join if nulls_last is unknown (#26778)
  • Normalize float zeros in Parquet column statistics (#26776)
  • Fix out-of-bounds for positive offset in windowed rolling (#26724)
  • Raise error when .get() is out-of-bounds in group by context (#26752)
  • Boolean bitwise_xor aggregation inverted when column contains nulls (#26749)
  • Parameter nulls_last was ignored in over (#26718)
  • Allow missing time in inexact strptime (#26714)
  • Return NaN when using corr() with a literal and expr (#26697)
  • Allow strict horizontal concat with empty df (#26345)
  • Fix PoisonError panic caused by reentrant usage of file cache (#26627)
  • Return null for int values exceeding 128-bit range with strict=False (#26674)
  • Incorrect boolean min/max with nulls (#26671)
  • Slice-slice pushdown for n_rows (#26673)
  • Resolve panic in Enum struct slicing (#26643)
  • Fix CSPE for group_by.map_groups (#26640)
  • Remove non-existent parameter from SQLContext typing overloads (#26658)
  • Replace panic with error when sorting object dtype columns (#26601)
  • Fix to_pandas() on empty Enum Series not preserving the Enum dictionary (#26610)
  • Rounding behaviour for f32 values with "HalfAwayFromZero" mode (#26624)
  • Correct arg_(min|max) for scalar columns (#26609)
  • Use monkeypatch.chdir in test_sink_path_slicing_utf8_boundaries_26324 (#26616)
  • Materialize unknown scalar int/float literals in collect_dtype() (#26595)
  • Return error when by= is nested type in min_by / max_by (#26593)
  • Fix assert_frame_not_equal() not raising on dtype mismatch (#26590)
  • Respect SQL semantics for cumulative functions mapped via OVER clause (#26570)
  • Fix incorrect multiplexer output ordering on source token stop request (#26561)
  • Fix PyIceberg filter on boolean column (#26550)
  • Fix *_range exprs incorrectly marked as row separable (#26549)
  • Set dictionary_page_offset when dictionary encoding is used and point data_page_offset to the first data page (#26542)
  • Prevent GPU engine panic on SinkMultiple nodes (#26537)
  • Move query parameters to request body when retrieving Unity Catalog temporary credentials (#26539)
  • Implement PhysicalExpr for MinBy/MaxBy nodes (#26506)
  • Refactor row-encoding logic in IR join lowering into separate function (#26512)
  • Correctly check for path extensions (#26513)
  • Change AsOf join to be based on TotalOrd (#26497)
  • Correctly raise error on failing nested strict casts (#26499)
  • Prevent invalid type casts in replace_strict() (#26453)
  • Return null when dividing literals by 0 (#26343)
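
Two of the fixes above concern statistics against a constant: covariance with a constant should be zero, while correlation with a constant is NaN. The plain-Python sketch below shows why, using the textbook sample formulas. It is a minimal illustration, not the Polars implementation, and the function names are chosen for this example.

```python
import math

def covariance(xs, ys, ddof=1):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - ddof)

def correlation(xs, ys):
    """Pearson correlation; NaN when either input has zero variance."""
    denom = math.sqrt(covariance(xs, xs) * covariance(ys, ys))
    return covariance(xs, ys) / denom if denom != 0 else math.nan

values = [1.0, 2.0, 3.0]
constant = [5.0, 5.0, 5.0]

# Every deviation of the constant from its mean is 0, so the covariance is 0.
cov_with_constant = covariance(values, constant)
# The correlation divides by a zero standard deviation, so it is undefined.
corr_with_constant = correlation(values, constant)
```

The covariance is well defined because no division by a variance is involved; only the correlation hits the 0/0 case.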

📖 Documentation

  • Bump to patched version (#27851)
  • Replace Typeform sign-up URL with new enterprise link (#27838)
  • Correct wrong head call (#27848)
  • Add Polars On-Prem 0.5.0 release (#27849)
  • Correct onprem license helm values (#27847)
  • Update connecting Polars Cloud to AWS documentation (#27823)
  • Broken link to AI Policy corrected (#27793)
  • Add release dates to the On-Prem releases page (#27787)
  • Improve on-prem docs (#27788)
  • Add query profiler video to On-Prem user guide (#27786)
  • Add EKS/AKS/GKE guides (#27774)
  • Sync from Polars Cloud (#27751)
  • Document Expr.list.__getitem__ (#27689)
  • Add cloudpickle requirement (#27703)
  • Clarify from_arrow schema ordering (#27493)
  • Clarify schema column order (#27681)
  • Update DataFrame construction docs for Column (#27541)
  • Document all valid engine options on LazyFrame collect/sink/explain methods (#27374)
  • Drop redundant Pattern 2 from Dagster integration page (#27581)
  • Update to remove Dockerhub PAT references (#27582)
  • Modernize Dagster integration example for Polars Cloud (#27560)
  • Use Polars random seed in sample example (#27537)
  • Make expressions operations RNG deterministic (#27494)
  • Document struct field order (#27492)
  • Add See Also sections for datetime docstrings (#27316)
  • Polars On-Prem release (#27439)
  • Rename to Polars On-Prem (#27435)
  • Split out openlineage docs into guide and configuration (#27371)
  • Add explanation on the observatory sqlite db file (#27354)
  • Add documentation for openlineage on-premises (#27334)
  • Release page (#27335)
  • Update uv pip install polars-on-premises cmd (#27330)
  • Fix outdated LazyGroupBy.map_groups docstring (#27292)
  • Add deny_anonymous_users to scheduler config (#27287)
  • Slurm documentation (#27259)
  • Add link to concepts in index.md (#27077)
  • Add docs entry for merge_sorted (#27224)
  • Fix typo (#27212)
  • Make the files used in docs available locally (#27121)
  • Put first-time contribution requirements in its own linkable section (#27113)
  • Change Polars Cloud API to 0.6.0 (#27005)
  • Query Profiler addition to User Guide (#26623)
  • Add documentation for on_columns for LazyFrame pivot (#26859)
  • Mention ComputeContexts create ephemeral environments by default and hint at re-use (#26692)
  • Remove confusing join validation note (#26795)
  • Fix broken AI policy link (#26728)
  • Create Polars Cloud Glossary (#26690)
  • Additional SQL documentation (#26662)
  • Include invalidate_caches in bisect instructions (#26641)
  • Add git bisect guide to contributing docs (#26634)
  • Updated Airflow orchestration documentation (#26585)
  • Improve SQL docs for EXTRACT and DATE_PART functions (#26575)
  • Remove reference to MutableStructArray in module doc (#26557)
  • Fix docstring for bitwise_count_zeros method (#26519)
  • Add get() for binary Series (#26514)

📦 Build system

  • Also split debug info in debug-release (#27609)
  • Use split-debuginfo on linux (#27608)
  • Bump deltalake to 1.5.1 in CI (#27387)
  • Really do not install pyiceberg-core 0.9.0 (#27017)
  • Bump up numpy and pyo3 to 0.28 (#26743)

🛠️ Other improvements

  • Add statistics to spill contexts (#27859)
  • Include license file in polars-ooc crate (#27864)
  • Changes needed for Rust 0.54.x (#27853)
  • Use Vec instead of PlHashMap for ProjectionInfo.map (#27856)
  • Reduce codegen-units (#27835)
  • Deduplicate thrift field-walk loops (#27790)
  • Harden against async blocking deadlocks (take 2) (#27767)
  • Added jlumbroso/free-disk-space cleaning action where relevant (#27769)
  • Update runtime edition to 2024 (#27746)
  • Remove redundant DSL::AGG::Unique (#27718)
  • Harden against async blocking deadlocks (#27653)
  • Print Python traceback when POLARS_TIMEOUT_MS is exceeded (#27657)
  • Remove last global static mut (#27704)
  • Remove unused equal_element code (#27701)
  • Remove unused suspect AsRef impl (#27699)
  • Remove Box<dyn Iterator> IntoIterator for ChunkedArray (#27697)
  • Remove trailing semicolons in fmt macros (#27705)
  • Add dynamic slice to unoptimized dispatch (#27693)
  • Format missed in previous PR (#27700)
  • Bump pytest and remove codspeed (#27686)
  • Store record batch row counts in custom Polars IPC metadata field (#27549)
  • Remove client-side allow_local_scans option for prepare_cloud_plan (#27663)
  • Remove superfluous test (#27676)
  • Cleanup streaming flags (#27671)
  • Expose unordered concatenation in python visitor (#27666)
  • Bump deltalake and fix CI (#27660)
  • Add impl IntoAExprBuilder for ExprIR (#27656)
  • Update object_store patch repo (#27650)
  • Bump up thiserror (#27648)
  • Move async executor and primitives to polars-async (#27629)
  • Add ImageVersion to rust-cache key (#27626)
  • Rename POOL to RAYON (#27606)
  • Use first_non_null for strptime infer (#27577)
  • Add arg mapper to unoptimized dispatch (#27599)
  • Fix is_empty test (#27597)
  • Fix tz type difference pandas assert, take 2 (#27596)
  • Fix CSPE panic in cloud (#27594)
  • Fix tz type difference pandas assert (#27593)
  • Add contributing note about conventional comments (#27543)
  • Add AnonymousColumnsUdf to UnoptimizedOperation (#27513)
  • Move Quantile to FunctionIRExpr (#27498)
  • Nested common subplan elimination (#27340)
  • Remove old projection pushdown code (#27499)
  • Refactored projection pushdown with cache handling (#27422)
  • Refactor CSPE (#27425)
  • Deduplicate interns (#27470)
  • Fix merge conflict in ColumnarFunction (#27464)
  • Schema per port for PhysNode (#27302)
  • Keep the schema ordered in scan projection pushdown (#27429)
  • Remove redundant PhysNodeKind::AsOfJoin::{left_right}_by fields (#27400)
  • Bump apache-avro version (#27419)
  • Bump rustls-webpki (#27382)
  • Disable debug symbols in macos coverage tests (#27361)
  • Cargo deny (#27363)
  • Add generic tree traversal with edge value propagation (#27249)
  • Bump Python Polars version (#27315)
  • Utility for identifying expr projection heights (#27198)
  • Sink DSL and callback for Iceberg (#27258)
  • Wait for morsel consumption in merge_sorted streaming node (#27288)
  • Mark scan_ipc cache arguments as deprecated (#27216)
  • Consolidate reordered compare functions (#27229)
  • Add zip_eq to itertools (#27210)
  • Remove unused attributes (#27191)
  • Avoid unnecessary recompilation due to changing env vars (#27166)
  • Update nightly Rust compiler version (#27145)
  • Simplify pyarrow scan and process in batches (#26982)
  • Make internal typing more precise (part ii) (#27117)
  • Remove unused expression sorts (#27075)
  • Add memory usage tracking to global allocator (#27103)
  • Add sinked paths callback (#26995)
  • Pin maturin due to compile time regression (#27062)
  • Missing src/ subdirectory to CI Python docs step (#27025)
  • Really do not install pyiceberg-core 0.9.0 (#27017)
  • Naming for named scopes (#26999)
  • Enable hypothesis tests when POLARS_AUTO_NEW_STREAMING=1 (#26818)
  • Fix CI by excluding missing wheel version of pyiceberg (#27001)
  • Replace clippy::never_loop with break on named scopes (#26983)
  • Remove indirection in calling python scans (#26981)
  • Polars versions (#26980)
  • Polars version (#26971)
  • Set stricter maintain_order in test_schema_row_index_cse (#26931)
  • Bump build deps used in ARM64 Windows release pipeline (#26892)
  • Use large linux-arm runner for release (#26898)
  • Ensure .gitignore and .typos.toml exclude "_polars_runtime*" directories (#26842)
  • Additional IR slice pushdown after filter pushdown (#26815)
  • Add private _expand_paths scan function (#26798)
  • Change Expr sortedness container to AExprSorted and add nulls_last to PyExpr.set_sorted() (#26781)
  • Move stop_and_buffer_pipe_contents into joins/utils.rs (#26810)
  • Replace iejoin is_supported_type macro with a closure in predicate_pushdown/join.rs (#26812)
  • Fix first-time contributor auto-label (#26794)
  • Move Series arrow export code from into.rs to arrow_export (#26775)
  • Automatically add first-contribution label (#26780)
  • Make contributing policy more strict (#26772)
  • Add unused argument warning to ruff rules (#26720)
  • Move shared streaming CSV/NDJSON code into shared mod (#26742)
  • Undo pub removal of to_dyn_object_store (#26722)
  • Remove unused proptest.rs DataFrame file (#26676)
  • Add test for predicate before join (#26705)
  • Fix file cache debug assertion failure (#26695)
  • Put physical_plan join formatting code into a separate function (#26691)
  • Remove PlanCallback from sql (#26686)
  • Add dtype visitor (#26628)
  • Bump Rust nightly compiler version (#26379)
  • Remove unused problematic ArrayFromIter (#26639)
  • Move more boolean code to polars_compute, reusing kernels (#26636)
  • Move ? to assignment site and use extend() in StructEvalExpr (#26635)
  • Cleanup assert_schema_equal (#26596)
  • Replace some env var reading by polars-config (#26607)
  • Use monkeypatch.chdir in test_sink_path_slicing_utf8_boundaries_26324 (#26616)
  • Remove string allocation from polars_err!(Variant: "str") (#26579)
  • Add wrapper for clippy so it continues on warnings (#26527)
  • Add Buffer::split_at / Buffer::split_off (#26583)
  • Use LazyFrame.clear to clear sql (#26562)
  • Update docs (#26560)
  • Add backtrace coloring (#26544)
  • Evaluate sql process_except_intersect during IR (#26516)
  • Reformat LICENSE (#26532)
  • Add a pipeline in which we test with POLARS_IDEAL_MORSEL_SIZE=4 (#26420)
  • Remove test_file and have tests create test.parquet in tmp_path (#26525)
  • Refactor row-encoding logic in IR join lowering into separate function (#26512)
  • Fix mypy pyiceberg expression errors (#26523)
  • Make nix flake mostly work (#26517)
  • Switch to custom cloud writer with IO sink metrics (#26494)
  • Remove Default on DataType (#26511)
  • Propagate object-store error information (#26406)
  • Have parameterized series rechunk() if not allow_chunks (#26504)
  • Remove dead code (RevMapping) (#26508)
  • Rename Arena get_many_mut to get_disjoint_mut (#26491)

Thank you to all our contributors for making this release possible!
@0guban0v, @0xRozier, @BJohnBraddock, @BitWeaverDev, @ButteryPaws, @EndPositive, @HCYT, @JakubValtar, @Jesse-Bakker, @Kevin-Patyk, @Liyixin95, @MarcoGorelli, @Matt711, @NathanHu725, @NedJWestern, @NeejWeej, @NicoOhR, @RedZapdos123, @RenzoMXD, @Shoeboxam, @SuryaSunil1326, @TNieuwdorp, @Voultapher, @WaffleLapkin, @abhidotsh, @abishop1990, @alexander-beedie, @andyjessen, @ankane, @aryansri05, @ashler-herrick, @azimafroozeh, @borchero, @boris324, @cBournhonesque, @carnarez, @coastalwhite, @daizutabi, @debnathshoham, @dependabot[bot], @dpinol, @dsprenkels, @dydev012, @erandagan, @etiennebacher, @farouk-01, @florianvazelle, @gab23r, @gautamvarmadatla, @henryharbeck, @hutch3232, @ilya-pevzner, @itamarst, @jberg5, @joaquinhuigomez, @johalnes, @jonasdedden, @jonathanchang31, @jonathansergio, @jorenham, @junnythemarksman, @kanenorman, @kdn36, @leudz, @lukas-reining, @lun3x, @moktamd, @mqqz, @mroeschke, @mzjp2, @nameexhaustion, @nicholaslegrand102, @ohmdelta, @orlp, @pablogsal, @pragun-ananda, @qxzcode, @ritchie46, @spock-yh, @stakeswky, @tmimmanuel, @tolleybot, @toreerdmann, @toroleapinc, @uurl, @veeceey, @waamm, @wence-, @wmoss, @xenzh, @xronocode, @yangsong97, @yonatan-genai and @yuuuxt
