🏆 Highlights
- Add LazyFrame.gather (#27501)
- Nested common subplan elimination (#27340)
- Stabilize streaming engine (#27497)
- Speed up parquet metadata decode with hand-written Thrift (#27427)
- Add streaming support for grouped AsOf join (#27293)
🚀 Performance improvements
- Eliminate filters with contradictory predicates (#27775)
- Update to new jemalloc (#27797)
- Do not materialize ScalarColumn in Column split_at (#27782)
- Avoid materializing broadcast in array.shift (#27740)
- Avoid materializing broadcast list in list.sample(n) and list.sample(frac) (#27679)
- Adaptive size dispatch to hashset or radix sort + capacity-aware reset in agg_n_unique (#27719)
- Dispatch {list,arr}.{unique,n_unique,reverse} to group_by engine (#27278)
- Improve in-memory grouped non-null count (#27702)
- Factor shared conjuncts out of OR-of-ANDs predicates (#27627)
- Skip downloading IPC batches exceeding slice bounds (#27683)
- Faster Series::is_sorted for logical / non-primitive types (#27567)
- Avoid materializing broadcast list in list.shift (#27628)
- Optimise json_decode Datetime string parsing (#27559)
- Speed up to_numpy C-order via cache-blocked transpose (#27522)
- Optimize select(len()) for non-strict horizontal concat (#27516)
- Pushdown slices to inputs on left/right/full join (#27508)
- Don't infer CSV schema if schema is set (#27507)
- Nested common subplan elimination (#27340)
- Make is_in row-group pruning precise on null-containing haystacks (#27495)
- Don't do fused-multiply-add on scalars (#27479)
- List full fast path (#27477)
- Make is_in row-group pruning precise on multi-value lists (#27475)
- Add streaming GatherNode (#27465)
- Lower non-elementwise FunctionExprIR to ColumnarFunctionNode (#27462)
- Speed up parquet metadata decode with hand-written Thrift (#27427)
- Skip validity mask processing in __array_ufunc__ when no inputs have nulls (#27358)
- Create IR slice from expr slice pushdown (#27200)
- Add streaming support for grouped AsOf join (#27293)
- Avoid unnecessary rechunk when sorting already sorted DataFrame (#27264)
- Lower basic over() to streaming primitives (#27303)
- Lower drop_{nulls,nans} in streaming group_by aggregations (#27296)
- Lower entropy to streaming reductions (#27174)
- Add native streaming interpolate (#27185)
- Streaming strptime with format=None (#27056)
- Lower skew/kurtosis to streaming aggregations (#27176)
- Post apply pyarrow filter in Polars' engine instead of pyarrow (#27192)
- Optimize drop_nulls().{first,last}() to {first,last}(ignore_nulls=True) (#27187)
- Always process pyarrow scan in batches (#27183)
- Make cut output Enum and mark as elementwise (#27173)
- Remove unused expression sorts (#27075)
- Use delta stats for mixed hive and non-hive predicate pushdown (#27102)
- Take into account size per row in join sampling (#27098)
- Streaming is_first_distinct and unique(maintain_order=True) (#27052)
- Streaming cov and corr (#27008)
- Add sorted unique node to streaming engine (#26990)
- Ensure Expr.append is lowered in streaming engine (#27022)
- Collapse consecutive Sort nodes (#26965)
- Drop maintain_order=True requirement in sink_delta (#27007)
- Lower index_of to streaming engine (#26923)
- Streaming native backward_fill (#26967)
- Native streaming forward_fill (#26922)
- Drop unused filter column above cache (#26955)
- Optimize .replace() from a single value (#26948)
- Add a streaming range-join (#26790)
- Lower arg_{min,max} to streaming engine (#26845)
- Additional IR slice pushdown after filter pushdown (#26815)
- Streaming first/last on Enum through physical (#26783)
- Fast filter for scalar predicates (#26745)
- Allow SimpleProjection in streaming engine to rename (#26709)
- Streaming cloud download for scan_csv (#26637)
- Drop columns only needed for predicates after the predicate is applied (#26703)
- Run projection pushdown after predicate pushdown (#26688)
- Comparison literal downcasting (#26663)
- Add dynamic predicates for TopK (#26495)
- Increase minimum default parquet row group prefetch to 8 (#26632)
- Partial predicate conversion to PyArrow (#26567)
- Streaming cloud download for scan_ndjson/scan_lines (#26563)
- Grab GIL fewer times during Object join materialization (#26587)
- Improve CSV and NDJSON cloud sink performance (#26545)
- Tune cloud writer performance (#26518)
- Allow parallel InMemorySinks in streaming engine (#26501)
- Add streaming AsOf join node (#26398)
✨ Enhancements
- Expose fixed-size rolling window expressions in Python visitor (#27108)
- Expose IR::Scan hive parts in the python node visitor (#27829)
- Expose IRFunctionExpr::DynamicPred in the python visitor (#27616)
- Fix SchemaError using lazy HConcat->Sink (#27770)
- Add pinning and queuing logic to polars-ooc (#27791)
- Add tiered multi-file parquet metadata resolver (#27720)
- Cache and shuffle DNS for cloud object_store (#27659)
- Update to new jemalloc (#27797)
- Allow deeper expressions (#27768)
- Add is_inherently_nondeterministic helper for AExpr (#27687)
- Use true division for the / operator in Polars SQL (#27391)
- Add Rust backend for Expr.has_nulls (#27590)
- Add block_in_place to Polars' async executor (#27612)
- Stabilize float16 (#27607)
- Add Expr.is_empty (#27583)
- Add support for the SQL FILTER clause for aggregate functions, and STRING_AGG (#27564)
- Make parquet FileMetadata prunable for IR-plan dispatch (#27535)
- Broadcast scalar input for list.slice (#27487)
- Add LazyFrame.gather (#27501)
- Add null_on_oob in {Expr/Series}.gather (#27327)
- Stabilize streaming engine (#27497)
- Process batched arr.eval on overflow boundaries (#27496)
- Process batched list.eval on overflow boundaries (#27483)
- Print SLICED UNION in LazyFrame explain (#27467)
- Cargo deny (#27363)
- Add maintain_order parameter to merge_sorted (#27263)
- Add ignore_nulls to {list,arr}.{any,all} (#27186)
- Lock-free memory manager with spill-to-disk and fully OOC multiplexer (#26774)
- Add is_unique to list/array dtypes (#27290)
- Add pl.merge_sorted operating on multiple frames (#27014)
- Add fast_alloc feature flag, remove default_alloc (#27206)
- Add a GPU slot to OptFlags so we can control CSE (similar to streaming) (#27026)
- Allow group_by() without key exprs (#27141)
- Collapse consecutive Sort nodes (#26965)
- Use UUIDv7 for sink_iceberg directory name generation (#26958)
- Truncate large binary/utf8 Parquet statistics values (#26764)
- Error if PartitionBy path provider returns absolute path that does not begin with base path, or contains '..' (#26894)
- Support Delta deletion vectors in scan_delta (#26867)
- Support Decimal32/64 in scan_parquet (#26941)
- Support casting Duration to String in ISO 8601 format (#26860)
- Add a streaming range-join (#26790)
- Support Expr for holidays in business day calculations (#26193)
- Parameter for pivot to always include value column name (#26730)
- Raise error in .collect_schema() when arr.get() is out-of-bounds (#26866)
- Extend Expr.reinterpret to all numeric types of the same size (#26401)
- Add missing_columns parameter to scan_csv (#26787)
- Clear no-op scan projections (#26858)
- Support nested datatypes for {min,max}_by (#26849)
- Support SQL ARRAY init from typed literals (#26622)
- Accept table identifier string in scan_iceberg() (#26826)
- Add a convenience make fresh command to the Makefile (#26809)
- Add unstable LazyFrame.sink_iceberg (#26799)
- Add maintain order argument on implode (#26782)
- Implement predicate pushdown for aliased groupby keys (#26597)
- Speed up casting primitive to bool by at least 2x (#26823)
- Enable rowgroup skipping for float columns (#26805)
- Add expression context to errors (#26716)
- Add Decimal support for product reduction (#26725)
- Support all Iceberg V2 arrow types in sink_parquet arrow_schema parameter (#26669)
- Re-work behavior of arrow_schema parameter on sink_parquet (#26621)
- Add contains_dtype() method for Schema (#26661)
- Implement truncate as a "to_zero" rounding mode (#26677)
- Expose AExpr::Rolling in the python visitor (#26715)
- More generic streaming GroupBy lowering (#26696)
- Add basic MemoryManager to track buffered dataframes for out-of-core support later (#26443)
- Add truncate Expression for numeric values (#26666)
- Better error messages for hex literal conversion issues in the SQL interface (#26657)
- Add SQL support for LPAD and RPAD string functions (#26631)
- Support SQL "FROM-first" SELECT query syntax (#26598)
- Speed up any() and all() for nulls (#26615)
- Bump Chrono to 0.4.24, enabling stricter parsing of %.3f/%.6f/%.9f specifiers (#26075)
- Expose unstable assert_schema_equal in py-polars (#24869)
- Allow parsing of compact ISO 8601 strings (#24629)
- Streaming cloud download for scan_ndjson/scan_lines (#26563)
- Configuration to cast integers to floats in cast_options for scan_parquet (#26492)
- Add escaping to quotes and newlines when reading JSON object into string (#26578)
- Standardise on RFC-5545 when doing datetime arithmetic on timezone-aware datetimes (#26425)
- Support sas_token in Azure credential provider (#26565)
- Expose HConcat options in the python node visitor (#26551)
- Relax SQL requirement for derived tables and subqueries to have aliases (#26543)
- Add polars-config and pl.Config.reload_env_vars() (#26524)
- Record path for object store error raised from sinks (#26541)
- Use CRC64NVME for checksum in aws sinks (#26522)
- Add get() for binary Series (#26514)
- Add streaming AsOf join node (#26398)
🐞 Bug fixes
- Fix skip_batches not handling negation of bool dtype with None values (#27452)
- Use block_in_place_on for calls which can come from executor thread (#27855)
- Mismatch in max_threads -> pipeline configuration (#27854)
- Keep maintain_order on sliced unique (#27852)
- Fix SchemaError using lazy HConcat->Sink (#27770)
- Fix incorrect projection height when selecting only literals (#27825)
- Fix rolling aggregations with window_size=0 (#27812)
- Select with expr slice and len gave incorrect len (#27824)
- Prevent import panic when environment variable set to unexpected value (#27831)
- Broken link to AI Policy corrected (#27793)
- Update to new jemalloc (#27797)
- Swap PlHashMap for PlIndexMap to make Multiplexer insertion order stable (#27785)
- Compare length for inline slice as usize (#27779)
- Raise length mismatch in multiple sort_by in group_by (#27772)
- Respect min_samples for rolling_by ops with nulls (#27706)
- Fix memory usage regression affecting TPCH Q22 (#27758)
- Add POLARS_ALLOW_NESTED_CSPE env var and make nested CSPE opt-in (#27765)
- Post-apply residual pyarrow predicates (#27764)
- Fix loss of precision for smaller floating types (#27662) (#27732)
- Filter at scan dropped in CSPE filter pushdown (#27763)
- Fix portstate assertion error on is_in (#27757)
- Fix incorrect when/then after forward fill / reverse in groupby (#27745)
- Accept empty Thrift list encoded as bare 0x00 byte in parquet metadata (#27754)
- Stabilize object store credentialprovider cache key (#27712)
- Panic in scan of empty IPC with slice (#27708)
- Persist object_store rebuild state in cache (#27707)
- Sort flag on GroupsType only applies to first element (#27684)
- Invalid unwrap_unchecked when length isn't exact (#27685)
- Logic error in async executor block_in_place (#27698)
- Don't unwrap channel send in streaming join_asof (#27688)
- Fix merge_sorted panic when List in frame (#27568)
- Put AsOf join buffered Morsels back at the front of the deque if we cannot process them right now (#27658)
- Fix FixedRingBuffer allocation provenance (#27669)
- Fix skip_batches logic for NaN (#27673)
- Raise TypeError when calling next() directly on GroupBy objects (#27562)
- Data type comparison for extension types (#27632)
- Fix filter_scan_ir usize integer underflow (#27633)
- Share last-morsel split budget across files in streaming multi-scan (#27630)
- Reset the sort-options in Series::is_sorted() after row-encoding columns (#27614)
- Rayon deadlock with re-entrant io sources (#27600)
- Don't push negative-offset slices through HConcat (#27570)
- Logic error in streaming is_empty (#27602)
- Fix incorrect CSE with large is_in literal (#27575)
- AnonymousFunction can qualify as SQL aggregator (#26986)
- Fix CSPE panic in cloud (#27594)
- Set merge-join streaming node to Finished if its sending port is Done (#27572)
- Widen decimal precision on sum aggregation at runtime (#27579)
- Fix str.to_time was raising unnecessarily when input was all nulls (#27574)
- Prevent panic when switching from one extension dtype to another (#27566)
- Ensure json_decode doesn't fail for Date and Time string deserialization (#27554)
- Incorrect RUSTFLAGS passing in Makefile (#27555)
- Avoid panic on open-ended slice (#27550)
- Fix panic on reading IPC with 0-row compressed bitmap (#27551)
- Set HEAD_RESPONSE_SIZE_ESTIMATE to 0 (#27548)
- Fix lazy concat horizontal didn't raise on mismatching heights after projection pushdown (#27506)
- Prevent join panic when suffix="" and coalesce=True (#27376)
- Do not make a FastCount for csv if pre_slice is set (#27536)
- Support duplicate names in over (#27544)
- Reassign sequence numbers when distributing input morsels in streaming AsOf join node (#27538)
- Do not reverse dataframes when sorting with all-null key columns (#27517)
- Incorrect length check on streaming zip (#27505)
- Respect nulls_last for descending over(order_by) in group_by().agg() (#27486)
- Fix perf regression in scan_csv select(len()) when collected on streaming engine (#27504)
- Harden extend strictness (#27476)
- Prevent deadlock when using to_arrow() in a multithreaded context (#27472)
- Rebalance deep merge_sorted chains (#27065)
- Do not flatten sliced union (#27466)
- Prevent deadlock when using to_pandas() in multithreaded context (#27451)
- Struct rechunk bug and add Series::with_validity (#27446)
- Handle column indexing in read_parquet/read_csv with pyarrow reader (#27397)
- Export enum as ordered dictionary to arrow (#27432)
- Ensure index column is sorted in streaming rolling aggs (#27234)
- Ensure sample() respects shuffle=False (#27248)
- Return empty DataFrame from concat_list with lit and empty column (#27305)
- Read parquet MAP columns without LogicalType annotation (#27404)
- Raise DuplicateError on parquet files with duplicate column names (#27399)
- Honor having predicate in GroupBy iter (#27370)
- Use the physical dtype for NumUnorderedImplodeReducer arrow ListArray (#27375)
- Address bug in reduce_balanced for certain input length lists affecting pl.concat (#27352)
- Ensure list.sample() allows fraction > 1 when with_replacement=True (#27350)
- Ensure append() errors when upcast=False (#27346)
- Always rechunk sorts, prune sorts even in eager execution (#27356)
- Update groups to correct length for Implode (#27282)
- Fix scan_csv missing_columns='insert' overwrote existing data with NULLs (#27297)
- Raise on non-numeric inputs in pl.int_ranges (#27294)
- Do not skip nulls when enumerating over rows in grouped AsOf join (#27275)
- Fix pivot dropping data for null on values (#27273)
- Resolve multiple files deadlock in CSV async reader (#27073)
- Widen decimal precision on sum aggregation (#27270)
- Correct lf.remote type (#27261)
- Extend StructEval schema context in StackOptimizer (#27243)
- Prevent panic when casting Array to extension type with same inner type (#27220)
- Preserve nulls when casting from all-null Series to Struct (#27241)
- Off-by-one in lp.with_inputs length assertion (#27209)
- Fix scan_delta filter on empty dataframe (#27244)
- Prevent DataFrame creation panic on list[struct] with heterogenous types (#27217)
- Skip null group entries when collecting AsOf-by groups (#27215)
- Fix panic with empty order_by in over expression (#27088)
- Write field ID from sink_parquet (#27196)
- Fix statistics for Null columns in Parquet (#27021)
- Do not prune sort nodes containing slice with dyn predicate (#27140)
- Correct grouped Binary arg_min/arg_max and String single-element arg indices (#27172)
- Fix scalar handling in str.replace during streaming (#27182)
- Resolve multiple files deadlock in NDJSON async reader (#27204)
- Overflow panic in interpolate nearest (#27205)
- Using checked arithmetic in int96_to_i64_ns to prevent overflow panic (#27129)
- Don't trigger csv fast count if predicate is pushed down (#27190)
- Streaming sort by-expressions were lowered incorrectly (#27158)
- Reset IO metrics instead of consuming (#27156)
- Output SVG if output_path ends with '.svg' in show_graph (#27144)
- Skip extension types for min/max in describe (#27120)
- Fix incorrect IO metrics on multi-phase streaming execution (#27123)
- Use delta stats for mixed hive and non-hive predicate pushdown (#27102)
- Make the files used in docs available locally (#27121)
- Apply scalar bound in clip when the Series bound contains nulls (#27087)
- Ignore ddof parameter in rolling_corr and deprecate (#27104)
- Preserve casts for horizontal ops with untyped literals (#27011)
- Reject invalid input to sql_expr (#27084)
- Ensure SQL COUNT(<lit>) expressions return the correct value (#27085)
- Regression in replace_strict for enums (#27066)
- Make test_group_by_arg_max_boolean_26978 non-flaky for max_by ties (#27048)
- Null count for aggregated list inside count aggregation (#27032)
- Panic in streaming MergeSortedNode (#27024)
- Prevent panic in transpose() with mixed List and non-List columns (#27038)
- Set sorted flag for Boolean and Time (#27035)
- Missing src/ subdirectory to CI Python docs step (#27025)
- Resolve stack overflow on merge_sorted and union (#27018)
- Make pl.DataFrame.fill_null work on columns with Null dtype (#27020)
- Fix initial MutableBooleanArray::extend_constant(count, None) calls (#26813)
- Fix repeated word typos in comments (#26917)
- Covariance with constant is zero, not NaN (#27015)
- Don't remove set_sorted in projection pushdown (#27006)
- Infer nulls when df create from empty-struct (#26991)
- Correct suggestion in multi-expr filter error (#27003)
- Implement agg_arg_min/agg_arg_max for boolean data type (#26997)
- Raise error instead of panic for unsupported pivot aggregate (#26863)
- Validate fraction is between 0.0 and 1.0 in list.sample (#26964)
- Informative error for multi-quantile in group_by (#26957)
- Raise for duplicate columns in over() (#26968)
- Preserve height when unnesting empty struct columns (#26947)
- Support Decimal32/64 in scan_parquet (#26941)
- Follow-up on streaming range-join PR (#26944)
- Fix ColumnNotFound due to projection between filter/cache in CSPE (#26946)
- Fix panic on upsample() with group_by parameter on empty DataFrame (#26936)
- Fix the loop bounds in BitmapBuilder::extend_each_repeated_from_slice_unchecked (#26928)
- Default engine as streaming for collect_batches (#26932)
- Set stricter maintain_order in test_schema_row_index_cse (#26931)
- Fix error passing Series of dates to business functions (#26927)
- Propagate null in min_by/max_by for all-null by groups (#26919)
- Fix panic on lazy concat->filter->slice with CSPE (#26907)
- Handle empty rolling windows in streaming engine (#26903)
- Prevent Boolean arithmetic with integer literals producing Unknown type in streaming engine (#26878)
- Fix sink to partitioned S3 from Windows corrupted slashes (#26889)
- Remove outdated warning about List columns in unique() (#26295) (#26890)
- Restore pyarrow predicate conversion for is_in (#26811)
- Release GIL before df.to_ndarray() to avoid deadlock (#26832)
- Fix panic on CSV count_rows with FORCE_ASYNC (#26883)
- Add scalar comparisons for UInt128 series (#26886)
- Fix shape error not raised for 0 width inputs with non-0 height for streaming horizontal concat (#26877)
- Fix streaming zip-broadcast node did not raise shape mismatch on empty recv from ready port (#26871)
- Fix incorrect output list.eval with scalar expr, fix panic on list.agg with nulls (#26868)
- Incorrect arg_sort with descending+limit (#26839)
- Raise error in .collect_schema() when arr.get() is out-of-bounds (#26866)
- Return ComputeError instead of panicking in map_groups UDF (#26665)
- Issue PerformanceWarning in LazyFrame.__contains__ (#26734)
- Segfault in JoinExec on deep plan (#26796)
- Fix unary expressions on literal in over context (#26827)
- Fix {min,max}_by in streaming engine for Boolean full {min,max} value column (#26848)
- Fix debug panic on clip with nan bound (#26854)
- Support grouped {arg_,}_{min,max} for Categoricals (#26856)
- Throw an error if a string is passed to LazyFrame.pivot on_columns (#26852)
- Preserve input float precision in rolling_cov() and rolling_corr() with mixed input types (#26820)
- Preserve row count when converting zero-column DataFrame via arrow PyCapsule interface (#26835)
- Prevent infinite recursion in streaming group_by fallback (#26801)
- Use RowEncodingContext::Struct when determining D::Struct encoded item len (#26817)
- Incorrectly applied CSE on different map_batches functions (#26822)
- Fix duplicated query execution on todo panic when combining collect(engine='streaming') with POLARS_AUTO_NEW_STREAMING (#26792)
- Prevent predicate pushdown across Sort with baked-in slice (#26804)
- Fix panic on lazy sink_parquet created in pipe_with_schema (#26784)
- Support {column_name} and {index} placeholders in pl.format string (#26771)
- Do not use merge-join if nulls_last is unknown (#26778)
- Normalize float zeros in Parquet column statistics (#26776)
- Fix out-of-bounds for positive offset in windowed rolling (#26724)
- Raise error when .get() is out-of-bounds in group by context (#26752)
- Boolean bitwise_xor aggregation inverted when column contains nulls (#26749)
- Parameter nulls_last was ignored in over (#26718)
- Allow missing time in inexact strptime (#26714)
- Return NaN when using corr() with a literal and expr (#26697)
- Allow strict horizontal concat with empty df (#26345)
- Fix PoisonError panic caused by reentrant usage of file cache (#26627)
- Return null for int values exceeding 128-bit range with strict=False (#26674)
- Incorrect boolean min/max with nulls (#26671)
- Slice-slice pushdown for n_rows (#26673)
- Resolve panic in Enum struct slicing (#26643)
- Fix CSPE for group_by.map_groups (#26640)
- Remove non-existent parameter from SQLContext typing overloads (#26658)
- Replace panic with error when sorting object dtype columns (#26601)
- Fix to_pandas() on empty enum Series did not preserve enum dictionary (#26610)
- Rounding behaviour for f32 values with "HalfAwayFromZero" mode (#26624)
- Correct arg_(min|max) for scalar columns (#26609)
- Use monkeypatch.chdir in test_sink_path_slicing_utf8_boundaries_26324 (#26616)
- Materialize unknown scalar int/float literals in collect_dtype() (#26595)
- Return error when by= is nested type in min_by/max_by (#26593)
- Fix assert_frame_not_equal() did not raise on dtype mismatch (#26590)
- Respect SQL semantics for cumulative functions mapped via OVER clause (#26570)
- Fix incorrect multiplexer output ordering on source token stop request (#26561)
- Fix PyIceberg filter on boolean column (#26550)
- Fix *_range exprs incorrectly marked as row separable (#26549)
- Set dictionary_page_offset when dictionary encoding is used and point data_page_offset to the first data page (#26542)
- Prevent GPU engine panic on SinkMultiple nodes (#26537)
- Move query parameters to request body when retrieving Unity Catalog temporary credentials (#26539)
- Implement PhysicalExpr for MinBy/MaxBy nodes (#26506)
- Refactor row-encoding logic in IR join lowering into separate function (#26512)
- Correctly check for path extensions (#26513)
- Change AsOf join to be based on TotalOrd (#26497)
- Correctly raise error on failing nested strict casts (#26499)
- Prevent invalid type casts in replace_strict() (#26453)
- Return null when dividing literals by 0 (#26343)
📖 Documentation
- Bump to patched version (#27851)
- Replace Typeform sign-up URL with new enterprise link (#27838)
- Correct wrong head call (#27848)
- Add Polars On-Prem 0.5.0 release (#27849)
- Correct onprem license helm values (#27847)
- Update connecting Polars Cloud to AWS documentation (#27823)
- Broken link to AI Policy corrected (#27793)
- Add release dates to the On-Prem releases page (#27787)
- Improve on-prem docs (#27788)
- Add query profiler video to On-Prem user guide (#27786)
- Add EKS/AKS/GKE guides (#27774)
- Sync from Polars Cloud (#27751)
- Document Expr.list.__getitem__ (#27689)
- Add cloudpickle requirement (#27703)
- Clarify from_arrow schema ordering (#27493)
- Clarify schema column order (#27681)
- Update DataFrame construction docs for Column (#27541)
- Document all valid engine options on LazyFrame collect/sink/explain methods (#27374)
- Drop redundant Pattern 2 from Dagster integration page (#27581)
- Update to remove Dockerhub PAT references (#27582)
- Modernize Dagster integration example for Polars Cloud (#27560)
- Use Polars random seed in sample example (#27537)
- Make expressions operations RNG deterministic (#27494)
- Document struct field order (#27492)
- Add See Also sections for datetime docstrings (#27316)
- Polars On-Prem release (#27439)
- Rename to Polars On-Prem (#27435)
- Split out openlineage docs into guide and configuration (#27371)
- Add explanation on the observatory sqlite db file (#27354)
- Add documentation for openlineage on-premises (#27334)
- Release page (#27335)
- Update uv pip install polars-on-premises cmd (#27330)
- Fix outdated LazyGroupBy.map_groups docstring (#27292)
- Add deny_anonymous_users to scheduler config (#27287)
- Slurm documentation (#27259)
- Add link to concepts in index.md (#27077)
- Add docs entry for merge_sorted (#27224)
- Fix typo (#27212)
- Make the files used in docs available locally (#27121)
- Put first-time contribution requirements in its own linkable section (#27113)
- Change Polars Cloud API to 0.6.0 (#27005)
- Query Profiler addition to User Guide (#26623)
- Add documentation for on_columns for LazyFrame pivot (#26859)
- Mention ComputeContexts create ephemeral environments by default and hint at re-use (#26692)
- Remove confusing join validation note (#26795)
- Fix broken AI policy link (#26728)
- Create Polars Cloud Glossary (#26690)
- Additional SQL documentation (#26662)
- Include invalidate_caches in bisect instructions (#26641)
- Add git bisect guide to contributing docs (#26634)
- Updated Airflow orchestration documentation (#26585)
- Improve SQL docs for EXTRACT and DATE_PART functions (#26575)
- Remove reference to MutableStructArray in module doc (#26557)
- Fix docstring for bitwise_count_zeros method (#26519)
- Add get() for binary Series (#26514)
📦 Build system
- Also split debug info in debug-release (#27609)
- Use split-debuginfo on linux (#27608)
- Bump deltalake to 1.5.1 in CI (#27387)
- Really do not install pyiceberg-core 0.9.0 (#27017)
- Bump up numpy and pyo3 to 0.28 (#26743)
🛠️ Other improvements
- Add statistics to spill contexts (#27859)
- Include license file in polars-ooc crate (#27864)
- Changes needed for Rust 0.54.x (#27853)
- Use Vec instead of PlHashMap for ProjectionInfo.map (#27856)
- Reduce codegen-units (#27835)
- Deduplicate thrift field-walk loops (#27790)
- Harden against async blocking deadlocks (take 2) (#27767)
- Added jlumbroso/free-disk-space cleaning action where relevant (#27769)
- Update runtime edition to 2024 (#27746)
- Remove redundant DSL::AGG::Unique (#27718)
- Harden against async blocking deadlocks (#27653)
- Print Python traceback when POLARS_TIMEOUT_MS is exceeded (#27657)
- Remove last global static mut (#27704)
- Remove unused equal_element code (#27701)
- Remove unused suspect AsRef impl (#27699)
- Remove Box<dyn Iterator> IntoIterator for ChunkedArray (#27697)
- Remove trailing semicolons in fmt macros (#27705)
- Add dynamic slice to unoptimized dispatch (#27693)
- Format missed in previous PR (#27700)
- Bump pytest and remove codspeed (#27686)
- Store record batch row counts custom polars IPC metadata field (#27549)
- Remove client-side allow_local_scans option for prepare_cloud_plan (#27663)
- Remove superfluous test (#27676)
- Cleanup streaming flags (#27671)
- Expose unordered concatenation in python visitor (#27666)
- Bump deltalake and fix CI (#27660)
- Add impl IntoAExprBuilder for ExprIR (#27656)
- Update object_store patch repo (#27650)
- Bump up thiserror (#27648)
- Move async executor and primitives to polars-async (#27629)
- Add ImageVersion to rust-cache key (#27626)
- Rename POOL to RAYON (#27606)
- Use first_non_null for strptime infer (#27577)
- Add arg mapper to unoptimized dispatch (#27599)
- Fix is_empty test (#27597)
- Fix tz type difference pandas assert, take 2 (#27596)
- Fix CSPE panic in cloud (#27594)
- Fix tz type difference pandas assert (#27593)
- Add contributing note about conventional comments (#27543)
- Add AnonymousColumnsUdf to UnoptimizedOperation (#27513)
- Move Quantile to FunctionIRExpr (#27498)
- Nested common subplan elimination (#27340)
- Remove old projection pushdown code (#27499)
- Refactored projection pushdown with cache handling (#27422)
- Refactor CSPE (#27425)
- Deduplicate interns (#27470)
- Fix merge conflict in ColumnarFunction (#27464)
- Schema per port for PhysNode (#27302)
- Keep the schema ordered in scan projection pushdown (#27429)
- Remove redundant PhysNodeKind::AsOfJoin::{left_right}_by fields (#27400)
- Bump apache-avro version (#27419)
- Bump rustls-webpki (#27382)
- Disable debug symbols in macos coverage tests (#27361)
- Cargo deny (#27363)
- Add generic tree traversal with edge value propagation (#27249)
- Bump Python Polars version (#27315)
- Utility for identifying expr projection heights (#27198)
- Sink DSL and callback for Iceberg (#27258)
- Wait for morsel consumption in merge_sorted streaming node (#27288)
- Mark scan_ipc cache arguments as deprecated (#27216)
- Consolidate reordered compare functions (#27229)
- Add zip_eq to itertools (#27210)
- Remove unused attributes (#27191)
- Avoid unnecessary recompilation due to changing env vars (#27166)
- Update nightly Rust compiler version (#27145)
- Simplify pyarrow scan and process in batches (#26982)
- Make internal typing more precise (part ii) (#27117)
- Remove unused expression sorts (#27075)
- Add memory usage tracking to global allocator (#27103)
- Add sinked paths callback (#26995)
- Pin maturin due to compile time regression (#27062)
- Missing src/ subdirectory to CI Python docs step (#27025)
- Really do not install pyiceberg-core 0.9.0 (#27017)
- Naming for named scopes (#26999)
- Enable hypothesis tests when POLARS_AUTO_NEW_STREAMING=1 (#26818)
- Fix CI by excluding missing wheel version of pyiceberg (#27001)
- Replace clippy::never_loop with break on named scopes (#26983)
- Remove indirection in calling python scans (#26981)
- Polars versions (#26980)
- Polars version (#26971)
- Set stricter maintain_order in test_schema_row_index_cse (#26931)
- Bump build deps used in ARM64 Windows release pipeline (#26892)
- Use large linux-arm runner for release (#26898)
- Ensure .gitignore and .typos.toml exclude "_polars_runtime*" directories (#26842)
- Additional IR slice pushdown after filter pushdown (#26815)
- Add private _expand_paths scan function (#26798)
- Change Expr sortedness container to AExprSorted and add nulls_last to PyExpr.set_sorted() (#26781)
- Move stop_and_buffer_pipe_contents into joins/utils.rs (#26810)
- Replace iejoin is_supported_type macro with a closure in predicate_pushdown/join.rs (#26812)
- Fix first-time contributor auto-label (#26794)
- Move Series arrow export code from into.rs to arrow_export (#26775)
- Automatically add first-contribution label (#26780)
- Make contributing policy more strict (#26772)
- Add unused argument warning to ruff rules (#26720)
- Move shared streaming CSV/NDJSON code into shared mod (#26742)
- Undo pub removal of to_dyn_object_store (#26722)
- Remove unused proptest.rs DataFrame file (#26676)
- Add test for predicate before join (#26705)
- Fix file cache debug assertion failure (#26695)
- Put physical_plan join formatting code into a separate function (#26691)
- Remove PlanCallback from sql (#26686)
- Add dtype visitor (#26628)
- Bump Rust nightly compiler version (#26379)
- Remove unused problematic ArrayFromIter (#26639)
- Move more boolean code to polars_compute, reusing kernels (#26636)
- Move ? to assignment site and use extend() in StructEvalExpr (#26635)
- Cleanup assert_schema_equal (#26596)
- Replace some env var reading by polars-config (#26607)
- Use monkeypatch.chdir in test_sink_path_slicing_utf8_boundaries_26324 (#26616)
- Remove string allocation from polars_err!(Variant: "str") (#26579)
- Add wrapper for clippy so it continues on warnings (#26527)
- Add Buffer::split_at / Buffer::split_off (#26583)
- Use LazyFrame.clear to clear sql (#26562)
- Update docs (#26560)
- Add backtrace coloring (#26544)
- Evaluate sql process_except_intersect during IR (#26516)
- Reformat LICENSE (#26532)
- Add a pipeline in which we test with POLARS_IDEAL_MORSEL_SIZE=4 (#26420)
- Remove test_file and have tests create test.parquet in tmp_path (#26525)
- Refactor row-encoding logic in IR join lowering into separate function (#26512)
- Fix mypy pyiceberg expression errors (#26523)
- Make nix flake mostly work (#26517)
- Switch to custom cloud writer with IO sink metrics (#26494)
- Remove Default on DataType (#26511)
- Propagate object-store error information (#26406)
- Have parameterized series rechunk() if not allow_chunks (#26504)
- Remove dead code (RevMapping) (#26508)
- Rename Arena get_many_mut to get_disjoint_mut (#26491)
Thank you to all our contributors for making this release possible!
@0guban0v, @0xRozier, @BJohnBraddock, @BitWeaverDev, @ButteryPaws, @EndPositive, @HCYT, @JakubValtar, @Jesse-Bakker, @Kevin-Patyk, @Liyixin95, @MarcoGorelli, @Matt711, @NathanHu725, @NedJWestern, @NeejWeej, @NicoOhR, @RedZapdos123, @RenzoMXD, @Shoeboxam, @SuryaSunil1326, @TNieuwdorp, @Voultapher, @WaffleLapkin, @abhidotsh, @abishop1990, @alexander-beedie, @andyjessen, @ankane, @aryansri05, @ashler-herrick, @azimafroozeh, @borchero, @boris324, @cBournhonesque, @carnarez, @coastalwhite, @daizutabi, @debnathshoham, @dependabot[bot], @dpinol, @dsprenkels, @dydev012, @erandagan, @etiennebacher, @farouk-01, @florianvazelle, @gab23r, @gautamvarmadatla, @henryharbeck, @hutch3232, @ilya-pevzner, @itamarst, @jberg5, @joaquinhuigomez, @johalnes, @jonasdedden, @jonathanchang31, @jonathansergio, @jorenham, @junnythemarksman, @kanenorman, @kdn36, @leudz, @lukas-reining, @lun3x, @moktamd, @mqqz, @mroeschke, @mzjp2, @nameexhaustion, @nicholaslegrand102, @ohmdelta, @orlp, @pablogsal, @pragun-ananda, @qxzcode, @ritchie46, @spock-yh, @stakeswky, @tmimmanuel, @tolleybot, @toreerdmann, @toroleapinc, @uurl, @veeceey, @waamm, @wence-, @wmoss, @xenzh, @xronocode, @yangsong97, @yonatan-genai, @yuuuxt and dependabot[bot]