⚠️ Deprecations
- Deprecate retries=n in favor of storage_options={"max_retries": n} (#26155)
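For callers that build keyword arguments programmatically, the migration can be sketched roughly as below. This is a minimal illustration and not part of Polars: the helper name migrate_retries_kwarg is hypothetical, and the only facts taken from the entry above are the old retries=n spelling and the new storage_options={"max_retries": n} spelling. Which functions accept these arguments is not stated here, so check the signature of the function you call.

```python
from typing import Any


def migrate_retries_kwarg(kwargs: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of kwargs with retries=n moved into storage_options.

    Hypothetical helper for illustration only; it is not a Polars API.
    """
    migrated = dict(kwargs)  # never mutate the caller's dict
    if "retries" not in migrated:
        return migrated

    n = migrated.pop("retries")
    # Copy any existing storage_options so the caller's dict stays untouched.
    storage_options = dict(migrated.get("storage_options") or {})
    # Assumption: an explicit max_retries already present takes precedence.
    storage_options.setdefault("max_retries", n)
    migrated["storage_options"] = storage_options
    return migrated


old_style = {"retries": 3}
new_style = migrate_retries_kwarg(old_style)
print(new_style)
```

The resulting dict can then be unpacked into the call in place of the old keyword arguments.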
🚀 Performance improvements
- Enable zero-copy object_store put upload for IPC sink (#26288)
- Resolve file schemas and metadata concurrently (#26325)
- Run elementwise CSEE for the streaming engine (#26278)
- Disable morsel splitting for fast-count on streaming engine (#26245)
- Implement streaming decompression for scan_ndjson and scan_lines (#26200)
- Improve string slicing performance (#26206)
- Refactor scan_delta to use python dataset interface (#26190)
- Add dedicated kernel for group-by arg_max/arg_min (#26093)
- Add streaming merge-join (#25964)
- Generalize Bitmap::new_zeroed opt for Buffer::zeroed (#26142)
- Reduce fs stat calls in path expansion (#26173)
- Lower streaming group_by n_unique to unique().len() (#26109)
✨ Enhancements
- Avoid OOM for scan_ndjson and scan_lines if input is compressed and negative slice (#26396)
- Support anonymous agg in-mem (#26376)
- Add unstable arrow_schema parameter to sink_parquet (#26323)
- Improve error message formatting for structs (#26349)
- Remove parquet field overwrites (#26236)
- Enable zero-copy object_store put upload for IPC sink (#26288)
- Improved disambiguation for qualified wildcard columns in SQL projections (#26301)
- Expose upload_concurrency through env var (#26263)
- Allow quantile to compute multiple quantiles at once (#25516)
- Allow empty LazyFrame in LazyFrame.group_by(...).map_groups (#26275)
- Use delta file statistics for batch predicate pushdown (#26242)
- Add streaming UnorderedUnion (#26240)
- Implement compression support for sink_ndjson (#26212)
- Add unstable record batch statistics flags to {sink/scan}_ipc (#26254)
- Support CSE for python UDFs on the same address (#26253)
- Cloud retry/backoff configuration via storage_options (#26204)
- Use same sort order for expanded paths across local / cloud / directory / glob (#26191)
- Add streaming merge-join (#25964)
- Serialize optimization flags for cloud plan (#26168)
- Add compression support to write_csv and sink_csv (#26111)
- Add scan_lines (#26112)
- Support regex in str.split (#26060)
- Add unstable IPC Statistics read/write to scan_ipc/sink_ipc (#26079)
- Add unstable height parameter to DataFrame/LazyFrame (#26014)
- Remove old partition sink API (#26100)
- Expose ArrowStreamExportable on python collect batches iterator (#26074)
- Add nulls support for all rolling_by operations (#26081)
🐞 Bug fixes
- Correct off-by-one in RLE row counting for nullable dictionary-encoded columns (#26411)
- Support very large integers in env var limits (#26399)
- Fix PlPath panic from incorrect slicing of UTF8 boundaries (#26389)
- Fix Float dtype for spearman correlation (#26392)
- Fix optimizer panic in right joins with type coercion (#26365)
- Don't serialize retry config from local environment vars (#26289)
- Fix PartitionBy with scalar key expressions and diff() (#26370)
- Add {Float16, Float32} -> Float32 lossless upcast (#26373)
- Fix panic using with_columns and collect_all (#26366)
- Add multi-page support for writing dictionary-encoded Parquet columns (#26360)
- Ensure slice advancement when skipping non-inlinable values in is_in with inlinable needles (#26361)
- Pin xlsx2csv version temporarily (#26352)
- Bugs in ViewArray total_bytes_len (#26328)
- Overflow in i128::abs in Decimal fits check (#26341)
- Make Expr.hash on Categorical mapping-independent (#26340)
- Clone shared GroupBy node before mutation in physical plan creation (#26327)
- Fixed "sheet_name" typing for read_ods and read_excel (#26317)
- Improve Polars dtype inference from Python Union typing (#26303)
- Consider the "current location" of an item when computing rolling_rank_by (#26287)
- Reset is_count_star flag between queries in collect_all (#26256)
- Fix incorrect is_between filter on scan_parquet (#26284)
- Make polars compatible with ty (#26270)
- Lower AnonymousStreamingAgg in group-by as aggregate (#26258)
- Avoid overflow in pl.duration scalar arguments case (#26213)
- Broadcast arr.get on single array with multiple indices (#26219)
- Fix panic on CSPE with sorts (#26231)
- Eager DataFrame.slice with negative offset and length=None (#26215)
- Use correct schema side for streaming merge join lowering (#26218)
- Overflow panic in scan_csv with multiple files and skip_rows + n_rows larger than total row count (#26128)
- Respect allow_object flag after cache (#26196)
- Raise error on non-elementwise PartitionBy keys (#26194)
- Allow ordered categorical dictionary in scan_parquet (#26180)
- Allow excess bytes on IPC bitmap compressed length (#26176)
- Address a macOS-specific compile issue (#26172)
- Fix deadlock on hash_rows() of 0-width DataFrame (#26154)
- Fix NameError filtering pyarrow dataset (#26166)
- Fix concat_arr panic when using categoricals/enums (#26146)
- Fix NDJSON/scan_lines negative slice splitting with extremely long lines (#26132)
- Incorrect group_by min/max fast path (#26139)
- Remove a source of non-determinism from lowering (#26137)
- Error when with_row_index or unpivot create duplicate columns on a LazyFrame (#26107)
- Panics on shift with head (#26099)
📖 Documentation
- Fix Expr.get referencing incorrect dtype for index parameter (#26364)
- Fix Expr.quantile formatting (#26351)
- Drop sphinx-llms-txt extension (#26285)
- Remove deprecated cublet_id (#26260)
- Update for new release (#26255)
- Update MCP server section with new URL (#26241)
- Fix unmatched paren and punctuation in pandas migration guide (#26251)
- Add observatory database_path to docs (#26201)
- Note plugins in Python user-defined functions (#26138)
📦 Build system
- Address remaining Python 3.14 issues with make requirements-all (#26195)
- Address a macOS-specific compile issue (#26172)
🛠️ Other improvements
- Ensure local doctests skip from_torch if module not installed (#26405)
- Change linked timezones in test suite to canonical timezones (#26310)
- Implement various deprecations (#26314)
- Rename Operator::Divide to RustDivide (#26339)
- Properly disable the Pyodide tests (#26382)
- Remove unused field (#26367)
- Fix runtime nesting (#26359)
- Remove xlsx2csv dependency pin (#26355)
- Use outer runtime if exists in to_alp (#26353)
- Make CategoricalMapping::new pub(crate) to avoid misuse (#26308)
- Clarify IPC buffer read limit/length parameter (#26334)
- Add dtype test coverage for delta predicate filter (#26291)
- Add AI policy (#26286)
- Unpin "pandas<3" in dev dependencies (#26249)
- Remove all non CSV fast-count paths (#26233)
- Pin pandas to 2.x for now (#26221)
- Remove unnecessary xfail (#26199)
- Ensure optimization flag modification happens local (#26185)
- Simplify IcebergDataset (#26165)
- Reorganize unit tests into logical subdirectories (#26149)
- Lint leftover fixme (#26122)
- Improve backtrace for POLARS_PANIC_ON_ERR (#26125)
- Fix Python docs build (#26117)
- Disable unused-ignore mypy lint (#26110)
- Ignore mypy warning (#26105)
- Raise error on file://hostname/path (#26061)
- Disable debug info for docs workflow (#26086)
- Update docs for next polars cloud release (#26091)
- Support Python 3.14 in dev environment (#26073)
Thank you to all our contributors for making this release possible!
@Atarust, @EndPositive, @Kevin-Patyk, @LeeviLindgren, @MarcoGorelli, @Matt711, @MrAttoAttoAtto, @Voultapher, @WaffleLapkin, @agossard, @alex-gregory-ds, @alexander-beedie, @azimafroozeh, @bayoumi17m, @c-peters, @carnarez, @dependabot[bot], @dsprenkels, @hallmason17, @hamdanal, @ion-elgreco, @kdn36, @lun3x, @mcrumiller, @nameexhaustion, @orlp, @qxzcode, @r-brink, @ritchie46 and @sweb