🚀 Performance improvements
- Speed up SQL interface "ORDER BY" clauses (#26037)
- Add fast kernel for is_nan and use it for numpy NaN->null conversion (#26034)
- Optimize ArrayFromIter implementations for ObjectArray (#25712)
- New streaming NDJSON sink pipeline (#25948)
- New streaming CSV sink pipeline (#25900)
- Dispatch partitioned usage of sink_* functions to new-streaming by default (#25910) (see the sketch after this list)
- Replace ryu with faster zmij (#25885)
- Reduce memory usage for .item() count in grouped first/last (#25787)
- Skip schema inference if schema provided for scan_csv/ndjson (#25757)
- Add width-aware chunking to prevent degradation with wide data (#25764)
- Use new sink pipeline for write/sink_ipc (#25746)
- Reduce memory usage when scanning multiple parquet files in streaming (#25747)
- Don't call cluster_with_columns optimization if not needed (#25724)
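
The partitioned-sink dispatch above (#25910) applies when a sink_* call is given a partitioning scheme instead of a single output path. A minimal sketch, assuming the existing pl.PartitionByKey scheme with sink_parquet and the mkdir flag; the path and column names are illustrative:

```python
import polars as pl

lf = pl.LazyFrame(
    {
        "region": ["eu", "eu", "us", "us"],
        "value": [1, 2, 3, 4],
    }
)

# Partitioned sink: one Parquet file per distinct `region` key under "out/".
# With this release, such partitioned sink_* calls run on the new streaming
# engine by default.
lf.sink_parquet(
    pl.PartitionByKey("out/", by="region"),
    mkdir=True,  # create the base directory (assumed available in this version)
)
```
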
✨ Enhancements
- Add new pl.PartitionBy API (#26004)
- ArrowStreamExportable and sink_delta (#25994)
- Release musl builds (#25894)
- Implement streaming decompression for CSV COUNT(*) fast path (#25988)
- Add nulls support for rolling_mean_by (#25917)
- Add lazy collect_all (#25991)
- Add streaming decompression for NDJSON schema inference (#25992)
- Improved handling of unqualified SQL JOIN columns that are ambiguous (#25761)
- Drop Python 3.9 support (#25984)
- Expose record batch size in {sink,write}_ipc (#25958)
- Add null_on_oob parameter to expr.get (#25957) (see the sketch after this list)
- Suggest correct timezone if timezone validation fails (#25937)
- Support streaming IPC scan from S3 object store (#25868)
- Implement streaming CSV schema inference (#25911)
- Support hashing of meta expressions (#25916)
- Improve SQLContext recognition of possible table objects in the Python globals (#25749)
- Add pl.Expr.(min|max)_by (#25905) (see the sketch after this list)
- Improve MemSlice Debug impl (#25913)
- Implement or fix json encode/decode for (U)Int128, Categorical, Enum, Decimal (#25896)
- Expand scatter to more dtypes (#25874)
- Implement streaming CSV decompression (#25842)
- Add Series sql method for API consistency (#25792)
- Mark Polars as safe for free-threading (#25677)
- Support Binary and Decimal in arg_(min|max) (#25839)
- Allow Decimal parsing in str.json_decode (#25797)
- Add shift support for Object data type (#25769)
- Add missing Series.arr.mean (#25774) (see the sketch after this list)
- Allow scientific notation when parsing Decimals (#25711)
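
A sketch of the new null_on_oob parameter to expr.get (#25957). The parameter name comes from the entry above; the behaviour shown (null instead of an error for an out-of-bounds index, mirroring list.get) is an assumption:

```python
import polars as pl

df = pl.DataFrame({"a": [1, 2, 3]})

# Index 10 is out of bounds for a 3-row column; with null_on_oob=True the
# result is assumed to be null rather than an error.
out = df.select(pl.col("a").get(10, null_on_oob=True))
```
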
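For the new pl.Expr.min_by / pl.Expr.max_by (#25905), a minimal sketch assuming the signature mirrors arg_min/arg_max and SQL's MIN_BY/MAX_BY, i.e. return this column's value at the row where the given column is minimal or maximal:

```python
import polars as pl

df = pl.DataFrame(
    {
        "store": ["a", "b", "c"],
        "sales": [10, 30, 20],
    }
)

# "store" value at the row with the highest / lowest "sales".
out = df.select(
    best=pl.col("store").max_by("sales"),
    worst=pl.col("store").min_by("sales"),
)
```
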
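Series.arr.mean (#25774) fills a gap in the Series-level arr namespace. A minimal sketch assuming parity with the existing Expr.arr.mean, i.e. the mean of each fixed-size array:

```python
import polars as pl

s = pl.Series("a", [[1.0, 2.0], [3.0, 5.0]], dtype=pl.Array(pl.Float64, 2))

# Mean per array: [1.5, 4.0]
means = s.arr.mean()
```
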
🐞 Bug fixes
- Release GIL on collect_batches (#26033)
- Missing buffer update in String is_in Parquet pushdown (#26019)
- Make struct.with_fields data model coherent (#25610)
- Incorrect output order for order-sensitive operations after join_asof (#25990)
- Use SeriesExport for pyo3-polars FFI (#26000)
- Add pl.Schema to type signature for DataFrame.cast (#25983)
- Don't write Parquet min/max statistics for i128 (#25986)
- Ensure chunk consistency in in-memory join (#25979)
- Fix varying block metadata length in IPC reader (#25975)
- Implement collect_batches properly in Rust (#25918)
- Fix panic on arithmetic with bools in list (#25898)
- Convert to index type with strict cast in some places (#25912)
- Empty dataframe in streaming non-strict hconcat (#25903)
- Infer large u64 in json as i128 (#25904)
- Set http client timeouts to 10 minutes (#25902)
- Correct lexicographic ordering for Parquet BYTE_ARRAY statistics (#25886)
- Raise error on duplicate group_by names in upsample() (#25811)
- Correctly export view buffer sizes nested in Extension types (#25853)
- Fix DataFrame.estimated_size not handling overlapping chunks correctly (#25775)
- Ensure Kahan sum does not introduce NaN from infinities (#25850)
- Trim excess bytes in parquet decode (#25829)
- Fix panic/deadlock sinking parquet with rows larger than 64MB estimated size (#25836)
- Fix quantile midpoint interpolation (#25824)
- Don't use cast when converting from physical in list.get (#25831)
- Invalid null count on int -> categorical cast (#25816)
- Update groups in list.eval (#25826)
- Use downcast before FFI conversion in PythonScan (#25815)
- Double-counting of row metrics (#25810)
- Cast nulls to expected type in streaming union node (#25802)
- Incorrect slice pushdown into map_groups (#25809)
- Fix panic writing parquet with single bool column (#25807)
- Fix upsample with group_by incorrectly introducing NULLs on group key columns (#25794) (see the sketch after this list)
- Panic in top_k pruning (#25798)
- Fix incorrect collect_schema for unpivot followed by join (#25782)
- Verify arr namespace is called from array column (#25650)
- Ensure LazyFrame.serialize() unchanged after collect_schema() (#25780)
- Function map_(rows|elements) with return_dtype = pl.Object (#25753)
- Fix incorrect cargo sub-feature (#25738)
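
For the upsample fix above (#25794), a minimal sketch of the affected call pattern: per-group upsampling where the group key column previously picked up spurious nulls on the inserted rows. Column names and data are illustrative:

```python
import polars as pl
from datetime import date

df = pl.DataFrame(
    {
        "id": ["a", "a", "b", "b"],
        "time": [date(2025, 1, 1), date(2025, 1, 3)] * 2,
        "value": [1, 2, 3, 4],
    }
).sort("id", "time")

# Upsample each `id` group to a daily grid; the "id" key on the newly
# inserted rows should carry the group value, not null.
out = df.upsample(time_column="time", every="1d", group_by="id")
```
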
📖 Documentation
- Fix display of deprecation warning (#26010)
- Document null behaviour for rank (#25887)
- Add QUALIFY clause and SUBSTRING function to the SQL docs (#25779)
- Update mixed-offset datetime parsing example in user guide (#25915)
- Update bare-metal docs for mounted anonymous results (#25801)
- Fix credential parameter name in cloud-storage.py (#25788)
- Configuration options update (#25756)
🛠️ Other improvements
- Update rust compiler (#26017)
- Improve csv test coverage (#25980)
- Ramp up CSV read size (#25997)
- Mark lazy parameter to collect_all as unstable (#25999)
- Update ruff action and simplify version handling (#25940)
- Run python lint target as part of pre-commit (#25982)
- Disable HTTP timeout for receiving response body (#25970)
- Fix mypy lint (#25963)
- Add AI contribution policy (#25956)
- Fix failing scan delta S3 test (#25932)
- Improve MemSlice Debug impl (#25913)
- Remove and deprecate batched csv reader (#25884)
- Remove unused AnonymousScan functions (#25872)
- Filter DeprecationWarning from pyparsing indirectly through pyiceberg (#25854)
- Various small improvements (#25835)
- Clear venv with appropriate version of Python (#25851)
- Skip schema inference if schema provided for scan_csv/ndjson (#25757)
- Ensure proper async connection cleanup on DB test exit (#25766)
- Ensure we uninstall other Polars runtimes in CI (#25739)
- Make 'make requirements' more robust (#25693)
- Remove duplicate compression level types (#25723)
Thank you to all our contributors for making this release possible!
@AndreaBozzo, @EndPositive, @Kevin-Patyk, @MarcoGorelli, @Voultapher, @alexander-beedie, @anosrepenilno, @arlyon, @azimafroozeh, @carnarez, @dependabot[bot], @dsprenkels, @edizeqiri, @eitanf, @gab23r, @henryharbeck, @hutch3232, @ion-elgreco, @jqnatividad, @kdn36, @lun3x, @m1guelperez, @mcrumiller, @nameexhaustion, @orlp, @ritchie46, @sachinn854 and @yonikremer