GitHub: pola-rs/polars, tag py-1.37.0

Python Polars 1.37.0

🚀 Performance improvements

  • Speed up SQL interface "ORDER BY" clauses (#26037)
  • Add fast kernel for is_nan and use it for numpy NaN->null conversion (#26034)
  • Optimize ArrayFromIter implementations for ObjectArray (#25712)
  • New streaming NDJSON sink pipeline (#25948)
  • New streaming CSV sink pipeline (#25900)
  • Dispatch partitioned usage of sink_* functions to new-streaming by default (#25910)
  • Replace ryu with faster zmij (#25885)
  • Reduce memory usage for .item() count in grouped first/last (#25787)
  • Skip schema inference if schema provided for scan_csv/ndjson (#25757)
  • Add width-aware chunking to prevent degradation with wide data (#25764)
  • Use new sink pipeline for write/sink_ipc (#25746)
  • Reduce memory usage when scanning multiple parquet files in streaming (#25747)
  • Don't call cluster_with_columns optimization if not needed (#25724)

✨ Enhancements

  • Add new pl.PartitionBy API (#26004)
  • ArrowStreamExportable and sink_delta (#25994)
  • Release musl builds (#25894)
  • Implement streaming decompression for CSV COUNT(*) fast path (#25988)
  • Add nulls support for rolling_mean_by (#25917)
  • Add lazy collect_all (#25991)
  • Add streaming decompression for NDJSON schema inference (#25992)
  • Improved handling of unqualified SQL JOIN columns that are ambiguous (#25761)
  • Drop Python 3.9 support (#25984)
  • Expose record batch size in {sink,write}_ipc (#25958)
  • Add null_on_oob parameter to expr.get (#25957)
  • Suggest correct timezone if timezone validation fails (#25937)
  • Support streaming IPC scan from S3 object store (#25868)
  • Implement streaming CSV schema inference (#25911)
  • Support hashing of meta expressions (#25916)
  • Improve SQLContext recognition of possible table objects in the Python globals (#25749)
  • Add pl.Expr.(min|max)_by (#25905)
  • Improve MemSlice Debug impl (#25913)
  • Implement or fix json encode/decode for (U)Int128, Categorical, Enum, Decimal (#25896)
  • Expand scatter to more dtypes (#25874)
  • Implement streaming CSV decompression (#25842)
  • Add Series sql method for API consistency (#25792)
  • Mark Polars as safe for free-threading (#25677)
  • Support Binary and Decimal in arg_(min|max) (#25839)
  • Allow Decimal parsing in str.json_decode (#25797)
  • Add shift support for Object data type (#25769)
  • Add missing Series.arr.mean (#25774)
  • Allow scientific notation when parsing Decimals (#25711)

🐞 Bug fixes

  • Release GIL on collect_batches (#26033)
  • Missing buffer update in String is_in Parquet pushdown (#26019)
  • Make struct.with_fields data model coherent (#25610)
  • Incorrect output order for order-sensitive operations after join_asof (#25990)
  • Use SeriesExport for pyo3-polars FFI (#26000)
  • Add pl.Schema to type signature for DataFrame.cast (#25983)
  • Don't write Parquet min/max statistics for i128 (#25986)
  • Ensure chunk consistency in in-memory join (#25979)
  • Fix varying block metadata length in IPC reader (#25975)
  • Implement collect_batches properly in Rust (#25918)
  • Fix panic on arithmetic with bools in list (#25898)
  • Convert to index type with strict cast in some places (#25912)
  • Empty dataframe in streaming non-strict hconcat (#25903)
  • Infer large u64 in json as i128 (#25904)
  • Set http client timeouts to 10 minutes (#25902)
  • Correct lexicographic ordering for Parquet BYTE_ARRAY statistics (#25886)
  • Raise error on duplicate group_by names in upsample() (#25811)
  • Correctly export view buffer sizes nested in Extension types (#25853)
  • Fix DataFrame.estimated_size not handling overlapping chunks correctly (#25775)
  • Ensure Kahan sum does not introduce NaN from infinities (#25850)
  • Trim excess bytes in parquet decode (#25829)
  • Fix panic/deadlock sinking parquet with rows larger than 64MB estimated size (#25836)
  • Fix quantile midpoint interpolation (#25824)
  • Don't use cast when converting from physical in list.get (#25831)
  • Invalid null count on int -> categorical cast (#25816)
  • Update groups in list.eval (#25826)
  • Use downcast before FFI conversion in PythonScan (#25815)
  • Double-counting of row metrics (#25810)
  • Cast nulls to expected type in streaming union node (#25802)
  • Incorrect slice pushdown into map_groups (#25809)
  • Fix panic writing parquet with single bool column (#25807)
  • Fix upsample with group_by incorrectly introducing NULLs on group key columns (#25794)
  • Panic in top_k pruning (#25798)
  • Fix incorrect collect_schema for unpivot followed by join (#25782)
  • Verify arr namespace is called from array column (#25650)
  • Ensure LazyFrame.serialize() unchanged after collect_schema() (#25780)
  • Function map_(rows|elements) with return_dtype = pl.Object (#25753)
  • Fix incorrect cargo sub-feature (#25738)

📖 Documentation

  • Fix display of deprecation warning (#26010)
  • Document null behaviour for rank (#25887)
  • Add QUALIFY clause and SUBSTRING function to the SQL docs (#25779)
  • Update mixed-offset datetime parsing example in user guide (#25915)
  • Update bare-metal docs for mounted anonymous results (#25801)
  • Fix credential parameter name in cloud-storage.py (#25788)
  • Configuration options update (#25756)

🛠️ Other improvements

  • Update rust compiler (#26017)
  • Improve csv test coverage (#25980)
  • Ramp up CSV read size (#25997)
  • Mark lazy parameter to collect_all as unstable (#25999)
  • Update ruff action and simplify version handling (#25940)
  • Run python lint target as part of pre-commit (#25982)
  • Disable HTTP timeout for receiving response body (#25970)
  • Fix mypy lint (#25963)
  • Add AI contribution policy (#25956)
  • Fix failing scan delta S3 test (#25932)
  • Remove and deprecate batched csv reader (#25884)
  • Remove unused AnonymousScan functions (#25872)
  • Filter DeprecationWarning from pyparsing indirectly through pyiceberg (#25854)
  • Various small improvements (#25835)
  • Clear venv with appropriate version of Python (#25851)
  • Ensure proper async connection cleanup on DB test exit (#25766)
  • Ensure we uninstall other Polars runtimes in CI (#25739)
  • Make 'make requirements' more robust (#25693)
  • Remove duplicate compression level types (#25723)

Thank you to all our contributors for making this release possible!
@AndreaBozzo, @EndPositive, @Kevin-Patyk, @MarcoGorelli, @Voultapher, @alexander-beedie, @anosrepenilno, @arlyon, @azimafroozeh, @carnarez, @dependabot[bot], @dsprenkels, @edizeqiri, @eitanf, @gab23r, @henryharbeck, @hutch3232, @ion-elgreco, @jqnatividad, @kdn36, @lun3x, @m1guelperez, @mcrumiller, @nameexhaustion, @orlp, @ritchie46, @sachinn854 and @yonikremer
