GitHub: pola-rs/polars, tag py-1.37.0

Python Polars 1.37.0

🚀 Performance improvements

  • Speed up SQL interface "ORDER BY" clauses (#26037)
  • Add fast kernel for is_nan and use it for numpy NaN->null conversion (#26034)
  • Optimize ArrayFromIter implementations for ObjectArray (#25712)
  • New streaming NDJSON sink pipeline (#25948)
  • New streaming CSV sink pipeline (#25900)
  • Dispatch partitioned usage of sink_* functions to new-streaming by default (#25910)
  • Replace ryu with faster zmij (#25885)
  • Reduce memory usage for .item() count in grouped first/last (#25787)
  • Skip schema inference if schema provided for scan_csv/ndjson (#25757)
  • Add width-aware chunking to prevent degradation with wide data (#25764)
  • Use new sink pipeline for write/sink_ipc (#25746)
  • Reduce memory usage when scanning multiple parquet files in streaming (#25747)
  • Don't call cluster_with_columns optimization if not needed (#25724)

✨ Enhancements

  • Add new pl.PartitionBy API (#26004)
  • ArrowStreamExportable and sink_delta (#25994)
  • Release musl builds (#25894)
  • Implement streaming decompression for CSV COUNT(*) fast path (#25988)
  • Add nulls support for rolling_mean_by (#25917)
  • Add lazy collect_all (#25991)
  • Add streaming decompression for NDJSON schema inference (#25992)
  • Improved handling of unqualified SQL JOIN columns that are ambiguous (#25761)
  • Drop Python 3.9 support (#25984)
  • Expose record batch size in {sink,write}_ipc (#25958)
  • Add null_on_oob parameter to expr.get (#25957)
  • Suggest correct timezone if timezone validation fails (#25937)
  • Support streaming IPC scan from S3 object store (#25868)
  • Implement streaming CSV schema inference (#25911)
  • Support hashing of meta expressions (#25916)
  • Improve SQLContext recognition of possible table objects in the Python globals (#25749)
  • Add pl.Expr.(min|max)_by (#25905)
  • Improve MemSlice Debug impl (#25913)
  • Implement or fix json encode/decode for (U)Int128, Categorical, Enum, Decimal (#25896)
  • Expand scatter to more dtypes (#25874)
  • Implement streaming CSV decompression (#25842)
  • Add Series sql method for API consistency (#25792)
  • Mark Polars as safe for free-threading (#25677)
  • Support Binary and Decimal in arg_(min|max) (#25839)
  • Allow Decimal parsing in str.json_decode (#25797)
  • Add shift support for Object data type (#25769)
  • Add missing Series.arr.mean (#25774)
  • Allow scientific notation when parsing Decimals (#25711)

🐞 Bug fixes

  • Release GIL on collect_batches (#26033)
  • Missing buffer update in String is_in Parquet pushdown (#26019)
  • Make struct.with_fields data model coherent (#25610)
  • Incorrect output order for order-sensitive operations after join_asof (#25990)
  • Use SeriesExport for pyo3-polars FFI (#26000)
  • Add pl.Schema to type signature for DataFrame.cast (#25983)
  • Don't write Parquet min/max statistics for i128 (#25986)
  • Ensure chunk consistency in in-memory join (#25979)
  • Fix varying block metadata length in IPC reader (#25975)
  • Implement collect_batches properly in Rust (#25918)
  • Fix panic on arithmetic with bools in list (#25898)
  • Convert to index type with strict cast in some places (#25912)
  • Empty dataframe in streaming non-strict hconcat (#25903)
  • Infer large u64 in json as i128 (#25904)
  • Set http client timeouts to 10 minutes (#25902)
  • Correct lexicographic ordering for Parquet BYTE_ARRAY statistics (#25886)
  • Raise error on duplicate group_by names in upsample() (#25811)
  • Correctly export view buffer sizes nested in Extension types (#25853)
  • Fix DataFrame.estimated_size not handling overlapping chunks correctly (#25775)
  • Ensure Kahan sum does not introduce NaN from infinities (#25850)
  • Trim excess bytes in parquet decode (#25829)
  • Fix panic/deadlock sinking parquet with rows larger than 64MB estimated size (#25836)
  • Fix quantile midpoint interpolation (#25824)
  • Don't use cast when converting from physical in list.get (#25831)
  • Invalid null count on int -> categorical cast (#25816)
  • Update groups in list.eval (#25826)
  • Use downcast before FFI conversion in PythonScan (#25815)
  • Double-counting of row metrics (#25810)
  • Cast nulls to expected type in streaming union node (#25802)
  • Incorrect slice pushdown into map_groups (#25809)
  • Fix panic writing parquet with single bool column (#25807)
  • Fix upsample with group_by incorrectly introducing NULLs on group key columns (#25794)
  • Panic in top_k pruning (#25798)
  • Fix incorrect collect_schema for unpivot followed by join (#25782)
  • Verify arr namespace is called from array column (#25650)
  • Ensure LazyFrame.serialize() unchanged after collect_schema() (#25780)
  • Function map_(rows|elements) with return_dtype = pl.Object (#25753)
  • Fix incorrect cargo sub-feature (#25738)

📖 Documentation

  • Fix display of deprecation warning (#26010)
  • Document null behaviour for rank (#25887)
  • Add QUALIFY clause and SUBSTRING function to the SQL docs (#25779)
  • Update mixed-offset datetime parsing example in user guide (#25915)
  • Update bare-metal docs for mounted anonymous results (#25801)
  • Fix credential parameter name in cloud-storage.py (#25788)
  • Configuration options update (#25756)

🛠️ Other improvements

  • Update rust compiler (#26017)
  • Improve csv test coverage (#25980)
  • Ramp up CSV read size (#25997)
  • Mark lazy parameter to collect_all as unstable (#25999)
  • Update ruff action and simplify version handling (#25940)
  • Run python lint target as part of pre-commit (#25982)
  • Disable HTTP timeout for receiving response body (#25970)
  • Fix mypy lint (#25963)
  • Add AI contribution policy (#25956)
  • Fix failing scan delta S3 test (#25932)
  • Remove and deprecate batched csv reader (#25884)
  • Remove unused AnonymousScan functions (#25872)
  • Filter DeprecationWarning from pyparsing indirectly through pyiceberg (#25854)
  • Various small improvements (#25835)
  • Clear venv with appropriate version of Python (#25851)
  • Ensure proper async connection cleanup on DB test exit (#25766)
  • Ensure we uninstall other Polars runtimes in CI (#25739)
  • Make 'make requirements' more robust (#25693)
  • Remove duplicate compression level types (#25723)

Thank you to all our contributors for making this release possible!
@AndreaBozzo, @EndPositive, @Kevin-Patyk, @MarcoGorelli, @Voultapher, @alexander-beedie, @anosrepenilno, @arlyon, @azimafroozeh, @carnarez, @dependabot[bot], @dsprenkels, @edizeqiri, @eitanf, @gab23r, @henryharbeck, @hutch3232, @ion-elgreco, @jqnatividad, @kdn36, @lun3x, @m1guelperez, @mcrumiller, @nameexhaustion, @orlp, @ritchie46, @sachinn854 and @yonikremer
