github pola-rs/polars py-1.20.0
Python Polars 1.20.0

one day ago

⚠️ Deprecations

  • Make parameter of str.to_decimal keyword-only (#20570)

🚀 Performance improvements

  • Extend functionality on BitmapBuilder and use in Growables (#20754)
  • Specialize first/last agg for simple types in new-streaming engine (#20728)
  • Use PyO3 to convert between Python and Rust datetimes (#20660)
  • Improve state caching and parallelism of window functions (#20689)
  • Broadcast without materialization in concat_arr (#20681)
  • Cache rolling groups (#20675)
  • Use downcast_ref instead of dtype equality in <dyn SeriesTrait as AsRef<ChunkedArray<T>>> (#20664)
  • Fix performance regression for DataFrame serialization/pickling (#20641)
  • Make Parquet verify_dict_indices SIMD (#20623)
  • Move to zlib-rs by default and use zstd::with_buffer (#20614)
  • Skip filter expansion in eager (#20586)
  • Improve unique predicate pushdown (#20569)

✨ Enhancements

  • Allow different python versions for pickle (#20740)
  • Add SQL support for the NORMALIZE string function (#20705)
  • Add 'allow_exact_matches' parameter to join_asof (#20723)
  • Add new-streaming first/last aggregations (#20716)
  • Add Parquet Sink to new streaming engine (#20690)
  • Make automatic use of Azure storage account keys opt-in (#20652)
  • Reduce scan_csv() (and friends') memory usage when using BytesIO (#20649)
  • Improve GroupsProxy/GroupsPosition to be sliceable and cheaply cloneable (#20673)
  • Add str.normalize() (#20483)
  • Allow more group_by agg expressions in the new streaming engine (#20663)
  • Support loading Excel Table objects by name (#20654)
  • Support writing to file objects from write_excel (#20638)
  • Raise DuplicateError if given a pyarrow Table object with duplicate column names (#20624)
  • Support writing partitioned parquet to cloud (#20590)
  • Add hint to error message for extra struct field in JSON (#20612)
  • Add index_of() function to Series and Expr (#19894)
  • Update sqlparser-rs, enabling "LEFT" keyword to be optional for anti/semi joins in SQL queries (#20576)
  • Add cat.starts_with/cat.ends_with (#20257)

🐞 Bug fixes

  • Avoid blocking on async runtime when resolving cloud scans (#20750)
  • Fix allow_invalid_certificates being ignored in storage_options (#20744)
  • Fix incorrect output type for map_groups returning all-NULL column (#20743)
  • Fix unique(maintain_order=True) raising InvalidOperationError for null array (#20737)
  • Don't collapse into a Nested Loop Join if the cross join maintains order (#20729)
  • Don't serialize credentials provider (#20741)
  • Fix Series.n_unique raising for list of struct (#20724)
  • Fix incorrect top-k by sorted column, fix head() returning extra rows (#20722)
  • Add outer validity to AnyValueBufferTrusted for structs (#20713)
  • Don't partition group-by with non-scalar literals in agg (#20704)
  • Fix xor operation of selector with Expr (#20702)
  • Incorrect view buffer dedup (#20691)
  • Only verify Parquet ConvertedType if no LogicalType is given (#20682)
  • Validate length of schema_overrides in read_csv (#20672)
  • Fix map_elements ignoring skip_nulls=True for struct dtype (#20668)
  • Check for MAP-GROUPS in cloud-eligible (#20662)
  • Fix empty output of to_arrow() on filtered unit height DataFrame (#20656)
  • Add .default to azure credential provider scope URL (#20651)
  • Fix join_asof panicking for invalid tolerance input (#20643)
  • Incorrect flag check on is_elementwise (#20646)
  • Don't panic but set null type if type is unknown (#20647)
  • Fix performance regression for DataFrame serialization/pickling (#20641)
  • Fix Int128 dtype serialization (#20629)
  • Ensure read_excel and read_ods support reading from raw bytes for all engines (#20636)
  • Ensure that SQL LIKE and ILIKE operators support multi-line matches (#20613)
  • Properly broadcast in sort_by (#20434)
  • Properly load nested Parquet Statistics (#20610)
  • Fix AWS environment config not being loaded when a credential provider is used (#20611)
  • Fix order observability of group-by-dyn (#20615)
  • Soundness when loading Parquet string statistics (#20585)
  • Fix error filtering after with_columns() on unit height LazyFrame (#20584)
  • Propagate tenant_id to CredentialProviderAzure if given (#20583)
  • Restore symbols on Apple by bumping nightly version (#20563)
  • Fix type annotation of str.strip_chars_* methods (#20565)
  • Fix variable name in error message for "unsupported data type" in rolling and upsampling operations (#20553)

📖 Documentation

  • Add more information for cross joins (#20753)
  • Fix typo in sql functions (cosinus -> cosine) (#20676)
  • Add links to read_excel "engine_options" and "read_options" docstring (#20661)
  • Fix small typo in plugins (polars-dt -> polars-st) (#20657)
  • Add polars-h3 and polars-st to plugin list (#20653)
  • Add docs reference for Field (#20625)
  • Update DataFrame join examples (#20587)
  • Miscellaneous minor updates/fixes (#20573)
  • Update "group_by_rolling" (deprecated) to "rolling" in user guide (#20548)

📦 Build system

  • Update to official release of PyO3 0.23.4 (#20683)
  • Officially support Python 3.13 (#20549)

🛠️ Other improvements

  • Fix remote benchmark script (#20755)
  • Fix tests (#20745)
  • Simplify hive predicate handling in NEW_MULTIFILE (#20730)
  • Add tests for various open issues (#20720)
  • Fix an Excel test following new fastexcel release (#20703)
  • Add tests for various open issues that have been fixed (#20680)
  • Don't include debug symbols in benchmark run (#20571)
  • Implement CSV, IPC and NDJson in the MultiScanExec node (#20648)
  • Don't rely on argument order of optimization_toggle (#20622)
  • Fix Python deps installation in remote-benchmark workflow (#20619)
  • Fix flaky categorical test (#20591)
  • Bump multiversion from 0.7 to 0.8 (#20543)
  • Remove unused nested function in LazyFrame.fill_null (#20558)
  • Improve bin size info (#20551)

Thank you to all our contributors for making this release possible!
@Jesse-Bakker, @MarcoGorelli, @MoizesCBF, @SamuelAllain, @alexander-beedie, @bschoenmaeckers, @coastalwhite, @eitsupi, @etiennebacher, @itamarst, @jqnatividad, @lukemanley, @mcrumiller, @nameexhaustion, @orlp, @ritchie46 and @stinodego
