github pola-rs/polars py-1.10.0
Python Polars 1.10.0

🚀 Performance improvements

  • Add/fix unordered row decode, change unordered format (#19284)
  • Fast decision for Parquet dictionary encoding (#19256)
  • Make date_range / datetime_range ~10x faster for constant durations (#19216)
  • Batch utf8-validation in csv 18% / 25% on 1.9.0 (#19124)
  • Use two-pass algorithm for csv to ensure correctness and SIMDize more ~17% (#19088)

✨ Enhancements

  • Add SQL support for bit_count and bitwise &, |, and xor operators (#19114)
  • Add credential provider utility classes for AWS, GCP (#19297)
  • Support decoding Float16 in Parquet (#19278)
  • Experimental credential_provider argument for scan_parquet (#19271)
  • Allow DeltaTable input to scan_delta and read_delta (#19229)
  • New quantile interpolation method & QUANTILE_DISC function in SQL (#19139)
  • Conserve Parquet SortingColumns for ints (#19251)
  • Low level flight interface (#19239)
  • Improved list arithmetic support (#19162)
  • Add Expr.struct.unnest() as alias for Expr.struct.field("*") (#19212)
  • Add 'drop_empty_rows' parameter for read_ods (#19202)
  • Add 'drop_empty_rows' parameter for read_excel (#18253)
  • Expose LTS CPU in show_versions() (#19193)
  • Check Python version when deserializing UDFs (#19175)
  • Raise an error when users try to use Polars API in a fork()-without-execve() child (#19149)
  • Quantile function in SQL (#18047)
  • Improve scalar strict message (#19117)
  • Add Series::{first, last, approx_n_unique} (#19093)
  • Allow for rolling_*_by to use index count as window (#19071)
  • Delay deserialization of python function until physical plan (#19069)
  • Add cum(_min/_max) for pl.Boolean (#19061)
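
The SQL quantile entries above (#19139, #18047) distinguish a discrete quantile from a continuous one. As a rough illustration of that difference, here is a minimal pure-Python sketch. It assumes the usual SQL semantics (the discrete form returns an actual input value, the continuous form interpolates linearly) and is not taken from the Polars implementation. The helper names quantile_disc and quantile_cont are hypothetical.

```python
import math


def quantile_disc(values, q):
    """Discrete quantile: always returns one of the input values."""
    if not values or not 0.0 <= q <= 1.0:
        raise ValueError("need non-empty values and 0 <= q <= 1")
    ordered = sorted(values)
    # Smallest index whose cumulative share (i + 1) / n reaches q.
    idx = max(math.ceil(q * len(ordered)) - 1, 0)
    return ordered[idx]


def quantile_cont(values, q):
    """Continuous quantile: linear interpolation between neighbours."""
    if not values or not 0.0 <= q <= 1.0:
        raise ValueError("need non-empty values and 0 <= q <= 1")
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)  # fractional position in the sorted data
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


data = [1, 2, 3, 4]
median_disc = quantile_disc(data, 0.5)  # 2, an element of the data
median_cont = quantile_cont(data, 0.5)  # 2.5, between 2 and 3
```

For an even-length input the two medians differ, which is the case where the choice between QUANTILE_DISC and QUANTILE_CONT is most visible.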

🐞 Bug fixes

  • Don't produce duplicate column names in Series.to_dummies (#19326)
  • Use of HAVING outside of GROUP BY should raise a suitable SQLSyntaxError (#19320)
  • More accurate from_dicts typing/signature (#19322)
  • Fix empty array gather (#19316)
  • Merge categorical rev-map in unpivot (#19313)
  • DataFrame descending sorting by single list element (#19233)
  • Fix cse union schema (#19305)
  • Correctly load Parquet statistics for f16 (#19296)
  • Error on invalid query (#19303)
  • Fix enum scalar output (#19301)
  • Fix list gather invalid fast path (#19299)
  • Fix quoting style of decimal csv output (#19298)
  • Don't vertically parallelize literal select (#19295)
  • Fix struct reshape fast path (#19294)
  • Also split on forward slashes during hive path inference on Windows (#19282)
  • Don't cse as_struct (#19280)
  • Only apply string parsing to String dtype (#19222)
  • Make the SQLAlchemy connection check more robust (#19270)
  • Ensure that read_database takes advantage of Arrow return from a duckdb_engine connection when using a SQLAlchemy Selectable (#19255)
  • Compilation error missing use JsonLineReader (#19244)
  • Don't remember Parquet statistics if filtered (#19248)
  • Do not check dtypes of non-projected columns for parquet (#19254)
  • Parquet predicate pushdown for lit(_) != (#19246)
  • Use all chunks in Series from arrow struct (#19218)
  • Don't trigger row limit in array construction (#19215)
  • Fix struct literals (#19214)
  • Plotting was not interacting well with Altair schema wrappers (#19213)
  • Fixing infer_schema for DataType::Null (#19201)
  • Migrate to PyO3 0.22 and released version of rust-numpy crate (#19199)
  • Add 'drop_empty_rows' parameter for read_excel (#18253)
  • Don't unwrap() expansion (#19196)
  • Properly handle non-nullable nested Parquet (#19192)
  • Fix invalid list collection in expression engine (#19191)
  • Fix use of "hidden_columns" parameter in write_excel (#19029)
  • Implement to_arrow functionality properly for Arrays (#19077)
  • Remove incorrect warning when using an IO[bytes] instance (#19154)
  • Don't fail test if e.g. jax has been used first, since jax installs a fork handler that warns (#19178)
  • Fix incorrect (eq|ne)_missing on List/Array types (#19155)
  • Properly broadcast Struct when-then validity (#19148)
  • Allow partial name overlap in join_where resolution (#19128)
  • Fix floordiv / modulo with scalar 0 on LHS (#19143)
  • Ensure aligned chunks in OOC sort (#19118)
  • Recursively align when converting to ArrowArray (#19097)
  • Raise on invalid shape of shape 1, empty combination (#19113)
  • Use two-pass algorithm for csv to ensure correctness and SIMDize more ~17% (#19088)
  • Allow converting DatetimeOwned to ChunkedArray (#19094)
  • Throw proper error for empty char params in scan_csv (#19100)
  • Ensure parquet schema arg is propagated to IR (#19084)
  • Only rewrite numeric ineq joins (#19083)
  • Check validity of columns of keys/aggs in dsl->ir (#19082)
  • Bitwise aggregations should ignore null values (#19067)
  • Remove failing datetime subclass test (#19068)
  • Don't ignore multiple columns in LazyFrame.unnest (#19035)
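
To illustrate why the hive path fix above (#19282) matters, here is a minimal pure-Python sketch of key=value partition inference on a Windows-style path that mixes both separators. It is an assumption-laden illustration, not the Polars code: the helper parse_hive_partitions and the example path are hypothetical.

```python
import re


def parse_hive_partitions(path):
    """Collect key=value directory segments from a path (hypothetical helper)."""
    # Split on both separators, since Windows paths may mix "\" and "/".
    segments = re.split(r"[\\/]", path)
    partitions = {}
    for segment in segments[:-1]:  # the last segment is the file name
        key, sep, value = segment.partition("=")
        if sep and key:
            partitions[key] = value  # values are kept as plain strings here
    return partitions


mixed = r"C:\data\year=2024/month=10\part-0.parquet"
parts = parse_hive_partitions(mixed)

# Splitting on backslashes alone would leave two partitions fused together.
backslash_only = mixed.split("\\")
```

With the combined split, parts holds separate year and month entries. The backslash-only split keeps "year=2024/month=10" as one segment, which is the kind of input the fix addresses.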

📖 Documentation

  • Remove ecosystem viz section since there is one in misc already (#18408)
  • Fix typo in custom expressions docs (#19292)
  • Add SQL docs for new QUANTILE_CONT and QUANTILE_DISC functions (#19272)
  • Add marimo to ecosystem.md (#19250)
  • Improve DataFrame.write_database docstring (#19189)
  • Link to main website from banner (#19177)
  • Fix example of as_struct (#19116)
  • Clarify difference between bitwise/logical ops (#19180)
  • Add non-equi joins to, and revise, joins docs page (#19127)
  • Add Series.first,last,approx_n_unique to docs (#19146)
  • Annotate Config kwarg options (#18988)
  • Revise and improve 'Concepts' section (#19087)

🛠️ Other improvements

  • Add/fix unordered row decode, change unordered format (#19284)
  • Move from parquet-format-safe to polars-parquet-format (#19275)
  • Skip flaky test (#19242)
  • Add more tests for list arithmetic (#19225)
  • Remove unused IPC async (#19223)
  • Make get_list_builder infallible (#19217)
  • Migrate to PyO3 0.22 and released version of rust-numpy crate (#19199)
  • Make expression output type known (#19195)
  • Revert "feat(python): Raise an error when users try to use Polars API in a fork()-without-execve() child (#19149)" (#19188)
  • Zero-Field Structs and DataFrame with Height Property (#19123)
  • Make pl.repeat part of the IR (#19152)
  • Expose IEJoin IR node to python (#19104)
  • Clean remove_prefix since python3.9 is now the minimum Python (#19070)
  • Add new streaming engine to CI (#19051)

Thank you to all our contributors for making this release possible!
@Bidek56, @MarcoGorelli, @Rashik-raj, @adamreeve, @alexander-beedie, @alonme, @balbok0, @coastalwhite, @deanm0000, @dependabot, @dependabot[bot], @eitsupi, @etrotta, @itamarst, @jbutterwick, @joelostblom, @kenkoooo, @khalidmammadov, @laurentS, @mcrumiller, @mscolnick, @nameexhaustion, @orlp, @pomo-mondreganto, @ritchie46, @rodrigogiraoserrao, @siddharth-vi, @stinodego, @sunadase and @wence-
