pola-rs/polars py-1.13.0 on GitHub

🚀 Performance improvements

Improve DataFrame.sort().limit/top_k performance (#19731)
Improve cloud scan performance (#19728)
Fix quadratic 'with_columns' behavior (#19701)
Improve hive partition pruning with datetime predicates from SQL (#19680)
Allow for arbitrary skips in Parquet Dictionary Decoding (#19649)
Reorder conditions in is_leap_year (#19602)
Rechunk in DataFrame.rows if needed (#19628)
Dispatch Parquet Primitive PLAIN decoding to faster kernels when possible (#19611)
Use faster iteration in 'starts_with'/'ends_with' (#19583)
Branchless Parquet Prefiltering (#19190)
Reduce size of IdxVec from 24 -> 16 bytes (#19550)
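
As a rough illustration of why a sort followed by a limit (#19731) can be made cheaper, the pure-Python sketch below compares a full sort with a bounded selection of the k smallest rows. It is not Polars code and makes no claim about how Polars implements the change. The names `rows`, `k`, and `by_score` are hypothetical, and `heapq.nsmallest` stands in for a top-k kernel.

```python
import heapq

# Hypothetical data: a handful of rows to rank by "score".
rows = [
    {"name": "a", "score": 42},
    {"name": "b", "score": 7},
    {"name": "c", "score": 19},
    {"name": "d", "score": 88},
    {"name": "e", "score": 3},
]
k = 2


def by_score(row):
    """Sort key: the row's score."""
    return row["score"]


# Full sort, then keep the first k rows: every row is ordered, O(n log n).
sorted_then_limited = sorted(rows, key=by_score)[:k]

# Bounded selection: only the k smallest rows are tracked, roughly O(n log k).
selected = heapq.nsmallest(k, rows, key=by_score)

print([r["name"] for r in selected])
```

Both approaches return the same rows. The second avoids ordering the rows that the limit would discard anyway.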

✨ Enhancements

Try to support native SAP HANA driver via read_database (#19733)
Implement max/min methods for dtypes (#19494)
Improve n_chunks typing (#19727)
Improve hive partition pruning with datetime predicates from SQL (#19680)
Identify inefficient use of Python string removeprefix, removesuffix, and zfill in map_elements (#19672)
Automatically use boto3 / google-auth if installed when scanning cloud (#19677)
Identify inefficient use of Python string replace in map_elements (#19668)
Parallel IPC sink for the new streaming engine (#19622)
Add SQL support for RIGHT JOIN, fix an issue with wildcard aliasing (#19626)
Add show_graph to display a GraphViz plot for expressions (#19365)
Streamline use of predicates connected by & with IEJoin (join_where) (#19552)
Support use of is_between range predicate with IEJoin operations (join_where) (#19547)

🐞 Bug fixes

Use cls for to_python (#19726)
Fix validation for inner and left join when join_nulls unflagged (#19698)
SQL ELSE clause should be implicitly NULL when omitted (#19714)
Improve n_chunks typing (#19727)
Ensure NoDataError raised consistently between engines for Excel reads (#19712)
In group_by_dynamic, period and every were getting applied in reverse order for the window upper boundary (#19706)
Only allow list.to_struct to be elementwise when width is fixed (#19688)
Make Array arithmetic ops fully elementwise (#19682)
Address inconsistency with use of Python types in frame-level cast (#19657)
Update line-splitting logic in batched CSV reader (#19508)
Fix incorrect lazy schema for explode() in agg() (#19629)
Fix fill null types (#19656)
Fix filter incorrectly pushed past struct unnest when unnested column name matches upper column name (#19638)
Fix typing for SchemaDefinition (#19647)
Ensure mean_horizontal raises on non-numeric input (#19648)
Reorder conditions in is_leap_year (#19602)
Copy height in .vstack() for empty dataframes (#19641) (#19642)
Correct wildcard and input expansion for some more functions (#19588)
Allow .struct.with_fields inside list.eval (#19617)
Sortedness was incorrectly being preserved in dt.offset_by when offsetting by non-constant durations in the timezone-naive case (#19616)
Fix incorrect scan_parquet().with_row_index() with non-zero slice or with streaming collect (#19609)
Fix mask and validity confusion in Parquet String decoding (#19614)
Parquet decoding of nested dictionary values (#19605)
Do not attempt to load default credentials when credential_provider is given (#19589)
Fix gather len in group-by state (#19586)
Added input validation for explode operation in the array namespace (#19163)
Improve error message (#19546)
Fix predicate pushdown into inequality joins (#19582)
Correct categorical namespace error message (#19558)
Fix performance regression for sort/gather on list/array columns (#19564)
Ignore quoted newlines when skipping lines in CSV (#19543)
Incorrect gather for FixedSizeList with outer validity but no inner validities (#19489)
Make Duration parsing fallible and not panic (#19490)
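
The SQL fix in #19714 concerns a CASE expression with no ELSE branch, which in standard SQL yields NULL for rows that match no WHEN clause. The sketch below shows that behaviour using Python's built-in `sqlite3` module as a stand-in. It does not exercise the Polars SQL interface, and the table `t` and column `x` are hypothetical.

```python
import sqlite3

# In-memory database used only to demonstrate standard CASE semantics.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (5,), (10,)])

# No ELSE branch: rows that fail every WHEN condition get NULL (None in Python).
query = "SELECT x, CASE WHEN x > 3 THEN 'big' END AS label FROM t ORDER BY x"
result = conn.execute(query).fetchall()
conn.close()

print(result)
```

The row with `x = 1` matches no WHEN clause, so its `label` comes back as `None`.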

📖 Documentation

Revise and rework user-guide/expressions (#19360)
Update Excel page of user guide to refer to fastexcel as the default engine (#19691)
Alter examples for round_sig_figs to make behaviour clearer (#19667)
Assorted fixes to Rust API docs (#19664)
Improve replace and replace_all docstring explanation of the "$" character with reference to capture groups (vs use as a literal) (#19529)
Add credential provider section and examples to user guide (#19487)
Fix various instances of repeated words in docs and comments (#19516)

📦 Build system

Bump Rust toolchain to nightly-2024-10-28 (#19492)

🛠️ Other improvements

Remove unused Excel code (#19710)
Use Column for the {try,}_apply_columns{_par,} functions on DataFrame (#19683)
Remove more @scalar-opt (#19666)
Move Series bitops to std::ops::Bit... (#19673)
Mark test_parquet.py test_dict_slices as slow (#19675)
Get Column into polars-expr (#19660)
Streamline internal SQL join condition processing (#19658)
Factor out logic for re-use by new streaming CSV source (#19637)
Configure grouped Dependabot updates (#19604)
Fix PyO3 error in CI (#19545)
Update nightly compiler version (#19590)
Added input validation for explode operation in the array namespace (#19163)
Fix lint (#19584)
Add a Column::Partitioned variant (#19557)
Move to fast-float2 (#19578)
Only run remote bench on rust changes (#19581)
Remove unsafe *_release functions (#19554)
Fix test_rolling_by_integer not using parameterized dtype (#19555)
Add mindebug-dev rust profile (#19524)
Add CI step to process benchmark results (#19530)
Add CI benchmark on merge (#19518)
Skip client check with env var (#19517)
Improve makefile build commands (#19498)

Thank you to all our contributors for making this release possible!
@3tilley, @HansBambel, @MarcoGorelli, @alexander-beedie, @barak1412, @braaannigan, @cmdlineluser, @coastalwhite, @corwinjoy, @dependabot, @dependabot[bot], @eitsupi, @janpipek, @jqnatividad, @letkemann, @max-muoto, @nameexhaustion, @orlp, @ritchie46, @rodrigogiraoserrao, @siddharth-vi, @stinodego and @wence-
