pola-rs/polars py-1.7.0 (Python Polars 1.7.0) on GitHub

🏆 Highlights

Add support for IO[bytes] and bytes in scan_{...} functions (#18532)
Add IEJoin algorithm for non-equi joins and support Full non-equi joins (#18365)

🚀 Performance improvements

Back arrow arrays with SharedStorage which can have non-refcounted static slices (#18666)
Don't traverse file list twice for extension validation (#18620)
Remove cloning of ColumnChunkMetadata (#18615)
Add upfront partitioning in ColumnChunkMetadata (#18584)
Enable Parquet parallel=prefiltered for auto (#18514)
Change PlSmallStr impl from Arc<str> to compact_str (#18508)
Add optimizer rules for is_null().all() and similar expressions to use null_count() (#18359)

✨ Enhancements

Update BytecodeParser for upcoming Python 3.13 (#18677)
Add tooltip by default to charts (#18625)
Add support for IO[bytes] and bytes in scan_{...} functions (#18532)
Support shortcut eval of common boolean filters in SQL interface "WHERE" clause (#18571)
Add IEJoin algorithm for non-equi joins and support Full non-equi joins (#18365)
Make expressions containing Python UDFs serializable (#18135)

🐞 Bug fixes

Use IO[bytes] instead of BytesIO in DataFrame.write_parquet() (#18652)
Scalar checks (#18627)
Scanning hive partitioned files where hive columns are partially included in the file (#18626)
Enable "polars-json/timezones" feature from "polars-io" (#18635)
Use Buffer<T> in ObjectSeries, fixes variety of offset bugs (#18637)
Properly slice validity mask on pl.Object series (#18631)
Raise if single argument form in replace/replace_strict is not a mapping (#18492)
Fix group first value after group-by slice (#18603)
Allow for date/datetime subclasses (e.g. pd.Timestamp, FreezeGun) in pl.lit (#18497)
Fix output type for list.eval in certain cases (#18570)
Fix map_elements for List return dtypes (#18567)
Check for duplicate column names in read_database cursor result, raising DuplicateError if found (#18548)
Do not remove double-sort if maintain_order=True (#18561)
Empty any_horizontal should be false, not true (#18545)
Fix type inference error in map_elements for List types (#18542)
Address incorrect align_frames result when the alignment column contains NULL values (#18521)
Fix advertised version in source builds (#18523)
Handle Parquet projection pushdown with only row index (#18520)
DataFrame write_database not passing down "engine_options" when using ADBC (#18451)
Properly raise on invalid selector expressions (#18511)
Wrong output column name in or and xor operations (#18512)
Normalize by default in Series.entropy like Expr.entropy does (#18493)
Various schema corrections (#18474)
Don't drop objects on empty buffers (#18469)
Expr.sign should preserve dtype (#18446)
Ensure assert_frame_not_equal and assert_series_not_equal raise on mismatched input types (#18402)
Fix Worksheet definition in write_excel type annotations (#18452)

📖 Documentation

Update join_where docs to clarify behaviour (#18670)
Fix multiprocessing docs regarding fork method check (#18563)
Various docstring improvements to testing.assert_* functions (#18494)
Fix formula in ewm_mean_by (#18506)
Pre-compute plugin_path before defining plugin (#18503)
Add Expr.null_count to aggregations (#18459)

🛠️ Other improvements

Fix a bunch of tests for new-streaming (#18659)
Don't raise on multiple same names in ie_join (#18658)
Check predicates in join_where (#18648)
Change join_where semantics (#18640)
Add benchmark tests for join_where with inequalities (#18614)
Check number of binary comparisons in join_where predicates (#18608)
Raise on suffixed predicate in join_where (#18607)
Fix Python docs build (#18605)
Use streaming argument in test_parquet_slice_pushdown_non_zero_offset (#18529)
Fix delta test merge (#18601)
Alter/skip some tests for new streaming (#18574)
Add lower-bound pin for numba (#18555)
Temporarily pin NumPy in CI to address dependency resolving issue (#18544)
Change PlSmallStr impl from Arc<str> to compact_str (#18508)
Make expressions containing Python UDFs serializable (#18135)
Change naming to new benchmark setup (#18473)
Ensure physical arguments to np ufuncs are rechunked (#18471)
Remove a string allocation in Parquet (#18466)
Remove network call in hf docs (#18454)

Thank you to all our contributors for making this release possible!
@0xbe7a, @MarcoGorelli, @WbaN314, @adamreeve, @alexander-beedie, @alonme, @barak1412, @coastalwhite, @dependabot, @dependabot[bot], @eitsupi, @henryharbeck, @ion-elgreco, @krasnobaev, @megaserg, @nameexhaustion, @ohanf, @orlp, @philss, @r-brink, @ritchie46, @skellys, @squnit, @stinodego, @wence- and @yarimiz
