🏆 Highlights
- Add support for IO[bytes] and bytes in scan_{...} functions (#18532)
- Add IEJoin algorithm for non-equi joins and support Full non-equi joins (#18365)
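For readers unfamiliar with the term, a non-equi join matches rows on an inequality rather than on equal keys. The sketch below is a naive nested-loop reference in plain Python that only shows what such a join returns. It is not the IEJoin algorithm and not Polars code, and the function name, column names, and data are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def non_equi_join(
    left: List[Row],
    right: List[Row],
    predicate: Callable[[Row, Row], bool],
) -> List[Row]:
    """Return the merged row for every (left, right) pair that satisfies predicate.

    This checks all pairs, so it costs O(len(left) * len(right)).
    """
    return [{**l, **r} for l in left for r in right if predicate(l, r)]


# Hypothetical example data.
flights = [{"flight": "A", "dur": 120}, {"flight": "B", "dur": 300}]
limits = [{"tier": "short", "max_dur": 180}, {"tier": "long", "max_dur": 600}]

# Inequality predicate: keep pairs where the duration is below the tier limit.
matches = non_equi_join(flights, limits, lambda l, r: l["dur"] < r["max_dur"])
```

A dedicated algorithm is useful precisely because this all-pairs check scales poorly on large inputs.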
🚀 Performance improvements
- Back arrow arrays with SharedStorage which can have non-refcounted static slices (#18666)
- Don't traverse file list twice for extension validation (#18620)
- Remove cloning of ColumnChunkMetadata (#18615)
- Add upfront partitioning in ColumnChunkMetadata (#18584)
- Enable Parquet parallel=prefiltered for auto (#18514)
- Change PlSmallStr impl from Arc<str> to compact_str (#18508)
- Added optimizer rules for is_null().all() and similar expressions to use null_count() (#18359)
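The null_count() rewrite in #18359 rests on a simple equivalence, sketched here in plain Python over a list with None as the null marker. This is an illustration of the logic only, not the optimizer's code. The helper names are hypothetical, and treating is_null().any() as one of the "similar expressions" is an assumption.

```python
from typing import List, Optional


def null_count(values: List[Optional[int]]) -> int:
    """Count the null (None) entries."""
    return sum(v is None for v in values)


def all_null_via_count(values: List[Optional[int]]) -> bool:
    # Same result as all(v is None for v in values), without building a mask.
    return null_count(values) == len(values)


def any_null_via_count(values: List[Optional[int]]) -> bool:
    # Same result as any(v is None for v in values).
    return null_count(values) > 0


# Hypothetical sample inputs, including the empty case.
samples = [[], [None], [1, None], [1, 2], [None, None]]
```

The rewrite pays off when the null count is cheaper to obtain than a full boolean mask, which is presumably the motivation for the rule.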
✨ Enhancements
- Update BytecodeParser for upcoming Python 3.13 (#18677)
- Add tooltip by default to charts (#18625)
- Add support for IO[bytes] and bytes in scan_{...} functions (#18532)
- Support shortcut eval of common boolean filters in SQL interface "WHERE" clause (#18571)
- Add IEJoin algorithm for non-equi joins and support Full non-equi joins (#18365)
- Make expressions containing Python UDFs serializable (#18135)
🐞 Bug fixes
- Use IO[bytes] instead of BytesIO in DataFrame.write_parquet() (#18652)
- Scalar checks (#18627)
- Scanning hive partitioned files where hive columns are partially included in the file (#18626)
- Enable "polars-json/timezones" feature from "polars-io" (#18635)
- Use Buffer<T> in ObjectSeries, fixes variety of offset bugs (#18637)
- Properly slice validity mask on pl.Object series (#18631)
- Raise if single argument form in replace/replace_strict is not a mapping (#18492)
- Fix group first value after group-by slice (#18603)
- Allow for date/datetime subclasses (e.g. pd.Timestamp, FreezeGun) in pl.lit (#18497)
- Fix output type for list.eval in certain cases (#18570)
- Fix map_elements for List return dtypes (#18567)
- Check for duplicate column names in read_database cursor result, raising DuplicateError if found (#18548)
- Do not remove double-sort if maintain_order=True (#18561)
- Empty any_horizontal should be false, not true (#18545)
- Fix type inference error in map_elements for List types (#18542)
- Address incorrect align_frames result when the alignment column contains NULL values (#18521)
- Fix advertised version in source builds (#18523)
- Handle Parquet projection pushdown with only row index (#18520)
- DataFrame write_database not passing down "engine_options" when using ADBC (#18451)
- Properly raise on invalid selector expressions (#18511)
- Wrong output column name in or and xor operations (#18512)
- Normalize by default in Series.entropy like Expr.entropy does (#18493)
- Various schema corrections (#18474)
- Don't drop objects on empty buffers (#18469)
- Expr.sign should preserve dtype (#18446)
- Ensure assert_frame_not_equal and assert_series_not_equal raise on mismatched input types (#18402)
- Fixed Worksheet definition in write_excel type annotations (#18452)
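To make the Series.entropy change (#18493) concrete, the sketch below shows in plain Python what normalizing before computing entropy means: the inputs are rescaled to sum to 1 and then -sum(p * log(p)) is evaluated. It is an illustration under stated assumptions, not the Polars implementation. The function name and signature are hypothetical, and skipping zero entries is a choice made for this sketch.

```python
import math
from typing import Sequence


def entropy(
    values: Sequence[float],
    base: float = math.e,
    normalize: bool = True,
) -> float:
    """Shannon entropy of non-negative weights.

    With normalize=True the weights are first rescaled to sum to 1,
    so raw counts can be passed directly.
    """
    if normalize:
        total = sum(values)
        values = [v / total for v in values]
    # Zero entries are skipped here (sketch-level choice) to avoid log(0).
    return -sum(p * math.log(p, base) for p in values if p > 0)


# Two equally frequent outcomes, given as raw counts.
normalized = entropy([1, 1], base=2)
unnormalized = entropy([1, 1], base=2, normalize=False)
```

With raw counts as input the two calls disagree, which is why a differing default between the Series and Expr methods was worth fixing.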
📖 Documentation
- Update join_where docs to clarify behaviour (#18670)
- Fix multiprocessing docs regarding fork method check (#18563)
- Various docstring improvements to testing.assert_* functions (#18494)
- Fix formula in ewm_mean_by (#18506)
- Pre-compute plugin_path before defining plugin (#18503)
- Add Expr.null_count to aggregations (#18459)
🛠️ Other improvements
- Fix a bunch of tests for new-streaming (#18659)
- Don't raise on multiple same names in ie_join (#18658)
- Check predicates in join_where (#18648)
- Change join_where semantics (#18640)
- Add benchmark tests for join_where with inequalities (#18614)
- Check number of binary comparisons in join_where predicates (#18608)
- Raise on suffixed predicate in join_where (#18607)
- Fix Python docs build (#18605)
- Use streaming argument in test_parquet_slice_pushdown_non_zero_offset (#18529)
- Fix delta test merge (#18601)
- Alter/skip some tests for new streaming (#18574)
- Add lower-bound pin for numba (#18555)
- Temporarily pin NumPy in CI to address dependency resolving issue (#18544)
- Change PlSmallStr impl from Arc<str> to compact_str (#18508)
- Make expressions containing Python UDFs serializable (#18135)
- Change naming to new benchmark setup (#18473)
- Ensure physical arguments to np ufuncs are rechunked (#18471)
- Remove a string allocation in Parquet (#18466)
- Remove network call in hf docs (#18454)
Thank you to all our contributors for making this release possible!
@0xbe7a, @MarcoGorelli, @WbaN314, @adamreeve, @alexander-beedie, @alonme, @barak1412, @coastalwhite, @dependabot, @dependabot[bot], @eitsupi, @henryharbeck, @ion-elgreco, @krasnobaev, @megaserg, @nameexhaustion, @ohanf, @orlp, @philss, @r-brink, @ritchie46, @skellys, @squnit, @stinodego, @wence- and @yarimiz