github pola-rs/polars py-1.7.0
Python Polars 1.7.0

🏆 Highlights

  • Add support for IO[bytes] and bytes in scan_{...} functions (#18532)
  • Add IEJoin algorithm for non-equi joins and support Full non-equi joins (#18365)

🚀 Performance improvements

  • Back arrow arrays with SharedStorage which can have non-refcounted static slices (#18666)
  • Don't traverse file list twice for extension validation (#18620)
  • Remove cloning of ColumnChunkMetadata (#18615)
  • Add upfront partitioning in ColumnChunkMetadata (#18584)
  • Enable Parquet parallel=prefiltered for auto (#18514)
  • Change PlSmallStr impl from Arc<str> to compact_str (#18508)
  • Add optimizer rules for is_null().all() and similar expressions to use null_count() (#18359)

✨ Enhancements

  • Update BytecodeParser for upcoming Python 3.13 (#18677)
  • Add tooltip by default to charts (#18625)
  • Add support for IO[bytes] and bytes in scan_{...} functions (#18532)
  • Support shortcut eval of common boolean filters in SQL interface "WHERE" clause (#18571)
  • Add IEJoin algorithm for non-equi joins and support Full non-equi joins (#18365)
  • Make expressions containing Python UDFs serializable (#18135)

🐞 Bug fixes

  • Use IO[bytes] instead of BytesIO in DataFrame.write_parquet() (#18652)
  • Scalar checks (#18627)
  • Fix scanning of hive partitioned files where hive columns are partially included in the file (#18626)
  • Enable "polars-json/timezones" feature from "polars-io" (#18635)
  • Use Buffer<T> in ObjectSeries, fixes variety of offset bugs (#18637)
  • Properly slice validity mask on pl.Object series (#18631)
  • Raise if single argument form in replace/replace_strict is not a mapping (#18492)
  • Fix group first value after group-by slice (#18603)
  • Allow for date/datetime subclasses (e.g. pd.Timestamp, FreezeGun) in pl.lit (#18497)
  • Fix output type for list.eval in certain cases (#18570)
  • Fix map_elements for List return dtypes (#18567)
  • Check for duplicate column names in read_database cursor result, raising DuplicateError if found (#18548)
  • Do not remove double-sort if maintain_order=True (#18561)
  • Empty any_horizontal should be false, not true (#18545)
  • Fix type inference error in map_elements for List types (#18542)
  • Address incorrect align_frames result when the alignment column contains NULL values (#18521)
  • Fix advertised version in source builds (#18523)
  • Handle Parquet projection pushdown with only row index (#18520)
  • Fix DataFrame.write_database not passing down "engine_options" when using ADBC (#18451)
  • Properly raise on invalid selector expressions (#18511)
  • Fix wrong output column name in or and xor operations (#18512)
  • Normalize by default in Series.entropy like Expr.entropy does (#18493)
  • Various schema corrections (#18474)
  • Don't drop objects on empty buffers (#18469)
  • Expr.sign should preserve dtype (#18446)
  • Ensure assert_frame_not_equal and assert_series_not_equal raise on mismatched input types (#18402)
  • Fix Worksheet definition in write_excel type annotations (#18452)

📖 Documentation

  • Update join_where docs to clarify behaviour (#18670)
  • Fix multiprocessing docs regarding fork method check (#18563)
  • Various docstring improvements to testing.assert_* functions (#18494)
  • Fix formula in ewm_mean_by (#18506)
  • Pre-compute plugin_path before defining plugin (#18503)
  • Add Expr.null_count to aggregations (#18459)

🛠️ Other improvements

  • Fix a bunch of tests for new-streaming (#18659)
  • Don't raise on multiple same names in ie_join (#18658)
  • Check predicates in join_where (#18648)
  • Change join_where semantics (#18640)
  • Add benchmark tests for join_where with inequalities (#18614)
  • Check number of binary comparisons in join_where predicates (#18608)
  • Raise on suffixed predicate in join_where (#18607)
  • Fix Python docs build (#18605)
  • Use streaming argument in test_parquet_slice_pushdown_non_zero_offset (#18529)
  • Fix delta test merge (#18601)
  • Alter/skip some tests for new streaming (#18574)
  • Add lower-bound pin for numba (#18555)
  • Temporarily pin NumPy in CI to address dependency resolving issue (#18544)
  • Change PlSmallStr impl from Arc<str> to compact_str (#18508)
  • Make expressions containing Python UDFs serializable (#18135)
  • Change naming to new benchmark setup (#18473)
  • Ensure physical arguments to np ufuncs are rechunked (#18471)
  • Remove a string allocation in Parquet (#18466)
  • Remove network call in hf docs (#18454)

Thank you to all our contributors for making this release possible!
@0xbe7a, @MarcoGorelli, @WbaN314, @adamreeve, @alexander-beedie, @alonme, @barak1412, @coastalwhite, @dependabot, @dependabot[bot], @eitsupi, @henryharbeck, @ion-elgreco, @krasnobaev, @megaserg, @nameexhaustion, @ohanf, @orlp, @philss, @r-brink, @ritchie46, @skellys, @squnit, @stinodego, @wence- and @yarimiz