Python Polars 1.22.0


🚀 Performance improvements

  • Reduce sharing in stringview arrays in new-streaming equijoin (#21129)
  • Implement native Expr.count() on new-streaming (#21126)
  • Speed up list operations that use amortized_iter() (#20964)
  • Use Cow as output for rechunk and add rechunk_mut (#21116)
  • Reduce arrow slice mmap overhead (#21113)
  • Reduce conversion cost in chunked string gather (#21112)
  • Enable prefiltered by default for new streaming (#21109)
  • Enable parquet column expressions for streaming (#21101)
  • Deduplicate buffers again in stringview concat kernel (#21098)
  • Add dedicated concatenate kernels (#21080)
  • Rechunk only once during join probe gather (#21072)
  • Micro-optimise internal DataFrame height and width checks (#21071)
  • Speed up from_pandas when converting frame with multi-index columns (#21063)
  • Change default memory prefetch to MADV_WILLNEED (#21056)
  • Remove cast to boolean after comparison in optimizer (#21022)
  • Split last rowgroup among all threads in new-streaming parquet reader (#21027)
  • Recombine into larger morsels in new-streaming join (#21008)
  • Improve list.min and list.max performance for logical types (#20972)
  • Ensure count queries select minimal columns (see the sketch after this list) (#20923)
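
A minimal sketch of the kind of query that the count-query and Parquet prefiltering work above targets (assuming a local data.parquet file with a value column; both names are placeholders):

```python
import polars as pl

lazy = pl.scan_parquet("data.parquet")

# Count query: only the minimal columns/metadata needed to produce the
# row count should be read.
n_rows = lazy.select(pl.len()).collect()

# Filtered scan: the predicate can be applied while reading the file
# (prefiltering), so non-matching rows are skipped early.
filtered = lazy.filter(pl.col("value") > 0).collect()
```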

✨ Enhancements

  • Add projection pushdown to new streaming multiscan (#21139)
  • Implement join on struct dtype (see the sketch after this list) (#21093)
  • Use unique temporary directory path per user and restrict permissions (#21125)
  • Enable ingest of objects supporting the PyCapsule interface via from_arrow (#21128)
  • Enable new streaming multiscan for CSV (#21124)
  • Support the POLARS_MAX_CONCURRENT_SCANS environment variable in multiscan for the new streaming engine (#21127)
  • Ensure AWS credential provider sources AWS_PROFILE from environment after deserialization (#21121)
  • Multi/Hive scans in new streaming engine (#21011)
  • Add linear_spaces (#20941)
  • IO plugins support lazy schema (#21079)
  • Add write_table() function to Unity catalog client (#21089)
  • Add is_object method to Polars DataType class (#21074)
  • Implement merge_sorted for binary (#21045)
  • Hold string cache in new streaming engine and fix row-encoding (#21039)
  • Add CredentialProviderAzure parameter to accept user-instantiated azure credential classes (#21047)
  • Expose unity catalog dataclasses and type aliases (#21046)
  • Support max/min method for Time dtype (#19815)
  • Implement a streaming merge sorted node (#20960)
  • Automatically use temporary credentials API for scanning Unity catalog tables (#21020)
  • Add negative slice support to new-streaming engine (#21001)
  • Allow for more row-group skipping by rewriting exprs in the planner (#20828)
  • Rename catalog schema to namespace (#20993)
  • Add functionality to create and delete catalogs, tables and schemas to Unity catalog client (#20956)
  • Allow custom JSONEncoder for the json_normalize function, minor speedup (#20966)
  • Support passing aws_profile in storage_options (#20965)
  • Improve support for KeyboardInterrupt (#20961)
  • Make the available concat alignment strategies more generic (#20644)
  • Extract timezone info from python datetimes (#20822)
  • Add hint for POLARS_AUTO_USE_AZURE_STORAGE_ACCOUNT_KEY to error message (#20942)
  • Filter Parquet pages with ParquetColumnExpr (#20714)
  • Expose descending and nulls last in window order-by (#20919)
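
A minimal sketch of joining on a struct-typed key column, as referenced above (column names and values are illustrative only):

```python
import polars as pl

left = pl.DataFrame(
    {
        "key": [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}],  # struct column
        "lhs": [10, 20],
    }
)
right = pl.DataFrame(
    {
        "key": [{"a": 2, "b": "y"}],
        "rhs": [99],
    }
)

# Equi-join directly on the struct-typed key column.
joined = left.join(right, on="key", how="inner")
```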

🐞 Bug fixes

  • Fix Expr.over applying scale incorrectly for Decimal types (see the sketch after this list) (#21140)
  • Fix IO plugin predicate with failed serialization (#21136)
  • Ensure lit handles datetimes with tzinfo that represents a fixed offset from UTC (#21003)
  • Correctly implement take_(opt_)chunked_unchecked for structs (#21134)
  • Restore printing backtraces on panics (#21131)
  • Use microseconds for Unity catalog datetime unit (#21122)
  • Fix incorrect output height for SQL SELECT COUNT(*) FROM (#21108)
  • Validate/coerce types for comparisons within join_where predicates (#21049)
  • Do not auto-init credential providers if credential fetch returns error (#21090)
  • Fix join_where incorrectly dropping transformations on RHS of equality expressions (#21067)
  • Fix quadratic allocations when loading nested Parquet column metadata (#21050)
  • Invalidate sortedness flag when sorting from pl.Categorical to pl.Categorical("lexical") (#21044)
  • Fix panic when calling top_k on list type (#21043)
  • Fix rolling on empty DataFrame panicking (#21042)
  • Fix set_tbl_width_chars panicking with negative width (#20906)
  • Ensure write_excel recognises the Array dtype and writes it out as a string (#20994)
  • Fix merge_sorted producing incorrect results or panicking for some logical types (#21018)
  • Fix all-null list aggregations returning Null dtype (#20992)
  • Ensure scalar-only with_columns are broadcast on new-streaming (#20983)
  • Improve SQL interface behaviour when INTERVAL is not a fixed duration (#20958)
  • Address minor regression for one-column DataFrame passed to is_in expressions (#20948)
  • Add DataType conversion for Arrow Float16 (#20970)
  • Revert length check of patterns in str.extract_many() (#20953)
  • Add maintain order for flaky new-streaming test (#20954)
  • Allow for respawning of new streaming sinks (#20934)
  • Ensure function name correctness in CSE (#20929)
  • Don't consume c_stream as iterable (#20899)
  • Validate pl.Array shape argument types (#20915)
  • Fix from_numpy returning Null dtype for empty 1D numpy array (#20907)
  • Consider the original dtypes when selecting columns in write_excel function (#20909)
  • Handle boolean comparisons in Iceberg predicate pushdown (#18199)
  • Fix map_elements panicking with Decimal type (#20905)
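
A minimal sketch of the window-over-Decimal pattern addressed by the first fix above (data, column names, and scale are illustrative):

```python
from decimal import Decimal

import polars as pl

df = pl.DataFrame(
    {
        "group": ["a", "a", "b"],
        "amount": [Decimal("1.10"), Decimal("2.20"), Decimal("3.30")],
    },
    schema={"group": pl.String, "amount": pl.Decimal(scale=2)},
)

# The per-group sum should keep the Decimal scale of "amount" rather than
# applying it a second time.
out = df.with_columns(total=pl.col("amount").sum().over("group"))
```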

📖 Documentation

  • Replace pandas where with mask in Migrating -> Coming from Pandas (#21085)
  • Correct Arrow misconception (#21053)
  • Add example showing use of write_delta with delta_lake.WriterProperties (#20746)
  • Add missing shape param to Array docstring (#20747)
  • Add IO plugins to Python API reference (#21028)
  • Document IO plugins (#20982)
  • Ensure set_sorted description references single-column behavior (see the sketch after this list) (#20709)
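
For reference, set_sorted flags a single column as already sorted; a minimal sketch (column names are illustrative):

```python
import polars as pl

# Marking "time" as sorted lets downstream operations that rely on sorted
# input (e.g. group_by_dynamic) skip an explicit sort.
lf = pl.LazyFrame({"time": [1, 2, 3], "value": [1.0, 2.0, 3.0]}).set_sorted("time")
```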

📦 Build system

  • Speed up CI by running a few more tests in parallel (#21057)

🛠️ Other improvements

  • Add test for equality filters in Parquet (#21114)
  • Add various tests for open issues (#21075)
  • Upgrade packages and apply latest formatting (#21086)
  • Move Python dsl and builder_dsl code to the dsl folder (#21077)
  • Organize Python-related logic in polars-plan (#21070)
  • Improve binary dispatch (#21061)
  • Skip physical order test (#21060)
  • Fix new ruff lints (#21040)
  • Add test for the computation of list.len on null values (#20938)
  • Add make fix target for running cargo clippy --fix (#21024)
  • Add tests for resolved issues (#20999)
  • Update code coverage workflow to use macos-latest runners (#20995)
  • Remove unused arrow file (#20974)
  • Deprecate the old streaming engine (#20949)
  • Move dt.replace tests to a dedicated file, add "Typing :: Typed" classifier, remove unused testing function (#20945)
  • Extract merge sorted IR node (#20939)
  • Update copyright year (#20764)
  • Move Parquet deserialization to BitmapBuilder (#20896)
  • Also publish polars-python (#20933)
  • Remove verify_dict_indices_slice from main (#20928)
  • Add tests for already resolved issues (#20921)
  • Fix the verify_dict_indices codegen (#20920)
  • Add ProjectionContext in projection pushdown opt (#20918)

Thank you to all our contributors for making this release possible!
@FBruzzesi, @MarcoGorelli, @aberres, @alexander-beedie, @arnabanimesh, @bschoenmaeckers, @coastalwhite, @deanm0000, @dependabot[bot], @dimfeld, @eitsupi, @etiennebacher, @henryharbeck, @itamarst, @lmmx, @lukemanley, @mcrumiller, @mullimanko, @nameexhaustion, @orlp, @petrosbar, @ritchie46, @siddharth-vi, @skritsotalakis and @taureandyernv
