Python Polars 1.22.0


🚀 Performance improvements

  • Reduce sharing in stringview arrays in new-streaming equijoin (#21129)
  • Implement native Expr.count() on new-streaming (#21126)
  • Speed up list operations that use amortized_iter() (#20964)
  • Use Cow as output for rechunk and add rechunk_mut (#21116)
  • Reduce arrow slice mmap overhead (#21113)
  • Reduce conversion cost in chunked string gather (#21112)
  • Enable prefiltered by default for new streaming (#21109)
  • Enable parquet column expressions for streaming (#21101)
  • Deduplicate buffers again in stringview concat kernel (#21098)
  • Add dedicated concatenate kernels (#21080)
  • Rechunk only once during join probe gather (#21072)
  • Micro-optimise internal DataFrame height and width checks (#21071)
  • Speed up from_pandas when converting frame with multi-index columns (#21063)
  • Change default memory prefetch to MADV_WILLNEED (#21056)
  • Remove cast to boolean after comparison in optimizer (#21022)
  • Split last rowgroup among all threads in new-streaming parquet reader (#21027)
  • Recombine into larger morsels in new-streaming join (#21008)
  • Improve list.min and list.max performance for logical types (#20972)
  • Ensure count queries select minimal columns (see the sketch after this list) (#20923)
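
A minimal sketch of the kind of query that the count-query and Parquet prefiltering work above targets (assuming a local data.parquet file with a value column; both names are placeholders):

```python
import polars as pl

lazy = pl.scan_parquet("data.parquet")

# Count query: only the minimal columns/metadata needed to produce the
# row count should be read.
n_rows = lazy.select(pl.len()).collect()

# Filtered scan: the predicate can be applied while reading the file
# (prefiltering), so non-matching rows are skipped early.
filtered = lazy.filter(pl.col("value") > 0).collect()
```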

✨ Enhancements

  • Add projection pushdown to new streaming multiscan (#21139)
  • Implement join on struct dtype (see the sketch after this list) (#21093)
  • Use unique temporary directory path per user and restrict permissions (#21125)
  • Enable ingest of objects supporting the PyCapsule interface via from_arrow (#21128)
  • Enable new streaming multiscan for CSV (#21124)
  • Support the POLARS_MAX_CONCURRENT_SCANS environment variable in multiscan for the new streaming engine (#21127)
  • Ensure AWS credential provider sources AWS_PROFILE from environment after deserialization (#21121)
  • Multi/Hive scans in new streaming engine (#21011)
  • Add linear_spaces (#20941)
  • IO plugins support lazy schema (#21079)
  • Add write_table() function to Unity catalog client (#21089)
  • Add is_object method to Polars DataType class (#21074)
  • Implement merge_sorted for binary (#21045)
  • Hold string cache in new streaming engine and fix row-encoding (#21039)
  • Add CredentialProviderAzure parameter to accept user-instantiated azure credential classes (#21047)
  • Expose unity catalog dataclasses and type aliases (#21046)
  • Support max/min method for Time dtype (#19815)
  • Implement a streaming merge sorted node (#20960)
  • Automatically use temporary credentials API for scanning Unity catalog tables (#21020)
  • Add negative slice support to new-streaming engine (#21001)
  • Allow for more row-group skipping by rewriting exprs in the planner (#20828)
  • Rename catalog schema to namespace (#20993)
  • Add functionality to create and delete catalogs, tables and schemas to Unity catalog client (#20956)
  • Allow custom JSONEncoder for the json_normalize function, minor speedup (#20966)
  • Support passing aws_profile in storage_options (#20965)
  • Improve support for KeyboardInterrupt (#20961)
  • Make the available concat alignment strategies more generic (#20644)
  • Extract timezone info from python datetimes (#20822)
  • Add hint for POLARS_AUTO_USE_AZURE_STORAGE_ACCOUNT_KEY to error message (#20942)
  • Filter Parquet pages with ParquetColumnExpr (#20714)
  • Expose descending and nulls last in window order-by (#20919)
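
A minimal sketch of joining on a struct-typed key column, as referenced above (column names and values are illustrative only):

```python
import polars as pl

left = pl.DataFrame(
    {
        "key": [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}],  # struct column
        "lhs": [10, 20],
    }
)
right = pl.DataFrame(
    {
        "key": [{"a": 2, "b": "y"}],
        "rhs": [99],
    }
)

# Equi-join directly on the struct-typed key column.
joined = left.join(right, on="key", how="inner")
```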

🐞 Bug fixes

  • Fix Expr.over applying scale incorrectly for Decimal types (see the sketch after this list) (#21140)
  • Fix IO plugin predicate with failed serialization (#21136)
  • Ensure lit handles datetimes with tzinfo that represents a fixed offset from UTC (#21003)
  • Correctly implement take_(opt_)chunked_unchecked for structs (#21134)
  • Restore printing backtraces on panics (#21131)
  • Use microseconds for Unity catalog datetime unit (#21122)
  • Fix incorrect output height for SQL SELECT COUNT(*) FROM (#21108)
  • Validate/coerce types for comparisons within join_where predicates (#21049)
  • Do not auto-init credential providers if credential fetch returns error (#21090)
  • Fix join_where incorrectly dropping transformations on RHS of equality expressions (#21067)
  • Fix quadratic allocations when loading nested Parquet column metadata (#21050)
  • Invalidate sortedness flag when sorting from pl.Categorical to pl.Categorical("lexical") (#21044)
  • Fix panic when calling top_k on list type (#21043)
  • Fix rolling on empty DataFrame panicking (#21042)
  • Fix set_tbl_width_chars panicking with negative width (#20906)
  • Ensure write_excel recognises the Array dtype and writes it out as a string (#20994)
  • Fix merge_sorted producing incorrect results or panicking for some logical types (#21018)
  • Fix all-null list aggregations returning Null dtype (#20992)
  • Ensure scalar-only with_columns are broadcast on new-streaming (#20983)
  • Improve SQL interface behaviour when INTERVAL is not a fixed duration (#20958)
  • Address minor regression for one-column DataFrame passed to is_in expressions (#20948)
  • Add DataType conversion for Arrow Float16 (#20970)
  • Revert length check of patterns in str.extract_many() (#20953)
  • Add maintain order for flaky new-streaming test (#20954)
  • Allow for respawning of new streaming sinks (#20934)
  • Ensure function name correctness in CSE (#20929)
  • Don't consume c_stream as iterable (#20899)
  • Validate pl.Array shape argument types (#20915)
  • Fix from_numpy returning Null dtype for empty 1D numpy array (#20907)
  • Consider the original dtypes when selecting columns in write_excel function (#20909)
  • Handle boolean comparisons in Iceberg predicate pushdown (#18199)
  • Fix map_elements panicking with Decimal type (#20905)
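
A minimal sketch of the window-over-Decimal pattern addressed by the first fix above (data, column names, and scale are illustrative):

```python
from decimal import Decimal

import polars as pl

df = pl.DataFrame(
    {
        "group": ["a", "a", "b"],
        "amount": [Decimal("1.10"), Decimal("2.20"), Decimal("3.30")],
    },
    schema={"group": pl.String, "amount": pl.Decimal(scale=2)},
)

# The per-group sum should keep the Decimal scale of "amount" rather than
# applying it a second time.
out = df.with_columns(total=pl.col("amount").sum().over("group"))
```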

📖 Documentation

  • Replace pandas where with mask in Migrating -> Coming from Pandas (#21085)
  • Correct Arrow misconception (#21053)
  • Add example showing use of write_delta with delta_lake.WriterProperties (#20746)
  • Add missing shape param to Array docstring (#20747)
  • Add IO plugins to Python API reference (#21028)
  • Document IO plugins (#20982)
  • Ensure set_sorted description references single-column behavior (see the sketch after this list) (#20709)
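
For reference, set_sorted flags a single column as already sorted; a minimal sketch (column names are illustrative):

```python
import polars as pl

# Marking "time" as sorted lets downstream operations that rely on sorted
# input (e.g. group_by_dynamic) skip an explicit sort.
lf = pl.LazyFrame({"time": [1, 2, 3], "value": [1.0, 2.0, 3.0]}).set_sorted("time")
```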

📦 Build system

  • Speed up CI by running a few more tests in parallel (#21057)

🛠️ Other improvements

  • Add test for equality filters in Parquet (#21114)
  • Add various tests for open issues (#21075)
  • Upgrade packages and apply latest formatting (#21086)
  • Move Python dsl and builder_dsl code to the dsl folder (#21077)
  • Organize Python-related logic in polars-plan (#21070)
  • Improve binary dispatch (#21061)
  • Skip physical order test (#21060)
  • Fix new ruff lints (#21040)
  • Add test for the computation of list.len on null values (#20938)
  • Add make fix target for running cargo clippy --fix (#21024)
  • Add tests for resolved issues (#20999)
  • Update code coverage workflow to use macos-latest runners (#20995)
  • Remove unused arrow file (#20974)
  • Deprecate the old streaming engine (#20949)
  • Move dt.replace tests to a dedicated file, add "Typing :: Typed" classifier, remove unused testing function (#20945)
  • Extract merge sorted IR node (#20939)
  • Update copyright year (#20764)
  • Move Parquet deserialization to BitmapBuilder (#20896)
  • Also publish polars-python (#20933)
  • Remove verify_dict_indices_slice from main (#20928)
  • Add tests for already resolved issues (#20921)
  • Fix the verify_dict_indices codegen (#20920)
  • Add ProjectionContext in projection pushdown opt (#20918)

Thank you to all our contributors for making this release possible!
@FBruzzesi, @MarcoGorelli, @aberres, @alexander-beedie, @arnabanimesh, @bschoenmaeckers, @coastalwhite, @deanm0000, @dependabot[bot], @dimfeld, @eitsupi, @etiennebacher, @henryharbeck, @itamarst, @lmmx, @lukemanley, @mcrumiller, @mullimanko, @nameexhaustion, @orlp, @petrosbar, @ritchie46, @siddharth-vi, @skritsotalakis and @taureandyernv
