🚀 Performance improvements
- Reduce sharing in stringview arrays in new-streaming equijoin (#21129)
- Implement native Expr.count() on new-streaming (#21126)
- Speed up list operations that use amortized_iter() (#20964)
- Use Cow as output for rechunk and add rechunk_mut (#21116)
- Reduce arrow slice mmap overhead (#21113)
- Reduce conversion cost in chunked string gather (#21112)
- Enable prefiltered by default for new streaming (#21109)
- Enable parquet column expressions for streaming (#21101)
- Deduplicate buffers again in stringview concat kernel (#21098)
- Add dedicated concatenate kernels (#21080)
- Rechunk only once during join probe gather (#21072)
- Micro-optimise internal `DataFrame` height and width checks (#21071)
- Speed up `from_pandas` when converting frame with multi-index columns (#21063)
- Change default memory prefetch to MADV_WILLNEED (#21056)
- Remove cast to boolean after comparison in optimizer (#21022)
- Split last rowgroup among all threads in new-streaming parquet reader (#21027)
- Recombine into larger morsels in new-streaming join (#21008)
- Improve `list.min` and `list.max` performance for logical types (#20972)
- Ensure count queries select minimal columns (#20923) (see the example after this list)
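As an illustration of the count-query change above, a minimal sketch using the lazy API; the file path is hypothetical, and the optimized plan should show that only a minimal set of columns is read:

```python
import polars as pl

# Hypothetical Parquet file; any scannable source behaves the same way.
lf = pl.scan_parquet("data/events.parquet")

# A pure count query: after #20923 the planner should select only the
# minimal columns needed to compute the row count.
print(lf.select(pl.len()).collect())

# Inspect the optimized plan to see which columns are actually read.
print(lf.select(pl.len()).explain())
```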
✨ Enhancements
- Add projection pushdown to new streaming multiscan (#21139)
- Implement join on struct dtype (#21093)
- Use unique temporary directory path per user and restrict permissions (#21125)
- Enable ingest of objects supporting the PyCapsule interface via `from_arrow` (#21128)
- Enable new streaming multiscan for CSV (#21124)
- Support the `POLARS_MAX_CONCURRENT_SCANS` environment variable in multiscan for new streaming (#21127)
- Ensure AWS credential provider sources `AWS_PROFILE` from the environment after deserialization (#21121)
- Multi/Hive scans in new streaming engine (#21011)
- Add `linear_spaces` (#20941)
- IO plugins support lazy schema (#21079)
- Add `write_table()` function to Unity catalog client (#21089)
- Add `is_object` method to `PolarsDataType` class (#21074)
- Implement `merge_sorted` for binary (#21045)
- Hold string cache in new streaming engine and fix row-encoding (#21039)
- Add CredentialProviderAzure parameter to accept user-instantiated azure credential classes (#21047)
- Expose unity catalog dataclasses and type aliases (#21046)
- Support max/min method for Time dtype (#19815)
- Implement a streaming merge sorted node (#20960)
- Automatically use temporary credentials API for scanning Unity catalog tables (#21020)
- Add negative slice support to new-streaming engine (#21001)
- Allow for more RG skipping by rewriting expr in planner (#20828)
- Rename catalog `schema` to `namespace` (#20993)
- Add functionality to create and delete catalogs, tables and schemas to Unity catalog client (#20956)
- Allow custom JSONEncoder for the `json_normalize` function, minor speedup (#20966)
- Support passing `aws_profile` in `storage_options` (#20965)
- Improved support for KeyboardInterrupts (#20961)
- Make the available `concat` alignment strategies more generic (#20644)
- Extract timezone info from Python datetimes (#20822)
- Add hint for `POLARS_AUTO_USE_AZURE_STORAGE_ACCOUNT_KEY` to error message (#20942)
- Filter Parquet pages with `ParquetColumnExpr` (#20714)
- Expose descending and nulls last in window order-by (#20919) (see the example after this list)
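A minimal sketch of two of the enhancements above: ordering a window computation with the newly exposed `descending` option (#20919), and passing `aws_profile` via `storage_options` (#20965). The bucket path and profile name are hypothetical, and anything beyond what those PRs describe is an assumption:

```python
import polars as pl

df = pl.DataFrame(
    {
        "group": ["a", "a", "a", "b", "b"],
        "time": [3, 1, 2, 2, 1],
        "value": [10, 20, 30, 40, 50],
    }
)

# Window order-by now exposes `descending` (and `nulls_last`), so this
# cumulative sum runs over each group in reverse time order.
out = df.with_columns(
    rev_cumsum=pl.col("value")
    .cum_sum()
    .over("group", order_by="time", descending=True)
)
print(out)

# `aws_profile` can now be passed through storage_options; it must match
# a profile in your local AWS config.
lf = pl.scan_parquet(
    "s3://my-bucket/data/*.parquet",
    storage_options={"aws_profile": "my-profile"},
)
```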
🐞 Bug fixes
- Fix `Expr.over` applying scale incorrectly for Decimal types (#21140)
- Fix IO plugin predicate with failed serialization (#21136)
- Ensure `lit` handles datetimes with tzinfo that represents a fixed offset from UTC (#21003)
- Correctly implement `take_(opt_)chunked_unchecked` for structs (#21134)
- Restore printing backtraces on panics (#21131)
- Use microseconds for Unity catalog datetime unit (#21122)
- Fix incorrect output height for SQL `SELECT COUNT(*) FROM` (#21108)
- Validate/coerce types for comparisons within `join_where` predicates (#21049)
- Do not auto-init credential providers if credential fetch returns error (#21090)
- Fix `join_where` incorrectly dropping transformations on RHS of equality expressions (#21067)
- Fix quadratic allocations when loading nested Parquet column metadata (#21050)
- Invalidate sortedness flag when sorting from pl.Categorical to pl.Categorical("lexical") (#21044)
- Fix `top_k` panicking on list types (#21043)
- Fix rolling on empty DataFrame panicking (#21042)
- Fix `set_tbl_width_chars` panicking with negative width (#20906)
- Ensure `write_excel` recognises the Array dtype and writes it out as a string (#20994)
- Fix `merge_sorted` producing incorrect results or panicking for some logical types (#21018)
- Fix all-null list aggregations returning Null dtype (#20992)
- Ensure scalar-only with_columns are broadcasted on new-streaming (#20983)
- Improve SQL interface behaviour when `INTERVAL` is not a fixed duration (#20958)
- Address minor regression for one-column DataFrame passed to `is_in` expressions (#20948)
- Add Arrow Float16 conversion DataType (#20970)
- Revert length check of `patterns` in `str.extract_many()` (#20953)
- Add maintain order for flaky new-streaming test (#20954)
- Allow for respawning of new streaming sinks (#20934)
- Ensure function name correctness in CSE (#20929)
- Don't consume c_stream as iterable (#20899)
- Validate `pl.Array` shape argument types (#20915)
- Fix `from_numpy` returning Null dtype for empty 1D numpy array (#20907)
- Consider the original dtypes when selecting columns in `write_excel` function (#20909)
- Handle boolean comparisons in Iceberg predicate pushdown (#18199)
- Fix `map_elements` panicking with Decimal type (#20905)
📖 Documentation
- Replace pandas `where` with `mask` in Migrating -> Coming from Pandas (#21085)
- Correct Arrow misconception (#21053)
- Add example showing use of `write_delta` with `delta_lake.WriterProperties` (#20746)
- Add missing `shape` param to `Array` docstring (#20747)
- Add IO plugins to Python API reference (#21028)
- Document IO plugins (#20982)
- Ensure `set_sorted` description references single-column behavior (#20709)
📦 Build system
- Speed up CI by running a few more tests in parallel (#21057)
🛠️ Other improvements
- Add test for equality filters in Parquet (#21114)
- Add various tests for open issues (#21075)
- Upgrade packages and apply latest formatting (#21086)
- Move python dsl and builder_dsl code to dsl folder (#21077)
- Organize python related logics in polars-plan (#21070)
- Improve binary dispatch (#21061)
- Skip physical order test (#21060)
- Fix new ruff lints (#21040)
- Add test to check the computation of `list.len` for null (#20938)
- Add `make fix` for running `cargo clippy --fix` (#21024)
- Add tests for resolved issues (#20999)
- Update code coverage workflow to use macos-latest runners (#20995)
- Remove unused arrow file (#20974)
- Deprecate the old streaming engine (#20949)
- Move `dt.replace` tests to dedicated file, add "typing :: typed" classifier, remove unused testing function (#20945)
- Extract merge sorted IR node (#20939)
- Update copyright year (#20764)
- Move Parquet deserialization to `BitmapBuilder` (#20896)
- Also publish polars-python (#20933)
- Remove `verify_dict_indices_slice` from main (#20928)
- Add tests for already resolved issues (#20921)
- Fix the `verify_dict_indices` codegen (#20920)
- Add `ProjectionContext` in projection pushdown opt (#20918)
Thank you to all our contributors for making this release possible!
@FBruzzesi, @MarcoGorelli, @aberres, @alexander-beedie, @arnabanimesh, @bschoenmaeckers, @coastalwhite, @deanm0000, @dependabot[bot], @dimfeld, @eitsupi, @etiennebacher, @henryharbeck, @itamarst, @lmmx, @lukemanley, @mcrumiller, @mullimanko, @nameexhaustion, @orlp, @petrosbar, @ritchie46, @siddharth-vi, @skritsotalakis and @taureandyernv