🚀 Performance improvements
- Toggle projection pushdown for eager rolling (#21405)
- Fix pathologic `rolling + group-by` performance and memory explosion (#21403)
- Add sampling to new-streaming equi join to decide between build/probe side (#21197)
✨ Enhancements
- Implement i128 -> str cast (#21411)
- Connect polars-cloud (#21387)
- Version DSL (#21383)
- Make user facing binary formats mostly self describing (#21380)
- Filter hive files using predicates in new streaming (#21372)
- Add negative slicing to new streaming multiscan (#21219)
- Allow iterable of frames as input to `align_frames` (#21209)
- Implement sorted flags for struct series (#21290)
- Support reading arrow Map type from Delta (#21330)
- Add a dedicated `remove` method for `DataFrame` and `LazyFrame` (#21259)
- Rename `credentials` parameter to `credential` in `CredentialProviderAzure` (#21295)
- Implement `merge_sorted` for struct (#21205)
- Add positive slice for new streaming MultiScan (#21191)
- Don't take in rewriting visitor (#21212)
- Add SQL support for the `DELETE` statement (#21190)
- Add row index to new streaming multiscan (#21169)
- Improve DataFrame fmt in explain (#21158)
🐞 Bug fixes
- Method `dt.ordinal_day` was returning UTC results as opposed to those on the local timestamp (#21410)
- Use Kahan summation for rolling sum kernels. Fix numerical stability issues (#21413)
- Add scalar checks for `n` and `fill_value` parameters in `shift` (#21292)
- Upcast small integer dtypes for rolling sum operations (#21397)
- Don't silently produce null values from invalid input to `pl.datetime` and `pl.date` (#21013)
- Allow duration multiplied w/ primitive to propagate in IR schema (#21394)
- Struct arithmetic broadcasting behavior (#21382)
- Prefiltered optional plain primitive kernel (#21381)
- Panic when projecting only row index from IPC file (#21361)
- Properly update groups after `gather` in aggregation context (#21369)
- Mark test as may_fail_auto_streaming (#21373)
- Properly set `fast_unique` in EnumBuilder (#21366)
- Rust test race condition (#21368)
- Fix unequal DataFrame column heights from parquet hive scan with filter (#21340)
- Fix ColumnNotFound error selecting `len()` after semi/anti join (#21355)
- Merge Parquet nested and flat decoders (#21342)
- Incorrect atomic ordering in Connector (#21341)
- Method `dt.offset_by` was discarding month and year info if day was included in offset for timezone-aware columns (#21291)
- Fix pickling `polars.col` on Python versions <3.11 (#21333)
- Fix duplicate column names after join if suffix already present (#21315)
- Skip Batches Expression for boolean literals (#21310)
- Fix performance regression for eager `join_where` (#21308)
- Fix incorrect predicate pushdown for predicates referring to right-join key columns (#21293)
- Panic in `to_physical` for series of arrays and lists (#21289)
- Resolve deadlock due to leaking in Connector recv drop (#21296)
- Incorrect result for merge_sorted with lexical categorical (#21278)
- Add `Int128` path for `join_asof` (#21282)
- Categorical min/max returning String dtype rather than Categorical (#21232)
- Checking overflow in Sliced function (#21207)
- Adding a struct field using a literal raises InvalidOperationError (#21254)
- Return nulls for `is_finite`, `is_infinite`, and `is_nan` when dtype is `pl.Null` (#21253)
- Account for minor change in new `connectorx` release (#21277)
- Properly implement and test Skip Batch Predicate (#21269)
- Infinite recursion when broadcasting into struct zip_outer_validity (#21268)
- Deadlock due to bad logic in new-streaming join sampling (#21265)
- Incorrect result for top_k/bottom_k when input is sorted (#21264)
- UTF-8 validation of nested string slice in Parquet (#21262)
- Raise instead of panicking when casting a Series to a Struct with the wrong number of fields (#21213)
- Defer credential provider resolution to take place at query collection instead of construction (#21225)
- Do not panic in `strptime()` if `format` ends with '%' (#21176)
- Raise error instead of panicking for unsupported SQL operations (#20789)
- Projection of only row index in new streaming IPC (#21167)
- Fix projection count query optimization (#21162)
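
The Kahan summation mentioned in the rolling sum fix above (#21413) can be sketched in plain Python. This is only an illustration of the general technique, not the Rust kernel Polars uses; the function names and the test data are assumptions chosen to make the rounding loss visible.

```python
import math


def naive_sum(values):
    """Plain left-to-right floating-point accumulation."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Compensated summation: carry the rounding error forward."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = v - compensation            # re-apply the error lost previously
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was just lost
        total = t
    return total


# Illustrative data: one large value followed by many tiny ones, each of
# which is too small to change a running total of 1.0 on its own.
values = [1.0] + [1e-16] * 100_000

reference = math.fsum(values)  # correctly rounded reference sum
naive_error = abs(naive_sum(values) - reference)
kahan_error = abs(kahan_sum(values) - reference)
print(naive_error, kahan_error)
```

With this input the naive loop never moves off 1.0, while the compensated version stays close to the `math.fsum` reference. A rolling kernel would also need to handle values leaving the window, which this sketch does not attempt.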
📖 Documentation
- Fix doc for SQL Functions navigation (#21412)
- Fix initial selector example (#21321)
- Add pandas strictness API difference (#21312)
- Improve `Expr.name.map` docstring example (#21309)
- Add logo to Ask AI (#21261)
- Fix docs for Catalog (#21252)
- AI widget again (#21257)
- Revert plugin (#21250)
- Add kappa ask ai widget (#21243)
- Update social icons in API reference docs (#21214)
- Improve Arrow key feature description (#21171)
- Improve example in IO plugins user guide (#21146)
🛠️ Other improvements
- Move storage of hive partitions to DataFrame (#21364)
- Feature gate merge sorted in new streaming engine (#21338)
- Remove new streaming old multiscan (#21300)
- Add tests for fixed open issues (#21185)
- Try to mimic all steps (#21249)
- Require version for POLARS_VERSION (#21248)
- Fix docs (#21246)
- Avoid unnecessary `packaging` dependency (#21223)
- Remove unused file (#21240)
- Add use_field_init_shorthand = true to rustfmt (#21237)
- Don't mutate arena by default in Rewriting Visitor (#21234)
- Disable the TraceMalloc allocator (#21231)
- Add feature gate to old streaming deprecation warning (#21179)
- Install seaborn when running remote benchmark (#21168)
Thank you to all our contributors for making this release possible!
@GiovanniGiacometti, @JakubValtar, @MarcoGorelli, @Matt711, @Shoeboxam, @YichiZhang0613, @alexander-beedie, @bschoenmaeckers, @coastalwhite, @edwinvehmaanpera, @erikbrinkman, @etiennebacher, @hemanth94, @henryharbeck, @jqnatividad, @lukemanley, @mcrumiller, @nameexhaustion, @orlp, @r-brink, @ritchie46 and @ydagosto