🚀 Performance improvements
- Rolling min/max for partially sorted data (#9819)
- use hash set in drop_many (#9807)
- Faster is_sorted when no flag set (#9777)
- optimize n_unique for integers (#9568)
- remove sort columns on multiple-key OOC sort (#9545)
- don't needlessly trigger bitcount (#9561)
- don't initialize memory before row-encoding (#9435)
- reduce page faults in q1 `~-30%` (#9423)
- reduce rayon/idle time in streaming (#9416)
- use row format in streaming join `~15%` (#9379)
- row encode buffer reuse (#9371)
- bytes row format for streaming groupby/unique keys `>3.5x` (#9346)
- push slices down map functions (#9350)
- increase streaming groupby spill size from 256 to 10_000 (#9312)
- Improve rolling min and max for data without nulls (#9277)
- slightly improve n_unique performance (#9286)
- speed up write_csv for time-zone-aware columns (#9093)
- parallelize rolling_window group materialization (#9095)
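
To illustrate the hash-set idea behind the `drop_many` change (#9807), here is a minimal plain-Python sketch. It is not the Polars implementation: the function name `drop_many_sketch` and the representation of a frame as a list of `(name, values)` pairs are assumptions made only for this example.

```python
def drop_many_sketch(columns, names_to_drop):
    """Return the columns whose names are not in names_to_drop.

    `columns` is a list of (name, values) pairs. This stands in for a
    DataFrame and is a hypothetical representation, not the Polars one.
    """
    # Build the set once so each membership test is O(1) on average,
    # instead of scanning the list of names for every column.
    drop = set(names_to_drop)
    return [(name, values) for name, values in columns if name not in drop]


frame = [("a", [1, 2]), ("b", [3, 4]), ("c", [5, 6])]
kept = drop_many_sketch(frame, ["a", "c"])
print([name for name, _ in kept])  # prints ['b']
```

With `n` columns and `m` names to drop, the set lookup keeps the work at roughly `n + m` steps rather than `n * m`.
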
✨ Enhancements
- pass through unknown schema in unnest (#9896)
- access `OptState` in `LazyFrame` to unit-test optimization toggle methods. (#9883)
- respect and allow more options in eager json parsing (#9882)
- allow set_sorted in streaming (#9876)
- Expr.cat.get_categories expression (#9869)
- add `LENGTH` and `OCTET_LENGTH` string functions for SQL (#9860)
- `polars_warn!` macro (#9868)
- Add Run-length Encoding functions (#9826)
- add `include_key` parameter to `partition_by` (#9750)
- add `LEFT` string function for SQL (#9836)
- add `REGEXP_LIKE` function for SQL (both two and three parameter version) (#9838)
- add `maintain_order` argument to `sort`/`top_k`/`bottom_k` (#9672)
- add drop_many_amortized (#9814)
- Dedicated horizontal aggregation functions (#9752)
- implement with_row_count as private function (#9810)
- add support for SQL `SUBSTR` function (#9803)
- add SQL support for binary data and expand recognised SQL dtype strings (#9802)
- reworked comfy-table layout constraints, improving table wrapping/repr (#9744)
- allow qcut in window expressions (#9745)
- Improve cut and allow use in expressions (#9580)
- clearer message when stringcache-related errors occur (#9715)
- improve expression formatting (#9704)
- set string cache in window functions (#9705)
- raise on both sides of datetime/str comparison (#9692)
- support deserializing struct json into df (#9688)
- add tree formatter for expressions (#9684)
- add `.list.any()` and `.list.all()` (#9573)
- extend dtype/selector matching for `Datetime` with a "*" wildcard for timezones (#9641)
- add polars::VERSION (#9660)
- add symmetric difference to list set operations (#9655)
- add dt.base_utc_offset (#9636)
- add dt.dst_offset feature (#9629)
- allow specifying index order in `to_numpy` (#9592)
- accept expressions in `repeat` (#9614)
- set operations for list (#9599)
- add drop_first parameter for to_dummies (issue #8246) (#9143)
- raise if window size in rolling functions isn't strictly positive (#9465)
- add infer schema len to json_extract (#9478)
- Adds (Most) Remaining Trig Functions to `SQLContext` (#9453)
- update error handling msg for sql functions (#9474)
- add str.titlecase (#9457)
- raise if period is negative in groupby_rolling (#9445)
- add SQL `round` support (#9330)
- don't error for time-zone-aware parsing if time zone is UTC (#9414)
- support all numeric dtypes in serde (#9393)
- ensure part of the plan is streaming if aggregati… (#9387)
- add relaxed concatenation (#9382)
- add sql DROP TABLE (#9355)
- support ternary expressions in streaming (#9343)
- add decoding support for row format (#9339)
- add SQL support for null-aware equality checks (#9332)
- add SQL support for regular expression operators (`~`, `!~`, `~*`, and `!~*`) (#9327)
- support `//` integer floordiv operator in the SQL engine (#9324)
- serde for 'to_physical' expr (#9294)
- add join cardinality validation (#9278)
- keep sorted flag after Expr::truncate (#9275)
- add "sql_expr" function (#9248)
- rewrite correlation functions to expression architecture (#9258)
- keep sorted flag on `offset_by` (#9253)
- add intersection primitive for selector API (#9240)
- building blocks for expression expansion sets (#9231)
- Add ddof option to rolling_var and rolling_std (#8957)
- immediately flatten nested unions (#9220)
- support float expression on integers (#9210)
- add binary to list<u8> cast (#9161)
- add arr.unique expression (#9159)
- implement explode for DataType::Array (#9157)
- `Decimal` type: `sum`, `min`, `max` aggregations in `select` and `agg` context. (#9135)
- Decimal arithmetic (#9123)
- support decimals as cast types in csv parser (#9121)
- Improve error handling for `repeat` (#9117)
- conversion from `Utf8` to `Decimal`. (#9090)
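
For readers unfamiliar with the technique named in "Add Run-length Encoding functions (#9826)", the following is a minimal plain-Python sketch of run-length encoding. It does not use Polars, and the function name `rle_sketch` and its `(lengths, values)` return shape are assumptions chosen for illustration, not the API added in that PR.

```python
from itertools import groupby


def rle_sketch(values):
    """Return (lengths, run_values) for consecutive runs of equal items."""
    lengths, run_values = [], []
    # groupby yields one group per run of adjacent equal elements.
    for value, run in groupby(values):
        run_values.append(value)
        lengths.append(sum(1 for _ in run))
    return lengths, run_values


lengths, run_values = rle_sketch([1, 1, 2, 2, 2, 1])
print(lengths)     # prints [2, 3, 1]
print(run_values)  # prints [1, 2, 1]
```

Note that the trailing `1` forms its own run: run-length encoding groups adjacent equal values only, so it is not the same as counting unique values.
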
🐞 Bug fixes
- respect original series dtype when constructing `LitIter` (#9886)
- sum aggregation empty set is 0, not null (#9894)
- Allow None as exponent (#9880)
- preserve expression aliases when parsing SQL with `pl.sql_expr` (#9875)
- fmt unknown dtype (#9872)
- fix row-encode of 32 byte payloads (#9843)
- shrink_type on all-null columns (#9811)
- don't go into streaming engine when groupby by list (#9834)
- fix regex + exclude (#9827)
- potential integer overflow in drop_many_amortized (#9829)
- add `maintain_order` argument to `sort`/`top_k`/`bottom_k` (#9672)
- fix array concat and Series::fill_null (#9825)
- don't preserve sortedness in offset_by for tz-aware non-constant durations (#9818)
- Remove stray `arr.eval` references (#9821)
- fix row-encode of null data (#9813)
- allow +00:00 when loading from arrow (#9747)
- fix row-count schema (#9797)
- fix supertype detection (#9787)
- merge rev-maps when building list arrays of categoricals. (#9742)
- Loosen restrictions on cut expressions and add docs (#9730)
- Fix list symmetric difference (#9732)
- Fix list intersection (#9735)
- don't clear rev_map when categorical series is cle… (#9720)
- improve glob pattern testing (#9721)
- don't run hstack checks when using cached names (#9709)
- fix result dtype in date_range(..., eager=True) if duration contains "1s1d" (#9670)
- increment seed between samples (#9694)
- fix cse_plan invalid projection removal (#9700)
- fix ne_missing for booleans vs lit (#9693)
- raise if to_datetime would have parsed input incorrectly (#9675)
- respect time_zone in lazy date_range (#8591)
- redo weighted rolling var (#9609)
- Correct weighted rolling quantile definition (#9608)
- clear hashes buffer in generic streaming joins (#9612)
- stable list namespace output when all elements are … (#9610)
- validate time zone in cast and from_arrow operations (#9598)
- make json feature depend on "dtype-struct" feature (#9589)
- fix join suffix collision (#9579)
- fix sum consistency (#9576)
- fix take of array dtype (#9575)
- fix predicate pushdown case before sort (#9574)
- fix lazy schema of temporal_range functions when no alias is provided (#9543)
- change the path parameter from to (#9531)
- fix join validation when swapped (#9534)
- fix race condition in out-of-core sort (#9521)
- unset sortedness for local date and local datetime (#9515)
- maintain sortedness flags on append/extend (#9496)
- fix serde for small integer dtypes (#9495)
- raise if window size in rolling functions isn't strictly positive (#9465)
- groupby rolling with negative offset (#9428)
- date_range with unit microseconds was producing incorrect results (#9413)
- read_csv was parsing dates incorrectly when the dtype was overridden (#9420)
- Compute Spearman rank correlations using average ra… (#9415)
- Fix rolling min/max when window is empty (#9406)
- fix compilation of other rustc versions (#9392)
- list zip with (#9367)
- parquet + categorical (#9363)
- respect startby in groupby_dynamic when every is greater than 1d (#9362)
- raise groupby apply on empty frame (#9360)
- raise more informative error on string arguments (#9352)
- correct assertion (#9320)
- fix rolling weighted mean (#9292)
- raise on invalid sort_by (#9262)
- correct ne/e_missing schema (#9257)
- fix cached reproject offsets (#9254)
- delay opening files in streaming engine (#9251)
- ensure agg(F(lit)) == lit (#9222)
- don't stack overflow on concat(expressions) (#9214)
- clip window_size to length in rolling_apply (#9209)
- rolling_apply window_size == len (#9181)
- respect time zone in strptime/to_datetime when exact=False (#9171)
- make null chunking behavior equal to other dtypes (#9176)
- return single numpy array in Array dtype -> numpy (#9164)
- fix regression in boolean nulls comparison (#9142)
- fix struct null_count if fields are null arrays (#9151)
- categorical construction from null values (#9145)
- let `apply` caller determine if length needs to be checked. (#9140)
- struct `is_in` should upcast numeric types (#9110)
- json_extract on empty series (#9126)
- bubble up dtype when converting from arrow (#9120)
- groupby_rolling was returning incorrect results when offset was positive (#9082)
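
The entry "sum aggregation empty set is 0, not null (#9894)" follows the usual convention that an empty sum equals the additive identity. The plain-Python sketch below shows why that convention is convenient. It does not use Polars, and `sum_agg` and the `groups` dictionary are hypothetical names used only here.

```python
def sum_agg(values):
    """Sum a sequence of numbers, returning 0 for an empty input."""
    total = 0  # additive identity, so "no values" naturally yields 0
    for v in values:
        total += v
    return total


# A group with no rows still gets a numeric total.
groups = {"a": [1, 2, 3], "b": []}
totals = {key: sum_agg(vals) for key, vals in groups.items()}
print(totals)  # prints {'a': 6, 'b': 0}
```

With this convention, splitting data into partitions and adding the partial sums gives the same result as summing everything at once, even when some partitions are empty.
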
🛠️ Other improvements
- Rolling quantile and median use DynArgs (#9867)
- Clean up workspace definition (#9861)
- Fix all clippy warnings in the test suite (#9839)
- Refactor failing test (#9823)
- Remove stray `arr.eval` references (#9821)
- fix cut features (#9808)
- cluster file scans in one node (#9799)
- Remove old cut/qcut (#9763)
- Small updates to issue templates (#9789)
- unswap from_tz and to_tz in replace_timezone (#9768)
- More cleanup around `arange` (#9769)
- More cleanup for `arange` (#9681)
- Fix small typo (#9714)
- refactor `arange` and add `int_range`/`int_ranges` (#9666)
- clean up inconsistencies in duration string language (#9551)
- ensure date-range integration test runs in CI (#9554)
- remove some redundancies in sort (#9541)
- Fix some doc examples (#9405)
- Remove outdated badges from README (#9532)
- don't pickle pyarrow dataset (#9523)
- Remove StdWindow in rolling (#9486)
- remove unreachable code (#9463)
- note that weekday is actually ISO weekday (#9440)
- Add some documentation on the CI workflows (#9404)
- fix typo in polars-lazy docs (#9354)
- Utilize caching in test job (#9301)
- Caching for benchmark workflow (#9267)
- Further CI cleanup for Rust lints (#9260)
- Separate workflow for Rust lints (#9245)
- Fix itoap dependency specification (#9239)
- Fix more broken links (#9230)
- Fix some doc links (#9227)
- Fix unused import warning in release build (#9224)
- split up dsl::functions module (#9213)
- update object_store requirement from 0.5.3 to 0.6.0 (#9154)
- simplify slow datetime parser (#9183)
- remove outdated struct, improve naming (#9172)
- change decimal inference and argument order (#9133)
- Include license file in polars-json crate (#9113)
- Remove dbg statement from CoreJsonReader (#9114)
- use concrete type for time zones (#9076)
Thank you to all our contributors for making this release possible!
@0xbe7a, @AnatolyBuga, @CloseChoice, @DeflateAwning, @EdmundsEcho, @MarcoGorelli, @SeanTroyUWO, @alexander-beedie, @ankane, @avimallu, @baggiponte, @bfeif, @borchero, @braaannigan, @c-peters, @datapythonista, @dependabot, @dependabot[bot], @dkrako, @durandtibo, @eitsupi, @guanqun, @jeroenjanssens, @jonashaag, @jorisSchaller, @josh, @kljensen, @lorentzenchr, @magarick, @mcrumiller, @messense, @mgperry, @mishpat, @moritzwilksch, @ritchie46, @sorhawell, @stinodego, @tarrafil, @thomascamminady, @ttencate, @universalmind303 and @zundertj