pola-rs/polars rs-0.31.1 on GitHub

🚀 Performance improvements

Rolling min/max for partially sorted data (#9819)
use hash set in drop_many (#9807)
Faster is_sorted when no flag set (#9777)
optimize n_unique for integers (#9568)
remove sort columns on multiple-key OOC sort (#9545)
don't needlessly trigger bitcount (#9561)
don't initialize memory before row-encoding (#9435)
reduce page faults in q1 ~-30% (#9423)
reduce rayon/idle time in streaming (#9416)
use row format in streaming join ~15% (#9379)
row encode buffer reuse (#9371)
bytes row format for streaming groupby/unique keys >3.5x (#9346)
push slices down map functions (#9350)
increase streaming groupby spill size from 256 to 10_000 (#9312)
Improve rolling min and max for nonulls (#9277)
slightly improve n_unique performance (#9286)
speed up write_csv for time-zone-aware columns (#9093)
parallelize rolling_window group materialization (#9095)

✨ Enhancements

pass through unknown schema in unnest (#9896)
access OptState in LazyFrame to unit-test optimization toggle methods. (#9883)
respect and allow more options in eager json parsing (#9882)
allow set_sorted in streaming (#9876)
Expr.cat.get_categories expression (#9869)
add LENGTH and OCTET_LENGTH string functions for SQL (#9860)
polars_warn! macro (#9868)
Add Run-length Encoding functions (#9826)
add include_key parameter to partition_by (#9750)
add LEFT string function for SQL (#9836)
add REGEXP_LIKE function for SQL (both two and three parameter version) (#9838)
add maintain_order argument to sort/top_k/bottom_k (#9672)
add drop_many_amortized (#9814)
Dedicated horizontal aggregation functions (#9752)
implement with_row_count as private function (#9810)
add support for SQL SUBSTR function (#9803)
add SQL support for binary data and expand recognised SQL dtype strings (#9802)
reworked comfy-table layout constraints, improving table wrapping/repr (#9744)
allow qcut in window expressions (#9745)
Improve cut and allow use in expressions (#9580)
clearer message when stringcache-related errors occur (#9715)
improve expression formatting (#9704)
set string cache in window functions (#9705)
raise on both sides of datetime/str comparison (#9692)
support deserializing struct json into df (#9688)
add tree formatter for expressions (#9684)
add .list.any() and .list.all() (#9573)
extend dtype/selector matching for Datetime with a "*" wildcard for timezones (#9641)
add polars::VERSION (#9660)
add symmetric difference to list set operations (#9655)
add dt.base_utc_offset (#9636)
add dt.dst_offset feature (#9629)
allow to specify index order in to_numpy (#9592)
accept expressions in repeat (#9614)
set operations for list (#9599)
add drop_first parameter for to_dummies (issue #8246) (#9143)
raise if window size in rolling functions isn't strictly positive (#9465)
add infer schema len to json_extract (#9478)
Adds (Most) Remaining Trig Functions to SQLContext (#9453)
update error handling msg for sql functions (#9474)
add str.titlecase (#9457)
raise if period is negative in groupby_rolling (#9445)
add SQL round support (#9330)
don't error for time-zone-aware parsing if time zone is UTC (#9414)
support all numeric dtypes in serde (#9393)
ensure part of the plan is streaming if aggregati… (#9387)
add relaxed concatenation (#9382)
add sql DROP TABLE (#9355)
support ternary expressions in streaming (#9343)
add decoding support for row format (#9339)
add SQL support for null-aware equality checks (#9332)
add SQL support for regular expression operators (~, !~, ~*, and !~*) (#9327)
support // integer floordiv operator in the SQL engine (#9324)
serde for 'to_physical' expr (#9294)
add join cardinality validation (#9278)
keep sorted flag after Expr::truncate (#9275)
add "sql_expr" function (#9248)
rewrite correlation functions to expression architecture (#9258)
keep sorted flag on offset_by (#9253)
add intersection primitive for selector API (#9240)
building blocks for expression expansion sets (#9231)
Add ddof option to rolling_var and rolling_std (#8957)
immediately flatten nested unions (#9220)
support float expression on integers (#9210)
add binary to list<u8> cast (#9161)
add arr.unique expression (#9159)
implement explode for DataType::Array (#9157)
Decimal type: sum, min, max aggregations in select and agg context. (#9135)
Decimal arithmetic (#9123)
support decimals as cast types in csv parser (#9121)
Improve error handling for repeat (#9117)
conversion from Utf8 to Decimal. (#9090)

🐞 Bug fixes

respect original series dtype when constructing LitIter (#9886)
sum aggregation empty set is 0, not null (#9894)
Allow None as exponent (#9880)
preserve expression aliases when parsing SQL with pl.sql_expr (#9875)
fmt unknown dtype (#9872)
fix row-encode of 32 byte payloads (#9843)
shrink_type on all-null columns (#9811)
don't go into streaming engine when groupby by list (#9834)
fix regex + exclude (#9827)
potential integer overflow in drop_many_amortized (#9829)
add maintain_order argument to sort/top_k/bottom_k (#9672)
fix array concat and Series::fill_null (#9825)
don't preserve sortedness in offset_by for tz-aware non-constant durations (#9818)
Remove stray arr.eval references (#9821)
fix row-encode of null data (#9813)
allow +00:00 when loading from arrow (#9747)
fix row-count schema (#9797)
fix supertype detection (#9787)
merge rev-maps when building list arrays of categoricals. (#9742)
Loosen restrictions on cut expressions and add docs (#9730)
Fix list symmetric difference (#9732)
Fix list intersection (#9735)
don't clear rev_map when categorical series is cle… (#9720)
improve glob pattern testing (#9721)
don't run hstack checks when using cached names (#9709)
fix result dtype in date_range(..., eager=True) if duration contains "1s1d" (#9670)
increment seed between samples (#9694)
fix cse_plan invalid projection removal (#9700)
fix ne_missing for booleans vs lit (#9693)
raise if to_datetime would have parsed input incorrectly (#9675)
respect time_zone in lazy date_range (#8591)
redo weighted rolling var (#9609)
Correct weighted rolling quantile definition (#9608)
clear hashes buffer in generic streaming joins (#9612)
stable list namespace output when all elements are … (#9610)
validate time zone in cast and from_arrow operations (#9598)
make json feature depend on "dtype-struct" feature (#9589)
fix join suffix collision (#9579)
fix sum consistency (#9576)
fix take of array dtype (#9575)
fix predicate pushdown case before sort (#9574)
fix lazy schema of temporal_range functions when no alias is provided (#9543)
change the path parameter from to (#9531)
fix join validation when swapped (#9534)
fix race condition in out-of-core sort (#9521)
unset sortedness for local date and local datetime (#9515)
maintain sortedness flags on append/extend (#9496)
fix serde for small integer dtypes (#9495)
raise if window size in rolling functions isn't strictly positive (#9465)
groupby rolling with negative offset (#9428)
date_range with unit microseconds was producing incorrect results (#9413)
read_csv was parsing dates incorrectly when the dtype was overridden (#9420)
Compute Spearman rank correlations using average ra… (#9415)
Fix rolling min/max when window is empty (#9406)
fix compilation of other rustc versions (#9392)
list zip with (#9367)
parquet + categorical (#9363)
respect startby in groupby_dynamic when every is greater than 1d (#9362)
raise groupby apply on empty frame (#9360)
raise more informative error on string arguments (#9352)
correct assertion (#9320)
fix rolling weighted mean (#9292)
raise on invalid sort_by (#9262)
correct ne/e_missing schema (#9257)
fix cached reproject offsets (#9254)
delay opening files in streaming engine (#9251)
ensure agg(F(lit)) == lit (#9222)
don't SO on concat(expressions) (#9214)
clip window_size to length in rolling_apply (#9209)
rolling_apply window_size == len (#9181)
respect time zone in strptime/to_datetime when exact=False (#9171)
make null chunking behavior equal to other dtypes (#9176)
return single numpy array in Array dtype -> numpy (#9164)
fix regression in boolean nulls comparison (#9142)
fix struct null_count if fields are null arrays (#9151)
categorical construction from null values (#9145)
let apply caller determine if length needs to be checked. (#9140)
struct is_in should upcast numeric types (#9110)
json_extract on empty series (#9126)
bubble up dtype when converting from arrow (#9120)
groupby_rolling was returning incorrect results when offset was positive (#9082)

🛠️ Other improvements

Rolling quantile and median use DynArgs (#9867)
Clean up workspace definition (#9861)
Fix all clippy warnings in the test suite (#9839)
Refactor failing test (#9823)
Remove stray arr.eval references (#9821)
fix cut features (#9808)
cluster file scans in one node (#9799)
Remove old cut/qcut (#9763)
Small updates to issue templates (#9789)
unswap from_tz and to_tz in replace_timezone (#9768)
More cleanup around arange (#9769)
More cleanup for arange (#9681)
Fix small typo (#9714)
refactor arange and add int_range/int_ranges (#9666)
clean up inconsistencies in duration string language (#9551)
ensure date-range integration test runs in CI (#9554)
remove some redundancies in sort (#9541)
Fix some doc examples (#9405)
Remove outdated badges from README (#9532)
don't pickle pyarrow dataset (#9523)
Remove StdWindow in rolling (#9486)
remove unreachable code (#9463)
note that weekday is actually ISO weekday (#9440)
Add some documentation on the CI workflows (#9404)
fix typo in polars-lazy docs (#9354)
Utilize caching in test job (#9301)
Caching for benchmark workflow (#9267)
Further CI cleanup for Rust lints (#9260)
Separate workflow for Rust lints (#9245)
Fix itoap dependency specification (#9239)
Fix more broken links (#9230)
Fix some doc links (#9227)
Fix unused import warning in release build (#9224)
split up dsl::functions module (#9213)
update object_store requirement from 0.5.3 to 0.6.0 (#9154)
simplify slow datetime parser (#9183)
remove outdated struct, improve naming (#9172)
change decimal inference and argument order (#9133)
Include license file in polars-json crate (#9113)
Remove dbg statement from CoreJsonReader (#9114)
use concrete type for time zones (#9076)

Thank you to all our contributors for making this release possible!
@0xbe7a, @AnatolyBuga, @CloseChoice, @DeflateAwning, @EdmundsEcho, @MarcoGorelli, @SeanTroyUWO, @alexander-beedie, @ankane, @avimallu, @baggiponte, @bfeif, @borchero, @braaannigan, @c-peters, @datapythonista, @dependabot, @dependabot[bot], @dkrako, @durandtibo, @eitsupi, @guanqun, @jeroenjanssens, @jonashaag, @jorisSchaller, @josh, @kljensen, @lorentzenchr, @magarick, @mcrumiller, @messense, @mgperry, @mishpat, @moritzwilksch, @ritchie46, @sorhawell, @stinodego, @tarrafil, @thomascamminady, @ttencate, @universalmind303 and @zundertj
