github pola-rs/polars rs-0.31.1
Rust Polars 0.31.1

🚀 Performance improvements

  • Rolling min/max for partially sorted data (#9819)
  • use hash set in drop_many (#9807)
  • Faster is_sorted when no flag set (#9777)
  • optimize n_unique for integers (#9568)
  • remove sort columns on multiple-key OOC sort (#9545)
  • don't needlessly trigger bitcount (#9561)
  • don't initialize memory before row-encoding (#9435)
  • reduce page faults in q1 by ~30% (#9423)
  • reduce rayon/idle time in streaming (#9416)
  • use row format in streaming join ~15% (#9379)
  • row encode buffer reuse (#9371)
  • bytes row format for streaming groupby/unique keys >3.5x (#9346)
  • push slices down map functions (#9350)
  • increase streaming groupby spill size from 256 to 10_000 (#9312)
  • improve rolling min and max when there are no nulls (#9277)
  • slightly improve n_unique performance (#9286)
  • speed up write_csv for time-zone-aware columns (#9093)
  • parallelize rolling_window group materialization (#9095)
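
For readers curious about the idea behind "use hash set in drop_many" (#9807): a membership test against a hash set avoids rescanning the list of names to drop for every column. The sketch below uses only the Rust standard library. It is not the Polars implementation, and `drop_names` is a hypothetical helper introduced for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not a Polars API): return the column names
/// that are not listed in `to_drop`, preserving the original order.
fn drop_names<'a>(columns: &[&'a str], to_drop: &[&str]) -> Vec<&'a str> {
    // Build the set once so each lookup is O(1) on average,
    // instead of scanning `to_drop` for every column.
    let drop_set: HashSet<&str> = to_drop.iter().copied().collect();
    columns
        .iter()
        .copied()
        .filter(|name| !drop_set.contains(*name))
        .collect()
}
```

Calling `drop_names(&["a", "b", "c"], &["b"])` keeps `"a"` and `"c"` in their original order.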

✨ Enhancements

  • pass through unknown schema in unnest (#9896)
  • access OptState in LazyFrame to unit-test optimization toggle methods. (#9883)
  • respect and allow more options in eager json parsing (#9882)
  • allow set_sorted in streaming (#9876)
  • Expr.cat.get_categories expression (#9869)
  • add LENGTH and OCTET_LENGTH string functions for SQL (#9860)
  • polars_warn! macro (#9868)
  • Add Run-length Encoding functions (#9826)
  • add include_key parameter to partition_by (#9750)
  • add LEFT string function for SQL (#9836)
  • add REGEXP_LIKE function for SQL (both two and three parameter version) (#9838)
  • add maintain_order argument to sort/top_k/bottom_k (#9672)
  • add drop_many_amortized (#9814)
  • Dedicated horizontal aggregation functions (#9752)
  • implement with_row_count as private function (#9810)
  • add support for SQL SUBSTR function (#9803)
  • add SQL support for binary data and expand recognised SQL dtype strings (#9802)
  • reworked comfy-table layout constraints, improving table wrapping/repr (#9744)
  • allow qcut in window expressions (#9745)
  • Improve cut and allow use in expressions (#9580)
  • clearer message when stringcache-related errors occur (#9715)
  • improve expression formatting (#9704)
  • set string cache in window functions (#9705)
  • raise on both sides of datetime/str comparison (#9692)
  • support deserializing struct json into df (#9688)
  • add tree formatter for expressions (#9684)
  • add .list.any() and .list.all() (#9573)
  • extend dtype/selector matching for Datetime with a "*" wildcard for timezones (#9641)
  • add polars::VERSION (#9660)
  • add symmetric difference to list set operations (#9655)
  • add dt.base_utc_offset (#9636)
  • add dt.dst_offset feature (#9629)
  • allow to specify index order in to_numpy (#9592)
  • accept expressions in repeat (#9614)
  • set operations for list (#9599)
  • add drop_first parameter for to_dummies (issue #8246) (#9143)
  • raise if window size in rolling functions isn't strictly positive (#9465)
  • add infer schema len to json_extract (#9478)
  • Adds (Most) Remaining Trig Functions to SQLContext (#9453)
  • update error handling msg for sql functions (#9474)
  • add str.titlecase (#9457)
  • raise if period is negative in groupby_rolling (#9445)
  • add SQL round support (#9330)
  • don't error for time-zone-aware parsing if time zone is UTC (#9414)
  • support all numeric dtypes in serde (#9393)
  • ensure part of the plan is streaming if aggregati… (#9387)
  • add relaxed concatenation (#9382)
  • add sql DROP TABLE (#9355)
  • support ternary expressions in streaming (#9343)
  • add decoding support for row format (#9339)
  • add SQL support for null-aware equality checks (#9332)
  • add SQL support for regular expression operators (~, !~, ~*, and !~*) (#9327)
  • support // integer floordiv operator in the SQL engine (#9324)
  • serde for 'to_physical' expr (#9294)
  • add join cardinality validation (#9278)
  • keep sorted flag after Expr::truncate (#9275)
  • add "sql_expr" function (#9248)
  • rewrite correlation functions to expression architecture (#9258)
  • keep sorted flag on offset_by (#9253)
  • add intersection primitive for selector API (#9240)
  • building blocks for expression expansion sets (#9231)
  • Add ddof option to rolling_var and rolling_std (#8957)
  • immediately flatten nested unions (#9220)
  • support float expression on integers (#9210)
  • add binary to list<u8> cast (#9161)
  • add arr.unique expression (#9159)
  • implement explode for DataType::Array (#9157)
  • Decimal type: sum, min, max aggregations in select and agg context. (#9135)
  • Decimal arithmetic (#9123)
  • support decimals as cast types in csv parser (#9121)
  • Improve error handling for repeat (#9117)
  • conversion from Utf8 to Decimal. (#9090)
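
As background for "Add Run-length Encoding functions" (#9826): run-length encoding collapses each run of consecutive equal values into a (length, value) pair. The sketch below shows the general technique in plain Rust. It is an assumption-laden illustration, not the Polars API or its output layout, and `rle` here is a hypothetical free function.

```rust
/// Hypothetical helper (not a Polars API): run-length encode a slice
/// into (run_length, value) pairs.
fn rle<T: PartialEq + Clone>(values: &[T]) -> Vec<(usize, T)> {
    let mut runs: Vec<(usize, T)> = Vec::new();
    for v in values {
        // Extend the current run if the value repeats.
        if let Some((len, last)) = runs.last_mut() {
            if *last == *v {
                *len += 1;
                continue;
            }
        }
        // Otherwise start a new run of length 1.
        runs.push((1, v.clone()));
    }
    runs
}
```

For the input `[1, 1, 2, 3, 3, 3]` this yields the runs `(2, 1)`, `(1, 2)`, `(3, 3)`.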

🐞 Bug fixes

  • respect original series dtype when constructing LitIter (#9886)
  • sum aggregation empty set is 0, not null (#9894)
  • Allow None as exponent (#9880)
  • preserve expression aliases when parsing SQL with pl.sql_expr (#9875)
  • fmt unknown dtype (#9872)
  • fix row-encode of 32 byte payloads (#9843)
  • shrink_type on all-null columns (#9811)
  • don't go into streaming engine when groupby by list (#9834)
  • fix regex + exclude (#9827)
  • potential integer overflow in drop_many_amortized (#9829)
  • add maintain_order argument to sort/top_k/bottom_k (#9672)
  • fix array concat and Series::fill_null (#9825)
  • don't preserve sortedness in offset_by for tz-aware non-constant durations (#9818)
  • Remove stray arr.eval references (#9821)
  • fix row-encode of null data (#9813)
  • allow +00:00 when loading from arrow (#9747)
  • fix row-count schema (#9797)
  • fix supertype detection (#9787)
  • merge rev-maps when building list arrays of categoricals. (#9742)
  • Loosen restrictions on cut expressions and add docs (#9730)
  • Fix list symmetric difference (#9732)
  • Fix list intersection (#9735)
  • don't clear rev_map when categorical series is cle… (#9720)
  • improve glob pattern testing (#9721)
  • don't run hstack checks when using cached names (#9709)
  • fix result dtype in date_range(..., eager=True) if duration contains "1s1d" (#9670)
  • increment seed between samples (#9694)
  • fix cse_plan invalid projection removal (#9700)
  • fix ne_missing for booleans vs lit (#9693)
  • raise if to_datetime would have parsed input incorrectly (#9675)
  • respect time_zone in lazy date_range (#8591)
  • redo weighted rolling var (#9609)
  • Correct weighted rolling quantile definition (#9608)
  • clear hashes buffer in generic streaming joins (#9612)
  • stable list namespace output when all elements are … (#9610)
  • validate time zone in cast and from_arrow operations (#9598)
  • make json feature depend on "dtype-struct" feature (#9589)
  • fix join suffix collision (#9579)
  • fix sum consistency (#9576)
  • fix take of array dtype (#9575)
  • fix predicate pushdown case before sort (#9574)
  • fix lazy schema of temporal_range functions when no alias is provided (#9543)
  • change the path parameter from to (#9531)
  • fix join validation when swapped (#9534)
  • fix race condition in out-of-core sort (#9521)
  • unset sortedness for local date and local datetime (#9515)
  • maintain sortedness flags on append/extend (#9496)
  • fix serde for small integer dtypes (#9495)
  • raise if window size in rolling functions isn't strictly positive (#9465)
  • groupby rolling with negative offset (#9428)
  • date_range with unit microseconds was producing incorrect results (#9413)
  • read_csv was parsing dates incorrectly when the dtype was overridden (#9420)
  • Compute Spearman rank correlations using average ra… (#9415)
  • Fix rolling min/max when window is empty (#9406)
  • fix compilation of other rustc versions (#9392)
  • list zip with (#9367)
  • parquet + categorical (#9363)
  • respect startby in groupby_dynamic when every is greater than 1d (#9362)
  • raise groupby apply on empty frame (#9360)
  • raise more informative error on string arguments (#9352)
  • correct assertion (#9320)
  • fix rolling weighted mean (#9292)
  • raise on invalid sort_by (#9262)
  • correct ne/e_missing schema (#9257)
  • fix cached reproject offsets (#9254)
  • delay opening files in streaming engine (#9251)
  • ensure agg(F(lit)) == lit (#9222)
  • don't SO on concat(expressions) (#9214)
  • clip window_size to length in rolling_apply (#9209)
  • rolling_apply window_size == len (#9181)
  • respect time zone in strptime/to_datetime when exact=False (#9171)
  • make null chunking behavior equal to other dtypes (#9176)
  • return single numpy array in Array dtype -> numpy (#9164)
  • fix regression in boolean nulls comparison (#9142)
  • fix struct null_count if fields are null arrays (#9151)
  • categorical construction from null values (#9145)
  • let apply caller determine if length needs to be checked. (#9140)
  • struct is_in should upcast numeric types (#9110)
  • json_extract on empty series (#9126)
  • bubble up dtype when converting from arrow (#9120)
  • groupby_rolling was returning incorrect results when offset was positive (#9082)
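
The fix "sum aggregation empty set is 0, not null" (#9894) follows the usual convention that an empty sum equals the additive identity, whereas aggregations such as max have no identity and stay undefined on empty input. The plain-Rust sketch below illustrates that convention only. The function names are hypothetical, and nothing here reflects Polars internals.

```rust
/// Hypothetical helper: the sum of an empty slice is 0,
/// the additive identity.
fn sum_or_zero(values: &[i64]) -> i64 {
    values.iter().sum()
}

/// Hypothetical helper: the maximum of an empty slice is undefined,
/// modeled here as `None`.
fn max_or_none(values: &[i64]) -> Option<i64> {
    values.iter().copied().max()
}
```

With an empty slice, `sum_or_zero` returns `0` while `max_or_none` returns `None`.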

🛠️ Other improvements

  • Rolling quantile and median use DynArgs (#9867)
  • Clean up workspace definition (#9861)
  • Fix all clippy warnings in the test suite (#9839)
  • Refactor failing test (#9823)
  • Remove stray arr.eval references (#9821)
  • fix cut features (#9808)
  • cluster file scans in one node (#9799)
  • Remove old cut/qcut (#9763)
  • Small updates to issue templates (#9789)
  • unswap from_tz and to_tz in replace_timezone (#9768)
  • More cleanup around arange (#9769)
  • More cleanup for arange (#9681)
  • Fix small typo (#9714)
  • refactor arange and add int_range/int_ranges (#9666)
  • clean up inconsistencies in duration string language (#9551)
  • ensure date-range integration test runs in CI (#9554)
  • remove some redundancies in sort (#9541)
  • Fix some doc examples (#9405)
  • Remove outdated badges from README (#9532)
  • don't pickle pyarrow dataset (#9523)
  • Remove StdWindow in rolling (#9486)
  • remove unreachable code (#9463)
  • note that weekday is actually ISO weekday (#9440)
  • Add some documentation on the CI workflows (#9404)
  • fix typo in polars-lazy docs (#9354)
  • Utilize caching in test job (#9301)
  • Caching for benchmark workflow (#9267)
  • Further CI cleanup for Rust lints (#9260)
  • Separate workflow for Rust lints (#9245)
  • Fix itoap dependency specification (#9239)
  • Fix more broken links (#9230)
  • Fix some doc links (#9227)
  • Fix unused import warning in release build (#9224)
  • split up dsl::functions module (#9213)
  • update object_store requirement from 0.5.3 to 0.6.0 (#9154)
  • simplify slow datetime parser (#9183)
  • remove outdated struct, improve naming (#9172)
  • change decimal inference and argument order (#9133)
  • Include license file in polars-json crate (#9113)
  • Remove dbg statement from CoreJsonReader (#9114)
  • use concrete type for time zones (#9076)

Thank you to all our contributors for making this release possible!
@0xbe7a, @AnatolyBuga, @CloseChoice, @DeflateAwning, @EdmundsEcho, @MarcoGorelli, @SeanTroyUWO, @alexander-beedie, @ankane, @avimallu, @baggiponte, @bfeif, @borchero, @braaannigan, @c-peters, @datapythonista, @dependabot, @dependabot[bot], @dkrako, @durandtibo, @eitsupi, @guanqun, @jeroenjanssens, @jonashaag, @jorisSchaller, @josh, @kljensen, @lorentzenchr, @magarick, @mcrumiller, @messense, @mgperry, @mishpat, @moritzwilksch, @ritchie46, @sorhawell, @stinodego, @tarrafil, @thomascamminady, @ttencate, @universalmind303 and @zundertj