github pola-rs/polars rs-0.34.0
Rust Polars 0.34.0

7 months ago

🏆 Highlights

  • postfix rolling expression as a special case of window functions. (#11445)
  • support 'hive partitioning' aware readers (#11284)
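
For readers unfamiliar with the term, "hive partitioning" refers to a directory layout in which column values are encoded in the path as `key=value` segments, for example `data/year=2023/month=10/part-0.parquet`. The sketch below only illustrates that naming convention using the standard library. It is not Polars' implementation, and the function name `hive_partitions` and the example path are hypothetical.

```rust
use std::collections::BTreeMap;

/// Extract `key=value` directory segments from a hive-style path.
/// Illustrative only: this is an assumption about the path convention,
/// not code taken from Polars.
fn hive_partitions(path: &str) -> BTreeMap<String, String> {
    let mut parts = BTreeMap::new();
    // Split on '/'; the final segment is the file name, so drop it.
    let mut segments: Vec<&str> = path.split('/').collect();
    segments.pop();
    for segment in segments {
        // Only directory names of the form `key=value` count as partitions.
        if let Some((key, value)) = segment.split_once('=') {
            if !key.is_empty() {
                parts.insert(key.to_string(), value.to_string());
            }
        }
    }
    parts
}
```

A partition-aware reader can use such key/value pairs to add the partition columns to the result and to skip files whose partition values cannot match a filter.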

💥 Breaking changes

  • Rename .list.lengths and .str.lengths (#11613)
  • Rename write_csv parameter quote to quote_char (#11583)
  • Add disable_string_cache (#11020)

🚀 Performance improvements

  • fix regression non-null asof join (#11984)
  • drastically improve performance of limit on async parquet datasets (#11965)
  • support multiple files in a single scan parquet node. (#11922)
  • fix accidental quadratic behavior; cache null_count (#11889)
  • fix quadratic behavior in append sorted check (#11893)
  • properly push down slice before left/asof join (#11854)
  • Improve performance of cot (cotangent) (#11717)
  • rechunk before grouping on multiple keys (#11711)
  • process parquet statistics before downloading row-group (#11709)
  • push down predicates that refer to group_by keys (#11687)
  • slightly faster float equality (#11652)
  • actually use projection information in async parquet reader (#11637)
  • improve performance and fix panic in async parquet reader (#11607)
  • use try_binary_elementwise over try_binary_elementwise_values (#11596)
  • skip empty chunks in concat (#11565)
  • improve sparse sample performance (#11544)
  • early return in replace_time_zone if target and source time zones match (#11478)
  • greatly improve parquet cloud reading (#11479)
  • ensure we download row-groups concurrently. (#11464)
  • don't load N metadata files when globbing N files (#11422)
  • remove double memcopy (#11365)
  • address perf regression (#11354)
  • improve dynamic_groupby_iter (#11341)
  • improve and fix rolling windows by linear scanning (#11326)
  • improve outer join materialization (#11241)
  • use ryu and itoa for primitive serialization (#11193)
  • use try-binary-elementwise instead of try-binary-elementwise-values in dt_truncate (#11189)
  • Using cache for str.contains regex compilation (#11183)

✨ Enhancements

  • optimize asof_join and allow null/string keys (#11712)
  • limit concurrent downloads in async parquet (#11971)
  • sample fraction can take an expr (#11943)
  • Add infer_schema_length to pl.read_json (#11724)
  • improve error handling in scan_parquet and deal with file limits (#11938)
  • support multiple files in a single scan parquet node. (#11922)
  • error instead of panic in unsupported sinks (#11915)
  • Introduce list.sample (#11845)
  • don't require empty config for cloud scan_parquet (#11819)
  • Expressify pct_change and move to ops (#11786)
  • add DATE function for SQL (#11541)
  • right-align numeric columns (#7475)
  • Add config setting to control how many List items are printed (#11409)
  • allow specifying schema in pl.scan_ndjson (#10963)
  • easier arrow2/arrow-rs conversion (#11666)
  • support multiple sources in scan_file (#11661)
  • allow coalesce in streaming (#11633)
  • Implement schema, schema_override for pl.read_json with array-like input (#11492)
  • add SQL support for UNION [ALL] BY NAME, add "diagonal_relaxed" strategy for pl.concat (#11597)
  • improve performance and fix panic in async parquet reader (#11607)
  • add time_unit argument to duration, default to "us" (#11586)
  • elide overflow checks on i64 (#11563)
  • add INITCAP string function for SQL (#9884)
  • Use IPC for (un)pickling dataframes/series (#11507)
  • support left and right anti/semi joins from the SQL interface (#11501)
  • expressify peak_min/peak_max (#11482)
  • IN(subquery) and SQL Subquery Infrastructure (#11218)
  • Format null arrays in Series (#11289)
  • postfix rolling expression as a special case of window functions. (#11445)
  • allow for "by" column to be of dtype Date in rolling_* functions (#11004)
  • support 'abfss' for azure (#11413)
  • multi-threaded async runtime (#11411)
  • async parquet. (#11403)
  • fail fast when invalid cloud settings; introduce retries arg (#11380)
  • modernize CPU features (#11351)
  • introduce 'label' instead of 'truncate' in group_by_dynamic, which can take label='right' (#11337)
  • Expressify list.shift (#11320)
  • add gather_skip_nulls implementation (#11329)
  • top_k and bottom_k support passing an expr (#11344)
  • support 'hive partitioning' aware readers (#11284)
  • str.strip_chars supports taking an expr argument (#11313)
  • sample n can take an expr (#11257)
  • Add disable_string_cache (#11020)
  • clip supports expr arguments and physical numeric dtype (#11288)
  • Introduce list.drop_nulls (#11272)
  • str.splitn and split_exact can take an expr for the by argument (#11275)
  • introduce ambiguous option for dt.round (#11269)
  • improve binary helper so we don't need to rechunk. (#11247)
  • Adds NULLIF and COALESCE SQL functions (#11124)
  • better tree-formatting representation (#11176)
  • Support duration + date (#11190)
  • binary search and rechunk in chunked gather (#11199)
  • Expressify str.strip_prefix & suffix (#11197)
  • sql udfs (#10957)
  • run cloud parquet reader in default engine (#11196)
  • list.join's separator can be expression (#11167)
  • argument every of datetime.truncate can be expression (#11155)

🐞 Bug fixes

  • fix streaming multi-column/multi-dtype sort (#11981)
  • ensure streaming parquet datasets deal with limits (#11977)
  • implement proper hash for identifier in cse (#11960)
  • fix take return dtype in group context. (#11949)
  • sql In should work without specific ops (#11947)
  • construct list series from any values subject to dtype (#11944)
  • avoid integer overflow in offsets_to_groups when bigidx is enabled (#11901)
  • read_csv for empty lines (#11924)
  • predicate push-down remove predicate refers to alias for more branch (#11887)
  • use physical append (#11894)
  • recursively apply cast_unchecked in lists (#11884)
  • recursively check allowed streaming dtypes (#11879)
  • fix projection pushdown for double projection containing count (#11843)
  • series.to_numpy fails with dtype=Null (#11858)
  • panic on hive scan from cloud (#11847)
  • Propagate validity when cast primitive to list (#11846)
  • Edge cases for list count formatting (#11780)
  • remove flag inconsistency 'map_many' (#11817)
  • ensure projections containing only hive columns are projected (#11803)
  • patch broken aHash AES intrinsics on ARM (#11801)
  • fix key in object-store cache (#11790)
  • handle logical types in plugins (#11788)
  • make PyLazyGroupby reusable (#11769)
  • only exclude final output names of group_by key expressions (#11768)
  • fix ambiguity wrt list aggregation states (#11758)
  • Correctly process subseconds in pl.duration (#11748)
  • LazyFrame.drop_columns overflow issue when columns.len()>schema.len() (#11716)
  • index_to_chunked_index's fast path is not correct (#11710)
  • use actual number of read rows for hive materialization (#11690)
  • return float dtype in interpolate (for method="linear") for numeric dtypes (#11624)
  • fix seg fault in concat_str of empty series (#11704)
  • Fix match on last item for join_asof with strategy="nearest" (#11673)
  • fix display str for peak_max and top_k (#11657)
  • Fix input replacement logic for slice (#11631)
  • slice expr can be taken in cse (#11628)
  • ensure nested logical types are converted to physical (#11621)
  • correctly convert nullability of nested parquet fields to arrow (#11619)
  • improve performance and fix panic in async parquet reader (#11607)
  • expand all literals before group_by (#11590)
  • mark take_group_last function as unsafe (#11587)
  • handle unary operators applied to numbers used in SQL IN clauses (#11574)
  • Align new_columns argument for scan_csv and read_csv (#11575)
  • don't conflate supported UNION ops in the SQL parser with (currently) unsupported UNION "BY NAME" variations (#11576)
  • incomplete reading of list types from parquet (#11578)
  • respect identity in horizontal sum (#11559)
  • bug in BitMask::get_u32 (#11560)
  • take slice into account in parallel unions (#11558)
  • correct schema empty df in hive partitioning read (#11557)
  • ensure ListChunked::full_null uses physical types (#11554)
  • respect 'hive_partitioning' argument in parquet (#11551)
  • fix parquet deserialization Overflow error by using i64 offset types when promoting Arrow Lists to LargeLists (#11549)
  • streamline is_in handling of mismatched dtypes and fix a minor regression (#11533)
  • catch use of non equi-joins in SQL interface and raise appropriate error (#11526)
  • rework SQL join constraint processing to properly account for all USING columns (#11518)
  • literal hash (#11508)
  • Fix lazy schema for cut/qcut when allow_breaks=True (#11287)
  • correct output schema of hive partition and projection at scan (#11499)
  • correct projection pushdown in hive partitioned read (#11486)
  • fix for write_csv when using non-default "quote" char (#11474)
  • fix deserialization of parquets with large string list columns causing stack overflow (#11471)
  • Fix SQL ANY and ALL behaviour (#10879)
  • address multiple issues caused by implicit casting of is_in values to the column dtype being searched (#11427)
  • raise on invalid sort_by group lengths (#11423)
  • fix outer join on bools (#11417)
  • fix categorical collect (#11414)
  • Free bitmap when slicing into a non-null array (#11405)
  • async parquet. (#11403)
  • Fix edge-case where the Array dtype could (internally) be considered numeric (#11398)
  • Fix empty check when building a list (#11378)
  • more cloud urls (#11361)
  • ensure cloud globbing can deal with spaces (#11360)
  • recognize more cloud urls (#11357)
  • Fix Series.__contains__ for None values and implement is_in for null Series (#11345)
  • don't panic on multi-nodes in streaming conversion (#11343)
  • ensure trailing quote is written for temporal data when CSV quote_style is non-numeric (#11328)
  • fix empty Series construction edge-case with Struct dtype (#11301)
  • add missing feature flags on tests (#11305)
  • set partitions independent of thread pool (#11304)
  • parse sign for decimal properly (#11302)
  • consume duplicates in rolling_by window (#11261)
  • handle url encoded paths in objectpath creation (#11240)
  • use POOL when writing csv (#11222)
  • is_in for bool evaluated has_false incorrectly (#11217)
  • fix nullable filter mask in group_by (#11207)
  • replace n-th in filter (#11206)
  • fix translation of Series-nested datetime/date values for scan_pyarrow predicates (#11195)
  • impl hash for more function expr (#11182)
  • list.join's separator can be expression (#11167)
  • Add some missing expr type hint for series (#11171)
  • Make pl.struct serializable (#11169)
  • Fix rust test for logical plan optimizer for categoricals (#11135)
  • propagate null value for str/binary starts/ends_with and contains (#11141)

🛠️ Other improvements

  • optimize asof_join and allow null/string keys (#11712)
  • Add Development and Releases sections to the documentation (#11932)
  • use ahash from crates.io release (#11964)
  • move unique_counts to ops (#11963)
  • fix take return dtype in group context. (#11949)
  • move moment to ops (#11941)
  • fix some typos and add polars-business to curated plugin list (#11916)
  • prepare for multiple files in a node (#11918)
  • load 40x40 avatar from github and add loading=lazy attribute. (#11886)
  • Fix Cargo warning for parquet2 dependency (#11882)
  • Allow manual trigger for docs deployment (#11881)
  • rename new_from_owned_with_null_bitmap (#11828)
  • add section about plugins (#11855)
  • fix incorrect example of valid time zones (#11873)
  • Bump docs dependencies (#11852)
  • add missing polars-ops tests to CI (#11859)
  • Update doc comments for with_column to reflect that columns can be updated (#11840)
  • Move round to ops (#11838)
  • arrow: remove unused arithmetic code and remove doctests (#11820)
  • Move diff to polars-ops (#11818)
  • remove redundant if branch in nested parquet (#11814)
  • Move ewma to polars-ops (#11794)
  • Make some functions in dsl::mod non-anonymous (#11799)
  • Move cum_agg to polars-ops (#11770)
  • more granular polars-ops imports (#11760)
  • Make all ewm function expr non-anonymous (#11638)
  • clarify polars-arrow <=> arrow2 license (#11755)
  • Version polars-arrow with the other crates (#11738)
  • fill missing fill_null strategies (#11751)
  • Minor fix in code example in section Coming from Pandas (#11745)
  • Update group_by_dynamic example (#11737)
  • merge nano-arrow/polars-arrow (#11719)
  • Improving the documentation of the SQL expressions (#11708)
  • *_horizontal dependent on reduce_expr to expression architecture (#11685)
  • update document of folds (#11705)
  • update rustc and fix future (#11696)
  • better align help command output following addition of some longer options (#11681)
  • sum_horizontal to expression architecture (#11659)
  • Cleanup the match block for date inference (#11677)
  • Adding feature annotation (#11671)
  • add note about use of polars-lts-cpu for macOS x86-64/rosetta (#11660)
  • improve rank implementation, especially around nulls (#11651)
  • Bring cloud monikers in line with the ones in is_cloud_url (#11629)
  • Rename .list.lengths and .str.lengths (#11613)
  • Make backwardfill and forwardfill function expr non-anonymous (#11630)
  • Make all expr in dt namespace non-anonymous (#11627)
  • Fix changelog for language-specific breaking changes (#11617)
  • avoid nightly rust for case conversion (#11610)
  • Make value_counts and unique_counts function expr non-anonymous (#11601)
  • Make arg_min(max), diff in list namespace non-anonymous (#11602)
  • Rename write_csv parameter quote to quote_char (#11583)
  • use a generic consistent total ordering, also for floats (#11468)
  • Move mode operation from core to ops crate (#11543)
  • fix lints (#11555)
  • use single threaded take under certain values size (#11539)
  • fix some features (#11529)
  • move (hor_)str_concat to polars-ops (#11488)
  • minor changes in peak-min/max (#11491)
  • align cloud url regex in rust and python (#11481)
  • move AnonymousScan into Scan node (#11502)
  • move repeat_by to polars-ops (#11461)
  • upgrade to nightly-10-02 (#11460)
  • Update contributing guide to include memory requirement (#11458)
  • remove unused order_by attribute (#11434)
  • cleanup sort_by expression impl (#11431)
  • large windows runner for release (#11370)
  • Fix error message reference to infer_schema_length (#11358)
  • move rank to polars-ops (#11349)
  • unify display for namespaced function expr (#11342)
  • Fix some cargo manifest warnings (#11327)
  • Use GITHUB_TOKEN to get contributor information for docs (#11321)
  • Add disable_string_cache (#11020)
  • remove default auto-explode for map_many_private (#11270)
  • Add API links for Rust user guide examples (#11294)
  • update a few dependencies (#11283)
  • move scan helpers to separate module (#11279)
  • update sponsors (#11271)
  • bump chrono to 0.4.31 (#11258)
  • bind all remaining method in StringNameSpace to function expr (#11229)
  • Make some list function expr non-anonymous (#11230)
  • remove lz4_flex feature (#11253)
  • remove unnecessary transmute (#11250)
  • move (almost) all join related code from polars-core to polars-ops. (#11228)
  • Mention the performant feature only once (#11223)
  • remove unneeded indirection (#11233)
  • remove unneeded mutex around object-store (#11224)
  • bind struct.rename_fields to function expr (#11215)
  • fix un-compilable rust example in user guide. (#11214)
  • add various missing expression doc-comments (#11213)
  • Fix user_guide of str.split (#11185)
  • New take implementation (#11138)
  • Fix rust test for logical plan optimizer for categoricals (#11135)

Thank you to all our contributors for making this release possible!
@ByteNybbler, @Cheukting, @Fokko, @Hofer-Julian, @JulianCologne, @LaurynasMiksys, @MarcoGorelli, @Rohxn16, @SeanTroyUWO, @TheDataScientistNL, @Walnut356, @aberres, @alexander-beedie, @alicja-januszkiewicz, @andysham, @billylanchantin, @bowlofeggs, @c-peters, @cmdlineluser, @dannyvankooten, @dependabot, @dependabot[bot], @ewoolsey, @jhorstmann, @jonashaag, @jrycw, @mcrumiller, @messense, @nameexhaustion, @orlp, @petrosbar, @ptiza, @rancomp, @reswqa, @ritchie46, @rjthoen, @romanovacca, @sd2k, @shenker, @squnit, @stinodego, @svaningelgem, @thomasjpfan, @uchiiii, @universalmind303 and Romano Vacca
