github pola-rs/polars rs-0.35.0
Rust Polars 0.35.0

🏆 Highlights

  • improve join performance through radix partitioned join (#12270)
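The radix-partitioned join above (#12270) and the move to non-power-of-two partition counts (#12225) can be illustrated with a small std-only Rust sketch. It is not Polars' implementation. `hash_to_partition` borrows the name from #12233, but its multiply-high body is an assumption about the technique. `hash_key` and `radix_partitioned_join` are hypothetical helpers, and std's `DefaultHasher` stands in for whatever hasher Polars actually uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Map a 64-bit hash onto `0..n` with a multiply-high reduction (assumed technique).
/// Unlike masking with `n - 1`, this works for any `n`, not only powers of two.
fn hash_to_partition(h: u64, n: usize) -> usize {
    ((h as u128 * n as u128) >> 64) as usize
}

/// Hypothetical helper: hash a key with std's default hasher.
fn hash_key<K: Hash>(k: &K) -> u64 {
    let mut s = DefaultHasher::new();
    k.hash(&mut s);
    s.finish()
}

/// Inner join on keys; returns (left_row, right_row) pairs in partition order.
fn radix_partitioned_join<K: Hash + Eq>(
    left: &[K],
    right: &[K],
    n_partitions: usize,
) -> Vec<(usize, usize)> {
    // Phase 1: scatter row indices of both sides into partitions by key hash,
    // so equal keys always land in the same partition.
    let mut left_parts: Vec<Vec<usize>> = vec![Vec::new(); n_partitions];
    let mut right_parts: Vec<Vec<usize>> = vec![Vec::new(); n_partitions];
    for (i, k) in left.iter().enumerate() {
        left_parts[hash_to_partition(hash_key(k), n_partitions)].push(i);
    }
    for (j, k) in right.iter().enumerate() {
        right_parts[hash_to_partition(hash_key(k), n_partitions)].push(j);
    }

    // Phase 2: join each partition on its own (these loops are independent and
    // could run in parallel): build a small table on the right, probe with the left.
    let mut out = Vec::new();
    for (lp, rp) in left_parts.iter().zip(&right_parts) {
        let mut table: HashMap<&K, Vec<usize>> = HashMap::new();
        for &j in rp {
            table.entry(&right[j]).or_default().push(j);
        }
        for &i in lp {
            if let Some(matches) = table.get(&left[i]) {
                out.extend(matches.iter().map(|&j| (i, j)));
            }
        }
    }
    out
}

fn main() {
    let left = ["a", "b", "c", "a"];
    let right = ["a", "c", "d"];
    // Three partitions: deliberately not a power of two.
    let mut pairs = radix_partitioned_join(&left, &right, 3);
    pairs.sort_unstable();
    println!("{pairs:?}");
}
```

Because equal keys always hash to the same partition, each partition's hash table stays small, which is the usual locality argument for radix partitioning.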

💥 Breaking changes

  • Rename cumulative functions cumsum -> cum_sum and similar (#12513)
  • Rename take to gather (#12528)
  • Add dedicated horizontal aggregation methods to DataFrame (#12492)
  • Rename take_every to gather_every (#12531)
  • Deprecate parse_int in favor of to_integer (#12464)
  • plugins add version and context (#12433)
  • Fix scan_csv error type (#12355)
  • Rename write_csv parameter has_header to include_header (#12351)
  • Rename is_signed to is_signed_integer (#12220)
  • Rename dt.seconds to dt.total_seconds (likewise for days, hours, minutes, milliseconds, microseconds, and nanoseconds) (#12179)
  • Rename ljust/rjust to pad_end/pad_start (#11975)
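Several of these breaking changes are pure renames. As a plain-Rust reminder of what `cum_sum`, `gather`, and `gather_every` compute on a single column, here is a std-only sketch over slices. It is not the Polars API (there these are Series/expression methods). It assumes `gather_every(n)` starts at the first element, and it returns `None` on an out-of-bounds index instead of modelling Polars' error handling.

```rust
/// `gather` (formerly `take`): pick values at the given row indices.
/// Returns None if any index is out of bounds (sketch-only behaviour).
fn gather<T: Clone>(values: &[T], indices: &[usize]) -> Option<Vec<T>> {
    indices.iter().map(|&i| values.get(i).cloned()).collect()
}

/// `gather_every` (formerly `take_every`): every n-th value, starting with the first.
/// Panics if `n == 0`, because `step_by` requires a non-zero step.
fn gather_every<T: Clone>(values: &[T], n: usize) -> Vec<T> {
    values.iter().step_by(n).cloned().collect()
}

/// `cum_sum` (formerly `cumsum`): running total of the values.
fn cum_sum(values: &[i64]) -> Vec<i64> {
    values
        .iter()
        .scan(0i64, |acc, &v| {
            *acc += v;
            Some(*acc)
        })
        .collect()
}

fn main() {
    let v = [10, 20, 30, 40, 50];
    println!("{:?}", gather(&v, &[4, 0, 2])); // Some([50, 10, 30])
    println!("{:?}", gather_every(&v, 2)); // [10, 30, 50]
    println!("{:?}", cum_sum(&[1, 2, 3, 4])); // [1, 3, 6, 10]
}
```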

🚀 Performance improvements

  • speed up cov/corr with SIMD + strength reduction (~3x faster than 0.19.13, ~2x faster than NumPy) (#12471)
  • apply predicates and statistics of parquet files in streaming mode (#12439)
  • use online algorithm for cov/corr ~2x (#12412)
  • indexvec in group-by (#12371)
  • reduce allocations in hash join (#12368)
  • change concurrency parameters (#12321)
  • improve join performance through radix partitioned join (#12270)
  • remove extra multiplication in hash_to_partition (#12233)
  • allow non-power-of-two partitions (#12225)
  • Reduce compute in error message for failed datetime parsing (#12147)
  • improve parquet downloading (#12061)
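These notes do not say which online algorithm #12412 uses for cov/corr. As an illustration only, the sketch below uses the classic single-pass Welford-style update of the means and the co-moment; the `CovState` type is hypothetical, and the SIMD and strength-reduction work from #12471 is not shown.

```rust
/// Single-pass (online) accumulator for covariance and Pearson correlation,
/// using Welford-style updates of the means and co-moment.
#[derive(Default)]
struct CovState {
    n: f64,
    mean_x: f64,
    mean_y: f64,
    m2_x: f64, // running sum of squared deviations of x
    m2_y: f64, // running sum of squared deviations of y
    c_xy: f64, // running co-moment: sum of (x - mean_x) * (y - mean_y)
}

impl CovState {
    fn push(&mut self, x: f64, y: f64) {
        self.n += 1.0;
        let dx = x - self.mean_x; // deviation from the old mean of x
        let dy = y - self.mean_y; // deviation from the old mean of y
        self.mean_x += dx / self.n;
        self.mean_y += dy / self.n;
        // Pair each old-mean deviation with a new-mean deviation.
        self.c_xy += dx * (y - self.mean_y);
        self.m2_x += dx * (x - self.mean_x);
        self.m2_y += dy * (y - self.mean_y);
    }

    /// Covariance with `ddof` delta degrees of freedom (1.0 = sample covariance).
    fn cov(&self, ddof: f64) -> f64 {
        self.c_xy / (self.n - ddof)
    }

    /// Pearson correlation; the (n - ddof) factors cancel.
    fn corr(&self) -> f64 {
        self.c_xy / (self.m2_x * self.m2_y).sqrt()
    }
}

fn main() {
    let xs = [1.0, 2.0, 3.0, 4.0];
    let ys = [2.0, 4.0, 6.0, 8.0];
    let mut st = CovState::default();
    for (&x, &y) in xs.iter().zip(&ys) {
        st.push(x, y);
    }
    println!("cov = {}, corr = {}", st.cov(1.0), st.corr());
}
```

A single pass avoids computing the means first and re-reading the data, and it is numerically steadier than the naive sum-of-products formula.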

✨ Enhancements

  • Add dedicated horizontal aggregation methods to DataFrame (#12492)
  • support http scan_parquet (#12517)
  • Add support for UTF-8 BOM option in write_csv and sink_csv (#12253)
  • remove lexical (replace with atoi_simd, ryu, and itoa) (#12512)
  • Allow comparison of two local categories with the same hash (#12503)
  • more changes for versioned plugins (#12504)
  • plugins add version and context (#12433)
  • include i128 in more primitive functions (#12413)
  • write rolling functions as private expressions. (#12379)
  • Add round_sig_figs expression for rounding to significant figures (#11959)
  • change concurrency parameters (#12321)
  • deprecate _saturating in duration string language, make it the default (#12301)
  • auto infer ambiguous for truncate and round (#12204)
  • Rename is_signed to is_signed_integer (#12220)
  • New Config options for numeric formatting: digit grouping and thousands/decimal separator (#12099)
  • allow non-aggregation predicate in ternary groupby (#12286)
  • Add name= in .write_avro to set schema name (#12255)
  • Add support for reading zstd compressed files (no-options) in read_csv (#12214)
  • start prefetching all files immediately (#12201)
  • Add .list.to_array expression (#12192)
  • consolidate & improve all casting failure error messages (#12168)
  • tunable concurrency (#12171)
  • support reverse sort in streaming (#12169)
  • Add .arr.to_list expression (#12136)
  • add concurrency budget (#12117)
  • Introduce ignore_nulls for str.concat (#12108)
  • casting utf8 to temporal (#12072)
  • Add supertype for List/Array (#12016)
  • enable eq and neq for array dtype (#12020)
  • Expressify n of shift (#12004)
  • add dedicated name namespace for operations that affect expression names (#11973)
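The new `round_sig_figs` expression (#11959) rounds to significant figures rather than to decimal places. A minimal std-only sketch of the arithmetic follows; it is not the Polars kernel. The free function `round_sig_figs` here is hypothetical, it assumes `sig >= 1`, and it uses `f64::round` (halves away from zero), which may differ from Polars' rounding mode.

```rust
/// Round `x` to `sig` significant figures (assumes sig >= 1).
fn round_sig_figs(x: f64, sig: i32) -> f64 {
    if x == 0.0 || !x.is_finite() {
        return x;
    }
    // Position of the leading digit, e.g. 3 for 1234.5, -2 for 0.0123.
    let magnitude = x.abs().log10().floor() as i32;
    let digits = sig - 1 - magnitude;
    if digits >= 0 {
        // Keep `digits` decimal places.
        let f = 10f64.powi(digits);
        (x * f).round() / f
    } else {
        // Round to tens, hundreds, ...; dividing by an exact power of ten
        // avoids multiplying by an inexact factor like 0.01.
        let f = 10f64.powi(-digits);
        (x / f).round() * f
    }
}

fn main() {
    println!("{:?}", round_sig_figs(1234.5678, 2)); // 1200.0
    println!("{:?}", round_sig_figs(0.012345, 2));
}
```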

🐞 Bug fixes

  • fix incorrect ternary agg states (#12538)
  • fix and improve ternary evaluation on groups (#12529)
  • saturating sub in debug msg (#12525)
  • fix panic when writing Decimal type to parquet (#12532)
  • prefetch struct columns in async projection pd (#12514)
  • rechunk cross join output in streaming (#12511)
  • fix as_list logical types (#12507)
  • fix streaming cross join on empty df (#12491)
  • don't overflow when calculating date range over very long periods (#12479)
  • Allow append/zip_with/extend on local categoricals (#12369)
  • Do not panic if time is invalid (#12466)
  • don't raise on empty CSV (#12434)
  • Fix scan_csv error type (#12355)
  • binary operations in aggregation context on literals (#12430)
  • update groups state after binary aggregation (#12415)
  • Remove extra \n when reading file-like object wi… (#12333)
  • revert ternary special broadcast, ensure broadcast is always to max height (#12395)
  • ensure first/last return null if empty (#12401)
  • Do not cast lit if has same dtype (#12342)
  • Fix index column name of rolling/dynamic group by (#12365)
  • ternary broadcasting with empty truthy or falsy and agg predicate (#12357)
  • uint64 should be correctly extracted from python object (#12338)
  • expr_output_name include literal (#12335)
  • Fix Decimal dtype table repr (#12318)
  • Fix behavior of month intervals in date_range (#12317)
  • scan of empty CSV was missing row_count (#12316)
  • zip_with also broadcast mask (#12309)
  • respect hive_partitioning flag when dealing with multiple files (#12315)
  • parquet, add row_count to empty file materialization (#12310)
  • fix download ranges in parquet (#12313)
  • object store path derivation for local URL (#12308)
  • don't move right endpoint of windows in rolling in default offset==-period case (#12267)
  • Raise more informative error on invalid reshape input (#12288)
  • incorrect super type for literals in nested binary exprs (#12238)
  • Update null_count after arithmetic (#12280)
  • fix ambiguous aggregation type (#12269)
  • Consistently propagate nulls for numpy ufuncs (#12212)
  • respect return_scalar of list scalars (#12251)
  • potential overflow (#12206)
  • always start a new thread if the thread is already blocking (#12202)
  • with_row_count should block predicate push down for lazy csv (#12187)
  • rechunk failed-list series before iterate (#12189)
  • Raise if *_horizontal without inputs (#12106)
  • fix incorrect desc sort behavior (#12141)
  • take should block predicate pushdown (#12130)
  • use null type when read from unknown row (#12128)
  • boundary predicate to block all accumulated predicates in push down (#12105)
  • make python schema_overrides information available to the rust-side inference code when initialising from records/dicts (#12045)
  • fix panic when initializing Series with array of list dtype (#12148)
  • Fix schema of arr.min/max (#12127)
  • ensure filter predicate inputs exist in schema (#12089)
  • str.concat on empty list (#12066)
  • binary agg should be group-aware if the literal is not a scalar (#12043)
  • Use Arrow schema for file readers (#12048)
  • Error on duplicates in hive partitioning (#12040)
  • display fmt for str split (#12039)
  • sum_horizontal should not always cast to int (#12031)
  • fix apply_to_inner's dtype (#12010)
  • Fix padding for non-ASCII strings (#12008)
  • inline parts of unstable unicode module for stable (#12003)
  • fix dot visualization of anonymous scans (#12002)
  • SQL table aliases (#11988)
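The padding fix for non-ASCII strings (#12008) touches a common pitfall: measuring string width in UTF-8 bytes instead of characters. The changelog does not describe the exact bug. The std-only sketch below shows character-based padding and is not the code from that PR. `pad_start` and `pad_end` are used here as plain hypothetical functions, and counting `char`s (not grapheme clusters) is an assumption.

```rust
/// Left-pad `s` with `fill` to `width` characters, counting Unicode scalar
/// values (`char`s) rather than UTF-8 bytes.
fn pad_start(s: &str, width: usize, fill: char) -> String {
    let missing = width.saturating_sub(s.chars().count());
    let mut out: String = std::iter::repeat(fill).take(missing).collect();
    out.push_str(s);
    out
}

/// Right-pad `s` with `fill` to `width` characters.
fn pad_end(s: &str, width: usize, fill: char) -> String {
    let missing = width.saturating_sub(s.chars().count());
    let mut out = String::from(s);
    out.extend(std::iter::repeat(fill).take(missing));
    out
}

fn main() {
    // "é" is one char but two UTF-8 bytes; padding by byte length would add only one '*'.
    println!("{}", pad_start("é", 3, '*')); // **é
    println!("{}", pad_end("ñu", 4, '.')); // ñu..
}
```

Counting `char`s still treats a decomposed "e" plus combining accent as two characters; grapheme-aware width would need a dedicated Unicode segmentation library.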

🛠️ Other improvements

  • Rename cumulative functions cumsum -> cum_sum and similar (#12513)
  • fix and improve ternary evaluation on groups (#12529)
  • Rename take to gather (#12528)
  • Add dedicated horizontal aggregation methods to DataFrame (#12492)
  • Rename take_every to gather_every (#12531)
  • Add polars-ds to list of community plugins (#12527)
  • add schema test (#12523)
  • remove lexical (replace with atoi_simd, ryu, and itoa) (#12512)
  • add test for previous commit (#12510)
  • Support Python 3.12 (#12094)
  • Fix some typos (#12485)
  • Deprecate parse_int in favor of to_integer (#12464)
  • update rustc (#12468)
  • rename the DataType in the polars-arrow crate to ArrowDataType for clarity, preventing conflation with our own/native DataType (#12459)
  • Replace outdated dev dependency tempdir (#12462)
  • move cov/corr to polars-ops (#12411)
  • use unwrap_or_else and get_unchecked_release in rolling kernels (#12405)
  • dprint/markdown link checker minor updates (#12409)
  • replace as_u64 with dirty_hash (#12327)
  • Fix ruff linting invocation (#12350)
  • Rename write_csv parameter has_header to include_header (#12351)
  • Build and verify Rust examples in docs (#12334)
  • Fix some feature flags (#12325)
  • Organize Cargo.toml (#12323)
  • remove fxhash (#12322)
  • Run rustfmt on doc examples (#12319)
  • Consolidate "getting started" and "user guide" sections (#12246)
  • deprecate _saturating in duration string language, make it the default (#12301)
  • simplify expr checking in predicate push down (#12287)
  • Replace dev dependency avro-rs with apache-avro (#12295)
  • Run clippy on all targets (#12293)
  • Add top-level make clippy, simplify Rust linting workflows (#12290)
  • ensure we git-ignore ALL .venv dirs (#12289)
  • incorrect super type for literals in nested binary exprs (#12238)
  • remove unwrap from group_by (#12263)
  • update object_store (#12006) (#12273)
  • Remove recommended setting from IDE docs (#12275)
  • Add feature flag for list.eval (#12254)
  • factor out some shared code in truncate_impl (#12229)
  • update Cargo.lock (#12226)
  • Make all functions in string namespace non-anonymous (#12215)
  • Rename dt.seconds to dt.total_seconds (likewise for days, hours, minutes, milliseconds, microseconds, and nanoseconds) (#12179)
  • use enum for Ambiguous (#12193)
  • Standardize project name formatting across docs (#12185)
  • Update sqlparser to 0.39 (#12173)
  • pin ring (#12176)
  • Refactor FunctionExpr module (#12162)
  • Fix tests for pyarrow 14 (#12170)
  • Fix triggers for docs deployment (#12159)
  • Make all functions in binary namespace non-anonymous (#12126)
  • Consolidate contributing info (#12109)
  • Fix typo in user-guide/expressions/plugins.md (#12115)
  • Update CODEOWNERS (#12107)
  • visualize plugin directory layout in user guide (#12092)
  • Minor improvements to the docs website (#12084)
  • Make reshape and repeat_by non-anonymous (#12064)
  • upgrade zstd to 0.13 in polars-parquet (#12062)
  • Direct CONTRIBUTING to the docs website (#12042)
  • inline parquet2 (#12026)
  • remove parquet logic from polars-arrow and consolidate logic in polars-parquet crate. (#12022)
  • move abs to ops (#12005)
  • Rename ljust/rjust to pad_end/pad_start (#11975)
  • Disable type checking for dataframe_api_compat dependency (#11997)
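With #12301, saturating behaviour becomes the default in the duration-string language, so the `_saturating` suffix is deprecated. Saturating means a date that would not exist in the target month is clamped to that month's last day; for example, January 31 plus one month becomes the last day of February. A minimal std-only sketch of that rule follows, assuming the proleptic Gregorian calendar and ignoring time of day and time zones. `add_months_saturating` and its helpers are hypothetical, not Polars functions.

```rust
/// Gregorian leap-year rule.
fn is_leap(y: i32) -> bool {
    (y % 4 == 0 && y % 100 != 0) || y % 400 == 0
}

fn days_in_month(y: i32, m: u32) -> u32 {
    match m {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 => if is_leap(y) { 29 } else { 28 },
        _ => panic!("invalid month {m}"),
    }
}

/// Add `months` (possibly negative) to a (year, month, day) date, clamping
/// the day to the last valid day of the target month.
fn add_months_saturating(y: i32, m: u32, d: u32, months: i32) -> (i32, u32, u32) {
    // Count months from year 0 so that year roll-over falls out of div/rem.
    let total = y * 12 + (m as i32 - 1) + months;
    let ny = total.div_euclid(12);
    let nm = (total.rem_euclid(12) + 1) as u32;
    let nd = d.min(days_in_month(ny, nm));
    (ny, nm, nd)
}

fn main() {
    println!("{:?}", add_months_saturating(2023, 1, 31, 1)); // (2023, 2, 28)
    println!("{:?}", add_months_saturating(2024, 1, 31, 1)); // (2024, 2, 29)
}
```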

Thank you to all our contributors for making this release possible!
@JulianCologne, @MarcoGorelli, @Priyansh121096, @abstractqqq, @alexander-beedie, @braaannigan, @brayanjuls, @c-peters, @cmdlineluser, @daviskirk, @dependabot, @dependabot[bot], @dgilman, @hirohira9119, @ion-elgreco, @jerome3o, @jrycw, @mcrumiller, @messense, @moritzwilksch, @nameexhaustion, @orlp, @owrior, @rancomp, @reswqa, @ritchie46, @rob-sil, @stefmolin, @stinodego, @uchiiii, @universalmind303 and @wsyxbcl