pola-rs/polars rs-0.35.0 on GitHub

🏆 Highlights

improve join performance through radix partitioned join (#12270)
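
For readers unfamiliar with the term, a radix partitioned join first splits both inputs into buckets using some bits of the key hash, then runs an ordinary hash join inside each pair of buckets. The following is a minimal pure-Python sketch of that general idea, not the Polars implementation; the function names `radix_partition` and `partitioned_inner_join` and the sample rows are hypothetical.

```python
from collections import defaultdict


def radix_partition(rows, key, bits):
    """Split rows into 2**bits buckets using the low bits of the key's hash."""
    mask = (1 << bits) - 1
    parts = [[] for _ in range(1 << bits)]
    for row in rows:
        parts[hash(row[key]) & mask].append(row)
    return parts


def partitioned_inner_join(left, right, key, bits=2):
    """Inner join two lists of dicts, one partition pair at a time."""
    out = []
    left_parts = radix_partition(left, key, bits)
    right_parts = radix_partition(right, key, bits)
    for lpart, rpart in zip(left_parts, right_parts):
        # Build a small hash table on one side of this partition only.
        table = defaultdict(list)
        for rrow in rpart:
            table[rrow[key]].append(rrow)
        # Probe with the other side; equal keys always land in the same partition.
        for lrow in lpart:
            for rrow in table.get(lrow[key], []):
                out.append({**lrow, **rrow})
    return out


# Hypothetical sample data.
left = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}, {"id": 3, "a": "z"}]
right = [{"id": 2, "b": 10}, {"id": 3, "b": 20}, {"id": 3, "b": 30}, {"id": 4, "b": 40}]
joined = sorted(partitioned_inner_join(left, right, "id"), key=lambda r: (r["id"], r["b"]))
```

Because each partition pair is independent, the per-partition hash tables stay small and the pairs could in principle be processed in parallel; the sketch runs them sequentially for clarity.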

💥 Breaking changes

Rename cumulative functions cumsum -> cum_sum and similar (#12513)
Rename take to gather (#12528)
Add dedicated horizontal aggregation methods to DataFrame (#12492)
Rename take_every to gather_every (#12531)
Deprecate parse_int in favor of to_integer (#12464)
plugins add version and context (#12433)
Fix scan_csv error type (#12355)
Rename write_csv parameter has_header to include_header (#12351)
Rename is_signed to is_signed_integer (#12220)
Rename dt.seconds to dt.total_seconds (likewise for days, hours, minutes, milliseconds, microseconds, and nanoseconds) (#12179)
Rename ljust/rjust to pad_end/pad_start (#11975)
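
Two of the renames above have close analogues in Python's standard library, which may help when reading the new names. The sketch below uses only `datetime.timedelta` and `str` methods as an analogy; it does not call Polars, and the assumption that the Polars names follow the same semantics is ours, not stated in the notes.

```python
from datetime import timedelta

d = timedelta(days=1, seconds=30)

# A "component" reading: only the seconds field of the duration.
component = d.seconds
# A "total" reading: the whole duration expressed in seconds.
total = d.total_seconds()

# ljust pads at the end of the string, rjust pads at the start,
# which matches the ljust -> pad_end and rjust -> pad_start pairing.
padded_end = "abc".ljust(5, "*")
padded_start = "abc".rjust(5, "*")
```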

🚀 Performance improvements

speed up cov/corr with SIMD + strength-reduction (~3x faster than 0.19.13, ~2x faster than numpy) (#12471)
apply predicates and statistics of parquet files in streaming mode (#12439)
use online algorithm for cov/corr ~2x (#12412)
indexvec in group-by (#12371)
reduce allocations in hash join (#12368)
change concurrency parameters (#12321)
improve join performance through radix partitioned join (#12270)
remove extra multiplication in hash_to_partition (#12233)
allow non-power-of-two partitions (#12225)
Reduce compute in error message for failed datetime parsing (#12147)
improve parquet downloading (#12061)
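
As background for the cov/corr entries, an online algorithm computes the statistic in a single pass by updating running means and a running co-moment, instead of first computing the means and then iterating again. The following is a minimal sketch of a Welford-style update for sample covariance; the function name `online_cov` and the `ddof` parameter are hypothetical, and this is not claimed to be the kernel Polars uses.

```python
def online_cov(xs, ys, ddof=1):
    """Single-pass covariance using running means and a running co-moment."""
    n = 0
    mean_x = 0.0
    mean_y = 0.0
    comoment = 0.0  # running sum of (x - mean_x) * (y - mean_y)
    for x, y in zip(xs, ys):
        n += 1
        dx = x - mean_x               # deviation from the previous mean of x
        mean_x += dx / n
        mean_y += (y - mean_y) / n
        comoment += dx * (y - mean_y)  # previous-mean dx times updated-mean dy
    return comoment / (n - ddof)


# Hypothetical sample data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 9.0]
cov_xy = online_cov(xs, ys)
```

The update touches each pair once and keeps only four scalars of state, which is what makes it suitable for streaming or chunked input.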

✨ Enhancements

Add dedicated horizontal aggregation methods to DataFrame (#12492)
support http scan_parquet (#12517)
Add support for UTF-8 BOM option in write_csv and sink_csv (#12253)
remove lexical (replace with atoi_simd, ryu, and itoa) (#12512)
Allow comparison of two local categories with the same hash (#12503)
more changes for versioned plugins (#12504)
plugins add version and context (#12433)
include i128 in more primitive functions (#12413)
write rolling functions as private expressions. (#12379)
Add round_sig_figs expression for rounding to significant figures (#11959)
change concurrency parameters (#12321)
deprecate _saturating in duration string language, make it the default (#12301)
auto infer ambiguous for truncate and round (#12204)
Rename is_signed to is_signed_integer (#12220)
New Config options for numeric formatting: digit grouping and thousands/decimal separator (#12099)
allow non-aggregation predicate in ternary groupby (#12286)
Add name= in .write_avro to set schema name (#12255)
Add support for reading zstd compressed files (no-options) in read_csv (#12214)
start prefetching all files immediately (#12201)
Add .list.to_array expression (#12192)
consolidate & improve all casting failure error messages (#12168)
tunable concurrency (#12171)
support reverse sort in streaming (#12169)
Add .arr.to_list expression (#12136)
add concurrency budget (#12117)
Introduce ignore_nulls for str.concat (#12108)
casting utf8 to temporal (#12072)
Add supertype for List/Array (#12016)
enable eq and neq for array dtype (#12020)
Expressify n of shift (#12004)
add dedicated name namespace for operations that affect expression names (#11973)
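
To illustrate what rounding to significant figures means for the `round_sig_figs` entry above, here is a minimal pure-Python sketch. The helper below is hypothetical and only shares its name with the expression; tie-breaking and edge-case behavior in Polars may differ.

```python
import math


def round_sig_figs(x, digits):
    """Round x to `digits` significant figures (illustrative helper)."""
    if x == 0 or not math.isfinite(x):
        return x
    # Position of the leading digit: 3 for 1234.5, -3 for 0.0012.
    magnitude = math.floor(math.log10(abs(x)))
    # Shift the rounding position so that `digits` leading digits survive.
    return round(x, digits - 1 - magnitude)


large = round_sig_figs(1234.5678, 2)
small = round_sig_figs(0.0012345, 3)
```

Unlike rounding to a fixed number of decimals, the rounding position here depends on the magnitude of each value.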

🐞 Bug fixes

fix incorrect ternary agg states (#12538)
fix and improve ternary evaluation on groups (#12529)
saturating sub in debug msg (#12525)
fix panic when writing Decimal type to parquet (#12532)
prefetch struct columns in async projection pd (#12514)
rechunk cross join output in streaming (#12511)
fix as_list logical types (#12507)
fix streaming cross join on empty df (#12491)
don't overflow when calculating date range over very long periods (#12479)
Allow append/zip_with/extend on local categoricals (#12369)
Do not panic if time is invalid (#12466)
empty csv no-raise (#12434)
Fix scan_csv error type (#12355)
binary operations in aggregation context on literals (#12430)
update groups state after binary aggregation (#12415)
Remove extra \n when reading file-like object wi… (#12333)
revert ternary special broadcast, ensure broadcast is always to max height (#12395)
ensure first/last return null if empty (#12401)
Do not cast lit if has same dtype (#12342)
Fix index column name of rolling/dynamic group by (#12365)
ternary broadcasting with empty truthy or falsy and agg predicate (#12357)
uint64 should be correctly extracted from python object (#12338)
expr_output_name include literal (#12335)
Fix Decimal dtype table repr (#12318)
Fix behavior of month intervals in date_range (#12317)
scan empty csv miss row_count (#12316)
zip_with also broadcast mask (#12309)
respect hive_partitioning flag when dealing with multiple files (#12315)
parquet, add row_count to empty file materialization (#12310)
fix download ranges in parquet (#12313)
object store path derivation for local URL (#12308)
don't move right endpoint of windows in rolling in default offset==-period case (#12267)
Raise more informative error on invalid reshape input (#12288)
incorrect super type for literals in nested binary exprs (#12238)
Update null_count after arithmetic (#12280)
fix ambiguous aggregation type (#12269)
Consistently propagate nulls for numpy ufuncs (#12212)
respect return_scalar of list scalars (#12251)
potential overflow (#12206)
always start a new thread if the thread is already blocking (#12202)
with_row_count should block predicate push down for lazy csv (#12187)
rechunk failed-list series before iterate (#12189)
Raise if *_horizontal without inputs (#12106)
fix incorrect desc sort behavior (#12141)
take should block predicate pushdown (#12130)
use null type when read from unknown row (#12128)
boundary predicate to block all accumulated predicates in push down (#12105)
make python schema_overrides information available to the rust-side inference code when initialising from records/dicts (#12045)
fix panic when initializing Series with array of list dtype (#12148)
Fix schema of arr.min/max (#12127)
ensure filter predicate inputs exist in schema (#12089)
str.concat on empty list (#12066)
binary agg should group aware if literal not a scalar (#12043)
Use Arrow schema for file readers (#12048)
Error on duplicates in hive partitioning (#12040)
display fmt for str split (#12039)
sum_horizontal should not always cast to int (#12031)
fix apply_to_inner's dtype (#12010)
Fix padding for non-ASCII strings (#12008)
inline parts of unstable unicode module for stable (#12003)
fix dot visualization of anonymous scans (#12002)
SQL table aliases (#11988)

🛠️ Other improvements

Rename cumulative functions cumsum -> cum_sum and similar (#12513)
fix and improve ternary evaluation on groups (#12529)
Rename take to gather (#12528)
Add dedicated horizontal aggregation methods to DataFrame (#12492)
Rename take_every to gather_every (#12531)
Add polars-ds to list of community plugins (#12527)
add schema test (#12523)
remove lexical (replace with atoi_simd, ryu, and itoa) (#12512)
add test for previous commit (#12510)
Support Python 3.12 (#12094)
Fix some typos (#12485)
Deprecate parse_int in favor of to_integer (#12464)
update rustc (#12468)
rename the DataType in the polars-arrow crate to ArrowDataType for clarity, preventing conflation with our own/native DataType (#12459)
Replace outdated dev dependency tempdir (#12462)
move cov/corr to polars-ops (#12411)
use unwrap_or_else and get_unchecked_release in rolling kernels (#12405)
dprint/markdown link checker minor updates (#12409)
replace as_u64 with dirty_hash (#12327)
Fix ruff linting invocation (#12350)
Rename write_csv parameter has_header to include_header (#12351)
Build and verify Rust examples in docs (#12334)
Fix some feature flags (#12325)
Organize Cargo.toml (#12323)
remove fxhash (#12322)
Run rustfmt on doc examples (#12319)
Consolidate "getting started" and "user guide" sections (#12246)
deprecate _saturating in duration string language, make it the default (#12301)
simplify expr checking in predicate push down (#12287)
Replace dev dependency avro-rs with apache-avro (#12295)
Run clippy on all targets (#12293)
Add top-level make clippy, simplify Rust linting workflows (#12290)
ensure we git-ignore ALL .venv dirs (#12289)
incorrect super type for literals in nested binary exprs (#12238)
remove unwrap from group_by (#12263)
update object_store (#12006) (#12273)
Remove recommended setting from IDE docs (#12275)
Add feature flag for list.eval (#12254)
factor out some shared code in truncate_impl (#12229)
update Cargo.lock (#12226)
Make all functions in string namespace non-anonymous (#12215)
Rename dt.seconds to dt.total_seconds (likewise for days, hours, minutes, milliseconds, microseconds, and nanoseconds) (#12179)
use enum for Ambiguous (#12193)
Standardize project name formatting across docs (#12185)
Update sqlparser to 0.39 (#12173)
pin ring (#12176)
Refactor FunctionExpr module (#12162)
Fix tests for pyarrow 14 (#12170)
Fix triggers for docs deployment (#12159)
Make all functions in binary namespace non-anonymous (#12126)
Consolidate contributing info (#12109)
Fix typo in user-guide/expressions/plugins.md (#12115)
Update CODEOWNERS (#12107)
visualize plugin directory layout in user guide (#12092)
Minor improvements to the docs website (#12084)
reshape and repeat_by non-anonymous (#12064)
upgrade zstd to 0.13 in polars-parquet (#12062)
Direct CONTRIBUTING to the docs website (#12042)
inline parquet2 (#12026)
remove parquet logic from polars-arrow and consolidate logic in polars-parquet crate. (#12022)
move abs to ops (#12005)
Rename ljust/rjust to pad_end/pad_start (#11975)
Disable type checking for dataframe_api_compat dependency (#11997)

Thank you to all our contributors for making this release possible!
@JulianCologne, @MarcoGorelli, @Priyansh121096, @abstractqqq, @alexander-beedie, @braaannigan, @brayanjuls, @c-peters, @cmdlineluser, @daviskirk, @dependabot, @dependabot[bot], @dgilman, @hirohira9119, @ion-elgreco, @jerome3o, @jrycw, @mcrumiller, @messense, @moritzwilksch, @nameexhaustion, @orlp, @owrior, @rancomp, @reswqa, @ritchie46, @rob-sil, @stefmolin, @stinodego, @uchiiii, @universalmind303 and @wsyxbcl
