pola-rs/polars rs-0.34.0 on GitHub

🏆 Highlights

postfix rolling expression as a special case of window functions. (#11445)
support 'hive partitioning' aware readers (#11284)
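
For readers unfamiliar with the term, "hive partitioning" refers to a directory layout in which column values are encoded in path segments such as `year=2023/month=10/`. The sketch below only illustrates that naming convention with the Python standard library; it is not the Polars implementation, and the function name `parse_hive_partitions` and the example path are hypothetical.

```python
from pathlib import PurePosixPath


def parse_hive_partitions(path: str) -> dict[str, str]:
    """Collect key=value directory segments from a Hive-style path.

    Illustrative only: a real reader would also infer dtypes and
    handle URL-encoded values, which this sketch does not attempt.
    """
    partitions: dict[str, str] = {}
    # Skip the final component, which is the data file itself.
    for segment in PurePosixPath(path).parts[:-1]:
        key, sep, value = segment.partition("=")
        # Only segments of the form key=value count as partition columns.
        if sep and key:
            partitions[key] = value
    return partitions


# Hypothetical dataset layout used purely for demonstration.
example = "dataset/year=2023/month=10/part-0.parquet"
print(parse_hive_partitions(example))
```

A partition-aware reader can use such key/value pairs to add columns to the result or to skip whole directories when a filter excludes them.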

💥 Breaking changes

Rename .list.lengths and .str.lengths (#11613)
Rename write_csv parameter quote to quote_char (#11583)
Add disable_string_cache (#11020)

🚀 Performance improvements

fix regression non-null asof join (#11984)
drastically improve performance of limit on async parquet datasets (#11965)
support multiple files in a single scan parquet node. (#11922)
fix accidental quadratic behavior; cache null_count (#11889)
fix quadratic behavior in append sorted check (#11893)
properly push down slice before left/asof join (#11854)
Improve performance of cot (cotangent) (#11717)
rechunk before grouping on multiple keys (#11711)
process parquet statistics before downloading row-group (#11709)
push down predicates that refer to group_by keys (#11687)
slightly faster float equality (#11652)
actually use projection information in async parquet reader (#11637)
improve performance and fix panic in async parquet reader (#11607)
use try_binary_elementwise over try_binary_elementwise_values (#11596)
skip empty chunks in concat (#11565)
improve sparse sample performance (#11544)
early return in replace_time_zone if target and source time zones match (#11478)
greatly improve parquet cloud reading (#11479)
ensure we download row-groups concurrently. (#11464)
don't load N metadata files when globbing N files (#11422)
remove double memcopy (#11365)
address perf regression (#11354)
improve dynamic_groupby_iter (#11341)
improve and fix rolling windows by linear scanning (#11326)
improve outer join materialization (#11241)
use ryu and itoa for primitive serialization (#11193)
use try-binary-elementwise instead of try-binary-elementwise-values in dt_truncate (#11189)
Using cache for str.contains regex compilation (#11183)

✨ Enhancements

optimize asof_join and allow null/string keys (#11712)
limit concurrent downloads in async parquet (#11971)
sample fraction can take an expr (#11943)
Add infer_schema_length to pl.read_json (#11724)
improve error handling in scan_parquet and deal with file limits (#11938)
support multiple files in a single scan parquet node. (#11922)
error instead of panic in unsupported sinks (#11915)
Introduce list.sample (#11845)
don't require empty config for cloud scan_parquet (#11819)
Expressify pct_change and move to ops (#11786)
add DATE function for SQL (#11541)
right-align numeric columns (#7475)
Add config setting to control how many List items are printed (#11409)
allow specifying schema in pl.scan_ndjson (#10963)
easier arrow2/arrow-rs conversion (#11666)
support multiple sources in scan_file (#11661)
allow coalesce in streaming (#11633)
Implement schema, schema_override for pl.read_json with array-like input (#11492)
add SQL support for UNION [ALL] BY NAME, add "diagonal_relaxed" strategy for pl.concat (#11597)
improve performance and fix panic in async parquet reader (#11607)
add time_unit argument to duration, default to "us" (#11586)
elide overflow checks on i64 (#11563)
add INITCAP string function for SQL (#9884)
Use IPC for (un)pickling dataframes/series (#11507)
support left and right anti/semi joins from the SQL interface (#11501)
expressify peak_min/peak_max (#11482)
IN(subquery) and SQL Subquery Infrastructure (#11218)
Format null arrays in Series (#11289)
postfix rolling expression as a special case of window functions. (#11445)
allow for "by" column to be of dtype Date in rolling_* functions (#11004)
support 'abfss' for azure (#11413)
multi-threaded async runtime (#11411)
async parquet. (#11403)
fail fast when invalid cloud settings; introduce retries arg (#11380)
modernize CPU features (#11351)
introduce 'label' instead of 'truncate' in group_by_dynamic, which can take label='right' (#11337)
Expressify list.shift (#11320)
add gather_skip_nulls implementation (#11329)
top_k and bottom_k support passing an expr (#11344)
support 'hive partitioning' aware readers (#11284)
str.strip_chars supports taking an expr argument (#11313)
sample n can take an expr (#11257)
Add disable_string_cache (#11020)
clip supports expr arguments and physical numeric dtype (#11288)
Introduce list.drop_nulls (#11272)
str.splitn and split_exact can take an expr for the by argument (#11275)
introduce ambiguous option for dt.round (#11269)
improve binary helper so we don't need to rechunk. (#11247)
Adds NULLIF and COALESCE SQL functions (#11124)
better tree-formatting representation (#11176)
Support duration + date (#11190)
binary search and rechunk in chunked gather (#11199)
Expressify str.strip_prefix & suffix (#11197)
sql udfs (#10957)
run cloud parquet reader in default engine (#11196)
list.join's separator can be expression (#11167)
argument every of datetime.truncate can be expression (#11155)

🐞 Bug fixes

fix streaming multi-column/multi-dtype sort (#11981)
ensure streaming parquet datasets deal with limits (#11977)
implement proper hash for identifier in cse (#11960)
fix take return dtype in group context. (#11949)
sql In should work without specific ops (#11947)
construct list series from any values subject to dtype (#11944)
avoid integer overflow in offsets_to_groups when bigidx is enabled (#11901)
read_csv for empty lines (#11924)
predicate push-down remove predicate refers to alias for more branch (#11887)
use physical append (#11894)
recursively apply cast_unchecked in lists (#11884)
recursively check allowed streaming dtypes (#11879)
fix project pushdown for double projection contains count (#11843)
series.to_numpy fails with dtype=Null (#11858)
panic on hive scan from cloud (#11847)
Propagate validity when cast primitive to list (#11846)
Edge cases for list count formatting (#11780)
remove flag inconsistency 'map_many' (#11817)
ensure projections containing only hive columns are projected (#11803)
patch broken aHash AES intrinsics on ARM (#11801)
fix key in object-store cache (#11790)
handle logical types in plugins (#11788)
make PyLazyGroupby reusable (#11769)
only exclude final output names of group_by key expressions (#11768)
fix ambiguity wrt list aggregation states (#11758)
Correctly process subseconds in pl.duration (#11748)
LazyFrame.drop_columns overflow issue when columns.len()>schema.len() (#11716)
index_to_chunked_index's fast path is not correct (#11710)
use actual number of read rows for hive materialization (#11690)
return float dtype in interpolate (for method="linear") for numeric dtypes (#11624)
fix seg fault in concat_str of empty series (#11704)
Fix match on last item for join_asof with strategy="nearest" (#11673)
fix display str for peak_max and top_k (#11657)
Fix input replacement logic for slice (#11631)
slice expr can be taken in cse (#11628)
ensure nested logical types are converted to physical (#11621)
correctly convert nullability of nested parquet fields to arrow (#11619)
improve performance and fix panic in async parquet reader (#11607)
expand all literals before group_by (#11590)
mark take_group_last function as unsafe (#11587)
handle unary operators applied to numbers used in SQL IN clauses (#11574)
Align new_columns argument for scan_csv and read_csv (#11575)
don't conflate supported UNION ops in the SQL parser with (currently) unsupported UNION "BY NAME" variations (#11576)
incomplete reading of list types from parquet (#11578)
respect identity in horizontal sum (#11559)
bug in BitMask::get_u32 (#11560)
take slice into account in parallel unions (#11558)
correct schema empty df in hive partitioning read (#11557)
ensure ListChunked::full_null uses physical types (#11554)
respect 'hive_partitioning' argument in parquet (#11551)
fix parquet deserialization Overflow error by using i64 offset types when promoting Arrow Lists to LargeLists (#11549)
streamline is_in handling of mismatched dtypes and fix a minor regression (#11533)
catch use of non equi-joins in SQL interface and raise appropriate error (#11526)
rework SQL join constraint processing to properly account for all USING columns (#11518)
literal hash (#11508)
Fix lazy schema for cut/qcut when allow_breaks=True (#11287)
correct output schema of hive partition and projection at scan (#11499)
correct projection pushdown in hive partitioned read (#11486)
fix for write_csv when using non-default "quote" char (#11474)
fix deserialization of parquets with large string list columns causing stack overflow (#11471)
Fix SQL ANY and ALL behaviour (#10879)
address multiple issues caused by implicit casting of is_in values to the column dtype being searched (#11427)
raise on invalid sort_by group lengths (#11423)
fix outer join on bools (#11417)
fix categorical collect (#11414)
Free bitmap when slicing into a non-null array (#11405)
async parquet. (#11403)
Fix edge-case where the Array dtype could (internally) be considered numeric (#11398)
Fix empty check when building a list (#11378)
more cloud urls (#11361)
ensure cloud globbing can deal with spaces (#11360)
recognize more cloud urls (#11357)
Fix Series.__contains__ for None values and implement is_in for null Series (#11345)
don't panic on multi-nodes in streaming conversion (#11343)
ensure trailing quote is written for temporal data when CSV quote_style is non-numeric (#11328)
fix empty Series construction edge-case with Struct dtype (#11301)
add missing feature flags on tests (#11305)
set partitions independent of thread pool (#11304)
parse sign for decimal properly (#11302)
consume duplicates in rolling_by window (#11261)
handle url encoded paths in objectpath creation (#11240)
use POOL when writing csv (#11222)
is_in for bool evaluate has_false incorrectly (#11217)
fix nullable filter mask in group_by (#11207)
replace n-th in filter (#11206)
fix translation of Series-nested datetime/date values for scan_pyarrow predicates (#11195)
impl hash for more function expr (#11182)
list.join's separator can be expression (#11167)
Add some missing expr type hint for series (#11171)
Make pl.struct serializable (#11169)
Fix rust test for logical plan optimizer for categoricals (#11135)
propagate null value for str/binary starts/ends_with and contains (#11141)

🛠️ Other improvements

optimize asof_join and allow null/string keys (#11712)
Add Development and Releases sections to the documentation (#11932)
use ahash from crates.io release (#11964)
move unique_counts to ops (#11963)
fix take return dtype in group context. (#11949)
move moment to ops (#11941)
fix some typos and add polars-business to curated plugin list (#11916)
prepare for multiple files in a node (#11918)
load 40x40 avatar from github and add loading=lazy attribute. (#11886)
Fix Cargo warning for parquet2 dependency (#11882)
Allow manual trigger for docs deployment (#11881)
rename new_from_owned_with_null_bitmap (#11828)
add section about plugins (#11855)
fix incorrect example of valid time zones (#11873)
Bump docs dependencies (#11852)
add missing polars-ops tests to CI (#11859)
Update doc comments for with_column to reflect that columns can be updated (#11840)
Move round to ops (#11838)
arrow: remove unused arithmetic code and remove doctests (#11820)
Move diff to polars-ops (#11818)
remove redundant if branch in nested parquet (#11814)
Move ewma to polars-ops (#11794)
Make some functions in dsl::mod non-anonymous (#11799)
Move cum_agg to polars-ops (#11770)
more granular polars-ops imports (#11760)
Make all emw function expr non-anonymous (#11638)
clarify polars-arrow <=> arrow2 license (#11755)
Version polars-arrow with the other crates (#11738)
fill missing fill_null strategies (#11751)
Minor fix in code example in section Coming from Pandas (#11745)
Update group_by_dynamic example (#11737)
merge nano-arrow/polars-arrow (#11719)
Improving the documentation of the SQL expressions (#11708)
*_horizontal dependent on reduce_expr to expression architecture (#11685)
update document of folds (#11705)
update rustc and fix future (#11696)
better align help command output following addition of some longer options (#11681)
sum_horizontal to expression architecture (#11659)
Cleanup the match block for date inference (#11677)
Adding feature annotation (#11671)
add note about use of polars-lts-cpu for macOS x86-64/rosetta (#11660)
improve rank implementation, especially around nulls (#11651)
Bring cloud monikers in line with the ones in is_cloud_url (#11629)
Rename .list.lengths and .str.lengths (#11613)
Make backwardfill and forwardfill function expr non-anonymous (#11630)
Make all expr in dt namespace non-anonymous (#11627)
Fix changelog for language-specific breaking changes (#11617)
avoid nightly rust for case conversion (#11610)
Make value_counts and unique_counts function expr non-anonymous (#11601)
Make arg_min(max), diff in list namespace non-anonymous (#11602)
Rename write_csv parameter quote to quote_char (#11583)
use a generic consistent total ordering, also for floats (#11468)
Move mode operation from core to ops crate (#11543)
fix lints (#11555)
use single threaded take under certain values size (#11539)
fix some features (#11529)
move (hor_)str_concat to polars-ops (#11488)
minor changes in peak-min/max (#11491)
align cloud url regex in rust and python (#11481)
move AnonymousScan into Scan node (#11502)
move repeat_by to polars-ops (#11461)
upgrade to nightly-10-02 (#11460)
Update contributing guide to include memory requirement (#11458)
remove unused order_by attribute (#11434)
cleanup sort_by expression impl (#11431)
large windows runner for release (#11370)
Fix error message reference to infer_schema_length (#11358)
move rank to polars-ops (#11349)
unify display for namespaced function expr (#11342)
Fix some cargo manifest warnings (#11327)
Use GITHUB_TOKEN to get contributor information for docs (#11321)
Add disable_string_cache (#11020)
remove default auto-explode for map_many_private (#11270)
Add API links for Rust user guide examples (#11294)
update a few dependencies (#11283)
move scan helpers to separate module (#11279)
update sponsors (#11271)
bump chrono to 0.4.31 (#11258)
bind all remaining method in StringNameSpace to function expr (#11229)
Make some list function expr non-anonymous (#11230)
remove lz4_flex feature (#11253)
remove unnecessary transmute (#11250)
move (almost) all join related code from polars-core to polars-ops. (#11228)
Mention the performant feature only once (#11223)
remove unneeded indirection (#11233)
remove unneeded mutex around object-store (#11224)
bind struct.rename_fields to function expr (#11215)
fix un-compilable rust example in user guide. (#11214)
add various missing expression doc-comments (#11213)
Fix user_guide of str.split (#11185)
New take implementation (#11138)
Fix rust test for logical plan optimizer for categoricals (#11135)

Thank you to all our contributors for making this release possible!
@ByteNybbler, @Cheukting, @Fokko, @Hofer-Julian, @JulianCologne, @LaurynasMiksys, @MarcoGorelli, @Rohxn16, @SeanTroyUWO, @TheDataScientistNL, @Walnut356, @aberres, @alexander-beedie, @alicja-januszkiewicz, @andysham, @billylanchantin, @bowlofeggs, @c-peters, @cmdlineluser, @dannyvankooten, @dependabot, @dependabot[bot], @ewoolsey, @jhorstmann, @jonashaag, @jrycw, @mcrumiller, @messense, @nameexhaustion, @orlp, @petrosbar, @ptiza, @rancomp, @reswqa, @ritchie46, @rjthoen, @romanovacca, @sd2k, @shenker, @squnit, @stinodego, @svaningelgem, @thomasjpfan, @uchiiii, @universalmind303 and Romano Vacca
