pola-rs/polars rs-0.33.0 on GitHub

🏆 Highlights

implementing sink_csv for LazyFrame (#10682)

💥 Breaking changes

empty product returns identity (#10842)
return f64 for rank when method="average" (#10734)
Rename groupby to group_by (#10654)
Read/write support for IPC streams in DataFrames (#10606)
Change behavior of all - fix Kleene logic implementation for all/any (#10564)
remove fixed_seed and add pl.set_random_seed (#10388)
Make arange an alias for int_range (#9983)
date_range/time_range no longer return a List type (#10526)
Remove various functionalities deprecated before 0.18 (#10527)
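
The all/any change above (#10564) refers to Kleene (three-valued) logic. As a rough illustration only, and not the Polars implementation, the sketch below models a nullable boolean as `Option<bool>`, with `None` standing for null. The function names `kleene_all` and `kleene_any` are hypothetical.

```rust
/// Kleene "all": false if any value is definitely false,
/// otherwise unknown (None) if any value is null, otherwise true.
/// Hypothetical helper for illustration; not a Polars API.
fn kleene_all(values: &[Option<bool>]) -> Option<bool> {
    let mut saw_null = false;
    for v in values {
        match v {
            Some(false) => return Some(false), // a definite false decides the result
            Some(true) => {}
            None => saw_null = true, // remember that something was unknown
        }
    }
    if saw_null { None } else { Some(true) }
}

/// Kleene "any": true if any value is definitely true,
/// otherwise unknown (None) if any value is null, otherwise false.
/// Hypothetical helper for illustration; not a Polars API.
fn kleene_any(values: &[Option<bool>]) -> Option<bool> {
    let mut saw_null = false;
    for v in values {
        match v {
            Some(true) => return Some(true), // a definite true decides the result
            Some(false) => {}
            None => saw_null = true,
        }
    }
    if saw_null { None } else { Some(false) }
}
```

Under these rules a null only affects the result when no definite value settles it: `kleene_all(&[Some(false), None])` is `Some(false)`, while `kleene_all(&[Some(true), None])` is `None`.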

⚠️ Deprecations

Rename is_first/last to is_first/last_distinct (#11130)
Rename count_match to count_matches (#11028)
Rename strip to strip_chars (#10813)
Add datetime_range expression function (#10213)
Rename Series/Expr.rolling_apply to rolling_map (#10750)

🚀 Performance improvements

improve performance of fast projection (#10945)
parse time zones outside of downcast_iter() in replace_time_zone (#10713)
use binary abstraction for atan2 (#10588)
use binary abstraction in pow (#10562)

✨ Enhancements

Expressify str.split argument. (#11117)
Expressify argument of binary contains (#11091)
dt.offset_by supports broadcasting lhs (#11095)
Expressify argument of binary starts_with and ends_with (#11076)
json_extract supports extracting static and string values to list dtype (#11057)
add quote_style="never" option for write_csv (#11015)
add support for nextest (#11048)
Add literal for str count_match (#10996)
More dtypes support cast to list (#11025)
ParquetCloudSink to allow streaming pipelines into remote ObjectStores (#10060)
Add strip_prefix and strip_suffix to the string namespace (#10958)
Add datetime_range expression function (#10213)
add proper cache for Regex compilation (#10934)
implementation of array_to_string (#10839)
apply left side predicate pushdown also to right side if all predicate columns are also join columns (#10841)
accept expr in str.count_match (#10900)
accept expressions in .offset_by (#9967)
implement drop as special case of select (#10885)
Supports is_last operation (#10760)
activate cse for group_by (again) (#10749)
add pairwise float sum implementation (#10756)
implementing sink_csv for LazyFrame (#10682)
Supports series unique & arg_unique & n_unique for list (#10743)
repeat_by should also support broadcasting of LHS (#10735)
deprecate 'use_earliest' argument in favour of 'ambiguous', which can take expressions (#10719)
is_first also supports numeric list type. (#10727)
improve slice pushdown in unions (#10723)
Support min and max strategy for binary & str columns fill null (#10673)
support broadcasting in list set operations (#10668)
add truncate_ragged_lines (#10660)
supports cast to list (#10623)
Rename groupby to group_by (#10654)
preserve whitespace in notebook output (#10644)
Read/write support for IPC streams in DataFrames (#10606)
improve binary (arity) generics (#10622)
propagate nulls in is_in and more generic array construction (#10614)
Change behavior of all - fix Kleene logic implementation for all/any (#10564)
frame-level cast support (#10504)
Add failed column to cast exception (#10507)
Make arange an alias for int_range (#9983)
date_range/time_range no longer return a List type (#10526)
Remove various functionalities deprecated before 0.18 (#10527)
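
For readers unfamiliar with the pairwise float sum mentioned above (#10756), here is a minimal sketch of the general technique, not the code from that PR. Pairwise summation splits the input in half recursively and adds the partial sums, which typically accumulates less rounding error than a single left-to-right loop. The function name `pairwise_sum` and the block size of 128 are assumptions chosen for illustration.

```rust
/// Below this length the slice is summed with a plain loop.
/// The value 128 is an arbitrary choice for this sketch.
const BLOCK: usize = 128;

/// Pairwise (cascade) summation of a slice of f64 values.
/// Hypothetical helper for illustration; not a Polars API.
fn pairwise_sum(values: &[f64]) -> f64 {
    if values.len() <= BLOCK {
        // Base case: a short run is summed sequentially.
        return values.iter().sum();
    }
    // Recursive case: sum each half separately, then combine.
    let (left, right) = values.split_at(values.len() / 2);
    pairwise_sum(left) + pairwise_sum(right)
}
```

The recursion depth grows only logarithmically with the input length, so the sketch is safe for large slices.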

🐞 Bug fixes

Correct hash and fmt for struct expr (#11119)
enforce sortedness of by argument in rolling_* functions (#11002)
Filter on empty objectChunked should not throw error (#11073)
ensure null_count statistics accounts for null array (#11070)
toggle off cse if ext_context is used (#11051)
Correct field dtype of string concat (#11055)
pushed-down expr should be considered when evaluating ExternalContext (#11023)
fix rolling_* functions when "by" has nanosecond resolution (#11005)
Don't reuse member for Selector::Add (#11026)
fix the construction of List<Null> (#10969)
allow singular null in regex pattern (#10948)
compute length of null array in explode (#10946)
Allow exactly one value in start/end for int_range (#10914)
count was falsely tagged as cse in group by (#10917)
Retain original dtype when deserializing an empty list (#10893)
CSE doesn't accept opaque functions (#10905)
Make int_range(s) exclusive on the upper bound when step is negative (#10898)
fix conversion from decimal to float (#10776)
Add broadcasting for list comparisons (#10857)
don't overflow length before checking limit (#10883)
fix bug where datetimes were not parsed in read_csv when pattern had no hour or minute (#10877)
tag amortized iter unsafe and add safe alternatives (#10881)
use pool in dataframe arithmetic (#10864)
remove debug println! from datetime fn (#10862)
repair polars_err string interpolation (#10863)
make count_match docs and extract_all docs/impl consistent around zero matches (#10854)
empty product returns identity (#10842)
never panic if hash/equality doesn't hold in cse (#10836)
Improve bound checks on temporal ranges (#10837)
var/std behavior around few elements (#10828)
Fix division-by-zero error when reading an empty csv in streaming mode (#10819)
fix equality of quantile aggregation node (#10816)
Reading an only-header csv file in streaming mode should not panic (#10810)
get_single_leaf can't handle Expr::Count (#10790)
string to decimal parsing (#10712)
support groupby literal in streaming (#10771)
ORDER BY on unselected columns (#10752)
Fix is_in cannot cast list type for float (#10769)
fix unicode truncation in json parsing (#10761)
Error message of list unique should not display inner type (#10748)
create chunks_mut entry in vtable (#10745)
Prevent panic on sample_n with replacement from empty df (#10731)
only preserve sortedness flag in replace_time_zone when safe (#10738)
Error on value_counts on column named "counts" (#10737)
Build Series from empty Series vector (#10558)
return f64 for rank when method="average" (#10734)
Keep min/max and arg_min/arg_max consistent. (#10716)
Fix bug when providing custom labels and opting for duplicates in qcut (#10686)
Cast small int type when scanning csv in streaming mode (#10679)
Reused input series in rolling_apply should not be orderly (#10694)
re-sort buffer when a window update swaps the whole buffer (#10696)
Set the correct fast_explode flag for ListUtf8ChunkedBuilder (#10684)
Sorted Utf8Chunked max_str and min_str should consider null value (#10675)
AllHorizontal format string (#10658)
List<null> chunked builder should take care of series name (#10642)
respect 'ignore_errors=False' in csv parser (#10641)
fix rename + projection pushdown (#10624)
fix int/float downcast in is_in (#10620)
Change behavior of all - fix Kleene logic implementation for all/any (#10564)
Fix serialization for categorical chunked. (#10609)
join_asof missing tolerance implementation, address edge-cases (#10482)
Take input_schema to create physical expr for Selection (#10571)
fix serialization of empty lists (#10563)
Clear window cache after evaluating predicate expr (#10505)
Parsing regex col in Expr::Columns (#10551)
sanitize column naming in boolean ops (#10531)
fix build for wasm (#10536)
remove fixed_seed and add pl.set_random_seed (#10388)
fix build for wasm (#9502)
rollback cse in groupby: python 0.18.15 (#10491)
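
The int_range fix above (#10898) concerns ranges that count downward. The sketch below shows one plausible reading of the intended behavior, assuming the semantics match Python's `range`: `end` is never included, whether the step is positive or negative. The standalone `int_range` function here is a hypothetical stand-in written for illustration, not the Polars function.

```rust
/// Build an integer range where `end` is always exclusive.
/// Hypothetical stand-in for illustration; assumes Python-like range semantics.
fn int_range(start: i64, end: i64, step: i64) -> Vec<i64> {
    assert!(step != 0, "step must be non-zero");
    let mut out = Vec::new();
    let mut cur = start;
    // Ascending ranges stop before reaching `end` from below,
    // descending ranges stop before reaching `end` from above.
    while (step > 0 && cur < end) || (step < 0 && cur > end) {
        out.push(cur);
        match cur.checked_add(step) {
            Some(next) => cur = next,
            None => break, // stop instead of overflowing i64
        }
    }
    out
}
```

With these rules `int_range(10, 0, -2)` yields 10, 8, 6, 4, 2 and leaves out 0, mirroring how `int_range(0, 5, 1)` leaves out 5.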

🛠️ Other improvements

Removed duplicated example (#11109)
Add CODEOWNERS for docs folder (#11107)
Refactor starts_with and ends_with for string (#11085)
Integrate user guide (#11089)
remove feature gate join/groupby in polars-core (#10965)
Add Documentation issue type (#11042)
complete intra-docs in api documentation (#11007)
genericize take implementation (#10976)
genericize PolarsDataType (#10952)
enhance internal crates readme with reference to main crate (#10928)
Add Duration method for checking full days (#10850)
apply with_name in more places (#10899)
never compare opaque functions (#10906)
eliminate repetition in utf8 datetime functions (#10860)
Fix issue templates for bug reports (#10896)
remove LocalProjection (#10886)
request verbose logging output of minimal reproducible examples (#10882)
Reorganize range expression module (#10871)
introduce with_name for Series/ChunkedArray (#10859)
Further refactor temporal range functions (#10844)
Refactor range related functions (#10830)
Fix the non-compiling black-box function parts in the polars lazy cookbook (#10809)
Fix some broken links / formatting (#10772)
Improve docs for polars-lazy (#10729)
update rustc nightly_2023-08-26 (#10467)
default to rust native flate2 lib (#10733)
Clear GitHub Actions caches weekly (#10715)
move 'is_in' to polars-ops (#10645)
Clean up schema calculation for date_range (#10653)
remove unused apply functions and add fallible generic apply functions (#10621)
Enforce up-to-date Cargo.lock (#10555)
make binary chunkedarray functions DRY (#10607)
bump MSRV to 1.65 (#10568)
genericize chunk implementation (#10506)
use ChunkArray::(try_)from_chunk_iter (#10497)
add VSCode rust-analyzer settings (#10498)
Update URLs for dev documentation (#10495)
Update features for latest flate2 release (#10492)

Thank you to all our contributors for making this release possible!
@Barsik-sus, @I8dNLo, @JulianCologne, @KacpiW, @MarcoGorelli, @Object905, @OndrejSlamecka, @Qqwy, @SeanTroyUWO, @TNieuwdorp, @VasanthakumarV, @alexander-beedie, @aminalaee, @antoniocali, @braaannigan, @bvanelli, @c-peters, @cjackal, @cmdlineluser, @dependabot, @dependabot[bot], @drgif, @henrikig, @ion-elgreco, @jakob-keller, @jeroenjanssens, @jonashaag, @lorepozo, @marki259, @mcrumiller, @messense, @mrogowski11, @nameexhaustion, @orlp, @owrior, @rben01, @reswqa, @ritchie46, @s-banach, @sdamashek, @stinodego, @svaningelgem, @thomasjpfan, @titoeb, @trueb2, @washcycle, @wdoppenberg and @zundertj
