🏆 Highlights
- implementing sink_csv for LazyFrame (#10682)
💥 Breaking changes
- empty product returns identity (#10842)
- return `f64` for `rank` when `method="average"` (#10734)
- Rename `groupby` to `group_by` (#10654)
- Read/write support for IPC streams in DataFrames (#10606)
- Change behavior of `all`: fix Kleene logic implementation for `all`/`any` (#10564)
- remove fixed_seed and add pl.set_random_seed (#10388)
- Make `arange` an alias for `int_range` (#9983)
- `date_range`/`time_range` no longer return a `List` type (#10526)
- Remove various functionalities deprecated before `0.18` (#10527)
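
For readers unfamiliar with the Kleene logic mentioned in #10564, the sketch below shows three-valued `all`/`any` in plain Python. It is an illustration only, not Polars code: the helper names `kleene_all` and `kleene_any` are hypothetical, and `None` stands in for a null value. It also shows why an empty reduction returns the identity element, which is the idea behind the empty-product change (#10842).

```python
import math
from typing import Iterable, Optional


def kleene_all(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """Three-valued AND: any False wins; otherwise a null makes the result unknown."""
    saw_null = False
    for v in values:
        if v is False:
            return False  # a single False decides the result, nulls or not
        if v is None:
            saw_null = True
    return None if saw_null else True


def kleene_any(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """Three-valued OR: any True wins; otherwise a null makes the result unknown."""
    saw_null = False
    for v in values:
        if v is True:
            return True  # a single True decides the result, nulls or not
        if v is None:
            saw_null = True
    return None if saw_null else False


# Empty inputs return the identity of the operation:
# True for AND, False for OR, and 1 for a product.
empty_all = kleene_all([])
empty_any = kleene_any([])
empty_product = math.prod([])
```

Under these rules, `kleene_all([True, None])` is unknown (`None`) because the null could be either value, while `kleene_all([False, None])` is `False` regardless of what the null hides.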
⚠️ Deprecations
- Rename `is_first/last` to `is_first/last_distinct` (#11130)
- Rename `count_match` to `count_matches` (#11028)
- Rename `strip` to `strip_chars` (#10813)
- Add `datetime_range` expression function (#10213)
- Rename `Series/Expr.rolling_apply` to `rolling_map` (#10750)
🚀 Performance improvements
- improve performance of fast projection (#10945)
- parse time zones outside of downcast_iter() in replace_time_zone (#10713)
- use binary abstraction for atan2 (#10588)
- use binary abstraction in pow (#10562)
✨ Enhancements
- Expressify str.split argument. (#11117)
- Expressify argument of binary contains (#11091)
- dt.offset_by supports broadcasting lhs (#11095)
- Expressify argument of binary starts_with and ends_with (#11076)
- json_extract supports extract static and string value to list dtype (#11057)
- add quote_style="never" option for `write_csv` (#11015)
- add support for nextest (#11048)
- Add `literal` for str count_match (#10996)
- More dtypes supports cast to list (#11025)
- ParquetCloudSink to allow streaming pipelines into remote ObjectStores (#10060)
- Add `strip_prefix` and `strip_suffix` to the string namespace (#10958)
- Add `datetime_range` expression function (#10213)
- add proper cache for Regex compilation (#10934)
- implementation of `array_to_string` (#10839)
- apply left side predicate pushdown also to right side if all predicate columns are also join columns (#10841)
- accept expr in `str.count_match` (#10900)
- accept expressions in `.offset_by` (#9967)
- implement drop as special case of `select` (#10885)
- Supports is_last operation (#10760)
- activate cse for group_by (again) (#10749)
- add pairwise float sum implementation (#10756)
- implementing sink_csv for LazyFrame (#10682)
- Supports series unique & arg_unique & n_unique for list (#10743)
- repeat_by should also support broadcasting of LHS (#10735)
- deprecate 'use_earliest' argument in favour of 'ambiguous', which can take expressions (#10719)
- is_first also supports numeric list type. (#10727)
- improve slice pushdown in unions (#10723)
- Support min and max strategy for binary & str columns fill null (#10673)
- support broadcasting in list set operations (#10668)
- add `truncate_ragged_lines` (#10660)
- supports cast to list (#10623)
- Rename `groupby` to `group_by` (#10654)
- preserve whitespace in notebook output (#10644)
- Read/write support for IPC streams in DataFrames (#10606)
- improve binary (arity) generics (#10622)
- propagate null is in `is_in` and more generic array construction (#10614)
- Change behavior of `all`: fix Kleene logic implementation for `all`/`any` (#10564)
- frame-level `cast` support (#10504)
- Add failed column to cast exception (#10507)
- Make `arange` an alias for `int_range` (#9983)
- `date_range`/`time_range` no longer return a `List` type (#10526)
- Remove various functionalities deprecated before `0.18` (#10527)
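
As background for the pairwise float sum entry (#10756), here is a minimal pure-Python sketch of pairwise summation. It is not the Polars implementation: the function name `pairwise_sum` and the `block` cutoff of 128 are assumptions chosen for illustration. The technique splits the input in half recursively and adds the two partial sums, so rounding error grows roughly with the logarithm of the length rather than linearly.

```python
from typing import Sequence


def pairwise_sum(xs: Sequence[float], block: int = 128) -> float:
    """Sum floats by recursive halving to limit accumulated rounding error."""
    n = len(xs)
    if n <= block:
        # Small inputs: a plain left-to-right loop is accurate enough.
        total = 0.0
        for x in xs:
            total += x
        return total
    mid = n // 2
    # Slicing copies the halves; a production version would pass index bounds instead.
    return pairwise_sum(xs[:mid], block) + pairwise_sum(xs[mid:], block)
```

For example, `pairwise_sum([0.1] * 10_000, block=8)` stays very close to the correctly rounded result from `math.fsum`, whereas a naive running total typically drifts further.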
🐞 Bug fixes
- Correct hash and fmt for struct expr (#11119)
- enforce sortedness of by argument in rolling_* functions (#11002)
- Filter on empty objectChunked should not throw error (#11073)
- ensure null_count statistics accounts for null array (#11070)
- toggle off cse if ext_context is used (#11051)
- Correct field dtype of string concat (#11055)
- pushed-down expr should be considered when evaluating ExternalContext (#11023)
- fix rolling_* functions when "by" has nanosecond resolution (#11005)
- Don't reuse member for Selector::Add (#11026)
- fix the construction of List<Null> (#10969)
- allow singular null in regex pattern (#10948)
- compute length of null array in explode (#10946)
- Allow exactly one value in start/end for `int_range` (#10914)
- count was falsely tagged as cse in group by (#10917)
- Retain original dtype when deserializing an empty list (#10893)
- CSE don't accept opaque functions (#10905)
- Make `int_range(s)` exclusive on the upper bound when step is negative (#10898)
- fix conversion from decimal to float (#10776)
- Add broadcasting for list comparisons (#10857)
- don't overflow length before checking limit (#10883)
- fix bug where datetimes were not parsed in read_csv when pattern had no hour or minute (#10877)
- tag amortized iter unsafe and add safe alternatives (#10881)
- use pool in dataframe arithmetic (#10864)
- remove debug `println!` from datetime fn (#10862)
- repair polars_err string interpolation (#10863)
- make count_match docs and extract_all docs/impl consistent around zero matches (#10854)
- empty product returns identity (#10842)
- never panic if hash/equality doesn't hold in cse (#10836)
- Improve bound checks on temporal ranges (#10837)
- var/std behavior around few elements (#10828)
- Fix division by zero error when reading an empty csv in streaming mode (#10819)
- fix equality of quantile aggregation node (#10816)
- Reading an only-header csv file in streaming mode should not panic (#10810)
- get_single_leaf can't handle Expr::Count (#10790)
- string to decimal parsing (#10712)
- support groupby literal in streaming (#10771)
- `ORDER BY` on unselected columns (#10752)
- Fix is_in cannot cast list type for float (#10769)
- fix unicode truncation in json parsing (#10761)
- Error message of list unique should not display inner type (#10748)
- create `chunks_mut` entry in vtable (#10745)
- Prevent panic on sample_n with replacement from empty df (#10731)
- only preserve sortedness flag in replace_time_zone when safe (#10738)
- Error on `value_counts` on column named `"counts"` (#10737)
- Build Series from empty Series vector (#10558)
- return `f64` for `rank` when `method="average"` (#10734)
- Keep min/max and arg_min/arg_max consistent. (#10716)
- Fix bug when providing custom labels and opting for duplicates in qcut (#10686)
- Cast small int type when scan csv in streaming mode. (#10679)
- Reused input series in rolling_apply should not be orderly (#10694)
- re-sort buffer when update window swap the whole buffer (#10696)
- Set the correct fast_explode flag for ListUtf8ChunkedBuilder (#10684)
- Sorted Utf8Chunked max_str and min_str should consider null value (#10675)
- `AllHorizontal` format string (#10658)
- List<null> chunked builder should take care of series name (#10642)
- respect 'ignore_errors=False' in csv parser (#10641)
- fix rename + projection pushdown (#10624)
- fix int/float downcast in `is_in` (#10620)
- Change behavior of `all`: fix Kleene logic implementation for `all`/`any` (#10564)
- Fix serialization for categorical chunked. (#10609)
- join_asof missing `tolerance` implementation, address edge-cases (#10482)
- Take input_schema to create physical expr for Selection (#10571)
- fix serialization of empty lists (#10563)
- Clear window cache after evaluate predication expr (#10505)
- Parsing regex col in Expr::Columns (#10551)
- sanitize column naming in boolean ops (#10531)
- fix build for wasm (#10536)
- remove fixed_seed and add pl.set_random_seed (#10388)
- fix build for wasm (#9502)
- rollback cse in groupby: python 0.18.15 (#10491)
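
To illustrate why `rank` with `method="average"` returns a float type (#10734), the following plain-Python sketch computes average ranks. It is a hypothetical helper (`average_rank`), not the Polars implementation: tied values share the mean of the positions they occupy, which is often fractional and so cannot be represented by an integer dtype.

```python
from typing import List, Sequence


def average_rank(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks where ties receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values equal to values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Mean of the 1-based positions i+1 .. j+1.
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks
```

With the input `[10, 20, 20, 30]`, the two tied values occupy positions 2 and 3 and both receive rank 2.5.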
🛠️ Other improvements
- Removed duplicated example (#11109)
- Add CODEOWNERS for docs folder (#11107)
- Refactor starts_with and ends_with for string (#11085)
- Integrate user guide (#11089)
- remove feature gate join/groupby in polars-core (#10965)
- Add Documentation issue type (#11042)
- complete intra-docs in api documentation (#11007)
- genericize take implementation (#10976)
- genericize PolarsDataType (#10952)
- enhance internal crates readme with reference to main crate (#10928)
- Add `Duration` method for checking full days (#10850)
- apply with_name in more places (#10899)
- never compare opaque functions (#10906)
- eliminate repetition in utf8 datetime functions (#10860)
- Fix issue templates for bug reports (#10896)
- remove `LocalProjection` (#10886)
- request verbose logging output of minimal reproducible examples (#10882)
- Reorganize `range` expression module (#10871)
- introduce with_name for Series/ChunkedArray (#10859)
- Further refactor temporal range functions (#10844)
- Refactor `range` related functions (#10830)
- Fix the non-compiling black box function parts in polars lazy cookbook (#10809)
- Fix some broken links / formatting (#10772)
- Improve docs for `polars-lazy` (#10729)
- update rustc nightly_2023-08-26 (#10467)
- default to rust native flate2 lib (#10733)
- Clear GitHub Actions caches weekly (#10715)
- move 'is_in' to polars-ops (#10645)
- Clean up schema calculation for `date_range` (#10653)
- remove unused apply functions and add fallible generic apply functions (#10621)
- Enforce up-to-date `Cargo.lock` (#10555)
- make binary chunkedarray functions DRY (#10607)
- bump MSRV to 1.65 (#10568)
- genericize chunk implementation (#10506)
- use ChunkArray::(try_)from_chunk_iter (#10497)
- add VSCode rust-analyzer settings (#10498)
- Update URLs for dev documentation (#10495)
- Update features for latest `flate2` release (#10492)
Thank you to all our contributors for making this release possible!
@Barsik-sus, @I8dNLo, @JulianCologne, @KacpiW, @MarcoGorelli, @Object905, @OndrejSlamecka, @Qqwy, @SeanTroyUWO, @TNieuwdorp, @VasanthakumarV, @alexander-beedie, @aminalaee, @antoniocali, @braaannigan, @bvanelli, @c-peters, @cjackal, @cmdlineluser, @dependabot, @dependabot[bot], @drgif, @henrikig, @ion-elgreco, @jakob-keller, @jeroenjanssens, @jonashaag, @lorepozo, @marki259, @mcrumiller, @messense, @mrogowski11, @nameexhaustion, @orlp, @owrior, @rben01, @reswqa, @ritchie46, @s-banach, @sdamashek, @stinodego, @svaningelgem, @thomasjpfan, @titoeb, @trueb2, @washcycle, @wdoppenberg and @zundertj