⚠️ Breaking changes
- rename some function arguments (#8017)
- don't create duplicate pivot names (#8002)
- Remove deprecated behaviour (#7978)
- rename `toggle_string_cache` to `enable_string_cache` (#7970)
- change top_k(descending) -> bottom_k (#7969)
- in `sort`, `top_k`, `sort_by`, and `arg_sort_by`, raise if `descending` is a sequence and its length doesn't match the number of columns to sort by (#7957)
- Use RowsError instead of RowsException as recommended … (#6009)
- Use `time_unit`/`time_zone` instead of `tu`/`tz` (#7910)
- More ergonomic args for `struct`, `concat_str`, and `arg_sort_by` (#7308)
- swap arguments of `shift_and_fill` and add default… (#7192)
- set maintain_order=False for df/lf.unique (#7468)
- Rename pipe arg `func` to `function` (#7139)
- Set some args for `Series`/`Expr` methods to keyword-only (#7860)
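
The length check on `descending` described in #7957 can be pictured with a small pure-Python sketch. It does not call Polars: `normalize_descending` is a hypothetical helper name, and since these notes only say that the call raises, the `ValueError` and its message are assumptions.

```python
from typing import List, Sequence, Union


def normalize_descending(
    by: Sequence[str], descending: Union[bool, Sequence[bool]]
) -> List[bool]:
    """Hypothetical helper: expand `descending` to one flag per sort column."""
    if isinstance(descending, bool):
        # A single flag applies to every column being sorted.
        return [descending] * len(by)
    flags = list(descending)
    if len(flags) != len(by):
        # Assumed error type; the release notes only state that it raises.
        raise ValueError(
            f"got {len(flags)} 'descending' flags for {len(by)} sort columns"
        )
    return flags
```

Under this sketch, `normalize_descending(["a", "b"], [True])` raises, while a single boolean or a two-element sequence is accepted.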
🚀 Performance improvements
- `FromParallelIterator<Option<str>>` for Utf8Chunked ~1.9x (#8058)
- speed up from_par_iter Option<bool> ~2.5x (#8057)
- parallelize numeric ChunkedArray materialization ~2x (#8053)
- parallelize `into_groups` materialization ~-25% (#8036)
- use a trusted anyvalue builder (#8001)
- numeric grouptuples with nulls hash in single pass ~25% (#7980)
- ensure primitives are parsed first in anyvalue conversion (#7955)
- use perfect hash table for categoricals (#7951)
✨ Enhancements
- multiple sql contexts & optional sql highlighting in cli (#8072)
- implement arg_sort for struct dtype (#8051)
- Support `DataFrame` init from pyarrow `RecordBatch` objects, and improve init from `Array` (#8011)
- allow `write_ipc` to take `file=None` (returning `BytesIO`) (#7997)
- Add __array__ method to DataFrame (#7979)
- support struct in df.unique (#7976)
- change top_k(descending) -> bottom_k (#7969)
- basic sanity-checks for some `Config` methods, reference POLARS_MAX_THREADS in `threadpool_size` docstring (#7965)
- optimize away nested unions in lp (#7861)
- Use RowsError instead of RowsException as recommended … (#6009)
- More ergonomic args for `struct`, `concat_str`, and `arg_sort_by` (#7308)
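
To illustrate the `top_k`/`bottom_k` split from #7969, here is a rough pure-Python sketch of the two selections. It uses `heapq` as a stand-in and is not the Polars implementation; the function names mirror the release note, but the signatures and the ordering of the returned values are assumptions.

```python
import heapq
from typing import List, Sequence


def top_k(values: Sequence[int], k: int) -> List[int]:
    """Stand-in: the k largest values, largest first."""
    return heapq.nlargest(k, values)


def bottom_k(values: Sequence[int], k: int) -> List[int]:
    """Stand-in: the k smallest values, smallest first."""
    return heapq.nsmallest(k, values)


data = [3, 1, 4, 1, 5, 9, 2, 6]
print(top_k(data, 3))     # [9, 6, 5]
print(bottom_k(data, 3))  # [1, 1, 2]
```

Having two named operations instead of one function with a direction flag makes the intent visible at the call site.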
🐞 Bug fixes
- check element count in multi-column explode (#8050)
- set lower limit for chunk_size (#8048)
- impl to_static for struct (#8037)
- create Series with list of only None with Float32 dtype (#8015)
- version gate pyarrow version for `to_pandas=(use_pyarrow… (#8026)
- Only allow correct type for get_column and to_series arg… (#7983)
- Output correct dtype for values of remapping dict in map… (#8013)
- all/any empty sets (#8012)
- struct null_count, cast string, transpose and describe (#8009)
- fix pivot and transpose of struct data (#8005)
- don't create duplicate pivot names (#8002)
- Fix test_literal_group_agg_chunked_7968 test (#7991)
- fix chunked literals in expression engine (#7973)
- in `sort`, `top_k`, `sort_by`, and `arg_sort_by`, raise if `descending` is a sequence and its length doesn't match the number of columns to sort by (#7957)
- pandas 2.0 compat (#7962)
- concat object types (#7958)
- fix decimal conversion alignment (#7954)
🛠️ Other improvements
- Fix Expr.apply docstring for return_dtype parameter (#8069)
- rename some function arguments (#8017)
- Remove deprecated behaviour (#7978)
- Add docstring examples for top_k and bottom_k (#7987)
- rename `toggle_string_cache` to `enable_string_cache` (#7970)
- add remaining operator-equivalent method docstrings and a related html/docs entry (#7953)
- Use `time_unit`/`time_zone` instead of `tu`/`tz` (#7910)
- swap arguments of `shift_and_fill` and add default… (#7192)
- set maintain_order=False for df/lf.unique (#7468)
- Rename pipe arg `func` to `function` (#7139)
- Set some args for `Series`/`Expr` methods to keyword-only (#7860)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @StefanBRas, @alexander-beedie, @ghuls, @rben01, @ritchie46, @stinodego and @universalmind303