🏆 Highlights
- improve join performance through radix partitioned join (#12270)
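
For readers unfamiliar with the term, the idea behind a radix partitioned join can be sketched in plain Python. This is only an illustration of the general technique, not the Polars implementation from #12270: the function names, the tuple-based rows, and the `bits` parameter are all assumptions made for this example. Both inputs are split into buckets by the low bits of the key hash, and each pair of matching buckets is then joined with its own small hash table.

```python
from collections import defaultdict


def radix_partition(rows, key_index, bits):
    """Split rows into 2**bits buckets using the low bits of the key's hash."""
    mask = (1 << bits) - 1
    partitions = [[] for _ in range(1 << bits)]
    for row in rows:
        partitions[hash(row[key_index]) & mask].append(row)
    return partitions


def radix_partitioned_join(left, right, left_key=0, right_key=0, bits=2):
    """Inner join of two lists of tuples (hypothetical helper, not a Polars API)."""
    left_parts = radix_partition(left, left_key, bits)
    right_parts = radix_partition(right, right_key, bits)
    out = []
    # Equal keys hash to the same bucket index, so only matching buckets are joined.
    for lpart, rpart in zip(left_parts, right_parts):
        # Build phase: a small hash table over one side of this bucket only.
        table = defaultdict(list)
        for rrow in rpart:
            table[rrow[right_key]].append(rrow)
        # Probe phase: look up each left row in the bucket-local table.
        for lrow in lpart:
            for rrow in table.get(lrow[left_key], ()):
                out.append(lrow + rrow)
    return out


left = [(1, "a"), (2, "b"), (3, "c")]
right = [(1, "x"), (3, "y"), (3, "z")]
joined = sorted(radix_partitioned_join(left, right))
```

Because each bucket gets its own small hash table, the buckets can in principle be processed independently, which is the usual motivation for partitioning before joining.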
💥 Breaking changes
- Rename cumulative functions `cumsum -> cum_sum` and similar (#12513)
- Rename `take` to `gather` (#12528)
- Add dedicated horizontal aggregation methods to `DataFrame` (#12492)
- Rename `take_every` to `gather_every` (#12531)
- Deprecate `parse_int` in favor of `to_integer` (#12464)
- plugins add version and context (#12433)
- Fix `scan_csv` error type (#12355)
- Rename `write_csv` parameter `has_header` to `include_header` (#12351)
- Rename `is_signed` to `is_signed_integer` (#12220)
- Rename `dt.seconds` to `dt.total_seconds` (likewise for days, hours, minutes, milliseconds, microseconds, and nanoseconds) (#12179)
- Rename `ljust`/`rjust` to `pad_end`/`pad_start` (#11975)
🚀 Performance improvements
- speed up cov/corr with SIMD + strength-reduction `~3x 0.19.13/ ~2x numpy` (#12471)
- apply predicates and statistics of parquet files in streaming mode (#12439)
- use online algorithm for cov/corr `~2x` (#12412)
- indexvec in group-by (#12371)
- reduce allocations in hash join (#12368)
- change concurrency parameters (#12321)
- improve join performance through radix partitioned join (#12270)
- remove extra multiplication in hash_to_partition (#12233)
- allow non-power-of-two partitions (#12225)
- Reduce compute in error message for failed datetime parsing (#12147)
- improve parquet downloading (#12061)
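
As background for the "online algorithm for cov/corr" entry above, here is a minimal single-pass sketch in plain Python. It uses the standard Welford-style running update and is not the Polars kernel from #12412; the function name `online_cov_corr` and the choice of sample covariance (dividing by `n - 1`) are assumptions made for this example.

```python
import math


def online_cov_corr(xs, ys):
    """Single-pass sample covariance and Pearson correlation (illustrative only)."""
    n = 0
    mean_x = mean_y = 0.0
    m2_x = m2_y = c_xy = 0.0  # running sums of squared deviations and co-deviations
    for x, y in zip(xs, ys):
        n += 1
        dx = x - mean_x  # deviation from the old mean of x
        dy = y - mean_y  # deviation from the old mean of y
        mean_x += dx / n
        mean_y += dy / n
        # Each update multiplies an old-mean deviation by a new-mean deviation.
        m2_x += dx * (x - mean_x)
        m2_y += dy * (y - mean_y)
        c_xy += dx * (y - mean_y)
    if n < 2:
        raise ValueError("need at least two observations")
    cov = c_xy / (n - 1)
    denom = math.sqrt(m2_x * m2_y)
    # Correlation is undefined when either input has zero variance.
    corr = c_xy / denom if denom > 0 else float("nan")
    return cov, corr


cov, corr = online_cov_corr([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

The data is read once and only a handful of running totals are kept, which avoids the separate pass a two-pass formula needs to compute the means first.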
✨ Enhancements
- Add dedicated horizontal aggregation methods to `DataFrame` (#12492)
- support http scan_parquet (#12517)
- Add support for UTF-8 BOM option in `write_csv` and `sink_csv` (#12253)
- remove lexical (replace with atoi_simd, ryu, and itoa). (#12512)
- Allow comparison of two local categories with the same hash (#12503)
- more changes for versioned plugins (#12504)
- plugins add version and context (#12433)
- include i128 in more primitive functions (#12413)
- write rolling functions as private expressions. (#12379)
- Add `round_sig_figs` expression for rounding to significant figures (#11959)
- change concurrency parameters (#12321)
- deprecate `_saturating` in duration string language, make it the default (#12301)
- auto infer `ambiguous` for truncate and round (#12204)
- Rename `is_signed` to `is_signed_integer` (#12220)
- New `Config` options for numeric formatting: digit grouping and thousands/decimal separator (#12099)
- allow non-aggregation predicate in ternary groupby (#12286)
- Add `name=` in `.write_avro` to set schema name (#12255)
- Add support for reading zstd compressed files (no-options) in read_csv (#12214)
- start prefetching all files immediately (#12201)
- Add `.list.to_array` expression (#12192)
- consolidate & improve all casting failure error messages (#12168)
- tunable concurrency (#12171)
- support reverse sort in streaming (#12169)
- Add `.arr.to_list` expression (#12136)
- add concurrency budget (#12117)
- Introduce ignore_nulls for str.concat (#12108)
- casting utf8 to temporal (#12072)
- Add supertype for `List`/`Array` (#12016)
- enable eq and neq for array dtype (#12020)
- Expressify n of shift (#12004)
- add dedicated `name` namespace for operations that affect expression names (#11973)
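
To illustrate what rounding to significant figures means for the `round_sig_figs` entry above, here is a small plain-Python sketch. It is a standalone helper written for this example, not the Polars expression from #11959, and its handling of zero and non-finite values is an assumption rather than documented Polars behavior.

```python
import math


def round_sig_figs(value: float, digits: int) -> float:
    """Round `value` to `digits` significant figures (illustrative helper only)."""
    if digits < 1:
        raise ValueError("digits must be at least 1")
    # Assumption: zero, infinities, and NaN are returned unchanged.
    if value == 0 or not math.isfinite(value):
        return value
    # Position of the leading digit, e.g. 3 for 9876.0 and -3 for 0.0012.
    magnitude = math.floor(math.log10(abs(value)))
    return round(value, digits - 1 - magnitude)


examples = [
    round_sig_figs(123456.0, 2),
    round_sig_figs(0.0012345, 3),
    round_sig_figs(-9876.0, 1),
]
```

The number of decimal places passed to `round` is derived from the magnitude of the value, so large numbers are rounded to the left of the decimal point and small numbers to the right.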
🐞 Bug fixes
- fix incorrect ternary agg states (#12538)
- fix and improve ternary evaluation on groups (#12529)
- saturating sub in debug msg (#12525)
- fix panic when writing `Decimal` type to parquet (#12532)
- pre-fetch struct columns in async projection pd (#12514)
- rechunk cross join output in streaming (#12511)
- fix as_list logical types (#12507)
- fix streaming cross join on empty df (#12491)
- don't overflow when calculating date range over very long periods (#12479)
- Allow append/zip_with/extend on local categoricals (#12369)
- Do not panic if time is invalid (#12466)
- empty csv no-raise (#12434)
- Fix `scan_csv` error type (#12355)
- binary operations in aggregation context on literals (#12430)
- update groups state after binary aggregation (#12415)
- Remove extra `\n` when reading file-like object wi… (#12333)
- revert ternary special broadcast, ensure broadcast is always to max height (#12395)
- ensure first/last return null if empty (#12401)
- Do not cast lit if has same dtype (#12342)
- Fix index column name of rolling/dynamic group by (#12365)
- ternary broadcasting with empty truthy or falsy and agg predicate (#12357)
- uint64 should be correctly extracted from python object (#12338)
- expr_output_name include literal (#12335)
- Fix Decimal dtype table repr (#12318)
- Fix behavior of month intervals in `date_range` (#12317)
- scan empty csv miss row_count (#12316)
- zip_with also broadcast mask (#12309)
- respect hive_partitioning flag when dealing with multiple files (#12315)
- parquet, add row_count to empty file materialization (#12310)
- fix download ranges in parquet (#12313)
- object store path derivation for local URL (#12308)
- don't move right endpoint of windows in rolling in default `offset==-period` case (#12267)
- Raise more informative error on invalid `reshape` input (#12288)
- incorrect super type for literals in nested binary exprs (#12238)
- Update `null_count` after arithmetic (#12280)
- fix ambiguous aggregation type (#12269)
- Consistently propagate nulls for `numpy` ufuncs (#12212)
- respect return_scalar of list scalars (#12251)
- potential overflow (#12206)
- always start a new thread if the thread is already blocking (#12202)
- with_row_count should block predicate push down for lazy csv (#12187)
- rechunk failed-list series before iterate (#12189)
- Raise if *_horizontal without inputs (#12106)
- fix incorrect desc sort behavior (#12141)
- `take` should block predicate pushdown (#12130)
- use null type when read from unknown row (#12128)
- boundary predicate to block all accumulated predicates in push down (#12105)
- make python `schema_overrides` information available to the rust-side inference code when initialising from records/dicts (#12045)
- fix panic when initializing Series with array of list dtype (#12148)
- Fix schema of arr.min/max (#12127)
- ensure filter predicate inputs exist in schema (#12089)
- str.concat on empty list (#12066)
- binary agg should group aware if literal not a scalar (#12043)
- Use Arrow schema for file readers (#12048)
- Error on duplicates in hive partitioning (#12040)
- display fmt for str split (#12039)
- sum_horizontal should not always cast to int (#12031)
- fix apply_to_inner's dtype (#12010)
- Fix padding for non-ASCII strings (#12008)
- inline parts of unstable unicode module for stable (#12003)
- fix dot visualization of anonymous scans (#12002)
- SQL table aliases (#11988)
🛠️ Other improvements
- Rename cumulative functions `cumsum -> cum_sum` and similar (#12513)
- fix and improve ternary evaluation on groups (#12529)
- Rename `take` to `gather` (#12528)
- Add dedicated horizontal aggregation methods to `DataFrame` (#12492)
- Rename `take_every` to `gather_every` (#12531)
- Add `polars-ds` to list of community plugins (#12527)
- add schema test (#12523)
- remove lexical (replace with atoi_simd, ryu, and itoa). (#12512)
- add test for previous commit (#12510)
- Support Python 3.12 (#12094)
- Fix some typos (#12485)
- Deprecate `parse_int` in favor of `to_integer` (#12464)
- update rustc (#12468)
- rename the `DataType` in the polars-arrow crate to `ArrowDataType` for clarity, preventing conflation with our own/native `DataType` (#12459)
- Replace outdated dev dependency `tempdir` (#12462)
- move cov/corr to polars-ops (#12411)
- use unwrap_or_else and get_unchecked_release in rolling kernels (#12405)
- dprint/markdown link checker minor updates (#12409)
- replace as_u64 with dirty_hash (#12327)
- Fix ruff linting invocation (#12350)
- Rename `write_csv` parameter `has_header` to `include_header` (#12351)
- Build and verify Rust examples in docs (#12334)
- Fix some feature flags (#12325)
- Organize Cargo.toml (#12323)
- remove fxhash (#12322)
- Run rustfmt on doc examples (#12319)
- Consolidate "getting started" and "user guide" sections (#12246)
- deprecate `_saturating` in duration string language, make it the default (#12301)
- simplify expr checking in predicate push down (#12287)
- Replace dev dependency `avro-rs` with `apache-avro` (#12295)
- Run `clippy` on all targets (#12293)
- Add top-level `make clippy`, simplify Rust linting workflows (#12290)
- ensure we git-ignore ALL `.venv` dirs (#12289)
- incorrect super type for literals in nested binary exprs (#12238)
- remove unwrap from group_by (#12263)
- update object_store (#12006) (#12273)
- Remove recommended setting from IDE docs (#12275)
- Add feature flag for `list.eval` (#12254)
- factor out some shared code in `truncate_impl` (#12229)
- update Cargo.lock (#12226)
- Make all functions in string namespace non-anonymous (#12215)
- Rename `dt.seconds` to `dt.total_seconds` (likewise for days, hours, minutes, milliseconds, microseconds, and nanoseconds) (#12179)
- use enum for Ambiguous (#12193)
- Standardize project name formatting across docs (#12185)
- Update `sqlparser` to `0.39` (#12173)
- pin ring (#12176)
- Refactor `FunctionExpr` module (#12162)
- Fix tests for pyarrow 14 (#12170)
- Fix triggers for docs deployment (#12159)
- Make all functions in binary namespace non-anonymous (#12126)
- Consolidate contributing info (#12109)
- Fix typo in user-guide/expressions/plugins.md (#12115)
- Update CODEOWNERS (#12107)
- visualize plugin directory layout in user guide (#12092)
- Minor improvements to the docs website (#12084)
- reshape and repeat_by non-anonymous (#12064)
- upgrade zstd to 0.13 in `polars-parquet` (#12062)
- Direct CONTRIBUTING to the docs website (#12042)
- inline parquet2 (#12026)
- remove parquet logic from `polars-arrow` and consolidate logic in `polars-parquet` crate. (#12022)
- move abs to ops (#12005)
- Rename `ljust`/`rjust` to `pad_end`/`pad_start` (#11975)
- Disable type checking for `dataframe_api_compat` dependency (#11997)
Thank you to all our contributors for making this release possible!
@JulianCologne, @MarcoGorelli, @Priyansh121096, @abstractqqq, @alexander-beedie, @braaannigan, @brayanjuls, @c-peters, @cmdlineluser, @daviskirk, @dependabot, @dependabot[bot], @dgilman, @hirohira9119, @ion-elgreco, @jerome3o, @jrycw, @mcrumiller, @messense, @moritzwilksch, @nameexhaustion, @orlp, @owrior, @rancomp, @reswqa, @ritchie46, @rob-sil, @stefmolin, @stinodego, @uchiiii, @universalmind303 and @wsyxbcl