🏆 Highlights
- new implementation for `String/Binary` type. (#13748)
💥 Breaking changes
- Remove `DatetimeChunked::convert_time_zone` (#14046)
- Rename `LiteralValue::to_anyvalue` to `LiteralValue::to_any_value` (#14033)
- Rename `drop_columns` to `drop` (#13754)
- Rename `pl.count()` to `pl.len()` (#13719)
- Rename `row_count_name`/`row_count_offset` parameters in IO functions to `row_index_*` (#13563)
- Rename `with_row_count` to `with_row_index` (#13494)
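For codebases affected by the renames above, a mechanical search-and-replace covers most call sites. The following is a minimal Python sketch, not an official migration tool: the `migrate` helper and the `RENAMES` table are hypothetical names, and expanding `row_index_*` to `row_index_name`/`row_index_offset` is an assumption based on the old parameter names. It does not handle the removal of `DatetimeChunked::convert_time_zone`, which needs a manual change.

```python
import re

# Old identifier -> new identifier, taken from the renames listed above.
# The two row_index_* entries are an assumed expansion of "row_index_*".
RENAMES = {
    "to_anyvalue": "to_any_value",
    "drop_columns": "drop",
    "with_row_count": "with_row_index",
    "row_count_name": "row_index_name",
    "row_count_offset": "row_index_offset",
}

# Match whole identifiers only, so e.g. `my_drop_columns_flag` is left alone.
_IDENT = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")

# Only rewrite `pl.count()` with empty parentheses; calls that pass
# arguments are left untouched.
_PL_COUNT = re.compile(r"\bpl\.count\(\s*\)")


def migrate(source: str) -> str:
    """Return `source` with the renamed identifiers replaced."""
    source = _IDENT.sub(lambda m: RENAMES[m.group(1)], source)
    return _PL_COUNT.sub("pl.len()", source)


if __name__ == "__main__":
    print(migrate("df.with_row_count().select(pl.count())"))
```

Review the resulting diff by hand, since a purely textual rename cannot tell a Polars call from an unrelated identifier with the same name.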
🚀 Performance improvements
- prune parquet row groups when `is_not_null` is used (#14260)
- use is_between to skip parquet row groups (#14244)
- Use a compression API that is designed for this use case (#11699) (#14194)
- Use `UnitVec` in polars-plan traversal (#14199)
- use `UnitVec` in streaming joins (#14197)
- improve `ChunkId` (#14175)
- improve iteration performance (#14126)
- elide unneeded work in window? (#14108)
- run window functions more in parallel (#14095)
- improve skip row group using statistics condition (#14056)
- improve string/binary reverse performance (#14016)
- optimize `DataFrame.describe` by presorting columns (#13822)
- elide redundant bound checks. (#13909)
- speedup boolean filter (#13905)
- speedup binview filter (#13902)
- improve binview filter (#13878)
- apply string view GC more conservatively (#13850)
- add optimized BinaryViewArray comparison kernels (#13839)
- lazy cache binview bytes len (#13830)
- fast-path for eager int_range (#13811)
- Optimize `arr.sum` for inner non-null bool (#13800)
- directly embed data ptr in Buffer (#13744)
- elide parallelism restriction on generic rolling expressions (#13662)
- ensure time groups are parallelized (#13660)
- do not eagerly compute bitcount (#13562)
- optimise SQL engine string concat (#13499)
- remove lifetime requirement from CategoricalChunkedBuilder (#13319)
✨ Enhancements
- add `u8`/`i8`/`u16`/`i16` parsers to CSV reader (#14241)
- Implements `list.gather_every` (#14253)
- Implements `prefix/suffix_fields` (#14251)
- Polish decimal arithmetic (#14172)
- Introduce `arr.to_struct` (#14202)
- Supports map fields name of struct (#14203)
- make `IdxVec` generic as `UnitVec` (#14196)
- add new arithmetic kernels (#14026)
- Supports `unique` and `hash_rows` for `null` column (#14111)
- Implement arithmetic operations for `Null` columns (#14107)
- Add strict/non-strict construction of Boolean/Binary series (#14073)
- Improve `Series::from_any_values` logic (#14052)
- Adapt extend_constant to function expr architecture and expressify it (#14058)
- add integer negation (#14049)
- `list` & `array` measures of dispersion (#13245)
- gc binview when writing ipc (#14035)
- When calling `convert_time_zone` on time-zone-naive datetime, convert as if converting from UTC (#13960)
- DataFrame supports explode by array column (#13958)
- improve binary formatting (#13981)
- preserve Enum information when going to IPC (#13943)
- support kwargs in plugin 'field' functions and raise error on unsupported binview layout (#13944)
- support cast decimal to utf8 (#13829)
- add SQL support for `timestamp` precision modifier (#13936)
- support negative indexing and expressions for `LEFT`, `RIGHT` and `SUBSTR` SQL string funcs (#13888)
- Introduce `explode` for `ArrayNameSpace` (#13923)
- raise better error message for .dt.time on Date column (#13932)
- List set_operations supports float (#13920)
- Add `ignore_nulls` for `arr.join` (#13919)
- register 'set_sorted' as batch/elementwise (#13896)
- move Enum/Categorical categories to binview (#13882)
- Add `ignore_nulls` for `list.join` (#13701)
- Add `ignore_nulls` for `pl.concat_str` (#13877)
- fix parquet for binview (#13873)
- support mmap for binview in OOC (#13872)
- implement ffi for `binview` (#13871)
- Support zero fill null strategy for binary and string columns (#13869)
- Implement/fix unary minus operator `-pl.col(...)` (#13776)
- extend SQL `EXTRACT` with "century", "millennium", and "timezone" parts (#13634)
- fix binview ipc format (#13842)
- add SQL support for `numeric` and/or `decimal` types (#13739)
- improve panic message (#13836)
- Expressify `str.zfill` (#13790)
- new implementation for `String/Binary` type. (#13748)
- Add `nulls_last` for `Series.sort` (#13794)
- Impl `count_matches` for array namespace (#13675)
- Add `nulls_last` for `list/array.sort` (#13795)
- Rename `drop_columns` to `drop` (#13754)
- convert fixed-offset timezones to respective Etc timezone from time zone database (#13738)
- Expressify `str.slice` (#13747)
- implement binview for polars-row (#13736)
- implement binview for polars-json (#13737)
- add architecture for polars-flavored IPC (#13734)
- implement binview comparison kernels (#13715)
- raise default frame/series repr height from 8 to 10 (#13699)
- write parquet ColumnOrder (#13672)
- Impl `contains` for ArrayNameSpace (#13638)
- improve `rolling()` expression formatting (#13657)
- Implement `is_between` in Rust (#11945)
- Expressify `pattern` of `str.extract` (#13607)
- Impl `join` for ArrayNameSpace (#13586)
- add SQL engine support for string cast to `json` (#13624)
- add SQL engine support for `EXTRACT` and `DATE_PART` (#13603)
- add `BinaryView` to `parquet` writer/reader. (#13489)
- add SQL engine support for `POSITION` and `STRPOS` (#13585)
- `is_in` support for array dtype (#13559)
- add new `str.find` expression, returning the index of a regex pattern or literal substring (#13561)
- add SQL engine support for `LIKE` and `ILIKE` pattern matching (#13522)
- improve hive partition pruning (#13358) (#13426)
- don't rechunk by default in lazy scans (#13518)
- Add `cum_count` expression function (#13478)
- add SQL engine support for `IF` control flow function (#13491)
- add SQL engine support for `MOD` function (#13502)
- return datetime for datetime mean & median (#13417)
- add SQL engine support for `CONCAT_WS` string function (#13483)
- `BinaryView`/`Utf8View` IPC support (#13464)
- Implement wasm Pool::scope (#13476)
- add SQL engine support for `RIGHT` and `REVERSE` string functions (#13461)
- implement `BinaryView` and `Utf8View` in `polars-arrow` (#13243)
- add SQL engine support for variadic string `CONCAT` function (#13428)
- add support for AND in SQL join-clause context (#13242)
- Impl ordering ops for array namespace (#13414)
- add SQL engine support for `REPLACE` string function (#13431)
- add SQL engine support for `SIGN` function (#13429)
- add SQL engine support for `IFNULL` function (#13432)
- additional SQL support for `bytes`, `bit`, and `hex` literals (#13389)
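The `convert_time_zone` change for time-zone-naive datetimes (#13960) can be illustrated outside Polars. The sketch below uses only Python's standard `datetime` module; `convert_as_if_utc` is a hypothetical helper that mirrors the rule as described in the entry, not the Polars implementation.

```python
from datetime import datetime, timedelta, timezone


def convert_as_if_utc(dt: datetime, target: timezone) -> datetime:
    """Convert `dt` to `target`, treating naive input as UTC."""
    if dt.tzinfo is None:
        # Time-zone-naive input: interpret the wall-clock value as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(target)


naive = datetime(2024, 1, 1, 12, 0)
plus_one = timezone(timedelta(hours=1))

# Noon (assumed UTC) becomes 13:00 in a fixed +01:00 offset.
print(convert_as_if_utc(naive, plus_one))  # 2024-01-01 13:00:00+01:00
```

A fixed offset is used here instead of a named zone so the example does not depend on a time zone database being installed.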
🐞 Bug fixes
- deduplicate recursive growables (#14264)
- Fix `glimpse` overload signature (#14258)
- allow set operations on list of categoricals (#14110)
- `any/all_horizontal` with single input has incorrect type (#14256)
- load numpy array with np array values #14237 (#14238)
- Fix join validation for String types (#14229)
- make csv parser more robust to edge cases (#14210)
- Fix for `set_operations` of binary dtype (#14152)
- fix read_csv date/datetime inference and parsing (#14113)
- don't see files as hive partitions (#14128)
- allow eval on list of categoricals (#14132)
- add missing conditional compile flag for `StringFunction::Find` (#14129)
- Forbid casting from `Date` to `Time` and vice versa (#14127)
- preserve old naming convention for multi-value pivot (this will change in 1.0 to no longer redundantly have the column name in the middle) (#14120)
- Implements `gt/lt` cmp for null dtype (#14119)
- ignore comments at beginning of csv if schema provided (#14115)
- fix pivot when multiple columns are passed. Output is now aligned with what tidyverse / pandas.pivot_table would do (#14048)
- some temporal conversion errors for datetimes earlier than `1970-01-01` (#14050)
- Preserve name when casting from categorical (#14085)
- fix cse bug when window function is nested (#14070)
- Fix `melt` panic when there are no value vars (#14057)
- `json_encode` should respect the logical type (#14063)
- improve skip row group using statistics condition (#14056)
- Raise for .dt.epoch and .dt.timestamp for Duration dtype (#13962)
- handle `SliceSink` with empty data (#14025)
- correct field type schema inference (using read_csv) (#14042)
- Map `AnyValue::Null` to datatype `Null` (#14045)
- Use int formatter for unsigned ints (#14043)
- quick fix for multiple chunks binary reverse (#14024)
- count matches on list categorical (#14021)
- `list.min/max` with empty and/or None elements (#14018)
- allow get access to list of categoricals (#14015)
- Fix casting from categorical to numeric (#13957)
- read_csv preserve whitespace and newlines (#13934)
- append decimal with different scale (#13977)
- Allow casting integer types to Enum (#13955)
- `arg_min/max` on categoricals should respect ordering (#13998)
- serialize decimal type (#13997)
- check input type for `arr/list.contains` (#13959)
- Allow dtype merge when inner dtype is enum (#13938)
- recurse less in streaming shared sinks (#13930)
- ensure order is preserved if streaming from different sources (#13922)
- Fix `is_not_null` for Struct columns (#13921)
- make 100 * pl.col(pl.Boolean).mean() work (#13725)
- allow extract of numeric from str AnyValue (#13865)
- single-element .dt.time() and .dt.date() should always preserve sortedness (#13808)
- prune empty chunks before set operations (#13898)
- treat null columns as zero in `sum_horizontal` (#13880)
- include null count in rolling window validity with `min_periods` (#13863)
- don't return NaN as free memory fraction (#13860)
- parquet hybrid RLE encoding did not always align to bit width (#13883)
- Add `ignore_nulls` for `list.join` (#13701)
- .dt.time() was panicking for datetimes prior to unix epoch (#13812)
- Correct err message of `check_map_output_len` (#13854)
- allow list creation of decimals (#13851)
- Implement `abs` for Decimal, error on Date/Time/Datetime (#13821)
- decompress the right number of rows when reading compressed CSVs (#13721)
- rolling nested groups deadlock (#13835)
- `gather_every` should work on agg context (#13810)
- When reading Parquet or Arrow, convert +00:00 timezone to UTC (#13816)
- Fix segfault of `is_in` (#13814)
- don't panic on full null qcut (#13815)
- do not read data for zero-length compressed buffer (#13791)
- Fix the non-null test of `transpose` (#13783)
- Raise error instead of panic when joining on wildcard/nth (#13742)
- `str.concat` correctly ignore single null value (#13751)
- Selectors `by_name` and `by_dtype` should allow empty list as input (#11024)
- Use `NonZeroUsize` for `batch_size` parameter in `write_csv/sink_csv/scan_ndjson` (#13726)
- error instead of panicking in sql if empty function (#13691)
- gather.get schema (#13679)
- ensure we hit proper cache in nested `rolling` expressions (#13666)
- Allow `av_buffer` cast numeric record to temporal type (#13661)
- streaming cross join if swapped is hit (#13656)
- Make sure rolling key is projected when process projection (#13622)
- fix schema inference for json (#13637)
- Empty series of AggregatedList should also have list dtype (#13620)
- fallback to cast kernel if `inline_cast` AnyValue raise (#13595)
- `LazyFrame::join()` no longer ignores 3 `JoinArgs` parameters (#13570)
- fix reverse variable row decoding (#13587)
- Fix `scatter` for null values (#13578)
- Fix `cum_count` with regards to start value / null values (#13535)
- Fix precision/scale handling and invalid numbers in string-to-decimal conversions. (#13548)
- Treat Python `None` as null value for `Object` dtype (#13564)
- `Expr.replace` to single value did not replace NULLs (#13551)
- `AnyValue::StructOwned` panic when hashing (#13553)
- improve hive partition pruning (#13358) (#13426)
- fix projection pushdown for new outer join schema (#13527)
- ensure size-hint of TrueIdxIter is correct (#13508)
- correct 'outer_coalesce' logic in case of duplicate names (#13501)
- raise for out-of-range datetimes in to_datetime/strptime (#13403)
- Keep logical type when getting values from list (#13456)
- Handle duplicate/ambiguous inputs for `replace` (#13217)
- skip null/empty values if replace_lit_n_char (#13400)
- fix is_in operator when comparing string with global categoricals (#13412)
- use different generics for `shift_and_fill` parameters (#13379)
📖 Documentation
- fix code block in user-guide/lazy/schemas (#14228)
- Fix typo in contributing guide (#14181)
- Small improvements Ecosystem page (#14176)
- fix code blocks in user-guide/concepts/data-structures (#14146)
- Fix bullet point formatting in CI contributing guide (#14117)
- Remove outdated reference to horizontal concat feature (#14105)
- Replace alternatives page with more objective comparison (#13784)
- Improve structure of user guide (#13951)
- Improve structure of user guide (#13639)
- Introduce ecosystem page in user guide (#13903)
- Mention deltalake write support in README (#13890)
- Fix typo in deprecation message of `with_row_count` (#13793)
- Fix incorrect "coming from pandas" syntax (#13767)
- Improve streaming section of the user guide (#13750)
- fix linking to feature flags in user guide (#13644)
- Improve documentation on broadcasting (#13394)
- Add note about toolchain issue under native Windows (#13590)
- update SQL section of the README (#13529)
- update polars-business > polars-xdt link (#13509)
📦 Build system
🛠️ Other improvements
- make gather_chunked completely generic (#14195)
- Add `.cargo` directory to .gitignore (#14191)
- `take_chunked` to polars-ops (#14185)
- Enable `clippy` lint to warn on debug macros (#14178)
- Run `cargo update` (#14160)
- merge take kernels (#14137)
- improve From<Ca> -> Vec (#14123)
- hoist boolean -> string cast (#14122)
- Remove `DatetimeChunked::convert_time_zone` (#14046)
- More generic way to present an expression tree diagram (#14020)
- Rename `LiteralValue::to_anyvalue` to `LiteralValue::to_any_value` (#14033)
- make Enums an actual datatype (#14011)
- update rustc (#13947)
- move `filter` to `polars-compute` (#13897)
- bump object_store to 0.9 (#13857)
- Make functions in `expr/general` non-anonymous (#13832)
- Fix doctests (#13831)
- Refactor Python release workflow (#13807)
- Make `pl.duration` non-anonymous (#13762)
- Rename `pl.count()` to `pl.len()` (#13719)
- Deprecate `dt.with_time_unit` in favor of `cast(pl.Int64).cast(pl.Datetime(time_unit, time_zone))` (#13667)
- Auto-add 'needs triage' label to bugs (#13671)
- make rolling index column visible to optimizer (#13658)
- Rename `lazy-regex` feature to `regex` to align `polars` with `polars-lazy` crate (#13647)
- Add `Documentation`/`Build system` sections to the changelog (#13594)
- Filter unhelpful messages in `make build` (#13579)
- Remove extra line break between checkboxes in GitHub bug report issues (#13576)
- Rename `row_count_name`/`row_count_offset` parameters in IO functions to `row_index_*` (#13563)
- Rename `with_row_count` to `with_row_index` (#13494)
- simplify parquet binary ordering function (#13488)
- don't panic if ambiguous is of wrong type (#13388)
Thank you to all our contributors for making this release possible!
@29antonioac, @Bromeon, @ByteNybbler, @JulianCologne, @MarcNuebel, @MarcoGorelli, @NedJWestern, @ShivMunagala, @Vincenthays, @Wainberg, @aaarrti, @alexander-beedie, @apcamargo, @bchalk101, @braaannigan, @c-peters, @cgevans, @cmdlineluser, @collinprince, @deanm0000, @dependabot, @dependabot[bot], @dpinol, @edavisau, @eitsupi, @flisky, @grinya007, @hamishs, @henryharbeck, @ion-elgreco, @itamarst, @jacksonthall22, @jcrozum, @kstoneriv3, @langestefan, @lukemanley, @mcrumiller, @mkucijan, @nameexhaustion, @orlp, @petrosbar, @r-brink, @reswqa, @ritchie46, @s-banach, @shritesh, @stinodego, @taki-mekhalfa, @thomasaarholt, @tim-stephenson, @universalmind303, @valorien and @wjandrea