pola-rs/polars rs-0.37.0 (Rust Polars 0.37.0)

🏆 Highlights

new implementation for String/Binary type. (#13748)

💥 Breaking changes

Remove DatetimeChunked::convert_time_zone (#14046)
Rename LiteralValue::to_anyvalue to LiteralValue::to_any_value (#14033)
Rename drop_columns to drop (#13754)
Rename pl.count() to pl.len() (#13719)
Rename row_count_name/row_count_offset parameters in IO functions to row_index_* (#13563)
Rename with_row_count to with_row_index (#13494)

🚀 Performance improvements

prune parquet row groups when is_not_null is used (#14260)
use is_between to skip parquet row groups (#14244)
Use a compression API that is designed for this use case (#11699) (#14194)
Use UnitVec in polars-plan traversal (#14199)
use UnitVec in streaming joins (#14197)
improve ChunkId (#14175)
improve iteration performance (#14126)
elide unneeded work in window? (#14108)
run window functions more in parallel (#14095)
improve skip row group using statistics condition (#14056)
improve string/binary reverse performance (#14016)
optimize DataFrame.describe by presorting columns (#13822)
elide redundant bound checks. (#13909)
speedup boolean filter (#13905)
speedup binview filter (#13902)
improve binview filter (#13878)
apply string view GC more conservatively (#13850)
add optimized BinaryViewArray comparison kernels (#13839)
lazy cache binview bytes len (#13830)
fast-path for eager int_range (#13811)
Optimize arr.sum for inner non-null bool (#13800)
directly embed data ptr in Buffer (#13744)
elide parallelism restriction on generic rolling expressions (#13662)
ensure time groups are parallelized (#13660)
do not eagerly compute bitcount (#13562)
optimise SQL engine string concat (#13499)
remove lifetime requirement from CategoricalChunkedBuilder (#13319)
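
Several entries above (#14260, #14244, #14056) concern skipping Parquet row groups based on column statistics. The following is a minimal, library-free sketch of the general idea, not the Polars implementation: the `RowGroupStats` class and both helper functions are hypothetical names introduced for illustration, and the range check assumes a closed interval.

```python
from dataclasses import dataclass


@dataclass
class RowGroupStats:
    """Hypothetical per-row-group statistics for a single column."""
    min_value: int
    max_value: int
    null_count: int
    num_rows: int


def may_match_is_between(stats: RowGroupStats, lower: int, upper: int) -> bool:
    """Return True if the row group could contain a value in [lower, upper]."""
    # A row group that holds only nulls can never satisfy a range predicate.
    if stats.null_count == stats.num_rows:
        return False
    # The group's [min, max] range must overlap the requested interval.
    return stats.max_value >= lower and stats.min_value <= upper


def may_match_is_not_null(stats: RowGroupStats) -> bool:
    """Return True if the row group contains at least one non-null value."""
    return stats.null_count < stats.num_rows


groups = [
    RowGroupStats(min_value=0, max_value=9, null_count=0, num_rows=10),
    RowGroupStats(min_value=10, max_value=19, null_count=0, num_rows=10),
    # All rows are null, so min/max carry no information here.
    RowGroupStats(min_value=0, max_value=0, null_count=10, num_rows=10),
]

# Indices of row groups that still need to be read for is_between(5, 12).
kept_between = [i for i, g in enumerate(groups) if may_match_is_between(g, 5, 12)]

# Indices of row groups that still need to be read for is_not_null().
kept_not_null = [i for i, g in enumerate(groups) if may_match_is_not_null(g)]
```

A statistics check of this kind can only rule row groups out. Any group that passes still has to be decoded and filtered row by row.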

✨ Enhancements

add u8/i8/u16/i16 parsers to CSV reader (#14241)
Implements list.gather_every (#14253)
Implements prefix/suffix_fields (#14251)
Polish decimal arithmetic (#14172)
Introduce arr.to_struct (#14202)
Support mapping the field names of a struct (#14203)
make IdxVec generic as UnitVec (#14196)
add new arithmetic kernels (#14026)
Supports unique and hash_rows for null column (#14111)
Implement arithmetic operations for Null columns (#14107)
Add strict/non-strict construction of Boolean/Binary series (#14073)
Improve Series::from_any_values logic (#14052)
Adapt extend_constant to function expr architecture and expressify it (#14058)
add integer negation (#14049)
list & array measures of dispersion (#13245)
gc binview when writing ipc (#14035)
When calling convert_time_zone on time-zone-naive datetime, convert as if converting from UTC (#13960)
DataFrame supports explode by array column (#13958)
improve binary formatting (#13981)
preserve Enum information when going to IPC (#13943)
support kwargs in plugin 'field' functions and raise error on unsupported binview layout (#13944)
support cast decimal to utf8 (#13829)
add SQL support for timestamp precision modifier (#13936)
support negative indexing and expressions for LEFT, RIGHT and SUBSTR SQL string funcs (#13888)
Introduce explode for ArrayNameSpace (#13923)
raise better error message for .dt.time on Date column (#13932)
List set_operations supports float (#13920)
Add ignore_nulls for arr.join (#13919)
register 'set_sorted' as batch/elementwise (#13896)
move Enum/Categorical categories to binview (#13882)
Add ignore_nulls for list.join (#13701)
Add ignore_nulls for pl.concat_str (#13877)
fix parquet for binview (#13873)
support mmap for binview in OOC (#13872)
implement ffi for binview (#13871)
Support zero fill null strategy for binary and string columns (#13869)
Implement/fix unary minus operator -pl.col(...) (#13776)
extend SQL EXTRACT with "century", "millennium", and "timezone" parts (#13634)
fix binview ipc format (#13842)
add SQL support for numeric and/or decimal types (#13739)
improve panic message (#13836)
Expressify str.zfill (#13790)
new implementation for String/Binary type. (#13748)
Add nulls_last for Series.sort (#13794)
Impl count_matches for array namespace (#13675)
Add nulls_last for list/array.sort (#13795)
Rename drop_columns to drop (#13754)
convert fixed-offset timezones to respective Etc timezone from time zone database (#13738)
Expressify str.slice (#13747)
implement binview for polars-row (#13736)
implement binview for polars-json (#13737)
add architecture for polars-flavored IPC (#13734)
implement binview comparison kernels (#13715)
raise default frame/series repr height from 8 to 10 (#13699)
write parquet ColumnOrder (#13672)
Impl contains for ArrayNameSpace (#13638)
improve rolling() expression formatting (#13657)
Implement is_between in Rust (#11945)
Expressify pattern of str.extract (#13607)
Impl join for ArrayNameSpace (#13586)
add SQL engine support for string cast to json (#13624)
add SQL engine support for EXTRACT and DATE_PART (#13603)
add BinaryView to parquet writer/reader. (#13489)
add SQL engine support for POSITION and STRPOS (#13585)
is_in support for array dtype (#13559)
add new str.find expression, returning the index of a regex pattern or literal substring (#13561)
add SQL engine support for LIKE and ILIKE pattern matching (#13522)
improve hive partition pruning (#13358) (#13426)
don't rechunk by default in lazy scans (#13518)
Add cum_count expression function (#13478)
add SQL engine support for IF control flow function (#13491)
add SQL engine support for MOD function (#13502)
return datetime for datetime mean & median (#13417)
add SQL engine support for CONCAT_WS string function (#13483)
BinaryView/Utf8View IPC support (#13464)
Implement wasm Pool::scope (#13476)
add SQL engine support for RIGHT and REVERSE string functions (#13461)
implement BinaryView and Utf8View in polars-arrow (#13243)
add SQL engine support for variadic string CONCAT function (#13428)
add support for AND in SQL join-clause context (#13242)
Impl ordering ops for array namespace (#13414)
add SQL engine support for REPLACE string function (#13431)
add SQL engine support for SIGN function (#13429)
add SQL engine support for IFNULL function (#13432)
additional SQL support for bytes, bit, and hex literals (#13389)
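
The `convert_time_zone` entry above (#13960) states that a time-zone-naive datetime is converted as if it were in UTC. The semantics can be illustrated with the Python standard library alone. This is a sketch of the described behaviour, not Polars code: `convert_as_if_utc` is a hypothetical helper, and the fixed +01:00 offset is chosen only as an example target zone.

```python
from datetime import datetime, timedelta, timezone


def convert_as_if_utc(naive: datetime, target: timezone) -> datetime:
    """Interpret a naive datetime as UTC, then convert it to `target`."""
    if naive.tzinfo is not None:
        raise ValueError("expected a time-zone-naive datetime")
    # Attach UTC without shifting the wall-clock time, then convert.
    return naive.replace(tzinfo=timezone.utc).astimezone(target)


# Example target zone: a fixed offset of +01:00 (assumed for illustration).
plus_one = timezone(timedelta(hours=1))

naive_noon = datetime(2024, 1, 1, 12, 0)
converted = convert_as_if_utc(naive_noon, plus_one)
```

Here the wall-clock time moves from 12:00 to 13:00 while the underlying instant stays the same.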

🐞 Bug fixes

deduplicate recursive growables (#14264)
Fix glimpse overload signature (#14258)
allow set operations on list of categoricals (#14110)
any/all_horizontal with single input has incorrect type (#14256)
load numpy array with np array values #14237 (#14238)
Fix join validation for String types (#14229)
make csv parser more robust to edge cases (#14210)
Fix for set_operations of binary dtype (#14152)
fix read_csv date/datetime inference and parsing (#14113)
don't see files as hive partitions (#14128)
allow eval on list of categoricals (#14132)
add missing conditional compile flag for StringFunction::Find (#14129)
Forbid casting from Date to Time and vice versa (#14127)
preserve old naming convention for multi-value pivot (this will change in 1.0 to no longer redundantly have the column name in the middle) (#14120)
Implements gt/lt cmp for null dtype (#14119)
ignore comments at beginning of csv if schema provided (#14115)
fix pivot when multiple columns are passed. Output is now aligned with what tidyverse / pandas.pivot_table would do (#14048)
some temporal conversion errors for datetimes earlier than 1970-01-01 (#14050)
Preserve name when casting from categorical (#14085)
fix cse bug when window function is nested (#14070)
Fix melt panic when there are no value vars (#14057)
json_encode should respect the logical type (#14063)
improve skip row group using statistics condition (#14056)
Raise for .dt.epoch and .dt.timestamp for Duration dtype (#13962)
handle SliceSink with empty data (#14025)
correct field type schema inference (using read_csv) (#14042)
Map AnyValue::Null to datatype Null (#14045)
Use int formatter for unsigned ints (#14043)
quick fix for multiple chunks binary reverse (#14024)
count matches on list categorical (#14021)
list.min/max with empty and/or None elements (#14018)
allow get access to list of categoricals (#14015)
Fix casting from categorical to numeric (#13957)
read_csv preserve whitespace and newlines (#13934)
append decimal with different scale (#13977)
Allow casting integer types to Enum (#13955)
arg_min/max on categoricals should respect ordering (#13998)
serialize decimal type (#13997)
check input type for arr/list.contains (#13959)
Allow dtype merge when inner dtype is enum (#13938)
recurse less in streaming shared sinks (#13930)
ensure order is preserved if streaming from different sources (#13922)
Fix is_not_null for Struct columns (#13921)
make 100 * pl.col(pl.Boolean).mean() work (#13725)
allow extract of numeric from str AnyValue (#13865)
single-element .dt.time() and .dt.date() should always preserve sortedness (#13808)
prune empty chunks before set operations (#13898)
treat null columns as zero in sum_horizontal (#13880)
include null count in rolling window validity with min_periods (#13863)
don't return NaN as free memory fraction (#13860)
parquet hybrid RLE encoding did not always align to bit width (#13883)
Add ignore_nulls for list.join (#13701)
.dt.time() was panicking for datetimes prior to unix epoch (#13812)
Correct err message of check_map_output_len (#13854)
allow list creation of decimals (#13851)
Implement abs for Decimal, error on Date/Time/Datetime (#13821)
decompress the right number of rows when reading compressed CSVs (#13721)
rolling nested groups deadlock (#13835)
gather_every should work on agg context (#13810)
When reading Parquet or Arrow, convert +00:00 timezone to UTC (#13816)
Fix segfault of is_in (#13814)
don't panic on full null qcut (#13815)
do not read data for zero-length compressed buffer (#13791)
Fix the non-null test of transpose (#13783)
Raise error instead of panic when joining on wildcard/nth (#13742)
str.concat correctly ignores a single null value (#13751)
Selectors by_name and by_dtype should allow empty list as input (#11024)
Use NonZeroUsize for batch_size parameter in write_csv/sink_csv/scan_ndjson (#13726)
error instead of panicking in sql if empty function (#13691)
gather.get schema (#13679)
ensure we hit proper cache in nested rolling expressions (#13666)
Allow av_buffer cast numeric record to temporal type (#13661)
streaming cross join if swapped is hit (#13656)
Make sure rolling key is projected when processing projection (#13622)
fix schema inference for json (#13637)
Empty series of AggregatedList should also have list dtype (#13620)
fallback to cast kernel if inline_cast AnyValue raise (#13595)
LazyFrame::join() no longer ignores 3 JoinArgs parameters (#13570)
fix reverse variable row decoding (#13587)
Fix scatter for null values (#13578)
Fix cum_count with regards to start value / null values (#13535)
Fix precision/scale handling and invalid numbers in string-to-decimal conversions. (#13548)
Treat Python None as null value for Object dtype (#13564)
Expr.replace to single value did not replace NULLs (#13551)
AnyValue::StructOwned panic when hashing (#13553)
improve hive partition pruning (#13358) (#13426)
fix projection pushdown for new outer join schema (#13527)
ensure size-hint of TrueIdxIter is correct (#13508)
correct 'outer_coalesce' logic in case of duplicate names (#13501)
raise for out-of-range datetimes in to_datetime/strptime (#13403)
Keep logical type when getting values from list (#13456)
Handle duplicate/ambiguous inputs for replace (#13217)
skip null/empty values if replace_lit_n_char (#13400)
fix is_in operator when comparing string with global categoricals (#13412)
use different generics for shift_and_fill parameters (#13379)

📖 Documentation

fix code block in user-guide/lazy/schemas (#14228)
Fix typo in contributing guide (#14181)
Small improvements Ecosystem page (#14176)
fix code blocks in user-guide/concepts/data-structures (#14146)
Fix bullet point formatting in CI contributing guide (#14117)
Remove outdated reference to horizontal concat feature (#14105)
Replace alternatives page with more objective comparison (#13784)
Improve structure of user guide (#13951)
Improve structure of user guide (#13639)
Introduce ecosystem page in user guide (#13903)
Mention deltalake write support in README (#13890)
Fix typo in deprecation message of with_row_count (#13793)
Fix incorrect "coming from pandas" syntax (#13767)
Improve streaming section of the user guide (#13750)
fix linking to feature flags in user guide (#13644)
Improve documentation on broadcasting (#13394)
Add note about toolchain issue under native Windows (#13590)
update SQL section of the README (#13529)
update polars-business > polars-xdt link (#13509)

📦 Build system

Enable feature nightly with optional sql feature (#14222)
remove horizontal_concat feature (#13390)

🛠️ Other improvements

make gather_chunked completely generic (#14195)
Add .cargo directory to .gitignore (#14191)
take_chunked to polars-ops (#14185)
Enable clippy lint to warn on debug macros (#14178)
Run cargo update (#14160)
merge take kernels (#14137)
improve From<Ca> -> Vec (#14123)
hoist boolean -> string cast (#14122)
Remove DatetimeChunked::convert_time_zone (#14046)
More generic way to present an expression tree diagram (#14020)
Rename LiteralValue::to_anyvalue to LiteralValue::to_any_value (#14033)
make Enums an actual datatype (#14011)
update rustc (#13947)
move filter to polars-compute (#13897)
bump object_store to 0.9 (#13857)
Make functions in expr/general non-anonymous (#13832)
Fix doctests (#13831)
Refactor Python release workflow (#13807)
Make pl.duration non-anonymous (#13762)
Rename pl.count() to pl.len() (#13719)
Deprecate dt.with_time_unit in favor of cast(pl.Int64).cast(pl.Datetime(time_unit, time_zone)) (#13667)
Auto-add 'needs triage' label to bugs (#13671)
make rolling index column visible to optimizer (#13658)
Rename lazy-regex feature to regex to align polars with polars-lazy crate (#13647)
Add Documentation / Build system sections to the changelog (#13594)
Filter unhelpful messages in make build (#13579)
Remove extra line break between checkboxes in GitHub bug report issues (#13576)
Rename row_count_name/row_count_offset parameters in IO functions to row_index_* (#13563)
Rename with_row_count to with_row_index (#13494)
simplify parquet binary ordering function (#13488)
don't panic if ambiguous is of wrong type (#13388)

Thank you to all our contributors for making this release possible!
@29antonioac, @Bromeon, @ByteNybbler, @JulianCologne, @MarcNuebel, @MarcoGorelli, @NedJWestern, @ShivMunagala, @Vincenthays, @Wainberg, @aaarrti, @alexander-beedie, @apcamargo, @bchalk101, @braaannigan, @c-peters, @cgevans, @cmdlineluser, @collinprince, @deanm0000, @dependabot, @dependabot[bot], @dpinol, @edavisau, @eitsupi, @flisky, @grinya007, @hamishs, @henryharbeck, @ion-elgreco, @itamarst, @jacksonthall22, @jcrozum, @kstoneriv3, @langestefan, @lukemanley, @mcrumiller, @mkucijan, @nameexhaustion, @orlp, @petrosbar, @r-brink, @reswqa, @ritchie46, @s-banach, @shritesh, @stinodego, @taki-mekhalfa, @thomasaarholt, @tim-stephenson, @universalmind303, @valorien and @wjandrea
