💥 Breaking changes
- Make bottom interval closed in `hist` (#22090)
- Change Partition API to `base_path` and `file_path` (#21888)
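As a rough illustration of the `hist` change, the pure-Python sketch below models the binning rule as we understand it: bins are right-closed, and the lowest bin now also includes its left edge. The helper `count_into_bins` is hypothetical and is not part of the Polars API; it only mimics the interval semantics and makes no claim about how Polars implements them.

```python
from bisect import bisect_left


def count_into_bins(values, edges):
    """Count values into bins given sorted bin edges.

    Assumed semantics (not taken from the Polars source): every bin is
    right-closed, (lo, hi], except the lowest bin, which is closed on
    both sides, [lo, hi]. Values outside the edges are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # outside all bins
        if v == edges[0]:
            counts[0] += 1  # the bottom edge belongs to the first bin
            continue
        # bisect_left finds the first index i with edges[i] >= v,
        # so v falls in the bin (edges[i - 1], edges[i]].
        i = bisect_left(edges, v)
        counts[i - 1] += 1
    return counts


counts = count_into_bins([1, 2, 2, 3, 5], [1, 2, 3, 4])
print(counts)
```

With a fully right-open bottom bin, the value `1` in this example would fall outside every bin; with the closed bottom interval it is counted in the first one.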
🚀 Performance improvements
- Add CSE to streaming groupby (#22196)
- Speed-up new streaming predicate filtering (#22179)
- Speedup new-streaming file row count (#22169)
- Fix quadratic behavior when casting Enums (#22008)
- Lower is_in to bitmap-output semi-join in new streaming engine (#21948)
- Fast path for empty inner join (#21965)
- Add native semi/anti join in new streaming engine (#21937)
- Cache regex compilation globally (#21929)
✨ Enhancements
- Add `SPLIT_PART` string function to the SQL interface (#22158)
- Allow scalar expr in `Expr.diff` (#22142)
- Support additional unsigned int aliases in the SQL interface (#22127)
- Add `STRING_TO_ARRAY` function to the SQL interface (#22129)
- Add dt.is_business_day (#21776)
- Add an `eager` parameter to `pl.cov` (#22098)
- Add support for `Int128` parsing/recognition to the SQL interface (#22104)
- Add an `eager` parameter to `pl.coalesce` (#22092)
- Add an `eager` parameter to `pl.corr` (#22097)
- Allow sinking to abstract python `io` and `fs` classes (#21987)
- Add `add_alp_optimize_exprs` to `IRBuilder` (#22061)
- Add `cat.slice` (#21971)
- Support growing schema if line length increases during csv schema inference (#21979)
- Replace thread unsafe `GilOnceCell` with `Mutex` (#21927)
- Support modified dsl in file cache (#21907)
🐞 Bug fixes
- Implode in agg (#22197)
- Reduce GIL hold time for IO plugins in new-streaming (#22186)
- Enhance predicate validation and cast safety in `join_where` (#22112)
- Handle Parquet with compressed empty DataPage v2 (#22172)
- Schema error during lowering (#22175)
- Rewrite unroll of overlapping groups to mitigate out of range index panic (#22072)
- Incorrect rounding for very large/small numbers (#22173)
- Allow set input to `list.set_*` operations (#22163)
- Deadlock in join due to rayon nested task-stealing (#22159)
- Mark `Expr.repeat_by` as elementwise (#22068)
- Fix csv serializer panic by supporting ScalarColumn in as_single_chunk (#22146)
- Raise an error if a number doesn't have associated unit in duration strings (#22035)
- Add `i128` as supertype to boolean (#22138)
- Fix panic when constructing DF from pyarrow due to duplicate field names (#22114)
- Add broadcasts and error messages for many elementwise operations (#22130)
- Throw error for `n=0` on `list.gather_every` (#22122)
- Throw error for unsupported rolling operations (#22121)
- Error on unequal length `str.to_integer` arguments (#22100)
- Make bottom interval closed in `hist` (#22090)
- Relative path resolution for plugin libraries (#21911)
- Avoiding panic with strptime for out-of-bounds dates (#21208)
- Join revmaps for categoricals in `merge_sorted` (#21976)
- Fix glob expansion matching extra files (#21991)
- Ensure SQL dot-notation for nested column fields resolves correctly (#22109)
- Parquet filter performance regression from multiscan dispatch (#22116)
- Panic for unequal length `ewm_mean_by` args (#22093)
- Add scalarity checks to `pl.repeat` (#22088)
- Type check `n` parameter of `pl.repeat` (#22071)
- Mark `bitwise_{count,leading,trailing}_{ones,zeros}` as elementwise (#22044)
- Mark `pl.*_ranges` functions correctly as element-wise (#22059)
- Correctly type check `pl.arctan2` (#22060)
- Mark `pl.business_day_count` as elementwise (#22055)
- Check input python type for `str.extract_groups` (#22032)
- Check types for `fill_char` in `str.pad_{start,end}` (#22036)
- Mark `str.to_decimal` properly as non-elementwise (#22040)
- Documented return type for `bin.encode` and `bin.decode` (#22022)
- Revert #22017 and improve block(_in_place)_on doc comment (#22031)
- Remove outdated depth warning (#22030)
- Expression pl.concat was incorrectly marked as elementwise (#22019)
- Use block_in_place_on to start streaming (#22017)
- Panic on empty aggregation in streaming (#22016)
- Error instead of panic for invalid durations in `dt.offset_by()` and `dt.round()` (#21982)
- Raise error instead of silently appending NULL in NDJSON parsing (#21953)
- Ensure AV is static before pushing to row buffer (#21967)
- Deadlock in new-streaming multiplexer (#21963)
- Release GIL in `collect_with_callback` (#21941)
- Panic in new RegexCache (#21935)
- Type hint of `cs.exclude()` is `SelectorType` instead of `Expr` (#21892)
- Add correct deprecation warning for .str.concat (#21666)
- Use absolute paths by default for plugins (#21904)
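One of the fixes above makes a number without an associated unit in a duration string an error (#22035). The pure-Python sketch below shows the kind of validation this implies. It is only an illustration: `parse_duration` and its regular expression are hypothetical, the unit list is our assumption about the supported suffixes, signs are ignored, and none of this is taken from the Polars implementation.

```python
import re

# Assumed unit suffixes. "mo" and "ms" are listed before "m" so that
# the two-letter units are tried first.
_TOKEN = re.compile(r"(\d+)(ns|us|ms|mo|s|m|h|d|w|q|y)")


def parse_duration(text):
    """Split a duration string such as "1d2h" into (number, unit) pairs.

    Raises ValueError when some part of the string is not a number
    directly followed by a unit, e.g. the trailing "2" in "1d2".
    """
    parts = []
    pos = 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(
                f"invalid duration {text!r}: expected a number with a unit "
                f"at position {pos}"
            )
        parts.append((int(match.group(1)), match.group(2)))
        pos = match.end()
    return parts


print(parse_duration("1d2h"))
try:
    parse_duration("1d2")
except ValueError as exc:
    print(exc)
```

The point of the sketch is that a trailing bare number is rejected instead of being silently dropped or given a default unit.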
📖 Documentation
- Add user guide section on working with Sheets in Colab (#22161)
- Update distributed engine docs (#22128)
- Add Polars Cloud release notes (#22021)
- Remove trailing space in settings POLARS_CLOUD_CLIENT_ID (#21995)
- Fix typo (#21954)
- Fix 'pickleable' typo in docs (#21938)
- Change ctx to compute=ctx for all remote query examples (#21930)
🛠️ Other improvements
- Remove old `MultiScanExec` for in-memory (#22184)
- Separate `FunctionOptions` from DSL calls (#22133)
- Undeprecate `backward_fill` and `forward_fill` (#22156)
- Handle conversion of Duration specially in pyir (#22101)
- Deprecate duplicate `backward_fill` and `forward_fill` interface (#22083)
- Solve clippy lints for 1.86 (#22102)
- Remove rust exclusive `MaxBound` and `MinBound` fill strategies (#22063)
- Change Partition API to `base_path` and `file_path` (#21888)
- Fix pydantic model_fields deprecation (#21958)
Thank you to all our contributors for making this release possible!
@DeflateAwning, @EnricoMi, @Jacob640, @JakubValtar, @MarcoGorelli, @MaxJackson, @alexander-beedie, @amotzop, @anath2, @bschoenmaeckers, @cnpryer, @coastalwhite, @dependabot[bot], @eitsupi, @etiennebacher, @hemanth94, @kdn36, @mcrumiller, @nameexhaustion, @orlp, @r-brink, @rgertenbach, @ritchie46, @sebasv, @silannisik, @stijnherfst, @wence- and @zachlefevre