🏆 Highlights
- Support pytorch Tensor and Dataset export with new to_torch DataFrame/Series method (#15931)
🚀 Performance improvements
- Don't traverse deep datasets that we repr as union in CSE (#16096)
- Ensure better chunk sizes (#16071)
✨ Enhancements
- split out rolling_*(..., by='foo') into rolling_*_by('foo', ...) (#16059)
- add date pattern dd.mm.YYYY (#16045)
- split Expr.top_k and Expr.top_k_by into separate functions (#16041)
- Support non-coalescing joins in default engine (#16036)
- Support pytorch Tensor and Dataset export with new to_torch DataFrame/Series method (#15931)
- Minor DB type inference updates (#16030)
- Move diagonal & horizontal concat schema resolving to IR phase (#16034)
- raise more informative error messages in rolling_* aggregations instead of panicking (#15979)
- Convert concat during IR conversion (#16016)
- Improve dynamic supertypes (#16009)
- Additional uint datatype support for the SQL interface (#15993)
- Add post-optimization callback (#15972)
- Support Decimal read from IPC (#15965)
- Expose plan and expression nodes through NodeTraverser to Python (#15776)
- Add typed collection from par iterators (#15961)
- Add by argument for Expr.top_k and Expr.bottom_k (#15468)
🐞 Bug fixes
- Respect user passed 'reader_schema' in 'scan_csv' (#16080)
- Lazy csv + projection; respect null values arg (#16077)
- Materialize dtypes when converting to arrow (#16074)
- Fix casting decimal to decimal for high precision (#16049)
- Fix Series constructor failure for Array types for large integers (#16050)
- Fix printing max scale decimals (#16048)
- Decimal supertype for dyn int (#16046)
- Correctly handle large timedelta objects in Series constructor (#16043)
- Do not close connection just because we're not returning Arrow data in batches (#16031)
- properly handle nulls in DictionaryArray::iter_typed (#16013)
- Fix CSE case where upper plan has no projection (#16011)
- Crash/incorrect group_by/n_unique on categoricals created by (q)cut (#16006)
- converting from numpy datetime64 and overriding dtype with a different resolution was returning incorrect results (#15994)
- Ternary supertype dynamics (#15995)
- Fix PartialEq for DataType::Unknown (#15992)
- Finish adding typed_lit to help schema determination in SQL "extract" func (#15955)
- Fix dtype parameter in pandas_to_pyseries function (#15948)
- do not panic when comparing against categorical with incompatible dtype (#15857)
- Join validation for multiple keys (#15947)
- Add missing "truncate_ragged_lines" parameter to read_csv_batched (#15944)
📖 Documentation
- Ensure consistent docstring warning in fill_nan methods (pointing out that nan isn't null) (#16061)
- add filter docstring examples to date and datetime (#15996)
- Fix docstring mistake for polars.concat_str (#15937)
- Update reference to apply (#15982)
- Remove unwanted linebreaks from docstrings (#16002)
- correct default in rolling_* function examples (#16000)
- Improve user-guide doc of UDF (#15923)
- update the link to R API docs (#15973)
🛠️ Other improvements
- Bump sccache action (#16088)
- Fix failures in test coverage workflow (#16083)
- Update benchmarks/coverage jobs with "requirements-ci" (#16072)
- Add TypeGuard to is_polars_dtype util (#16065)
- Clean up hypothesis decimal strategy (#16056)
- split Expr.top_k and Expr.top_k_by into separate functions (#16041)
- Use UnionArgs for DSL side (#16017)
- Add some comments (#16008)
- Improve hypothesis strategy for decimals (#16001)
- Set up TPC-H benchmark tests (#15908)
- Even more Pyo3 0.21 Bound<> APIs (#15914)
- Fix failing test (#15936)
Thank you to all our contributors for making this release possible!
@CanglongCl, @JulianCologne, @KDruzhkin, @MarcoGorelli, @alexander-beedie, @avimallu, @bertiewooster, @c-peters, @dependabot, @dependabot[bot], @eitsupi, @haocheng6, @itamarst, @luke396, @marenwestermann, @nameexhaustion, @orlp, @ritchie46, @stinodego, @thalassemia, @wence- and @wsyxbcl