🚀 Performance improvements
- improve pivot performance by using faster series… (#5172)
- improve streaming performance (~15%) (#5170)
- don't block projection pushdown on unnest (#5123)
✨ Enhancements
- batched csv reader (#5212)
- accept expressions in arr.slice (#5191)
- is_sorted aggregation fast path for Utf8Chunked (#5184)
- support DataFrame init with Datetime dtypes that specify a timezone (#5174)
- frame-level n_unique() that can count unique rows or col/expr subsets (#5165)
- hybrid streaming query engine (#5139)
- return Datetime/Duration with appropriate timeunit when inferring from pytype (#5127)
- add binary dtype (#5122)
🐞 Bug fixes
- fix asof_join schema (#5213)
- fix single thread loop if schema length is off by 1 (#5210)
- improve numeric stability of rolling_variance (#5207)
- fix apply function over object dtype (#5206)
- fix overflow in partitioned groupby mean of int32/… (#5204)
- don't allow categorical append that is not under s… (#5195)
- include offset in arr.get (#5193)
- DataFrame.fill_null include unsigned integers (#5192)
- error on fill_nan on non float dtype (#5185)
- infer missing columns in from_dicts (#5183)
- fix rolling_float in case closure returns None (#5180)
- Implement missing extract conversion for Time datatype (#5161)
- implement missing conversion to python time object (#5152)
- Rendering long docstring lines. (#5150)
- add missing _NUMPY_AVAILABLE check in Series.__getitem__ (#5126)
- wrong operator mapped for LtEq (#5120)
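
To illustrate why numeric stability matters for a rolling variance (#5207), here is a minimal pure-Python sketch. It is not the Polars implementation; the function names `rolling_var_naive` and `rolling_var_stable` are hypothetical and exist only for this illustration. The naive version keeps a running sum and sum of squares, which can lose precision when values are large relative to their spread. The second version updates a running mean and a running sum of squared deviations instead.

```python
import statistics


def rolling_var_naive(values, window):
    """Sample variance per window from a running sum and sum of squares.

    Hypothetical helper for illustration. Subtracting two nearly equal
    large numbers here can cancel most of the significant digits.
    """
    out = []
    total = 0.0
    total_sq = 0.0
    for i, x in enumerate(values):
        total += x
        total_sq += x * x
        if i >= window:
            # Drop the element that just left the window.
            old = values[i - window]
            total -= old
            total_sq -= old * old
        if i >= window - 1:
            out.append((total_sq - total * total / window) / (window - 1))
    return out


def rolling_var_stable(values, window):
    """Sample variance per window from a running mean and running m2.

    Hypothetical helper for illustration. m2 is the sum of squared
    deviations from the current window mean; it is updated in place
    when one element leaves the window and another enters.
    """
    if window < 2 or len(values) < window:
        return []
    first = values[:window]
    mean = sum(first) / window
    m2 = sum((x - mean) ** 2 for x in first)
    out = [m2 / (window - 1)]
    for i in range(window, len(values)):
        old, new = values[i - window], values[i]
        new_mean = mean + (new - old) / window
        # Exact algebraic update of m2 for a fixed-size window.
        m2 += (new - old) * ((new - new_mean) + (old - mean))
        m2 = max(m2, 0.0)  # guard against tiny negative rounding error
        mean = new_mean
        out.append(m2 / (window - 1))
    return out


if __name__ == "__main__":
    # Small spread on top of a large offset stresses the naive formula.
    data = [1e9 + v for v in (4.0, 7.0, 13.0, 16.0)]
    print("naive :", rolling_var_naive(data, 3))
    print("stable:", rolling_var_stable(data, 3))
    print("exact :", [statistics.variance(data[i:i + 3]) for i in range(2)])
```

Running the script prints all three results side by side, so the naive and stable variants can be compared against `statistics.variance` on the same windows.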
🛠️ Other improvements
- skip failing test until #5177 is resolved (#5205)
- ensure streaming groupby take slice into account (#5178)
- remove aggregate pushdown optimization (#5173)
- Add support for ruff python linter. (#5151)
- improve typing; many list types are better defined as Sequence (#5164)
- Get rid of unnecessary check in SplitLines iterator (#5141)
Thank you to all our contributors for making this release possible!
@alexander-beedie, @dannyvankooten, @ghuls, @ritchie46 and @sorhawell