🏆 Highlights
- common subexpression elimination (#9632)
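Common subexpression elimination means that a subexpression repeated across several outputs, such as `a + b` feeding two different columns, is evaluated once and then reused. The following is a minimal pure-Python sketch of the idea. It assumes a hypothetical tuple-based expression tree, and the names `OPS`, `find_common` and `evaluate` are illustrative only. It is not Polars' query plan or its actual CSE pass.

```python
from collections import Counter

# Hypothetical expression tree for illustration (not Polars' internal IR):
# a column reference is a str, an operation is a tuple (op_name, *args).
OPS = {
    "add": lambda x, y: [a + b for a, b in zip(x, y)],
    "mul": lambda x, y: [a * b for a, b in zip(x, y)],
}


def count_subtrees(expr, counts):
    """Count every compound (tuple) subtree of expr; tuples are hashable."""
    if isinstance(expr, tuple):
        counts[expr] += 1
        for arg in expr[1:]:
            count_subtrees(arg, counts)


def find_common(exprs):
    """Return the compound subexpressions that occur more than once."""
    counts = Counter()
    for expr in exprs:
        count_subtrees(expr, counts)
    return {sub for sub, n in counts.items() if n > 1}


def evaluate(exprs, columns):
    """Evaluate all exprs, computing each common subexpression only once."""
    common = find_common(exprs)
    cache = {}
    computed = 0  # number of operations actually executed

    def ev(expr):
        nonlocal computed
        if isinstance(expr, str):
            return columns[expr]
        if expr in cache:
            return cache[expr]  # reuse an already-computed common subexpression
        computed += 1
        op, *args = expr
        result = OPS[op](*(ev(a) for a in args))
        if expr in common:
            cache[expr] = result
        return result

    return [ev(e) for e in exprs], computed


# Usage: `a + b` is shared by both output expressions.
shared = ("add", "a", "b")
exprs = [("mul", shared, "a"), ("mul", shared, "b")]
columns = {"a": [1, 2], "b": [3, 4]}
results, n_computed = evaluate(exprs, columns)
print(results, n_computed)  # [[4, 12], [12, 24]] 3
```

Without the cache, `("add", "a", "b")` would be evaluated twice, for 4 operations instead of 3. Keying the cache on the structural (hashable) form of the subtree is what allows identical subexpressions from different outputs to be recognised as the same.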
⚠️ Deprecations
- Deprecate parsing string inputs as literals for `when-then-otherwise` (#10122)
- deprecate `connection_uri` → `connection` param in read/write database methods (#10134)
- remove/deprecate cache and its logic (#10066)
- Add `date_ranges`/`time_ranges` expression functions (#10005)
🚀 Performance improvements
✨ Enhancements
- suggest `map_dict` instead of `lambda x: DICT[x]` (#10123)
- enable "inefficient apply" warnings from `Series` (#10104)
- support writing duration type in json (#10112)
- BytecodeParser can now handle mixed/nested `and`/`or` control flow (#10085)
- inline `lit(Series).cast(..)` to `lit(Series.cast(..))` (#10092)
- Add ArcTan2 to `SQLContext` (#9571)
- CSE in groupbys (#10062)
- Adds SQL `CASE` statement expressions (#10065)
- Add `date_ranges`/`time_ranges` expression functions (#10005)
- comm_subexpr_elim in streaming 'select/with_columns' (#10050)
- add dataframe.flags property (#10037)
- common subexpression elimination (#9632)
- detect and warn about usage of str/int/float python-based casts with `apply` (#10026)
- detect and warn about usage of `json.loads` in conjunction with `apply` (#10023)
- detect and warn about bare `numpy` functions passed to `apply` (#10021)
- support bytecode identification/mapping of python string-case functions in UDFs (#10007)
- support bytecode identification of `numpy` functions in UDFs that we can map to native expressions (#10003)
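Several of these enhancements rely on inspecting a UDF's Python bytecode to recognise patterns that have a native equivalent. The sketch below illustrates that general technique using the standard-library `dis` module. `dict_lookup_target` is a hypothetical helper and not part of Polars, and this is not the actual `BytecodeParser`. It recognises only the exact shape `lambda x: DICT[x]`. Opcode names differ between CPython versions, so the helper accepts a few known variants (`BINARY_SUBSCR` vs. a subscript `BINARY_OP`, and `LOAD_FAST*` variants). It may need adjusting for other interpreters.

```python
import dis


def dict_lookup_target(func):
    """Return the global name G if func compiles to exactly `lambda x: G[x]`.

    Hypothetical helper: a minimal bytecode-matching sketch, not Polars' code.
    Returns None for any other function shape.
    """
    code = func.__code__
    if code.co_argcount != 1:
        return None
    # Drop bookkeeping opcodes whose presence varies between CPython versions.
    ops = [
        ins for ins in dis.get_instructions(func)
        if ins.opname not in {"RESUME", "NOP", "CACHE"}
    ]
    if len(ops) != 4:
        return None
    load_global, load_arg, subscr, ret = ops
    # Older CPythons use BINARY_SUBSCR; newer ones may emit BINARY_OP with "[]".
    is_subscript = subscr.opname == "BINARY_SUBSCR" or (
        subscr.opname == "BINARY_OP" and subscr.argrepr == "[]"
    )
    if (
        load_global.opname == "LOAD_GLOBAL"
        and load_arg.opname.startswith("LOAD_FAST")
        and load_arg.argval == code.co_varnames[0]
        and is_subscript
        and ret.opname == "RETURN_VALUE"
    ):
        return load_global.argval  # name of the global mapping being indexed
    return None


# Usage: a matching lambda yields the mapping's name; anything else yields None.
MAPPING = {1: "one", 2: "two"}
print(dict_lookup_target(lambda x: MAPPING[x]))
print(dict_lookup_target(lambda x: x + 1))
```

A real implementation would go further and translate a match into a native expression. This sketch stops at detection, which is enough to trigger a suggestion such as the one to use `map_dict` instead.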
🐞 Bug fixes
- adjust for null values in str.replace fast path (#10132)
- clear bit settings in list iteration (#10131)
- use row-encoded for struct::is_sorted (#10129)
- don't run file-caching in streaming mode (#10117)
- Allow initializing `pl.Array` in a DataFrame using schema alone (#10100)
- silence Series.apply inefficient apply warning when calling Expr.apply (#10116)
- don't panic if masked out values are invalid in temporal kernels (#10114)
- Fix struct get field by index out of bounds error. (#10097)
- fix undefined behavior (UB) in simd-json (#10093)
- fix invalid access when groupby rolling produces empty sets (#10109)
- respect `null_on_oob=False` in `list.take` when pa… (#10105)
- undo regression in scan_parquet from s3 (#10098)
- fix is_sorted for structs (#10099)
- add file path to io error in scan_csv (#10076)
- fix false positive in parquet stats evaluation (#10087)
- Address `.col(regex).exclude()` operations not executing. (#10025)
- address an inadvertent shallow-copy issue on underlying PySeries (#10086)
- fix Boolean::isin(null values) (#10074)
- fix predicate pushdown issue #10058 (#10071)
- map 'postgres' URI prefix to ADBC 'postgresql' module (#10018)
- Fix weighted quantile for 0 weights (#10051)
- eager `time_range`/`date_range` dimensions fix (#9996)
🛠️ Other improvements
- get test_udfs running on all python versions again (#10136)
- temporarily turn off fail-fast so that ubuntu tests run (#10133)
- clarify "clones data" in to_numpy (#10095)
- Refactor `when`/`then`/`otherwise` internals (#9922)
- Properly format `Returns` sections of docstrings (#10064)
- much-improved `Instruction` matching for `BytecodeParser` (#10040)
- add pure-python tests and CI for `BytecodeParser` (#10027)
- split out expression translation and instruction-rewrite logic from `BytecodeParser` (#10012)
- clean API sections in docs (#10004)
- Bump some dependencies (#9997)
- Add patchelf extra to maturin (#9995)
- restructure all UDF parsing/translation methods into a new `BytecodeParser` class (#9993)
- Clean up `date_range`/`time_range` (#9985)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @SeanTroyUWO, @alexander-beedie, @c-peters, @cmdlineluser, @jonashaag, @magarick, @mcrumiller, @rikkaka, @ritchie46 and @stinodego