🚀 Performance improvements
- Improve `unique` performance by adding RangedUniqueKernel for primitive arrays (#17166)
- faster decode on Parquet HybridRLE (#17208)
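
To illustrate why a range-based kernel can speed up `unique` on primitive values, here is a minimal sketch that assumes the minimum and maximum of the data are known in advance. It tracks seen values in a bitset instead of a hash set. The function name `ranged_unique` and its signature are hypothetical and are not Polars' actual implementation.

```python
def ranged_unique(values, lo, hi):
    """Return distinct integers in first-seen order, assuming lo <= v <= hi.

    Hypothetical illustration only: one bit per possible value replaces
    hashing, which is cheap when the range (hi - lo) is small.
    """
    seen = 0  # bit i is set once the value lo + i has been observed
    out = []
    for v in values:
        if not lo <= v <= hi:
            raise ValueError(f"value {v} outside the range [{lo}, {hi}]")
        bit = 1 << (v - lo)
        if not seen & bit:
            seen |= bit
            out.append(v)
    return out


print(ranged_unique([3, 1, 3, 2, 1], lo=1, hi=3))
```

The approach only pays off when the value range is narrow relative to the number of rows; for wide ranges a hash-based method is the usual fallback.
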
✨ Enhancements
- Add SQL support for `NATURAL` joins and the `COLUMNS` function (#17295)
- Add `str.extract_many` expression (#17304)
- Support '%' in pathnames for async scan (#17271)
- Support SQL Struct/JSON field access operators (#17226)
- Exclude directories from glob expansion result (#17174)
- Support SQL `ORDER BY ALL` syntax (#17212)
- Support PostgreSQL `^@` ("starts with"), and `~~`, `~~*`, `!~~`, `!~~*` ("like", "ilike") string-matching operators (#17251)
- Support SQL `SELECT * ILIKE` wildcard syntax (#17169)
- Support SQL temporal functions `STRFTIME` and `STRPTIME`, and typed literal syntax (#17245)
- Support date/datetime for hive parts (#17256)
- Expose some more information in translated expression IR to python (#17209)
- Allow no-op `round/ceil/floor` on integer types (#17241)
- Support loading from datasets where the hive columns are also stored in the file (#17203)
- Implement serde for Null columns (#17218)
- Support Decimal types in `write_csv/write_json` (#14209)
- Improve SQL support for array indexing, increase test coverage (#16972)
- Support reading byte stream split encoded floats and doubles in parquet (#17099)
- Add `float_scientific` option to `write_csv`/`sink_csv` (#17111)
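
For readers unfamiliar with the byte stream split encoding mentioned above: the Parquet format defines it as scattering the K bytes of each fixed-width value across K separate streams, which are then concatenated. The sketch below shows that layout with the standard library only. The function names are hypothetical and this is not the Polars reader.

```python
import struct


def byte_stream_split_encode(values, fmt="<f"):
    """Encode floats so that stream k holds byte k of every value.

    fmt is a struct format: "<f" for 32-bit floats, "<d" for 64-bit doubles.
    """
    width = struct.calcsize(fmt)
    raw = b"".join(struct.pack(fmt, v) for v in values)
    # Taking every width-th byte starting at offset k yields stream k.
    return b"".join(raw[k::width] for k in range(width))


def byte_stream_split_decode(data, fmt="<f"):
    """Reverse the split: gather byte i of each stream to rebuild value i."""
    width = struct.calcsize(fmt)
    n, rem = divmod(len(data), width)
    if rem:
        raise ValueError("data length is not a multiple of the value width")
    out = []
    for i in range(n):
        chunk = bytes(data[k * n + i] for k in range(width))
        out.append(struct.unpack(fmt, chunk)[0])
    return out


encoded = byte_stream_split_encode([1.0, 2.0])
print(encoded.hex())
print(byte_stream_split_decode(encoded))
```

Grouping bytes of the same significance together tends to make the data more compressible, which is the motivation for the encoding.
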
🐞 Bug fixes
- Raise proper error for mismatching parquet schema instead of panicking (#17321)
- Raise on invalid shape dataframe arithmetic (#17322)
- Fix panic in window case (#17320)
- Raise errors instead of panicking when `sink_csv` fails (#17313)
- Raise if join keys are passed to cross join (#17305)
- Don't null on oob in `list.get` for column index (#17276)
- Fix issue where sliced PyArrow record batches were not handled correctly (#17058)
- Don't oob on nulls in `list.get` (#17262)
- Fix list getter with nulls (#17261)
- Respect `nulls_last` parameter in aggregate `sort_by` (#17249)
- Fix literal slice in group by (#17242)
- Fix `DataFrame.top_k` not handling nulls correctly (#17239)
- Avoid using the regex dependency when the regex feature is not used (#17206)
- properly check the BMI2 uleb128 (#17191)
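
As background for the ULEB128 fix above: ULEB128 is a variable-length integer encoding that stores 7 payload bits per byte and uses the high bit as a continuation flag. The following is a plain scalar reference sketch with a bounds check. It is not the BMI2-accelerated path the fix refers to, and the function name is hypothetical.

```python
def decode_uleb128(buf, pos=0):
    """Decode one unsigned LEB128 integer from buf starting at pos.

    Returns (value, next_pos). Raises ValueError if the buffer ends
    before a byte without the continuation bit is found.
    """
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated ULEB128 value")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if (byte & 0x80) == 0:  # high bit clear marks the final byte
            return result, pos
        shift += 7


print(decode_uleb128(bytes([0xE5, 0x8E, 0x26])))
```
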
📖 Documentation
- Minor layout/terminology improvement for `selector` set ops (#17299)
- Fix polars-plan docs.rs build (#17266)
- Add SQL docs for the `CAST` and `TRY_CAST` functions (#17214)
🛠️ Other improvements
- Prefer ParquetError::oos to ParquetError::OutOfSpec (#17314)
- remove seqmacro and u8,u16 bitpack (#17290)
- Fix typo in join validation error message (#17296)
- Use typed `iter` in `list.get` (#17286)
- add ability to have pipeline blockers in new streaming engine (#17247)
- Support date/datetime for hive parts (#17256)
- Add elementwise `select` and `with_columns` to new streaming engine (#17185)
- `chrono`'s ParseErrorKind is now public (#17201)
Thank you to all our contributors for making this release possible!
@IvanIsCoding, @JamesCE2001, @MarcoGorelli, @SeanTater, @adamreeve, @alexander-beedie, @coastalwhite, @datapythonista, @flisky, @itamarst, @jqnatividad, @lukeshingles, @mcrumiller, @nameexhaustion, @orlp, @ritchie46, @stinodego and @wence-