🚀 Performance improvements
- Fix pathological perf issue in window-order-by (#17650)
- Cache path resolving of `scan` functions (#17616)
- Add `ArrayChunks` to optimize codegen of BatchDecoder (#17632)
- Rechunk before we go into grouped gathers (#17623)
- Cache schema resolve back to DSL (#17610)
- Add fast path for rounding by a single constant duration (#17580)
- Improve parallelism in writing hive parquet (#17512)
- Support datetime in predicate during hive partition pruning (#17545)
- Batch nested embed parquet decoding (#17549)
- Batch nested Parquet decoding (#17542)
- Collect Parquet dictionary binary as view (#17475)
✨ Enhancements
- Hugging Face path expansion (#17665)
- Add DSL validation for cloud eligible check (#17287)
- Raise informative error message if non-IntoExpr is passed by name in *Frame.group_by (#17654)
- Add `infer_schema` parameter to `read_csv`/`scan_csv` (#17617)
- Change API for writing partitioned Parquet to reduce code duplication (#17586)
- Cache schema resolve back to DSL (#17610)
- Expose `returns_scalar` to `map_elements` (#17613)
- Add option to include file path for Parquet, IPC, CSV scans (#17563)
- Support `describe` on decimal (#15092)
- Support datetime in predicate during hive partition pruning (#17545)
- Raise more informative error message for directories containing files with mixed extensions (#17480)
- Exclude empty files from directory/glob expansion (#17478)
- Support use of SQLAlchemy "Connectable" in `write_database` (#17470)
🐞 Bug fixes
- Support duplicate expression names when calling ufuncs (#17641)
- Interpret %y consistently with Chrono in to_date/to_datetime/strptime (#17661)
- Fix explode invalid check (#17651)
- Raise for overlapping index/column names in pandas dataframes post string coercion (#17628)
- Expand brackets in async glob expansion (#17630)
- Fix row index disappearing after projection pushdown in NDJSON (#17631)
- Fix struct -> enum is_in (#17622)
- Don't needlessly unwrap in `pivot_schema` (#17611)
- Reject literal input in `sort_by_exprs()` (#17606)
- Don't enforce row order in join test results where not guaranteed (#17596)
- Bitmap collect into safety (#17588)
- Make schema picklable (#17524)
- Handle current position of file objects (#17543)
- Set `O_CLOEXEC` on duplicated file descriptor (#17537)
- Method `dt.truncate` was sometimes returning incorrect results for pre-1970 datetimes (#17582)
- Defer path expansion until `collect` in file scan methods (#17532)
- Fix `retries` parameter in scan functions not taking effect when it was set to `0` (#17564)
- Don't unwrap send attempt to oneshot channel (#17566)
- Fix scanning from HTTP cloud paths (#17571)
- Properly implement struct (#17522)
- Add `right` to `LazyFrame.join` docstring (#17529)
- Fix predicate pushdown for `.list.(get|gather)` (#17511)
- Make sure `scan_ipc` does not go through fsspec (#17495)
- Turn panic into error when serializing Object types (#17353)
- Fix struct expansion and raise on exclude (#17489)
- Normalize path in `sink_csv` (#17476)
📖 Documentation
- Update `plot` docs to refer to docstrings (#17504)
- Rename `str.lengths` to `str.len_bytes` in description text (#11577) (#17626)
- Create example for `polars.Expr.bin.decode` (#17508)
- Add right join in the user guide (#17608)
- Adjust rendering of links in `read_database_uri` docstring (#17536)
- Update SQL examples in README (#17568)
- Fixup "deprecated" directive for `DataFrame.melt` and `LazyFrame.melt` (#17530)
- Add `write_parquet_partitioned` (#17488)
- Add example for writing hive partitioned parquet to user guide (#17483)
- Fix typo in Getting Started section of user guide (#17465)
🛠️ Other improvements
- Add DSL validation for cloud eligible check (#17287)
- Add `ArrayChunks` to optimize codegen of BatchDecoder (#17632)
- Move path logic from `utils` to `path_utils` in polars-io (#17635)
- Fix struct gather (#17621)
- Back to StructChunked name (#17609)
- Remove unused `with_column` method of PyLazyFrame (#17607)
- Re-enable struct related tests (#17597)
- Completely redo structure of Parquet decoder (#17589)
- Fix struct outer validity;fmt;is_in;cast;cmp (#17590)
- Add/fix version-gating in some SQLAlchemy and Pandas tests (#17538)
- Add `style` accessor to `DataFrame` (#17502)
- Remove unused `is_supported_cloud` util (#17493)
Thank you to all our contributors for making this release possible!
@Julian-J-S, @MarcoGorelli, @alexander-beedie, @anergictcell, @arnabanimesh, @brandon-b-miller, @cmdlineluser, @coastalwhite, @deanm0000, @eitsupi, @flisky, @henryharbeck, @itamarst, @jonaylor89, @moritzwilksch, @nameexhaustion, @orlp, @phi-friday, @r-brink, @rcorty, @ritchie46, @ruihe774, @stinodego, @tylerriccio33 and @wence-