🏆 Highlights
- Add LazyFrame.{sink,collect}_batches (#23980)
- Deterministic import order for Python Polars package variants (#24531)
🚀 Performance improvements
- Skip files in scan_iceberg with filter based on metadata statistics (#24547)
- Push row_index predicate for all scan types (#24537)
- Perform integer in-filtering for Parquet inequality predicates (#24525)
- Stop caching Parquet metadata after 8 files (#24513)
- Native streaming .mode() expression (#24459)
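
The file-skipping entry above relies on a general idea: if metadata records the minimum and maximum of a column for each file, a filter can rule out whole files without opening them. The following is a minimal pure-Python sketch of that idea, not the Polars or Iceberg implementation. `FileStats`, `files_to_scan`, and the file names are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical per-file statistics for one integer column."""

    path: str
    min_value: int
    max_value: int


def files_to_scan(files, lower=None, upper=None):
    """Return paths of files that may contain values with lower <= x <= upper.

    A file is skipped only when its [min_value, max_value] range cannot
    overlap the requested range, so no matching row is ever lost.
    """
    kept = []
    for f in files:
        if lower is not None and f.max_value < lower:
            continue  # every value in this file is below the range
        if upper is not None and f.min_value > upper:
            continue  # every value in this file is above the range
        kept.append(f.path)
    return kept


# Illustrative data, not taken from any real table.
files = [
    FileStats("part-0.parquet", 0, 99),
    FileStats("part-1.parquet", 100, 199),
    FileStats("part-2.parquet", 200, 299),
]

# Filter "x >= 150": only the first file can be skipped.
selected = files_to_scan(files, lower=150)
```

The check is conservative by design: a file that survives may still contain no matching rows, so the actual filter must still run on the data that is read.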
✨ Enhancements
- Add LazyFrame.{sink,collect}_batches (#23980)
- Deterministic import order for Python Polars package variants (#24531)
- Add support to display lazy query plan in marimo notebooks without needing to install matplotlib or mermaid (#24540)
- Add unstable hidden_file_prefix parameter to scan_parquet (#24507)
- Use fixed-scale Decimals (#24542)
- Add support for unsigned 128-bit integers (#24346)
- Add unstable pl.Config.set_default_credential_provider (#24434)
- Roundtrip BinaryOffset type through Parquet (#24344)
- Add opt-in unstable functionality to load interval types as Struct (#24320)
- Support reading parquet metadata from cloud storage (#24443)
- Add user guide section on AWS role assumption (#24421)
- Support unique/n_unique/arg_unique for array columns (#24406)
🐞 Bug fixes
- Add support for float inputs for duration types (#24529)
- Roundtrip empty string through hive partitioning (#24546)
- Fix potential OOB writes in unaligned IPC read (#24550)
- Fix regression error when scanning AWS presigned URL (#24530)
- Make PlPath::join for cloud paths replace on absolute paths (#24514)
- Correct dtype for cum_agg in streaming engine (#24510)
- Restore support for np.datetime64() in pl.lit() (#24527)
- Ignore Iceberg list element ID if missing (#24479)
- Fix panic on streaming full join with coalesce (#23409)
- Fix AggState on all_literal in BinaryExpr (#24461)
- Show IR sort options in explain (#24465)
- Benchmark CI import (#24463)
- Fix schema on ApplyExpr with single row literal in agg context (#24422)
- Fix planner schema for dividing pl.Float32 by int (#24432)
- Fix panic scanning from AWS legacy global endpoint URL (#24450)
- Fix iterable_to_pydf(..., infer_schema_length=None) to scan all data (#23405)
- Do not propagate struct of nulls with null (#24420)
- Be stricter with invalid NDJSON input when ignore_errors=False (#24404)
- Implement approx_n_unique for temporal dtypes and Null (#24417)
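
The NDJSON entry above distinguishes strict parsing from parsing with ignore_errors enabled. As a rough illustration of that distinction, here is a pure-Python sketch using only the standard `json` module. It is not the Polars reader: `parse_ndjson` is a hypothetical helper, and substituting `None` for an unparseable line is an assumption made for this example.

```python
import json


def parse_ndjson(text, ignore_errors=False):
    """Parse newline-delimited JSON, one record per non-blank line.

    With ignore_errors=False, the first invalid line raises ValueError.
    With ignore_errors=True, an invalid line yields None instead
    (an assumption of this sketch, not documented Polars behavior).
    """
    rows = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines carry no record
        try:
            rows.append(json.loads(line))
        except json.JSONDecodeError as exc:
            if ignore_errors:
                rows.append(None)
                continue
            raise ValueError(
                f"invalid NDJSON on line {line_number}: {exc.msg}"
            ) from exc
    return rows


# The second line is not valid JSON.
data = '{"a": 1}\n{"a": oops}\n{"a": 3}\n'

# Lenient mode keeps going and marks the bad record.
lenient = parse_ndjson(data, ignore_errors=True)
```

Calling `parse_ndjson(data)` without `ignore_errors=True` raises on line 2 rather than silently returning partial data, which is the kind of strictness the entry describes.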
📖 Documentation
- Rename avg_birthday -> avg_age in examples aggregation (#23726)
- Update Polars Cloud user guide (#24366)
- Fix typo in set_expr_depth_warning docstring (#24427)
🛠️ Other improvements
- More release scripting (#24582)
- Again a minor fix for the setup script (#24580)
- Minor fix in release script (#24579)
- Correct release python beta version check (#24578)
- Python dependency failure (#24576)
- Always install yq (#24570)
- Deterministic import order for Python Polars package variants (#24531)
- Check Arrow FFI pointers with an assert (#24564)
- Add a couple of missing type definitions in python (#24561)
- Fix quickstart example in Polars Cloud user guide (#24554)
- Add implementations for loading min/max statistics for Iceberg (#24496)
- Update versions (#24508)
- Add additional unit tests for pl.concat (#24487)
- Refactor parametric tests for as_struct on aggstates (#24493)
- Use PlanCallback in name.map_* (#24484)
- Pin xlsxwriter to 3.2.5 or before (#24485)
- Add dataclass to hold resolved iceberg scan data (#24418)
- Fix iceberg test failure in CI (#24456)
- Move CompressionUtils to polars-utils (#24430)
- Update github template to dispatch to cloud client (#24416)
Thank you to all our contributors for making this release possible!
@Gusabary, @Kevin-Patyk, @Matt711, @alonsosilvaallende, @borchero, @c-peters, @camriddell, @coastalwhite, @dongchao-1, @dsprenkels, @itamarst, @jan-krueger, @joshuamarkovic, @juansolm, @kdn36, @nameexhaustion, @orlp, @r-brink, @ritchie46 and @stijnherfst