pola-rs/polars rs-0.50.0 on GitHub

🏆 Highlights

Make Selector a concrete part of the DSL (#23351)
Rework Categorical/Enum to use (Frozen)Categories (#23016)

🚀 Performance improvements

Lower Expr.slice to streaming engine (#23683)
Elide bound check (#23653)
Preserve Column repr in ColumnTransform operations (#23648)
Lower any() and all() to streaming engine (#23640)
Lower row-separable functions in streaming engine (#23633)
Lower int_range(len()) to with_row_index (#23576)
Avoid double field resolution in with_columns (#23530)
Rolling quantile lower time complexity (#23443)
Use single-key optimization with Categorical (#23436)
Improve null-preserving identification for boolean functions (#23317)
Improve boolean bitwise aggregate performance (#23325)
Enable Parquet expressions and dedup is_in values in Parquet predicates (#23293)
Re-write join types during filter pushdown (#23275)
Generate PQ ZSTD decompression context once (#23200)
Trigger cache/cse optimizations when multiplexing (#23274)
Cache FileInfo upon DSL -> IR conversion (#23263)
Push more filters past joins (#23240)

✨ Enhancements

Expand on DataTypeExpr (#23249)
Lower row-separable functions in streaming engine (#23633)
Add scalar checks to range expressions (#23632)
Expose POLARS_DOT_SVG_VIEWER to automatically dispatch to SVG viewer (#23592)
Implement mean function in arr namespace (#23486)
Implement vec_hash for List and Array (#23578)
Add unstable pl.row_index() expression (#23556)
Add Categories on the Python side (#23543)
Implement partitioned sinks for the in-memory engine (#23522)
Expose IRFunctionExpr::Rank in the python visitor (#23512)
Raise and warn on UDFs without return_dtype set (#23353)
IR pruning (#23499)
Expose IRFunctionExpr::FillNullWithStrategy in the python visitor (#23479)
Support min/max reducer for null dtype in streaming engine (#23465)
Implement streaming Categorical/Enum min/max (#23440)
Allow cast to Categorical inside list.eval (#23432)
Support pathlib.Path as source for read/scan_delta() (#23411)
Enable default set of ScanCastOptions for native scan_iceberg() (#23416)
Pass payload in ExprRegistry (#23412)
Support reading nanosecond/Int96 timestamps and schema evolved datasets in scan_delta() (#23398)
Support row group skipping with filters when cast_options is given (#23356)
Execute bitwise reductions in streaming engine (#23321)
Use scan_parquet().collect_schema() for read_parquet_schema (#23359)
Add dtype to str.to_integer() (#22239)
Add arr.slice, arr.head and arr.tail methods to arr namespace (#23150)
Add is_close method (#23273)
Drop superfluous casts from optimized plan (#23269)
Added drop_nulls option to to_dummies (#23215)
Support comma as decimal separator for CSV write (#23238)
Don't format keys if they're empty in dot (#23247)
Improve arity simplification (#23242)
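
The "Add is_close method (#23273)" entry above does not spell out the comparison semantics. As a rough illustration only, here is a minimal pure-Python sketch of the conventional tolerance check (the same idea as `math.isclose`). The function name and the `abs_tol`, `rel_tol`, and `nans_equal` parameters are assumptions made for this sketch and are not taken from these notes; check the Polars API reference for the actual signature and defaults.

```python
import math


def is_close(a, b, *, abs_tol=0.0, rel_tol=1e-9, nans_equal=False):
    """Illustrative tolerance comparison; hypothetical helper, not the Polars implementation."""
    # NaN never compares close unless the caller opts in and both sides are NaN.
    if math.isnan(a) or math.isnan(b):
        return nans_equal and math.isnan(a) and math.isnan(b)
    # Exact equality also covers infinities of the same sign.
    if a == b:
        return True
    # An infinity is never close to a different value.
    if math.isinf(a) or math.isinf(b):
        return False
    # Close if within the relative tolerance (scaled by the larger magnitude)
    # or within the absolute tolerance.
    return abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)
```

With the default `abs_tol=0.0`, values near zero only compare close when they are exactly equal, which is why an absolute tolerance is usually supplied when comparing against zero.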

🐞 Bug fixes

Fix credential refresh logic (#23730)
Fix to_datetime() fallible identification (#23735)
Correct output datatype for dt.with_time_unit (#23734)
Fix incorrect native Iceberg scan from tables with renamed/dropped columns/fields (#23713)
Allow DataType expressions with selectors (#23720)
Match output type to engine for interpolate on Decimal (#23706)
Remaining bugs in with_exprs_and_input and pruning (#23710)
Match output dtype to engine for cum_sum_horizontal (#23686)
Field names for pl.struct in group-by (#23703)
Fix output for str.extract_groups with empty string pattern (#23698)
Match output type to engine for rolling_map (#23702)
Fix incorrect join on single Int128 column for in-memory engine (#23694)
Match output field name to lhs for BusinessDaycount (#23679)
Correct the planner output datatype for strptime (#23676)
Sort and Scan with_exprs_and_input (#23675)
Revert to old behavior with name.keep (#23670)
Fix panic loading from arrow Map containing timestamps (#23662)
Selectors in self part of list.eval (#23668)
Fix output field dtype for ToInteger (#23664)
Allow decimal_comma with , separator in read_csv (#23657)
Fix handling of UTF-8 in write_csv to IO[str] (#23647)
Selectors in {Lazy,Data}Frame.filter (#23631)
Stop splitfields iterator at eol in simd branch (#23652)
Correct output datatype of dt.year and dt.mil (#23646)
Logic of broadcast_rhs in binary functions to correct list.set_intersection for list[str] columns (#23584)
Order-preserving equi-join didn't always flush final matches (#23639)
Fix ColumnNotFound error when joining on col().cast() (#23622)
Fix agg groups on when/then in group_by context (#23628)
Output type for sign (#23572)
Apply agg_fn on null values in pivot (#23586)
Remove nonsensical duration variance (#23621)
Don't panic when sinking nested categorical to Parquet (#23610)
Correctly set value count output field name (#23611)
Casting unused columns in to_torch (#23606)
Allow inferring of hours-only timezone offset (#23605)
Bug in Categorical <-> str compare with nulls (#23609)
Honor n=0 in all cases of str.replace (#23598)
Remove arbitrary 25 item limit from implicit Python list -> Series infer (#23603)
Relabel duplicate sequence IDs in distributor (#23593)
Round-trip Enum and Categorical metadata in plugins (#23588)
Fix incorrect join_asof with by followed by head/slice (#23585)
Allow writing nested Int128 data to Parquet (#23580)
Enum serialization assert (#23574)
Output type for peak_min / peak_max (#23573)
Make Scalar Categorical, Enum and Struct values serializable (#23565)
Preserve row order within partition when sinking parquet (#23462)
Panic in create_multiple_physical_plans when branching from a single cache node (#23561)
Prevent in-mem partition sink deadlock (#23562)
Update AWS cloud documentation (#23563)
Correctly handle null values when comparing structs (#23560)
Make fold/reduce/cum_reduce/cum_fold serializable (#23524)
Make Expr.append serializable (#23515)
Float by float division dtype (#23529)
Division on empty DataFrame generating null row (#23516)
Partition sink copy_exprs and with_exprs_and_input (#23511)
Unreachable with pl.self_dtype (#23507)
Rolling median incorrect min_samples with nulls (#23481)
Make Int128 roundtrippable via Parquet (#23494)
Fix panic when common subplans contain IEJoins (#23487)
Properly handle non-finite floats in rolling_sum/mean (#23482)
Make read_csv_batched respect skip_rows and skip_lines (#23484)
Always use cloudpickle for the python objects in cloud plans (#23474)
Support string literals in index_of() on categoricals (#23458)
Don't panic for finish_callback with nested datatypes (#23464)
Support min/max aggregation for DataFrame/LazyFrame Categoricals (#23455)
Fix var/moment dtypes (#23453)
Fix agg_groups dtype (#23450)
Clear cached_schema when apply changes dtype (#23439)
Allow structured conversion to/from numpy with Array types, preserving shape (#23438)
Null handling in full-null group_by_dynamic mean/sum (#23435)
Enable default set of ScanCastOptions for native scan_iceberg() (#23416)
Fix index calculation for nearest interpolation (#23418)
Fix compilation failure with --no-default-features and --features lazy,strings (#23384)
Parse parquet footer length into unsigned integer (#23357)
Fix incorrect results with group_by aggregation on empty groups (#23358)
Fix boolean min() in group_by aggregation (streaming) (#23344)
Respect data-model in map_elements (#23340)
Properly join URI paths in PlPath (#23350)
Ignore null values in bitwise aggregation on bools (#23324)
Fix panic filtering after left join (#23310)
Out-of-bounds index in hot hash table (#23311)
Fix scanning '?' from cloud with glob=False (#23304)
Fix filters on inserted columns did not remove rows (#23303)
Don't ignore return_dtype (#23309)
Use safe parsing for get_normal_components (#23284)
Fix output column names/order of streaming coalesced right-join (#23278)
Restore concat_arr inputs expansion (#23271)
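
The entry "Allow decimal_comma with , separator in read_csv (#23657)" touches on an ambiguity: when `,` is both the field separator and the decimal mark, a decimal value can only be told apart from two fields if it is quoted. The sketch below illustrates that idea with Python's standard `csv` module only. It does not use or reproduce Polars' reader, and `parse_decimal_comma` is a hypothetical helper name introduced for this example.

```python
import csv
import io


def parse_decimal_comma(text):
    """Parse comma-separated text whose quoted fields use ',' as the decimal mark."""
    rows = []
    # csv.reader keeps commas that appear inside quoted fields,
    # so "1,5" arrives as the single field '1,5'.
    for row in csv.reader(io.StringIO(text)):
        # Swap the decimal comma for a point before converting to float.
        rows.append([float(field.replace(",", ".")) for field in row])
    return rows


data = '"1,5","2,25"\n"3,0","4,75"\n'
parsed = parse_decimal_comma(data)
```

Without the quotes, the first line would split into four fields instead of two, which is the case a reader has to reject or disambiguate.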

📖 Documentation

Point the R Polars version on R-multiverse (#23660)
Update GPU docs for RAPIDS CUDA 11 deprecation (#23620)
Add page about billing to Polars Cloud user guide (#23564)
Small user-guide improvement and fixes (#23549)
Correct note in from_pandas about data being cloned (#23552)
Fix a few typos in the "Streaming" section (#23536)
Update streaming page (#23535)
Update structure of Polars Cloud documentation (#23496)
Update when_then in user guide (#23245)

📦 Build system

Update all rand code (#23387)
Bump up rand & rand_distr (#22619)

🛠️ Other improvements

Remove incorrect DeletionFilesList::slice (#23796)
Remove old schema file (#23798)
Remove Default for StreamingExecutionState (#23729)
Explicit match to smaller dtypes before cast to Int32 in asof join (#23776)
Expose PlPathRef via polars::prelude (#23754)
Add hashes json (#23758)
Add AExpr::is_expr_equal_to (#23740)
Fix rank test to respect maintain order (#23723)
IR inputs and exprs iterators (#23722)
Store more granular schema hashes to reduce merge conflicts (#23709)
Add assertions for unique ID (#23711)
Use RelaxedCell in multiscan (#23712)
Debug assert ColumnTransform cast is non-strict (#23717)
Use UUID for UniqueID (#23704)
Remove scan id (#23697)
Propagate Iceberg physical ID schema to IR (#23671)
Remove unused and confusing match arm (#23691)
Remove unused ALLOW_GROUP_AWARE flag (#23690)
Remove unused evaluate_inline (#23687)
Remove unused field from AggregationContext (#23685)
Remove node_to_lp (#23678)
Underscore prefix for get_backing_series/to_new_from_backing (#23659)
Make helper functions private for equality assertions and update test (#23650)
Use RelaxedCell for fully relaxed atomics (#23644)
Replace PlSmallStr::from_static("item") with LIST_VALUES_NAME (#23645)
Fix cloud bytes scanning and read_* functions (#23642)
Group By maintain order on test (#23643)
Add maintain_order tests for streaming joins (#23577)
Add logic to support struct field renames on arbitrary nesting levels (#23532)
Continue on cloud testing (#23616)
Add pyo3-polars (#23571)
Remove _fetch (#23607)
Replace agg_list in AExpr::to_field with is_scalar_ae (#23582)
Mark select test case as write_disk (#23566)
Rolling order checking of test (#23568)
Multiple in-mem plans with reused cache #23561 (#23567)
Reduce warning in docs serve (#23534)
Remove left-behind print statement (#23533)
Make list.to_struct and arr.to_struct serializable (#23504)
Small conftest improvement (#23508)
Improve Categories error message (#23510)
Add test to ensure the global categories gets cleaned up (#23502)
Add more testing to group_by sorted test (#23500)
Pruning follow-up (#23501)
Make arg_min, arg_max, arg_sort and product into concrete DSL and IR constructs (#23493)
Simplify arena iterators (#23495)
Remove unnecessary may_fail_auto_streaming (#23477)
Remove StringCache from the test suite (#23473)
Make Selector a concrete part of the DSL (#23351)
Add streaming engine to code-coverage (#23441)
Remove hashbrown_nightly_hack (#23445)
Move options out of RollingFunction (#23430)
Drop New from RowEncodingCategoricalContext (#23431)
Remove unneeded allocations when creating PlPath (#23417)
Rework Categorical/Enum to use (Frozen)Categories (#23016)
Do not depend on pyo3 without the python feature (#23420)
Ignore Sort if 'by' is empty (#23320)
Rename from_buffer()/FromBuffer to reinterpret()/Reinterpret (#23362)
Clean up ChunkFilter implementation (#23378)
Only convert to ExprIR once in with_columns (#23352)
Update rust version in nix flake (#23347)
Update toolchain and fix clippy issues (#23334)
Optimize equality comparisons and fix error handling (#23281)
Improve cloud tests (#23312)
Casting from binview to primitives code moved from polars-ops to polars-compute (#23234)
Improve DSL source cache (#23282)
Add new PlPath that abstracts over PathBuf and URI (#23280)
Add may_fail_cloud mark for pytest (#23279)
Organize dsl_to_ir logic into modules (#23277)
Add flag for auto distributed testing (#23220)
Remove unused PyDataType (#23265)
Split FileScan in FileScanDsl and FileScanIR (#23260)

Thank you to all our contributors for making this release possible!
@Declow, @JakubValtar, @Kevin-Patyk, @Liyixin95, @MarcoGorelli, @Matt711, @TheLostLambda, @Washiil, @alexander-beedie, @borchero, @c-peters, @cmdlineluser, @coastalwhite, @deanm0000, @eitsupi, @etiennebacher, @florian-klein, @gfvioli, @habaneraa, @itamarst, @kdn36, @ldhwaddell, @math-hiyoko, @mpasa, @nameexhaustion, @orlp, @othijssens, @r-brink, @ritchie46 and @stijnherfst
