github pola-rs/polars rs-0.51.0
Rust Polars 0.51.0

💥 Breaking changes

  • Remove, deprecate or change eager Exprs to be lazy compatible (#24027)

🚀 Performance improvements

  • Use specialized decoding for all predicates for Parquet dictionary encoding (#24403)
  • Allocate only for read items when reading Parquet with predicate (#24401)
  • Don't aggregate groups for strict cast if original len (#24381)
  • Allocate only for read items when reading Parquet with predicate (#24324)
  • Native streaming int_range with len or count (#24280)
  • Lower arg_unique natively to the streaming engine (#24279)
  • Move unordering optimization to end (#24286)
  • Do ordering simplification step after common sub-plan elimination (#24269)
  • Always simplify order requirements in IR (#24192)
  • Basic de-duplication of filter expressions (#24220)
  • Cache the IR in pipe_with_schema (#24213)
  • Lower arg_where natively to streaming engine (#24088)
  • Lower Expr.shift to streaming engine (#24106)
  • Lower order-preserving groupby to streaming engine (#24053)
  • Lower .sort(maintain_order=True).head() to streaming top_k (#24014)
  • Lower top-k to streaming engine (#23979)
  • Allow order pass through Filters and relax to row-separable instead of elementwise (#23969)
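
Two of the entries above lower a sort followed by head to a top-k operation. As a rough illustration of why that can help, and not a description of how Polars implements it, the pure-Python sketch below compares a full stable sort plus slice with a bounded selection via the standard library's heapq.nsmallest. The sample rows and both helper names are hypothetical.

```python
import heapq

# Hypothetical sample rows of (name, score); not taken from the release notes.
rows = [("a", 5), ("b", 2), ("c", 9), ("d", 2), ("e", 7)]


def sort_then_head(rows, k):
    """Sort every row (stable), then keep the first k."""
    return sorted(rows, key=lambda r: r[1])[:k]


def top_k(rows, k):
    """Select the k smallest rows without materialising a full sort.

    heapq.nsmallest is documented as equivalent to sorted(...)[:k],
    so ties keep their original relative order.
    """
    return heapq.nsmallest(k, rows, key=lambda r: r[1])


print(sort_then_head(rows, 3))
print(top_k(rows, 3))
```

Both helpers return the same rows; the second only has to track k candidates at a time, which is the general idea behind a top-k node.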

✨ Enhancements

  • Roundtrip BinaryOffset type through Parquet (#24344)
  • Add opt-in unstable functionality to load interval types as Struct (#24320)
  • Add user guide section on AWS role assumption (#24421)
  • Support unique / n_unique / arg_unique for array columns (#24406)
  • Support S3 virtual-hosted–style URI (#24405)
  • Remove explicit file create for local async writes (#24358)
  • Support Partitioning sinks in cloud (#24399)
  • User-friendly error message on empty path expansion (#24337)
  • Add Polars security policy (#24314)
  • Allow pl.Expr.log to take in an expression (#24226)
  • Implement diff() in streaming engine (#24189)
  • Enable Expr.diff(n) for negative n (#24200)
  • Allow upcasting null-typed columns to nested column types in scans (#24185)
  • Log pyarrow predicate conversion result in sensitive verbose logs (#24186)
  • Add a deprecation warning for pl.Series.shift(Null) (#24114)
  • Improve Debug formatting of DataType (#24056)
  • Add cum_* as native streaming nodes (#23977)
  • Add peak_{min,max} support for booleans (#24068)
  • Add DataFrame.map_columns for eager evaluation (#23821)
  • Add native streaming for peak_{min,max} (#24039)
  • IR graph arrows, monospace font, box nodes (#24021)
  • Add DataTypeExpr.default_value (#23973)
  • Lower rle to a native streaming engine node (#23929)
  • Add support for Int128 to pyo3-polars (#23959)
  • Lower rle_id to a native streaming node (#23894)
  • Pass endpoint_url loaded from CredentialProviderAWS to scan/write_delta (#23812)
  • Dispatch scan_iceberg to native by default (#23912)
  • Lower unique_counts and value_counts to streaming engine (#23890)
  • Implement dt.days_in_month function (#23119)
  • Fix errors on native scan_iceberg (#23811)
  • Reinterpret binary data to fixed size numerical array (#22840)
  • Make rolling_map serializable (#23848)
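
For the dt.days_in_month entry above, the following standard-library sketch shows the kind of calculation such a function performs for a single date, including the leap-year case. It is an illustration only: the helper name days_in_month is hypothetical here and the code does not call Polars.

```python
import calendar
from datetime import date


def days_in_month(d: date) -> int:
    """Return the number of days in the month that contains d."""
    # calendar.monthrange returns (weekday of first day, number of days).
    return calendar.monthrange(d.year, d.month)[1]


print(days_in_month(date(2024, 2, 10)))  # February in a leap year
print(days_in_month(date(2023, 2, 10)))  # February in a common year
```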

🐞 Bug fixes

  • Fix AggState on all_literal in BinaryExpr (#24461)
  • Replace unsafe with collect (#24494)
  • Show IR sort options in explain (#24465)
  • Benchmark CI import (#24463)
  • Fix schema on ApplyExpr with single row literal in agg context (#24422)
  • Fix planner schema for dividing pl.Float32 by int (#24432)
  • Fix panic scanning from AWS legacy global endpoint URL (#24450)
  • Emit proper tuple for Log in expression nodes (#24426)
  • Do not propagate struct of nulls with null (#24420)
  • Be stricter with invalid NDJSON input when ignore_errors=False (#24404)
  • Implement approx_n_unique for temporal dtypes and Null (#24417)
  • Correct sink_ipc overload for compression (#24398)
  • Enable all integer dtypes for by parameter in join_asof (#24384)
  • Fix Group-By + filter aggregation performs subsequent operations on all data instead of only filtered data (#24373)
  • Fix incorrect output ordering for row-separable exprs (#24354)
  • Fix Series.__arrow_c_stream__ for Decimal and other logical types (#24120)
  • Match output type to engine for Struct arithmetic (#23805)
  • Make mmap use MAP_PRIVATE rather than MAP_SHARED (#24343)
  • Fix cloud iceberg scan DATASET_PROVIDER_VTABLE error (#24338)
  • Incorrect logic in negative streaming slice (#24326)
  • Do not error on non-list Sequence for columns parameter in read_excel (#23967)
  • Invalid conversion from non-bit numpy bools (#24312)
  • Make dt.epoch('s') serializable (#24302)
  • Make Expr.rechunk serializable (#24303)
  • Schema mismatch for 'log' operation (#24300)
  • Incorrect first/last aggregate in streaming engine (#24289)
  • Fix group offsets in sliced groups (#24274)
  • Panic in inexact date(time) conversion (#24268)
  • The index_of feature should not depend on the object feature (#24256)
  • Keep DSL cache after serialization and deserialization (#24265)
  • Sanitize and warn about eval usage (#24262)
  • Unique with keep="none" in new optimization pass (#24261)
  • Correct size limits for Decimal cast (#24252)
  • Unordered unions in check order observing pass (#24253)
  • Fix dtype for slice on Literal in agg context (#24137)
  • Fix incorrect filter(lit(True)) when scanning hive (#24237)
  • In-memory group_by on 128-bit integers (#24242)
  • Fix panic in gather inside groupby with invalid indices (#24182)
  • Release the GIL in map_groups (#24225)
  • Remove extra explode in LazyGroupBy.{head,tail} (#24221)
  • Fix panic in polars cloud CSV scan (#24197)
  • Fix panic when loading categorical columns from IO plugin (#24205)
  • Fix engine type for concat_list on AggScalar implode (#24160)
  • Rolling_mean handle centered weights with len(values) < window_size (#24158)
  • Reading is_in predicate for Parquet plain strings (#24184)
  • Make PyCategories pickleable (#24170)
  • Remove unused unsound function to_mutable_slice (#24173)
  • PyO3 extension types giving compat_level errors (#24166)
  • Allow non-elementwise by in top_k (#24164)
  • Fix sort_by for group_by_dynamic context (#24152)
  • Input-independent length aggregations in streaming (#24153)
  • Release GIL when iterating df in to_arrow (#24151)
  • Respect non-elementwise join_where conditions (#24135)
  • Resolve schema mismatch for div on Boolean (#24111)
  • Keep name when doing empty group-aware aggregation (#24098)
  • Implode instead of reshape_list (#24078)
  • Rolling mean with weights incorrect when min_samples < window_size (#23485)
  • Allow merge_sorted for all types (#24077)
  • Include datatypes in row_encode expression (#24074)
  • Include UDF materialized type in serialization (#24073)
  • Correct .rolling() output type for non-aggregations (#24072)
  • Correct planner output schema for join_asof (#24071)
  • Allow %B to work without specifying day (#24009)
  • Correct output for fold and reduce (#24069)
  • Expr.meta.output_name for struct fields (#24064)
  • Ensure upcast operations on pl.Date default to microsecond precision (#23981)
  • Add peak_{min,max} support for booleans (#24068)
  • Planner output type for mean with strange input type (#24052)
  • Remove, deprecate or change eager Exprs to be lazy compatible (#24027)
  • Scan of multiple sources with null datatype (#24065)
  • Categorical in nested data in row encoding (#24051)
  • Missing length update in builder for pl.Array repetition (#24055)
  • Race condition in global categories init (#24045)
  • Revert "fix: Don't encode entire CategoricalMapping when going to Arrow (#24036)" (#24044)
  • Error when using named functions (#24041)
  • Don't encode entire CategoricalMapping when going to Arrow (#24036)
  • Fix cast on arithmetic with lit (#23941)
  • Incorrect slice-slice pushdown (#24032)
  • Dedup common cache subplan in IR graph (#24028)
  • Allow join on Decimal in in-memory engine (#24026)
  • Fix datatypes for eval.list in aggregation context (#23911)
  • Allocator capsule fallback panic (#24022)
  • Accept another zlib "magic header" file signature (#24013)
  • Fix truediv dtypes so cast in list.eval is not dropped (#23936)
  • Don't reuse cached return_dtype for expanded map expressions (#24010)
  • Cache id is not a valid dot node id (#24005)
  • Align map_elements with and without return_dtype (#24007)
  • Fix column dtype lifetime for csv_write segfault on Categorical (#23986)
  • Allow serializing LazyGroupBy.map_groups (#23964)
  • Correct allocator name in PyCapsule (#23968)
  • Mismatched types for write function for windows (#23915)
  • Fix unpivot panic when index= column not found (#23958)
  • Fix assert_frame_equal with check_dtypes=False for all-null series with different types (#23943)
  • Return correct python package version (#23951)
  • Categorical namespace functions fail on Enum columns (#23925)
  • Properly set sumwise complete on filter for missing columns (#23877)
  • Restore Arrow-FFI-based Python<->Rust conversion in pyo3-polars (#23881)
  • Group By with filters (#23917)
  • Fix read_csv ignoring Decimal schema for header-only data (#23886)
  • Ensure collect() native Iceberg always scans latest when no snapshot_id is given (#23907)
  • Writing List(Array) columns to JSON without panic (#23875)
  • Fill Iceberg missing fields with partition values if present in metadata (#23900)
  • Create file for streaming sink even if unspawned (#23672)
  • Update cloud testing environment (#23908)
  • Parquet filtering on multiple RGs with literal predicate (#23903)
  • Incorrect datatype passed to libc::write (#23904)
  • Properly feature gate TZ_AWARE_RE usage (#23888)
  • Improve identification of "non group-key" aggregates in SQL GROUP BY queries (#23191)
  • Spawning tokio task outside reactor (#23884)
  • Correctly raise DuplicateError on asof_join with suffix="" (#23864)
  • Fix errors on native scan_iceberg (#23811)
  • Fix index out of bounds panic filtering parquet (#23850)
  • Fix error on empty range requests (#23844)
  • Fix handling of hive partitioning hive_start_idx parameter (#23843)
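
The MAP_PRIVATE entry above (#24343) concerns how memory-mapped files are opened. As a hedged illustration of the difference, and not of the Polars code itself, the sketch below uses Python's mmap module with ACCESS_COPY, which on Unix creates a private (copy-on-write) mapping: writes change the in-memory view but never reach the file. The helper name and the sample bytes are hypothetical.

```python
import mmap
import os
import tempfile


def private_write_leaves_file_untouched():
    """Write through a copy-on-write mapping and compare memory with disk."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"polars")
        os.close(fd)
        with open(path, "rb") as f:
            # ACCESS_COPY: assignments affect memory only, not the file.
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY) as m:
                m[0:1] = b"P"
                in_memory = bytes(m[:])
        with open(path, "rb") as f:
            on_disk = f.read()
        return in_memory, on_disk
    finally:
        os.remove(path)


print(private_write_leaves_file_untouched())
```

With a shared mapping, the same write would be carried through to the underlying file.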

📖 Documentation

  • Rename avg_birthday -> avg_age in examples aggregation (#23726)
  • Update Polars Cloud user guide (#24366)
  • Update to Polars Cloud user guide (#24187)
  • Update distributed page (#24323)
  • Add Polars security policy (#24314)
  • Fix few typos (#24305)
  • Add missing reference to LazyFrame.pipe_with_schema() on the website (#24285)
  • Fix formatting of Series.value_counts examples (#24245)
  • Add DataFrame.map_columns to API (#24128)
  • Update multiple pages in the Polars Cloud user guide (#23661)
  • Improve StackOverflow links in contributing guide (#23895)
  • Fix pyo3 documentation page link (#23839)
  • Document the pureness requirements of udfs (#23787)

📦 Build system

  • Re-enable macos-x86-64 (#24266)
  • Drop binary support for macos_x86-64 (#24257)

🛠️ Other improvements

  • Use PlanCallback in name.map_* (#24484)
  • Replace unsafe with collect (#24494)
  • Move dataset expansion to end and refactor not to use stack optimizer (#24457)
  • Pin xlsxwriter to 3.2.5 or before (#24485)
  • Add methods to EnumUnitVec and shorten name (#24415)
  • Move CompressionUtils to polars-utils (#24430)
  • Update github template to dispatch to cloud client (#24416)
  • Bump c-api (#24412)
  • Add a regression test for #7631 (#24363)
  • Update cloud test InteractiveQuery to DirectQuery (#24287)
  • Mark some tests as slow (#24327)
  • Mark more tests as ready for cloud (#24315)
  • Remove unnecessary stable_features for AVX512 (#24321)
  • Remove PDS-H code (#24301)
  • Get ready for even more cloud tests (#24292)
  • Add tests for slices with caches (#24288)
  • Readd ordering tests (#24284)
  • Expand BitRepr to u8/u16 and use in in_memory group_by (#24248)
  • Fix Makefile venv path (#24251)
  • Remove unnecessary parentheses (#24244)
  • Remove some transmutes (#24246)
  • Wrap Py* data structures in polars-python in locks (#24209)
  • Make non-nested shift{,_and_fill} ops generic (#24224)
  • Remove unused Wrap (#24214)
  • Propagate some python feature flags (#24201)
  • Allow upcasting null-typed columns to nested column types in scans (#24185)
  • Automatically label a few more types of PR (#24147)
  • Update toolchain (#24156)
  • InMemoryJoin should be coloured as InMemoryFallback (#24154)
  • Fool-proof retrieve_error_msg (#24132)
  • Add order_sensitive property for AExpr (#24116)
  • Mark more tests as not possible on cloud (#24103)
  • Turn AggExpr::Count from tuple to struct (#24096)
  • Mark tests that may fail in cloud (#24067)
  • Make CI perf failures more lenient (#24066)
  • Fix hive partition string encoding in CI by upgrading deltalake (#24018)
  • Avoid unreachable if dtype feature is not enabled (#24062)
  • Make tests with sinks run on cloud again (#24048)
  • Update pyo3-polars versions (#24031)
  • Remove insert_error_function (#24023)
  • Remove cache hits, clean up in-mem prefill (#24019)
  • Use .venv instead of venv in pyo3-polars examples (#24024)
  • Fix test failing mypy (#24017)
  • Remove outdated comment (#23998)
  • Add a _plr.pyi to remove mypy issues (#23970)
  • Don't define CountStar as dyn OptimizationRule (#23976)
  • Rename atol and rtol to abs_tol and rel_tol (#23961)
  • Introduce Row{Encode,Decode} as FunctionExpr (#23933)
  • Dispatch through pl.map_batches and AnonymousColumnsUdf (#23867)
  • Ensure clippy and rustfmt run in CI when changing pyo3-polars (#23930)
  • Split column_selector.rs (#23921)
  • Fix pyo3-polars proc-macro re-exports (#23918)
  • Make GetBatchState polling functions unsafe (#23795)
  • Rewrite evaluate_on_groups for .gather / .get (#23700)
  • Remove Context from logical layer (#23863)
  • Add proptest strategy for Polars DataType schemas (#23854)
  • Move Python C API to python-polars (#23876)
  • Refactor directory structure of streaming multi-scan (#23865)
  • Add subphase and query task spawning to StreamingExecState (#23725)
  • Update Rust Polars versions (#23861)
  • Make polars-parquet optional (#23860)
  • Relax constraint on maximum Python version for numba (#23838)

Thank you to all our contributors for making this release possible!
@Gusabary, @JakubValtar, @Kevin-Patyk, @MarcoGorelli, @Matt711, @NeejWeej, @VictorAtIfInsurance, @agossard, @alexander-beedie, @aparna2198, @borchero, @c-peters, @camriddell, @cgevans, @cmdlineluser, @coastalwhite, @deanm0000, @dsprenkels, @eitsupi, @etiennebacher, @gab23r, @gfvioli, @henryharbeck, @iishutov, @itamarst, @jarondl, @jimmmmmmmmmmmy, @jjurm, @joshuamarkovic, @juansolm, @kdn36, @kuril, @math-hiyoko, @mcrumiller, @mpasa, @mrkn, @mroeschke, @nameexhaustion, @nesb1, @orlp, @pka, @pomo-mondreganto, @r-brink, @rawhuul, @ritchie46, @stijnherfst, @vdrn and @wence-
