Python Polars 1.25.2 (GitHub: pola-rs/polars, tag py-1.25.2)

🏆 Highlights

  • Enable common subplan elimination across plans in collect_all (#21747)
  • Add lazy sinks (#21733)
  • Add PartitionByKey for new streaming sinks (#21689)
  • Enable new streaming memory sinks by default (#21589)
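
pl.collect_all collects several LazyFrames in one call. With #21747, a subplan that appears in more than one of those plans can be computed once and shared. The toy sketch below shows the general idea of common subplan elimination: structurally identical plan nodes are memoized so they run only once across all plans. The tuple plan encoding and the execute_all helper are hypothetical illustrations, not Polars internals.

```python
def execute_all(plans, sources):
    """Evaluate several toy plans, computing each distinct subplan once.

    Hypothetical plan encoding (hashable tuples):
      ("scan", name)              -> rows from sources[name]
      ("filter", child, threshold) -> rows of child greater than threshold
      ("double", child)            -> every row of child multiplied by 2
    """
    cache = {}        # plan node -> computed result, shared across all plans
    evaluations = []  # records each node that was actually computed

    def run(plan):
        if plan in cache:
            return cache[plan]  # common subplan: reuse the earlier result
        op = plan[0]
        if op == "scan":
            result = list(sources[plan[1]])
        elif op == "filter":
            result = [x for x in run(plan[1]) if x > plan[2]]
        elif op == "double":
            result = [2 * x for x in run(plan[1])]
        else:
            raise ValueError(f"unknown op {op!r}")
        evaluations.append(plan)
        cache[plan] = result
        return result

    return [run(p) for p in plans], evaluations


shared = ("filter", ("scan", "t"), 1)
results, evals = execute_all([("double", shared), shared], {"t": [0, 1, 2, 3]})
```

Here the scan and the filter are shared by both plans, so only three nodes (scan, filter, double) are evaluated instead of five.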

🚀 Performance improvements

  • Implement linear-time rolling_min/max (#21770)
  • Improve InputIndependentSelect by delegating to InMemorySourceNode (#21767)
  • Enable common subplan elimination across plans in collect_all (#21747)
  • Allow elementwise functions in recursive lowering (#21653)
  • Add primitive single-key hashtable to new-streaming join (#21712)
  • Remove unnecessary black_boxes in Kahan summation (#21679)
  • Box large enum variants (#21657)
  • Improve join performance for new-streaming engine (#21620)
  • Pre-fill caches (#21646)
  • Optimize only a single cache input (#21644)
  • Collect parquet statistics in one contiguous buffer (#21632)
  • Update Cargo.lock (mainly for zstd 1.5.7) (#21612)
  • Don't maintain order when maintain_order=False in new streaming sinks (#21586)
  • Pre-sort groups in group-by-dynamic (#21569)
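
A linear-time rolling min/max (#21770) is typically achieved with a monotonic deque, where each index is pushed and popped at most once. The sketch below shows that standard technique for rolling_min; it is an illustration of the general approach, and whether Polars uses exactly this structure is an assumption. Like Polars' default behavior, it emits None until the first window is full.

```python
from collections import deque


def rolling_min(values, window):
    """Sliding-window minimum in amortized O(n) using a monotonic deque."""
    if window < 1:
        raise ValueError("window must be >= 1")
    out = []
    dq = deque()  # indices whose values increase from front to back
    for i, v in enumerate(values):
        # Drop candidates that can never be the minimum again.
        while dq and values[dq[-1]] >= v:
            dq.pop()
        dq.append(i)
        # Evict the front index once it falls out of the window.
        if dq[0] <= i - window:
            dq.popleft()
        # The front of the deque is the minimum of the current window.
        out.append(values[dq[0]] if i >= window - 1 else None)
    return out
```

Swapping `>=` for `<=` in the pop condition turns this into a rolling maximum.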

✨ Enhancements

  • Add support for rolling_(sum/min/max) for booleans through casting (#21748)
  • Support multi-column sort for all nested types and nested search-sorted (#21743)
  • Add lazy sinks (#21733)
  • Add PartitionByKey for new streaming sinks (#21689)
  • Fix replace flags (#21731)
  • Add mkdir flag to sinks (#21717)
  • Enable joins on list/array dtypes (#21687)
  • Add a config option to specify the default engine to attempt to use during lazyframe calls (#20717)
  • Support all elementwise functions in IO plugin predicates (#21705)
  • Stabilize Enum datatype (#21686)
  • Support Polars Int128 in from_arrow (#21688)
  • Use FFI to read dataframe instead of transmute (#21673)
  • Enable new streaming memory sinks by default (#21589)
  • Cloud support for new-streaming scans and sinks (#21621)
  • Add len method to arr (#21618)
  • Closeable files on unix (#21588)
  • Add new PartitionMaxSize sink (#21573)
  • Support engine callback for LazyFrame.profile (#21534)
  • Dispatch new-streaming CSV negative slice to separate node (#21579)
  • Add NDJSON source to new streaming engine (#21562)
  • Support passing token in storage_options for GCP cloud (#21560)
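
#21748 supports rolling_sum/min/max on boolean data by casting it to integers first. The pure-Python sketch below shows what that means for a rolling sum: True becomes 1, False becomes 0, and the window sum counts the True values. The helper name is hypothetical; in Polars itself the corresponding call would be Series.rolling_sum on a Boolean Series.

```python
def rolling_sum_bool(flags, window):
    """Rolling count of True values, computed by casting booleans to ints."""
    ints = [int(f) for f in flags]  # cast: True -> 1, False -> 0
    out = []
    running = 0
    for i, x in enumerate(ints):
        running += x
        if i >= window:
            running -= ints[i - window]  # remove the value leaving the window
        # Emit None until the first full window is available.
        out.append(running if i >= window - 1 else None)
    return out
```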

🐞 Bug fixes

  • Expose and document partitions (#21765)
  • Fix lazy schema for truediv ops involving List/Array dtypes (#21764)
  • Fix error due to race condition in file cache (#21753)
  • Clear NaNs due to zero-weight division in rolling var/std (#21761)
  • Allow init from BigQuery Arrow data containing ExtensionType cols with irrelevant metadata (#21492)
  • Disallow cast from boolean to categorical/enum (#21714)
  • Don't check sortedness in join_asof when 'by' groups supplied, but issue warning (#21724)
  • Incorrect multithread path taken for aggregations (#21727)
  • Disallow cast to empty Enum (#21715)
  • Fix list.mean and list.median returning Float64 for temporal types (#21144)
  • Incorrect (FixedSize)ListArrayBuilder gather implementation (#21716)
  • Always fallback in SkipBatchPredicate (#21711)
  • New streaming multiscan deadlock (#21694)
  • Ensure new-streaming join BuildState is correct even if never fed morsels (#21708)
  • IO plugin: support empty iterator (#21704)
  • Support nulls in multi-column sort (#21702)
  • Check length of groups state in window functions (#21697)
  • Support Int128 sum reduction on new streaming (#21691)
  • IPC round-trip of list of empty view with non-empty bufferset (#21671)
  • Variance can never be negative (#21678)
  • Incorrect loop length in new-streaming group by (#21670)
  • Right join on multiple columns not coalescing left_on columns (#21669)
  • Casting Struct to String panics if n_chunks > 1 (#21656)
  • Fix "Future attached to a different loop" error in read_database_uri (#21641)
  • Fix deadlock in cache + hconcat (#21640)
  • Properly handle phase transitions in row-wise sinks (#21600)
  • Enable new streaming memory sinks by default (#21589)
  • Always use global registry for object (#21622)
  • Check enum categories when reading csv (#21619)
  • Unspecialized prefiltering on nullable arrays (#21611)
  • Release the GIL in explain (#21607)
  • Take into account scalar/partitioned columns in DataFrame::split_chunks (#21606)
  • Bad null handling in unordered row encoding (#21603)
  • Fix deadlock in new streaming CSV / NDJSON sinks (#21598)
  • Bad view index in BinaryViewBuilder (#21590)
  • Fix CSV row count skipping empty lines when a comment prefix is set (#21577)
  • New streaming IPC enum scan (#21570)
  • Several aspects related to ParquetColumnExpr (#21563)
  • Don't hit parquet::pre-filtered in case of pre-slice (#21565)
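
Two fixes above concern variance: it should never come out negative (#21678), and degenerate divisions should not leak NaNs (#21761). The sketch below uses Welford's online algorithm, a standard numerically stable way to compute variance, with a clamp at zero and a None result when n - ddof <= 0. It illustrates the guards in general terms; it is an assumption, not a description of the exact Polars code.

```python
def variance(values, ddof=1):
    """Welford's online variance with non-negativity and zero-division guards."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    if n - ddof <= 0:
        return None  # undefined: avoid dividing by zero (or a negative count)
    # Floating-point cancellation can leave m2 slightly below zero; clamp it.
    return max(m2 / (n - ddof), 0.0)
```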

📖 Documentation

  • Add skrub to ecosystem.md (#21760)
  • Add example for percentile rank (#21746)
  • Make python/rust getting-started consistent and clarify performance risk of infer_schema_length=None (#21734)
  • Add expression composability to PySpark comparison (#21473)
  • Document read_().lazy() antipattern (#21623)
  • Update Polars Cloud interactive workflow examples (#21609)
  • Add a Plotnine example to the visualization docs (#21597)
  • Add cloud api reference to Ref guide (#21566)
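
The percentile-rank docs example (#21746) is not reproduced here. One common definition is the average rank of each value divided by the number of values, which in Polars could be written as pl.col("x").rank() / pl.len(), since rank defaults to the "average" method. The pure-Python sketch below computes that definition; the helper name is hypothetical, and the docs example may use a different convention.

```python
def percentile_rank(values):
    """Average 1-based rank of each value divided by the number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ties share the average of positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    n = len(values)
    return [r / n for r in ranks]
```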

🛠️ Other improvements

  • Remove variance numerical stability hack (#21749)
  • Only use chrono_tz timezones in hypothesis testing (#21721)
  • Remove order check from flaky test (#21730)
  • Add sinks into the DSL before optimization (#21713)
  • Add missing test case for #21701 (#21709)
  • Remove old-streaming from engine argument (#21667)
  • Add as_phys_any to PrivateSeries for downcasting (#21696)
  • Use FFI to read dataframe instead of transmute (#21673)
  • Work around typos ignore bug (#21672)
  • Add test for datetime_range nanosecond overflow (#21354)
  • Update to edition 2024 (#21662)
  • Update rustc (#21647)
  • Support object from chunks (#21636)
  • Push versioned docs on workflow dispatch (#21630)
  • Fail docs early (#21629)
  • Check major/minor in docs (#21626)
  • Add docs workflow (#21624)
  • Add test for #21581 (#21617)
  • Remove even more parquet multiscan handling (#21601)
  • Remove multiscan handling from new streaming parquet source (#21584)
  • Prepare skeleton for partitioning sinks (#21536)

Thank you to all our contributors for making this release possible!
@GaelVaroquaux, @Kevin-Patyk, @MarcoGorelli, @Matt711, @NathanHu725, @alexander-beedie, @coastalwhite, @dependabot[bot], @jrycw, @kdn36, @lukemanley, @mcrumiller, @nameexhaustion, @orlp, @r-brink, @ritchie46 and @wence-