github pola-rs/polars py-1.36.0-beta.2
Python Polars 1.36.0-beta.2

pre-release (4 hours ago)

🏆 Highlights

  • Add Extension types (#25322)

✨ Enhancements

  • Add SQL support for ROW_NUMBER, RANK, and DENSE_RANK functions (#25409)
  • Add SQL support for named WINDOW references (#25400)
  • Add BIT_NOT support to the SQL interface (#25094)
  • Add LazyFrame.pivot (#25016)
  • Add allow_empty flag to item (#25048)
  • Add empty_as_null and keep_nulls flags to Expr.explode (#25289)
  • Add empty_as_null and keep_nulls to {Lazy,Data}Frame.explode (#25369)
  • Add having to group_by context (#23550)
  • Add ignore_nulls to first / last (#25105)
  • Add maintain_order to Expr.mode (#25377)
  • Add quantile for missing temporals (#25464)
  • Add leftmost option to str.replace_many / str.find_many / str.extract_many (#25398)
  • Add strict parameter to pl.concat(how='horizontal') (#25452)
  • Add support for Float16 dtype (#25185)
  • Add unstable Schema.to_arrow (#25149)
  • Allow Expr.rolling in aggregation contexts (#25258)
  • Allow Expr.unique on List/Array with non-numeric types (#25285)
  • Allow glimpse to return a DataFrame (#24803)
  • Allow hash for all List dtypes (#25372)
  • Allow implode and aggregation in aggregation context (#25357)
  • Allow slice on scalar in aggregation context (#25358)
  • Allow arbitrary Expressions in "subset" parameter of unique frame method (#25099)
  • Allow arbitrary expressions as the Expr.rolling index_column (#25117)
  • Allow bare .row on a single-row DataFrame, equivalent to .item on a single-element DataFrame (#25229)
  • Allow elementwise Expr.over in aggregation context (#25402)
  • Allow pl.Object in pivot value (#25533)
  • Automatically Parquet dictionary encode floats (#25387)
  • Display function of streaming physical plan map node (#25368)
  • Documentation on Polars Cloud manifests (#25295)
  • Expose and document pl.Categories (#25443)
  • Expose fields for generating physical plan visualization data (#25562)
  • Extend SQL UNNEST support to handle multiple array expressions (#25418)
  • Improve SQL UNNEST behaviour (#22546)
  • Improve error message on unsupported SQL subquery comparisons (#25135)
  • Make DSL-hash skippable (#25140)
  • Minor improvement for as_struct repr (#25529)
  • Move GraphMetrics into StreamingQuery (#25310)
  • Raise suitable error on non-integer "n" value for clear (#25266)
  • Rewrite IR::Scan to IR::DataFrameScan in expand_datasets when applicable (#25106)
  • Set polars/ user-agent (#25112)
  • Streaming {Expr,LazyFrame}.rolling (#25058)
  • Support BYTE_ARRAY backed Decimals in Parquet (#25076)
  • Support ewm_var/std in streaming engine (#25109)
  • Support unique_counts for all datatypes (#25379)
  • Support additional forms of SQL "CREATE TABLE" statements (#25191)
  • Support arbitrary expressions in SQL JOIN constraints (#25132)
  • Support column-positional SQL "UNION" operations (#25183)
  • Support decimals in search_sorted (#25450)
  • Temporal quantile in rolling context (#25479)
  • Use reference to Graph pipes when flushing metrics (#25442)

🚀 Performance improvements

  • Add parquet prefiltering for string regexes (#25381)
  • Add streaming native LazyFrame.group_by_dynamic (#25342)
  • Add streaming sorted Group-By (#25013)
  • Allow detecting plan sortedness in more cases (#25408)
  • Don't recompute full rolling moment window when NaNs/nulls leave the window (#25078)
  • Enable predicate expressions on unsigned integers (#25416)
  • Fast find start window in group_by_dynamic with large offset (#25376)
  • Faster kernels for rle_lengths (#25448)
  • Fuse positive slice into streaming LazyFrame.rolling (#25338)
  • Lazy gather for {forward,backward}_fill in group-by contexts (#25115)
  • Mark Expr.reshape((-1,)) as row separable (#25326)
  • Mark output of more non-order-maintaining ops as unordered (#25419)
  • Optimize ipc stream read performance (#24671)
  • Reduce HuggingFace API calls (#25521)
  • Return references from aexpr_to_leaf_names_iter (#25319)
  • Skip filtering scan IR if no paths were filtered (#25037)
  • Use bitmap instead of Vec in first/last with skip_nulls (#25318)
  • Use fast path for agg_min/agg_max when nulls present (#25374)
  • Use strong hash instead of traversal for CSPE equality (#25537)
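
The rle_lengths entry above (#25448) refers to run-length encoding. As a reminder of what that operation computes, here is a plain-Python sketch. It is not the Polars kernel and says nothing about how the speed-up was achieved; the helper name run_lengths is hypothetical.

```python
from itertools import groupby


def run_lengths(values):
    """Return the length of each run of consecutive equal values."""
    # groupby yields one group per run of adjacent equal items;
    # counting the items in each group gives that run's length.
    return [sum(1 for _ in run) for _, run in groupby(values)]


print(run_lengths([1, 1, 2, 3, 3, 3]))
```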

🐞 Bug fixes

  • Add .rolling_rank support for temporal types and pl.Boolean (#25509)
  • Address issues with SQL OVER clause behaviour for window functions (#25249)
  • Aggregation with drop_nulls on literal (#25356)
  • Allow Null dtype values in scatter (#25245)
  • Allow broadcast in group_by for ApplyExpr and BinaryExpr (#25053)
  • Allow empty list in sort_by in list.eval context (#25481)
  • Allow for negative time in group_by_dynamic iterator (#25041)
  • Always respect return_dtype in map_elements and map_rows (#25504)
  • AnyValue::to_physical for categoricals (#25341)
  • Apply CSV dict overrides by name only (#25436)
  • Block predicate pushdown when group_by key values are changed (#25032)
  • Bugs in pl.from_repr with signed exponential floats and line wrapping (#25331)
  • Correct drop_items for scalar input (#25351)
  • Correct eq_missing for struct with nulls (#25363)
  • Correct {first,last}_non_null if there are empty chunks (#25279)
  • Correctly handle requested stops in streaming shift (#25239)
  • Correctly prune projected columns in hints (#25250)
  • DSL_SCHEMA_HASH should not be changed by line endings (#25123)
  • Don't push down predicates past inserted cache nodes (#25042)
  • Don't quietly allow unsupported SQL SELECT clauses (#25282)
  • Don't trigger DeprecationWarning from SQL "IN" constraints that use subqueries (#25111)
  • Enhanced column resolution/tracking through multi-way SQL joins (#25181)
  • Ensure SQL table alias resolution checks against CTE aliases on fallback (#25071)
  • Ensure out-of-range integers and other edge case values don't give wrong results for index_of (#24369)
  • Fix CSV select(len) off by 1 with comment prefix (#25069)
  • Fix arr.{eval,agg} in aggregation context (#25390)
  • Fix format_str in case of multiple chunks (#25162)
  • Fix groups update on slices with different offsets (#25097)
  • Fix assertion panic on group_by (#25179)
  • Fix building polars-expr without timezones feature (#25254)
  • Fix building polars-mem-engine with the async feature (#25300)
  • Fix building polars-plan with features lazy,concat_str (but no strings) (#25306)
  • Fix dictionary replacement error in write_ipc (#25497)
  • Fix expr slice pushdown causing shape error on literals (#25485)
  • Fix field metadata for nested categorical PyCapsule export (#25052)
  • Fix group lengths check in sort_by with AggregatedScalar (#25503)
  • Fix handling Null dtype in ApplyExpr on group_by (#25077)
  • Fix incorrect .list.eval after slicing operations (#25540)
  • Fix incorrect reshape on sliced lists (#25139)
  • Fix length preserving check for eval expressions in streaming engine (#25294)
  • Fix occurrence of exact matches of .join_asof(strategy="nearest", allow_exact_matches=False, ...) (#25506)
  • Fix off-by-one bug in ColumnPredicates generation for inequalities operating on integer columns (#25412)
  • Fix panic if scan predicate produces 0 length mask (#25089)
  • Fix panic in dt.truncate for invalid duration strings (#25124)
  • Fix panic in is_between support in streaming Parquet predicate push down (#25476)
  • Fix panic when using struct field as join key (#25059)
  • Fix serialization of lazyframes containing huge tables (#25190)
  • Fix single-column CSV header duplication with leading empty lines (#25186)
  • Fix small bug with PyExpr to PyObject conversion (#25265)
  • Group-By aggregation problems caused by AmortSeries (#25043)
  • Handle some unusual pl.col.<colname> edge-cases (#25153)
  • Incorrect result in aggregated first/last with ignore_nulls (#25414)
  • Incorrect results for aggregated {n_,}unique on bools (#25275)
  • Invert drop_nans filtering in group-by context (#25146)
  • Make str.json_decode output deterministic with lists (#25240)
  • Mark {forward,backward}_fill as length_preserving (#25352)
  • Minor improvement to internal is_pycapsule utility function (#25073)
  • Nested dtypes in streaming first_non_null/last_non_null (#25375)
  • Nested dtypes in streaming first/last (#25298)
  • Panic exception when calling Expr.rolling in .over (#25283)
  • Panic in group_by_dynamic with group_by and multiple chunks (#25075)
  • Parquet is_in for mixed validity pages (#25313)
  • Prevent panic when joining sorted LazyFrame with itself (#25453)
  • Raise error for all/any on list instead of panic (#25018)
  • Raise error on out-of-range dates in temporal operations (#25471)
  • Remove Expr casts in pl.lit invocations (#25373)
  • Resolve edge-case with SQL aggregates that have the same name as one of the "GROUP BY" keys (#25362)
  • Return the correct string-case Expr reprs (#25101)
  • Reverse on chunked struct (#25281)
  • Revert pl.format behavior with nulls (#25370)
  • Rolling mean/median for temporals (#25512)
  • Run async DB queries with regular asyncio if not inside a running loop (#25268)
  • SQL "NATURAL" joins should coalesce the key columns (#25353)
  • Schema mismatch with list.agg, unique and scalar (#25348)
  • Solve multiple issues relating to arena mutation in SQL subqueries (#25110)
  • Strict conversion AnyValue to Struct (#25536)
  • Support "index" as column name in group_by iterator (#25138)
  • Support AggregatedList in list.{eval,agg} context (#25385)
  • The SQL interface should use logical, not bitwise, behaviour for unary "NOT" operator (#25091)
  • Unique key names in streaming sort/top_k (#25082)
  • Unique on literal in aggregation context (#25359)
  • Use (i64, u64) for VisualizationData (offset, length) slices (#25203)
  • Use Cargo.template.toml to prevent git dependencies from using template (#25392)
  • Validate list.slice parameters are not lists (#25458)
  • Wide-table join performance regression (#25222)
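
The fix for the SQL unary "NOT" operator (#25091) turns on the difference between logical and bitwise negation. A plain-Python sketch of that distinction follows. It is independent of the Polars SQL interface, and the helper names are hypothetical.

```python
def logical_not(value: int) -> bool:
    """Logical NOT: any non-zero value counts as true; the result is a boolean."""
    return not value


def bitwise_not(value: int) -> int:
    """Bitwise NOT: flips every bit, which for Python ints equals -value - 1."""
    return ~value


for v in (0, 1, 5):
    print(v, logical_not(v), bitwise_not(v))
```

On integers the two operations agree on nothing except their name, which is why the choice of behaviour for "NOT" matters.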

📖 Documentation

  • Add Extension and BaseExtension to doc index (#25444)
  • Add LazyFrame.pivot to reference guide (#25482)
  • Add having API references (#25428)
  • Add docstring example showing str.slice taking Expression params (#25461)
  • Add polars-on-premise documentation (#25431)
  • Clarify bitwise behaviour of and_, or_, and not_ Expressions on integer columns (#25092)
  • Correct link to datetime_range instead of date_range in resampling page (#25532)
  • Deprecate Categorical functions for lexical ordering and local checks (#25514)
  • Document schema parameter in meta methods (#25543)
  • Explain aggregation & sorting of lists (#25260)
  • Fix LanceDB URL (#25198)
  • Fix incorrect 'bitwise' in any_horizontal/all_horizontal docstring (#25469)
  • Fix link errors reported by markdown-link-check (#25314)
  • Fix non-existent replace_all reference in replace docs (#25161)
  • Fix source path (#25170)
  • Fix typo in public dataset URL (#25044)
  • Mention Narwhals in ecosystem page (#25100)
  • Remove lzo from parquet write options (#25522)
  • Update LazyFrame.collect_schema docstring (#25508)
  • Update LazyFrame.remote signature (#25175)
  • Update on-premise documentation (#25489)
  • Update user guide for QueryProgress rename to QueryProfile (#25195)

🧪 Tests

  • Add assert_sql_matches coverage for SQL "DISTINCT" and "DISTINCT ON" syntax (#25440)
  • Add reliable test for pl.format on multiple chunks (#25164)
  • Add test for unique with column subset (#25241)
  • Better coverage for group_by aggregations (#25290)
  • Test for group_by(...).having(...) (#25430)

🔧 CI

  • Automatically label pull requests that change the DSL (#25177)
  • Avoid relabelling changes-dsl on every commit (#25216)
  • Print expected DSL schema hashes if mismatched (#25526)
  • Skip existing files in pypi upload (#25576)

🏗️ Build system

  • Fix make fmt and make lint commands (#25200)
  • Make building the docs on macOS more reliable (#25095)

🛠️ Other improvements

  • Add Final type-qualifier to module-level constants (#25556)
  • Add proptest AnyValue strategies (#25510)
  • Add proptest DataFrame strategy (#25446)
  • Add proptest strategies for Series logical types (#24849)
  • Add proptest strategies for Series nested types (#25220)
  • Add some cleanup (#25445)
  • Add toolchain file to runtimes for sdist (#25311)
  • Enable more streaming tests (#25364)
  • Fix --uv argument for benchmark-remote (#25513)
  • Fix Decimal precision annotation (#25227)
  • Fix feature gating TZ_AWARE_RE again (#25493)
  • Fix template path in release-python workflow (#25565)
  • Fix typo in CI release workflow (#25309)
  • Make python docs build again (#25165)
  • Remove Column::Partitioned (#25324)
  • Remove debug file write from test suite (#25393)
  • Remove unused import (#25365)
  • Run maturin with --uv option (#25490)
  • Silence unused mut warning (#25093)
  • Skip rust integration tests for coverage in CI (#25558)
  • Update markdown link checker (#25201)
  • Update toolchain (#25007)
  • Update versions (#25141)
  • Upgrade to schemars 0.9.0 (#25158)
  • Upgraded ruff and typos and made the necessary lint updates (#25196)

♻️ Refactoring

  • Accept multiple files in pipe_with_schema (#25388)
  • Add IR for scan_lines (#25066)
  • Add ElementExpr for _eval expressions (#25199)
  • Add asserts and tests for list.eval on multiple chunks with slicing (#25559)
  • Add functions for scan_lines (#25136)
  • Add oneshot channel to polars-stream (#25378)
  • Add stateful EwmCov kernel (#25065)
  • Change group length mismatch error to ShapeError (#25004)
  • Clean up CSPE callsite (#25215)
  • Directly take CloudScheme in parse_cloud_options (#25304)
  • Disable recursive CSPE for now (#25085)
  • Dispatch Series.set to zip_with_same_dtype (#25327)
  • Fix unsoundness in ChunkedArray::{first, last} (#25449)
  • Make pipe_with_schema work on Arced schema (#25155)
  • Move EwmMeanState to polars-compute (#25034)
  • Move asof tolerance type coercion to IR conversion (#25033)
  • Move ewm variance code to polars-compute (#25188)
  • Move supertype determination and casting to IR for date_range and related functions (#24084)
  • Refactor dt_range functions (#25225)
  • Refactor sink IR (#25308)
  • Remove ClosableFile (#25330)
  • Remove PyPartitioning (#25303)
  • Remove aggregation context Context (#25424)
  • Remove incorrect cast in reduce code (#25321)
  • Remove lower_ir conversion from Scan to InMemorySource (#25150)
  • Remove old join projection pushdown logic (#25088)
  • Remove some dead argminmax impl code (#25501)
  • Remove unused optimization_toggle (#25130)
  • Remove unused row-count (#25080)
  • Remove verbose prints on file opens (#25523)
  • Rename URL_ENCODE_CHARSET to HIVE_ENCODE_CHARSET (#25554)
  • Simplify sink parameter passing from Python (#25302)
  • Support for named/anonymous aggregations (#25118)
  • Take &dyn Any instead of Box<dyn Any> in python object converters (#25421)
  • Take sync parameter in Writeable::close (#25475)
  • Update partitioned sink IR (#25524)
  • Use dedicated runtime packages from template (#25284)

Thank you to all our contributors for making this release possible!
@AndreaBozzo, @DannyStoll1, @EndPositive, @JakubValtar, @Jesse-Bakker, @Kevin-Patyk, @MarcoGorelli, @TNieuwdorp, @alexander-beedie, @borchero, @c-peters, @cBournhonesque, @carnarez, @cmdlineluser, @coastalwhite, @cr7pt0gr4ph7, @davanstrien, @dsprenkels, @etiennebacher, @feliblo, @itamarst, @jannickj, @jetuk, @kdn36, @lun3x, @marinegor, @mcrumiller, @nameexhaustion, @orlp, @ritchie46, @vyasr, @wtn and more!
