pola-rs/polars py-1.36.0-beta.2 on GitHub

🏆 Highlights

Add Extension types (#25322)

✨ Enhancements

Add SQL support for ROW_NUMBER, RANK, and DENSE_RANK functions (#25409)
Add SQL support for named WINDOW references (#25400)
Add BIT_NOT support to the SQL interface (#25094)
Add LazyFrame.pivot (#25016)
Add allow_empty flag to item (#25048)
Add empty_as_null and keep_nulls flags to Expr.explode (#25289)
Add empty_as_null and keep_nulls to {Lazy,Data}Frame.explode (#25369)
Add having to group_by context (#23550)
Add ignore_nulls to first / last (#25105)
Add maintain_order to Expr.mode (#25377)
Add quantile for missing temporals (#25464)
Add leftmost option to str.replace_many / str.find_many / str.extract_many (#25398)
Add strict parameter to pl.concat(how='horizontal') (#25452)
Add support for Float16 dtype (#25185)
Add unstable Schema.to_arrow (#25149)
Allow Expr.rolling in aggregation contexts (#25258)
Allow Expr.unique on List/Array with non-numeric types (#25285)
Allow glimpse to return a DataFrame (#24803)
Allow hash for all List dtypes (#25372)
Allow implode and aggregation in aggregation context (#25357)
Allow slice on scalar in aggregation context (#25358)
Allow arbitrary Expressions in "subset" parameter of unique frame method (#25099)
Allow arbitrary expressions as the Expr.rolling index_column (#25117)
Allow bare .row on a single-row DataFrame, equivalent to .item on a single-element DataFrame (#25229)
Allow elementwise Expr.over in aggregation context (#25402)
Allow pl.Object in pivot value (#25533)
Automatically Parquet dictionary encode floats (#25387)
Display function of streaming physical plan map node (#25368)
Documentation on Polars Cloud manifests (#25295)
Expose and document pl.Categories (#25443)
Expose fields for generating physical plan visualization data (#25562)
Extend SQL UNNEST support to handle multiple array expressions (#25418)
Improve SQL UNNEST behaviour (#22546)
Improve error message on unsupported SQL subquery comparisons (#25135)
Make DSL-hash skippable (#25140)
Minor improvement for as_struct repr (#25529)
Move GraphMetrics into StreamingQuery (#25310)
Raise suitable error on non-integer "n" value for clear (#25266)
Rewrite IR::Scan to IR::DataFrameScan in expand_datasets when applicable (#25106)
Set polars/ user-agent (#25112)
Streaming {Expr,LazyFrame}.rolling (#25058)
Support BYTE_ARRAY backed Decimals in Parquet (#25076)
Support ewm_var/std in streaming engine (#25109)
Support unique_counts for all datatypes (#25379)
Support additional forms of SQL "CREATE TABLE" statements (#25191)
Support arbitrary expressions in SQL JOIN constraints (#25132)
Support column-positional SQL "UNION" operations (#25183)
Support decimals in search_sorted (#25450)
Temporal quantile in rolling context (#25479)
Use reference to Graph pipes when flushing metrics (#25442)

🚀 Performance improvements

Add parquet prefiltering for string regexes (#25381)
Add streaming native LazyFrame.group_by_dynamic (#25342)
Add streaming sorted Group-By (#25013)
Allow detecting plan sortedness in more cases (#25408)
Don't recompute full rolling moment window when NaNs/nulls leave the window (#25078)
Enable predicate expressions on unsigned integers (#25416)
Fast find start window in group_by_dynamic with large offset (#25376)
Faster kernels for rle_lengths (#25448)
Fuse positive slice into streaming LazyFrame.rolling (#25338)
Lazy gather for {forward,backward}_fill in group-by contexts (#25115)
Mark Expr.reshape((-1,)) as row separable (#25326)
Mark output of more non-order-maintaining ops as unordered (#25419)
Optimize ipc stream read performance (#24671)
Reduce HuggingFace API calls (#25521)
Return references from aexpr_to_leaf_names_iter (#25319)
Skip filtering scan IR if no paths were filtered (#25037)
Use bitmap instead of Vec in first/last with skip_nulls (#25318)
Use fast path for agg_min/agg_max when nulls present (#25374)
Use strong hash instead of traversal for CSPE equality (#25537)

🐞 Bug fixes

Add .rolling_rank support for temporal types and pl.Boolean (#25509)
Address issues with SQL OVER clause behaviour for window functions (#25249)
Aggregation with drop_nulls on literal (#25356)
Allow Null dtype values in scatter (#25245)
Allow broadcast in group_by for ApplyExpr and BinaryExpr (#25053)
Allow empty list in sort_by in list.eval context (#25481)
Allow for negative time in group_by_dynamic iterator (#25041)
Always respect return_dtype in map_elements and map_rows (#25504)
AnyValue::to_physical for categoricals (#25341)
Apply CSV dict overrides by name only (#25436)
Block predicate pushdown when group_by key values are changed (#25032)
Bugs in pl.from_repr with signed exponential floats and line wrapping (#25331)
Correct drop_items for scalar input (#25351)
Correct eq_missing for struct with nulls (#25363)
Correct {first,last}_non_null if there are empty chunks (#25279)
Correctly handle requested stops in streaming shift (#25239)
Correctly prune projected columns in hints (#25250)
DSL_SCHEMA_HASH should not be changed by line endings (#25123)
Don't push down predicates past inserted cache nodes (#25042)
Don't quietly allow unsupported SQL SELECT clauses (#25282)
Don't trigger DeprecationWarning from SQL "IN" constraints that use subqueries (#25111)
Enhanced column resolution/tracking through multi-way SQL joins (#25181)
Ensure SQL table alias resolution checks against CTE aliases on fallback (#25071)
Ensure out-of-range integers and other edge case values don't give wrong results for index_of (#24369)
Fix CSV select(len) off by 1 with comment prefix (#25069)
Fix arr.{eval,agg} in aggregation context (#25390)
Fix format_str in case of multiple chunks (#25162)
Fix groups update on slices with different offsets (#25097)
Fix assertion panic on group_by (#25179)
Fix building polars-expr without timezones feature (#25254)
Fix building polars-mem-engine with the async feature (#25300)
Fix building polars-plan with features lazy,concat_str (but no strings) (#25306)
Fix dictionary replacement error in write_ipc (#25497)
Fix expr slice pushdown causing shape error on literals (#25485)
Fix field metadata for nested categorical PyCapsule export (#25052)
Fix group lengths check in sort_by with AggregatedScalar (#25503)
Fix handling Null dtype in ApplyExpr on group_by (#25077)
Fix incorrect .list.eval after slicing operations (#25540)
Fix incorrect reshape on sliced lists (#25139)
Fix length preserving check for eval expressions in streaming engine (#25294)
Fix occurrence of exact matches of .join_asof(strategy="nearest", allow_exact_matches=False, ...) (#25506)
Fix off-by-one bug in ColumnPredicates generation for inequalities operating on integer columns (#25412)
Fix panic if scan predicate produces 0 length mask (#25089)
Fix panic in dt.truncate for invalid duration strings (#25124)
Fix panic in is_between support in streaming Parquet predicate push down (#25476)
Fix panic when using struct field as join key (#25059)
Fix serialization of lazyframes containing huge tables (#25190)
Fix single-column CSV header duplication with leading empty lines (#25186)
Fix small bug with PyExpr to PyObject conversion (#25265)
Group-By aggregation problems caused by AmortSeries (#25043)
Handle some unusual pl.col.<colname> edge-cases (#25153)
Incorrect result in aggregated first/last with ignore_nulls (#25414)
Incorrect results for aggregated {n_,}unique on bools (#25275)
Invert drop_nans filtering in group-by context (#25146)
Make str.json_decode output deterministic with lists (#25240)
Mark {forward,backward}_fill as length_preserving (#25352)
Minor improvement to internal is_pycapsule utility function (#25073)
Nested dtypes in streaming first_non_null/last_non_null (#25375)
Nested dtypes in streaming first/last (#25298)
Panic exception when calling Expr.rolling in .over (#25283)
Panic in group_by_dynamic with group_by and multiple chunks (#25075)
Parquet is_in for mixed validity pages (#25313)
Prevent panic when joining sorted LazyFrame with itself (#25453)
Raise error for all/any on list instead of panic (#25018)
Raise error on out-of-range dates in temporal operations (#25471)
Remove Expr casts in pl.lit invocations (#25373)
Resolve edge-case with SQL aggregates that have the same name as one of the "GROUP BY" keys (#25362)
Return the correct string-case Expr reprs (#25101)
Reverse on chunked struct (#25281)
Revert pl.format behavior with nulls (#25370)
Rolling mean/median for temporals (#25512)
Run async DB queries with regular asyncio if not inside a running loop (#25268)
SQL "NATURAL" joins should coalesce the key columns (#25353)
Schema mismatch with list.agg, unique and scalar (#25348)
Solve multiple issues relating to arena mutation in SQL subqueries (#25110)
Strict conversion AnyValue to Struct (#25536)
Support "index" as column name in group_by iterator (#25138)
Support AggregatedList in list.{eval,agg} context (#25385)
The SQL interface should use logical, not bitwise, behaviour for unary "NOT" operator (#25091)
Unique key names in streaming sort/top_k (#25082)
Unique on literal in aggregation context (#25359)
Use (i64, u64) for VisualizationData (offset, length) slices (#25203)
Use Cargo.template.toml to prevent git dependencies from using template (#25392)
Validate list.slice parameters are not lists (#25458)
Wide-table join performance regression (#25222)

📖 Documentation

Add Extension and BaseExtension to doc index (#25444)
Add LazyFrame.pivot to reference guide (#25482)
Add having API references (#25428)
Add docstring example showing str.slice taking Expression params (#25461)
Add polars-on-premise documentation (#25431)
Clarify bitwise behaviour of and_, or_, and not_ Expressions on integer columns (#25092)
Correct link to datetime_range instead of date_range in resampling page (#25532)
Deprecate Categorical functions for lexical ordering and local checks (#25514)
Document schema parameter in meta methods (#25543)
Explain aggregation & sorting of lists (#25260)
Fix LanceDB URL (#25198)
Fix incorrect 'bitwise' in any_horizontal/all_horizontal docstring (#25469)
Fix link errors reported by markdown-link-check (#25314)
Fix non-existent replace_all reference in replace docs (#25161)
Fix source path (#25170)
Fix typo in public dataset URL (#25044)
Mention Narwhals in ecosystem page (#25100)
Remove lzo from parquet write options (#25522)
Update LazyFrame.collect_schema docstring (#25508)
Update LazyFrame.remote signature (#25175)
Update on-premise documentation (#25489)
Update user guide for QueryProgress rename to QueryProfile (#25195)

🧪 Tests

Add assert_sql_matches coverage for SQL "DISTINCT" and "DISTINCT ON" syntax (#25440)
Add reliable test for pl.format on multiple chunks (#25164)
Add test for unique with column subset (#25241)
Better coverage for group_by aggregations (#25290)
Test for group_by(...).having(...) (#25430)

🔧 CI

Automatically label pull requests that change the DSL (#25177)
Avoid relabelling changes-dsl on every commit (#25216)
Print expected DSL schema hashes if mismatched (#25526)
Skip existing files in pypi upload (#25576)

🏗️ Build system

Fix make fmt and make lint commands (#25200)
Make building the docs on macOS more reliable (#25095)

🛠️ Other improvements

Add Final type-qualifier to module-level constants (#25556)
Add proptest AnyValue strategies (#25510)
Add proptest DataFrame strategy (#25446)
Add proptest strategies for Series logical types (#24849)
Add proptest strategies for Series nested types (#25220)
Add some cleanup (#25445)
Add toolchain file to runtimes for sdist (#25311)
Enable more streaming tests (#25364)
Fix --uv argument for benchmark-remote (#25513)
Fix Decimal precision annotation (#25227)
Fix feature gating TZ_AWARE_RE again (#25493)
Fix template path in release-python workflow (#25565)
Fix typo in CI release workflow (#25309)
Make python docs build again (#25165)
Remove Column::Partitioned (#25324)
Remove debug file write from test suite (#25393)
Remove unused import (#25365)
Run maturin with --uv option (#25490)
Silence unused mut warning (#25093)
Skip rust integration tests for coverage in CI (#25558)
Update markdown link checker (#25201)
Update toolchain (#25007)
Update versions (#25141)
Upgrade to schemars 0.9.0 (#25158)
Upgraded ruff and typos and made the necessary lint updates (#25196)

♻️ Refactoring

Accept multiple files in pipe_with_schema (#25388)
Add IR for scan_lines (#25066)
Add ElementExpr for _eval expressions (#25199)
Add asserts and tests for list.eval on multiple chunks with slicing (#25559)
Add functions for scan_lines (#25136)
Add oneshot channel to polars-stream (#25378)
Add stateful EwmCov kernel (#25065)
Change group length mismatch error to ShapeError (#25004)
Clean up CSPE callsite (#25215)
Directly take CloudScheme in parse_cloud_options (#25304)
Disable recursive CSPE for now (#25085)
Dispatch Series.set to zip_with_same_dtype (#25327)
Fix unsoundness in ChunkedArray::{first, last} (#25449)
Make pipe_with_schema work on Arced schema (#25155)
Move EwmMeanState to polars-compute (#25034)
Move asof tolerance type coercion to IR conversion (#25033)
Move ewm variance code to polars-compute (#25188)
Move supertype determination and casting to IR for date_range and related functions (#24084)
Refactor dt_range functions (#25225)
Refactor sink IR (#25308)
Remove ClosableFile (#25330)
Remove PyPartitioning (#25303)
Remove aggregation context Context (#25424)
Remove incorrect cast in reduce code (#25321)
Remove lower_ir conversion from Scan to InMemorySource (#25150)
Remove old join projection pushdown logic (#25088)
Remove some dead argminmax impl code (#25501)
Remove unused optimization_toggle (#25130)
Remove unused row-count (#25080)
Remove verbose prints on file opens (#25523)
Rename URL_ENCODE_CHARSET to HIVE_ENCODE_CHARSET (#25554)
Simplify sink parameter passing from Python (#25302)
Support for named/anonymous aggregations (#25118)
Take &dyn Any instead of Box<dyn Any> in python object converters (#25421)
Take sync parameter in Writeable::close (#25475)
Update partitioned sink IR (#25524)
Use dedicated runtime packages from template (#25284)

Thank you to all our contributors for making this release possible!
@AndreaBozzo, @DannyStoll1, @EndPositive, @JakubValtar, @Jesse-Bakker, @Kevin-Patyk, @MarcoGorelli, @TNieuwdorp, @alexander-beedie, @borchero, @c-peters, @cBournhonesque, @carnarez, @cmdlineluser, @coastalwhite, @cr7pt0gr4ph7, @davanstrien, @dsprenkels, @etiennebacher, @feliblo, @itamarst, @jannickj, @jetuk, @kdn36, @lun3x, @marinegor, @mcrumiller, @nameexhaustion, @orlp, @ritchie46, @vyasr, @wtn and more!
