Highlights
- Add Extension types (#25322)
Enhancements
- Add SQL support for the QUALIFY clause (#25652)
- Add bin.slice(), bin.head(), and bin.tail() methods (#25647)
- Add SQL syntax support for CROSS JOIN UNNEST(col) (#25623)
- Add separate env var to log tracked metrics (#25586)
- Expose fields for generating physical plan visualization data (#25562)
- Allow pl.Object in pivot value (#25533)
- Minor improvement for as_struct repr (#25529)
- Temporal quantile in rolling context (#25479)
- Add quantile for missing temporals (#25464)
- Add strict parameter to pl.concat(how='horizontal') (#25452)
- Support decimals in search_sorted (#25450)
- Expose and document pl.Categories (#25443)
- Use reference to Graph pipes when flushing metrics (#25442)
- Extend SQL UNNEST support to handle multiple array expressions (#25418)
- Add SQL support for ROW_NUMBER, RANK, and DENSE_RANK functions (#25409)
- Allow elementwise Expr.over in aggregation context (#25402)
- Add SQL support for named WINDOW references (#25400)
- Add leftmost option to str.replace_many / str.find_many / str.extract_many (#25398)
- Automatically Parquet dictionary encode floats (#25387)
- Support unique_counts for all datatypes (#25379)
- Add maintain_order to Expr.mode (#25377)
- Allow hash for all List dtypes (#25372)
- Add empty_as_null and keep_nulls to {Lazy,Data}Frame.explode (#25369)
- Display function of streaming physical plan map node (#25368)
- Allow slice on scalar in aggregation context (#25358)
- Allow implode and aggregation in aggregation context (#25357)
- Move GraphMetrics into StreamingQuery (#25310)
- Documentation on Polars Cloud manifests (#25295)
- Add empty_as_null and keep_nulls flags to Expr.explode (#25289)
- Allow Expr.unique on List/Array with non-numeric types (#25285)
- Raise suitable error on non-integer "n" value for clear (#25266)
- Allow Expr.rolling in aggregation contexts (#25258)
- Allow bare .row() on a single-row DataFrame, equivalent to .item() on a single-element DataFrame (#25229)
- Support additional forms of SQL CREATE TABLE statements (#25191)
- Add support for Float16 dtype (#25185)
- Support column-positional SQL "UNION" operations (#25183)
- Add unstable `Schema.to_arrow()` (#25149)
- Make DSL-hash skippable (#25140)
- Improve error message on unsupported SQL subquery comparisons (#25135)
- Support arbitrary expressions in SQL `JOIN` constraints (#25132)
- Allow arbitrary expressions as the `Expr.rolling` `index_column` (#25117)
- Set polars/ user-agent (#25112)
- Support `ewm_var/std` in streaming engine (#25109)
- Rewrite `IR::Scan` to `IR::DataFrameScan` in `expand_datasets` when applicable (#25106)
- Add ignore_nulls to first / last (#25105)
- Allow arbitrary Expressions in "subset" parameter of `unique` frame method (#25099)
- Add `BIT_NOT` support to the SQL interface (#25094)
- Streaming `{Expr,LazyFrame}.rolling` (#25058)
- Add LazyFrame.pivot (#25016)
- Add SQL support for `LEAD` and `LAG` functions (#23956)
- Add having to group_by context (#23550)
- Add show methods for DataFrame and LazyFrame (#19634)
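To make the QUALIFY entry (#25652) concrete: in SQL dialects that support it, QUALIFY filters rows on the result of a window function, much as HAVING filters on an aggregate. The sketch below is not Polars code. It uses Python's built-in `sqlite3` module, which has no QUALIFY, to run the equivalent subquery form. The `sales` table and its columns are made up for illustration, and the QUALIFY query in the comment is the general dialect form, assumed rather than verified against the Polars SQL interface.

```python
import sqlite3

# Hypothetical example data: one row per sale.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, item TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("north", "a", 10), ("north", "b", 30), ("south", "c", 20), ("south", "d", 5)],
)

# In a dialect with QUALIFY, "top sale per region" could be written as:
#   SELECT region, item, amount FROM sales
#   QUALIFY ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) = 1
# SQLite lacks QUALIFY, so compute the window function in a subquery
# and filter on it with WHERE instead.
top_per_region = conn.execute(
    """
    SELECT region, item, amount
    FROM (
        SELECT *, ROW_NUMBER() OVER (
            PARTITION BY region ORDER BY amount DESC
        ) AS rn
        FROM sales
    )
    WHERE rn = 1
    ORDER BY region
    """
).fetchall()

print(top_per_region)  # [('north', 'b', 30), ('south', 'c', 20)]
```

The window functions ROW_NUMBER, RANK, and DENSE_RANK listed above (#25409) are the usual companions of this kind of filter.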
Performance improvements
- Set parallelization threshold in `take_unchecked_impl` (#25672)
- New single file IO sink pipeline enabled for sink_parquet (#25670)
- Correct overly eager local predicate insertion for unpivot (#25644)
- New partitioned IO sink pipeline enabled for sink_parquet (#25629)
- Use strong hash instead of traversal for CSPE equality (#25537)
- Reduce HuggingFace API calls (#25521)
- Fix panic in is_between support in streaming Parquet predicate push down (#25476)
- Faster kernels for rle_lengths (#25448)
- Mark output of more non-order-maintaining ops as unordered (#25419)
- Enable predicate expressions on unsigned integers (#25416)
- Allow detecting plan sortedness in more cases (#25408)
- Add parquet prefiltering for string regexes (#25381)
- Fast find start window in group_by_dynamic with large offset (#25376)
- Use fast path for agg_min/agg_max when nulls present (#25374)
- Add streaming native LazyFrame.group_by_dynamic (#25342)
- Fuse positive slice into streaming LazyFrame.rolling (#25338)
- Mark Expr.reshape((-1,)) as row separable (#25326)
- Return references from aexpr_to_leaf_names_iter (#25319)
- Use bitmap instead of Vec in first/last w. skip_nulls (#25318)
- Lazy gather for `{forward,backward}_fill` in group-by contexts (#25115)
- Add streaming sorted Group-By (#25013)
Bug fixes
- Rechunk on nested dtypes in take_unchecked_impl parallel path (#25662)
- Fix streaming SchemaMismatch panic on list.drop_nulls (#25661)
- Fix panic on Boolean rolling_sum calculation for list or array eval (#25660)
- Fix "dtype is unknown" panic in cross joins with literals (#25658)
- Fix panic edge-case when scanning hive partitioned data (#25656)
- Fix "unreachable code" panic in UDF dtype inference (#25655)
- Address potential "batch_size" parameter collision in scan_pyarrow_dataset (#25654)
- Fix empty format handling (#25638)
- Improve SQL GROUP BY and ORDER BY expression resolution, handling aliasing edge-cases (#25637)
- Preserve List inner dtype during chunked take operations (#25634)
- Fix lifetime for AmortSeries lazy group iterator (#25620)
- Fix spearman panicking on nulls (#25619)
- Properly resolve HAVING clause during SQL GROUP BY operations (#25615)
- Prevent false positives in is_in for large integers (#25608)
- Differentiate between empty list and no list for unpivot (#25597)
- Bug in boolean unique_counts (#25587)
- Hang in multi-chunk DataFrame .rows() (#25582)
- Correct arr_to_any_value for object arrays (#25581)
- Have PySeries::new_f16 receive pf16s instead of f32s (#25579)
- Set Float16 parquet schema type to Float16 (#25578)
- Fix incorrect .list.eval after slicing operations (#25540)
- Strict conversion AnyValue to Struct (#25536)
- Rolling mean/median for temporals (#25512)
- Add .rolling_rank() support for temporal types and pl.Boolean (#25509)
- Fix occurrence of exact matches of .join_asof(strategy="nearest", allow_exact_matches=False, ...) (#25506)
- Always respect return_dtype in map_elements and map_rows (#25504)
- Fix group lengths check in sort_by with AggregatedScalar (#25503)
- Fix dictionary replacement error in write_ipc() (#25497)
- Fix expr slice pushdown causing shape error on literals (#25485)
- Allow empty list in sort_by in list.eval context (#25481)
- Raise error on out-of-range dates in temporal operations (#25471)
- Validate list.slice parameters are not lists (#25458)
- Make sum on strings error in group_by context (#25456)
- Prevent panic when joining sorted LazyFrame with itself (#25453)
- Apply CSV dict overrides by name only (#25436)
- Incorrect result in aggregated first/last with ignore_nulls (#25414)
- Fix off-by-one bug in `ColumnPredicates` generation for inequalities operating on integer columns (#25412)
- Use Cargo.template.toml to prevent git dependencies from using template (#25392)
- Fix arr.{eval,agg} in aggregation context (#25390)
- Support AggregatedList in list.{eval,agg} context (#25385)
- Nested dtypes in streaming first_non_null/last_non_null (#25375)
- Remove Expr casts in pl.lit invocations (#25373)
- Optimize projection pushdown through HConcat (#25371)
- Revert pl.format behavior with nulls (#25370)
- Correct eq_missing for struct with nulls (#25363)
- Resolve edge-case with SQL aggregates that have the same name as one of the GROUP BY keys (#25362)
- Unique on literal in aggregation context (#25359)
- Aggregation with drop_nulls on literal (#25356)
- SQL NATURAL joins should coalesce the key columns (#25353)
- Mark {forward,backward}_fill as length_preserving (#25352)
- Correct drop_items for scalar input (#25351)
- Schema mismatch with list.agg, unique and scalar (#25348)
- AnyValue::to_physical for categoricals (#25341)
- Bugs in pl.from_repr with signed exponential floats and line wrapping (#25331)
- Remove ClosableFile (#25330)
- Increase precision when constructing `float` Series (#25323)
- Fix link errors reported by markdown-link-check (#25314)
- Parquet is_in for mixed validity pages (#25313)
- Fix building polars-plan with features lazy,concat_str (but no strings) (#25306)
- Fix building polars-mem-engine with the async feature (#25300)
- Nested dtypes in streaming first/last (#25298)
- Fix length preserving check for eval expressions in streaming engine (#25294)
- Panic exception when calling Expr.rolling in .over (#25283)
- Don't quietly allow unsupported SQL SELECT clauses (#25282)
- Reverse on chunked struct (#25281)
- Correct {first,last}_non_null if there are empty chunks (#25279)
- Incorrect results for aggregated {n_,}unique on bools (#25275)
- Run async DB queries with regular asyncio if not inside a running loop (#25268)
- Fix small bug with `PyExpr` to `PyObject` conversion (#25265)
- Fix building polars-expr without timezones feature (#25254)
- Correctly prune projected columns in hints (#25250)
- Address multiple issues with SQL OVER clause behaviour for window functions (#25249)
- Allow Null dtype values in scatter (#25245)
- Make str.json_decode output deterministic with lists (#25240)
- Correctly handle requested stops in streaming shift (#25239)
- Use (i64, u64) for VisualizationData (offset, length) slices (#25203)
- Fix serialization of lazyframes containing huge tables (#25190)
- Fix single-column CSV header duplication with leading empty lines (#25186)
- Enhanced column resolution/tracking through multi-way SQL joins (#25181)
- Fix `format_str` in case of multiple chunks (#25162)
- Handle some unusual `pl.col.<colname>` edge-cases (#25153)
- Fix incorrect reshape on sliced lists (#25139)
- Support "index" as column name in `group_by` iterator (#25138)
- Fix panic in `dt.truncate` for invalid duration strings (#25124)
- DSL_SCHEMA_HASH should not be changed by line endings (#25123)
- Don't trigger `DeprecationWarning` from SQL "IN" constraints that use subqueries (#25111)
- Solve multiple issues relating to arena mutation in SQL subqueries (#25110)
- Return the correct string-case `Expr` reprs (#25101)
- Fix `groups` update on slices with different offsets (#25097)
- Unique key names in streaming sort/top_k (#25082)
- Fix CSV `select(len())` off by 1 with comment prefix (#25069)
- Raise error for all/any on list instead of panic (#25018)
- Ensure out-of-range integers and other edge case values don't give wrong results for index_of() (#24369)
- Improve SQL UNNEST behaviour (#22546)
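For readers unfamiliar with the behaviour behind the NATURAL join fix (#25353): under standard SQL semantics, a NATURAL JOIN matches on all same-named columns and returns each such key column once rather than once per table. The sketch below shows this with Python's built-in `sqlite3` module rather than Polars. The tables `a` and `b` are hypothetical, and it is only an assumption that the fixed Polars SQL interface produces the same column layout.

```python
import sqlite3

# Two hypothetical tables sharing the key column "id".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER, x TEXT)")
conn.execute("CREATE TABLE b (id INTEGER, y TEXT)")
conn.executemany("INSERT INTO a VALUES (?, ?)", [(1, "p"), (2, "q")])
conn.executemany("INSERT INTO b VALUES (?, ?)", [(2, "r"), (3, "s")])

# NATURAL JOIN matches on the shared column name and coalesces it:
# the result has a single "id" column, not one per table.
cur = conn.execute("SELECT * FROM a NATURAL JOIN b")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()

print(columns)  # ['id', 'x', 'y']
print(rows)     # [(2, 'q', 'r')]
```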
Documentation
- Document schema parameter in meta methods (#25543)
- Correct link to datetime_range instead of date_range in resampling page (#25532)
- Remove lzo from parquet write options (#25522)
- Deprecate Categorical functions for lexical ordering and local checks (#25514)
- Update LazyFrame.collect_schema() docstring (#25508)
- Update on-premise documentation (#25489)
- Add LazyFrame.pivot to reference guide (#25482)
- Fix incorrect 'bitwise' in any_horizontal/all_horizontal docstring (#25469)
- Add docstring example showing str.slice taking Expression params (#25461)
- Add Extension and BaseExtension to doc index (#25444)
- Add polars-on-premise documentation (#25431)
- Add having API references (#25428)
- Explain aggregation & sorting of lists (#25260)
- Fix LanceDB URL (#25198)
- Update user guide for QueryProgress rename to QueryProfile (#25195)
- Update `LazyFrame.remote` signature (#25175)
- Fix source path (#25170)
- Fix non-existent `replace_all` reference in `replace` docs (#25161)
- Mention Narwhals in ecosystem page (#25100)
- Clarify bitwise behaviour of `and_`, `or_`, and `not_` Expressions on integer columns (#25092)
Tests
- Add reliable test for `pl.format` on multiple chunks (#25164)
CI
- Avoid relabelling changes-dsl on every commit (#25216)
- Automatically label pull requests that change the DSL (#25177)
Build system
Other improvements
- Add "panic" and "streaming" tagging to issue-labeler workflow (#25657)
- Use dtype for group_aware evaluation on ApplyExpr (#25639)
- Ensure literal-only SELECT broadcast conforms to SQL semantics (#25633)
- Ensure we hash all attributes and visit all children in traverse_and_hash_aexpr (#25627)
- Rename polars-on-premise to polars-on-premises (#25617)
- Constrain new issue-labeler workflow to the Issue title (#25614)
- Avoid rechunk requirement for Series.iter() (#25603)
- Help categorise Issues by automatically applying labels (using the same patterns used for labelling PRs) (#25599)
- Add disk-cleaning step for Ubuntu runners (#25593)
- Show on streaming engine (#25589)
- Ignore a couple of unexplained typing errors (#25580)
- Skip existing files in pypi upload (#25576)
- Fix template path in release-python workflow (#25565)
- Add asserts and tests for list.eval on multiple chunks with slicing (#25559)
- Skip rust integration tests for coverage in CI (#25558)
- Add Final type-qualifier to module-level constants (#25556)
- Print expected DSL schema hashes if mismatched (#25526)
- Update partitioned sink IR (#25524)
- Fix --uv argument for benchmark-remote (#25513)
- Add `proptest` `AnyValue` strategies (#25510)
- Fix rolling kernel dispatch with monotonic group attribute (#25494)
- Fix feature gating TZ_AWARE_RE again (#25493)
- Run maturin with --uv option (#25490)
- Add proptest DataFrame strategy (#25446)
- Add some cleanup (#25445)
- Add assert_sql_matches coverage for SQL DISTINCT and DISTINCT ON syntax (#25440)
- Test for group_by(...).having(...) (#25430)
- Remove aggregation context Context (#25424)
- Remove debug file write from test suite (#25393)
- Remove unused import (#25365)
- Enable more streaming tests (#25364)
- Dispatch Series.set to zip_with_same_dtype (#25327)
- Remove Column::Partitioned (#25324)
- Add toolchain file to runtimes for sdist (#25311)
- Fix typo in CI release workflow (#25309)
- Refactor sink IR (#25308)
- Directly take CloudScheme in parse_cloud_options() (#25304)
- Remove PyPartitioning (#25303)
- Simplify sink parameter passing from Python (#25302)
- Better coverage for group_by aggregations (#25290)
- Use dedicated runtime packages from template (#25284)
- Add test for unique with column subset (#25241)
- Fix Decimal precision annotation (#25227)
- Refactor dt_range functions (#25225)
- Add `proptest` strategies for Series nested types (#25220)
- Update markdown link checker (#25201)
- Add ElementExpr for _eval expressions (#25199)
- Upgraded `ruff` and `typos` and made the necessary lint updates (#25196)
- Make python docs build again (#25165)
- Upgrade to schemars 0.9.0 (#25158)
- Update versions (#25141)
- Silence unused mut warning (#25093)
- Add `proptest` strategies for Series logical types (#24849)
Refactoring
- Make polars-plan constants more consistent (#25645)
- Add support for multi-column reductions (#25640)
- Simplify _write_any_value (#25622)
- Add parquet file write pipeline for new IO sinks (#25618)
- Add streaming IO sink components (#25594)
- Add `arg_sort()` and `Writeable::as_buffered()` (#25583)
- Take task priority argument in `parallelize_first_to_local` (#25563)
- Rename `URL_ENCODE_CHARSET` to `HIVE_ENCODE_CHARSET` (#25554)
- Remove verbose prints on file opens (#25523)
- Remove some dead argminmax impl code (#25501)
- Take `sync` parameter in `Writeable::close()` (#25475)
- Fix unsoundness in ChunkedArray::{first, last} (#25449)
- Take `&dyn Any` instead of `Box<dyn Any>` in python object converters (#25421)
- Accept multiple files in `pipe_with_schema` (#25388)
- Add oneshot channel to polars-stream (#25378)
- Remove incorrect cast in reduce code (#25321)
- Clean up CSPE callsite (#25215)
- Move ewm variance code to polars-compute (#25188)
- Make `pipe_with_schema` work on Arced schema (#25155)
- Remove lower_ir conversion from Scan to InMemorySource (#25150)
- Add functions for `scan_lines` (#25136)
- Remove unused `optimization_toggle` (#25130)
- Support for named/anonymous aggregations (#25118)
- Remove old join projection pushdown logic (#25088)
- Remove unused row-count (#25080)
- Add IR for `scan_lines` (#25066)
- Add stateful `EwmCov` kernel (#25065)
- Move `EwmMeanState` to `polars-compute` (#25034)
- Move asof `tolerance` type coercion to IR conversion (#25033)
- Move supertype determination and casting to IR for `date_range` and related functions (#24084)
Thank you to all our contributors for making this release possible!
@AndreaBozzo, @DannyStoll1, @EndPositive, @JakubValtar, @Jesse-Bakker, @Kevin-Patyk, @MarcoGorelli, @TNieuwdorp, @Voultapher, @alexander-beedie, @borchero, @c-peters, @cBournhonesque, @camriddell, @carnarez, @cmdlineluser, @coastalwhite, @cr7pt0gr4ph7, @davanstrien, @davidia, @dsprenkels, @etiennebacher, @feliblo, @guilhem-dvr, @itamarst, @jannickj, @jetuk, @kdn36, @lun3x, @marinegor, @mcrumiller, @nameexhaustion, @orlp, @pomo-mondreganto, @ritchie46, @vyasr, @wtn, and more!