Highlights
- Add Extension types (#25322)
Enhancements
- Add SQL support for `ROW_NUMBER`, `RANK`, and `DENSE_RANK` functions (#25409)
- Add SQL support for named `WINDOW` references (#25400)
- Add `BIT_NOT` support to the SQL interface (#25094)
- Add `LazyFrame.pivot` (#25016)
- Add `allow_empty` flag to `item` (#25048)
- Add `empty_as_null` and `keep_nulls` flags to `Expr.explode` (#25289)
- Add `empty_as_null` and `keep_nulls` to `{Lazy,Data}Frame.explode` (#25369)
- Add `having` to `group_by` context (#23550)
- Add `ignore_nulls` to `first`/`last` (#25105)
- Add `maintain_order` to `Expr.mode` (#25377)
- Add `quantile` for missing temporals (#25464)
- Add leftmost option to `str.replace_many / str.find_many / str.extract_many` (#25398)
- Add strict parameter to pl.concat(how='horizontal') (#25452)
- Add support for `Float16` dtype (#25185)
- Add unstable `Schema.to_arrow` (#25149)
- Allow `Expr.rolling` in aggregation contexts (#25258)
- Allow `Expr.unique` on `List`/`Array` with non-numeric types (#25285)
- Allow `glimpse` to return a `DataFrame` (#24803)
- Allow `hash` for all `List` dtypes (#25372)
- Allow `implode` and aggregation in aggregation context (#25357)
- Allow `slice` on scalar in aggregation context (#25358)
- Allow arbitrary Expressions in "subset" parameter of `unique` frame method (#25099)
- Allow arbitrary expressions as the `Expr.rolling` `index_column` (#25117)
- Allow bare `.row` on a single-row DataFrame, equivalent to `.item` on a single-element DataFrame (#25229)
- Allow elementwise `Expr.over` in aggregation context (#25402)
- Allow pl.Object in pivot value (#25533)
- Automatically Parquet dictionary encode floats (#25387)
- Display function of streaming physical plan `map` node (#25368)
- Documentation on Polars Cloud manifests (#25295)
- Expose and document pl.Categories (#25443)
- Expose fields for generating physical plan visualization data (#25562)
- Extend SQL `UNNEST` support to handle multiple array expressions (#25418)
- Improve SQL `UNNEST` behaviour (#22546)
- Improve error message on unsupported SQL subquery comparisons (#25135)
- Make DSL-hash skippable (#25140)
- Minor improvement for `as_struct` repr (#25529)
- Move GraphMetrics into StreamingQuery (#25310)
- Raise suitable error on non-integer "n" value for `clear` (#25266)
- Rewrite `IR::Scan` to `IR::DataFrameScan` in `expand_datasets` when applicable (#25106)
- Set polars/ user-agent (#25112)
- Streaming `{Expr,LazyFrame}.rolling` (#25058)
- Support BYTE_ARRAY backed Decimals in Parquet (#25076)
- Support `ewm_var/std` in streaming engine (#25109)
- Support `unique_counts` for all datatypes (#25379)
- Support additional forms of SQL "CREATE TABLE" statements (#25191)
- Support arbitrary expressions in SQL `JOIN` constraints (#25132)
- Support column-positional SQL "UNION" operations (#25183)
- Support decimals in search_sorted (#25450)
- Temporal `quantile` in rolling context (#25479)
- Use reference to Graph pipes when flushing metrics (#25442)
Performance improvements
- Add parquet prefiltering for string regexes (#25381)
- Add streaming native `LazyFrame.group_by_dynamic` (#25342)
- Add streaming sorted Group-By (#25013)
- Allow detecting plan sortedness in more cases (#25408)
- Don't recompute full rolling moment window when NaNs/nulls leave the window (#25078)
- Enable predicate expressions on unsigned integers (#25416)
- Fast find start window in `group_by_dynamic` with large `offset` (#25376)
- Faster kernels for rle_lengths (#25448)
- Fuse positive `slice` into streaming `LazyFrame.rolling` (#25338)
- Lazy gather for `{forward,backward}_fill` in group-by contexts (#25115)
- Mark `Expr.reshape((-1,))` as row separable (#25326)
- Mark output of more non-order-maintaining ops as unordered (#25419)
- Optimize ipc stream read performance (#24671)
- Reduce HuggingFace API calls (#25521)
- Return references from `aexpr_to_leaf_names_iter` (#25319)
- Skip filtering scan IR if no paths were filtered (#25037)
- Use bitmap instead of Vec in first/last w. skip_nulls (#25318)
- Use fast path for `agg_min`/`agg_max` when nulls present (#25374)
- Use strong hash instead of traversal for CSPE equality (#25537)
Bug fixes
- Add `.rolling_rank` support for temporal types and `pl.Boolean` (#25509)
- Address issues with SQL `OVER` clause behaviour for window functions (#25249)
- Aggregation with `drop_nulls` on literal (#25356)
- Allow `Null` dtype values in `scatter` (#25245)
- Allow broadcast in `group_by` for `ApplyExpr` and `BinaryExpr` (#25053)
- Allow empty list in `sort_by` in `list.eval` context (#25481)
- Allow for negative time in `group_by_dynamic` iterator (#25041)
- Always respect return_dtype in map_elements and map_rows (#25504)
- AnyValue::to_physical for categoricals (#25341)
- Apply CSV dict overrides by name only (#25436)
- Block predicate pushdown when `group_by` key values are changed (#25032)
- Bugs in pl.from_repr with signed exponential floats and line wrapping (#25331)
- Correct `drop_items` for scalar input (#25351)
- Correct `eq_missing` for struct with nulls (#25363)
- Correct `{first,last}_non_null` if there are empty chunks (#25279)
- Correctly handle requested stops in streaming shift (#25239)
- Correctly prune projected columns in hints (#25250)
- DSL_SCHEMA_HASH should not be changed by line endings (#25123)
- Don't push down predicates past inserted cache nodes (#25042)
- Don't quietly allow unsupported SQL `SELECT` clauses (#25282)
- Don't trigger `DeprecationWarning` from SQL "IN" constraints that use subqueries (#25111)
- Enhanced column resolution/tracking through multi-way SQL joins (#25181)
- Ensure SQL table alias resolution checks against CTE aliases on fallback (#25071)
- Ensure out-of-range integers and other edge case values don't give wrong results for index_of (#24369)
- Fix CSV `select(len)` off by 1 with comment prefix (#25069)
- Fix `arr.{eval,agg}` in aggregation context (#25390)
- Fix `format_str` in case of multiple chunks (#25162)
- Fix `groups` update on slices with different offsets (#25097)
- Fix assertion panic on `group_by` (#25179)
- Fix building polars-expr without timezones feature (#25254)
- Fix building polars-mem-engine with the async feature (#25300)
- Fix building polars-plan with features lazy,concat_str (but no strings) (#25306)
- Fix dictionary replacement error in `write_ipc` (#25497)
- Fix expr slice pushdown causing shape error on literals (#25485)
- Fix field metadata for nested categorical PyCapsule export (#25052)
- Fix group lengths check in `sort_by` with `AggregatedScalar` (#25503)
- Fix handling `Null` dtype in `ApplyExpr` on `group_by` (#25077)
- Fix incorrect `.list.eval` after slicing operations (#25540)
- Fix incorrect reshape on sliced lists (#25139)
- Fix length preserving check for `eval` expressions in streaming engine (#25294)
- Fix occurrence of exact matches of `.join_asof(strategy="nearest", allow_exact_matches=False, ...)` (#25506)
- Fix off-by-one bug in `ColumnPredicates` generation for inequalities operating on integer columns (#25412)
- Fix panic if scan predicate produces 0 length mask (#25089)
- Fix panic in `dt.truncate` for invalid duration strings (#25124)
- Fix panic in is_between support in streaming Parquet predicate push down (#25476)
- Fix panic when using struct field as join key (#25059)
- Fix serialization of lazyframes containing huge tables (#25190)
- Fix single-column CSV header duplication with leading empty lines (#25186)
- Fix small bug with `PyExpr` to `PyObject` conversion (#25265)
- Group-By aggregation problems caused by `AmortSeries` (#25043)
- Handle some unusual `pl.col.<colname>` edge-cases (#25153)
- Incorrect result in aggregated `first`/`last` with `ignore_nulls` (#25414)
- Incorrect results for aggregated `{n_,}unique` on bools (#25275)
- Invert `drop_nans` filtering in group-by context (#25146)
- Make `str.json_decode` output deterministic with lists (#25240)
- Mark `{forward,backward}_fill` as `length_preserving` (#25352)
- Minor improvement to internal `is_pycapsule` utility function (#25073)
- Nested dtypes in streaming `first_non_null`/`last_non_null` (#25375)
- Nested dtypes in streaming `first`/`last` (#25298)
- Panic exception when calling `Expr.rolling` in `.over` (#25283)
- Panic in `group_by_dynamic` with `group_by` and multiple chunks (#25075)
- Parquet `is_in` for mixed validity pages (#25313)
- Prevent panic when joining sorted LazyFrame with itself (#25453)
- Raise error for all/any on list instead of panic (#25018)
- Raise error on out-of-range dates in temporal operations (#25471)
- Remove `Expr` casts in `pl.lit` invocations (#25373)
- Resolve edge-case with SQL aggregates that have the same name as one of the "GROUP BY" keys (#25362)
- Return the correct string-case `Expr` reprs (#25101)
- Reverse on chunked `struct` (#25281)
- Revert `pl.format` behavior with nulls (#25370)
- Rolling `mean`/`median` for temporals (#25512)
- Run async DB queries with regular `asyncio` if not inside a running loop (#25268)
- SQL "NATURAL" joins should coalesce the key columns (#25353)
- Schema mismatch with `list.agg`, `unique` and scalar (#25348)
- Solve multiple issues relating to arena mutation in SQL subqueries (#25110)
- Strict conversion AnyValue to Struct (#25536)
- Support "index" as column name in `group_by` iterator (#25138)
- Support `AggregatedList` in `list.{eval,agg}` context (#25385)
- The `SQL` interface should use logical, not bitwise, behaviour for unary "NOT" operator (#25091)
- Unique key names in streaming sort/top_k (#25082)
- Unique on literal in aggregation context (#25359)
- Use (i64, u64) for VisualizationData (offset, length) slices (#25203)
- Use Cargo.template.toml to prevent git dependencies from using template (#25392)
- Validate list.slice parameters are not lists (#25458)
- Wide-table join performance regression (#25222)
Documentation
- Add Extension and BaseExtension to doc index (#25444)
- Add `LazyFrame.pivot` to reference guide (#25482)
- Add `having` API references (#25428)
- Add docstring example showing `str.slice` taking Expression params (#25461)
- Add polars-on-premise documentation (#25431)
- Clarify bitwise behaviour of `and_`, `or_`, and `not_` Expressions on integer columns (#25092)
- Correct link to `datetime_range` instead of `date_range` in resampling page (#25532)
- Deprecate `Categorical` functions for lexical ordering and local checks (#25514)
- Document schema parameter in meta methods (#25543)
- Explain aggregation & sorting of lists (#25260)
- Fix LanceDB URL (#25198)
- Fix incorrect 'bitwise' in `any_horizontal`/`all_horizontal` docstring (#25469)
- Fix link errors reported by `markdown-link-check` (#25314)
- Fix non-existent `replace_all` reference in `replace` docs (#25161)
- Fix source path (#25170)
- Fix typo in public dataset URL (#25044)
- Mention Narwhals in ecosystem page (#25100)
- Remove lzo from parquet write options (#25522)
- Update `LazyFrame.collect_schema` docstring (#25508)
- Update `LazyFrame.remote` signature (#25175)
- Update on-premise documentation (#25489)
- Update user guide for QueryProgress rename to QueryProfile (#25195)
Tests
- Add `assert_sql_matches` coverage for SQL "DISTINCT" and "DISTINCT ON" syntax (#25440)
- Add reliable test for `pl.format` on multiple chunks (#25164)
- Add test for unique with column subset (#25241)
- Better coverage for `group_by` aggregations (#25290)
- Test for `group_by(...).having(...)` (#25430)
CI
- Automatically label pull requests that change the DSL (#25177)
- Avoid relabelling changes-dsl on every commit (#25216)
- Print expected DSL schema hashes if mismatched (#25526)
- Skip existing files in pypi upload (#25576)
Build system
Other improvements
- Add `Final` type-qualifier to module-level constants (#25556)
- Add `proptest` `AnyValue` strategies (#25510)
- Add `proptest` `DataFrame` strategy (#25446)
- Add `proptest` strategies for Series logical types (#24849)
- Add `proptest` strategies for Series nested types (#25220)
- Add some cleanup (#25445)
- Add toolchain file to runtimes for sdist (#25311)
- Enable more streaming tests (#25364)
- Fix --uv argument for benchmark-remote (#25513)
- Fix Decimal precision annotation (#25227)
- Fix feature gating TZ_AWARE_RE again (#25493)
- Fix template path in release-python workflow (#25565)
- Fix typo in CI release workflow (#25309)
- Make python docs build again (#25165)
- Remove `Column::Partitioned` (#25324)
- Remove debug file write from test suite (#25393)
- Remove unused import (#25365)
- Run `maturin` with `--uv` option (#25490)
- Silence unused mut warning (#25093)
- Skip rust integration tests for coverage in CI (#25558)
- Update markdown link checker (#25201)
- Update toolchain (#25007)
- Update versions (#25141)
- Upgrade to schemars 0.9.0 (#25158)
- Upgraded `ruff` and `typos` and made the necessary lint updates (#25196)
Refactoring
- Accept multiple files in `pipe_with_schema` (#25388)
- Add IR for `scan_lines` (#25066)
- Add `ElementExpr` for `_eval` expressions (#25199)
- Add asserts and tests for `list.eval` on multiple chunks with slicing (#25559)
- Add functions for `scan_lines` (#25136)
- Add oneshot channel to polars-stream (#25378)
- Add stateful `EwmCov` kernel (#25065)
- Change group length mismatch error to `ShapeError` (#25004)
- Clean up CSPE callsite (#25215)
- Directly take `CloudScheme` in `parse_cloud_options` (#25304)
- Disable recursive CSPE for now (#25085)
- Dispatch `Series.set` to `zip_with_same_dtype` (#25327)
- Fix unsoundness in ChunkedArray::{first, last} (#25449)
- Make `pipe_with_schema` work on Arced schema (#25155)
- Move `EwmMeanState` to `polars-compute` (#25034)
- Move asof `tolerance` type coercion to IR conversion (#25033)
- Move ewm variance code to polars-compute (#25188)
- Move supertype determination and casting to IR for `date_range` and related functions (#24084)
- Refactor `dt_range` functions (#25225)
- Refactor sink IR (#25308)
- Remove `ClosableFile` (#25330)
- Remove `PyPartitioning` (#25303)
- Remove aggregation context `Context` (#25424)
- Remove incorrect cast in reduce code (#25321)
- Remove lower_ir conversion from Scan to InMemorySource (#25150)
- Remove old join projection pushdown logic (#25088)
- Remove some dead argminmax impl code (#25501)
- Remove unused `optimization_toggle` (#25130)
- Remove unused row-count (#25080)
- Remove verbose prints on file opens (#25523)
- Rename `URL_ENCODE_CHARSET` to `HIVE_ENCODE_CHARSET` (#25554)
- Simplify sink parameter passing from Python (#25302)
- Support for named/anonymous aggregations (#25118)
- Take `&dyn Any` instead of `Box<dyn Any>` in python object converters (#25421)
- Take `sync` parameter in `Writeable::close` (#25475)
- Update partitioned sink IR (#25524)
- Use dedicated runtime packages from template (#25284)
Thank you to all our contributors for making this release possible!
@AndreaBozzo, @DannyStoll1, @EndPositive, @JakubValtar, @Jesse-Bakker, @Kevin-Patyk, @MarcoGorelli, @TNieuwdorp, @alexander-beedie, @borchero, @c-peters, @cBournhonesque, @carnarez, @cmdlineluser, @coastalwhite, @cr7pt0gr4ph7, @davanstrien, @dsprenkels, @etiennebacher, @feliblo, @itamarst, @jannickj, @jetuk, @kdn36, @lun3x, @marinegor, @mcrumiller, @nameexhaustion, @orlp, @ritchie46, @vyasr, @wtn and more!