Rust Polars 0.53.0

🏆 Highlights

  • Add Extension types (#25322)

🚀 Performance improvements

  • Don't always rechunk on gather of nested types (#26478)
  • Enable zero-copy object_store put upload for IPC sink (#26288)
  • Resolve file schemas and metadata concurrently (#26325)
  • Run elementwise CSEE for the streaming engine (#26278)
  • Disable morsel splitting for fast-count on streaming engine (#26245)
  • Implement streaming decompression for scan_ndjson and scan_lines (#26200)
  • Improve string slicing performance (#26206)
  • Refactor scan_delta to use python dataset interface (#26190)
  • Add dedicated kernel for group-by arg_max/arg_min (#26093)
  • Add streaming merge-join (#25964)
  • Generalize Bitmap::new_zeroed opt for Buffer::zeroed (#26142)
  • Reduce fs stat calls in path expansion (#26173)
  • Lower streaming group_by n_unique to unique().len() (#26109)
  • Speed up SQL interface "UNION" clauses (#26039)
  • Speed up SQL interface "ORDER BY" clauses (#26037)
  • Add fast kernel for is_nan and use it for numpy NaN->null conversion (#26034)
  • Optimize ArrayFromIter implementations for ObjectArray (#25712)
  • New streaming NDJSON sink pipeline (#25948)
  • New streaming CSV sink pipeline (#25900)
  • Dispatch partitioned usage of sink_* functions to new-streaming by default (#25910)
  • Replace ryu with faster zmij (#25885)
  • Reduce memory usage for .item() count in grouped first/last (#25787)
  • Skip schema inference if schema provided for scan_csv/ndjson (#25757)
  • Add width-aware chunking to prevent degradation with wide data (#25764)
  • Use new sink pipeline for write/sink_ipc (#25746)
  • Reduce memory usage when scanning multiple parquet files in streaming (#25747)
  • Don't call cluster_with_columns optimization if not needed (#25724)
  • Tune partitioned sink_parquet cloud performance (#25687)
  • New single file IO sink pipeline enabled for sink_parquet (#25670) — see the sink sketch after this list
  • New partitioned IO sink pipeline enabled for sink_parquet (#25629)
  • Correct overly eager local predicate insertion for unpivot (#25644)
  • Reduce HuggingFace API calls (#25521)
  • Use strong hash instead of traversal for CSPE equality (#25537)
  • Fix panic in is_between support in streaming Parquet predicate push down (#25476)
  • Faster kernels for rle_lengths (#25448)
  • Allow detecting plan sortedness in more cases (#25408)
  • Enable predicate expressions on unsigned integers (#25416)
  • Mark output of more non-order-maintaining ops as unordered (#25419)
  • Fast find start window in group_by_dynamic with large offset (#25376)
  • Add streaming native LazyFrame.group_by_dynamic (#25342)
  • Add streaming sorted Group-By (#25013)
  • Add parquet prefiltering for string regexes (#25381)
  • Use fast path for agg_min/agg_max when nulls present (#25374)
  • Fuse positive slice into streaming LazyFrame.rolling (#25338)
  • Mark Expr.reshape((-1,)) as row separable (#25326)
  • Use bitmap instead of Vec<bool> in first/last w. skip_nulls (#25318)
  • Return references from aexpr_to_leaf_names_iter (#25319)
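
Several of the performance items above move the `sink_*` writers onto the new streaming IO sink pipelines (#25629, #25670, #25900, #25948). Below is a minimal Python sketch of the existing lazy scan/sink call these pipelines sit behind; the call site itself is unchanged, and the file paths are hypothetical.

```python
import polars as pl

# Existing lazy scan/sink API; per the notes above, sink_parquet now runs on the
# new streaming single-file IO sink pipeline. "data.csv" is a hypothetical input.
lf = pl.scan_csv("data.csv").filter(pl.col("value") > 0)

# Streams the result to disk without collecting the full DataFrame in memory.
lf.sink_parquet("out.parquet")
```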

✨ Enhancements

  • Add primitive filter -> agg lowering in streaming GroupBy (#26459)
  • Support for the SQL FETCH clause (#26449)
  • Add get() to retrieve a byte from binary data (#26454)
  • Remove with_context in SQL lowering (#26416)
  • Avoid OOM for scan_ndjson and scan_lines when input is compressed and a negative slice is used (#26396)
  • Add JoinBuildSide (#26403)
  • Support anonymous agg in-mem (#26376)
  • Add unstable arrow_schema parameter to sink_parquet (#26323)
  • Improve error message formatting for structs (#26349)
  • Remove parquet field overwrites (#26236)
  • Enable zero-copy object_store put upload for IPC sink (#26288)
  • Improved disambiguation for qualified wildcard columns in SQL projections (#26301)
  • Expose upload_concurrency through env var (#26263)
  • Allow quantile to compute multiple quantiles at once (#25516)
  • Allow empty LazyFrame in LazyFrame.group_by(...).map_groups (#26275)
  • Use delta file statistics for batch predicate pushdown (#26242)
  • Add streaming UnorderedUnion (#26240)
  • Implement compression support for sink_ndjson (#26212)
  • Add unstable record batch statistics flags to {sink/scan}_ipc (#26254)
  • Cloud retry/backoff configuration via storage_options (#26204)
  • Use same sort order for expanded paths across local / cloud / directory / glob (#26191)
  • Expose physical plan NodeStyle (#26184)
  • Add streaming merge-join (#25964)
  • Serialize optimization flags for cloud plan (#26168)
  • Add compression support to write_csv and sink_csv (#26111)
  • Add scan_lines (#26112)
  • Support regex in str.split (#26060)
  • Add unstable IPC Statistics read/write to scan_ipc/sink_ipc (#26079)
  • Add nulls support for all rolling_by operations (#26081)
  • ArrowStreamExportable and sink_delta (#25994)
  • Release musl builds (#25894)
  • Implement streaming decompression for CSV COUNT(*) fast path (#25988)
  • Add nulls support for rolling_mean_by (#25917)
  • Add lazy collect_all (#25991)
  • Add streaming decompression for NDJSON schema inference (#25992)
  • Improved handling of unqualified SQL JOIN columns that are ambiguous (#25761)
  • Expose record batch size in {sink,write}_ipc (#25958)
  • Add null_on_oob parameter to expr.get (#25957)
  • Suggest correct timezone if timezone validation fails (#25937)
  • Support streaming IPC scan from S3 object store (#25868)
  • Implement streaming CSV schema inference (#25911)
  • Support hashing of meta expressions (#25916)
  • Improve SQLContext recognition of possible table objects in the Python globals (#25749)
  • Add pl.Expr.(min|max)_by (#25905)
  • Improve MemSlice Debug impl (#25913)
  • Implement or fix json encode/decode for (U)Int128, Categorical, Enum, Decimal (#25896)
  • Expand scatter to more dtypes (#25874)
  • Implement streaming CSV decompression (#25842)
  • Add Series sql method for API consistency (#25792)
  • Mark Polars as safe for free-threading (#25677)
  • Support Binary and Decimal in arg_(min|max) (#25839)
  • Allow Decimal parsing in str.json_decode (#25797)
  • Add shift support for Object data type (#25769)
  • Add node status to NodeMetrics (#25760)
  • Allow scientific notation when parsing Decimals (#25711)
  • Allow creation of Object literal (#25690)
  • Don't collect schema in SQL union processing (#25675)
  • Add bin.slice(), bin.head(), and bin.tail() methods (#25647)
  • Add SQL support for the QUALIFY clause (#25652) — see the SQL sketch after this list
  • New partitioned IO sink pipeline enabled for sink_parquet (#25629)
  • Add SQL syntax support for CROSS JOIN UNNEST(col) (#25623)
  • Add separate env var to log tracked metrics (#25586)
  • Expose fields for generating physical plan visualization data (#25562)
  • Allow pl.Object in pivot value (#25533)
  • Extend SQL UNNEST support to handle multiple array expressions (#25418)
  • Minor improvement for as_struct repr (#25529)
  • Temporal quantile in rolling context (#25479)
  • Add support for Float16 dtype (#25185)
  • Add strict parameter to pl.concat(how='horizontal') (#25452)
  • Add leftmost option to str.replace_many / str.find_many / str.extract_many (#25398)
  • Add quantile for missing temporals (#25464)
  • Expose and document pl.Categories (#25443)
  • Support decimals in search_sorted (#25450)
  • Use reference to Graph pipes when flushing metrics (#25442)
  • Add SQL support for named WINDOW references (#25400)
  • Add Extension types (#25322)
  • Add having to group_by context (#23550)
  • Allow elementwise Expr.over in aggregation context (#25402)
  • Add SQL support for ROW_NUMBER, RANK, and DENSE_RANK functions (#25409)
  • Automatically Parquet dictionary encode floats (#25387)
  • Add empty_as_null and keep_nulls to {Lazy,Data}Frame.explode (#25369)
  • Allow hash for all List dtypes (#25372)
  • Support unique_counts for all datatypes (#25379)
  • Add maintain_order to Expr.mode (#25377)
  • Display function of streaming physical plan map node (#25368)
  • Allow slice on scalar in aggregation context (#25358)
  • Allow implode and aggregation in aggregation context (#25357)
  • Add empty_as_null and keep_nulls flags to Expr.explode (#25289)
  • Add ignore_nulls to first / last (#25105)
  • Move GraphMetrics into StreamingQuery (#25310)
  • Allow Expr.unique on List/Array with non-numeric types (#25285)
  • Allow Expr.rolling in aggregation contexts (#25258)
  • Support additional forms of SQL CREATE TABLE statements (#25191)
  • Add LazyFrame.pivot (#25016)
  • Support column-positional SQL UNION operations (#25183)
  • Allow arbitrary expressions as the Expr.rolling index_column (#25117)
  • Allow arbitrary Expressions in "subset" parameter of unique frame method (#25099)
  • Support arbitrary expressions in SQL JOIN constraints (#25132)
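
As referenced above, the SQL interface now accepts the QUALIFY clause and the ROW_NUMBER, RANK, and DENSE_RANK window functions (#25652, #25409). The sketch below uses the existing pl.SQLContext API; whether an alias can be referenced inside QUALIFY is not confirmed here, so the window expression is repeated in the clause.

```python
import polars as pl

df = pl.DataFrame({
    "grp": ["a", "a", "b", "b"],
    "val": [1, 3, 2, 5],
})

# SQLContext/execute are existing Polars APIs; QUALIFY and ROW_NUMBER() are the
# SQL features added in this release, sketched here as "top row per group".
ctx = pl.SQLContext(t=df)
top_per_group = ctx.execute(
    """
    SELECT grp, val
    FROM t
    QUALIFY ROW_NUMBER() OVER (PARTITION BY grp ORDER BY val DESC) = 1
    """,
    eager=True,
)
```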

🐞 Bug fixes

  • Do not overwrite used names in cluster_with_columns pushdown (#26467)
  • Do not mark output of concat_str on multiple inputs as sorted (#26468)
  • Fix CSV schema inference content line duplication bug (#26452)
  • Fix InvalidOperationError using scan_delta with filter (#26448)
  • Alias giving missing column after streaming GroupBy CSE (#26447)
  • Ensure by_name selector selects only names (#26437)
  • Restore compatibility of strings written to parquet with pyarrow filter (#26436)
  • Update schema in cluster_with_columns optimization (#26430)
  • Fix negative slice in groups slicing (#26442)
  • Don't run CPU check on aarch64 musl (#26439)
  • Remove the POLARS_IDEAL_MORSEL_SIZE monkeypatching in the parametric merge-join test (#26418)
  • Correct off-by-one in RLE row counting for nullable dictionary-encoded columns (#26411)
  • Support very large integers in env var limits (#26399)
  • Fix PlPath panic from incorrect slicing of UTF8 boundaries (#26389)
  • Fix Float dtype for spearman correlation (#26392)
  • Fix optimizer panic in right joins with type coercion (#26365)
  • Don't serialize retry config from local environment vars (#26289)
  • Fix PartitionBy with scalar key expressions and diff() (#26370)
  • Add {Float16, Float32} -> Float32 lossless upcast (#26373)
  • Fix panic using with_columns and collect_all (#26366)
  • Add multi-page support for writing dictionary-encoded Parquet columns (#26360)
  • Ensure slice advancement when skipping non-inlinable values in is_in with inlinable needles (#26361)
  • Pin xlsx2csv version temporarily (#26352)
  • Bugs in ViewArray total_bytes_len (#26328)
  • Overflow in i128::abs in Decimal fits check (#26341)
  • Make Expr.hash on Categorical mapping-independent (#26340)
  • Clone shared GroupBy node before mutation in physical plan creation (#26327)
  • Fix lazy evaluation of replace_strict by making it fallible (#26267)
  • Consider the "current location" of an item when computing rolling_rank_by (#26287)
  • Reset is_count_star flag between queries in collect_all (#26256)
  • Fix incorrect is_between filter on scan_parquet (#26284)
  • Lower AnonymousStreamingAgg in group-by as aggregate (#26258)
  • Avoid overflow in pl.duration scalar arguments case (#26213)
  • Broadcast arr.get on single array with multiple indices (#26219)
  • Fix panic on CSPE with sorts (#26231)
  • Fix UB in DataFrame::transpose_from_dtype (#26203)
  • Eager DataFrame.slice with negative offset and length=None (#26215) — see the example after this list
  • Use correct schema side for streaming merge join lowering (#26218)
  • Implement expression keys for merge-join (#26202)
  • Overflow panic in scan_csv with multiple files and skip_rows + n_rows larger than total row count (#26128)
  • Respect allow_object flag after cache (#26196)
  • Raise error on non-elementwise PartitionBy keys (#26194)
  • Allow ordered categorical dictionary in scan_parquet (#26180)
  • Allow excess bytes on IPC bitmap compressed length (#26176)
  • Address buggy quadratic scaling fix in scan_csv (#26175)
  • Address a macOS-specific compile issue (#26172)
  • Fix deadlock on hash_rows() of 0-width DataFrame (#26154)
  • Fix NameError filtering pyarrow dataset (#26166)
  • Fix concat_arr panic when using categoricals/enums (#26146)
  • Fix NDJSON/scan_lines negative slice splitting with extremely long lines (#26132)
  • Incorrect group_by min/max fast path (#26139)
  • Remove a source of non-determinism from lowering (#26137)
  • Error when with_row_index or unpivot create duplicate columns on a LazyFrame (#26107)
  • Panics on shift with head (#26099)
  • Optimize slicing support on compressed IPC (#26071)
  • CPU check for musl builds (#26076)
  • Fix slicing on compressed IPC (#26066)
  • Release GIL on collect_batches (#26033)
  • Missing buffer update in String is_in Parquet pushdown (#26019)
  • Make struct.with_fields data model coherent (#25610)
  • Incorrect output order for order sensitive operations after join_asof (#25990)
  • Use SeriesExport for pyo3-polars FFI (#26000)
  • Don't write Parquet min/max statistics for i128 (#25986)
  • Ensure chunk consistency in in-memory join (#25979)
  • Fix varying block metadata length in IPC reader (#25975)
  • Implement collect_batches properly in Rust (#25918)
  • Fix panic on arithmetic with bools in list (#25898)
  • Convert to index type with strict cast in some places (#25912)
  • Empty dataframe in streaming non-strict hconcat (#25903)
  • Infer large u64 in json as i128 (#25904)
  • Set http client timeouts to 10 minutes (#25902)
  • Prevent panic when comparing Date with Duration types (#25856)
  • Correct lexicographic ordering for Parquet BYTE_ARRAY statistics (#25886)
  • Raise error on duplicate group_by names in upsample() (#25811)
  • Correctly export view buffer sizes nested in Extension types (#25853)
  • Fix DataFrame.estimated_size not handling overlapping chunks correctly (#25775)
  • Ensure Kahan sum does not introduce NaN from infinities (#25850)
  • Trim excess bytes in parquet decode (#25829)
  • Reshape checks size to match exactly (#25571)
  • Fix panic/deadlock sinking parquet with rows larger than 64MB estimated size (#25836)
  • Fix quantile midpoint interpolation (#25824)
  • Don't use cast when converting from physical in list.get (#25831)
  • Invalid null count on int -> categorical cast (#25816)
  • Update groups in list.eval (#25826)
  • Use downcast before FFI conversion in PythonScan (#25815)
  • Double-counting of row metrics (#25810)
  • Cast nulls to expected type in streaming union node (#25802)
  • Incorrect slice pushdown into map_groups (#25809)
  • Fix panic writing parquet with single bool column (#25807)
  • Fix upsample with group_by incorrectly introducing NULLs on group key columns (#25794)
  • Panic in top_k pruning (#25798)
  • Fix documentation for new() (#25791)
  • Fix incorrect collect_schema for unpivot followed by join (#25782)
  • Fix documentation for tail() (#25784)
  • Verify arr namespace is called from array column (#25650)
  • Ensure LazyFrame.serialize() unchanged after collect_schema() (#25780)
  • Function map_(rows|elements) with return_dtype = pl.Object (#25753)
  • Avoid visiting nodes multiple times in PhysicalPlanVisualizationDataGenerator (#25737)
  • Fix incorrect cargo sub-feature (#25738)
  • Fix deadlock on empty scan IR (#25716)
  • Don't invalidate node in cluster-with-columns (#25714)
  • Move boto3 extra from s3fs in dev requirements (#25667)
  • Binary slice methods missing from Series and docs (#25683)
  • Mix-up of variable_name/value_name in unpivot (#25685)
  • Invalid usage of drop_first in to_dummies when nulls present (#25435)
  • Rechunk on nested dtypes in take_unchecked_impl parallel path (#25662)
  • New single file IO sink pipeline enabled for sink_parquet (#25670)
  • Fix streaming SchemaMismatch panic on list.drop_nulls (#25661)
  • Correct overly eager local predicate insertion for unpivot (#25644)
  • Fix "dtype is unknown" panic in cross joins with literals (#25658)
  • Fix panic on Boolean rolling_sum calculation for list or array eval (#25660)
  • Preserve List inner dtype during chunked take operations (#25634)
  • Fix panic edge-case when scanning hive partitioned data (#25656)
  • Fix lifetime for AmortSeries lazy group iterator (#25620)
  • Improve SQL GROUP BY and ORDER BY expression resolution, handling aliasing edge-cases (#25637)
  • Fix empty format handling (#25638)
  • Prevent false positives in is_in for large integers (#25608)
  • Optimize projection pushdown through HConcat (#25371)
  • Differentiate between empty list and no list for unpivot (#25597)
  • Properly resolve HAVING clause during SQL GROUP BY operations (#25615)
  • Fix spearman panicking on nulls (#25619)
  • Increase precision when constructing float Series (#25323)
  • Make sum on strings error in group_by context (#25456)
  • Hang in multi-chunk DataFrame .rows() (#25582)
  • Bug in boolean unique_counts (#25587)
  • Set Float16 parquet schema type to Float16 (#25578)
  • Correct arr_to_any_value for object arrays (#25581)
  • Have PySeries::new_f16 receive pf16s instead of f32s (#25579)
  • Fix occurrence of exact matches of .join_asof(strategy="nearest", allow_exact_matches=False, ...) (#25506)
  • Raise error on out-of-range dates in temporal operations (#25471)
  • Fix incorrect .list.eval after slicing operations (#25540)
  • Reduce HuggingFace API calls (#25521)
  • Strict conversion AnyValue to Struct (#25536)
  • Fix panic in is_between support in streaming Parquet predicate push down (#25476)
  • Always respect return_dtype in map_elements and map_rows (#25504)
  • Rolling mean/median for temporals (#25512)
  • Add .rolling_rank() support for temporal types and pl.Boolean (#25509)
  • Fix dictionary replacement error in write_ipc() (#25497)
  • Fix group lengths check in sort_by with AggregatedScalar (#25503)
  • Fix expr slice pushdown causing shape error on literals (#25485)
  • Allow empty list in sort_by in list.eval context (#25481)
  • Prevent panic when joining sorted LazyFrame with itself (#25453)
  • Apply CSV dict overrides by name only (#25436)
  • Incorrect result in aggregated first/last with ignore_nulls (#25414)
  • Fix off-by-one bug in ColumnPredicates generation for inequalities operating on integer columns (#25412)
  • Fix arr.{eval,agg} in aggregation context (#25390)
  • Support AggregatedList in list.{eval,agg} context (#25385)
  • Improve SQL UNNEST behaviour (#22546)
  • Remove ClosableFile (#25330)
  • Use Cargo.template.toml to prevent git dependencies from using template (#25392)
  • Resolve edge-case with SQL aggregates that have the same name as one of the GROUP BY keys (#25362)
  • Revert pl.format behavior with nulls (#25370)
  • Remove Expr casts in pl.lit invocations (#25373)
  • Nested dtypes in streaming first_non_null/last_non_null (#25375)
  • Correct eq_missing for struct with nulls (#25363)
  • Unique on literal in aggregation context (#25359)
  • Allow implode and aggregation in aggregation context (#25357)
  • Aggregation with drop_nulls on literal (#25356)
  • Address multiple issues with SQL OVER clause behaviour for window functions (#25249)
  • Schema mismatch with list.agg, unique and scalar (#25348)
  • Correct drop_items for scalar input (#25351)
  • SQL NATURAL joins should coalesce the key columns (#25353)
  • Mark {forward,backward}_fill as length_preserving (#25352)
  • Nested dtypes in streaming first/last (#25298)
  • AnyValue::to_physical for categoricals (#25341)
  • Fix link errors reported by markdown-link-check (#25314)
  • Parquet is_in for mixed validity pages (#25313)
  • Fix length preserving check for eval expressions in streaming engine (#25294)
  • Fix building polars-plan with features lazy,concat_str (but no strings) (#25306)
  • Fix building polars-mem-engine with the async feature (#25300)
  • Don't quietly allow unsupported SQL SELECT clauses (#25282)
  • Fix small bug with PyExpr to PyObject conversion (#25265)
  • Reverse on chunked struct (#25281)
  • Panic exception when calling Expr.rolling in .over (#25283)
  • Correct {first,last}_non_null if there are empty chunks (#25279)
  • Incorrect results for aggregated {n_,}unique on bools (#25275)
  • Fix building polars-expr without timezones feature (#25254)
  • Ensure out-of-range integers and other edge case values don't give wrong results for index_of() (#24369)
  • Correctly prune projected columns in hints (#25250)
  • Allow Null dtype values in scatter (#25245)
  • Correctly handle requested stops in streaming shift (#25239)
  • Make str.json_decode output deterministic with lists (#25240)
  • Wide-table join performance regression (#25222)
  • Fix single-column CSV header duplication with leading empty lines (#25186)
  • Enhanced column resolution/tracking through multi-way SQL joins (#25181)
  • Fix serialization of lazyframes containing huge tables (#25190)
  • Use (i64, u64) for VisualizationData (offset, length) slices (#25203)
  • Fix assertion panic on group_by (#25179)
  • Fix format_str in case of multiple chunks (#25162)
  • Fix incorrect drop_nans() result when used in group_by() / over() (#25146)
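
One of the fixes above (#26215) concerns eager DataFrame.slice when given a negative offset and no length. The snippet below is a small illustration using the long-standing slice API; the expected output reflects the documented semantics (take the last n rows) rather than anything verified against this exact build.

```python
import polars as pl

df = pl.DataFrame({"x": [1, 2, 3, 4, 5]})

# offset=-2 with length=None: per DataFrame.slice semantics this should yield
# the last two rows; #26215 fixes the eager path for this combination.
tail_two = df.slice(-2)
print(tail_two)  # expected rows: x = 4, 5
```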

📖 Documentation

  • Fix typo in max_by docstring (#26404)
  • Remove deprecated cublet_id (#26260)
  • Update for new release (#26255)
  • Update MCP server section with new URL (#26241)
  • Fix unmatched paren and punctuation in pandas migration guide (#26251)
  • Add observatory database_path to docs (#26201)
  • Note plugins in Python user-defined functions (#26138)
  • Clarify min_by/max_by behavior on ties (#26077)
  • Add QUALIFY clause and SUBSTRING function to the SQL docs (#25779)
  • Update mixed-offset datetime parsing example in user guide (#25915)
  • Update bare-metal docs for mounted anonymous results (#25801)
  • Fix credential parameter name in cloud-storage.py (#25788)
  • Configuration options update (#25756)
  • Fix typos in Excel and Pandas migration guides (#25709)
  • Add "right" to how options in join() docstrings (#25678)
  • Document schema parameter in meta methods (#25543)
  • Correct link to datetime_range instead of date_range in resampling page (#25532)
  • Explain aggregation & sorting of lists (#25260)
  • Update LazyFrame.collect_schema() docstring (#25508)
  • Remove lzo from parquet write options (#25522)
  • Update on-premise documentation (#25489)
  • Fix incorrect 'bitwise' in any_horizontal/all_horizontal docstring (#25469)
  • Add Extension and BaseExtension to doc index (#25444)
  • Add polars-on-premise documentation (#25431)
  • Fix link errors reported by markdown-link-check (#25314)
  • Fix LanceDB URL (#25198)

📦 Build system

  • Address remaining Python 3.14 issues with make requirements-all (#26195)
  • Address a macOS-specific compile issue (#26172)
  • Fix make fmt and make lint commands (#25200)

🛠️ Other improvements

  • Move IO source metrics instrumentation to PolarsObjectStore (#26414)
  • More SQL to IR conversion execute_isolated (#26455)
  • Cleanup unused attributes in optimizer (#26464)
  • Use Expr::Display as catch all for IR - DSL asymmetry (#26471)
  • Remove the POLARS_IDEAL_MORSEL_SIZE monkeypatching in the parametric merge-join test (#26418)
  • Move IO metrics struct to polars-io and use new timer (#26397)
  • Reduce blocking on computational executor threads in multiscan init (#26407)
  • Cleanup the parametric merge-join test (#26413)
  • Ensure local doctests skip from_torch if module not installed (#26405)
  • Implement various deprecations (#26314)
  • Refactor MinBy and MaxBy as IRFunctions (#26307)
  • Rename Operator::Divide to RustDivide (#26339)
  • Properly disable the Pyodide tests (#26382)
  • Add LiveTimer (#26384)
  • Use derived serialization on PlRefPath (#26167)
  • Add metadata to ArrowSchema struct (#26318)
  • Remove unused field (#26367)
  • Fix runtime nesting (#26359)
  • Remove xlsx2csv dependency pin (#26355)
  • Allow unchecked IPC reads (#26354)
  • Use outer runtime if exists in to_alp (#26353)
  • Make CategoricalMapping::new pub(crate) to avoid misuse (#26308)
  • Clarify IPC buffer read limit/length parameter (#26334)
  • Improve accuracy of active IO time metric (#26315)
  • Mark VarState as repr(C) (#26309)
  • IO metrics for streaming Parquet / IPC sources (#26300)
  • Replace panicking index access with error handling in dictionaries_to_encode (#26059)
  • Remove unnecessary match and move early return in testing (#26297)
  • Add dtype test coverage for delta predicate filter (#26291)
  • Add property-based tests for Scalar::cast_with_options (#25744)
  • Add AI policy (#26286)
  • Remove MemSlice (#26259)
  • Remove recursion from upsample_impl (#26250)
  • Remove all non CSV fast-count paths (#26233)
  • Replace MemReader with Cursor (#26216)
  • Add serde(default) to new CSV compression fields (#26210)
  • Add a couple of SAFETY comments in merge-join node (#26197)
  • Expose physical plan NodeStyle (#26184)
  • Ensure optimization flag modification happens locally (#26185)
  • Use NullChunked as default for Series (#26181)
  • In merge-sorted node, when buffering, request a stop on *every* unbuffered morsel (#26178)
  • Rename io_sinks2 -> io_sinks (#26159)
  • Lint leftover fixme (#26122)
  • Move Buffer and SharedStorage to polars-buffer crate (#26113)
  • Remove old sink IR (#26130)
  • Use derived serialization on PlRefPath (#26062)
  • Improve backtrace for POLARS_PANIC_ON_ERR (#26125)
  • Fix Python docs build (#26117)
  • Remove old streaming sink implementation (#26102)
  • Disable unused-ignore mypy lint (#26110)
  • Remove unused equality impls for IR / FunctionIR (#26106)
  • Ignore mypy warning (#26105)
  • Preserve order for string concatenation (#26101)
  • Raise error on file://hostname/path (#26061)
  • Disable debug info for docs workflow (#26086)
  • Remove IR / physical plan visualization data generators (#26090)
  • Update docs for next polars cloud release (#26091)
  • Support Python 3.14 in dev environment (#26073)
  • Mark top slow normal tests as slow (#26080)
  • Simplify PlPath (#26053)
  • Update breaking deps (#26055)
  • Fix for upstream url bug and update deps (#26052)
  • Properly pin chrono (#26051)
  • Don't run rust doctests (#26046)
  • Update deps (#26042)
  • Ignore very slow test (#26041)
  • Add Send bound for SharedStorage owner (#26040)
  • Update rust compiler (#26017)
  • Improve csv test coverage (#25980)
  • Use from_any_values_and_dtype in Series::extend_constant (#26006)
  • Pass sync_on_close and num_pipelines via start_file_writer for IO sinks (#25950)
  • Add broadcast_nulls field to RowEncodingVariant and _get_rows_encoded_{ca,arr} (#26001)
  • Ramp up CSV read size (#25997)
  • Rename FileType to FileWriteFormat (#25951)
  • Don't unwrap first sink morsel send (#25981)
  • Update ruff action and simplify version handling (#25940)
  • Cleanup Rust DataFrame interface (#25976)
  • Export PhysNode related struct (#25987)
  • Restructure Sort variant in logical and physical plans visualization data (#25978)
  • Run python lint target as part of pre-commit (#25982)
  • Allow multiple inputs to streaming GroupBy node (#25961)
  • Disable HTTP timeout for receiving response body (#25970)
  • Add AI contribution policy (#25956)
  • Remove unused sink code (#25949)
  • Add detailed Sink info to IRNodeProperties (#25954)
  • Wrap FileScanIR::Csv enum variant in Arc (#25952)
  • Use PlSmallStr for CSV format strings (#25901)
  • Add unsafe bound to MemSlice::from_arc (#25920)
  • Improve MemSlice Debug impl (#25913)
  • Remove manual cmp impls for &[u8] (#25890)
  • Remove and deprecate batched csv reader (#25884)
  • Remove unused AnonymousScan functions (#25872)
  • Use Buffer<T> instead of Arc<[T]> to store stringview buffers (#25870)
  • Add TakeableRowsProvider for IO sinks (#25858)
  • Filter DeprecationWarning from pyparsing indirectly through pyiceberg (#25854)
  • Various small improvements (#25835)
  • Clear venv with appropriate version of Python (#25851)
  • Move CSV write logic to CsvSerializer (#25828)
  • Ensure Polars Object extension type is registered (#25813)
  • Harden Python object process ID (#25812)
  • Skip schema inference if schema provided for scan_csv/ndjson (#25757)
  • Ensure proper async connection cleanup on DB test exit (#25766)
  • Flip has_residual_predicate -> no_residual_predicate (#25755)
  • Track original length before file filtering in scan IR (#25717)
  • Ensure we uninstall other Polars runtimes in CI (#25739)
  • Make 'make requirements' more robust (#25693)
  • Remove duplicate compression level types (#25723)
  • Replace async blocks with named components in new parquet write pipeline (#25695)
  • Move Object lit fix earlier in the function (#25713)
  • Remove unused decimal file (#25701)
  • Move boto3 extra from s3fs in dev requirements (#25667)
  • Upgrade to latest version of sqlparser-rs (#25673)
  • Update slab to version without RUSTSEC (#25686)
  • Fix typo (#25684)
  • Avoid rechunk requirement for Series.iter() (#25603)
  • Use dtype for group_aware evaluation on ApplyExpr (#25639)
  • Make polars-plan constants more consistent (#25645)
  • Add "panic" and "streaming" tagging to issue-labeler workflow (#25657)
  • Add support for multi-column reductions (#25640)
  • Fix rolling kernel dispatch with monotonic group attribute (#25494)
  • Simplify _write_any_value (#25622)
  • Ensure we hash all attributes and visit all children in traverse_and_hash_aexpr (#25627)
  • Ensure literal-only SELECT broadcast conforms to SQL semantics (#25633)
  • Add parquet file write pipeline for new IO sinks (#25618)
  • Rename polars-on-premise to polars-on-premises (#25617)
  • Constrain new issue-labeler workflow to the Issue title (#25614)
  • Add streaming IO sink components (#25594)
  • Help categorise Issues by automatically applying labels (using the same patterns used for labelling PRs) (#25599)
  • Show on streaming engine (#25589)
  • Add arg_sort() and Writeable::as_buffered() (#25583)
  • Take task priority argument in parallelize_first_to_local (#25563)
  • Skip existing files in pypi upload (#25576)
  • Fix template path in release-python workflow (#25565)
  • Skip rust integration tests for coverage in CI (#25558)
  • Add asserts and tests for list.eval on multiple chunks with slicing (#25559)
  • Rename URL_ENCODE_CHARSET to HIVE_ENCODE_CHARSET (#25554)
  • Add assert_sql_matches coverage for SQL DISTINCT and DISTINCT ON syntax (#25440)
  • Use strong hash instead of traversal for CSPE equality (#25537)
  • Update partitioned sink IR (#25524)
  • Print expected DSL schema hashes if mismatched (#25526)
  • Remove verbose prints on file opens (#25523)
  • Add proptest AnyValue strategies (#25510)
  • Fix --uv argument for benchmark-remote (#25513)
  • Add proptest DataFrame strategy (#25446)
  • Run maturin with --uv option (#25490)
  • Remove some dead argminmax impl code (#25501)
  • Fix feature gating TZ_AWARE_RE again (#25493)
  • Take sync parameter in Writeable::close() (#25475)
  • Fix unsoundness in ChunkedArray::{first, last} (#25449)
  • Add some cleanup (#25445)
  • Test for group_by(...).having(...) (#25430)
  • Accept multiple files in pipe_with_schema (#25388)
  • Remove aggregation context Context (#25424)
  • Take &dyn Any instead of Box<dyn Any> in python object converters (#25421)
  • Refactor sink IR (#25308)
  • Remove ClosableFile (#25330)
  • Remove debug file write from test suite (#25393)
  • Add ElementExpr for _eval expressions (#25199)
  • Dispatch Series.set to zip_with_same_dtype (#25327)
  • Better coverage for group_by aggregations (#25290)
  • Add oneshot channel to polars-stream (#25378)
  • Enable more streaming tests (#25364)
  • Remove Column::Partitioned (#25324)
  • Remove incorrect cast in reduce code (#25321)
  • Add toolchain file to runtimes for sdist (#25311)
  • Remove PyPartitioning (#25303)
  • Directly take CloudScheme in parse_cloud_options() (#25304)
  • Refactor dt_range functions (#25225)
  • Fix typo in CI release workflow (#25309)
  • Use dedicated runtime packages from template (#25284)
  • Add proptest strategies for Series nested types (#25220)
  • Simplify sink parameter passing from Python (#25302)
  • Add test for unique with column subset (#25241)
  • Fix Decimal precision annotation (#25227)
  • Add LazyFrame.pivot (#25016)
  • Clean up CSPE callsite (#25215)
  • Avoid relabelling changes-dsl on every commit (#25216)
  • Move ewm variance code to polars-compute (#25188)
  • Upgrade to schemars 0.9.0 (#25158)
  • Update markdown link checker (#25201)
  • Automatically label pull requests that change the DSL (#25177)
  • Add reliable test for pl.format on multiple chunks (#25164)
  • Move supertype determination and casting to IR for date_range and related functions (#24084)
  • Make python docs build again (#25165)
  • Make pipe_with_schema work on Arced schema (#25155)
  • Add functions for scan_lines (#25136)
  • Remove lower_ir conversion from Scan to InMemorySource (#25150)
  • Update versions (#25141)

Thank you to all our contributors for making this release possible!
@AndreaBozzo, @Atarust, @DannyStoll1, @EndPositive, @JakubValtar, @Jesse-Bakker, @Kevin-Patyk, @LeeviLindgren, @MarcoGorelli, @Matt711, @MrAttoAttoAtto, @TNieuwdorp, @Voultapher, @WaffleLapkin, @agossard, @alex-gregory-ds, @alexander-beedie, @anosrepenilno, @arlyon, @azimafroozeh, @bayoumi17m, @borchero, @c-peters, @cBournhonesque, @camriddell, @carnarez, @cmdlineluser, @coastalwhite, @cr7pt0gr4ph7, @davanstrien, @davidia, @dependabot[bot], @dsprenkels, @edizeqiri, @eitanf, @etiennebacher, @feliblo, @gab23r, @guilhem-dvr, @hallmason17, @hamdanal, @henryharbeck, @hutch3232, @ion-elgreco, @itamarst, @jamesfricker, @jannickj, @jetuk, @jqnatividad, @kdn36, @lun3x, @m1guelperez, @marinegor, @mcrumiller, @nameexhaustion, @orlp, @pomo-mondreganto, @qxzcode, @r-brink, @ritchie46, @sachinn854, @stijnherfst, @sweb, @tlauli, @vyasr, @wtn and @yonikremer
