🏆 Highlights
- Add Extension types (#25322)
🚀 Performance improvements
- Don't always rechunk on gather of nested types (#26478)
- Enable zero-copy object_store `put` upload for IPC sink (#26288)
- Resolve file schemas and metadata concurrently (#26325)
- Run elementwise CSE for the streaming engine (#26278)
- Disable morsel splitting for fast-count on streaming engine (#26245)
- Implement streaming decompression for scan_ndjson and scan_lines (#26200)
- Improve string slicing performance (#26206)
- Refactor `scan_delta` to use python dataset interface (#26190)
- Add dedicated kernel for group-by `arg_max`/`arg_min` (#26093)
- Add streaming merge-join (#25964)
- Generalize Bitmap::new_zeroed opt for Buffer::zeroed (#26142)
- Reduce fs stat calls in path expansion (#26173)
- Lower streaming group_by n_unique to unique().len() (#26109)
- Speed up SQL interface "UNION" clauses (#26039)
- Speed up SQL interface "ORDER BY" clauses (#26037)
- Add fast kernel for is_nan and use it for numpy NaN->null conversion (#26034)
- Optimize ArrayFromIter implementations for ObjectArray (#25712)
- New streaming NDJSON sink pipeline (#25948)
- New streaming CSV sink pipeline (#25900)
- Dispatch partitioned usage of `sink_*` functions to new-streaming by default (#25910)
- Replace ryu with faster zmij (#25885)
- Reduce memory usage for .item() count in grouped first/last (#25787)
- Skip schema inference if schema provided for `scan_csv`/`ndjson` (#25757)
- Add width-aware chunking to prevent degradation with wide data (#25764)
- Use new sink pipeline for write/sink_ipc (#25746)
- Reduce memory usage when scanning multiple parquet files in streaming (#25747)
- Don't call cluster_with_columns optimization if not needed (#25724)
- Tune partitioned sink_parquet cloud performance (#25687)
- New single file IO sink pipeline enabled for sink_parquet (#25670)
- New partitioned IO sink pipeline enabled for sink_parquet (#25629)
- Correct overly eager local predicate insertion for unpivot (#25644)
- Reduce HuggingFace API calls (#25521)
- Use strong hash instead of traversal for CSPE equality (#25537)
- Fix panic in is_between support in streaming Parquet predicate push down (#25476)
- Faster kernels for rle_lengths (#25448)
- Allow detecting plan sortedness in more cases (#25408)
- Enable predicate expressions on unsigned integers (#25416)
- Mark output of more non-order-maintaining ops as unordered (#25419)
- Fast find start window in `group_by_dynamic` with large `offset` (#25376)
- Add streaming native `LazyFrame.group_by_dynamic` (#25342)
- Add streaming sorted Group-By (#25013)
- Add parquet prefiltering for string regexes (#25381)
- Use fast path for `agg_min`/`agg_max` when nulls present (#25374)
- Fuse positive `slice` into streaming `LazyFrame.rolling` (#25338)
- Mark `Expr.reshape((-1,))` as row separable (#25326)
- Use bitmap instead of Vec<bool> in first/last w. skip_nulls (#25318)
- Return references from `aexpr_to_leaf_names_iter` (#25319)
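Several of the performance entries above (streaming decompression for `scan_ndjson`/`scan_lines`, the CSV `COUNT(*)` fast path) share one idea: decompress the input incrementally and process complete lines as they appear, instead of inflating the whole file into memory first. A minimal pure-Python sketch of that pattern, using gzip via the stdlib; the function name is invented for illustration and the real implementation is Rust:

```python
import io
import zlib

def iter_decompressed_lines(raw, chunk_size=1 << 16):
    """Yield lines from a gzip stream without materializing the whole file.

    Illustrative sketch only, not the Polars implementation.
    """
    decomp = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)  # gzip framing
    buf = b""
    while True:
        chunk = raw.read(chunk_size)
        if not chunk:
            break
        buf += decomp.decompress(chunk)
        # Emit every complete line; keep the trailing partial line in buf.
        *complete, buf = buf.split(b"\n")
        for line in complete:
            yield line.decode()
    buf += decomp.flush()
    if buf:
        yield buf.decode()

# Round-trip three lines through gzip, reading in deliberately tiny chunks.
import gzip
lines = list(iter_decompressed_lines(io.BytesIO(gzip.compress(b"a\nb\nc")), chunk_size=4))
```

The same shape (bounded read buffer, carry-over of the partial trailing record) is what makes negative slices and schema inference on compressed NDJSON tractable without loading everything.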
✨ Enhancements
- Add primitive filter -> agg lowering in streaming GroupBy (#26459)
- Support for the SQL `FETCH` clause (#26449)
- Add get() to retrieve a byte from binary data (#26454)
- Remove with_context in SQL lowering (#26416)
- Avoid OOM in scan_ndjson and scan_lines when the input is compressed and a negative slice is used (#26396)
- Add JoinBuildSide (#26403)
- Support anonymous agg in-mem (#26376)
- Add unstable `arrow_schema` parameter to `sink_parquet` (#26323)
- Improve error message formatting for structs (#26349)
- Remove parquet field overwrites (#26236)
- Enable zero-copy object_store `put` upload for IPC sink (#26288)
- Improved disambiguation for qualified wildcard columns in SQL projections (#26301)
- Expose `upload_concurrency` through env var (#26263)
- Allow quantile to compute multiple quantiles at once (#25516)
- Allow empty LazyFrame in `LazyFrame.group_by(...).map_groups` (#26275)
- Use delta file statistics for batch predicate pushdown (#26242)
- Add streaming UnorderedUnion (#26240)
- Implement compression support for sink_ndjson (#26212)
- Add unstable record batch statistics flags to `{sink/scan}_ipc` (#26254)
- Cloud retry/backoff configuration via `storage_options` (#26204)
- Use same sort order for expanded paths across local / cloud / directory / glob (#26191)
- Expose physical plan `NodeStyle` (#26184)
- Add streaming merge-join (#25964)
- Serialize optimization flags for cloud plan (#26168)
- Add compression support to write_csv and sink_csv (#26111)
- Add `scan_lines` (#26112)
- Support regex in `str.split` (#26060)
- Add unstable IPC Statistics read/write to `scan_ipc`/`sink_ipc` (#26079)
- Add nulls support for all rolling_by operations (#26081)
- ArrowStreamExportable and sink_delta (#25994)
- Release musl builds (#25894)
- Implement streaming decompression for CSV `COUNT(*)` fast path (#25988)
- Add nulls support for rolling_mean_by (#25917)
- Add lazy `collect_all` (#25991)
- Add streaming decompression for NDJSON schema inference (#25992)
- Improved handling of unqualified SQL `JOIN` columns that are ambiguous (#25761)
- Expose record batch size in `{sink,write}_ipc` (#25958)
- Add `null_on_oob` parameter to `expr.get` (#25957)
- Suggest correct timezone if timezone validation fails (#25937)
- Support streaming IPC scan from S3 object store (#25868)
- Implement streaming CSV schema inference (#25911)
- Support hashing of meta expressions (#25916)
- Improve `SQLContext` recognition of possible table objects in the Python globals (#25749)
- Add pl.Expr.(min|max)_by (#25905)
- Improve MemSlice Debug impl (#25913)
- Implement or fix json encode/decode for (U)Int128, Categorical, Enum, Decimal (#25896)
- Expand scatter to more dtypes (#25874)
- Implement streaming CSV decompression (#25842)
- Add Series `sql` method for API consistency (#25792)
- Mark Polars as safe for free-threading (#25677)
- Support Binary and Decimal in arg_(min|max) (#25839)
- Allow Decimal parsing in str.json_decode (#25797)
- Add `shift` support for Object data type (#25769)
- Add node status to NodeMetrics (#25760)
- Allow scientific notation when parsing Decimals (#25711)
- Allow creation of `Object` literal (#25690)
- Don't collect schema in SQL union processing (#25675)
- Add `bin.slice()`, `bin.head()`, and `bin.tail()` methods (#25647)
- Add SQL support for the `QUALIFY` clause (#25652)
- New partitioned IO sink pipeline enabled for sink_parquet (#25629)
- Add SQL syntax support for `CROSS JOIN UNNEST(col)` (#25623)
- Add separate env var to log tracked metrics (#25586)
- Expose fields for generating physical plan visualization data (#25562)
- Allow pl.Object in pivot value (#25533)
- Extend SQL `UNNEST` support to handle multiple array expressions (#25418)
- Minor improvement for `as_struct` repr (#25529)
- Temporal `quantile` in rolling context (#25479)
- Add support for `Float16` dtype (#25185)
- Add strict parameter to pl.concat(how='horizontal') (#25452)
- Add leftmost option to `str.replace_many` / `str.find_many` / `str.extract_many` (#25398)
- Add `quantile` for missing temporals (#25464)
- Expose and document pl.Categories (#25443)
- Support decimals in search_sorted (#25450)
- Use reference to Graph pipes when flushing metrics (#25442)
- Add SQL support for named `WINDOW` references (#25400)
- Add Extension types (#25322)
- Add `having` to `group_by` context (#23550)
- Allow elementwise `Expr.over` in aggregation context (#25402)
- Add SQL support for `ROW_NUMBER`, `RANK`, and `DENSE_RANK` functions (#25409)
- Automatically Parquet dictionary encode floats (#25387)
- Add `empty_as_null` and `keep_nulls` to `{Lazy,Data}Frame.explode` (#25369)
- Allow `hash` for all `List` dtypes (#25372)
- Support `unique_counts` for all datatypes (#25379)
- Add `maintain_order` to `Expr.mode` (#25377)
- Display function of streaming physical plan `map` node (#25368)
- Allow `slice` on scalar in aggregation context (#25358)
- Allow `implode` and aggregation in aggregation context (#25357)
- Add `empty_as_null` and `keep_nulls` flags to `Expr.explode` (#25289)
- Add `ignore_nulls` to `first`/`last` (#25105)
- Move GraphMetrics into StreamingQuery (#25310)
- Allow `Expr.unique` on `List`/`Array` with non-numeric types (#25285)
- Allow `Expr.rolling` in aggregation contexts (#25258)
- Support additional forms of SQL `CREATE TABLE` statements (#25191)
- Add `LazyFrame.pivot` (#25016)
- Support column-positional SQL `UNION` operations (#25183)
- Allow arbitrary expressions as the `Expr.rolling` `index_column` (#25117)
- Allow arbitrary Expressions in "subset" parameter of `unique` frame method (#25099)
- Support arbitrary expressions in SQL `JOIN` constraints (#25132)
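Among the items above, the streaming merge-join (#25964) is worth a sketch: when both inputs arrive sorted by the join key, an inner join needs only two advancing cursors rather than a hash table, which is what makes it streamable. A minimal, illustrative pure-Python version (the real node is Rust and operates on morsels, not Python lists):

```python
def merge_join(left, right):
    """Inner join of two lists of (key, value) pairs, both sorted by key.

    Sketch of the sorted merge-join idea: advance two cursors and emit the
    cross product of equal-key runs. Illustrative only.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            run_start = j
            while j < len(right) and right[j][0] == lk:
                out.append((lk, left[i][1], right[j][1]))
                j += 1
            i += 1
            if i < len(left) and left[i][0] == lk:
                j = run_start  # a duplicate left key re-scans the right run

    return out

joined = merge_join([(1, "a"), (1, "b"), (2, "c")], [(1, "x"), (2, "y")])
```

Because each cursor only moves forward (except the bounded re-scan of an equal-key run), the operator needs memory proportional to the current key run, not to either full input.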
🐞 Bug fixes
- Do not overwrite used names in cluster_with_columns pushdown (#26467)
- Do not mark output of concat_str on multiple inputs as sorted (#26468)
- Fix CSV schema inference content line duplication bug (#26452)
- Fix InvalidOperationError using `scan_delta` with filter (#26448)
- Alias giving missing column after streaming GroupBy CSE (#26447)
- Ensure `by_name` selector selects only names (#26437)
- Restore compatibility of strings written to parquet with pyarrow filter (#26436)
- Update schema in cluster_with_columns optimization (#26430)
- Fix negative slice in groups slicing (#26442)
- Don't run CPU check on aarch64 musl (#26439)
- Remove the `POLARS_IDEAL_MORSEL_SIZE` monkeypatching in the parametric merge-join test (#26418)
- Correct off-by-one in RLE row counting for nullable dictionary-encoded columns (#26411)
- Support very large integers in env var limits (#26399)
- Fix PlPath panic from incorrect slicing of UTF8 boundaries (#26389)
- Fix Float dtype for spearman correlation (#26392)
- Fix optimizer panic in right joins with type coercion (#26365)
- Don't serialize retry config from local environment vars (#26289)
- Fix `PartitionBy` with scalar key expressions and `diff()` (#26370)
- Add {Float16, Float32} -> Float32 lossless upcast (#26373)
- Fix panic using `with_columns` and `collect_all` (#26366)
- Add multi-page support for writing dictionary-encoded Parquet columns (#26360)
- Ensure slice advancement when skipping non-inlinable values in `is_in` with inlinable needles (#26361)
- Pin `xlsx2csv` version temporarily (#26352)
- Bugs in ViewArray total_bytes_len (#26328)
- Overflow in i128::abs in Decimal fits check (#26341)
- Make Expr.hash on Categorical mapping-independent (#26340)
- Clone shared GroupBy node before mutation in physical plan creation (#26327)
- Fix lazy evaluation of replace_strict by making it fallible (#26267)
- Consider the "current location" of an item when computing `rolling_rank_by` (#26287)
- Reset `is_count_star` flag between queries in collect_all (#26256)
- Fix incorrect is_between filter on scan_parquet (#26284)
- Lower AnonymousStreamingAgg in group-by as aggregate (#26258)
- Avoid overflow in `pl.duration` scalar arguments case (#26213)
- Broadcast arr.get on single array with multiple indices (#26219)
- Fix panic on CSPE with sorts (#26231)
- Fix UB in `DataFrame::transpose_from_dtype` (#26203)
- Eager `DataFrame.slice` with negative offset and `length=None` (#26215)
- Use correct schema side for streaming merge join lowering (#26218)
- Implement expression keys for merge-join (#26202)
- Overflow panic in `scan_csv` with multiple files and `skip_rows + n_rows` larger than total row count (#26128)
- Respect `allow_object` flag after cache (#26196)
- Raise error on non-elementwise PartitionBy keys (#26194)
- Allow ordered categorical dictionary in scan_parquet (#26180)
- Allow excess bytes on IPC bitmap compressed length (#26176)
- Address buggy quadratic scaling fix in scan_csv (#26175)
- Address a macOS-specific compile issue (#26172)
- Fix deadlock on `hash_rows()` of 0-width DataFrame (#26154)
- Fix NameError filtering pyarrow dataset (#26166)
- Fix concat_arr panic when using categoricals/enums (#26146)
- Fix NDJSON/scan_lines negative slice splitting with extremely long lines (#26132)
- Incorrect group_by min/max fast path (#26139)
- Remove a source of non-determinism from lowering (#26137)
- Error when `with_row_index` or `unpivot` create duplicate columns on a `LazyFrame` (#26107)
- Panics on shift with head (#26099)
- Optimize slicing support on compressed IPC (#26071)
- CPU check for musl builds (#26076)
- Fix slicing on compressed IPC (#26066)
- Release GIL on collect_batches (#26033)
- Missing buffer update in String is_in Parquet pushdown (#26019)
- Make `struct.with_fields` data model coherent (#25610)
- Incorrect output order for order sensitive operations after join_asof (#25990)
- Use SeriesExport for pyo3-polars FFI (#26000)
- Don't write Parquet min/max statistics for i128 (#25986)
- Ensure chunk consistency in in-memory join (#25979)
- Fix varying block metadata length in IPC reader (#25975)
- Implement collect_batches properly in Rust (#25918)
- Fix panic on arithmetic with bools in list (#25898)
- Convert to index type with strict cast in some places (#25912)
- Empty dataframe in streaming non-strict hconcat (#25903)
- Infer large u64 in json as i128 (#25904)
- Set http client timeouts to 10 minutes (#25902)
- Prevent panic when comparing `Date` with `Duration` types (#25856)
- Correct lexicographic ordering for Parquet BYTE_ARRAY statistics (#25886)
- Raise error on duplicate `group_by` names in `upsample()` (#25811)
- Correctly export view buffer sizes nested in Extension types (#25853)
- Fix `DataFrame.estimated_size` not handling overlapping chunks correctly (#25775)
- Ensure Kahan sum does not introduce NaN from infinities (#25850)
- Trim excess bytes in parquet decode (#25829)
- Reshape checks size to match exactly (#25571)
- Fix panic/deadlock sinking parquet with rows larger than 64MB estimated size (#25836)
- Fix quantile `midpoint` interpolation (#25824)
- Don't use cast when converting from physical in list.get (#25831)
- Invalid null count on int -> categorical cast (#25816)
- Update groups in `list.eval` (#25826)
- Use downcast before FFI conversion in PythonScan (#25815)
- Double-counting of row metrics (#25810)
- Cast nulls to expected type in streaming union node (#25802)
- Incorrect slice pushdown into map_groups (#25809)
- Fix panic writing parquet with single bool column (#25807)
- Fix upsample with `group_by` incorrectly introducing NULLs on group key columns (#25794)
- Panic in top_k pruning (#25798)
- Fix documentation for new() (#25791)
- Fix incorrect `collect_schema` for unpivot followed by join (#25782)
- Fix documentation for `tail()` (#25784)
- Verify `arr` namespace is called from array column (#25650)
- Ensure `LazyFrame.serialize()` unchanged after `collect_schema()` (#25780)
- Function map_(rows|elements) with return_dtype = pl.Object (#25753)
- Avoid visiting nodes multiple times in PhysicalPlanVisualizationDataGenerator (#25737)
- Fix incorrect cargo sub-feature (#25738)
- Fix deadlock on empty scan IR (#25716)
- Don't invalidate node in cluster-with-columns (#25714)
- Move `boto3` extra from s3fs in dev requirements (#25667)
- Binary slice methods missing from Series and docs (#25683)
- Mix-up of variable_name/value_name in unpivot (#25685)
- Invalid usage of `drop_first` in `to_dummies` when nulls present (#25435)
- Rechunk on nested dtypes in `take_unchecked_impl` parallel path (#25662)
- New single file IO sink pipeline enabled for sink_parquet (#25670)
- Fix streaming `SchemaMismatch` panic on `list.drop_nulls` (#25661)
- Correct overly eager local predicate insertion for unpivot (#25644)
- Fix "dtype is unknown" panic in cross joins with literals (#25658)
- Fix panic on Boolean `rolling_sum` calculation for list or array eval (#25660)
- Preserve List inner dtype during chunked take operations (#25634)
- Fix panic edge-case when scanning hive partitioned data (#25656)
- Fix lifetime for `AmortSeries` lazy group iterator (#25620)
- Improve SQL `GROUP BY` and `ORDER BY` expression resolution, handling aliasing edge-cases (#25637)
- Fix empty format handling (#25638)
- Prevent false positives in is_in for large integers (#25608)
- Optimize projection pushdown through HConcat (#25371)
- Differentiate between empty list and no list for unpivot (#25597)
- Properly resolve `HAVING` clause during SQL `GROUP BY` operations (#25615)
- Fix spearman panicking on nulls (#25619)
- Increase precision when constructing `float` Series (#25323)
- Make sum on strings error in group_by context (#25456)
- Hang in multi-chunk DataFrame .rows() (#25582)
- Bug in boolean unique_counts (#25587)
- Set `Float16` parquet schema type to `Float16` (#25578)
- Correct arr_to_any_value for object arrays (#25581)
- Have `PySeries::new_f16` receive `pf16`s instead of `f32`s (#25579)
- Fix occurrence of exact matches of `.join_asof(strategy="nearest", allow_exact_matches=False, ...)` (#25506)
- Raise error on out-of-range dates in temporal operations (#25471)
- Fix incorrect `.list.eval` after slicing operations (#25540)
- Reduce HuggingFace API calls (#25521)
- Strict conversion AnyValue to Struct (#25536)
- Fix panic in is_between support in streaming Parquet predicate push down (#25476)
- Always respect return_dtype in map_elements and map_rows (#25504)
- Rolling `mean`/`median` for temporals (#25512)
- Add `.rolling_rank()` support for temporal types and `pl.Boolean` (#25509)
- Fix dictionary replacement error in `write_ipc()` (#25497)
- Fix group lengths check in `sort_by` with `AggregatedScalar` (#25503)
- Fix expr slice pushdown causing shape error on literals (#25485)
- Allow empty list in `sort_by` in `list.eval` context (#25481)
- Prevent panic when joining sorted LazyFrame with itself (#25453)
- Apply CSV dict overrides by name only (#25436)
- Incorrect result in aggregated `first`/`last` with `ignore_nulls` (#25414)
- Fix off-by-one bug in `ColumnPredicates` generation for inequalities operating on integer columns (#25412)
- Fix `arr.{eval,agg}` in aggregation context (#25390)
- Support `AggregatedList` in `list.{eval,agg}` context (#25385)
- Improve SQL `UNNEST` behaviour (#22546)
- Remove `ClosableFile` (#25330)
- Use Cargo.template.toml to prevent git dependencies from using template (#25392)
- Resolve edge-case with SQL aggregates that have the same name as one of the `GROUP BY` keys (#25362)
- Revert `pl.format` behavior with nulls (#25370)
- Remove `Expr` casts in `pl.lit` invocations (#25373)
- Nested dtypes in streaming `first_non_null`/`last_non_null` (#25375)
- Correct `eq_missing` for struct with nulls (#25363)
- Unique on literal in aggregation context (#25359)
- Allow `implode` and aggregation in aggregation context (#25357)
- Aggregation with `drop_nulls` on literal (#25356)
- Address multiple issues with SQL `OVER` clause behaviour for window functions (#25249)
- Schema mismatch with `list.agg`, `unique` and scalar (#25348)
- Correct `drop_items` for scalar input (#25351)
- SQL `NATURAL` joins should coalesce the key columns (#25353)
- Mark `{forward,backward}_fill` as `length_preserving` (#25352)
- Nested dtypes in streaming `first`/`last` (#25298)
- AnyValue::to_physical for categoricals (#25341)
- Fix link errors reported by `markdown-link-check` (#25314)
- Parquet `is_in` for mixed validity pages (#25313)
- Fix length preserving check for `eval` expressions in streaming engine (#25294)
- Fix building polars-plan with features lazy,concat_str (but no strings) (#25306)
- Fix building polars-mem-engine with the async feature (#25300)
- Don't quietly allow unsupported SQL `SELECT` clauses (#25282)
- Fix small bug with `PyExpr` to `PyObject` conversion (#25265)
- Reverse on chunked `struct` (#25281)
- Panic exception when calling `Expr.rolling` in `.over` (#25283)
- Correct `{first,last}_non_null` if there are empty chunks (#25279)
- Incorrect results for aggregated `{n_,}unique` on bools (#25275)
- Fix building polars-expr without timezones feature (#25254)
- Ensure out-of-range integers and other edge case values don't give wrong results for index_of() (#24369)
- Correctly prune projected columns in hints (#25250)
- Allow `Null` dtype values in `scatter` (#25245)
- Correctly handle requested stops in streaming shift (#25239)
- Make `str.json_decode` output deterministic with lists (#25240)
- Wide-table join performance regression (#25222)
- Fix single-column CSV header duplication with leading empty lines (#25186)
- Enhanced column resolution/tracking through multi-way SQL joins (#25181)
- Fix serialization of lazyframes containing huge tables (#25190)
- Use (i64, u64) for VisualizationData (offset, length) slices (#25203)
- Fix assertion panic on `group_by` (#25179)
- Fix `format_str` in case of multiple chunks (#25162)
- Fix incorrect `drop_nans()` result when used in `group_by()`/`over()` (#25146)
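One fix above is worth unpacking: "Ensure Kahan sum does not introduce NaN from infinities" (#25850). In compensated summation the correction term `(t - total) - y` evaluates to `inf - inf = NaN` once the running total overflows to infinity, poisoning the rest of the sum. A pure-Python sketch of the technique with a non-finite guard (illustrative only, not the Polars kernel, which is Rust):

```python
import math

def kahan_sum(values):
    """Compensated (Kahan) summation with a non-finite guard.

    Without the guard, once `total` is +/-inf the compensation term
    (t - total) - y becomes inf - inf = NaN. Sketch of the idea only.
    """
    total = 0.0
    comp = 0.0  # running compensation for lost low-order bits
    for v in values:
        if not math.isfinite(total):
            total += v  # fall back to plain summation once non-finite
            continue
        y = v - comp
        t = total + y
        comp = (t - total) - y
        total = t
    return total

ok = kahan_sum([1.0, 2.0, 3.0])             # ordinary finite case
inf_total = kahan_sum([float("inf"), 1.0])  # stays inf instead of NaN
```

Without the `math.isfinite` check, the second call would return NaN: the compensation computed on the overflow step contaminates every later iteration.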
📖 Documentation
- Fix typo in max_by docstring (#26404)
- Remove deprecated `cublet_id` (#26260)
- Update for new release (#26255)
- Update MCP server section with new URL (#26241)
- Fix unmatched paren and punctuation in pandas migration guide (#26251)
- Add observatory database_path to docs (#26201)
- Note plugins in Python user-defined functions (#26138)
- Clarify min_by/max_by behavior on ties (#26077)
- Add `QUALIFY` clause and `SUBSTRING` function to the SQL docs (#25779)
- Update mixed-offset datetime parsing example in user guide (#25915)
- Update bare-metal docs for mounted anonymous results (#25801)
- Fix credential parameter name in cloud-storage.py (#25788)
- Configuration options update (#25756)
- Fix typos in Excel and Pandas migration guides (#25709)
- Add "right" to `how` options in `join()` docstrings (#25678)
- Document schema parameter in meta methods (#25543)
- Correct link to `datetime_range` instead of `date_range` in resampling page (#25532)
- Explain aggregation & sorting of lists (#25260)
- Update `LazyFrame.collect_schema()` docstring (#25508)
- Remove lzo from parquet write options (#25522)
- Update on-premise documentation (#25489)
- Fix incorrect 'bitwise' in `any_horizontal`/`all_horizontal` docstring (#25469)
- Add Extension and BaseExtension to doc index (#25444)
- Add polars-on-premise documentation (#25431)
- Fix link errors reported by `markdown-link-check` (#25314)
- Fix LanceDB URL (#25198)
📦 Build system
- Address remaining Python 3.14 issues with `make requirements-all` (#26195)
- Address a macOS-specific compile issue (#26172)
- Fix `make fmt` and `make lint` commands (#25200)
🛠️ Other improvements
- Move IO source metrics instrumentation to `PolarsObjectStore` (#26414)
- More SQL to IR conversion `execute_isolated` (#26455)
- Cleanup unused attributes in optimizer (#26464)
- Use `Expr::Display` as catch-all for IR - DSL asymmetry (#26471)
- Remove the `POLARS_IDEAL_MORSEL_SIZE` monkeypatching in the parametric merge-join test (#26418)
- Move IO metrics struct to `polars-io` and use new timer (#26397)
- Reduce blocking on computational executor threads in multiscan init (#26407)
- Cleanup the parametric merge-join test (#26413)
- Ensure local doctests skip `from_torch` if module not installed (#26405)
- Implement various deprecations (#26314)
- Refactor `MinBy` and `MaxBy` as `IRFunction`s (#26307)
- Rename `Operator::Divide` to `RustDivide` (#26339)
- Properly disable the Pyodide tests (#26382)
- Add LiveTimer (#26384)
- Use derived serialization on `PlRefPath` (#26167)
- Add metadata to `ArrowSchema` struct (#26318)
- Remove unused field (#26367)
- Fix runtime nesting (#26359)
- Remove `xlsx2csv` dependency pin (#26355)
- Allow unchecked IPC reads (#26354)
- Use outer runtime if it exists in to_alp (#26353)
- Make CategoricalMapping::new pub(crate) to avoid misuse (#26308)
- Clarify IPC buffer read limit/length parameter (#26334)
- Improve accuracy of active IO time metric (#26315)
- Mark `VarState` as `repr(C)` (#26309)
- IO metrics for streaming Parquet / IPC sources (#26300)
- Replace panicking index access with error handling in `dictionaries_to_encode` (#26059)
- Remove unnecessary match and move early return in testing (#26297)
- Add dtype test coverage for delta predicate filter (#26291)
- Add property-based tests for `Scalar::cast_with_options` (#25744)
- Add AI policy (#26286)
- Remove MemSlice (#26259)
- Remove recursion from `upsample_impl` (#26250)
- Remove all non-CSV fast-count paths (#26233)
- Replace MemReader with Cursor (#26216)
- Add `serde(default)` to new CSV compression fields (#26210)
- Add a couple of `SAFETY` comments in merge-join node (#26197)
- Expose physical plan `NodeStyle` (#26184)
- Ensure optimization flag modification happens locally (#26185)
- Use NullChunked as default for Series (#26181)
- In merge-sorted node, when buffering, request a stop on *every* unbuffered morsel (#26178)
- Rename `io_sinks2` -> `io_sinks` (#26159)
- Lint leftover fixme (#26122)
- Move Buffer and SharedStorage to polars-buffer crate (#26113)
- Remove old sink IR (#26130)
- Use derived serialization on `PlRefPath` (#26062)
- Improve backtrace for `POLARS_PANIC_ON_ERR` (#26125)
- Fix Python docs build (#26117)
- Remove old streaming sink implementation (#26102)
- Disable `unused-ignore` mypy lint (#26110)
- Remove unused equality impls for IR / FunctionIR (#26106)
- Ignore mypy warning (#26105)
- Preserve order for string concatenation (#26101)
- Raise error on `file://hostname/path` (#26061)
- Disable debug info for docs workflow (#26086)
- Remove IR / physical plan visualization data generators (#26090)
- Update docs for next polars cloud release (#26091)
- Support Python 3.14 in dev environment (#26073)
- Mark top slow normal tests as slow (#26080)
- Simplify `PlPath` (#26053)
- Update breaking deps (#26055)
- Fix for upstream url bug and update deps (#26052)
- Properly pin chrono (#26051)
- Don't run rust doctests (#26046)
- Update deps (#26042)
- Ignore very slow test (#26041)
- Add Send bound for SharedStorage owner (#26040)
- Update rust compiler (#26017)
- Improve csv test coverage (#25980)
- Use `from_any_values_and_dtype` in `Series::extend_constant` (#26006)
- Pass `sync_on_close` and `num_pipelines` via `start_file_writer` for IO sinks (#25950)
- Add `broadcast_nulls` field to `RowEncodingVariant` and `_get_rows_encoded_{ca,arr}` (#26001)
- Ramp up CSV read size (#25997)
- Rename `FileType` to `FileWriteFormat` (#25951)
- Don't unwrap first sink morsel send (#25981)
- Update `ruff` action and simplify version handling (#25940)
- Cleanup Rust DataFrame interface (#25976)
- Export PhysNode related struct (#25987)
- Restructure Sort variant in logical and physical plans visualization data (#25978)
- Run python lint target as part of pre-commit (#25982)
- Allow multiple inputs to streaming GroupBy node (#25961)
- Disable HTTP timeout for receiving response body (#25970)
- Add AI contribution policy (#25956)
- Remove unused sink code (#25949)
- Add detailed Sink info to IRNodeProperties (#25954)
- Wrap `FileScanIR::Csv` enum variant in `Arc` (#25952)
- Use PlSmallStr for CSV format strings (#25901)
- Add unsafe bound to MemSlice::from_arc (#25920)
- Improve MemSlice Debug impl (#25913)
- Remove manual cmp impls for `&[u8]` (#25890)
- Remove and deprecate batched csv reader (#25884)
- Remove unused AnonymousScan functions (#25872)
- Use Buffer<T> instead of Arc<[T]> to store stringview buffers (#25870)
- Add `TakeableRowsProvider` for IO sinks (#25858)
- Filter DeprecationWarning from pyparsing indirectly through pyiceberg (#25854)
- Various small improvements (#25835)
- Clear venv with appropriate version of Python (#25851)
- Move CSV write logic to `CsvSerializer` (#25828)
- Ensure Polars Object extension type is registered (#25813)
- Harden Python object process ID (#25812)
- Skip schema inference if schema provided for `scan_csv`/`ndjson` (#25757)
- Ensure proper async connection cleanup on DB test exit (#25766)
- Flip `has_residual_predicate` -> `no_residual_predicate` (#25755)
- Track original length before file filtering in scan IR (#25717)
- Ensure we uninstall other Polars runtimes in CI (#25739)
- Make 'make requirements' more robust (#25693)
- Remove duplicate compression level types (#25723)
- Replace async blocks with named components in new parquet write pipeline (#25695)
- Move Object `lit` fix earlier in the function (#25713)
- Remove unused decimal file (#25701)
- Move `boto3` extra from s3fs in dev requirements (#25667)
- Upgrade to latest version of `sqlparser-rs` (#25673)
- Update slab to version without RUSTSEC (#25686)
- Fix typo (#25684)
- Avoid rechunk requirement for Series.iter() (#25603)
- Use dtype for group_aware evaluation on `ApplyExpr` (#25639)
- Make polars-plan constants more consistent (#25645)
- Add "panic" and "streaming" tagging to `issue-labeler` workflow (#25657)
- Add support for multi-column reductions (#25640)
- Fix rolling kernel dispatch with `monotonic` group attribute (#25494)
- Simplify _write_any_value (#25622)
- Ensure we hash all attributes and visit all children in `traverse_and_hash_aexpr` (#25627)
- Ensure literal-only `SELECT` broadcast conforms to SQL semantics (#25633)
- Add parquet file write pipeline for new IO sinks (#25618)
- Rename polars-on-premise to polars-on-premises (#25617)
- Constrain new `issue-labeler` workflow to the Issue title (#25614)
- Add streaming IO sink components (#25594)
- Help categorise Issues by automatically applying labels (using the same patterns used for labelling PRs) (#25599)
- Show on streaming engine (#25589)
- Add `arg_sort()` and `Writeable::as_buffered()` (#25583)
- Take task priority argument in `parallelize_first_to_local` (#25563)
- Skip existing files in pypi upload (#25576)
- Fix template path in release-python workflow (#25565)
- Skip rust integration tests for coverage in CI (#25558)
- Add asserts and tests for `list.eval` on multiple chunks with slicing (#25559)
- Rename `URL_ENCODE_CHARSET` to `HIVE_ENCODE_CHARSET` (#25554)
- Add `assert_sql_matches` coverage for SQL `DISTINCT` and `DISTINCT ON` syntax (#25440)
- Use strong hash instead of traversal for CSPE equality (#25537)
- Update partitioned sink IR (#25524)
- Print expected DSL schema hashes if mismatched (#25526)
- Remove verbose prints on file opens (#25523)
- Add `proptest` `AnyValue` strategies (#25510)
- Fix --uv argument for benchmark-remote (#25513)
- Add `proptest` `DataFrame` strategy (#25446)
- Run `maturin` with `--uv` option (#25490)
- Remove some dead argminmax impl code (#25501)
- Fix feature gating TZ_AWARE_RE again (#25493)
- Take `sync` parameter in `Writeable::close()` (#25475)
- Fix unsoundness in ChunkedArray::{first, last} (#25449)
- Add some cleanup (#25445)
- Test for `group_by(...).having(...)` (#25430)
- Accept multiple files in `pipe_with_schema` (#25388)
- Remove aggregation context `Context` (#25424)
- Take `&dyn Any` instead of `Box<dyn Any>` in python object converters (#25421)
- Refactor sink IR (#25308)
- Remove `ClosableFile` (#25330)
- Remove debug file write from test suite (#25393)
- Add `ElementExpr` for `_eval` expressions (#25199)
- Dispatch `Series.set` to `zip_with_same_dtype` (#25327)
- Better coverage for `group_by` aggregations (#25290)
- Add oneshot channel to polars-stream (#25378)
- Enable more streaming tests (#25364)
- Remove `Column::Partitioned` (#25324)
- Remove incorrect cast in reduce code (#25321)
- Add toolchain file to runtimes for sdist (#25311)
- Remove `PyPartitioning` (#25303)
- Directly take `CloudScheme` in `parse_cloud_options()` (#25304)
- Refactor `dt_range` functions (#25225)
- Fix typo in CI release workflow (#25309)
- Use dedicated runtime packages from template (#25284)
- Add `proptest` strategies for Series nested types (#25220)
- Simplify sink parameter passing from Python (#25302)
- Add test for unique with column subset (#25241)
- Fix Decimal precision annotation (#25227)
- Add `LazyFrame.pivot` (#25016)
- Clean up CSPE callsite (#25215)
- Avoid relabelling changes-dsl on every commit (#25216)
- Move ewm variance code to polars-compute (#25188)
- Upgrade to schemars 0.9.0 (#25158)
- Update markdown link checker (#25201)
- Automatically label pull requests that change the DSL (#25177)
- Add reliable test for `pl.format` on multiple chunks (#25164)
- Move supertype determination and casting to IR for `date_range` and related functions (#24084)
- Make python docs build again (#25165)
- Make `pipe_with_schema` work on Arced schema (#25155)
- Add functions for `scan_lines` (#25136)
- Remove lower_ir conversion from Scan to InMemorySource (#25150)
- Update versions (#25141)
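Among the internals above, "Add oneshot channel to polars-stream (#25378)" refers to a channel that transfers exactly one value between two tasks. A pure-Python sketch of that contract (the real channel is async Rust inside polars-stream; this class and its method names are invented for illustration):

```python
import queue
import threading

class Oneshot:
    """Single-use channel: exactly one send, one blocking recv.

    Sketch of the oneshot-channel contract only, not the
    polars-stream implementation.
    """

    def __init__(self):
        self._q = queue.Queue(maxsize=1)
        self._lock = threading.Lock()
        self._sent = False

    def send(self, value):
        with self._lock:
            if self._sent:
                raise RuntimeError("oneshot channel already consumed")
            self._sent = True
        self._q.put(value)

    def recv(self):
        # Blocks until the single value arrives.
        return self._q.get()

# Example: hand one result from a worker thread back to the caller.
ch = Oneshot()
worker = threading.Thread(target=ch.send, args=(42,))
worker.start()
result = ch.recv()
worker.join()
```

The single-use restriction is the point: it encodes "this value is produced once and consumed once" in the type of the channel, which is cheaper and harder to misuse than a general multi-producer queue.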
Thank you to all our contributors for making this release possible!
@AndreaBozzo, @Atarust, @DannyStoll1, @EndPositive, @JakubValtar, @Jesse-Bakker, @Kevin-Patyk, @LeeviLindgren, @MarcoGorelli, @Matt711, @MrAttoAttoAtto, @TNieuwdorp, @Voultapher, @WaffleLapkin, @agossard, @alex-gregory-ds, @alexander-beedie, @anosrepenilno, @arlyon, @azimafroozeh, @bayoumi17m, @borchero, @c-peters, @cBournhonesque, @camriddell, @carnarez, @cmdlineluser, @coastalwhite, @cr7pt0gr4ph7, @davanstrien, @davidia, @dependabot[bot], @dsprenkels, @edizeqiri, @eitanf, @etiennebacher, @feliblo, @gab23r, @guilhem-dvr, @hallmason17, @hamdanal, @henryharbeck, @hutch3232, @ion-elgreco, @itamarst, @jamesfricker, @jannickj, @jetuk, @jqnatividad, @kdn36, @lun3x, @m1guelperez, @marinegor, @mcrumiller, @nameexhaustion, @orlp, @pomo-mondreganto, @qxzcode, @r-brink, @ritchie46, @sachinn854, @stijnherfst, @sweb, @tlauli, @vyasr, @wtn and @yonikremer