pola-rs/polars py-1.38.0 on GitHub

⚠️ Deprecations

Deprecate retries=n in favor of storage_options={"max_retries": n} (#26155)
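
For callers that still pass the deprecated argument, the migration can be sketched as below. This is only an illustration: `migrate_retries` is a hypothetical helper, not part of Polars, and the `aws_region` entry is a placeholder option. The only detail taken from the note above is that `retries=n` becomes `storage_options={"max_retries": n}`.

```python
def migrate_retries(kwargs: dict) -> dict:
    """Return a copy of kwargs with retries=n moved into storage_options.

    Hypothetical helper for illustration; it is not a Polars API.
    """
    out = dict(kwargs)
    if "retries" not in out:
        return out
    n = out.pop("retries")
    storage_options = dict(out.get("storage_options") or {})
    # Assumption: an explicit max_retries already set by the caller wins
    # over the legacy retries value.
    storage_options.setdefault("max_retries", n)
    out["storage_options"] = storage_options
    return out


# Placeholder keyword arguments as they might have been written before.
old_kwargs = {"retries": 3, "storage_options": {"aws_region": "us-east-1"}}
new_kwargs = migrate_retries(old_kwargs)
print(new_kwargs)
```

The resulting dictionary could then be passed to a scan function, for example `pl.scan_parquet(path, **new_kwargs)`.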

🚀 Performance improvements

Enable zero-copy object_store put upload for IPC sink (#26288)
Resolve file schemas and metadata concurrently (#26325)
Run elementwise CSEE for the streaming engine (#26278)
Disable morsel splitting for fast-count on streaming engine (#26245)
Implement streaming decompression for scan_ndjson and scan_lines (#26200)
Improve string slicing performance (#26206)
Refactor scan_delta to use python dataset interface (#26190)
Add dedicated kernel for group-by arg_max/arg_min (#26093)
Add streaming merge-join (#25964)
Generalize Bitmap::new_zeroed opt for Buffer::zeroed (#26142)
Reduce fs stat calls in path expansion (#26173)
Lower streaming group_by n_unique to unique().len() (#26109)

✨ Enhancements

Avoid OOM for scan_ndjson and scan_lines if input is compressed and negative slice (#26396)
Support anonymous agg in-mem (#26376)
Add unstable arrow_schema parameter to sink_parquet (#26323)
Improve error message formatting for structs (#26349)
Remove parquet field overwrites (#26236)
Enable zero-copy object_store put upload for IPC sink (#26288)
Improved disambiguation for qualified wildcard columns in SQL projections (#26301)
Expose upload_concurrency through env var (#26263)
Allow quantile to compute multiple quantiles at once (#25516)
Allow empty LazyFrame in LazyFrame.group_by(...).map_groups (#26275)
Use delta file statistics for batch predicate pushdown (#26242)
Add streaming UnorderedUnion (#26240)
Implement compression support for sink_ndjson (#26212)
Add unstable record batch statistics flags to {sink/scan}_ipc (#26254)
Support CSE for python UDFs on the same address (#26253)
Cloud retry/backoff configuration via storage_options (#26204)
Use same sort order for expanded paths across local / cloud / directory / glob (#26191)
Add streaming merge-join (#25964)
Serialize optimization flags for cloud plan (#26168)
Add compression support to write_csv and sink_csv (#26111)
Add scan_lines (#26112)
Support regex in str.split (#26060)
Add unstable IPC Statistics read/write to scan_ipc/sink_ipc (#26079)
Add unstable height parameter to DataFrame/LazyFrame (#26014)
Remove old partition sink API (#26100)
Expose ArrowStreamExportable on python collect batches iterator (#26074)
Add nulls support for all rolling_by operations (#26081)

🐞 Bug fixes

Correct off-by-one in RLE row counting for nullable dictionary-encoded columns (#26411)
Support very large integers in env var limits (#26399)
Fix PlPath panic from incorrect slicing of UTF8 boundaries (#26389)
Fix Float dtype for spearman correlation (#26392)
Fix optimizer panic in right joins with type coercion (#26365)
Don't serialize retry config from local environment vars (#26289)
Fix PartitionBy with scalar key expressions and diff() (#26370)
Add {Float16, Float32} -> Float32 lossless upcast (#26373)
Fix panic using with_columns and collect_all (#26366)
Add multi-page support for writing dictionary-encoded Parquet columns (#26360)
Ensure slice advancement when skipping non-inlinable values in is_in with inlinable needles (#26361)
Pin xlsx2csv version temporarily (#26352)
Bugs in ViewArray total_bytes_len (#26328)
Overflow in i128::abs in Decimal fits check (#26341)
Make Expr.hash on Categorical mapping-independent (#26340)
Clone shared GroupBy node before mutation in physical plan creation (#26327)
Fixed "sheet_name" typing for read_ods and read_excel (#26317)
Improve Polars dtype inference from Python Union typing (#26303)
Consider the "current location" of an item when computing rolling_rank_by (#26287)
Reset is_count_star flag between queries in collect_all (#26256)
Fix incorrect is_between filter on scan_parquet (#26284)
Make polars compatible with ty (#26270)
Lower AnonymousStreamingAgg in group-by as aggregate (#26258)
Avoid overflow in pl.duration scalar arguments case (#26213)
Broadcast arr.get on single array with multiple indices (#26219)
Fix panic on CSPE with sorts (#26231)
Eager DataFrame.slice with negative offset and length=None (#26215)
Use correct schema side for streaming merge join lowering (#26218)
Overflow panic in scan_csv with multiple files and skip_rows + n_rows larger than total row count (#26128)
Respect allow_object flag after cache (#26196)
Raise error on non-elementwise PartitionBy keys (#26194)
Allow ordered categorical dictionary in scan_parquet (#26180)
Allow excess bytes on IPC bitmap compressed length (#26176)
Address a macOS-specific compile issue (#26172)
Fix deadlock on hash_rows() of 0-width DataFrame (#26154)
Fix NameError filtering pyarrow dataset (#26166)
Fix concat_arr panic when using categoricals/enums (#26146)
Fix NDJSON/scan_lines negative slice splitting with extremely long lines (#26132)
Incorrect group_by min/max fast path (#26139)
Remove a source of non-determinism from lowering (#26137)
Error when with_row_index or unpivot create duplicate columns on a LazyFrame (#26107)
Panics on shift with head (#26099)
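
As background for the `i128::abs` entry (#26341): in 128-bit two's complement the minimum value has no positive counterpart, so taking its absolute value overflows. The sketch below simulates this in plain Python. The function names are hypothetical, the precision of 38 digits is chosen for illustration, and the bounds-based check is just one overflow-free formulation, not necessarily how the fix was implemented.

```python
I128_MIN = -(1 << 127)
I128_MAX = (1 << 127) - 1


def wrapping_abs_i128(v: int) -> int:
    """Mimic abs() on a 128-bit signed integer with wraparound on overflow."""
    a = abs(v) & ((1 << 128) - 1)  # keep the low 128 bits
    if a >= 1 << 127:              # reinterpret as a signed value
        a -= 1 << 128
    return a


def naive_fits(v: int, precision: int) -> bool:
    """Fits check built on the wrapping abs; wrong for I128_MIN."""
    return wrapping_abs_i128(v) < 10**precision


def safe_fits(v: int, precision: int) -> bool:
    """Fits check that compares against both bounds and never calls abs."""
    limit = 10**precision
    return -limit < v < limit


print(wrapping_abs_i128(I128_MIN) == I128_MIN)  # the "absolute value" stays negative
print(naive_fits(I128_MIN, 38), safe_fits(I128_MIN, 38))
```

Because the wrapped result is still negative, the naive comparison accepts a value far outside the allowed range, while the two-sided comparison rejects it.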

📖 Documentation

Fix Expr.get referencing incorrect dtype for index parameter (#26364)
Fix Expr.quantile formatting (#26351)
Drop sphinx-llms-txt extension (#26285)
Remove deprecated cublet_id (#26260)
Update for new release (#26255)
Update MCP server section with new URL (#26241)
Fix unmatched paren and punctuation in pandas migration guide (#26251)
Add observatory database_path to docs (#26201)
Note plugins in Python user-defined functions (#26138)

📦 Build system

Address remaining Python 3.14 issues with make requirements-all (#26195)
Address a macOS-specific compile issue (#26172)

🛠️ Other improvements

Ensure local doctests skip from_torch if module not installed (#26405)
Change linked timezones in test suite to canonical timezones (#26310)
Implement various deprecations (#26314)
Rename Operator::Divide to RustDivide (#26339)
Properly disable the Pyodide tests (#26382)
Remove unused field (#26367)
Fix runtime nesting (#26359)
Remove xlsx2csv dependency pin (#26355)
Use outer runtime if exists in to_alp (#26353)
Make CategoricalMapping::new pub(crate) to avoid misuse (#26308)
Clarify IPC buffer read limit/length parameter (#26334)
Add dtype test coverage for delta predicate filter (#26291)
Add AI policy (#26286)
Unpin "pandas<3" in dev dependencies (#26249)
Remove all non CSV fast-count paths (#26233)
Pin pandas to 2.x for now (#26221)
Remove unnecessary xfail (#26199)
Ensure optimization flag modification happens local (#26185)
Simplify IcebergDataset (#26165)
Reorganize unit tests into logical subdirectories (#26149)
Lint leftover fixme (#26122)
Improve backtrace for POLARS_PANIC_ON_ERR (#26125)
Fix Python docs build (#26117)
Disable unused-ignore mypy lint (#26110)
Ignore mypy warning (#26105)
Raise error on file://hostname/path (#26061)
Disable debug info for docs workflow (#26086)
Update docs for next polars cloud release (#26091)
Support Python 3.14 in dev environment (#26073)

Thank you to all our contributors for making this release possible!
@Atarust, @EndPositive, @Kevin-Patyk, @LeeviLindgren, @MarcoGorelli, @Matt711, @MrAttoAttoAtto, @Voultapher, @WaffleLapkin, @agossard, @alex-gregory-ds, @alexander-beedie, @azimafroozeh, @bayoumi17m, @c-peters, @carnarez, @dependabot[bot], @dsprenkels, @hallmason17, @hamdanal, @ion-elgreco, @kdn36, @lun3x, @mcrumiller, @nameexhaustion, @orlp, @qxzcode, @r-brink, @ritchie46 and @sweb
