github pola-rs/polars py-1.38.0
Python Polars 1.38.0

12 hours ago

⚠️ Deprecations

  • Deprecate retries=n in favor of storage_options={"max_retries": n} (#26155)
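
A minimal sketch of the migration this deprecation asks for, written as plain dictionary manipulation so it runs without any cloud access. The helper name `migrate_retries` and the `aws_region` entry are hypothetical and only for illustration; the only things taken from the note above are the old `retries=n` keyword and the new `storage_options={"max_retries": n}` form.

```python
def migrate_retries(kwargs):
    """Return a copy of kwargs with retries=n moved into storage_options["max_retries"].

    Hypothetical helper, not part of Polars.
    """
    new_kwargs = dict(kwargs)
    if "retries" not in new_kwargs:
        return new_kwargs

    n = new_kwargs.pop("retries")
    # Copy so the caller's storage_options dict is never mutated.
    storage_options = dict(new_kwargs.get("storage_options") or {})
    # Assumption: an explicitly set max_retries takes priority over the old keyword.
    storage_options.setdefault("max_retries", n)
    new_kwargs["storage_options"] = storage_options
    return new_kwargs


# Keyword arguments as they might have been passed to a scan or read call.
old_call = {"retries": 3, "storage_options": {"aws_region": "eu-west-1"}}
new_call = migrate_retries(old_call)
print(new_call)
```

The resulting dictionary can then be unpacked into the call in place of the old keywords.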

🚀 Performance improvements

  • Enable zero-copy object_store put upload for IPC sink (#26288)
  • Resolve file schemas and metadata concurrently (#26325)
  • Run elementwise CSEE for the streaming engine (#26278)
  • Disable morsel splitting for fast-count on streaming engine (#26245)
  • Implement streaming decompression for scan_ndjson and scan_lines (#26200)
  • Improve string slicing performance (#26206)
  • Refactor scan_delta to use python dataset interface (#26190)
  • Add dedicated kernel for group-by arg_max/arg_min (#26093)
  • Add streaming merge-join (#25964)
  • Generalize Bitmap::new_zeroed opt for Buffer::zeroed (#26142)
  • Reduce fs stat calls in path expansion (#26173)
  • Lower streaming group_by n_unique to unique().len() (#26109)
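
For readers unfamiliar with the merge-join listed above (#25964), the general technique can be sketched in a few lines of plain Python. This is a generic textbook inner merge-join over two key-sorted inputs, not the Polars implementation; the function name `merge_join` and the sample data are assumptions made for illustration. Because it only walks both inputs forward once, it fits a streaming setting where the inputs arrive already sorted.

```python
from itertools import groupby
from operator import itemgetter


def merge_join(left, right):
    """Inner-join two iterables of (key, value) pairs that are sorted by key."""
    left_groups = groupby(left, key=itemgetter(0))
    right_groups = groupby(right, key=itemgetter(0))
    done = (None, None)  # sentinel returned when one side is exhausted

    lk, lg = next(left_groups, done)
    rk, rg = next(right_groups, done)
    while lg is not None and rg is not None:
        if lk < rk:
            # Left key has no partner: advance the left side only.
            lk, lg = next(left_groups, done)
        elif lk > rk:
            # Right key has no partner: advance the right side only.
            rk, rg = next(right_groups, done)
        else:
            # Equal keys: emit the cross product of both runs of duplicates.
            right_values = [value for _, value in rg]
            for _, left_value in lg:
                for right_value in right_values:
                    yield lk, left_value, right_value
            lk, lg = next(left_groups, done)
            rk, rg = next(right_groups, done)


left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
joined = list(merge_join(left, right))
print(joined)
```

Only the run of equal keys on the right side is buffered at any time, which is what keeps memory use bounded when the inputs are large.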

✨ Enhancements

  • Avoid OOM for scan_ndjson and scan_lines if input is compressed and negative slice (#26396)
  • Support anonymous agg in-mem (#26376)
  • Add unstable arrow_schema parameter to sink_parquet (#26323)
  • Improve error message formatting for structs (#26349)
  • Remove parquet field overwrites (#26236)
  • Enable zero-copy object_store put upload for IPC sink (#26288)
  • Improve disambiguation for qualified wildcard columns in SQL projections (#26301)
  • Expose upload_concurrency through env var (#26263)
  • Allow quantile to compute multiple quantiles at once (#25516)
  • Allow empty LazyFrame in LazyFrame.group_by(...).map_groups (#26275)
  • Use delta file statistics for batch predicate pushdown (#26242)
  • Add streaming UnorderedUnion (#26240)
  • Implement compression support for sink_ndjson (#26212)
  • Add unstable record batch statistics flags to {sink/scan}_ipc (#26254)
  • Support CSE for python UDFs on the same address (#26253)
  • Cloud retry/backoff configuration via storage_options (#26204)
  • Use same sort order for expanded paths across local / cloud / directory / glob (#26191)
  • Add streaming merge-join (#25964)
  • Serialize optimization flags for cloud plan (#26168)
  • Add compression support to write_csv and sink_csv (#26111)
  • Add scan_lines (#26112)
  • Support regex in str.split (#26060)
  • Add unstable IPC Statistics read/write to scan_ipc/sink_ipc (#26079)
  • Add unstable height parameter to DataFrame/LazyFrame (#26014)
  • Remove old partition sink API (#26100)
  • Expose ArrowStreamExportable on python collect batches iterator (#26074)
  • Add nulls support for all rolling_by operations (#26081)
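
The regex support for `str.split` (#26060) is easiest to appreciate by contrasting literal and pattern-based splitting. The notes do not state how the Polars expression exposes this, so the sketch below deliberately uses only Python's standard library (`str.split` and `re.split`) to show the difference in behaviour; the sample string and pattern are assumptions for illustration.

```python
import re

text = "a, b;c  d"

# Literal splitting: only the exact separator ", " is recognised.
literal_parts = text.split(", ")

# Regex splitting: any run of commas, semicolons, or whitespace is a separator.
regex_parts = re.split(r"[,;\s]+", text)

print(literal_parts)
print(regex_parts)
```

With a literal separator the mixed delimiters stay embedded in the second field, whereas the pattern yields one clean token per value.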

🐞 Bug fixes

  • Correct off-by-one in RLE row counting for nullable dictionary-encoded columns (#26411)
  • Support very large integers in env var limits (#26399)
  • Fix PlPath panic from incorrect slicing of UTF8 boundaries (#26389)
  • Fix Float dtype for spearman correlation (#26392)
  • Fix optimizer panic in right joins with type coercion (#26365)
  • Don't serialize retry config from local environment vars (#26289)
  • Fix PartitionBy with scalar key expressions and diff() (#26370)
  • Add {Float16, Float32} -> Float32 lossless upcast (#26373)
  • Fix panic using with_columns and collect_all (#26366)
  • Add multi-page support for writing dictionary-encoded Parquet columns (#26360)
  • Ensure slice advancement when skipping non-inlinable values in is_in with inlinable needles (#26361)
  • Pin xlsx2csv version temporarily (#26352)
  • Fix bugs in ViewArray total_bytes_len (#26328)
  • Fix overflow in i128::abs in Decimal fits check (#26341)
  • Make Expr.hash on Categorical mapping-independent (#26340)
  • Clone shared GroupBy node before mutation in physical plan creation (#26327)
  • Fix "sheet_name" typing for read_ods and read_excel (#26317)
  • Improve Polars dtype inference from Python Union typing (#26303)
  • Consider the "current location" of an item when computing rolling_rank_by (#26287)
  • Reset is_count_star flag between queries in collect_all (#26256)
  • Fix incorrect is_between filter on scan_parquet (#26284)
  • Make polars compatible with ty (#26270)
  • Lower AnonymousStreamingAgg in group-by as aggregate (#26258)
  • Avoid overflow in pl.duration scalar arguments case (#26213)
  • Broadcast arr.get on single array with multiple indices (#26219)
  • Fix panic on CSPE with sorts (#26231)
  • Fix eager DataFrame.slice with negative offset and length=None (#26215)
  • Use correct schema side for streaming merge join lowering (#26218)
  • Fix overflow panic in scan_csv with multiple files and skip_rows + n_rows larger than total row count (#26128)
  • Respect allow_object flag after cache (#26196)
  • Raise error on non-elementwise PartitionBy keys (#26194)
  • Allow ordered categorical dictionary in scan_parquet (#26180)
  • Allow excess bytes on IPC bitmap compressed length (#26176)
  • Address a macOS-specific compile issue (#26172)
  • Fix deadlock on hash_rows() of 0-width DataFrame (#26154)
  • Fix NameError filtering pyarrow dataset (#26166)
  • Fix concat_arr panic when using categoricals/enums (#26146)
  • Fix NDJSON/scan_lines negative slice splitting with extremely long lines (#26132)
  • Fix incorrect group_by min/max fast path (#26139)
  • Remove a source of non-determinism from lowering (#26137)
  • Error when with_row_index or unpivot create duplicate columns on a LazyFrame (#26107)
  • Fix panics on shift with head (#26099)
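
The `i128::abs` entry (#26341) refers to a classic two's-complement pitfall: the minimum value of a signed integer type has no positive counterpart, so in Rust `i128::MIN.abs()` panics in debug builds and wraps back to `i128::MIN` in release builds. The sketch below emulates that wrap-around in Python to show how a magnitude-based "fits in N digits" check can go wrong. The function names and the range-based alternative are assumptions for illustration; the notes do not describe how the actual fix was made.

```python
I128_MIN = -(2**127)
I128_MAX = 2**127 - 1


def wrapping_abs_i128(x: int) -> int:
    """Emulate abs() on a 128-bit signed integer with two's-complement wrap-around."""
    assert I128_MIN <= x <= I128_MAX
    magnitude = abs(x)
    # Fold the result back into the signed 128-bit range.
    return (magnitude + 2**127) % 2**128 - 2**127


def fits_naive(x: int, precision: int) -> bool:
    """Check |x| < 10**precision using the wrapping abs (buggy at I128_MIN)."""
    return wrapping_abs_i128(x) < 10**precision


def fits_safe(x: int, precision: int) -> bool:
    """Same check expressed as a range test, which never needs abs()."""
    bound = 10**precision
    return -bound < x < bound


# I128_MIN has 39 digits, so it cannot fit in 38 digits of precision.
print(fits_naive(I128_MIN, 38))
print(fits_safe(I128_MIN, 38))
```

The naive check accepts `I128_MIN` because the wrapped "absolute value" is still negative, while the range test rejects it.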

📖 Documentation

  • Fix Expr.get referencing incorrect dtype for index parameter (#26364)
  • Fix Expr.quantile formatting (#26351)
  • Drop sphinx-llms-txt extension (#26285)
  • Remove deprecated cublet_id (#26260)
  • Update for new release (#26255)
  • Update MCP server section with new URL (#26241)
  • Fix unmatched paren and punctuation in pandas migration guide (#26251)
  • Add observatory database_path to docs (#26201)
  • Note plugins in Python user-defined functions (#26138)

📦 Build system

  • Address remaining Python 3.14 issues with make requirements-all (#26195)
  • Address a macOS-specific compile issue (#26172)

🛠️ Other improvements

  • Ensure local doctests skip from_torch if module not installed (#26405)
  • Change linked timezones in test suite to canonical timezones (#26310)
  • Implement various deprecations (#26314)
  • Rename Operator::Divide to RustDivide (#26339)
  • Properly disable the Pyodide tests (#26382)
  • Remove unused field (#26367)
  • Fix runtime nesting (#26359)
  • Remove xlsx2csv dependency pin (#26355)
  • Use outer runtime if exists in to_alp (#26353)
  • Make CategoricalMapping::new pub(crate) to avoid misuse (#26308)
  • Clarify IPC buffer read limit/length parameter (#26334)
  • Add dtype test coverage for delta predicate filter (#26291)
  • Add AI policy (#26286)
  • Unpin "pandas<3" in dev dependencies (#26249)
  • Remove all non CSV fast-count paths (#26233)
  • Pin pandas to 2.x for now (#26221)
  • Remove unnecessary xfail (#26199)
  • Ensure optimization flag modification happens local (#26185)
  • Simplify IcebergDataset (#26165)
  • Reorganize unit tests into logical subdirectories (#26149)
  • Lint leftover fixme (#26122)
  • Improve backtrace for POLARS_PANIC_ON_ERR (#26125)
  • Fix Python docs build (#26117)
  • Disable unused-ignore mypy lint (#26110)
  • Ignore mypy warning (#26105)
  • Raise error on file://hostname/path (#26061)
  • Disable debug info for docs workflow (#26086)
  • Update docs for next polars cloud release (#26091)
  • Support Python 3.14 in dev environment (#26073)

Thank you to all our contributors for making this release possible!
@Atarust, @EndPositive, @Kevin-Patyk, @LeeviLindgren, @MarcoGorelli, @Matt711, @MrAttoAttoAtto, @Voultapher, @WaffleLapkin, @agossard, @alex-gregory-ds, @alexander-beedie, @azimafroozeh, @bayoumi17m, @c-peters, @carnarez, @dependabot[bot], @dsprenkels, @hallmason17, @hamdanal, @ion-elgreco, @kdn36, @lun3x, @mcrumiller, @nameexhaustion, @orlp, @qxzcode, @r-brink, @ritchie46 and @sweb
