github pola-rs/polars py-1.39.0
Python Polars 1.39.0

8 hours ago

🚀 Performance improvements

  • Lower arg_{min,max} to streaming engine (#26845)
  • Additional IR slice pushdown after filter pushdown (#26815)
  • Streaming first/last on Enum through physical (#26783)
  • Fast filter for scalar predicates (#26745)
  • Allow SimpleProjection in streaming engine to rename (#26709)
  • Streaming cloud download for scan_csv (#26637)
  • Drop columns only needed for predicates after the predicate is applied (#26703)
  • Run projection pushdown after predicate pushdown (#26688)
  • Comparison literal downcasting (#26663)
  • Add dynamic predicates for TopK (#26495)
  • Increase minimum default parquet row group prefetch to 8 (#26632)
  • Partial predicate conversion to PyArrow (#26567)
  • Streaming cloud download for scan_ndjson / scan_lines (#26563)
  • Grab GIL fewer times during Object join materialization (#26587)
  • Improve CSV and NDJSON cloud sink performance (#26545)
  • Tune cloud writer performance (#26518)
  • Allow parallel InMemorySinks in streaming engine (#26501)
  • Add streaming AsOf join node (#26398)
  • Don't always rechunk on gather of nested types (#26478)
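
To illustrate the idea behind dynamic predicates for TopK (#26495), here is a plain-Python sketch. It keeps the k largest values in a min-heap and uses the current k-th largest value as a filter that tightens as more batches arrive. This is not Polars' implementation: the function name `top_k_with_dynamic_predicate`, the batch layout, and the `skipped` counter are assumptions made for the example.

```python
import heapq
from typing import Iterable, List, Tuple


def top_k_with_dynamic_predicate(
    batches: Iterable[List[int]], k: int
) -> Tuple[List[int], int]:
    """Return the k largest values and how many rows the predicate skipped.

    Hypothetical helper for illustration only.
    """
    if k <= 0:
        raise ValueError("k must be positive")

    heap: List[int] = []  # min-heap holding the current top-k candidates
    skipped = 0

    for batch in batches:
        for value in batch:
            # Once the heap is full, heap[0] is the k-th largest value seen
            # so far. Anything at or below it cannot enter the top k, so it
            # acts as a predicate that becomes stricter over time.
            if len(heap) == k and value <= heap[0]:
                skipped += 1
                continue
            if len(heap) < k:
                heapq.heappush(heap, value)
            else:
                heapq.heappushpop(heap, value)

    return sorted(heap, reverse=True), skipped


result, skipped = top_k_with_dynamic_predicate([[5, 1, 9], [2, 8, 3], [7, 10, 4]], k=3)
print(result, skipped)  # [10, 9, 8] 2
```

In a query engine the same threshold could presumably be handed to the scan so that whole chunks are skipped by their statistics. The sketch only counts the individual rows that were rejected.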

✨ Enhancements

  • Support Expr for holidays in business day calculations (#26193)
  • Parameter for pivot to always include value column name (#26730)
  • Raise error in .collect_schema() when arr.get() is out-of-bounds (#26866)
  • Extend Expr.reinterpret to all numeric types of the same size (#26401)
  • Add missing_columns parameter to scan_csv (#26787)
  • Clear no-op scan projections (#26858)
  • Support nested datatypes for {min,max}_by (#26849)
  • Support SQL ARRAY init from typed literals (#26622)
  • Accept table identifier string in scan_iceberg() (#26826)
  • Add a convenience make fresh command to the Makefile (#26809)
  • Expose "use_zip64" Workbook option for write_excel (#26699)
  • Add unstable LazyFrame.sink_iceberg (#26799)
  • Add maintain order argument on implode (#26782)
  • Speed up casting primitive to bool by at least 2x (#26823)
  • Support ASCII format table input to pl.from_repr (#26806)
  • Enable rowgroup skipping for float columns (#26805)
  • Add expression context to errors (#26716)
  • Add Decimal support for product reduction (#26725)
  • Support all Iceberg V2 arrow types in sink_parquet arrow_schema parameter (#26669)
  • Re-work behavior of arrow_schema parameter on sink_parquet (#26621)
  • Add contains_dtype() method for Schema (#26661)
  • Implement truncate as a "to_zero" rounding mode (#26677)
  • More generic streaming GroupBy lowering (#26696)
  • Create an Alignment TypeAlias (#26668)
  • Add basic MemoryManager to track buffered dataframes for out-of-core support later (#26443)
  • Add truncate Expression for numeric values (#26666)
  • Better error messages for hex literal conversion issues in the SQL interface (#26657)
  • Add SQL support for LPAD and RPAD string functions (#26631)
  • Support SQL "FROM-first" SELECT query syntax (#26598)
  • Improve base_type typing (#26602)
  • Bump Chrono to 0.4.24, enabling stricter parsing of %.3f/%.6f/%.9f specifiers (#26075)
  • Expose unstable assert_schema_equal in py-polars (#24869)
  • Allow parsing of compact ISO 8601 strings (#24629)
  • Add optional "label" param to DataFrame corr (#26588)
  • Streaming cloud download for scan_ndjson / scan_lines (#26563)
  • Configuration to cast integers to floats in cast_options for scan_parquet (#26492)
  • Add escaping to quotes and newlines when reading JSON object into string (#26578)
  • Standardise on RFC-5545 when doing datetime arithmetic on timezone-aware datetimes (#26425)
  • Support sas_token in Azure credential provider (#26565)
  • Relax SQL requirement for derived tables and subqueries to have aliases (#26543)
  • Add polars-config and pl.Config.reload_env_vars() (#26524)
  • Record path for object store error raised from sinks (#26541)
  • Use CRC64NVME for checksum in aws sinks (#26522)
  • Add get() for binary Series (#26514)
  • Add streaming AsOf join node (#26398)
  • Add primitive filter -> agg lowering in streaming GroupBy (#26459)
  • Support for the SQL FETCH clause (#26449)
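
The rounding-related entries above (truncate as a "to_zero" rounding mode, #26677) can be made concrete with a standard-library sketch. It contrasts rounding toward zero with rounding half away from zero using Python's `decimal` module. The helper `round_decimal` and its mode strings are hypothetical and only mirror the names used in these notes; this is not the Polars API.

```python
from decimal import ROUND_DOWN, ROUND_HALF_UP, Decimal

# Mapping from illustrative mode names to decimal's rounding constants.
# ROUND_DOWN rounds toward zero; ROUND_HALF_UP rounds ties away from zero.
_MODES = {
    "to_zero": ROUND_DOWN,
    "half_away_from_zero": ROUND_HALF_UP,
}


def round_decimal(value: str, decimals: int, mode: str) -> Decimal:
    """Round `value` to `decimals` places using the named mode (hypothetical helper)."""
    quantum = Decimal(1).scaleb(-decimals)  # decimals=2 gives Decimal('0.01')
    return Decimal(value).quantize(quantum, rounding=_MODES[mode])


print(round_decimal("2.675", 2, "to_zero"))               # 2.67
print(round_decimal("2.675", 2, "half_away_from_zero"))   # 2.68
print(round_decimal("-2.675", 2, "to_zero"))              # -2.67
print(round_decimal("-2.675", 2, "half_away_from_zero"))  # -2.68
```

Truncation simply discards the digits beyond the requested precision, so positive and negative inputs both move toward zero.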

🐞 Bug fixes

  • Prevent Boolean arithmetic with integer literals producing Unknown type in streaming engine (#26878)
  • Fix corrupted slashes when sinking to partitioned S3 from Windows (#26889)
  • Remove outdated warning about List columns in unique() (#26295) (#26890)
  • Restore pyarrow predicate conversion for is_in (#26811)
  • Release GIL before df.to_ndarray() to avoid deadlock (#26832)
  • Fix panic on CSV count_rows with FORCE_ASYNC (#26883)
  • Add scalar comparisons for UInt128 series (#26886)
  • Fix shape error not raised for 0 width inputs with non-0 height for streaming horizontal concat (#26877)
  • Fix streaming zip-broadcast node not raising shape mismatch on empty recv from ready port (#26871)
  • Fix incorrect output of list.eval with scalar expr; fix panic on list.agg with nulls (#26868)
  • Allow list argument in group_by().map_groups() (#26707)
  • Support for ADBC drivers instantiated with dbc in DataFrame.write_database (#26157)
  • Incorrect arg_sort with descending+limit (#26839)
  • Raise error in .collect_schema() when arr.get() is out-of-bounds (#26866)
  • Return ComputeError instead of panicking in map_groups UDF (#26665)
  • Issue PerformanceWarning in LazyFrame.__contains__ (#26734)
  • Correct type hint for map_columns function parameter (#26487)
  • Apply thousands_separator to count/null_count in describe() for non-numeric columns (#26486)
  • Ensure proper handling of timedelta when multiplying with a Series (#26830)
  • Correct type hint for function parameter in DataFrame.map_columns (#26372)
  • Segfault in JoinExec on deep plan (#26796)
  • Fix unary expressions on literal in over context (#26827)
  • Fix {min,max}_by in streaming engine for Boolean full {min,max} value column (#26848)
  • Fix debug panic on clip with nan bound (#26854)
  • Support grouped {arg_,}{min,max} for Categoricals (#26856)
  • Throw an error if a string is passed to LazyFrame.pivot on_columns (#26852)
  • Preserve input float precision in rolling_cov() and rolling_corr() with mixed input types (#26820)
  • Preserve row count when converting zero-column DataFrame via arrow PyCapsule interface (#26835)
  • Prevent infinite recursion in streaming group_by fallback (#26801)
  • Use RowEncodingContext::Struct when determining D::Struct encoded item len (#26817)
  • Incorrectly applied CSE on different map_batches functions (#26822)
  • Fix duplicated query execution on todo panic when combining collect(engine='streaming') with POLARS_AUTO_NEW_STREAMING (#26792)
  • Prevent predicate pushdown across Sort with baked-in slice (#26804)
  • Restore compatibility with pd.Timedelta (#26785)
  • Fix panic on lazy sink_parquet created in pipe_with_schema (#26784)
  • Support {column_name} and {index} placeholders in pl.format string (#26771)
  • Do not use merge-join if nulls_last is unknown (#26778)
  • Normalize float zeros in Parquet column statistics (#26776)
  • Fix out-of-bounds for positive offset in windowed rolling (#26724)
  • Raise error when .get() is out-of-bounds in group by context (#26752)
  • Boolean bitwise_xor aggregation inverted when column contains nulls (#26749)
  • Parameter nulls_last was ignored in over (#26718)
  • Allow missing time in inexact strptime (#26714)
  • Respect nulls_last in sort_by within group_by().agg() slow path (#26681)
  • Return NaN when using corr() with a literal and expr (#26697)
  • Allow strict horizontal concat with empty df (#26345)
  • Fix PoisonError panic caused by reentrant usage of file cache (#26627)
  • Return null for int values exceeding 128-bit range with strict=False (#26674)
  • Incorrect boolean min/max with nulls (#26671)
  • Slice-slice pushdown for n_rows (#26673)
  • Resolve panic in Enum struct slicing (#26643)
  • Fix CSPE for group_by.map_groups (#26640)
  • Remove non-existent parameter from SQLContext typing overloads (#26658)
  • Address pl.from_epoch losing fractional seconds (#26419)
  • Fix to_pandas() on empty Enum Series not preserving the Enum dictionary (#26610)
  • Rounding behaviour for f32 values with "HalfAwayFromZero" mode (#26624)
  • Update type hint for sum (#26629)
  • Don't allow namespace registration to override standard methods or properties (#26450)
  • Correct arg_(min|max) for scalar columns (#26609)
  • Use monkeypatch.chdir in test_sink_path_slicing_utf8_boundaries_26324 (#26616)
  • Respect SQL semantics for cumulative functions mapped via OVER clause (#26570)
  • Fix incorrect multiplexer output ordering on source token stop request (#26561)
  • Fix PyIceberg filter on boolean column (#26550)
  • Set dictionary_page_offset when dictionary encoding is used and point data_page_offset to the first data page (#26542)
  • Move query parameters to request body when retrieving Unity Catalog temporary credentials (#26539)
  • Ensure read_csv_batched() prints deprecation warning (#26530)
  • Implement PhysicalExpr for MinBy/MaxBy nodes (#26506)
  • Refactor row-encoding logic in IR join lowering into separate function (#26512)
  • Correctly check for path extensions (#26513)
  • Change AsOf join to be based on TotalOrd (#26497)
  • Correctly raise error on failing nested strict casts (#26499)
  • Prevent invalid type casts in replace_strict() (#26453)
  • Return null when dividing literals by 0 (#26343)
  • Fix type-hint for Series.quantile (#26422)
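
The entry "Return null for int values exceeding 128-bit range with strict=False" (#26674) describes a range check. A minimal plain-Python sketch of that behaviour follows, assuming a signed 128-bit target and assuming that strict mode raises an error. The helper `cast_to_int128` is hypothetical and is not a Polars function.

```python
from typing import Optional

# Bounds of a signed 128-bit integer.
I128_MIN = -(2**127)
I128_MAX = 2**127 - 1


def cast_to_int128(value: int, *, strict: bool = True) -> Optional[int]:
    """Return `value` if it fits in a signed 128-bit integer.

    Out-of-range values raise in strict mode and become None (null) otherwise.
    Hypothetical helper for illustration only.
    """
    if I128_MIN <= value <= I128_MAX:
        return value
    if strict:
        raise OverflowError(f"{value} does not fit in a signed 128-bit integer")
    return None


print(cast_to_int128(2**127 - 1))            # largest representable value
print(cast_to_int128(2**127, strict=False))  # None
```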

📖 Documentation

  • Mention ComputeContexts create ephemeral environments by default and hint at re-use (#26692)
  • Remove confusing join validation note (#26795)
  • Fix formatting in categorical documentation (#26746)
  • Fix broken AI policy link (#26728)
  • Create Polars Cloud Glossary (#26690)
  • Additional SQL documentation (#26662)
  • Include invalidate_caches in bisect instructions (#26641)
  • Add git bisect guide to contributing docs (#26634)
  • Fix Polars Cloud examples (formatting & type hints) (#26625)
  • Updated Airflow orchestration documentation (#26585)
  • Improve SQL docs for EXTRACT and DATE_PART functions (#26575)
  • Fix docstring for bitwise_count_zeros method (#26519)
  • Add get() for binary Series (#26514)

🛠️ Other improvements

  • Use large linux-arm runner for release (#26898)
  • Ensure .gitignore and .typos.toml exclude "_polars_runtime*" directories (#26842)
  • Additional IR slice pushdown after filter pushdown (#26815)
  • Add private _expand_paths scan function (#26798)
  • Change Expr sortedness container to AExprSorted and add nulls_last to PyExpr.set_sorted() (#26781)
  • Move stop_and_buffer_pipe_contents into joins/utils.rs (#26810)
  • Replace iejoin is_supported_type macro with a closure in predicate_pushdown/join.rs (#26812)
  • Fix first-time contributor auto-label (#26794)
  • Automatically add first-contribution label (#26780)
  • Add tests for functions that operate on pl.all() expansion (#26773)
  • Make contributing policy more strict (#26772)
  • Add unused argument warning to ruff rules (#26720)
  • Move shared streaming CSV/NDJSON code into shared mod (#26742)
  • Undo pub removal of to_dyn_object_store (#26722)
  • Mark {read, scan}_ndjson cache argument(s) as deprecated (#26711)
  • Add test for predicate before join (#26705)
  • Remove PlanCallback from sql (#26686)
  • Bump Rust nightly compiler version (#26379)
  • Remove unused problematic ArrayFromIter (#26639)
  • Move more boolean code to polars_compute, reusing kernels (#26636)
  • Avoid implicit import from importlib (#26603)
  • Cleanup assert_schema_equal (#26596)
  • Replace some env var reading by polars-config (#26607)
  • Use monkeypatch.chdir in test_sink_path_slicing_utf8_boundaries_26324 (#26616)
  • Add __init__.py files and docstrings to testing directories (#26408)
  • Add wrapper for clippy so it continues on warnings (#26527)
  • Use LazyFrame.clear to clear sql (#26562)
  • Update docs (#26560)
  • Add backtrace coloring (#26544)
  • Evaluate sql process_except_intersect during IR (#26516)
  • Reformat LICENSE (#26532)
  • Add a pipeline in which we test with POLARS_IDEAL_MORSEL_SIZE=4 (#26420)
  • Remove test_file and have tests create test.parquet in tmp_path (#26525)
  • Refactor row-encoding logic in IR join lowering into separate function (#26512)
  • Fix mypy pyiceberg expression errors (#26523)
  • Make nix flake mostly work (#26517)
  • Switch to custom cloud writer with IO sink metrics (#26494)
  • Update s3fs dev dependency (#26509)
  • Remove Default on DataType (#26511)
  • Have parameterized series rechunk() if not allow_chunks (#26504)
  • Remove dead code (RevMapping) (#26508)
  • Upgraded ruff, mypy, typos (#26476)
  • More SQL to IR conversion execute_isolated (#26455)

Thank you to all our contributors for making this release possible!
@BJohnBraddock, @EndPositive, @Jesse-Bakker, @Kevin-Patyk, @MarcoGorelli, @Matt711, @NathanHu725, @RenzoMXD, @TNieuwdorp, @Voultapher, @WaffleLapkin, @abishop1990, @alexander-beedie, @azimafroozeh, @boris324, @cBournhonesque, @carnarez, @coastalwhite, @daizutabi, @dependabot[bot], @dsprenkels, @erandagan, @etiennebacher, @gautamvarmadatla, @henryharbeck, @hutch3232, @itamarst, @jberg5, @johalnes, @kdn36, @leudz, @lukas-reining, @moktamd, @mqqz, @mroeschke, @nameexhaustion, @orlp, @pragun-ananda, @qxzcode, @ritchie46, @spock-yh, @stakeswky, @tlauli, @toroleapinc and @veeceey
