🚀 Performance improvements
- Lower `arg_{min,max}` to streaming engine (#26845)
- Additional IR slice pushdown after filter pushdown (#26815)
- Streaming first/last on Enum through physical (#26783)
- Fast filter for scalar predicates (#26745)
- Allow SimpleProjection in streaming engine to rename (#26709)
- Streaming cloud download for `scan_csv` (#26637)
- Drop columns only needed for predicates after the predicate is applied (#26703)
- Run projection pushdown after predicate pushdown (#26688)
- Comparison literal downcasting (#26663)
- Add dynamic predicates for TopK (#26495)
- Increase minimum default parquet row group prefetch to 8 (#26632)
- Partial predicate conversion to PyArrow (#26567)
- Streaming cloud download for `scan_ndjson`/`scan_lines` (#26563)
- Grab GIL fewer times during Object join materialization (#26587)
- Improve CSV and NDJSON cloud sink performance (#26545)
- Tune cloud writer performance (#26518)
- Allow parallel InMemorySinks in streaming engine (#26501)
- Add streaming `AsOf` join node (#26398)
- Don't always rechunk on gather of nested types (#26478)
✨ Enhancements
- Support Expr for holidays in business day calculations (#26193)
- Parameter for pivot to always include value column name (#26730)
- Raise error in `.collect_schema()` when `arr.get()` is out-of-bounds (#26866)
- Extend `Expr.reinterpret` to all numeric types of the same size (#26401)
- Add missing_columns parameter to scan_csv (#26787)
- Clear no-op scan projections (#26858)
- Support nested datatypes for `{min,max}_by` (#26849)
- Support SQL `ARRAY` init from typed literals (#26622)
- Accept table identifier string in `scan_iceberg()` (#26826)
- Add a convenience `make fresh` command to the Makefile (#26809)
- Expose "use_zip64" Workbook option for `write_excel` (#26699)
- Add unstable `LazyFrame.sink_iceberg` (#26799)
- Add maintain order argument on implode (#26782)
- Speed up casting primitive to bool by at least 2x (#26823)
- Support ASCII format table input to `pl.from_repr` (#26806)
- Enable rowgroup skipping for float columns (#26805)
- Add expression context to errors (#26716)
- Add Decimal support for product reduction (#26725)
- Support all Iceberg V2 arrow types in sink_parquet arrow_schema parameter (#26669)
- Re-work behavior of arrow_schema parameter on sink_parquet (#26621)
- Add `contains_dtype()` method for `Schema` (#26661)
- Implement `truncate` as a "to_zero" rounding mode (#26677)
- More generic streaming GroupBy lowering (#26696)
- Create an `Alignment` TypeAlias (#26668)
- Add basic MemoryManager to track buffered dataframes for out-of-core support later (#26443)
- Add `truncate` Expression for numeric values (#26666)
- Better error messages for hex literal conversion issues in the SQL interface (#26657)
- Add SQL support for `LPAD` and `RPAD` string functions (#26631)
- Support SQL "FROM-first" `SELECT` query syntax (#26598)
- Improve `base_type` typing (#26602)
- Bump Chrono to 0.4.24, enabling stricter parsing of `%.3f`/`%.6f`/`%.9f` specifiers (#26075)
- Expose unstable `assert_schema_equal` in py-polars (#24869)
- Allow parsing of compact ISO 8601 strings (#24629)
- Add optional "label" param to DataFrame `corr` (#26588)
- Streaming cloud download for `scan_ndjson`/`scan_lines` (#26563)
- Configuration to cast integers to floats in `cast_options` for `scan_parquet` (#26492)
- Add escaping to quotes and newlines when reading JSON object into string (#26578)
- Standardise on RFC-5545 when doing datetime arithmetic on timezone-aware datetimes (#26425)
- Support `sas_token` in Azure credential provider (#26565)
- Relax SQL requirement for derived tables and subqueries to have aliases (#26543)
- Add polars-config and pl.Config.reload_env_vars() (#26524)
- Record path for object store error raised from sinks (#26541)
- Use CRC64NVME for checksum in aws sinks (#26522)
- Add `get()` for binary Series (#26514)
- Add streaming `AsOf` join node (#26398)
- Add primitive filter -> agg lowering in streaming GroupBy (#26459)
- Support for the SQL `FETCH` clause (#26449)
🐞 Bug fixes
- Prevent `Boolean` arithmetic with integer literals producing `Unknown` type in streaming engine (#26878)
- Fix sink to partitioned S3 from Windows corrupted slashes (#26889)
- Remove outdated warning about List columns in unique() (#26295) (#26890)
- Restore pyarrow predicate conversion for is_in (#26811)
- Release GIL before df.to_ndarray() to avoid deadlock (#26832)
- Fix panic on CSV count_rows with FORCE_ASYNC (#26883)
- Add scalar comparisons for `UInt128` series (#26886)
- Fix shape error not raised for 0 width inputs with non-0 height for streaming horizontal concat (#26877)
- Fix streaming zip-broadcast node did not raise shape mismatch on empty recv from ready port (#26871)
- Fix incorrect output list.eval with scalar expr, fix panic on list.agg with nulls (#26868)
- Allow list argument in `group_by().map_groups()` (#26707)
- Support for ADBC drivers instantiated with `dbc` in `DataFrame.write_database` (#26157)
- Incorrect arg_sort with descending+limit (#26839)
- Raise error in `.collect_schema()` when `arr.get()` is out-of-bounds (#26866)
- Return ComputeError instead of panicking in map_groups UDF (#26665)
- Issue PerformanceWarning in `LazyFrame.__contains__` (#26734)
- Correct type hint for `map_columns` function parameter (#26487)
- Apply thousands_separator to count/null_count in describe() for non-numeric columns (#26486)
- Ensure proper handling of `timedelta` when multiplying with a `Series` (#26830)
- Correct type hint for `function` parameter in `DataFrame.map_columns` (#26372)
- Segfault in `JoinExec` on deep plan (#26796)
- Fix unary expressions on literal in `over` context (#26827)
- Fix `{min,max}_by` in streaming engine for Boolean full `{min,max}` value column (#26848)
- Fix debug panic on clip with nan bound (#26854)
- Support grouped `{arg_,}_{min,max}` for Categoricals (#26856)
- Throw an error if a string is passed to LazyFrame.pivot `on_columns` (#26852)
- Preserve input float precision in `rolling_cov()` and `rolling_corr()` with mixed input types (#26820)
- Preserve row count when converting zero-column DataFrame via arrow PyCapsule interface (#26835)
- Prevent infinite recursion in streaming `group_by` fallback (#26801)
- Use `RowEncodingContext::Struct` when determining `D::Struct` encoded item len (#26817)
- Incorrectly applied CSE on different map_batches functions (#26822)
- Fix duplicated query execution on todo panic when combining `collect(engine='streaming')` with `POLARS_AUTO_NEW_STREAMING` (#26792)
- Prevent predicate pushdown across Sort with baked-in slice (#26804)
- Restore compatibility with `pd.Timedelta` (#26785)
- Fix panic on lazy sink_parquet created in pipe_with_schema (#26784)
- Support `{column_name}` and `{index}` placeholders in pl.format string (#26771)
- Do not use merge-join if `nulls_last` is unknown (#26778)
- Normalize float zeros in Parquet column statistics (#26776)
- Fix out-of-bounds for positive offset in windowed `rolling` (#26724)
- Raise error when `.get()` is out-of-bounds in group by context (#26752)
- Boolean `bitwise_xor` aggregation inverted when column contains nulls (#26749)
- Parameter nulls_last was ignored in over (#26718)
- Allow missing time in inexact strptime (#26714)
- Respect `nulls_last` in `sort_by` within `group_by().agg()` slow path (#26681)
- Return `NaN` when using `corr()` with a literal and expr (#26697)
- Allow strict horizontal concat with empty df (#26345)
- Fix `PoisonError` panic caused by reentrant usage of file cache (#26627)
- Return null for int values exceeding 128-bit range with `strict=False` (#26674)
- Incorrect boolean min/max with nulls (#26671)
- Slice-slice pushdown for n_rows (#26673)
- Resolve panic in `Enum` struct slicing (#26643)
- Fix CSPE for group_by.map_groups (#26640)
- Remove non-existent parameter from `SQLContext` typing overloads (#26658)
- Address `pl.from_epoch` losing fractional seconds (#26419)
- Fix `to_pandas()` on empty enum Series did not preserve enum dictionary (#26610)
- Rounding behaviour for `f32` values with "HalfAwayFromZero" mode (#26624)
- Updated Sum Type Hint (#26629)
- Don't allow namespace registration to override standard methods or properties (#26450)
- Correct arg_(min|max) for scalar columns (#26609)
- Use monkeypatch.chdir in test_sink_path_slicing_utf8_boundaries_26324 (#26616)
- Respect SQL semantics for cumulative functions mapped via `OVER` clause (#26570)
- Fix incorrect multiplexer output ordering on source token stop request (#26561)
- Fix PyIceberg filter on boolean column (#26550)
- Set `dictionary_page_offset` when dictionary encoding is used and point `data_page_offset` to the first data page (#26542)
- Move query parameters to request body when retrieving Unity Catalog temporary credentials (#26539)
- Ensure `read_csv_batched()` prints deprecation warning (#26530)
- Implement `PhysicalExpr` for `MinBy`/`MaxBy` nodes (#26506)
- Refactor row-encoding logic in IR join lowering into separate function (#26512)
- Correctly check for path extensions (#26513)
- Change AsOf join to be based on `TotalOrd` (#26497)
- Correctly raise error on failing nested strict casts (#26499)
- Prevent invalid type casts in `replace_strict()` (#26453)
- Return `null` when dividing literals by `0` (#26343)
- Fix type-hint for Series.quantile (#26422)
📖 Documentation
- Mention ComputeContexts create ephemeral environments by default and hint at re-use (#26692)
- Remove confusing join validation note (#26795)
- Fix formatting in categorical documentation (#26746)
- Fix broken AI policy link (#26728)
- Create Polars Cloud Glossary (#26690)
- Additional SQL documentation (#26662)
- Include invalidate_caches in bisect instructions (#26641)
- Add git bisect guide to contributing docs (#26634)
- Fix Polars Cloud examples (formatting & type hints) (#26625)
- Updated Airflow orchestration documentation (#26585)
- Improve SQL docs for `EXTRACT` and `DATE_PART` functions (#26575)
- Fix docstring for bitwise_count_zeros method (#26519)
- Add `get()` for binary Series (#26514)
🛠️ Other improvements
- Use large linux-arm runner for release (#26898)
- Ensure `.gitignore` and `.typos.toml` exclude `"_polars_runtime*"` directories (#26842)
- Additional IR slice pushdown after filter pushdown (#26815)
- Add private `_expand_paths` scan function (#26798)
- Change `Expr` sortedness container to `AExprSorted` and add `nulls_last` to `PyExpr.set_sorted()` (#26781)
- Move `stop_and_buffer_pipe_contents` into `joins/utils.rs` (#26810)
- Replace iejoin `is_supported_type` macro with a closure in `predicate_pushdown/join.rs` (#26812)
- Fix first-time contributor auto-label (#26794)
- Automatically add first-contribution label (#26780)
- Add tests for functions that operate on `pl.all()` expansion (#26773)
- Make contributing policy more strict (#26772)
- Add unused argument warning to ruff rules (#26720)
- Move shared streaming CSV/NDJSON code into shared mod (#26742)
- Undo pub removal of to_dyn_object_store (#26722)
- Mark `{read, scan}_ndjson` cache argument(s) as deprecated (#26711)
- Add test for predicate before join (#26705)
- Remove PlanCallback from sql (#26686)
- Bump Rust nightly compiler version (#26379)
- Remove unused problematic ArrayFromIter (#26639)
- Move more boolean code to polars_compute, reusing kernels (#26636)
- Avoid implicit import from `importlib` (#26603)
- Cleanup `assert_schema_equal` (#26596)
- Replace some env var reading by polars-config (#26607)
- Use monkeypatch.chdir in test_sink_path_slicing_utf8_boundaries_26324 (#26616)
- Add `__init__.py` files and docstrings to testing directories (#26408)
- Add wrapper for clippy so it continues on warnings (#26527)
- Use `LazyFrame.clear` to clear sql (#26562)
- Update docs (#26560)
- Add backtrace coloring (#26544)
- Evaluate sql `process_except_intersect` during IR (#26516)
- Reformat LICENSE (#26532)
- Add a pipeline in which we test with `POLARS_IDEAL_MORSEL_SIZE=4` (#26420)
- Remove `test_file` and have tests create `test.parquet` in `tmp_path` (#26525)
- Refactor row-encoding logic in IR join lowering into separate function (#26512)
- Fix mypy pyiceberg expression errors (#26523)
- Make nix flake mostly work (#26517)
- Switch to custom cloud writer with IO sink metrics (#26494)
- Update `s3fs` dev dependency (#26509)
- Remove Default on DataType (#26511)
- Have parameterized series `rechunk()` if `not allow_chunks` (#26504)
- Remove dead code (`RevMapping`) (#26508)
- Upgraded `ruff`, `mypy`, `typos` (#26476)
- More SQL to IR conversion `execute_isolated` (#26455)
Thank you to all our contributors for making this release possible!
@BJohnBraddock, @EndPositive, @Jesse-Bakker, @Kevin-Patyk, @MarcoGorelli, @Matt711, @NathanHu725, @RenzoMXD, @TNieuwdorp, @Voultapher, @WaffleLapkin, @abishop1990, @alexander-beedie, @azimafroozeh, @boris324, @cBournhonesque, @carnarez, @coastalwhite, @daizutabi, @dependabot[bot], @dsprenkels, @erandagan, @etiennebacher, @gautamvarmadatla, @henryharbeck, @hutch3232, @itamarst, @jberg5, @johalnes, @kdn36, @leudz, @lukas-reining, @moktamd, @mqqz, @mroeschke, @nameexhaustion, @orlp, @pragun-ananda, @qxzcode, @ritchie46, @spock-yh, @stakeswky, @tlauli, @toroleapinc and @veeceey