rapidsai/cudf v25.12.00 on GitHub

What's Changed

🚨 Breaking Changes

Rewrite JNI functions to use JNI_TRY/JNI_CATCH by @ttnghia in #19053
Remove compatibility with nvCOMP versions before 5.0 by @vuule in #20140
Remove DataFrame.apply_chunks, Groupby.apply_grouped by @mroeschke in #20194
Change .str.startswith/.str.endswith with tuple argument to match any pattern instead of pairwise matching by @mroeschke in #20249
[cudf-polars] CUDA stream by @madsbk in #20154
Chunked read parquet, prepend index column, and apply deletion vector by @mhaseeb123 in #20201
Zero-copy hostdevice_vector on integrated systems by @vuule in #20225
Use int64_t for the num_rows slot in parquet_reader_options by @wence- in #20256
Require CUDA 12.2+ by @jakirkham in #20416
Remove compatibility for CCCL < 3.1 by @bdice in #20468
Remove deprecated types and APIs by @vuule in #20422
Support signed integers and decimals in SUM_WITH_OVERFLOW groupby by @PointKernel in #19598
Change groupby-scan COUNT to 1-based results by @davidwendt in #20168
Change strings::like() pattern parameter from string_scalar to string_view by @davidwendt in #20428
No-op performance tracking wrappers by @galipremsagar in #20595
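
The `.str.startswith`/`.str.endswith` change (#20249) is easiest to see side by side. The sketch below uses plain Python strings rather than cuDF, so it runs without a GPU. The helper names `startswith_any` and `startswith_pairwise` are hypothetical and only illustrate the two semantics named in the entry; they are not cuDF's implementation.

```python
def startswith_any(values, patterns):
    """New semantics: a row matches if it starts with ANY pattern in the tuple."""
    return [v.startswith(tuple(patterns)) for v in values]


def startswith_pairwise(values, patterns):
    """Previous semantics: row i is compared only against pattern i."""
    if len(values) != len(patterns):
        raise ValueError("pairwise matching needs one pattern per row")
    return [v.startswith(p) for v, p in zip(values, patterns)]


values = ["apple", "banana", "cherry"]
patterns = ("b", "a", "x")

any_result = startswith_any(values, patterns)
pairwise_result = startswith_pairwise(values, patterns)

print(any_result)       # [True, True, False]
print(pairwise_result)  # [False, False, False]
```

Code that relied on the pairwise result will likely need an explicit per-row comparison after upgrading.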

🐛 Bug Fixes

Copy attrs at correct place in DataFrame constructor by @galipremsagar in #20074
Handle missing nightly runs in pandas tests job by @galipremsagar in #20081
Fix numpy ufunc for DataFrame by @galipremsagar in #20070
Unproxy few unnecessary testing utilities in pandas by @galipremsagar in #20088
Fix libcudf groupby benchmarks to not include internal cache by @davidwendt in #20038
Fix cudf.date_range with non-iso start and end date strings by @mroeschke in #20116
Fix create_distinct_rows_column to create non-nullable columns by @davidwendt in #20082
Fix arrow timestamp frequency cases in cudf.pandas by @galipremsagar in #20128
Cast inputs to true division from decimal to float by @Matt711 in #20077
Handle NVMLError_NotSupported in cudf-polars by @TomAugspurger in #20179
Fix RMM JNI pinned_fallback_host_memory_resource for CCCL 3.1.0 by @bdice in #20160
Require passing memory resources to from_libcudf methods by @vyasr in #20171
Enable hash-groupby for decimal32/64 type and MEAN aggregation by @davidwendt in #20040
Align decimal dtypes in predicate before conditional join by @Matt711 in #20060
Change stream_checking_resource_adaptor::do_deallocate to noexcept by @vyasr in #20218
Deallocation should be noexcept by @bdice in #20219
Fix a race condition in the decode of delta encoded Parquet columns by @vuule in #20216
Fix the host-device tdigest offsets by using cuda::std::span by @PointKernel in #20220
Add stream and mr arguments to Column.from_arrow type stub by @TomAugspurger in #20244
Pin deltalake in cudf-polars-polars-tests CI job by @TomAugspurger in #20255
Pin ibis-framework<11.0.0 by @Matt711 in #20267
Add private attributes for cudf.pandas proxy objects by @galipremsagar in #20276
Add Proxy for SparseAccessor by @galipremsagar in #20278
We need this to pacify mypy by @wence- in #20285
Purge non-empty nulls for the generated lists columns in data generation utility by @ttnghia in #20283
Fix missing table compatibility check in two_table_comparator constructor by @PointKernel in #20305
Fix the check for equal num_cols across empty parquet sources by @mhaseeb123 in #20320
Add nans_to_nulls to Frame by @galipremsagar in #20314
Add support for list type in get by @galipremsagar in #20332
Fix decimal dtype serialization in cudf-polars by @Matt711 in #20300
Make the GroupedRollingWindow expression node reconstructable in cudf-polars by @Matt711 in #20288
Ensure pylibcudf.Scalar.from_py uses CUDA streams by @TomAugspurger in #20340
Skip failing cudf-polars test due to hash groupby bug by @Matt711 in #20356
Support order by keys for order-sensitive scalar aggregations in grouped windows by @Matt711 in #20350
Honor user-passed stream in slice_strings for scalar inputs by @mroeschke in #20349
Thread missing streams in column/table view creation to char size calculation by @vyasr in #20351
Fix missed-sync for mapping_indices_kernel in hash-based groupby aggregation by @ttnghia in #20370
Fix a few SPDX-related issues by @KyleFromNVIDIA in #20364
Fix a dtype bug in column constructor by @galipremsagar in #20384
Refactor as_column dtype parameter calls by @galipremsagar in #20379
Add CUDA stream to cudf_polars.Column.deserialize by @TomAugspurger in #20396
Add missing CUDA stream to cudf-polars left-semi join by @TomAugspurger in #20398
Fix various string APIs to work with extension types by @galipremsagar in #20368
Add parameter validation for merge and MultiIndex.from_frame by @galipremsagar in #20382
Fix nvtext::normalize_characters special token case by @davidwendt in #20242
Fix pinned memory resource shared_pointer lifetime in tests. by @bdice in #20407
Support new nvcompStatus_t enum value by @vuule in #20376
Don't skip blank CSV rows after the header in cudf-polars scan_csv by @mroeschke in #20341
Fix OOB accesses in JSON_CornerCase_Empty test and get_row_array_parent_col_id function by @bdice in #20421
Change calls to cudaMemcpyToSymbol to cudaMemcpyToSymbolAsync by @davidwendt in #20374
Do not accelerate pandas._config.config by @Matt711 in #20413
Return timedelta instead of datetime type from std on datetime data with missing values by @mroeschke in #20439
Disallow non-bool skipna arguments to reduction methods by @mroeschke in #20436
Fix parquet scans for duckDB PDS-DS by @Matt711 in #20388
Support __array_function__ on the proxy array type by @Matt711 in #20419
Make memory_usage and __sizeof__ proxy attributes and always skip all memory usage tests by @Matt711 in #20425
Add input validation for from_records by @galipremsagar in #20412
Use computed reduction result type for empty sum and product aggregations by @mroeschke in #20438
Correct level arg validation for Index.isin, unique by @mroeschke in #20449
Add private _grouper attribute to DataFrameGroupBy proxy type by @Matt711 in #20448
Raise ValueError when indexing with zero step slice by @mroeschke in #20453
Raise IndexError for float-like indexers in RangeIndex/MultiIndex.getitem by @mroeschke in #20454
Disallow slice(bool, ...) in DataFrame.loc with MultiIndex by @mroeschke in #20457
Fix core dump in MemoryCleaner by @res-life in #19872
Disallow multiple ellipsis values in loc/iloc indexing by @mroeschke in #20456
Fix scan operations for string columns by @galipremsagar in #20460
Fix UTF8 data generator in libcudf benchmarks utility by @davidwendt in #20465
Handle dealloc in stream-ordered cudf-polars ops by @TomAugspurger in #20467
Raise on unsupported unstack cases by @Matt711 in #20463
Allow early exit for left semi-/anti- joins with empty build/probe tables by @shrshi in #20452
Fix OOB memory access in JSON reader ingest_raw utility by @davidwendt in #20451
Round up small-type groupby outputs to 4-byte boundary by @PointKernel in #20455
Fix GPU acceleration bug in decimal type-cast by @galipremsagar in #20471
Add missing CUDA stream in cudf_polars Distinct by @TomAugspurger in #20477
Support __arrow_array__ on proxy extension array by @Matt711 in #20478
Enable scan operation for datetime64 and timedelta64 types by @galipremsagar in #20464
Remove unneeded type check in cudf::strings::slice_strings by @davidwendt in #20437
Fix join match context tests by @PointKernel in #20472
Fix the statistics_mr in benchmark fixture by @PointKernel in #20496
Guard __sizeof__ in pandas compatibility mode by @Matt711 in #20495
Fix OOB memory access in Orc and Parquet stacks from fixed-width unaligned loads by @mhaseeb123 in #20458
Fix cudf.pandas Timestamp/Timedelta not subclassing stdlib datetime objects by @mroeschke in #20433
Revert benchmark input generation logic for list type by @davidwendt in #20498
Avoid using pylibcudf directly in rapidsmpf runtime by @rjzamora in #20501
Suppress NVRTC arch warnings by @brandon-b-miller in #20517
Fix ChannelManager and Lineariser by @rjzamora in #20516
Synchronize streams in LocalShuffle by @rjzamora in #20515
Make argsort have return type np.intp to match pandas by @Matt711 in #20487
Fix polars.concat_str with one column in cudf_polars by @mroeschke in #20535
Override __sizeof__ for cudf.Index by @Matt711 in #20530
Fix pl.scan_csv(...).slice(...).collect(engine="gpu") with None endpoint by @mroeschke in #20519
Fix DataChunkSourceTest by syncing default stream by @davidwendt in #20492
Fix data size errors in some libcudf benchmarks by @davidwendt in #20512
Pin cython and pytest dependencies by @TomAugspurger in #20571
Pin Cython pre-3.2.0 and PyTest pre-9 by @jakirkham in #20573
Handle Empty child IRs in _decompose by @Matt711 in #20409
Skip flaky pandas datetime test by @Matt711 in #20585
Fix max-pool-size-exceeded error in DATA_CHUNK_SOURCE_TEST by @davidwendt in #20534
Fix racecheck in nvtext wordpiece tokenizer kernel by @davidwendt in #20588
Fix the check to determine if all column chunk pages are dict encoded by @mhaseeb123 in #20524
Add stream synchronize to QUANTILES_TEST PercentileApprox gtests by @davidwendt in #20558
Update update-version.sh to handle release branch version changes by @rockhowse in #20598
Fix nvtext tokenizers handling invalid UTF8 data by @davidwendt in #20514
Fix overflow errors in distinct and filtered joins when hash table size exceeds int32 limits by @shrshi in #20594
[FEA] Optimize JIT Filter for Low-Selectivity by @lamarrr in #20222
Compute boolean function(NOT) on integers as a bitwise invert by @Matt711 in #20599
Cast output dtype of rolling aggregations to match pandas by @Matt711 in #20526
Add noop path for Frame.astype by @Matt711 in #20581
Fix copy semantics bugs, reducing copies and memory usage by @galipremsagar in #20121
Ensure the sum after expression decomposition for mean has float output dtype by @Matt711 in #20596
Use Decimal(0) literal for all-null decimal groups in groupby-sum by @Matt711 in #20591
Do not drop freq when constructing DatetimeIndex from pandas by @brandon-b-miller in #18778
Fix --validation flag for cudf.pandas PDSH benchmarks by @mroeschke in #20540
Enable GPU acceleration for more binops by @galipremsagar in #20507
Fix rmm function calls due to removed deprecated APIs and macro by @ttnghia in #20661
Fix orc reader bool bug due to not being able to resume rle decode by @pmattione-nvidia in #20666
Fix categorical comparisons in cudf to match pandas by @galipremsagar in #20674
Fix any and all to match pandas by @galipremsagar in #20679
Fix return types of string APIs in cudf.pandas by @galipremsagar in #20683
Resolve pandas test failures by @galipremsagar in #20704
Fix DatetimeIndex pickling by @vyasr in #20709
DatetimeIndex.serialize() headers are msgpack serializable by @TomAugspurger in #20714
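
Several of the indexing fixes above make invalid indexers fail loudly, for example zero-step slices (#20453). As a rough illustration, the sketch below shows the kind of up-front check involved and confirms that built-in Python lists raise the same error type. `check_slice` and `raises_value_error` are hypothetical helpers written for this example, not cuDF functions.

```python
def check_slice(key):
    """Reject a zero-step slice up front, as Python's built-in sequences do."""
    if isinstance(key, slice) and key.step == 0:
        raise ValueError("slice step cannot be zero")
    return key


def raises_value_error(fn):
    """Return True if calling fn() raises ValueError, else False."""
    try:
        fn()
    except ValueError:
        return True
    return False


# Built-in lists already reject a zero step.
builtin_raises = raises_value_error(lambda: [1, 2, 3][::0])
# The hypothetical helper rejects it the same way.
helper_raises = raises_value_error(lambda: check_slice(slice(None, None, 0)))
# A valid slice passes through unchanged.
helper_accepts = check_slice(slice(0, 3, 2)) == slice(0, 3, 2)
```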

📖 Documentation

Add note that --rmm-async only affects distributed scheduler. by @bdice in #20129
Add profiling guide by @bdice in #20292
Find RMM before CCCL by @wence- in #20336
Use current system architecture in conda environment creation command by @bdice in #20500
Use uname -m instead of arch command by @bdice in #20502
Use RAPIDS_BRANCH file for documentation links by @bdice in #20494

🚀 New Features

Add memory resources to unary, transform, and filling modules by @vyasr in #20054
Add memory resources to binaryop, copying, and stream_compaction by @vyasr in #20059
Add memory resources to groupby, datetime, and lists modules by @vyasr in #20102
Add memory resources to search, reshape, and partitioning module by @vyasr in #20101
Add memory resources to rolling, sorting, and quantiles modules by @vyasr in #20099
[FEA] Implement JIT Filter for read_parquet by @lamarrr in #19831
Add memory resources to all nvtext APIs by @vyasr in #20119
Add memory resource to all strings modules by @vyasr in #20123
Add memory resources to reduce, column, column_factories, and contiguous_split by @vyasr in #20135
Add memory resources to I/O modules by @vyasr in #20136
Remove rounding from cudf java by @pmattione-nvidia in #20110
Add memory resources to replace, json, and hashing by @vyasr in #20150
Add support for maintain_order param in joins by @Matt711 in #17698
Add an example to inspect parquet files and dump row group and page level metadata information by @mhaseeb123 in #20117
Support forward/backward filling null values in a grouped window context by @Matt711 in #19907
Allow multiple calls to cudf::initialize and cudf::deinitialize by @vuule in #20111
Add remaining memory resources by @vyasr in #20197
Add memory resources to scalars by @vyasr in #20196
Add pylibcudf is_valid_reduce_aggregation API by @davidwendt in #20145
Support decimal literals in cudf-polars by @Matt711 in #20147
Support cum_sum(...).over(...) expressions in cudf-polars by @Matt711 in #19908
Passthrough unary ops through Parquet predicate pushdown by @mhaseeb123 in #20127
Implement ARGMIN and ARGMAX aggregations for reduction by @ttnghia in #20207
Skip decompression of pruned parquet pages by @mhaseeb123 in #20192
Add an example to demonstrate the use of next-gen parquet reader to read a parquet file with highly selective filters by @mhaseeb123 in #19469
Evaluate IS_NULL at row group and page level in Parquet filtering by @mhaseeb123 in #20144
[Java] Add optional native deps loader by @zpuller in #20414
Add cudf-polars + rapidsmpf CI check by @rjzamora in #20355
Add Python bindings for the hybrid scan reader by @vyasr in #20381
RapidsMPF streaming-engine translation by @rjzamora in #20161
[JNI] Use a read/write lock pattern in Rmm.class by @abellina in #20521
[Java] Supports output projection indices for contiguousSplitGroupsAndGenUniqKeys by @res-life in #20391
Support Series.at and Series.iat for pandas compatibility by @Matt711 in #20529
Add COUNT_VALID aggregation support to groupby-scan by @davidwendt in #20531
Use RapidsMPF read_parquet in "rapidsmpf" runtime by @rjzamora in #20497
Support decimal128 SUM aggregation in hash-based groupby by @PointKernel in #20509
Add stream testing in pylibcudf by @vyasr in #20625
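
Two entries in these notes touch groupby-scan counting: COUNT results become 1-based (#20168) and COUNT_VALID support is added (#20531). The pure-Python sketch below illustrates a 1-based running count per group, counting either all rows or only non-null rows. The function `running_count` and its `valid_only` flag are hypothetical, `None` stands in for a null, and the choice to repeat the count so far on a null row is an assumption, not a statement of libcudf's behavior.

```python
from collections import defaultdict


def running_count(keys, values, valid_only=False):
    """Return a 1-based running count per group key.

    valid_only=False counts every row in the group.
    valid_only=True counts only rows whose value is not None; a None row
    repeats the group's count so far (assumed null handling).
    """
    seen = defaultdict(int)
    out = []
    for key, value in zip(keys, values):
        if not valid_only or value is not None:
            seen[key] += 1
        out.append(seen[key])
    return out


keys = ["a", "b", "a", "a", "b"]
values = [10, None, None, 30, 20]

count_all = running_count(keys, values)
count_valid = running_count(keys, values, valid_only=True)

print(count_all)    # [1, 1, 2, 3, 2]
print(count_valid)  # [1, 0, 1, 2, 1]
```

With a 0-based convention the first row of each group would report 0 instead of 1, which is the difference downstream code needs to account for.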

🛠️ Improvements

Deprecate .from_pandas constructor by @mroeschke in #19996
Prune entries in Sphinx nitpick_ignore by @mroeschke in #20045
Avoid direct CategoricalColumn calls in dask_cudf by @mroeschke in #20080
Fix typing issues in pylibcudf by @vyasr in #20069
Avoid shadowing module names by @vyasr in #20071
Remove call to purge_nonempty_nulls in make_lists_column by @ttnghia in #12873
Reduce verbosity of running the pandas test suite by @vyasr in #20107
Clean up detail device atomic logic using atomic_ref by @PointKernel in #19924
Use 8 processes for pandas tests, show top 10 test times by @bdice in #20109
Update nvbench by @bdice in #19619
Cleanup of some libcudf aggregation code by @davidwendt in #20053
Run cudf-polars conda unit tests with more than 1 process by @mroeschke in #19980
Avoid running pandas unit tests for private functionality with cudf.pandas by @mroeschke in #20115
Remove MultiIndex.from_pandas pytest benchmark by @mroeschke in #20112
Switch host_vector and host_span dependency by @davidwendt in #20106
Have ListColumn.from_sequence go through pylibcudf by @mroeschke in #20098
Fix RAPIDS_BRANCH version and update script by @galipremsagar in #20091
Add pyarrow stubs to mypy environment and fix associated errors by @vyasr in #20118
Fix slowdown in cudf-polars distributed tests by @TomAugspurger in #20137
Improve performance of string column size computation during parquet reads. by @nvdbaranec in #19986
Disable async MR priming in cudf.pandas by @bdice in #20133
Rework reduction case statement as dispatch_type_and_aggregation by @davidwendt in #20078
Fix type annotations in cudf-polars by @TomAugspurger in #20131
Add tests for AUTO and HYBRID (de)compression modes by @vuule in #20126
Branch 25.12 merge branch 25.10 by @vyasr in #20152
Manual forward merger for Branch 25.12 - branch 25.10 by @galipremsagar in #20157
Temporarily disable conda-java-tests by @bdice in #20162
Remove unused ColumnBase.view by @mroeschke in #20141
Avoid NumericalColumn call from CategoricalColumn.children by @mroeschke in #20153
Deprecate legacy public row operators by @PointKernel in #20097
Avoid more explicit calls to IntervalColumn and StructColumn by @mroeschke in #20064
Run cudf-polars wheels unit tests with more than 1 process by @mroeschke in #20124
Trace node execution in cudf-polars by @TomAugspurger in #19895
Make ColumnBase.as_*_column convert via pylibcudf by @mroeschke in #20149
Reduce execution times for parquet dictionary tests by @mhaseeb123 in #20176
Update to rapids-logger 0.2 by @bdice in #20172
Adjust rmm pool handling in PDSH benchmarks by @TomAugspurger in #20138
Don't assume cudf_polars benchmarking scale factor is always an integer by @mroeschke in #20182
Skip filtering Parquet row groups with dictionaries if there are non-dict encoded pages by @mhaseeb123 in #20175
Remove unnecessary work from read_parquet_metadata by @vuule in #20180
Improve performance of groupby tdigests gtests by @davidwendt in #20173
Revert "Temporarily disable conda-java-tests" by @bdice in #20184
Add PDSH benchmark runner for cudf.pandas by @mroeschke in #20164
Make Column.set_mask go through pylibcudf by @mroeschke in #20103
Pin pydantic<2.12 in ci/test_cudf_polars_polars_tests.sh by @mroeschke in #20200
Add an overhead field to cudf-polars tracing by @TomAugspurger in #20198
Support binops between float scalar to decimal column by @mroeschke in #20199
Reduce output buffer sizes for pruned pages of columns with a list parent by @mhaseeb123 in #20086
Make ListColumn._transform_leaves convert via pylibcudf by @mroeschke in #20151
Rename comparison_binop_generator to arg_minmax_binop_generator and corresponding file to nested_types_extrema_utils.cuh by @Copilot in #20212
Pin polars version <1.34 and >=1.29 by @Matt711 in #19912
Stop using libcudf default parameters in pylibcudf by @vyasr in #20204
Fix various typing errors by @vyasr in #20205
Cleanup parquet for simple columns by @pmattione-nvidia in #19869
Configuration for which metrics are enabled during tracing by @TomAugspurger in #20223
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #20189
Fix parquet row number check for page bounds by @pmattione-nvidia in #20217
More mypy and docs fixes by @vyasr in #20224
Prevent accidental copies of expensive-to-copy object types by @vuule in #20226
Split row operator header by @PointKernel in #20166
Standardize setting StructDtype field names post libcudf conversion by @mroeschke in #20235
Add arm testing of cudf.pandas unit tests by @vyasr in #20251
Enable sccache-dist connection pool by @trxcllnt in #20264
Run polars tests with the streaming and in-memory executors by @Matt711 in #19354
Move and rename ScanPartitionPlan by @rjzamora in #20248
Unpin DuckDB and Ibis in cudf.pandas thirdparty tests by @mroeschke in #20269
Add pylibcudf to pre-commit linting and fix outstanding errors by @vyasr in #20250
Update ConfigOptions for rapidsmpf-streaming integration by @rjzamora in #20252
Handle unordered grouped windows properly for null filling and cum sums by @Matt711 in #20275
Add more type annotations to cudf/core/column subclasses by @mroeschke in #20277
Remove extraneous host_memory_resource include by @bdice in #20284
Add MultiIndex.dtypes by @galipremsagar in #20279
Skip mypy in pre-commit.ci by @bdice in #20286
Make ColumnBase.deserialize construct via pylibcudf by @mroeschke in #20142
Add numpy to the mypy pre-commit environment by @vyasr in #20282
Add ability to set the source_info of parquet_reader_options by @wence- in #20253
Add more Python type annotations to cudf/core by @mroeschke in #20287
Use main in RAPIDS_BRANCH by @bdice in #20312
Add inferred_type and missing IntervalIndex properties by @galipremsagar in #20294
Avoid unseeded, random data generation in cuDF classic tests by @mroeschke in #20319
Improve hash-based groupby aggregation: direct write to the dense output columns whenever possible by @ttnghia in #19764
Avoid accessing range values in cudf::strings::contains_re logic by @davidwendt in #20122
Migrate mixed join to use the multiset data structure by @PointKernel in #19989
Add benchmark for strings cast to/from integer APIs by @davidwendt in #20247
Use main shared-workflows branch by @bdice in #20324
Use the thread pool for Parquet metadata processing by @vuule in #20263
Add .dt.day_of_week and .dt.daysinmonth by @galipremsagar in #20298
Avoid Column materialization in RangeIndex.nans_to_nulls by @mroeschke in #20331
Update the code to be compatible with the new cuco stream-ordered allocator by @PointKernel in #20258
Deprecate Series.data by @mroeschke in #20281
Align cudf Python's Column constructors by @mroeschke in #20233
Make type annotations of ColumnBase.set_mask stricter by @mroeschke in #20261
Make type annotations of ColumnBase.find_and_replace stricter by @mroeschke in #20259
Make type annotations of ColumnBase.apply_boolean_mask stricter by @mroeschke in #20262
Skip Python LZ4 tests when nvCOMP is disabled by @vuule in #20293
Move cudf/io/nvcomp_adapter.hpp to cudf/io/detail by @davidwendt in #20327
Add context to IR.do_evaluate by @TomAugspurger in #20322
Update mypy # type: ignore comments according to stricter mypy configs by @mroeschke in #20272
Remove duplicated enforce null consistency code by @mhaseeb123 in #20342
Use SPDX for all copyright headers by @KyleFromNVIDIA in #20321
Add more type annotations to cudf/core/series.py by @mroeschke in #20304
Remove/Replace uses of numba.cuda arrays in pytest benchmarks and tests by @mroeschke in #20359
Add DuckDB PDSH queries by @Matt711 in #20257
Use stream in cudf_polars.DataFrame.to_polars by @TomAugspurger in #20323
Add join_streams to pylibcudf API by @TomAugspurger in #20316
Use CUDA streams in all pylibcudf calls made by cudf-polars by @TomAugspurger in #20291
Add cudf/io/config_utils.hpp to doxygen by @davidwendt in #20329
Test coverage for parallel metadata parsing by @vuule in #20334
Support serializing more polars types by @Matt711 in #20347
Add CUDAStreamPolicy to cudf-polars configuration by @TomAugspurger in #20366
Unskip cudf-polars groupby test by @Matt711 in #20406
Deprecate pylibcudf interop arrow APIs by @Matt711 in #20405
Get rid of the hashing helper header by @PointKernel in #20360
Minor cleanup and fixes for libcudf generate_input.cu by @davidwendt in #20363
Ignore assert_produces_warning and shares_memory pandas unit tests for cudf.pandas by @mroeschke in #20434
Short-circuit RangeIndex.append for length 0 input, proxy private attribute by @mroeschke in #20442
Mark DataFrame.insert as _external_only_api by @Copilot in #20403
Deprecate get_current_device_resource in favor of get_current_device_resource_ref by @PointKernel in #20386
Promote JoinNoneValue to public as JoinNoMatch for clear non-match Join semantics by @PointKernel in #20440
Remove duplicate entries in NODEIDS_THAT_FAIL_WITH_CUDF_PANDAS by @mroeschke in #20447
Use the thread pool in the compact protocol reader by @vuule in #20417
Update README.md generalizing all cuDF components by @mroeschke in #20357
Skip TestDatetimelikeCoercion pandas tests that assert ._value identity by @mroeschke in #20459
Add PDSH Q2-9 for cudf.pandas by @mroeschke in #20418
Add s3fs to test_cudf_python common dependencies by @trxcllnt in #20473
Use public pandas APIs in StringColumn.to_pandas by @mroeschke in #20474
Expose java GatherMap internals and add toString to AST by @revans2 in #20483
Add create_ascii_string_column to the libcudf benchmark data generator by @davidwendt in #20354
Skip more pandas unit tests that test BlockManager, private sparse types by @mroeschke in #20489
Add boto3/botocore/aiobotocore to common test dependencies by @trxcllnt in #20490
Use a lower bound when estimating the partial file-size by @rjzamora in #20193
Performance improvement for nvtext::edit_distance for long strings by @davidwendt in #20268
Add MemoryResourceConfig to cudf-polars config by @TomAugspurger in #20042
Improve project automation by @vyasr in #20523
Fuse simple streaming reductions in cudf-polars by @rjzamora in #18757
Migrate to new CCCL memory resource interface by @bdice in #20513
Add empty input gtest for cudf::transform by @davidwendt in #20505
Rework internal json headers to allow converting gtests files from .cu to .cpp by @davidwendt in #20491
Set continue on error in the cudf-polars-rapidsmpf nightly CI job by @Matt711 in #20550
Permanently back cuDF column by a pylibcudf.Column by @mroeschke in #20306
Skip flaky upstream polars rolling test by @Matt711 in #20552
Accelerate data page mask computation on device by @mhaseeb123 in #20280
Change default rapidsmpf stream policy to 'pool' by @TomAugspurger in #20527
Increase gtests coverage for cudf::strings::like patterns by @davidwendt in #20348
Add cuda::std::span operator to cudf::column_view by @davidwendt in #20541
Update ArrowStringView compare benchmark for gather by @davidwendt in #19935
Add pytest stubs and remove ujson usage by @vyasr in #20560
Skip arrow array constructor tests by @Matt711 in #20579
Add Polars to mypy environment and fix errors by @vyasr in #20563
Ensure table chunks are unspilled and available by @madsbk in #20583
Skip tests that assert behavior when copy-on-write is False by @Matt711 in #20506
Pass streams through Column.from_array/from_iterable_of_py by @Matt711 in #20569
Stop using Dtype annotation by @vyasr in #20590
Workaround to enable running PDS-H via WebHDFS by @kingcrimsontianyu in #20132
Update RMM includes from <rmm/mr/device/*> to <rmm/mr/*> by @bdice in #20607
Stricter typing import for cudf-polars by @TomAugspurger in #20614
Avoid the unnecessary H2H copy in the std::vector sink by @vuule in #20602
Preprocessing offsets for Parquet non-dictionary string columns by @pmattione-nvidia in #20430
Move more pandas unit tests that test private APIs by @mroeschke in #20511
Use .plc_column instead of .to_pylibcudf in rolling, string utilities by @mroeschke in #20562
Skip TestSetitemNADatetimeLikeDtype pandas unit tests due to private assertion by @mroeschke in #20578
Pin Polars version <1.35 by @Matt711 in #20266
Skip pandas unit tests in test_old_base.py that test private APIs by @mroeschke in #20572
Use .plc_column attribute instead of to_pylibcudf more internally by @mroeschke in #20559
Skip arrow-backed arithmetic tests and categorize the remaining failing tests by @Matt711 in #20577
Fix a pytest execution that is spawned in a subprocess by @galipremsagar in #20660
Accelerated parquet page header decoding when page index is available by @mhaseeb123 in #20369
feat: add error handling for non-existent columns in parquet reader by @gforsyth in #20659
Optimize row mask computation for single filter column by @mhaseeb123 in #20335
Skip MultiIndex pandas unit tests testing private functionality, test_chaining_and_caching.py by @mroeschke in #20575
Address minor comments from recent hybrid scan PRs by @mhaseeb123 in #20672
Add a timeout for the rapidsmpf test run by @vyasr in #20681
Use sccache-dist build cluster for conda and wheel builds by @trxcllnt in #20488

New Contributors

@Copilot made their first contribution in #20212
@rockhowse made their first contribution in #20598

Full Changelog: v25.12.00a...v25.12.00