What's Changed
🚨 Breaking Changes
- Rewrite JNI functions to use `JNI_TRY/JNI_CATCH` by @ttnghia in #19053
- Remove compatibility with nvCOMP versions before 5.0 by @vuule in #20140
- Remove DataFrame.apply_chunks, Groupby.apply_grouped by @mroeschke in #20194
- Change .str.starts/endswith with tuple argument to match any pattern instead of pairwise matching by @mroeschke in #20249
- [cudf-polars] CUDA stream by @madsbk in #20154
- Chunked read parquet, prepend index column, and apply deletion vector by @mhaseeb123 in #20201
- Zero-copy `hostdevice_vector` on integrated systems by @vuule in #20225
- Use int64_t for the num_rows slot in parquet_reader_options by @wence- in #20256
- Require CUDA 12.2+ by @jakirkham in #20416
- Remove compatibility for CCCL < 3.1 by @bdice in #20468
- Remove deprecated types and APIs by @vuule in #20422
- Support signed integers and decimals in `SUM_WITH_OVERFLOW` groupby by @PointKernel in #19598
- Change groupby-scan COUNT to 1-based results by @davidwendt in #20168
- Change strings::like() pattern parameter from string_scalar to string_view by @davidwendt in #20428
- No-op performance tracking wrappers by @galipremsagar in #20595
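The `.str.startswith`/`.str.endswith` change listed above (#20249) is easiest to see side by side. What follows is a plain-Python sketch of the two semantics, not cuDF code: the helper names `match_any` and `match_pairwise` are hypothetical, and the "pairwise" reading (pattern i tested only against row i) is an assumption based on the wording of the entry.

```python
# Plain-Python illustration only; these helpers are hypothetical and are
# not part of cuDF. They model two readings of a tuple argument.

def match_any(values, patterns):
    """New behavior per the entry: a row matches if it starts with ANY pattern."""
    return [v.startswith(tuple(patterns)) for v in values]


def match_pairwise(values, patterns):
    """Assumed old behavior: pattern i is tested only against row i."""
    return [v.startswith(p) for v, p in zip(values, patterns)]


values = ["apple", "banana", "cherry"]
patterns = ("b", "a", "x")

print(match_any(values, patterns))       # [True, True, False]
print(match_pairwise(values, patterns))  # [False, False, False]
```

Code that passed a tuple and relied on row-by-row matching may see different results after upgrading, so such call sites are worth reviewing.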
🐛 Bug Fixes
- Copy `attrs` at correct place in `DataFrame` constructor by @galipremsagar in #20074
- Handle missing nightly runs in pandas tests job by @galipremsagar in #20081
- Fix numpy ufunc for `DataFrame` by @galipremsagar in #20070
- Unproxy few unnecessary testing utilities in pandas by @galipremsagar in #20088
- Fix libcudf groupby benchmarks to not include internal cache by @davidwendt in #20038
- Fix cudf.date_range with non-iso start and end date strings by @mroeschke in #20116
- Fix create_distinct_rows_column to create non-nullable columns by @davidwendt in #20082
- Fix arrow timestamp frequency cases in `cudf.pandas` by @galipremsagar in #20128
- Cast inputs to true division from decimal to float by @Matt711 in #20077
- Handle NVMLError_NotSupported in cudf-polars by @TomAugspurger in #20179
- Fix RMM JNI pinned_fallback_host_memory_resource for CCCL 3.1.0 by @bdice in #20160
- Require passing memory resources to from_libcudf methods by @vyasr in #20171
- Enable hash-groupby for decimal32/64 type and MEAN aggregation by @davidwendt in #20040
- Align decimal dtypes in predicate before conditional join by @Matt711 in #20060
- Change stream_checking_resource_adaptor::do_deallocate to noexcept by @vyasr in #20218
- Deallocation should be noexcept by @bdice in #20219
- Fix a race condition in the decode of delta encoded Parquet columns by @vuule in #20216
- Fix the host-device tdigest offsets by using cuda::std::span by @PointKernel in #20220
- Add `stream` and `mr` arguments to `Column.from_arrow` type stub by @TomAugspurger in #20244
- Pin `deltalake` in cudf-polars-polars-tests CI job by @TomAugspurger in #20255
- Pin ibis-framework<11.0.0 by @Matt711 in #20267
- Add private attributes for `cudf.pandas` proxy objects by @galipremsagar in #20276
- Add Proxy for `SparseAccessor` by @galipremsagar in #20278
- We need this to pacify mypy by @wence- in #20285
- Purge non-empty nulls for the generated lists columns in data generation utility by @ttnghia in #20283
- Fix missing table compatibility check in two_table_comparator constructor by @PointKernel in #20305
- Fix the check for equal `num_cols` across empty parquet sources by @mhaseeb123 in #20320
- Add `nans_to_nulls` to `Frame` by @galipremsagar in #20314
- Add support for list type in `get` by @galipremsagar in #20332
- Fix decimal dtype serialization in cudf-polars by @Matt711 in #20300
- Make the `GroupedRollingWindow` expression node reconstructable in cudf-polars by @Matt711 in #20288
- Ensure pylibcudf.Scalar.from_py uses CUDA streams by @TomAugspurger in #20340
- Skip failing cudf-polars test due to hash groupby bug by @Matt711 in #20356
- Support order by keys for order-sensitive scalar aggregations in grouped windows by @Matt711 in #20350
- Honor user-passed stream in slice_strings for scalar inputs by @mroeschke in #20349
- Thread missing streams in column/table view creation to char size calculation by @vyasr in #20351
- Fix missed-sync for `mapping_indices_kernel` in hash-based groupby aggregation by @ttnghia in #20370
- Fix a few SPDX-related issues by @KyleFromNVIDIA in #20364
- Fix a `dtype` bug in column constructor by @galipremsagar in #20384
- Refactor `as_column` dtype parameter calls by @galipremsagar in #20379
- Add CUDA stream to `cudf_polars.Column.deserialize` by @TomAugspurger in #20396
- Add missing CUDA stream to cudf-polars left-semi join by @TomAugspurger in #20398
- Fix various string APIs to work with extension types by @galipremsagar in #20368
- Add parameter validation for `merge` and `MultiIndex.from_frame` by @galipremsagar in #20382
- Fix nvtext::normalize_characters special token case by @davidwendt in #20242
- Fix pinned memory resource `shared_pointer` lifetime in tests. by @bdice in #20407
- Support new `nvcompStatus_t` enum value by @vuule in #20376
- Don't skip blank CSV lines rows after the header in cudf-polars scan_csv by @mroeschke in #20341
- Fix OOB accesses in JSON_CornerCase_Empty test and get_row_array_parent_col_id function by @bdice in #20421
- Change calls to cudaMemcpyToSymbol to cudaMemcpyToSymbolAsync by @davidwendt in #20374
- Do not accelerate `pandas._config.config` by @Matt711 in #20413
- Return timedelta instead of datetime type with std with datetime type with missing values by @mroeschke in #20439
- Disallow non-bool skipna arguments to reduction methods by @mroeschke in #20436
- Fix parquet scans for duckDB PDS-DS by @Matt711 in #20388
- Support `__array_function__` on the proxy array type by @Matt711 in #20419
- Make `memory_usage` and `__sizeof__` proxy attributes and always skip all memory usage tests by @Matt711 in #20425
- Add input validation for `from_records` by @galipremsagar in #20412
- Use computed reduction result type for empty sum and product aggregations by @mroeschke in #20438
- Correct level arg validation for Index.isin, unique by @mroeschke in #20449
- Add private `_grouper` attribute to `DataFrameGroupBy` proxy type by @Matt711 in #20448
- Raise ValueError when indexing with zero step slice by @mroeschke in #20453
- Raise IndexError for float-like indexers in RangeIndex/MultiIndex.getitem by @mroeschke in #20454
- Disallow slice(bool, ...) in DataFrame.loc with MultiIndex by @mroeschke in #20457
- Fix core dump in MemoryCleaner by @res-life in #19872
- Disallow multiple ellipsis values in loc/iloc indexing by @mroeschke in #20456
- Fix `scan` operations for `string` columns by @galipremsagar in #20460
- Fix UTF8 data generator in libcudf benchmarks utility by @davidwendt in #20465
- Handle dealloc in stream-ordered cudf-polars ops by @TomAugspurger in #20467
- Raise on unsupported unstack cases by @Matt711 in #20463
- Allow early exit for left semi-/anti- joins with empty build/probe tables by @shrshi in #20452
- Fix OOB memory access in JSON reader ingest_raw utility by @davidwendt in #20451
- Round up small-type groupby outputs to 4-byte boundary by @PointKernel in #20455
- Fix GPU acceleration bug in decimal type-cast by @galipremsagar in #20471
- Add missing CUDA stream in cudf_polars Distinct by @TomAugspurger in #20477
- Support `__arrow_array__` on proxy extension array by @Matt711 in #20478
- Enable scan operation for `datetime64` and `timedelta64` types by @galipremsagar in #20464
- Remove unneeded type check in cudf::strings::slice_strings by @davidwendt in #20437
- Fix join match context tests by @PointKernel in #20472
- Fix the statistics_mr in benchmark fixture by @PointKernel in #20496
- Guard `__sizeof__` in pandas compatibility mode by @Matt711 in #20495
- Fix OOB memory access in Orc and Parquet stacks from fixed-width unaligned loads by @mhaseeb123 in #20458
- Fix cudf.pandas Timestamp/Timedelta not subclassing stdlib datetime objects by @mroeschke in #20433
- Revert benchmark input generation logic for list type by @davidwendt in #20498
- Avoid using pylibcudf directly in rapidsmpf runtime by @rjzamora in #20501
- Suppress NVRTC arch warnings by @brandon-b-miller in #20517
- Fix `ChannelManager` and `Lineariser` by @rjzamora in #20516
- Synchronize streams in `LocalShuffle` by @rjzamora in #20515
- Make `argsort` have return type `np.intp` to match pandas by @Matt711 in #20487
- Fix `polars.concat_str` with one column in cudf_polars by @mroeschke in #20535
- Override `__sizeof__` for `cudf.Index` by @Matt711 in #20530
- Fix `pl.scan_csv(...).slice(...).collect(engine="gpu")` with None endpoint by @mroeschke in #20519
- Fix DataChunkSourceTest by syncing default stream by @davidwendt in #20492
- Fix data size errors in some libcudf benchmarks by @davidwendt in #20512
- Pin cython and pytest dependencies by @TomAugspurger in #20571
- Pin Cython pre-3.2.0 and PyTest pre-9 by @jakirkham in #20573
- Handle `Empty` child IRs in `_decompose` by @Matt711 in #20409
- Skip flaky pandas datetime test by @Matt711 in #20585
- Fix max-pool-size-exceeded error in DATA_CHUNK_SOURCE_TEST by @davidwendt in #20534
- Fix racecheck in nvtext wordpiece tokenizer kernel by @davidwendt in #20588
- Fix the check to determine if all column chunk pages are dict encoded by @mhaseeb123 in #20524
- Add stream synchronize to QUANTILES_TEST PercentileApprox gtests by @davidwendt in #20558
- updated update-version.sh to handle release branch version changes by @rockhowse in #20598
- Fix nvtext tokenizers handling invalid UTF8 data by @davidwendt in #20514
- Fix overflow errors in distinct and filtered joins when hash table size exceeds int32 limits by @shrshi in #20594
- [FEA] Optimize JIT Filter for Low-Selectivity by @lamarrr in #20222
- Compute boolean function(NOT) on integers as a bitwise invert by @Matt711 in #20599
- Cast output dtype of rolling aggregations to match pandas by @Matt711 in #20526
- Add noop path for `Frame.astype` by @Matt711 in #20581
- Fix `copy` semantics bugs thus reduce copies and memory usage by @galipremsagar in #20121
- Ensure the sum after expression decomposition for mean has float output dtype by @Matt711 in #20596
- Use `Decimal(0)` literal for all-null decimal groups in groupby-sum by @Matt711 in #20591
- Do not drop `freq` when constructing `DatetimeIndex` from pandas by @brandon-b-miller in #18778
- Fix --validation flag for cudf.pandas PDSH benchmarks by @mroeschke in #20540
- Enable GPU acceleration for more binops by @galipremsagar in #20507
- Fix `rmm` function calls due to removed deprecated APIs and macro by @ttnghia in #20661
- Fix orc reader bool bug due to not being able to resume rle decode by @pmattione-nvidia in #20666
- Fix categorical comparisons in `cudf` to match `pandas` by @galipremsagar in #20674
- Fix `any` and `all` to match pandas by @galipremsagar in #20679
- Fix return types of string APIs in `cudf.pandas` by @galipremsagar in #20683
- Resolve pandas test failures by @galipremsagar in #20704
- Fix DatetimeIndex pickling by @vyasr in #20709
- `DatetimeIndex.serialize()` headers are msgpack serializable by @TomAugspurger in #20714
📖 Documentation
- Add note that --rmm-async only affects distributed scheduler. by @bdice in #20129
- Add profiling guide by @bdice in #20292
- Find RMM before CCCL by @wence- in #20336
- Use current system architecture in conda environment creation command by @bdice in #20500
- Use uname -m instead of arch command by @bdice in #20502
- Use RAPIDS_BRANCH file for documentation links by @bdice in #20494
🚀 New Features
- Add memory resources to unary, transform, and filling modules by @vyasr in #20054
- Add memory resources to binaryop, copying, and stream_compaction by @vyasr in #20059
- Add memory resources to groupby, datetime, and lists modules by @vyasr in #20102
- Add memory resources to search, reshape, and partitioning module by @vyasr in #20101
- Add memory resources to rolling, sorting, and quantiles modules by @vyasr in #20099
- [FEA] Implement JIT Filter for read_parquet by @lamarrr in #19831
- Add memory resources to all nvtext APIs by @vyasr in #20119
- Add memory resource to all strings modules by @vyasr in #20123
- Add memory resources to reduce, column, column_factories, and contiguous_split by @vyasr in #20135
- Add memory resources to I/O modules by @vyasr in #20136
- Remove rounding from cudf java by @pmattione-nvidia in #20110
- Add memory resources to replace, json, and hashing by @vyasr in #20150
- Add support for maintain_order param in joins by @Matt711 in #17698
- Add an example to inspect parquet files and dump row group and page level metadata information by @mhaseeb123 in #20117
- Support forward/backward filling null values in a grouped window context by @Matt711 in #19907
- Allow multiple calls to `cudf::initialize` and `cudf::deinitialize` by @vuule in #20111
- Add remaining memory resources by @vyasr in #20197
- Add memory resources to scalars by @vyasr in #20196
- Add pylibcudf is_valid_reduce_aggregation API by @davidwendt in #20145
- Support decimal literals in cudf-polars by @Matt711 in #20147
- Support `cum_sum(...).over(...)` expressions in cudf-polars by @Matt711 in #19908
- Passthrough unary ops through Parquet predicate pushdown by @mhaseeb123 in #20127
- Implement `ARGMIN` and `ARGMAX` aggregations for reduction by @ttnghia in #20207
- Skip decompression of pruned parquet pages by @mhaseeb123 in #20192
- Add an example to demonstrate the use of next-gen parquet reader to read a parquet file with highly selective filters by @mhaseeb123 in #19469
- Evaluate `IS_NULL` at row group and page level in Parquet filtering by @mhaseeb123 in #20144
- [Java] Add optional native deps loader by @zpuller in #20414
- Add cudf-polars + rapidsmpf CI check by @rjzamora in #20355
- Add Python bindings for the hybrid scan reader by @vyasr in #20381
- RapidsMPF streaming-engine translation by @rjzamora in #20161
- [JNI] Use a read/write lock pattern in Rmm.class by @abellina in #20521
- [Java] Supports output projection indices for `contiguousSplitGroupsAndGenUniqKeys` by @res-life in #20391
- Support `Series.at` and `Series.iat` for pandas compatibility by @Matt711 in #20529
- Add COUNT_VALID aggregation support to groupby-scan by @davidwendt in #20531
- Use RapidsMPF `read_parquet` in "rapidsmpf" runtime by @rjzamora in #20497
- Support decimal128 SUM aggregation in hash-based groupby by @PointKernel in #20509
- Add stream testing in pylibcudf by @vyasr in #20625
🛠️ Improvements
- Deprecate .from_pandas constructor by @mroeschke in #19996
- Prune entries in Sphinx nitpick_ignore by @mroeschke in #20045
- Avoid direct CategoricalColumn calls in dask_cudf by @mroeschke in #20080
- Fix typing issues in pylibcudf by @vyasr in #20069
- Avoid shadowing module names by @vyasr in #20071
- Remove calling to `purge_nonempty_nulls` in `make_lists_column` by @ttnghia in #12873
- Reduce verbosity of running the pandas test suite by @vyasr in #20107
- Clean up detail device atomic logic using atomic_ref by @PointKernel in #19924
- Use 8 processes for pandas tests, show top 10 test times by @bdice in #20109
- Update nvbench by @bdice in #19619
- Cleanup of some libcudf aggregation code by @davidwendt in #20053
- Run cudf-polars conda unit tests with more than 1 process by @mroeschke in #19980
- Avoid running pandas unit tests for private functionality with cudf.pandas by @mroeschke in #20115
- Remove MultiIndex.from_pandas pytest benchmark by @mroeschke in #20112
- Switch host_vector and host_span dependency by @davidwendt in #20106
- Have ListColumn.from_sequence go through pylibcudf by @mroeschke in #20098
- Fix `RAPIDS_BRANCH` version and update script by @galipremsagar in #20091
- Add pyarrow stubs to mypy environment and fix associated errors by @vyasr in #20118
- Fix slowdown in cudf-polars distributed tests by @TomAugspurger in #20137
- Improve performance of string column size computation during parquet reads. by @nvdbaranec in #19986
- Disable async MR priming in cudf.pandas by @bdice in #20133
- Rework reduction case statement as dispatch_type_and_aggregation by @davidwendt in #20078
- Fix type annotations in cudf-polars by @TomAugspurger in #20131
- Add tests for AUTO and HYBRID (de)compression modes by @vuule in #20126
- Branch 25.12 merge branch 25.10 by @vyasr in #20152
- Manual forward merger for Branch 25.12 - branch 25.10 by @galipremsagar in #20157
- Temporarily disable conda-java-tests by @bdice in #20162
- Remove unused ColumnBase.view by @mroeschke in #20141
- Avoid NumericalColumn call from CategoricalColumn.children by @mroeschke in #20153
- Deprecate legacy public row operators by @PointKernel in #20097
- Avoid more explicit calls to IntervalColumn and StructColumn by @mroeschke in #20064
- Run cudf-polars wheels unit tests with more than 1 process by @mroeschke in #20124
- Trace node execution in cudf-polars by @TomAugspurger in #19895
- Make ColumnBase.as_*_column convert via pylibcudf by @mroeschke in #20149
- Reduce execution times for parquet dictionary tests by @mhaseeb123 in #20176
- Update to rapids-logger 0.2 by @bdice in #20172
- Adjust rmm pool handling in PDSH benchmarks by @TomAugspurger in #20138
- Don't assume cudf_polars benchmarking scale factor is always an integer by @mroeschke in #20182
- Skip filtering Parquet row groups with dictionaries if there are non-dict encoded pages by @mhaseeb123 in #20175
- Remove unnecessary work from `read_parquet_metadata` by @vuule in #20180
- Improve performance of groupby tdigests gtests by @davidwendt in #20173
- Revert "Temporarily disable conda-java-tests" by @bdice in #20184
- Add PDSH benchmark runner for cudf.pandas by @mroeschke in #20164
- Make Column.set_mask go through pylibcudf by @mroeschke in #20103
- Pin pydantic<2.12 in ci/test_cudf_polars_polars_tests.sh by @mroeschke in #20200
- Add an overhead field to cudf-polars tracing by @TomAugspurger in #20198
- Support binops between float scalar to decimal column by @mroeschke in #20199
- Reduce output buffer sizes for pruned pages of columns with a `list` parent by @mhaseeb123 in #20086
- Make ListColumn._transform_leaves convert via pylibcudf by @mroeschke in #20151
- Rename `comparison_binop_generator` to `arg_minmax_binop_generator` and corresponding file to `nested_types_extrema_utils.cuh` by @Copilot in #20212
- Pin polars version <1.34 and >=1.29 by @Matt711 in #19912
- Stop using libcudf default parameters in pylibcudf by @vyasr in #20204
- Fix various typing errors by @vyasr in #20205
- Cleanup parquet for simple columns by @pmattione-nvidia in #19869
- Configuration for which metrics are enabled during tracing by @TomAugspurger in #20223
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #20189
- Fix parquet row number check for page bounds by @pmattione-nvidia in #20217
- More mypy and docs fixes by @vyasr in #20224
- Prevent accidental copies of expensive-to-copy object types by @vuule in #20226
- Split row operator header by @PointKernel in #20166
- Standardize setting StructDtype field names post libcudf conversion by @mroeschke in #20235
- Add arm testing of cudf.pandas unit tests by @vyasr in #20251
- Enable `sccache-dist` connection pool by @trxcllnt in #20264
- Run polars tests with the streaming and in-memory executors by @Matt711 in #19354
- Move and rename `ScanPartitionPlan` by @rjzamora in #20248
- Unpin DuckDB and Ibis in cudf.pandas thirdparty tests by @mroeschke in #20269
- Add pylibcudf to pre-commit linting and fix outstanding errors by @vyasr in #20250
- Update `ConfigOptions` for rapidsmpf-streaming integration by @rjzamora in #20252
- Handle unordered grouped windows properly for null filling and cum sums by @Matt711 in #20275
- Add more type annotations to cudf/core/column subclasses by @mroeschke in #20277
- Remove extraneous host_memory_resource include by @bdice in #20284
- Add `MultiIndex.dtypes` by @galipremsagar in #20279
- Skip mypy in pre-commit.ci by @bdice in #20286
- Make ColumnBase.deserialize construct via pylibcudf by @mroeschke in #20142
- Add numpy to the mypy pre-commit environment by @vyasr in #20282
- Add ability to set the source_info of parquet_reader_options by @wence- in #20253
- Add more Python type annotations to `cudf/core` by @mroeschke in #20287
- Use main in RAPIDS_BRANCH by @bdice in #20312
- Move "All rights reserved" statements to copyright line by @KyleFromNVIDIA in #20313
- Add `inferred_type` and missing `IntervalIndex` properties by @galipremsagar in #20294
- Avoid unseeded, random data generation in cuDF classic tests by @mroeschke in #20319
- Improve hash-based groupby aggregation: direct write to the dense output columns whenever possible by @ttnghia in #19764
- Avoid accessing range values in cudf::strings::contains_re logic by @davidwendt in #20122
- Migrate mixed join to use the multiset data structure by @PointKernel in #19989
- Add benchmark for strings cast to/from integer APIs by @davidwendt in #20247
- Use main shared-workflows branch by @bdice in #20324
- Use the thread pool for Parquet metadata processing by @vuule in #20263
- Add `.dt.day_of_week` and `.dt.daysinmonth` by @galipremsagar in #20298
- Avoid Column materialization in RangeIndex.nans_to_nulls by @mroeschke in #20331
- Update the code to be compatible with the new cuco stream-ordered allocator by @PointKernel in #20258
- Deprecate Series.data by @mroeschke in #20281
- Align cudf Python's Column constructors by @mroeschke in #20233
- Make type annotations of ColumnBase.set_mask stricter by @mroeschke in #20261
- Make type annotations of ColumnBase.find_and_replace stricter by @mroeschke in #20259
- Make type annotations of ColumnBase.apply_boolean_mask stricter by @mroeschke in #20262
- Skip Python LZ4 tests when nvCOMP is disabled by @vuule in #20293
- Move cudf/io/nvcomp_adapter.hpp to cudf/io/detail by @davidwendt in #20327
- Add context to IR.do_evaluate by @TomAugspurger in #20322
- Update mypy `# type: ignore` comments according to stricter mypy configs by @mroeschke in #20272
- Remove duplicated enforce null consistency code by @mhaseeb123 in #20342
- Use SPDX for all copyright headers by @KyleFromNVIDIA in #20321
- Add more type annotations to `cudf/core/series.py` by @mroeschke in #20304
- Remove/Replace uses of numba.cuda arrays in pytest benchmarks and tests by @mroeschke in #20359
- Add duckdb pdsh query queries by @Matt711 in #20257
- Use stream in cudf_polars.DataFrame.to_polars by @TomAugspurger in #20323
- Add `join_streams` to pylibcudf API by @TomAugspurger in #20316
- Use CUDA streams in all pylibcudf calls made by cudf-polars by @TomAugspurger in #20291
- Add cudf/io/config_utils.hpp to doxygen by @davidwendt in #20329
- Test coverage for parallel metadata parsing by @vuule in #20334
- Support serializing more polars types by @Matt711 in #20347
- Add CUDAStreamPolicy to cudf-polars configuration by @TomAugspurger in #20366
- Unskip cudf-polars groupby test by @Matt711 in #20406
- Deprecate pylibcudf interop arrow APIs by @Matt711 in #20405
- Get rid of the hashing helper header by @PointKernel in #20360
- Minor cleanup and fixes for libcudf generate_input.cu by @davidwendt in #20363
- Ignore assert_produces_warning and shares_memory pandas unit tests for cudf.pandas by @mroeschke in #20434
- Short circuit RangeIndex.append for length 0 input, proxy private attribute by @mroeschke in #20442
- Mark DataFrame.insert as _external_only_api by @Copilot in #20403
- Deprecate `get_current_device_resource` in favor of `get_current_device_resource_ref` by @PointKernel in #20386
- Promote `JoinNoneValue` to public as `JoinNoMatch` for clear non-match Join semantics by @PointKernel in #20440
- Remove duplicate entries in NODEIDS_THAT_FAIL_WITH_CUDF_PANDAS by @mroeschke in #20447
- Use the thread pool in the compact protocol reader by @vuule in #20417
- Update README.md generalizing all cuDF components by @mroeschke in #20357
- Skip TestDatetimelikeCoercion pandas tests that assert ._value identity by @mroeschke in #20459
- Add PDSH Q2-9 for cudf.pandas by @mroeschke in #20418
- Add s3fs to `test_cudf_python` common dependencies by @trxcllnt in #20473
- Use public pandas APIs in StringColumn.to_pandas by @mroeschke in #20474
- Expose java GatherMap internals and add toString to AST by @revans2 in #20483
- Add create_ascii_string_column to the libcudf benchmark data generator by @davidwendt in #20354
- Skip more pandas unit tests that tests BlockManager, private sparse types by @mroeschke in #20489
- Add boto3/botocore/aiobotocore to common test dependencies by @trxcllnt in #20490
- Use a lower bound when estimating the partial file-size by @rjzamora in #20193
- Performance improvement for nvtext::edit_distance for long strings by @davidwendt in #20268
- Add MemoryResourceConfig to cudf-polars config by @TomAugspurger in #20042
- Improve project automation by @vyasr in #20523
- Fuse simple streaming reductions in cudf-polars by @rjzamora in #18757
- Migrate to new CCCL memory resource interface by @bdice in #20513
- Add empty input gtest for cudf::transform by @davidwendt in #20505
- Rework internal json headers to allow converting gtests files from .cu to .cpp by @davidwendt in #20491
- Set continue on error in the cudf-polars-rapidsmpf nightly CI job by @Matt711 in #20550
- Permanently back cuDF column by a pylibcudf.Column by @mroeschke in #20306
- Skip flaky upstream polars rolling test by @Matt711 in #20552
- Accelerate data page mask computation on device by @mhaseeb123 in #20280
- Change default rapidsmpf stream policy to 'pool' by @TomAugspurger in #20527
- Increase gtests coverage for cudf::strings::like patterns by @davidwendt in #20348
- Add cuda::std::span operator to cudf::column_view by @davidwendt in #20541
- Update ArrowStringView compare benchmark for gather by @davidwendt in #19935
- Add pytest stubs and remove ujson usage by @vyasr in #20560
- Skip arrow array constructor tests by @Matt711 in #20579
- Add Polars to mypy environment and fix errors by @vyasr in #20563
- Ensure table chunks are unspilled and available by @madsbk in #20583
- Skip tests that assert behavior when copy-on-write is False by @Matt711 in #20506
- Pass streams through `Column.from_array/from_iterable_of_py` by @Matt711 in #20569
- Stop using Dtype annotation by @vyasr in #20590
- Workaround to enable running PDS-H via WebHDFS by @kingcrimsontianyu in #20132
- Update RMM includes from `<rmm/mr/device/*>` to `<rmm/mr/*>` by @bdice in #20607
- Stricter typing import for cudf-polars by @TomAugspurger in #20614
- Avoid the unnecessary H2H copy in the `std::vector` sink by @vuule in #20602
- Preprocessing offsets for Parquet non-dictionary string columns by @pmattione-nvidia in #20430
- Move more pandas unit tests that test private APIs by @mroeschke in #20511
- Use `.plc_column` instead of `.to_pylibcudf` in rolling, string utilities by @mroeschke in #20562
- Skip TestSetitemNADatetimeLikeDtype pandas unit tests due to private assertion by @mroeschke in #20578
- Pin Polars version <1.35 by @Matt711 in #20266
- Skip pandas unit tests in `test_old_base.py` that test private APIs by @mroeschke in #20572
- Use `.plc_column` attribute instead of `to_pylibcudf` more internally by @mroeschke in #20559
- Skip arrow-backed arithmetic tests and categorize the remaining failing tests by @Matt711 in #20577
- Fix a pytest execution that is spawned in a subprocess by @galipremsagar in #20660
- Accelerated parquet page header decoding when page index is available by @mhaseeb123 in #20369
- feat: add error handling for non-existent columns in parquet reader by @gforsyth in #20659
- Optimize row mask computation for single filter column by @mhaseeb123 in #20335
- Skip MultiIndex pandas unit tests testing private functionality, `test_chaining_and_caching.py` by @mroeschke in #20575
- Address minor comments from recent hybrid scan PRs by @mhaseeb123 in #20672
- Add a timeout for the rapidsmpf test run by @vyasr in #20681
- Use `sccache-dist` build cluster for conda and wheel builds by @trxcllnt in #20488
New Contributors
- @Copilot made their first contribution in #20212
- @rockhowse made their first contribution in #20598
Full Changelog: v25.12.00a...v25.12.00