rapidsai/cudf v25.06.00a on GitHub

🔗 Links

🚨 Breaking Changes

Promote Parquet type enums to enum classes (#18441) @mhaseeb123
Move parquet schema types and structs to public headers (#18424) @mhaseeb123
Start removal of vector factories with _sync suffix by deprecating them and adding versions without the suffix (#18414) @vuule
Skip decoding of pages marked as pruned in PQ reader (#18347) @mhaseeb123
Deprecate nvtext subword tokenizer (#18334) @davidwendt
Add standard data ingestion pipelines to pylibcudf for ndarrays (#18311) @Matt711
Add Keep Option Parameter to Distinct (#18237) @warrickhe
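The keep option on distinct (#18237) controls which row survives when several rows share the same key. The plain-Python sketch below illustrates the usual semantics of keeping any, the first, the last, or no occurrence. It is not the libcudf or pylibcudf API: the `Keep` enum and the `distinct_indices` helper are hypothetical names. As far as we know, libcudf's own option type is `cudf::duplicate_keep_option`.

```python
from enum import Enum


class Keep(Enum):
    # Hypothetical stand-in for the keep options; not a real cudf type.
    ANY = "any"
    FIRST = "first"
    LAST = "last"
    NONE = "none"


def distinct_indices(keys, keep=Keep.FIRST):
    """Return the indices of rows kept after deduplicating `keys`.

    Indices are returned in ascending (input) order for readability.
    """
    positions = {}
    for i, key in enumerate(keys):
        positions.setdefault(key, []).append(i)

    if keep is Keep.LAST:
        kept = [idx[-1] for idx in positions.values()]
    elif keep is Keep.NONE:
        # Drop every key that occurs more than once.
        kept = [idx[0] for idx in positions.values() if len(idx) == 1]
    else:
        # FIRST keeps the earliest occurrence; ANY may keep any occurrence,
        # and this sketch simply reuses the earliest one.
        kept = [idx[0] for idx in positions.values()]
    return sorted(kept)


keys = ["a", "b", "a", "c", "b"]
print(distinct_indices(keys, Keep.FIRST))  # [0, 1, 3]
print(distinct_indices(keys, Keep.LAST))   # [2, 3, 4]
print(distinct_indices(keys, Keep.NONE))   # [3]
```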

🐛 Bug Fixes

Fix an error when reading some compressed Parquet V2 files (#18478) @vuule
Ensure DataFrame column label operations reset label_dtype (#18452) @mroeschke
Fix a segfault when reading a Parquet file with unsupported compression type (#18451) @vuule
Fix logger macros (#18444) @vyasr
Use delete not free to release data allocated with new (#18412) @wence-
Fix synchronization issues in host compression and decompression (#18395) @vuule
Update Dask array-conversion handling (#18382) @rjzamora
Fixed indexing on empty DataFrame with no columns (#18381) @TomAugspurger
Deterministic hashing for DataFrameScan nodes in cudf-polars multi-partition executor (#18351) @TomAugspurger
Fix index of right table in unary operators in AST, in Joins (#18333) @karthikeyann
Add offsetalator to contiguous-split (#18312) @davidwendt
Support large strings in nvtext vocabulary-tokenizer (#18283) @davidwendt
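Deterministic hashing presumably matters in the multi-partition executor (#18351) because keys computed for the same plan node in different processes must agree. Python's built-in `hash()` of `str` and `bytes` is salted per interpreter process (unless `PYTHONHASHSEED` is fixed), so it is not safe for that purpose. The sketch below shows one common alternative: hashing a canonical serialization with `hashlib`. The `stable_node_key` helper and its inputs are hypothetical, and the actual cudf-polars change may derive its keys differently. A real scan node that wraps in-memory data would also need a stable way to identify that data, which this sketch leaves out.

```python
import hashlib
import json


def stable_node_key(node_type, schema, options):
    """Hypothetical helper: derive a process-independent key for a plan node.

    Unlike hash() on str, the result does not change between interpreter runs.
    """
    payload = json.dumps(
        {"type": node_type, "schema": schema, "options": options},
        sort_keys=True,  # canonical key order: equal dicts serialize identically
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


k1 = stable_node_key("DataFrameScan", {"a": "int64", "b": "str"}, {"n_rows": 10})
k2 = stable_node_key("DataFrameScan", {"b": "str", "a": "int64"}, {"n_rows": 10})
print(k1 == k2)  # True: insertion order of the dicts does not matter
```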

📖 Documentation

[DOC] Improve clarity in parquet APIs set_row_groups and set_columns (#18466) @Matt711
Add a usage page to cudf-polars documentation (#18460) @Matt711
[DOC] Fix typo in CONTRIBUTING.md on build type tests (#18456) @JigaoLuo
Add restart kernel note in cudf pandas docs (#18374) @ncclementi

🚀 New Features

Move parquet schema types and structs to public headers (#18424) @mhaseeb123
Add optional dtype argument to Scalar.from_any (#18415) @Matt711
Expose cudf::chunked_pack in pylibcudf (#18411) @wence-
Add support for long string columns in cudf::contiguous_split (#18393) @nvdbaranec
Automatically dispatch between host and device decompression/compression based on the number of buffers (#18363) @vuule
Skip decoding of pages marked as pruned in PQ reader (#18347) @mhaseeb123
Support constructing pylibcudf Columns and Tables from views into arbitrary objects (#18314) @vyasr
Add standard data ingestion pipelines to pylibcudf for ndarrays (#18311) @Matt711
Support cudf-polars isoyear and week (isoweek) (#18265) @brandon-b-miller
Add Keep Option Parameter to Distinct (#18237) @warrickhe
Add rapidsmp shuffle support to cudf-polars (#18231) @rjzamora
Support cudf-polars strftime (#18181) @brandon-b-miller
Support include_file_paths in cudf polars (#18057) @Matt711
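ISO year and ISO week (#18265) follow ISO 8601 week numbering. Weeks start on Monday, and week 1 is the week that contains the year's first Thursday, so dates near January 1 can belong to the neighboring year. Python's standard `date.isocalendar()` uses the same rules and makes these edge cases easy to check. In Polars the corresponding expressions are, as far as we know, `dt.iso_year()` and `dt.week()`. The helper name below is illustrative only.

```python
from datetime import date


def iso_year_and_week(d):
    """Return the (ISO 8601 year, ISO 8601 week) pair for a date."""
    iso = d.isocalendar()  # (ISO year, ISO week, ISO weekday)
    return iso[0], iso[1]


# Friday 2021-01-01 still belongs to the last ISO week of 2020.
print(iso_year_and_week(date(2021, 1, 1)))    # (2020, 53)
# Monday 2021-01-04 starts ISO week 1 of 2021.
print(iso_year_and_week(date(2021, 1, 4)))    # (2021, 1)
# Monday 2024-12-30 already belongs to ISO week 1 of 2025.
print(iso_year_and_week(date(2024, 12, 30)))  # (2025, 1)
```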

🛠️ Improvements

Fix unspecified behavior involving move semantics and order of evaluation (#18481) @kingcrimsontianyu
Rerun flaky pytests in CI (#18476) @galipremsagar
Vendor RAPIDS.cmake (#18473) @bdice
Add ARM conda environments. (#18470) @bdice
Bump polars version to <1.28 (#18469) @Matt711
Promote Parquet type enums to enum classes (#18441) @mhaseeb123
Replace direct use of nvCOMP and of its adapter with the higher-level decompression API (#18434) @vuule
Test against stable tags for narwhals (#18431) @Matt711
Refcount-based dropping of cached evaluations in cudf-polars executor (#18430) @wence-
Replace Thrust iterator facilities with libcu++ ones (#18427) @miscco
Remove numpy requirement when converting 2d cuda array interface objects to pylibcudf Columns (#18426) @Matt711
Switch the ptr type in gpumemoryview from Py_ssize_t to uintptr_t (#18419) @Matt711
Start removal of vector factories with _sync suffix by deprecating them and adding versions without the suffix (#18414) @vuule
Allow polars arrow conversion to produce string_view (#18413) @wence-
Add rank and label_bin methods to ColumnBase (#18407) @mroeschke
Automatic single-partition fallback in cudf-polars (#18405) @rjzamora
Remove _sync suffix from hostdevice types (#18404) @vuule
Add static push and pop methods to NvtxRange (#18401) @zpuller
Deprecate cudf.Scalar (#18394) @mroeschke
Bump polars version to <1.27 (#18387) @Matt711
Branch 25.06 merge 25.04 (#18380) @Matt711
Silence warning by setting BUILD_SHARED_LIBS (#18371) @vyasr
Pass stream through when taking ownership from libcudf (#18367) @wence-
Deprecate old nvtext::normalize_characters (#18360) @davidwendt
Optimize sequences by introducing make_offsets_child_column (#18357) @ustcfy
Decompress all data in a single decompress_page_data when reading Parquet input in a single chunk (#18352) @vuule
Performance improvement for to_lower/to_upper for multi-byte UTF-8 characters (#18345) @davidwendt
Branch 25.06 merge branch 25.04 (#18344) @vyasr
Use dask-cuda for cudf-polars experimental testing (#18343) @rjzamora
Deprecate nvtext subword tokenizer (#18334) @davidwendt
Remove cudf.Scalar in as_column (#18331) @mroeschke
Allow cudf.DataFrame.from_pylibcudf to accept a pylibcudf.io.TableWithMetadata (#18319) @mroeschke
Avoid stateful construction in DataFrame.__init__ (#18306) @mroeschke
Improve the groupby performance for extremely low cardinality (#18290) @PointKernel
Require type annotations in cudf.polars (#18285) @TomAugspurger
Removing unnecessary StreamSynchronization in reading (#18279) @JigaoLuo
Use the mapped buffer for all read operations in the memory-mapped source; switch default source to the kvikIO one (#18204) @vuule
Improve test coverage in the catboost integration tests (#18126) @Matt711
Create file sources in parallel (#18094) @vuule
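The idea behind refcount-based dropping of cached evaluations (#18430) is, roughly, that an intermediate result does not need to stay cached for the whole query: once every node that consumes it has read it, it can be released. The sketch below illustrates that idea under the assumption that the plan is a DAG whose consumer counts are known up front. `count_consumers` and `RefcountedCache` are hypothetical names and do not reflect the actual cudf-polars executor code.

```python
from collections import Counter


def count_consumers(inputs_of):
    """Count how many nodes read each node's result.

    `inputs_of` maps each node to the list of nodes it takes as input.
    """
    counts = Counter()
    for inputs in inputs_of.values():
        counts.update(inputs)
    return counts


class RefcountedCache:
    """Hypothetical cache that evicts a result after its last consumer reads it."""

    def __init__(self, consumer_counts):
        self._remaining = dict(consumer_counts)
        self._values = {}

    def put(self, node, value):
        # Nodes nobody consumes (e.g. the plan root) are never cached.
        if self._remaining.get(node, 0) > 0:
            self._values[node] = value

    def get(self, node):
        value = self._values[node]
        self._remaining[node] -= 1
        if self._remaining[node] == 0:
            del self._values[node]  # last consumer has read it: release it
        return value

    def __contains__(self, node):
        return node in self._values


dag = {"scan_a": [], "scan_b": [], "filter": ["scan_a"], "join": ["scan_a", "scan_b"]}
cache = RefcountedCache(count_consumers(dag))
cache.put("scan_a", [1, 2, 3])
cache.get("scan_a")       # read by "filter"; still cached
print("scan_a" in cache)  # True
cache.get("scan_a")       # read by "join"; entry is dropped
print("scan_a" in cache)  # False
```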
