🔗 Links
🚨 Breaking Changes
- Promote Parquet type enums to enum classes (#18441) @mhaseeb123
- Move parquet schema types and structs to public headers (#18424) @mhaseeb123
- Start removal of vector factories with _sync suffix by deprecating them and adding versions without the suffix (#18414) @vuule
- Skip decoding of pages marked as pruned in PQ reader (#18347) @mhaseeb123
- Deprecate nvtext subword tokenizer (#18334) @davidwendt
- Add standard data ingestion pipelines to pylibcudf for ndarrays (#18311) @Matt711
- Add Keep Option Parameter to Distinct (#18237) @warrickhe
🐛 Bug Fixes
- Fix an error when reading some compressed Parquet V2 files (#18478) @vuule
- Ensure DataFrame column label operations reset label_dtype (#18452) @mroeschke
- Fix a segfault when reading a Parquet file with unsupported compression type (#18451) @vuule
- Fix logger macros (#18444) @vyasr
- Use delete not free to release data allocated with new (#18412) @wence-
- Fix synchronization issues in host compression and decompression (#18395) @vuule
- Update Dask array-conversion handling (#18382) @rjzamora
- Fixed indexing on empty DataFrame with no columns (#18381) @TomAugspurger
- Deterministic hashing for DataFrameScan nodes in cudf-polars multi-partition executor (#18351) @TomAugspurger
- Fix index of right table in unary operators in AST, in Joins (#18333) @karthikeyann
- Add offsetalator to contiguous-split (#18312) @davidwendt
- Support large strings in nvtext vocabulary-tokenizer (#18283) @davidwendt
📖 Documentation
- [DOC] Improve clarity in parquet APIs set_row_groups and set_columns parquet (#18466) @Matt711
- Add a usage page to cudf-polars documentation (#18460) @Matt711
- [DOC] Fix typo in CONTRIBUTING.md on build type tests (#18456) @JigaoLuo
- Add restart kernel note in cudf pandas docs (#18374) @ncclementi
🚀 New Features
- Move parquet schema types and structs to public headers (#18424) @mhaseeb123
- Add optional dtype argument to Scalar.from_any (#18415) @Matt711
- Expose cudf::chunked_pack in pylibcudf (#18411) @wence-
- Add support for long string columns in cudf::contiguous_split (#18393) @nvdbaranec
- Automatically dispatch between host and device decompression/compression based on the number of buffers (#18363) @vuule
- Skip decoding of pages marked as pruned in PQ reader (#18347) @mhaseeb123
- Support constructing pylibcudf Columns and Tables from views into arbitrary objects (#18314) @vyasr
- Add standard data ingestion pipelines to pylibcudf for ndarrays (#18311) @Matt711
- Support cudf-polars isoyear and week (isoweek) (#18265) @brandon-b-miller
- Add Keep Option Parameter to Distinct (#18237) @warrickhe
- Add rapidsmp shuffle support to cudf-polars (#18231) @rjzamora
- Support cudf-polars strftime (#18181) @brandon-b-miller
- Support include_file_paths in cudf polars (#18057) @Matt711
🛠️ Improvements
- Fix unspecified behavior involving move semantics and order of evaluation (#18481) @kingcrimsontianyu
- Rerun flaky pytests in CI (#18476) @galipremsagar
- Vendor RAPIDS.cmake (#18473) @bdice
- Add ARM conda environments. (#18470) @bdice
- Bump polars version to <1.28 (#18469) @Matt711
- Promote Parquet type enums to enum classes (#18441) @mhaseeb123
- Replace direct use of nvCOMP and of its adapter with the higher-level decompression API (#18434) @vuule
- Test against stable tags for narwhals (#18431) @Matt711
- Refcount-based dropping of cached evaluations in cudf-polars executor (#18430) @wence-
- Replace Thrust iterator facilities with libcu++ ones (#18427) @miscco
- Remove numpy requirement when converting 2d cuda array interface objects to pylibcudf Columns (#18426) @Matt711
- Switch the ptr type in gpumemoryview from Py_ssize_t to uintptr_t (#18419) @Matt711
- Start removal of vector factories with _sync suffix by deprecating them and adding versions without the suffix (#18414) @vuule
- Allow polars arrow conversion to produce string_view (#18413) @wence-
- Add rank and label_bin methods to ColumnBase (#18407) @mroeschke
- Automatic single-partition fallback in cudf-polars (#18405) @rjzamora
- Remove _sync suffix from hostdevice types (#18404) @vuule
- add static push and pop methods to NvtxRange (#18401) @zpuller
- Deprecate cudf.Scalar (#18394) @mroeschke
- Bump polars version to <1.27 (#18387) @Matt711
- Branch 25.06 merge 25.04 (#18380) @Matt711
- Silence warning by setting BUILD_SHARED_LIBS (#18371) @vyasr
- Pass stream through when taking ownership from libcudf (#18367) @wence-
- Deprecate old nvtext::normalize_characters (#18360) @davidwendt
- Optimize sequences by introducing make_offsets_child_column (#18357) @ustcfy
- Decompress all data in a single decompress_page_data when reading Parquet input in a single chunk (#18352) @vuule
- Performance improvement for to_lower/to_upper for multi-byte UTF-8 characters (#18345) @davidwendt
- Branch 25.06 merge branch 25.04 (#18344) @vyasr
- Use dask-cuda for cudf-polars experimental testing (#18343) @rjzamora
- Deprecate nvtext subword tokenizer (#18334) @davidwendt
- Remove cudf.Scalar in as_column (#18331) @mroeschke
- Allow cudf.DataFrame.from_pylibcudf to accept a pylibcudf.io.TableWithMetadata (#18319) @mroeschke
- Avoid stateful construction in DataFrame.__init__ (#18306) @mroeschke
- Improve the groupby performance for extremely low cardinality (#18290) @PointKernel
- Require type annotations in cudf.polars (#18285) @TomAugspurger
- Removing unnecessary StreamSynchronization in reading (#18279) @JigaoLuo
- Use the mapped buffer for all read operations in the memory-mapped source; switch default source to the kvikIO one (#18204) @vuule
- Improve test coverage in the catboost integration tests (#18126) @Matt711
- Create file sources in parallel (#18094) @vuule