🚨 Breaking Changes
- Remove unused `group_range_rolling_window` API (#18313) @wence-
- [BUG] Disabled JIT for CUDA Runtime < 11.5 (#18296) @lamarrr
- Remove cudf.Scalar from binops (#18240) @mroeschke
- Enforce deprecation of dtype parameter in sum/product (#18070) @mroeschke
- Remove deprecated single component datetime extract APIs (#18010) @Matt711
- Remove deprecated rolling window functionality (#17993) @wence-
- Remove deprecated nvtext::minhash_permuted APIs (#17939) @davidwendt
- Remove dataframe protocol (#17909) @vyasr
- Use new rapids-logger library (#17899) @vyasr
- Added Multi-input & Scalar Support for Transform UDFs (#17881) @lamarrr
- Fixed incorrect PTX parsing of `ret` instruction after branch label (#17859) @lamarrr
- Use KvikIO to enable file's fast host read and host write (#17764) @kingcrimsontianyu
🐛 Bug Fixes
- Fix alpha versions of cudf package. (#18429) @bdice
- Backport: Deterministic hashing for DataFrameScan nodes in cudf-polars multi-partition executor (#18351) (#18420) @bdice
- Skip failing Narwhals rolling groupby tests (#18398) @Matt711
- Pin cmake in test_java to be less than 4.0.0 (#18392) @abellina
- Skip polars tests that fail with pydantic deprecation warnings (#18388) @Matt711
- Backport: Fix index of right table in unary operators in AST, in Joins (#18342) @bdice
- xfail narwhals sqlframe tests (#18297) @Matt711
- [BUG] Disabled JIT for CUDA Runtime < 11.5 (#18296) @lamarrr
- Make a pylibcudf Column from a device array object with `strides=None` (#18295) @Matt711
- Fix `cudf.pandas` objects to not be `Callable` (#18288) @galipremsagar
- Skip failing polars test test_general_prefiltering (#18264) @Matt711
- Filter all cudf.pandas profiler tests from running in parallel (#18262) @Matt711
- Allow cudf.Series([pd.NA], dtype=, nan_as_null=False) (#18259) @mroeschke
- Fix `cross` join with extra columns (#18256) @galipremsagar
- Fix `DataFrame.loc` to not modify the actual dataframe (#18254) @galipremsagar
- Remove RMM macro usage from to_arrow_device.cu (#18252) @davidwendt
- Skip Narwhals cross join tests for cudf.pandas CI run (#18249) @Matt711
- Fix cudf-polars tests for polars < 1.24 (#18246) @wence-
- Fix experimental cudf-polars tests (#18244) @rjzamora
- Fix `datetime64` vs `datetime` binops max resolution (#18241) @galipremsagar
- Use CCCL::libcudacxx include directories in Jitify preprocessing. (#18233) @bdice
- Disable conda prefix patching to avoid mangling binaries (#18225) @vyasr
- Workaround for ARM compiler issue with single space literal string (#18220) @davidwendt
- Bump nightly check limit (#18213) @Matt711
- Support comparative binops between categorical and non-categorical (#18200) @mroeschke
- Make the version file inside cudf.pandas not a symlink (#18198) @vyasr
- Ensure RAPIDS_ARTIFACTS_DIR is set for build metrics reports. (#18192) @bdice
- Ignore run exports of libcufile. (#18190) @bdice
- Skip flaky multi GPU test (#18187) @Matt711
- Fix BPE merges table static-map capacity size (#18184) @davidwendt
- Drop `CUB_QUOTIENT_CEILING` (#18179) @miscco
- Disable ARM CI in C++ and Python test CI jobs (#18175) @Matt711
- Add fmt to the test/benchmarks env (#18173) @vyasr
- Fix merge(how=left, left_on=, right_index=True, sort=True) (#18166) @mroeschke
- Allow nonnative cupy dtype in cudf.Series (#18164) @mroeschke
- Fix Series construction from numpy array with non-native byte order (#18151) @mroeschke
- Use protocol for dlpack instead of deprecated function in cupy notebook (#18147) @Matt711
- Skip failing test (#18146) @vyasr
- Update calls to KvikIO's config setter (#18144) @kingcrimsontianyu
- Reduce memory use when writing tables with very short columns to ORC (#18136) @vuule
- Handle empty dictionary in to_arrow_device interop (#18121) @davidwendt
- Allow pivot_table to accept single label index and column arguments (#18115) @mroeschke
- Preserve DataFrame.column subclass and type during binop (#18113) @mroeschke
- Fix rmm macro call (#18108) @pmattione-nvidia
- Add include for `<functional>` (#18102) @miscco
- Remove static column vectors from window function tests. (#18099) @mythrocks
- Fix scatter_by_map with spilling enabled (#18095) @mroeschke
- Use the right version macro `CCCL_MAJOR_VERSION` (#18073) @miscco
- Fix `test_scan_csv_multi` cudf-polars test (#18064) @rjzamora
- Fix memcopy direction for concatenate (#18058) @tgujar
- Fix upstream dask `loc` test (#18045) @rjzamora
- Fix hang on invalid UTF-8 data in string_view iterator (#18039) @davidwendt
- Fix `dask_cudf.to_orc` deprecation (#18038) @rjzamora
- Compatibility with dask.dataframe's `is_scalar` (#18030) @TomAugspurger
- Fix the build error due to KvikIO update (#18025) @kingcrimsontianyu
- Fix failing ibis test (#18022) @Matt711
- Skip failing polars tests (#18015) @Matt711
- Fix `to_arrow` to return consistent pandas-metadata (#18009) @galipremsagar
- Prevent setting custom attributes to `ColumnMethods` (#18005) @galipremsagar
- Compatibility with Dask `main` (#17992) @TomAugspurger
- [Bug] Fix Parquet-metadata sampling in cudf-polars (#17991) @rjzamora
- Add missing include for calling std::iota() (#17983) @davidwendt
- Fix pickling and unpickling for all objects (#17980) @galipremsagar
- Install duckdb, the default backend for ibis, in the cudf.pandas integration tests (#17972) @Matt711
- Check null count too in sum aggregation (#17964) @Matt711
- Raise NotImplementedError for groupby.agg if duplicate columns would be created (#17956) @mroeschke
- Ensure disabling the module accelerator is thread-safe (#17955) @vyasr
- Fix DataFrame/Series.rank for int and null data in mode.pandas_compatible (#17954) @mroeschke
- Limit buffer size in reallocation policy in JSON reader (#17940) @shrshi
- Make `cudf.pandas` proxy array picklable (#17929) @Matt711
- Add missing standard includes (#17928) @miscco
- Fix torch integration test (#17923) @Matt711
- Fix `to_pandas` writable bug for `datetime` and `timedelta` types (#17913) @galipremsagar
- Raise NotImplementedError if `.merge(suffixes=)` introduces duplicate labels (#17905) @mroeschke
- Fix groupby scans with int and NA data in mode.pandas_compatible (#17895) @mroeschke
- Patch `__init__` of `cudf` constructors to parse through `cudf.pandas` proxy objects (#17878) @galipremsagar
- Fixed incorrect PTX parsing of `ret` instruction after branch label (#17859) @lamarrr
- Relax inconsistent schema handling in `dask_cudf.read_parquet` (#17554) @rjzamora
📖 Documentation
- Clarify that cudf.pandas should be enabled before importing pandas. (#18339) @bdice
- [DOC] Add wordpiece tokenizer to cudf documentation (#18247) @davidwendt
- Added pylibcudf.contiguous_split to API docs (#18194) @TomAugspurger
- Fix build.sh docs for default behavior (#18180) @bdice
- Update Dask-cuDF documentation to fix all warnings and errors (#18157) @TomAugspurger
- [DOC] Document character normalizer (#18125) @Matt711
🚀 New Features
- Add and revise experimental cudf-polars config options (#18284) @rjzamora
- Support `top-k` and `bottom_k` expressions (#18222) @Matt711
- Support `cudf-polars` `is_leap_year` (#18212) @brandon-b-miller
- Support `cudf-polars` `month_start`/`month_end` (#18211) @brandon-b-miller
- Support `cudf-polars` `ordinal_day` (#18152) @brandon-b-miller
- Add `pylibcudf.gpumemoryview` support for `len()`/`nbytes` (#18133) @pentschev
- Link to libzstd for ZSTD compression and decompression APIs (#18129) @shrshi
- Added NDSH Q09 Benchmark for Transforms (#18127) @lamarrr
- Make pylibcudf traits raise exceptions gracefully rather than terminating in C++ (#18117) @Matt711
- Host decompression (#18114) @vuule
- Add owning types to hold Arrow data (#18084) @vyasr
- Bump polars version to <1.24 (#18076) @Matt711
- Support sorted merges in cudf.polars (#18075) @Matt711
- Add a slice expression to polars IR (#18050) @Matt711
- Expose `num_rows_per_source` (IO metadata) to pylibcudf (#18049) @Matt711
- Added Imbalanced Tree Benchmarks for Transforms (#18032) @lamarrr
- Run the narwhals test suite with cudf.pandas (#18031) @Matt711
- Add `host_read_async` interfaces to `datasource` (#18018) @vuule
- Make most cudf-polars `Node` objects pickleable (#17998) @rjzamora
- Add `Column.serialize` to cudf-polars (#17990) @rjzamora
- Bump polars version to <1.23 (#17986) @Matt711
- Implemented Decimal Transforms (#17968) @lamarrr
- Introduce ZSTD host-side compression and decompression APIs (#17935) @shrshi
- Add catboost integration tests (#17931) @Matt711
- [FEA] Expose `stripe_size_rows` setting for `ORCWriterOptions` (#17927) @ustcfy
- Test narwhals in CI (#17884) @bdice
- Added Multi-input & Scalar Support for Transform UDFs (#17881) @lamarrr
- Host Snappy compression (#17824) @vuule
- Run spark-rapids-jni CI (#17781) @KyleFromNVIDIA
- Add multi-partition `Shuffle` operation to cuDF Polars (#17744) @rjzamora
- Added polynomials benchmark (#17695) @lamarrr
- Add stream parameters in pylibcudf IO APIs (#17620) @Matt711
- New nvtext::wordpiece_tokenizer APIs (#17600) @davidwendt
- Add support for unary negation operator (#17560) @Matt711
- Add multi-partition `Join` support to cuDF-Polars (#17518) @rjzamora
- Add basic multi-partition `GroupBy` support to cuDF-Polars (#17503) @rjzamora
- Support Distributed in cudf-polars tests and IR evaluation (#17364) @pentschev
🛠️ Improvements
- Use pyarrow 15 in oldest dependency CI jobs (#18409) @bdice
- Bump librdkafka to 2.8.0 (#18370) @raydouglass
- fix(rattler): ignore `libzlib` run dependency to avoid `pandoc` collision (#18368) @gforsyth
- Fix zstd build interface include definition (#18366) @trxcllnt
- test: Install pytest-env and hypothesis in test_narwhals.sh (#18337) @MarcoGorelli
- Remove unused `group_range_rolling_window` API (#18313) @wence-
- Cache column view creation from arrow types (#18302) @vyasr
- Split Narwhals cudf.pandas test failures into "to fix" and "to skip" (#18267) @mroeschke
- Support BinOp, min, and max Aggregations in cudf-polars parallel groupby (#18266) @TomAugspurger
- Minor clean up and optimizations in the Parquet writer (#18258) @vuule
- Fix `cudf_kafka` run export for `cudatoolkit` (#18245) @gforsyth
- dask-polars: use splat everywhere. (#18243) @madsbk
- Remove cudf.Scalar from binops (#18240) @mroeschke
- Remove warning in the stream pool when asking for more streams than available (#18236) @vuule
- Explain why we disable parallelism for profiler tests to avoid pytest-cov issue (#18234) @Matt711
- Ignore `cudatoolkit` run exports by name, not package (#18230) @gforsyth
- Revert "Bump nightly check limit" (#18227) @Matt711
- Fix `cudf.pandas` to be able to work on a cpu-only machine (#18224) @galipremsagar
- Add missing `cudatoolkit` run_export ignore to `pylibcudf` (#18223) @gforsyth
- Remove cudf.Scalar from Column.setitem (#18221) @mroeschke
- Remove unused round_up_pow2 utility (#18218) @PointKernel
- Add flake8-print/debugger Ruff rules (#18217) @mroeschke
- Bump polars version to <1.25 (#18209) @Matt711
- Export RAPIDS_ARTIFACTS_DIR. (#18208) @bdice
- Replace more thrust functions with libcu++ ones (#18207) @miscco
- Update Numpy <2.1 unpinning xfail condition (#18203) @mroeschke
- Run conda import tests on Python packages (#18197) @bdice
- fix(rattler): add `cudatoolkit` ignore run export to `cudf` (#18195) @gforsyth
- Revert "Disable ARM CI in C++ and Python test CI jobs" (#18188) @Matt711
- Define Column.where to be used across DataFrame/Series (#18186) @mroeschke
- Remove cudf.Scalar in where (#18178) @mroeschke
- Drop unnecessary fmt dep (#18177) @vyasr
- Refactor join internals: separate hash_join declaration and cleanup (#18170) @PointKernel
- Add Ruff rule to enforce cudf dtype utils over numpy/pandas dtype utils (#18169) @mroeschke
- Combine multiple str.minhash() APIs into one call (#18168) @davidwendt
- Move nanoarrow_utils.hpp from cpp/tests/interop to cpp/include/cudf_test (#18163) @davidwendt
- Test cudf against the latest stable branch of Narwhals (#18162) @Matt711
- fix libcudf pins cu11 (#18161) @gforsyth
- Combine separate ConfigureNVBench calls to fix cpp conda builds (#18155) @gforsyth
- Add telemetry to build workflows (#18154) @gforsyth
- Prune more seldom used dtype utils (#18150) @mroeschke
- Remove some unnecessary module imports (#18143) @mroeschke
- Branch 25.04 merge branch 25.02 (#18142) @vyasr
- Prune some seldom used dtype utils (#18141) @mroeschke
- Use more, cheaper dtype checking utilities in cudf Python (#18139) @mroeschke
- Support deserializing cudf-polars objects composed of RMM frames (#18138) @pentschev
- Add `ConfigOptions` convenience class to cudf-polars (#18137) @rjzamora
- Support new callback API for lazyframe.profile (#18132) @wence-
- Optimized compilation of CUDFTESTUTIL's interface sources (#18131) @lamarrr
- Unpin numpy<2.1 (#18128) @mroeschke
- Use cpu16 for build CI jobs (#18124) @bdice
- Remove now non-existent job (#18123) @vyasr
- Minor typo fix in filling.pxd (#18120) @davidwendt
- Replace more deprecated `CUB` functors (#18119) @miscco
- Simplify DecimalDtype and DecimalColumn operations (#18111) @mroeschke
- Add interop support from arrow StringView to libcudf strings column (#18107) @davidwendt
- Expose the Number of Filtered Parquet Rowgroups (IO Metadata) to pylibcudf (#18106) @JigaoLuo
- Add a list of expected failures to narwhals tests (#18097) @Matt711
- Remove unused var (#18096) @vyasr
- Run narwhals tests nightly. (#18093) @bdice
- Use conda-build instead of conda-mambabuild (#18092) @bdice
- Remove static configure step (#18091) @vyasr
- Remove `FindCUDAToolkit.cmake` from `.pre-commit-config.yaml` (#18087) @KyleFromNVIDIA
- Align StringColumn constructor with ColumnBase base class (#18086) @mroeschke
- Remove `FindCUDAToolkit` backport (#18081) @KyleFromNVIDIA
- Support melt(ignore_index=False) (#18080) @mroeschke
- Update numba dep and upper-bound numpy (#18078) @vyasr
- Add `as_proxy_object` API to `cudf.pandas` (#18072) @galipremsagar
- Enforce deprecation of dtype parameter in sum/product (#18070) @mroeschke
- send sccache logs to telemetry (#18069) @msarahan
- Short circuit Index.equal if compared Index isn't same type (#18067) @mroeschke
- Make Column.view/can_cast_safely accept a dtype object (#18066) @mroeschke
- Optimization improvement for substr in cudf::string_view (#18062) @davidwendt
- Forward-merge branch-25.02 to branch-25.04 (#18061) @bdice
- Port all conda recipes to `rattler-build` (#18054) @gforsyth
- Minor improvements in arrow interop (#18053) @wence-
- Pass more dtype objects to `astype` calls (#18044) @mroeschke
- Forward merge branch-25.02 to branch-25.04 (#18041) @Matt711
- Replace deprecated CCCL features (#18036) @miscco
- Separate stats filtering helpers to reuse in page pruning (#18034) @mhaseeb123
- Update spark-rapids-jni CI image version to cuda12.8.0 (#18024) @pxLi
- Add pylibcudf.Scalar.from_numpy for bool/int/float/str types (#18020) @mroeschke
- Support IntervalDtype(subtype=None) (#18017) @mroeschke
- Enable pytest-xdist runs for py-polars tests (#18016) @galipremsagar
- consolidate more conda solves in CI (#18014) @jameslamb
- Replace `cub::Int2Type` with `cuda::std::integral_constant` (#18013) @miscco
- Remove deprecated single component datetime extract APIs (#18010) @Matt711
- Pass dtype objects to Column.astype (#18008) @mroeschke
- Require CMake 3.30.4 (#18007) @robertmaynard
- Refactor math_ops.cu dispatcher logic (#18006) @davidwendt
- Move cudf::lists::detail::make_empty_lists_column to public API (#17996) @davidwendt
- Create Conda CI test env in one step (#17995) @KyleFromNVIDIA
- Add seed parameter to cudf hash_character_ngrams (#17994) @davidwendt
- Remove deprecated rolling window functionality (#17993) @wence-
- Continue on failures in cudf.pandas integration tests CI job (#17987) @Matt711
- Avoid cudf.dtype calls in build_column/column_empty/.where (#17979) @mroeschke
- Ensure dtype objects are passed within Column.astype (#17978) @mroeschke
- Use Conda XGBoost (#17959) @jakirkham
- Read the footers in parallel when reading multiple Parquet files (#17957) @vuule
- Refactor predicate pushdown to reuse row group pruning in experimental PQ reader (#17946) @mhaseeb123
- Add new nvtext tokenized minhash API (#17944) @davidwendt
- Use shared-workflows branch-25.04 (#17943) @bdice
- Get rid of the deprecated `thrust::identity` (#17942) @PointKernel
- Remove deprecated nvtext::minhash_permuted APIs (#17939) @davidwendt
- Enable third party library integration tests in CI with `cudf.pandas` (#17936) @galipremsagar
- Add build_type input field for `test.yaml` (#17925) @gforsyth
- Remove cudf.Scalar from shift/fillna (#17922) @mroeschke
- Enabling `cross` join in `cudf` python (#17921) @galipremsagar
- Use `rapids-pip-retry` in CI jobs that might need retries (#17920) @gforsyth
- More avoid cudf.dtype internally in favor of pre-defined, supported types (#17918) @mroeschke
- Initialize inout parameter (#17911) @miscco
- Remove dataframe protocol (#17909) @vyasr
- Rename PascalCase functions and types to snake_case to improve consistency (#17908) @vuule
- Use new rapids-logger library (#17899) @vyasr
- Add `pylibcudf.Scalar.from_py` for construction from Python strings, bool, int, float (#17898) @mroeschke
- Remove cudf.Scalar from factorize (#17897) @mroeschke
- disallow fallback to Make in Python builds (#17894) @jameslamb
- Remove `orc::gpu` namespace (#17891) @vuule
- Only run Auto Assign PR workflow if PR is not merged (#17888) @mroeschke
- Update pre-commit-hooks to version 0.6.0 (#17887) @KyleFromNVIDIA
- Forward-merge branch-25.02 to branch-25.04 (#17885) @bdice
- Add script to run pylibcudf tests (#17882) @bdice
- Migrate to NVKS for amd64 CI runners (#17877) @bdice
- Fix merge conflict for branch-25.02 into branch-25.04 (#17874) @davidwendt
- Remove decimal32/64 to decimal128 conversion in Parquet writer (#17869) @mhaseeb123
- Expose JSON reader options to builder in pylibcudf (#17866) @shrshi
- Remove cudf.Scalar from .dt timedelta properties (#17863) @mroeschke
- Added support for custom types in PTX parser (#17861) @lamarrr
- Remove cudf.Scalar from date_range/to_datetime (#17860) @mroeschke
- Avoid `cudf.dtype` internally in favor of pre-defined, supported types (#17839) @mroeschke
- Allow cudf::type_to_id<T const>() (#17831) @esoha-nvidia
- Fixing auto-merge branch-25.02 into branch-25.04 (#17828) @davidwendt
- Add new nvtext::normalize_characters API (#17818) @davidwendt
- Include more information in error messages in the nvcomp adapter (#17814) @vuule
- Extend and simplify API for calculation of range-based rolling window offsets (#17807) @wence-
- More minor fixes for CCCL (#17793) @miscco
- Use KvikIO to enable file's fast host read and host write (#17764) @kingcrimsontianyu
- Remove cudf._lib.column in favor of pylibcudf. (#17760) @mroeschke
- Replaced std::string with std::string_view and removed excessive copies in cudf::io (#17734) @lamarrr
- Use xdist worksteal on the `cudf.pandas` test suite (#16930) @Matt711