cuda-cccl Python package — version 0.7.0
Release date: May 5th, 2026. Previous release: v0.6.0.
cuda-cccl is in "experimental" status, meaning that its API and feature set can change quite rapidly.
Installation
Please refer to the cuda-cccl installation instructions.
API breaking changes
- All cuda.compute functions now require keyword-only arguments (#8772): Every top-level function and factory (make_*) in cuda.compute now enforces keyword-only call syntax, i.e., all parameters must be passed by name. Positional calls raise a TypeError.
Before:
reduce_into(d_in, d_out, op, num_items, h_init)
After:
reduce_into(d_in=d_in, d_out=d_out, num_items=num_items, op=op, h_init=h_init)
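The keyword-only rule relies on standard Python syntax: a bare * in a function signature forces every parameter after it to be passed by name. The sketch below is a hypothetical CPU stand-in, not the cuda.compute implementation. reduce_into_sketch and its body are illustrative assumptions that only mimic the new calling convention.

```python
from functools import reduce
import operator


def reduce_into_sketch(*, d_in, d_out, op, num_items, h_init):
    """Hypothetical stand-in for a keyword-only reduction API.

    The bare '*' makes every parameter keyword-only, so a positional
    call fails with TypeError before the body runs.
    """
    # Fold the first num_items inputs, starting from h_init.
    d_out[0] = reduce(op, d_in[:num_items], h_init)


out = [0]
# New style: every argument passed by name, in any order.
reduce_into_sketch(d_in=[1, 2, 3, 4], d_out=out, num_items=4, op=operator.add, h_init=0)
print(out[0])  # 10

rejected = False
try:
    # Old positional style is rejected.
    reduce_into_sketch([1, 2, 3, 4], out, operator.add, 4, 0)
except TypeError as exc:
    rejected = True
    print("positional call rejected:", exc)
```

Because names are bound explicitly, argument order no longer matters, which is why the "After" call above can list num_items before op.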
Features
- System CUDA Toolkit install extras: The new pip extras sysctk12/sysctk13 (and minimal-sysctk12/minimal-sysctk13) allow installing cuda-cccl without pulling in cuda-toolkit as a pip dependency, for users who already have CUDA installed system-wide (#8608):
pip install "cuda-cccl[sysctk13]"          # full install, system CTK
pip install "cuda-cccl[minimal-sysctk13]"  # no Numba, system CTK
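Choosing among the four extras depends on the CUDA major version of the system toolkit and on whether Numba is wanted. The helper below is a hypothetical convenience, not part of cuda-cccl. It assumes the toolkit version can be read from nvcc --version output containing a "release X.Y" string, and it only maps to the sysctk12/sysctk13 extras named in these notes. The sample nvcc line is illustrative.

```python
import re

# CUDA major versions that have a sysctk extra in these release notes.
SUPPORTED_MAJORS = (12, 13)


def cuda_major_from_nvcc(output: str) -> int:
    """Extract the CUDA major version from nvcc --version text (assumed format)."""
    match = re.search(r"release (\d+)\.\d+", output)
    if match is None:
        raise ValueError("no 'release X.Y' string found in nvcc output")
    return int(match.group(1))


def cccl_requirement(cuda_major: int, minimal: bool = False) -> str:
    """Build the pip requirement string for a system-CTK install of cuda-cccl."""
    if cuda_major not in SUPPORTED_MAJORS:
        raise ValueError(f"no sysctk extra for CUDA {cuda_major}")
    prefix = "minimal-" if minimal else ""  # minimal-* extras omit Numba
    return f"cuda-cccl[{prefix}sysctk{cuda_major}]"


# Illustrative sample of nvcc --version output.
sample = "Cuda compilation tools, release 13.0, V13.0.0"
major = cuda_major_from_nvcc(sample)
print(cccl_requirement(major))                # cuda-cccl[sysctk13]
print(cccl_requirement(major, minimal=True))  # cuda-cccl[minimal-sysctk13]
```

When passing the result to pip from a shell, quote it so shells such as zsh do not treat the square brackets as a glob pattern.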
Performance
- Faster binary search: lower_bound/upper_bound are now implemented via transform, with a small linear search for the final steps, improving throughput on modern GPUs (#8642).
- Adaptive warpspeed scan: The scan tuning policy now automatically selects the warpspeed (lookahead) scan path when beneficial for the data type and architecture (#8158).
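The binary-search change (#8642) combines two ideas: each query is handled independently as a per-element transform, and the search stops halving once the remaining range is small, finishing with a short linear scan. The pure-Python sketch below illustrates that structure on the CPU only. LINEAR_THRESHOLD, the function names, and the list-based transform are assumptions for illustration, not the tuning or code used in cuda.compute.

```python
import bisect

LINEAR_THRESHOLD = 8  # hypothetical cutoff; the library's actual tuning is not stated here


def lower_bound_one(data, value):
    """First index i with data[i] >= value (data sorted ascending)."""
    lo, hi = 0, len(data)
    # Binary phase: the answer always lies in [lo, hi]; halve until the range is small.
    while hi - lo > LINEAR_THRESHOLD:
        mid = (lo + hi) // 2
        if data[mid] < value:
            lo = mid + 1
        else:
            hi = mid
    # Linear phase: step through the few remaining candidates.
    while lo < hi and data[lo] < value:
        lo += 1
    return lo


def upper_bound_one(data, value):
    """First index i with data[i] > value (data sorted ascending)."""
    lo, hi = 0, len(data)
    while hi - lo > LINEAR_THRESHOLD:
        mid = (lo + hi) // 2
        if data[mid] <= value:
            lo = mid + 1
        else:
            hi = mid
    while lo < hi and data[lo] <= value:
        lo += 1
    return lo


def transform(queries, fn):
    """Apply fn to each query independently, analogous to one GPU thread per element."""
    return [fn(q) for q in queries]


data = [1, 3, 3, 3, 5, 8, 13, 21, 34, 55, 89, 144]
queries = [0, 3, 4, 55, 200]
lower = transform(queries, lambda q: lower_bound_one(data, q))
upper = transform(queries, lambda q: upper_bound_one(data, q))
print(lower)  # [0, 1, 4, 9, 12]
print(upper)  # [0, 4, 4, 10, 12]

# Cross-check against the standard library's binary search.
assert lower == [bisect.bisect_left(data, q) for q in queries]
assert upper == [bisect.bisect_right(data, q) for q in queries]
```

The linear tail trades a few extra comparisons for fewer dependent, data-driven branches at the end of the search; whether that pays off depends on the hardware, which is consistent with the notes attributing the gain to modern GPUs.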
Bug Fixes
- Fix incorrect minimum CUDA architecture targeted when building the cccl.c native extension (#8631).