These are the changes in the cuda.cccl libraries introduced in the pre-release 0.1.3.2.0dev128, dated August 14th, 2025. cuda.cccl is in "experimental" status, meaning that its API and feature set can change quite rapidly.
- Major API improvements
  - Single-call APIs in cuda.cccl.parallel
- New algorithms
  - Device-wide histogram
  - StripedToBlocked exchange
- Infrastructure improvements
  - CuPy dependency replaced with cuda.core
  - Support for CUDA 13 drivers
Major API improvements
Single-call APIs in cuda.cccl.parallel
Previously, performing an operation like reduce_into required four API invocations:
(1) create a reducer object, (2) compute the amount of temporary storage required for the reduction,
(3) allocate the required amount of temporary memory, and (4) perform the reduction.
In this version, cuda.cccl.parallel introduces simpler, single-call APIs. For example, reduction looks like:
# New API - single function call with automatic temp storage
parallel.reduce_into(d_input, d_output, add_op, num_items, h_init)
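For context, here is a fuller, self-contained sketch of the same call. The import path (cuda.cccl.parallel.experimental) and the use of CuPy/NumPy for the buffers are illustrative assumptions; check the package documentation for the exact module layout.
# Hedged sketch: import path and array setup are illustrative assumptions
import cupy as cp
import numpy as np
import cuda.cccl.parallel.experimental as parallel

def add_op(a, b):
    return a + b

dtype = np.int32
h_init = np.array([0], dtype=dtype)               # initial value of the reduction
d_input = cp.array([1, 2, 3, 4, 5], dtype=dtype)  # device input
d_output = cp.empty(1, dtype=dtype)               # holds the single reduced value
num_items = len(d_input)

# One call; temporary storage is handled internally
parallel.reduce_into(d_input, d_output, add_op, num_items, h_init)
assert int(d_output[0]) == 15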
If you wish to have more control over temporary memory allocation,
the previous API still exists (and always will). It has been renamed from reduce_into to make_reduce_into:
# Object API
reducer = parallel.make_reduce_into(d_input, d_output, add_op, h_init)
# Passing None for temp storage returns the required temporary storage size
temp_storage_size = reducer(None, d_input, d_output, num_items, h_init)
# Allocate the temporary storage yourself, then perform the reduction
temp_storage = cp.empty(temp_storage_size, dtype=np.uint8)
reducer(temp_storage, d_input, d_output, num_items, h_init)
New algorithms
Device-wide histogram
The histogram_even function provides Python exposure of the corresponding CUB C++ API DeviceHistogram::HistogramEven.
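A minimal sketch follows, assuming histogram_even mirrors the parameters of the CUB overload (samples, output histogram, number of levels, lower and upper bound of the sample range, number of samples); the exact Python signature may differ, so treat this as illustrative.
# Hedged sketch: parameter order assumed to mirror DeviceHistogram::HistogramEven
import cupy as cp
import numpy as np
import cuda.cccl.parallel.experimental as parallel

num_levels = 7                      # 7 boundaries -> 6 evenly spaced bins
lower_level = np.float64(0.0)
upper_level = np.float64(12.0)

d_samples = cp.array([2.2, 6.1, 7.1, 2.9, 3.5, 0.3, 2.9, 2.1, 6.1, 11.5],
                     dtype=np.float64)
d_histogram = cp.zeros(num_levels - 1, dtype=np.int32)   # one counter per bin

parallel.histogram_even(d_samples, d_histogram, num_levels,
                        lower_level, upper_level, len(d_samples))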
StripedToBlocked exchange
cuda.cccl.cooperative adds a block.exchange API, providing Python exposure of the corresponding CUB C++ API BlockExchange.
Currently, only the StripedToBlocked exchange pattern is supported.
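To illustrate what the pattern does (independent of the cooperative API), the following host-side NumPy sketch shows the striped-to-blocked index mapping: in a striped arrangement, thread t holds logical items t, t + num_threads, t + 2*num_threads, and so on, while in a blocked arrangement it holds the contiguous items t*items_per_thread through (t + 1)*items_per_thread - 1.
# Host-side NumPy illustration of the striped -> blocked mapping
# (only the index arithmetic, not the cuda.cccl.cooperative API itself)
import numpy as np

num_threads = 4
items_per_thread = 3
data = np.arange(num_threads * items_per_thread)      # logical sequence 0..11

# Striped layout: striped[t, i] holds logical item i * num_threads + t
striped = data.reshape(items_per_thread, num_threads).T

# Blocked layout: blocked[t, i] holds logical item t * items_per_thread + i
blocked = data.reshape(num_threads, items_per_thread)

# A striped-to-blocked exchange rearranges the first layout into the second
exchanged = striped.flatten(order="F").reshape(num_threads, items_per_thread)
assert (exchanged == blocked).all()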
Infrastructure improvements
CuPy dependency replaced with cuda.core
Use of CuPy within the library has been replaced with the lighter-weight cuda.core package. This means that installing cuda.cccl won't install CuPy as a dependency.
Support for CUDA 13 drivers
cuda.cccl can be used with CUDA 13-compatible drivers. However, the CUDA 13 toolkit (runtime and libraries) is not
yet supported, meaning you still need the CUDA 12 toolkit. Full support for the CUDA 13 toolkit is planned for the next
pre-release.