github NVIDIA/cccl cccl-python-0.1.3.2.0.dev128
CCCL Python Libraries v0.1.3.2.0.dev128 (pre-release)

These are the changes in the cuda.cccl libraries introduced in pre-release 0.1.3.2.0.dev128, dated August 14, 2025.
cuda.cccl is in "experimental" status, meaning that its API and feature set can change quite rapidly.

  • Major API improvements
    • Single-call APIs in cuda.cccl.parallel
  • New algorithms
    • Device-wide histogram
    • StripedToBlock exchange
  • Infrastructure improvements
    • CuPy dependency replaced with cuda.core
    • Support for CUDA 13 drivers

Major API improvements

Single-call APIs in cuda.cccl.parallel

Previously, performing an operation like reduce_into required four API invocations:
(1) create a reducer object, (2) compute the amount of temporary storage required for the reduction,
(3) allocate the required amount of temporary memory, and (4) perform the reduction.

In this version, cuda.cccl.parallel introduces simpler, single-call APIs. For example, a reduction now looks like:

# New API - single function call with automatic temp storage
parallel.reduce_into(d_input, d_output, add_op, num_items, h_init)

If you wish to have more control over temporary memory allocation,
the previous API still exists (and always will). It has been renamed from reduce_into to make_reduce_into:

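# Object API (cp is CuPy, np is NumPy)
import cupy as cp
import numpy as np
reducer = parallel.make_reduce_into(d_input, d_output, add_op, h_init)
# Passing None as the temporary storage returns the required size in bytes
temp_storage_size = reducer(None, d_input, d_output, num_items, h_init)
# Allocate the temporary storage yourself, then run the reduction
temp_storage = cp.empty(temp_storage_size, dtype=np.uint8)
reducer(temp_storage, d_input, d_output, num_items, h_init)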
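
Conceptually, a single-call function can be thought of as bundling the four steps listed earlier. The following is only a host-side sketch of that idea, not the library's implementation: ToyReducer and reduce_into_single_call are hypothetical names, the scratch-size formula is made up, and plain Python lists stand in for device arrays.

```python
class ToyReducer:
    """Hypothetical stand-in for a reducer object (not part of cuda.cccl)."""

    def __init__(self, op):
        self.op = op

    def __call__(self, temp_storage, d_in, d_out, num_items, h_init):
        required = 8 * num_items  # made-up scratch requirement, in bytes
        if temp_storage is None:
            # Sizing pass: report how much scratch space is needed.
            return required
        if len(temp_storage) < required:
            raise ValueError("temporary storage is too small")
        # Execution pass: fold the first num_items inputs into one value.
        acc = h_init
        for x in d_in[:num_items]:
            acc = self.op(acc, x)
        d_out[0] = acc
        return None


def reduce_into_single_call(d_in, d_out, op, num_items, h_init):
    """Bundle the four steps: create, size, allocate, run."""
    reducer = ToyReducer(op)                                # (1) create
    nbytes = reducer(None, d_in, d_out, num_items, h_init)  # (2) size
    temp_storage = bytearray(nbytes)                        # (3) allocate
    reducer(temp_storage, d_in, d_out, num_items, h_init)   # (4) run


out = [0]
reduce_into_single_call([1, 2, 3, 4], out, lambda a, b: a + b, 4, 0)
```

The object form remains useful when the same temporary buffer should be reused across many calls or allocated by a custom allocator.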

New algorithms

Device-wide histogram

The histogram_even function provides Python exposure of the corresponding CUB C++ API DeviceHistogram::HistogramEven.
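
To illustrate what an evenly binned histogram computes, here is a pure-Python host-side reference. It is a sketch of the semantics documented for CUB's HistogramEven (num_levels - 1 equal-width bins over the half-open range from lower_level to upper_level, with out-of-range samples ignored). The function name histogram_even_reference is hypothetical, and the parameter names follow the C++ API; the Python histogram_even signature may differ.

```python
def histogram_even_reference(samples, num_levels, lower_level, upper_level):
    """Count samples into num_levels - 1 equal-width bins on the host."""
    num_bins = num_levels - 1
    counts = [0] * num_bins
    width = upper_level - lower_level
    for x in samples:
        # Samples outside [lower_level, upper_level) are skipped.
        if lower_level <= x < upper_level:
            b = int((x - lower_level) * num_bins / width)
            # Clamp to guard against floating-point rounding at the top edge.
            counts[min(b, num_bins - 1)] += 1
    return counts


samples = [2.2, 6.1, 7.1, 2.9, 3.5, 0.3, 2.9, 2.1, 6.1, 999.5]
hist = histogram_even_reference(samples, num_levels=7,
                                lower_level=0.0, upper_level=12.0)
```

With six bins of width 2.0, the sample 999.5 falls outside the range and is not counted.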

StripedToBlock exchange

cuda.cccl.cooperative adds block.exchange, which provides Python exposure of the corresponding CUB C++ API BlockExchange.
Currently, only the StripedToBlock exchange pattern is supported.
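
As a rough illustration of the data movement involved, the following pure-Python sketch emulates a striped-to-blocked exchange on the host (CUB's BlockExchange names this operation StripedToBlocked). In a striped arrangement, item j of thread t sits at linear index j * num_threads + t; in a blocked arrangement, thread t owns a contiguous run of items. The function name striped_to_blocked is hypothetical, and nested lists stand in for per-thread registers.

```python
def striped_to_blocked(striped):
    """Rearrange per-thread items from striped to blocked order."""
    num_threads = len(striped)
    items_per_thread = len(striped[0])
    # Scatter into a linear buffer (the role shared memory plays on the GPU).
    linear = [None] * (num_threads * items_per_thread)
    for t in range(num_threads):
        for j in range(items_per_thread):
            linear[j * num_threads + t] = striped[t][j]
    # Each thread then reads back its contiguous block of items.
    return [
        linear[t * items_per_thread:(t + 1) * items_per_thread]
        for t in range(num_threads)
    ]


# 4 threads, 2 items per thread, holding values 0..7 in striped order.
striped = [[0, 4], [1, 5], [2, 6], [3, 7]]
blocked = striped_to_blocked(striped)
```

After the exchange, thread 0 holds items 0 and 1, thread 1 holds items 2 and 3, and so on.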

Infrastructure improvements

CuPy dependency replaced with cuda.core

Use of CuPy within the library has been replaced with the lighter-weight cuda.core
package. This means that installing cuda.cccl no longer installs CuPy as a dependency.

Support for CUDA 13 drivers

cuda.cccl can be used with CUDA 13-compatible drivers. However, the CUDA 13 toolkit (runtime and libraries) is not
yet supported, meaning you still need the CUDA 12 toolkit. Full support for the CUDA 13 toolkit is planned for the next
pre-release.
