github NVIDIA/cccl python-0.7.0
CCCL Python Libraries (v0.7.0)

cuda-cccl Python package — version 0.7.0

Release date: May 5th, 2026. Previous release: v0.6.0.

cuda-cccl is in "experimental" status, meaning that its API and feature set can change quite rapidly.

Installation

Please refer to the cuda-cccl installation instructions.

API breaking changes

  • All cuda.compute functions now require keyword-only arguments (#8772)

    Every top-level function and factory (make_*) in cuda.compute now enforces keyword-only call
    syntax (i.e., all parameters must be passed by name). Positional calls will raise a TypeError.

    Before:

    reduce_into(d_in, d_out, op, num_items, h_init)

    After:

    reduce_into(d_in=d_in, d_out=d_out, num_items=num_items, op=op, h_init=h_init)
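
The keyword-only behavior described above can be sketched in plain Python. The `reduce_into` below is a hypothetical list-based stand-in written for illustration, not the real `cuda.compute` function. It assumes the library enforces the rule with a bare `*` in the signature, which is the standard Python mechanism; the actual implementation may differ.

```python
def reduce_into(*, d_in, d_out, num_items, op, h_init):
    """Hypothetical pure-Python stand-in; NOT the real cuda.compute.reduce_into.

    The bare `*` makes every parameter keyword-only.
    """
    acc = h_init
    for i in range(num_items):
        acc = op(acc, d_in[i])
    d_out[0] = acc


def add(a, b):
    return a + b


d_in = [1, 2, 3, 4]
d_out = [0]

# Keyword call: accepted, and argument order no longer matters.
reduce_into(d_in=d_in, d_out=d_out, num_items=len(d_in), op=add, h_init=0)

# Positional call: Python rejects it before the function body runs.
try:
    reduce_into(d_in, d_out, add, len(d_in), 0)
except TypeError as exc:
    positional_error = exc
else:
    positional_error = None
```

When migrating, spelling out every argument by name is enough; the order in which the keywords appear is irrelevant.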

Features

  • System CUDA toolkit install extras — New pip extras sysctk12 / sysctk13 (and
    minimal-sysctk12 / minimal-sysctk13) allow installing cuda-cccl without pulling in
    cuda-toolkit as a pip dependency, for users who already have CUDA installed system-wide
    (#8608):

    pip install cuda-cccl[sysctk13]          # full install, system CTK
    pip install cuda-cccl[minimal-sysctk13]  # no Numba, system CTK

Performance

  • Faster binary search — lower_bound / upper_bound are now implemented via transform
    with a small linear search for the final steps, improving throughput on modern GPUs (#8642)
  • Adaptive warpspeed scan — The scan tuning policy now automatically selects the warpspeed
    (lookahead) scan path when beneficial for the data type and architecture (#8158)
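
The binary search item above pairs a binary search with a short linear scan for the last few steps. Below is a minimal CPU sketch of that general idea, assuming a sorted Python list. The function names mirror the release note, but `linear_cutoff` and the code itself are illustrative assumptions, not the library's GPU implementation.

```python
import bisect


def lower_bound(sorted_values, needle, linear_cutoff=4):
    """Index of the first element >= needle (illustrative sketch)."""
    lo, hi = 0, len(sorted_values)
    # Binary phase: halve the range until it is small enough.
    while hi - lo > linear_cutoff:
        mid = (lo + hi) // 2
        if sorted_values[mid] < needle:
            lo = mid + 1
        else:
            hi = mid
    # Linear phase: scan the few remaining candidates.
    while lo < hi and sorted_values[lo] < needle:
        lo += 1
    return lo


def upper_bound(sorted_values, needle, linear_cutoff=4):
    """Index of the first element > needle (illustrative sketch)."""
    lo, hi = 0, len(sorted_values)
    while hi - lo > linear_cutoff:
        mid = (lo + hi) // 2
        if sorted_values[mid] <= needle:
            lo = mid + 1
        else:
            hi = mid
    while lo < hi and sorted_values[lo] <= needle:
        lo += 1
    return lo


values = [1, 3, 3, 5, 8, 13, 21, 34, 55, 89, 144]
first_three = lower_bound(values, 3)   # 1
past_threes = upper_bound(values, 3)   # 3

# Cross-check against the standard library for a range of needles.
matches_bisect = all(
    lower_bound(values, n) == bisect.bisect_left(values, n)
    and upper_bound(values, n) == bisect.bisect_right(values, n)
    for n in range(0, 150)
)
```

Finishing with a short scan replaces the last few data-dependent branches with a fixed, small amount of sequential work, which is one plausible reason such a hybrid helps on GPUs; the release note itself does not spell out the mechanism.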

Bug Fixes

  • Fix incorrect minimum CUDA architecture targeted when building the cccl.c native extension
    (#8631)
