github NVIDIA/cccl v2.4.0

2 months ago

What’s New

In addition to various fixes and documentation improvements, the following notable improvements have been made to Thrust, CUB, and libcudacxx.

Thrust

As part of our kernel consolidation effort, the kernels of the thrust::unique_by_key, thrust::copy_if, and thrust::partition algorithms are now consolidated in CUB. Kernel consolidation achieves two goals. First, it delivers the latest optimizations of CUB algorithms to Thrust users. Second, apart from the performance improvements, it adds support for large problem sizes (64-bit offsets) to these Thrust algorithms.

CUB

  • cub::DeviceSelect::UniqueByKey now supports a custom equality operator and large problem sizes.
  • The new cub::DeviceFor family of algorithms goes beyond the conventional cub::DeviceFor::ForEach. For example, cub::DeviceFor::ForEachCopy can provide additional performance benefits from vectorized memory accesses.
  • Many CUB algorithms now support CUDA graph capture mode.

libcudacxx

Full Changelog: v2.3.2...v2.4.0
