What's Changed
- Clean up libcu++ docs landing page by @jrhemstad in #1492
- PTX: Add `cuda::ptx::elect_sync` by @ahendriksen in #1537
- Print a summary of all tests sorted by execution time. by @alliepiper in #1539
- Fix unused variable warning for `__can_use_complete_tx` by @wmaxey in #1547
- Fix usage of naked array with 0 elements in sm90 barrier tests. by @wmaxey in #1546
- Add support for stream operators for complex by @miscco in #1538
- Fix `__half` for older architectures by @miscco in #1543
- Feat 565 remove redundant thrust dialect conditional by @ZelboK in #566
- fix missing device hint in WarpMergeSort Documentation by @MARD1NO in #1553
- Minor fixes and additions on cub developer guides by @gonidelis in #1559
- Consolidate handling of `constexpr` and `if constexpr` by @miscco in #1562
- Ensure that `cuda::aligned_size_t` is usable in a constexpr context by @miscco in #1564
- Group CUB docs by @gevtushenko in #1565
- Update toolkit to 12.4 by @miscco in #1554
- Work around change in cuTensorMapEncode by @miscco in #1567
- Remove stdlib arg from .clangd. by @alliepiper in #1569
- Add the DeviceSelect::FlaggedIf algorithm by @gonidelis in #1533
- Catch2 segmented sort by @alliepiper in #1484
- Do not emit diagnostic with extended device lambdas with preserved re… by @Revaj in #1495
- Use absolute includes for libcu++ by @miscco in #1560
- [NFC] Modularize `<exception>` by @miscco in #199
- Add test support for launching kernels with cluster size > 1 by @ahendriksen in #416
- Fix typo in README.md by @bprb in #1574
- [FEA]: Modularize `<cuda/memory_resource>` by @miscco in #1532
- Cleanup_complex by @miscco in #1555
- Add missing comma in barrier `__try_wait` by @miscco in #1593
- Segmented sort test fix by @alliepiper in #1591
- Add pre-commit configuration by @bdice in #1596
- Preserve `.devcontainer/img/` when cleaning. by @alliepiper in #1604
- Add some documentation for recent additions to libcu++ by @miscco in #1594
- Ensure `cuda::std::nullopt` is visible in device code by @trxcllnt in #1598
- Fix ordering of `alignas` and `__shared__` by @miscco in #1601
- Update Thrust CI tests. by @alliepiper in #1605
- Implement tuple interface for cuda vector types by @miscco in #1410
- Inspect PR changes to determine if subproject builds are needed. by @alliepiper in #1572
- Apply clang-format to cub by @bdice in #1602
- Add missing non-volatile atomic overloads. by @wmaxey in #1582
- Drop unused libcxx files by @miscco in #1606
- Apply formatting to libcudacxx by @miscco in #1610
- Add conda documentation to the README. by @bdice in #1581
- Allow jobs to be skipped. by @alliepiper in #1611
- Make libcu++ work with exceptions by @miscco in #1607
- Implement `cuda::mr::cuda_memory_resource` by @miscco in #1578
- Implement `cuda::mr::managed_memory_resource` by @miscco in #1579
- Apply formatting to thrust by @miscco in #1616
- Update example_device_radix_sort.cu by @eriktedhamre in #1608
- Implement `cuda::mr::pinned_memory_resource` by @miscco in #1580
- Set the devcontainers to format on save. by @miscco in #1624
- Enable internal use of `std::allocator` related functionality by @miscco in #1583
- Adds tests for large number of items for `cub::DeviceSelect` by @elstehle in #1612
- Add pre-commit docs to CONTRIBUTING.md. by @bdice in #1627
- Move visibility attributes to cccl by @miscco in #1595
- Work around thrust/memory.h circular include by @dkolsen-pgi in #1634
- Fix mbarrier.init addressing by @ahendriksen in #1636
- Trim trailing whitespace and normalize newlines. by @bdice in #1633
- Add a `git-blame-ignore-revs` file by @miscco in #1629
- Revert "PTX: Add `cuda::ptx::elect_sync` (#1537)" by @ahendriksen in #1638
- Address potential oob in cub when passing in an invalid device counter by @miscco in #1641
- Allow ninja_summary to fail by @jrhemstad in #1644
- Mostly flatten the folder structure of libcu++ by @miscco in #1630
- Make `--cmake-options=""` always override others. by @alliepiper in #1648
- Fix invalid `_CCCL_CUDACC` definition for clang cuda by @miscco in #1656
- Add missing #pragma once in some headers by @bernhardmgruber in #1668
- Add NVTX ranges for all CUB algorithms by @bernhardmgruber in #1657
- Implement LWG-3843 and LWG-3940 by @miscco in #1621
- Modularize `<memory>` by @miscco in #1639
- Expose `<cuda/std/numeric>` to be publicly available by @miscco in #1671
- Add nsight support for automated debugging by @gonidelis in #1660
- Format core headers by @miscco in #1670
- Guard `resource_ref` and friends behind feature flag by @miscco in #1675
- Create major version 2.5.0 by @wmaxey in #1677
- Install CUB headers with .hpp extension by @bernhardmgruber in #1687
- Update CMakePresets.json by @alliepiper in #1686
- Fix deprecated status by @gevtushenko in #1692
- Test combined internal/user-side use of NVTX by @bernhardmgruber in #1690
- CI Overhaul, new nightly workflow by @alliepiper in #1654
- Fix CMake option handling. by @alliepiper in #1698
- Fix issues that came up with building cuDF with main by @miscco in #1643
- Drop new properties until we are certain about the design by @miscco in #1681
- Remove more uses of `__cuda_std__` by @miscco in #1669
- Fix usage of `result_of` in thrust by @miscco in #1705
- Fix thrust::optional<T&>::emplace() by @Snektron in #1707
- Remove old f(void) function signatures by @bernhardmgruber in #1708
- Fix code sample in README and docs by @pauleonix in #1652
- Format libcudacxx/include files without extensions by @bdice in #1676
- Several improvements to zip_iterator/zip_function by @bernhardmgruber in #1710
- Expose thrust's contiguous iterator unwrap helpers by @bernhardmgruber in #1717
- Fix flakey heterogeneous tests by @wmaxey in #1712
- Ensure that we can use `cuda::std::optional` with types that are not `__host__ __device__` by @miscco in #1663
- Fix a typo in barrier docs and update the godbolt link by @PointKernel in #1718
- Massively improve test times in heterogeneous atomics tests by @wmaxey in #1719
- Consolidate more common functionality by @miscco in #1716
- Increase timeout for the libcu++ test runs by @miscco in #1720
- Fix nightly CI: H100 runners are not in a testing pool. by @alliepiper in #1723
- Add a new CUDA Next library and a first entry in it with hierarchy_dimensions type template by @pciolkosz in #1485
- Atomics backend refactor by @wmaxey in #1631
- Const-qualify `half_t::operator+/*` by @bernhardmgruber in #1726
- Reenable previously failing histogram test for icc by @bernhardmgruber in #1725
- Enable testing for the other half of the heterogeneous managed memory tests on MSVC. by @wmaxey in #1729
- PTX: mark cp_async_bulk*_multicast functions sm_90a by @ahendriksen in #1734
- Improve libcu++ documentation a bit more by @miscco in #1732
- Make atomic_ref ctor constexpr. again. by @wmaxey in #1737
- Various and sundry fixes for Thrust's CPP backends. by @alliepiper in #1722
- Avoid ABI issues due to MSVC EBCO issues by @miscco in #1739
- Drop unused header from ptx by @miscco in #1740
- Allow an `override` matrix to reduce CI workload. by @alliepiper in #1701
- Fix docs generation by @miscco in #1741
- Add docs instructions on how to utilize CMake Presets by @gonidelis in #1694
- Ensure that {cr}begin works with types that pull in namespace std via ADL by @miscco in #1685
- Merge prep jobs for verify-devcontainers CI. by @alliepiper in #1754
- Fix typo in ci docs. by @alliepiper in #1756
- Add runtime + sccache info to CI comment by @alliepiper in #1744
- Add section about SSH signing keys to developer docs. by @alliepiper in #1755
- Add sm100 support to <nv/target> for NVCC by @wmaxey in #1745
- Fix_duplicate_job_checks by @alliepiper in #1759
- Const-qualify histogram pointer input parameters by @bernhardmgruber in #1762
- Return demangled name in `c2h::type_name` by @bernhardmgruber in #1773
- Simplify argument forwarding in CUB histogram entry-points by @bernhardmgruber in #1776
- Add guard against half support by @miscco in #1735
- Refactor CUB test launch helpers by @bernhardmgruber in #1770
- Replace `cub::ArrayWrapper` by `cuda::std::array` and deprecate it by @bernhardmgruber in #1764
- Fix missing qualification of `pow` in two instances by @miscco in #1784
- Add mechanism to split project tests into parallel jobs. by @alliepiper in #1696
- Fix `__half` conversion to float in histogram by @miscco in #1785
- Implement P3029R1: deduction from `integral_constant` by @miscco in #1786
- Revert to showing skipped jobs to WAR GHA bug. by @alliepiper in #1794
- Port to Catch2 and rework device histogram test by @bernhardmgruber in #1695
- Add gcc13, clang17, clang18 to CI by @jrhemstad in #1757
- Drop more of thrust type traits by @miscco in #1721
- Show workflow walltime, job max time in CI comment. by @alliepiper in #1795
- Fix span for non-ranges by @miscco in #1840
- Drop all internal implementations of exceptions (#1806) by @miscco in #1839
- Backport atomic regression fix #1801 by @wmaxey in #1833
- [BACKPORT] Symbol visibility is now invariant in regards to `__cuda_std__` definition (#1832) by @miscco in #1864
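The `DeviceSelect::FlaggedIf` algorithm added in #1533 applies a selection predicate to a separate sequence of flags and copies each input item whose flag satisfies it, keeping the selected items in their original order. The snippet below is only a host-side sketch of that selection rule under that reading. It is not the CUB API, which is a device-wide call using CUB's usual two-phase temporary-storage pattern. The helper name `flagged_if_reference` is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-only reference for the selection rule of
// cub::DeviceSelect::FlaggedIf: item in[i] is copied to the output
// (order preserved) whenever select_op(flags[i]) returns true.
// This is NOT the CUB interface; it only illustrates what gets selected.
template <typename T, typename FlagT, typename SelectOp>
std::vector<T> flagged_if_reference(const std::vector<T>& in,
                                    const std::vector<FlagT>& flags,
                                    SelectOp select_op)
{
  std::vector<T> out;
  // Defensive choice for this sketch: only visit indices present in both inputs.
  const std::size_t n = in.size() < flags.size() ? in.size() : flags.size();
  for (std::size_t i = 0; i < n; ++i)
  {
    if (select_op(flags[i]))
    {
      out.push_back(in[i]); // the item is selected based on its flag, not its own value
    }
  }
  return out;
}
```

For example, with inputs `{0, 1, 2, 3}`, flags `{4, 3, 2, 1}` and an "is even" predicate applied to the flags, the sketch returns `{0, 2}`. The selection test looks only at the flag, which is what separates this algorithm from `DeviceSelect::If`, where the predicate is applied to the items themselves.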
New Contributors
- @MARD1NO made their first contribution in #1553
- @Revaj made their first contribution in #1495
- @bprb made their first contribution in #1574
- @eriktedhamre made their first contribution in #1608
- @Snektron made their first contribution in #1707
Full Changelog: v2.4.0...v2.5.0-rc1