NVIDIA/cutlass v3.0.0 on GitHub

3.0.0 (2023-01-23)

CuTe, a new core library and backend for CUTLASS 3.0 that defines a single Layout vocabulary type and an associated algebra of layouts for a much more expressive and composable abstraction for tensors, sets of parallel agents, and operations by said agents on tensors.
A new conceptual operation hierarchy that replaces the architecture-centric hierarchy of CUTLASS 2.x, along with documentation for CUTLASS 3.0's GEMM API changes.
Strict API backwards compatibility that exposes both 2.x and 3.x API kernels through the same device::GemmUniversalAdapter and kernel::GemmUniversal types, allowing users to include both APIs in the same translation units. More information can be found in the 3.x backwards compatibility section.
Updates to the Functionality documentation, which directs users to the kernels supported via CUTLASS-2 and CUTLASS-3.
Updates to the Compatibility section regarding supported compilers, operating systems, CUDA Toolkits, hardware architectures, and target architecture.
New warp-specialized GEMM kernel schedules and mainloops targeting the Hopper architecture that achieve high performance with TMA, WGMMA, and threadblock clusters.
Extensions to the CUTLASS profiler to support threadblock cluster shapes in library and profiler tile configurations.
CUTLASS library integration for 3.x API kernels built through the new CollectiveBuilder API, enabling the CUTLASS profiler.
Support for Hopper GEMMs through the new 3.0 API with CuTe-based exposure of the Hopper Tensor Memory Accelerator and WGMMA Tensor Core features.
A set of examples that demonstrate the usage of the new 3.0 API to easily build GEMM kernels targeting Hopper: examples 48, 49, and 50.
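
To make the Layout idea mentioned in the CuTe item more concrete, the sketch below models a rank-2 layout as a shape plus a stride, with the layout itself acting as a function from a logical coordinate to a linear offset. It is a simplified illustration only and does not use the CuTe headers: Layout2D, col_major, row_major, and transpose are hypothetical names chosen for this example, not CUTLASS or CuTe identifiers, and real CuTe layouts are considerably more general (for instance, hierarchical shapes and strides).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-in for a rank-2 layout. This is NOT the CuTe API;
// it only illustrates the "shape + stride maps coordinate to offset" idea.
struct Layout2D {
    std::size_t rows, cols;              // shape
    std::size_t row_stride, col_stride;  // stride

    // Map a logical coordinate (i, j) to a linear offset:
    // the inner product of the coordinate with the strides.
    constexpr std::size_t operator()(std::size_t i, std::size_t j) const {
        return i * row_stride + j * col_stride;
    }

    // Number of logical elements described by the shape.
    constexpr std::size_t size() const { return rows * cols; }
};

// Column-major: consecutive rows are adjacent in memory.
constexpr Layout2D col_major(std::size_t rows, std::size_t cols) {
    return {rows, cols, 1, rows};
}

// Row-major: consecutive columns are adjacent in memory.
constexpr Layout2D row_major(std::size_t rows, std::size_t cols) {
    return {rows, cols, cols, 1};
}

// Transposing is a pure layout operation: swap the two modes, move no data.
constexpr Layout2D transpose(Layout2D l) {
    return {l.cols, l.rows, l.col_stride, l.row_stride};
}

static_assert(col_major(4, 8)(1, 2) == 9,  "1*1 + 2*4");
static_assert(row_major(4, 8)(1, 2) == 10, "1*8 + 2*1");
static_assert(transpose(row_major(4, 8))(2, 1) == row_major(4, 8)(1, 2),
              "a transposed view addresses the same element");
```

The point of the sketch is that the same data can be viewed as row-major, column-major, or transposed purely by changing the layout object, which is the kind of composability the layout algebra is meant to provide.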
