github NVIDIA/cutlass v3.0.0
CUTLASS 3.0

3.0.0 (2023-01-23)

  • CuTe, a new core library and backend for CUTLASS 3.0. It defines a single Layout vocabulary type and an associated algebra of layouts, giving a more expressive and composable abstraction for tensors, sets of parallel agents, and the operations those agents perform on tensors.
  • A new conceptual operation hierarchy that replaces the architecture-centric hierarchy of CUTLASS 2.x, along with documentation of CUTLASS 3.0's GEMM API changes.
  • Strict API backwards compatibility that exposes both 2.x and 3.x API kernels through the same device::GemmUniversalAdapter and kernel::GemmUniversal types, allowing users to include both APIs in the same translation units. More information can be found in the 3.x backwards compatibility section.
  • Updates to Functionality, which tells users which kernels are supported via CUTLASS-2 and CUTLASS-3.
  • Updates to the Compatibility section regarding supported compilers, operating systems, CUDA Toolkits, hardware architectures, and target architecture.
  • New warp-specialized GEMM kernel schedules and mainloops targeting the Hopper architecture that achieve high performance using TMA, WGMMA, and threadblock clusters.
  • Extensions to the CUTLASS profiler to support threadblock cluster shapes in library and profiler tile configurations.
  • CUTLASS library integration for 3.x API kernels built through the new CollectiveBuilder API, making them available to the CUTLASS profiler.
  • Support for Hopper GEMMs through the new 3.0 API with CuTe-based exposure of the Hopper Tensor Memory Accelerator and WGMMA Tensor Core features.
  • A set of examples that demonstrate how to use the new 3.0 API to build GEMM kernels targeting Hopper: examples 48, 49, and 50.
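
To give a feel for the Layout idea mentioned in the CuTe item above, here is a minimal sketch in plain C++. It assumes only that a layout pairs a shape with strides and maps a logical coordinate to a linear offset. Layout2D, make_column_major, make_row_major, and transpose are hypothetical names invented for this illustration; they are not the CuTe API, and the real library generalizes this to nested, compile-time shapes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical illustration type, NOT cute::Layout.
// A layout is a (shape, stride) pair that maps a logical
// coordinate to a linear offset into memory.
struct Layout2D {
    std::array<std::size_t, 2> shape;   // extents: (rows, cols)
    std::array<std::size_t, 2> stride;  // offset step per unit move in each mode

    // Offset of coordinate (i, j): the inner product of coordinate and stride.
    std::size_t operator()(std::size_t i, std::size_t j) const {
        assert(i < shape[0] && j < shape[1]);  // coordinate must lie inside the shape
        return i * stride[0] + j * stride[1];
    }

    // Number of logical elements covered by the layout.
    std::size_t size() const { return shape[0] * shape[1]; }
};

// Column-major: consecutive rows are adjacent in memory.
inline Layout2D make_column_major(std::size_t rows, std::size_t cols) {
    return Layout2D{{rows, cols}, {1, rows}};
}

// Row-major: consecutive columns are adjacent in memory.
inline Layout2D make_row_major(std::size_t rows, std::size_t cols) {
    return Layout2D{{rows, cols}, {cols, 1}};
}

// Transposed view: swap the two modes; no data is moved.
inline Layout2D transpose(const Layout2D& l) {
    return Layout2D{{l.shape[1], l.shape[0]}, {l.stride[1], l.stride[0]}};
}
```

With these definitions, the same 4x3 buffer can be read as column-major, row-major, or a transposed view by changing only the layout object. That is the sense in which layouts compose: the indexing function changes while the data stays put.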
