CUTLASS 2.4.0 (GitHub: NVIDIA/cutlass, tag v2.4.0)

CUTLASS 2.4

  • Implicit GEMM convolution kernels supporting CUDA cores and Tensor Cores on NVIDIA GPUs
    • Operators: forward (Fprop), backward data gradient (Dgrad), and backward weight gradient (Wgrad) convolution
    • Data type: FP32, complex<FP32>, Tensor Float 32 (TF32), BFloat16 (BF16), Float16, Int4, Int8, Int32
    • Spatial dimensions: 1-D, 2-D, and 3-D
    • Layout: NHWC, NCxHWx
  • Implicit GEMM convolution components:
    • Global memory iterators supporting Fprop, Dgrad, and Wgrad
    • MmaMultistage for implicit GEMM convolution on the NVIDIA Ampere architecture
    • MmaPipeline for implicit GEMM convolution on the NVIDIA Volta and Turing architectures
    • Documentation describing the implicit GEMM convolution algorithm and its implementation
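To illustrate what the three operators compute, the Python sketch below derives the GEMM problem size (M, N, K) each one maps to under implicit GEMM, following the mapping usually given for this algorithm: Fprop is (N·P·Q) x K x (C·R·S), Dgrad is (N·H·W) x C x (K·R·S), and Wgrad is K x (C·R·S) x (N·P·Q). `ConvProblem` and `implicit_gemm_shape` are hypothetical names, not CUTLASS APIs. The sketch assumes NHWC activations, KRSC filters and symmetric padding, and shows Dgrad only in its simplest form; strided Dgrad may be handled differently in practice.

```python
from typing import NamedTuple, Tuple


class ConvProblem(NamedTuple):
    """Hypothetical 2-D convolution description (not a CUTLASS type):
    activations N x H x W x C, filters K x R x S x C, output N x P x Q x K."""
    N: int
    H: int
    W: int
    C: int
    K: int
    R: int
    S: int
    pad_h: int = 0
    pad_w: int = 0
    stride_h: int = 1
    stride_w: int = 1
    dilation_h: int = 1
    dilation_w: int = 1

    def output_hw(self) -> Tuple[int, int]:
        """Output extent (P, Q) for the given padding, stride and dilation."""
        P = (self.H + 2 * self.pad_h - self.dilation_h * (self.R - 1) - 1) // self.stride_h + 1
        Q = (self.W + 2 * self.pad_w - self.dilation_w * (self.S - 1) - 1) // self.stride_w + 1
        return P, Q


def implicit_gemm_shape(prob: ConvProblem, op: str) -> Tuple[int, int, int]:
    """GEMM (M, N, K) that an implicit GEMM kernel would solve for the operator."""
    P, Q = prob.output_hw()
    if op == "fprop":  # output activations from activations and filters
        return prob.N * P * Q, prob.K, prob.C * prob.R * prob.S
    if op == "dgrad":  # activation gradient from output gradient and filters
        return prob.N * prob.H * prob.W, prob.C, prob.K * prob.R * prob.S
    if op == "wgrad":  # filter gradient from output gradient and activations
        return prob.K, prob.C * prob.R * prob.S, prob.N * P * Q
    raise ValueError(f"unknown operator: {op}")


# A 3x3, pad-1 layer on a 56x56x64 activation with 128 filters.
prob = ConvProblem(N=1, H=56, W=56, C=64, K=128, R=3, S=3, pad_h=1, pad_w=1)
print(implicit_gemm_shape(prob, "fprop"))  # (3136, 128, 576)
```

The same six extents appear in all three operators. Only their assignment to the GEMM's M, N and K roles changes, which is why one family of GEMM mainloops can serve Fprop, Dgrad and Wgrad.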
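The core idea behind the global memory iterators is that the im2col matrix is never built. Each GEMM row index is decoded into an output pixel (n, p, q), each reduction index into a filter tap (r, s, c), and the activation address is computed on the fly, with out-of-bounds taps treated as zero padding. The pure-Python sketch below shows that index arithmetic for Fprop on packed NHWC activations and KRSC filters, and checks it against a direct loop nest. It computes cross-correlation, as deep learning frameworks do, and all function names are hypothetical. It illustrates the addressing scheme only and says nothing about how CUTLASS tiles or vectorizes these loads.

```python
import itertools
import random


def nhwc_offset(n, h, w, c, H, W, C):
    """Offset of (n, h, w, c) in a packed NHWC buffer (c varies fastest)."""
    return ((n * H + h) * W + w) * C + c


def out_size(extent, filt, pad, stride):
    return (extent + 2 * pad - filt) // stride + 1


def fprop_implicit_gemm(x, f, N, H, W, C, K, R, S, pad=0, stride=1):
    """Fprop as a GEMM of shape (N*P*Q) x K x (R*S*C) without an im2col matrix.

    x is NHWC, f is KRSC (both addressed with nhwc_offset), result is NPQK.
    """
    P, Q = out_size(H, R, pad, stride), out_size(W, S, pad, stride)
    y = [0.0] * (N * P * Q * K)
    for gemm_m in range(N * P * Q):
        n, pq = divmod(gemm_m, P * Q)          # GEMM row -> output pixel (n, p, q)
        p, q = divmod(pq, Q)
        for gemm_n in range(K):                # GEMM column -> output channel k
            acc = 0.0
            for gemm_k in range(R * S * C):    # reduction index -> filter tap (r, s, c)
                r, sc = divmod(gemm_k, S * C)
                s, c = divmod(sc, C)
                h = p * stride - pad + r       # activation coordinates computed on the fly
                w = q * stride - pad + s
                if 0 <= h < H and 0 <= w < W:  # taps in the padding region contribute zero
                    acc += x[nhwc_offset(n, h, w, c, H, W, C)] * f[nhwc_offset(gemm_n, r, s, c, R, S, C)]
            y[gemm_m * K + gemm_n] = acc
    return y


def fprop_direct(x, f, N, H, W, C, K, R, S, pad=0, stride=1):
    """Reference: the same cross-correlation as a plain seven-deep loop nest."""
    P, Q = out_size(H, R, pad, stride), out_size(W, S, pad, stride)
    y = [0.0] * (N * P * Q * K)
    for n, p, q, k in itertools.product(range(N), range(P), range(Q), range(K)):
        acc = 0.0
        for r, s, c in itertools.product(range(R), range(S), range(C)):
            h, w = p * stride - pad + r, q * stride - pad + s
            if 0 <= h < H and 0 <= w < W:
                acc += x[nhwc_offset(n, h, w, c, H, W, C)] * f[nhwc_offset(k, r, s, c, R, S, C)]
        y[((n * P + p) * Q + q) * K + k] = acc
    return y


# Small integer inputs keep the comparison exact.
random.seed(0)
N, H, W, C, K, R, S = 2, 5, 5, 3, 4, 3, 3
x = [random.randint(-3, 3) for _ in range(N * H * W * C)]
f = [random.randint(-3, 3) for _ in range(K * R * S * C)]
assert fprop_implicit_gemm(x, f, N, H, W, C, K, R, S, pad=1, stride=2) == \
    fprop_direct(x, f, N, H, W, C, K, R, S, pad=1, stride=2)
```

Because C is the fastest-varying index of the reduction, consecutive reduction steps touch consecutive NHWC addresses. That is one reason NHWC suits this formulation.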
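The NCxHWx layout interleaves channels in groups of x. Within a group the x channel values are contiguous, and the remaining dimensions vary, from fastest to slowest, as w, h, channel group and n. For example, x = 32 would give NC32HW32. The sketch below computes that offset under this conventional reading of the name. `ncxhwx_offset` is a hypothetical helper, not CUTLASS's layout class. As a sanity check, x = C reduces to NHWC and x = 1 reduces to NCHW.

```python
def ncxhwx_offset(n, h, w, c, H, W, C, interleave):
    """Offset of (n, h, w, c) in an interleaved NC/xHWx buffer, x = interleave.

    Channels are split into C // interleave groups; the channels of one group
    are contiguous, then w, h, channel group and n vary from fastest to slowest.
    """
    if C % interleave != 0:
        raise ValueError("C must be a multiple of the interleave factor")
    group, lane = divmod(c, interleave)
    return (((n * (C // interleave) + group) * H + h) * W + w) * interleave + lane
```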
