github NVIDIA/cutlass v2.2.0
CUTLASS 2.2

3 years ago
  • NVIDIA Ampere Architecture features
    • Fast Tensor Core operations:
      • Maximum performance via mma.sync
      • Tensor Float 32, BFloat16, and double-precision data types
      • Mixed integer data types (int8, int4, bin1)
    • Asynchronous copy for deep software pipelines via cp.async
    • Described in GTC 2020 Webinar (SR 21745) (free registration required)
  • Features:
    • SDK examples showing GEMM fused with bias+relu and fused GEMM+GEMM
    • Complex-valued GEMMs targeting NVIDIA Ampere Tensor Cores in double-precision and Tensor Float 32
    • Gaussian complex GEMMs using the 3m complex multiply algorithm
    • Universal GEMM kernel supporting two batch modes and two algorithms for parallel reductions
  • Policy updates:
    • CUDA 11 Toolkit needed to enable NVIDIA Ampere Architecture features
    • Disabled F16C by default for compatibility - enable on cmake command line with -DCUTLASS_ENABLE_F16C=ON
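
The "3m complex multiply algorithm" mentioned under Features is commonly understood as Gauss's trick: a complex matrix product is formed from three real matrix products instead of four. The block below is a minimal host-side sketch of that idea only. It is not CUTLASS code and does not use Tensor Cores; `Matrix`, `ComplexMatrix`, `real_gemm`, `add`, and `gemm_3m` are hypothetical names introduced for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper type: a row-major real matrix stored as a flat vector.
using Matrix = std::vector<double>;

// Hypothetical helper type: a complex matrix split into real and imaginary planes.
struct ComplexMatrix {
    Matrix re;
    Matrix im;
};

// Naive real GEMM: returns A (m x k) times B (k x n), all row-major.
Matrix real_gemm(const Matrix& a, const Matrix& b,
                 std::size_t m, std::size_t n, std::size_t k) {
    Matrix c(m * n, 0.0);
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t p = 0; p < k; ++p)
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += a[i * k + p] * b[p * n + j];
    return c;
}

// Element-wise sum of two equally sized matrices.
Matrix add(const Matrix& x, const Matrix& y) {
    Matrix s(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) s[i] = x[i] + y[i];
    return s;
}

// Complex GEMM using three real GEMMs (Gauss's trick):
//   T1 = Ar*Br, T2 = Ai*Bi, T3 = (Ar + Ai)*(Br + Bi)
//   Cr = T1 - T2
//   Ci = T3 - T1 - T2
ComplexMatrix gemm_3m(const ComplexMatrix& a, const ComplexMatrix& b,
                      std::size_t m, std::size_t n, std::size_t k) {
    Matrix t1 = real_gemm(a.re, b.re, m, n, k);
    Matrix t2 = real_gemm(a.im, b.im, m, n, k);
    Matrix t3 = real_gemm(add(a.re, a.im), add(b.re, b.im), m, n, k);

    ComplexMatrix c{Matrix(m * n), Matrix(m * n)};
    for (std::size_t i = 0; i < m * n; ++i) {
        c.re[i] = t1[i] - t2[i];
        c.im[i] = t3[i] - t1[i] - t2[i];
    }
    return c;
}
```

The trade-off is one fewer real GEMM in exchange for a few extra matrix additions. The rounding behavior of the imaginary part can differ from the conventional four-multiply form, so results may not match it bit for bit.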
