github NVIDIA/cutlass v3.4.0
CUTLASS 3.4.0

latest releases: v3.5.0, v3.4.1
3 months ago
  • Improved Mixed-input Hopper GEMMs supporting {16-bit, 8-bit} x {8-bit, 4-bit} input types with fast numerical converters and group scaling factors tuned for optimal performance on Hopper H100.
  • Beta release of Pointer-Array Batched GEMMs utilizing TMA and Hopper H100 tensor cores now available. (Requires CUDA 12.3 or above)
  • Beta release of Group-GEMM, commonly used in optimization of Mixture-of-Experts models, is now available on Hopper GPUs, taking advantage of TMA and Hopper H100 tensor cores. (Requires CUDA 12.3 or above)
  • Ampere Sparse GEMM now supports Epilogue Visitor Tree (EVT).
  • Improvements to NamedBarriers, including details of ReservedNamedBarriers used within the CUTLASS library.
  • Improved CuTe documentation, with greater clarity and depth in the Quickstart, CuTe Layout, and CuTe Layout Algebra sections. Associated code comments, post-conditions, and details in the CuTe Core Unit Tests have also been improved.
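
The difference between the Pointer-Array Batched GEMM and Group-GEMM items above can be illustrated with a naive CPU reference. This is only a sketch of the semantics, not CUTLASS code: the function names `matmul`, `pointer_array_batched_gemm`, and `grouped_gemm` are hypothetical, and the sketch assumes that a batched GEMM shares one (M, N, K) problem shape across the batch while a grouped GEMM allows a different shape per problem.

```python
# Naive CPU reference for the semantics only; none of these names are CUTLASS APIs.

def matmul(a, b):
    """Multiply an M x K matrix by a K x N matrix (both given as lists of rows)."""
    k = len(b)
    assert all(len(row) == k for row in a), "inner dimensions must match"
    n = len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)] for row in a]


def pointer_array_batched_gemm(a_list, b_list):
    """Batched GEMM: operands are referenced one by one, but (M, N, K) is shared."""
    shapes = {(len(a), len(b[0]), len(b)) for a, b in zip(a_list, b_list)}
    assert len(shapes) <= 1, "all problems in a batch share one shape"
    return [matmul(a, b) for a, b in zip(a_list, b_list)]


def grouped_gemm(a_list, b_list):
    """Grouped GEMM: every problem in the group may have its own (M, N, K)."""
    return [matmul(a, b) for a, b in zip(a_list, b_list)]


# Two hypothetical "experts" that received 1 and 2 tokens respectively,
# as can happen with Mixture-of-Experts routing.
tokens = [[[1, 2]], [[3, 4], [5, 6]]]            # M = 1 and M = 2, K = 2
weights = [[[1, 0], [0, 1]], [[2, 0], [0, 2]]]   # K = 2, N = 2
outputs = grouped_gemm(tokens, weights)
print(outputs)  # [[[1, 2]], [[6, 8], [10, 12]]]
```

Passing the same `tokens` to `pointer_array_batched_gemm` would fail the shape check in this sketch, which is the situation a grouped GEMM is meant to handle.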
