github NVIDIA/cutlass v2.7.0
CUTLASS 2.7

2.7.0

  • Mainloop fusion for GEMM: summation over A or B
  • Strided DGRAD (optimized iterators)
  • Half-precision GELU_taylor activation functions
    • Use these when accumulation and epilogue compute types are all cutlass::half_t
  • Tuning and bug fixes to fused GEMM + GEMM example
  • Support for Convolutions whose operands are aligned to fewer than 128 bits: see examples
  • Caching of results to accelerate Convolution unit tests
    • Controlled by the CMake option CUTLASS_TEST_ENABLE_CACHED_RESULTS; for example, disable it by running cmake .. -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=OFF
  • Corrections and bug fixes reported by the CUTLASS community
    • Thank you for filing these issues!
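To make the "summation over A or B" mainloop fusion concrete, here is a plain host-side C++ reference of what such a fused GEMM computes. It assumes row-major operands and that the summation runs along the K dimension (one sum per row of A). The struct and function names are hypothetical illustrations, not CUTLASS APIs. The point is that each element of A is loaded once and feeds both the GEMM update and the reduction, so no separate reduction pass over A is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type (not part of CUTLASS).
struct GemmWithReduction {
  std::vector<float> D;     // M x N product A * B, row-major
  std::vector<float> sumA;  // length M: sum over k of A(m, k)
};

// Reference semantics of a GEMM with a fused reduction of operand A
// along K. Assumes row-major A (M x K) and B (K x N).
GemmWithReduction gemm_with_a_reduction(const std::vector<float>& A,
                                        const std::vector<float>& B,
                                        std::size_t M, std::size_t N,
                                        std::size_t K) {
  assert(A.size() == M * K);
  assert(B.size() == K * N);
  GemmWithReduction out{std::vector<float>(M * N, 0.0f),
                        std::vector<float>(M, 0.0f)};
  for (std::size_t m = 0; m < M; ++m) {
    for (std::size_t k = 0; k < K; ++k) {
      const float a = A[m * K + k];  // A(m, k) is read once...
      out.sumA[m] += a;              // ...accumulated into the row sum...
      for (std::size_t n = 0; n < N; ++n) {
        out.D[m * N + n] += a * B[k * N + n];  // ...and used in the GEMM.
      }
    }
  }
  return out;
}
```

Summing B instead would be symmetric: accumulate B(k, n) into a length-N vector inside the same K loop.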
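For the GELU_taylor item, a minimal float sketch of the tanh-based GELU approximation follows, assuming this is the formula GELU_taylor evaluates (constants sqrt(2/pi) and 0.044715). The release note's half-precision variants would compute the same expression in cutlass::half_t. The function names here are hypothetical, and the exact erf-based GELU is included only for comparison.

```cpp
#include <cassert>
#include <cmath>

// tanh-based GELU approximation:
// 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
float gelu_tanh_approx(float x) {
  const float k0 = 0.7978845608028654f;  // sqrt(2 / pi)
  const float k1 = 0.044715f;
  // k0 * x * (1 + k1 * x^2) == k0 * (x + k1 * x^3)
  return 0.5f * x * (1.0f + std::tanh(k0 * x * (1.0f + k1 * x * x)));
}

// Exact GELU via the Gaussian CDF: 0.5 * x * (1 + erf(x / sqrt(2))).
float gelu_exact(float x) {
  return 0.5f * x * (1.0f + std::erf(x / std::sqrt(2.0f)));
}
```

The approximation avoids erf, which is why it is a common choice for fused epilogues; its absolute error against the exact form is small (on the order of 1e-4 near x = 1).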