2.7.0
- Mainloop fusion for GEMM: summation over A or B
- Strided DGRAD (optimized iterators)
- Half-precision GELU_taylor activation functions
- Use these when accumulation and epilogue compute types are all cutlass::half_t
- Tuning and bug fixes to fused GEMM + GEMM example
- Support for Convolutions with alignment smaller than 128b: see examples
- Caching of results to accelerate Convolution unit tests
- Can be enabled or disabled by running cmake .. -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=OFF
- Corrections and bug fixes reported by the CUTLASS community
- Thank you for filing these issues!
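
For readers unfamiliar with the GELU_taylor item above, the following is a minimal sketch of the idea, assuming the name refers to the widely used tanh-based approximation of GELU. The function names `gelu_tanh_approx` and `gelu_erf` are hypothetical and are not CUTLASS APIs. The sketch uses `float` because standard C++ has no half-precision type; it illustrates the formula only and is not the library's implementation.

```cpp
#include <cmath>

// Hypothetical helper (not a CUTLASS API): tanh-based approximation of GELU,
//   0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
float gelu_tanh_approx(float x) {
    const float k0 = 0.7978845608f;  // sqrt(2 / pi)
    const float k1 = 0.044715f;      // cubic correction coefficient
    return 0.5f * x * (1.0f + std::tanh(k0 * x * (1.0f + k1 * x * x)));
}

// Hypothetical reference: GELU written with the error function,
//   0.5 * x * (1 + erf(x / sqrt(2)))
float gelu_erf(float x) {
    const float inv_sqrt2 = 0.7071067812f;  // 1 / sqrt(2)
    return 0.5f * x * (1.0f + std::erf(x * inv_sqrt2));
}
```

Comparing `gelu_tanh_approx(x)` with `gelu_erf(x)` over a few inputs shows how closely the approximation tracks the erf-based form. With all compute types in half precision, such a small difference is likely to matter less than the rounding of the type itself.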