github NVIDIA/cutlass v2.9.0
CUTLASS 2.9.0

2 years ago

  • First-layer convolution kernels specialized for small channel counts and reduced alignment
    • Few-channels specialization for reduced alignment capabilities
    • Fixed-channels specialization, used when the channel count exactly matches the access vector size
    • Unit tests
    • Python-based instance emitter in the CUTLASS Library and support in the Profiler
  • BLAS3 operators accelerated by Tensor Cores
    • Supported types: f32, cf32, f64, cf64
    • HERK with emitter
    • SYRK with emitter
    • SYMM with emitter
    • TRMM with emitter
    • Unit tests
  • CUTLASS Python demonstrating JIT compilation of CUTLASS kernels and a Python-based runtime using CUDA Python
    • Python-based runtime interoperable with existing emitters
  • GEMM + Softmax example
  • Optimal performance with CUDA 11.6 Update 2
  • Updates and bugfixes from the community (thanks!)
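
For readers unfamiliar with the BLAS3 operators listed above, here is a minimal reference sketch of what a symmetric rank-k update (SYRK) computes, following the standard BLAS definition C := alpha * A * A^T + beta * C with only one triangle of C written. It is plain Python for illustration only: the function name `syrk_lower` and its argument layout are hypothetical, and nothing here is the CUTLASS API or reflects how the Tensor Core kernels are implemented.

```python
# Reference semantics only: plain Python, no CUTLASS or GPU involved.
# syrk_lower is a hypothetical helper name, not a CUTLASS function.
def syrk_lower(alpha, a, beta, c):
    """Compute alpha * A @ A^T + beta * C on the lower triangle of C.

    a is an n x k matrix and c an n x n matrix, both given as lists of rows.
    Entries strictly above the diagonal of c are copied through unchanged,
    matching BLAS SYRK with the "lower" fill mode.
    """
    n = len(a)
    k = len(a[0]) if n else 0
    out = [row[:] for row in c]  # copy so the caller's C is not modified
    for i in range(n):
        for j in range(i + 1):  # lower triangle, diagonal included
            acc = sum(a[i][p] * a[j][p] for p in range(k))
            out[i][j] = alpha * acc + beta * c[i][j]
    return out


a = [[1.0, 2.0],
     [3.0, 4.0]]
c = [[1.0, 9.0],
     [1.0, 1.0]]
result = syrk_lower(2.0, a, 0.5, c)
print(result)
```

The other operators follow the same pattern under their standard BLAS definitions: HERK uses the conjugate transpose of A with real scalars, SYMM multiplies by a symmetric matrix stored as one triangle, and TRMM multiplies by a triangular matrix.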
