github NVIDIA/cutlass v4.5.1
CUTLASS 4.5.1

3 hours ago

CuTe DSL

  • Bug fixes and improvements
    • Fixed the following issues:
      #3219
      #3218
      #3212
      #3210
      #3208
      #3201
      #3227
    • Fixed Jax int64 stride divisibility issue
    • Fixed issues for SM120 blockscaled MMAs
      • Added missing MXFP8MMAOP and MXF8F6F4MMAOP for SM120.

CUTLASS C++

  • Fix SM100 F8F6F4 SS MMA (1SM and 2SM) traits to use typed op templates.
  • Add UE8M0 (uniform exponent distribution) initialization support in tensor fill utilities.
  • Add cvt.rn.bf16x2.e4m3x2 conversion instruction support to numeric_conversion.h.
  • Update example 93 with paged KV cache support for Blackwell low-latency GQA.
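
For readers unfamiliar with the 8-bit formats named above, the following is a minimal host-side sketch of how the byte encodings map to real values. It assumes the OCP-style layouts: E4M3 with 1 sign bit, 4 exponent bits (bias 7), 3 mantissa bits and no infinities, and UE8M0 as an unsigned 8-bit power-of-two scale (bias 127, 0xFF reserved for NaN). The function names `decode_e4m3` and `decode_ue8m0` are hypothetical and are not part of the CUTLASS API; the code does not reproduce the PTX instruction or the contents of numeric_conversion.h, it only illustrates the values such a conversion is expected to yield.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper (not a CUTLASS function): decode one FP8 E4M3 byte.
// Assumed layout: 1 sign bit, 4 exponent bits (bias 7), 3 mantissa bits.
// This format has no infinities; S.1111.111 encodes NaN.
inline float decode_e4m3(std::uint8_t bits) {
    const int sign     = (bits >> 7) & 0x1;
    const int exponent = (bits >> 3) & 0xF;
    const int mantissa = bits & 0x7;

    if (exponent == 0xF && mantissa == 0x7) {
        return std::numeric_limits<float>::quiet_NaN();
    }

    float magnitude;
    if (exponent == 0) {
        // Subnormal: (mantissa / 8) * 2^-6 == mantissa * 2^-9
        magnitude = std::ldexp(static_cast<float>(mantissa), -9);
    } else {
        // Normal: (1 + mantissa / 8) * 2^(exponent - 7)
        magnitude = std::ldexp(1.0f + static_cast<float>(mantissa) / 8.0f,
                               exponent - 7);
    }
    return sign ? -magnitude : magnitude;
}

// Hypothetical helper (not a CUTLASS function): decode one UE8M0 scale byte.
// Assumed layout: 8 exponent bits, no sign, no mantissa, bias 127.
inline float decode_ue8m0(std::uint8_t bits) {
    if (bits == 0xFF) {
        return std::numeric_limits<float>::quiet_NaN();
    }
    return std::ldexp(1.0f, static_cast<int>(bits) - 127);
}
```

Under these assumptions, every finite E4M3 value fits exactly in bf16 (3 mantissa bits against 7, and a much narrower exponent range), so a packed E4M3-to-bf16 conversion should lose no precision. In a block-scaled format, a decoded element would then be multiplied by its block's decoded UE8M0 scale.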
