CuTe DSL
- New features
  - Added PDL (Programmatic Dependent Launch) support, along with the example "Kernel launch with Programmatic Dependent Launch".
- Bug fixing and improvements
  - Fixed a frame reference-count (refcnt) issue with CUDA graphs.
  - Improved the tvm-ffi AoT (ahead-of-time) case to allow earlier module unload.
  - Fixed an ordering issue in make_smem_layout_a in utils/hopper_helpers.py.
CUTLASS C++
- Worked around a driver bug related to TMA descriptors that occasionally causes errors on Blackwell when a tensor's backing memory allocation is smaller than 128KB and the tensor is not dense and non-overlapping.
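The trigger condition in the note above can be made concrete with a small check. This is an illustrative sketch only: `is_dense_non_overlapping` and `may_hit_tma_descriptor_bug` are hypothetical helpers, not CUTLASS APIs. It assumes that "dense and non-overlapping" means the tensor's elements exactly tile one contiguous range under some ordering of its modes (the same notion PyTorch uses), and that "128KB" means 128 × 1024 bytes.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of CUTLASS): true when a strided tensor's
// elements exactly cover a contiguous range with no gaps and no overlaps,
// under some permutation of its modes.
bool is_dense_non_overlapping(const std::vector<int64_t>& shape,
                              const std::vector<int64_t>& stride) {
  std::vector<std::size_t> modes;
  for (std::size_t i = 0; i < shape.size(); ++i) {
    if (shape[i] == 0) return true;          // empty tensor: nothing can overlap
    if (shape[i] != 1) modes.push_back(i);   // extent-1 modes never affect layout
  }
  // Visit the contributing modes from the smallest stride to the largest.
  std::sort(modes.begin(), modes.end(),
            [&](std::size_t a, std::size_t b) { return stride[a] < stride[b]; });
  int64_t expected = 1;
  for (std::size_t i : modes) {
    // A stride larger than expected leaves a gap; a smaller one overlaps.
    if (stride[i] != expected) return false;
    expected *= shape[i];
  }
  return true;
}

// Hypothetical predicate mirroring the described condition: the backing
// allocation is under the (assumed) 128 * 1024-byte threshold AND the
// tensor is not dense and non-overlapping.
bool may_hit_tma_descriptor_bug(const std::vector<int64_t>& shape,
                                const std::vector<int64_t>& stride,
                                int64_t allocation_bytes) {
  constexpr int64_t kThresholdBytes = 128 * 1024;  // assumption: KB = 1024 bytes
  return allocation_bytes < kThresholdBytes &&
         !is_dense_non_overlapping(shape, stride);
}
```

For example, a 4 x 8 row-major tensor whose rows are padded to a stride of 16 elements leaves gaps, so it is not dense; if its allocation is also small, it falls into the affected category.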