What's Changed
- Fix SM100 FP8 fwd with cutlass-dsl >=4.5.2 (MmaF8F6F4Op) by @Johnsonms in #2640
- [cute] Fix int32 overflow in SM100 LPT tile scheduler for long context by @sryap in #2662
- [Fwd,Sm100] Tune FP8 causal hd128 ex2_emu_freq (8 vs inherited 16) by @Johnsonms in #2642
- Make q_subtile_factor default to identity by @drisspg in #2660
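
Note on #2662: the sketch below only illustrates the general class of bug (a linear index that exceeds the signed 32-bit range at long context). It is not taken from the PR; the sizes, variable names, and the `wrap_int32` helper are all hypothetical, and the actual expression that overflowed in the tile scheduler is not stated in these notes.

```python
# Illustration only: how a flattened index can overflow int32 for long sequences.
# All names and sizes here are assumptions, not values from the FA4 code base.

INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def wrap_int32(x: int) -> int:
    """Return the value a two's-complement int32 would hold after wraparound."""
    return (x + 2**31) % 2**32 - 2**31


# Hypothetical long-context problem size.
batch = 8
num_heads = 32
seqlen = 1_000_000
head_dim = 128

# Exact count (Python ints are arbitrary precision, so this never overflows).
exact = batch * num_heads * seqlen * head_dim

# What the same product would become if it were accumulated in an int32.
wrapped = wrap_int32(exact)

print("exact  :", exact)
print("wrapped:", wrapped)
print("fits in int32:", INT32_MIN <= exact <= INT32_MAX)
```

A common remedy for this kind of bug is to widen the index arithmetic to 64 bits or to restructure the computation so intermediate products stay small; which approach #2662 takes is not described here.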
Full Changelog: fa4-v4.0.0.beta17...fa4-v4.0.0.beta18