What's Changed
- [doc]: Update installation doc and readme by @yongwww in #1465
- Allow BatchPrefillPagedWrapper to call cudnn API by @Anerudhan in #1384
- [RFC] log filename and lineno in flashinfer jit logger by @842974287 in #1461
- Add Mxfp4 trtllm-gen moe unit tests by @IwakuraRein in #1399
- bugfix: Verify num_experts is greater than or equal to local_experts + offset by @amirkl94 in #1469
- [RFC] add an env var to allow specifying the cubins directory by @842974287 in #1462
- Fix "more than one operator "/" matches these operands" by @842974287 in #1471
- Fix race condition when JitSpec loads the library by @nvpohanh in #1467
- perf: add 1x4x1 cluster shape for fp8 bmm M<16 cases by @ttyio in #1473
- feat: Enable multiple fused-moe backends by @amirkl94 in #1472
- Remove restrict extension to fix compilation error on GB200 by @842974287 in #1470
- feat: masked layout fp4 gemm using cute-dsl by @yzh119 in #1331
- fix: minor fix after #1384 by @yyihuang in #1476
- fix: remove redundant zero_init reverted by #1459 by @yyihuang in #1463
- Remove getEnvEnablePDL in favor of enable_pdl parameter by @yongwww in #1446
- Unify and modularize decode and prefill tests by @weireweire in #1375
- refactor: Improved metainfo for trtllm-gen kernels by @cyx-6 in #1328
- Tone down the amount of logging when downloading cubins by @joker-eph in #1477
- release: bump version to v0.2.11.post2 by @yyihuang in #1478
Full Changelog: v0.2.11.post1...v0.2.11.post2