What's Changed
- ci: Update cudnn version requirements in CI container by @bkryu in #2039
- test: Mark test_fp8_prefill.py as xfail on SM90 by @bkryu in #2038
- Update Docker CI tags to 20251104-d528f0c by @flashinfer-bot in #2041
- bugfix: fix failed unittest test_green_ctx and test_jit_example on spark (sm_121) by @yzh119 in #1951
- perf: Speed up fp4 quantization for small batch with swizzling for cutlass MoE by @bkryu in #2025
- Support cc common check decorator for empty backends by @jimmyzho in #2015
- use scalar for kv_scale in xqa by @qsang-nv in #2033
- fix: support both pip and uv pip for finding flashinfer-python package by @djmmoss in #2043
- test: Fix test_sampling.py on Spark by @bkryu in #2042
- Fix dtype of output scales from mnnvl_moe_alltoallv_prepare_without_allgather by @trevor-m in #2048
- Update trtllm-gen fused moe routing kernel and add more kernels by @jiahanc in #1955
- chore: Update CODEOWNERS by @flashinfer-bot in #1984
- Add support for topkPacked input in block-level renormalize by @ChristinaZ in #2051
- test: Skip test_fp8_quantize.py on Hopper by @bkryu in #2052
- [BUG] Fix trtllm-gen fp4 moe renormalize routing by @IwakuraRein in #2049
- release: Bump version for v0.5.2 release by @bkryu in #2057
Full Changelog: v0.5.1...v0.5.2