## What's Changed
- flashinfer_benchmark QoL Improvements and Attention FP8 Support by @bkryu in #1512
- add cuda version check for jit by @cyx-6 in #1526
- bugfix: Fix compile error for undefined swizzle enum by @weireweire in #1530
- refactor: Sink attention AoT by @nandor in #1427
- test: Enable all modules in AOT build test by @yongwww in #1528
- Add GeGLU support to trtllm-gen NVFP4 Fused MoE Kernel by @stslxg-nv in #1525
- Add sm check for sm100 only cutlass/trtllm kernel by @ttyio in #1535
- bugfix: fix autotuner failure with low precision data types by @ttyio in #1539
- misc: Setting logging level from env var by @cyx-6 in #1538
- backend: Refactor trtllm-gen fmha metainfo loading by @cyx-6 in #1518
- feat: pass sm_count as param for fp4_masked_gemm by @yyihuang in #1529
- Revert "backend: Refactor trtllm-gen fmha metainfo loading (#1518)" by @yzh119 in #1543
- Fix typo in sampling.cuh: Remove duplicate parameter by @Appenhaimer in #1546
- perf: replace cudaGetDeviceProperties with cudaDeviceGetAttribute by @yongwww in #1547
- fix: trtllm_allreduce_fusion twoshot register problem by @strgrb in #1545
- feat: Integrate TRTLLM varlen kernel for deepseek R1 prefill by @elfiegg in #1537
- Add CONTRIBUTING.md by @sricketts in #1553
- release: bump version to v0.2.14 by @yongwww in #1554
- ci: add timeout for SPOT instance allocation by @yongwww in #1555
- fix: add packaging dependency to resolve pypi workflow by @yongwww in #1557
## New Contributors
- @stslxg-nv made their first contribution in #1525
- @Appenhaimer made their first contribution in #1546
**Full Changelog**: v0.2.13...v0.2.14