## What's Changed
- flashinfer_benchmark QoL Improvements and Attention FP8 Support by @bkryu in #1512
- add cuda version check for jit by @cyx-6 in #1526
- bugfix: Fix compile error for undefined swizzle enum by @weireweire in #1530
- refactor: Sink attention AoT by @nandor in #1427
- test: Enable all modules in AOT build test by @yongwww in #1528
- Add GeGLU support to trtllm-gen NVFP4 Fused MoE Kernel by @stslxg-nv in #1525
- Add sm check for sm100 only cutlass/trtllm kernel by @ttyio in #1535
- bugfix: fix autotuner failure with low precision data types by @ttyio in #1539
- misc: Setting logging level from env var by @cyx-6 in #1538
- backend: Refactor trtllm-gen fmha metainfo loading by @cyx-6 in #1518
- feat: pass sm_count as param for fp4_masked_gemm by @yyihuang in #1529
- Revert "backend: Refactor trtllm-gen fmha metainfo loading (#1518)" by @yzh119 in #1543
- Fix typo in sampling.cuh: Remove duplicate parameter by @Appenhaimer in #1546
- perf: replace cudaGetDeviceProperties with cudaDeviceGetAttribute by @yongwww in #1547
- fix: trtllm_allreduce_fusion twoshot register problem by @strgrb in #1545
- feat: Integrate TRTLLM varlen kernel for deepseek R1 prefill by @elfiegg in #1537
- Add CONTRIBUTING.md by @sricketts in #1553
- release: bump version to v0.2.14 by @yongwww in #1554
- ci: add timeout for SPOT instance allocation by @yongwww in #1555
- fix: add packaging dependency to resolve pypi workflow by @yongwww in #1557
## New Contributors
- @stslxg-nv made their first contribution in #1525
- @Appenhaimer made their first contribution in #1546
**Full Changelog**: v0.2.13...v0.2.14