flashinfer-ai/flashinfer v0.2.14


What's Changed

  • flashinfer_benchmark QoL Improvements and Attention FP8 Support by @bkryu in #1512
  • add cuda version check for jit by @cyx-6 in #1526
  • bugfix: Fix compile error for undefined swizzle enum. by @weireweire in #1530
  • refactor: Sink attention AoT by @nandor in #1427
  • test: Enable all modules in AOT build test by @yongwww in #1528
  • Add GeGLU support to trtllm-gen NVFP4 Fused MoE Kernel by @stslxg-nv in #1525
  • Add sm check for sm100 only cutlass/trtllm kernel by @ttyio in #1535
  • bugfix: fix autotuner failure with low precision data types by @ttyio in #1539
  • misc: Setting logging level from env var by @cyx-6 in #1538
  • backend: Refactor trtllm-gen fmha metainfo loading by @cyx-6 in #1518
  • feat: pass sm_count as param for fp4_masked_gemm by @yyihuang in #1529
  • Revert "backend: Refactor trtllm-gen fmha metainfo loading (#1518)" by @yzh119 in #1543
  • Fix typo in sampling.cuh: Remove duplicate parameter by @Appenhaimer in #1546
  • perf: replace cudaGetDeviceProperties with cudaDeviceGetAttribute by @yongwww in #1547
  • fix trtllm_allreduce_fusion twoshot register problem. by @strgrb in #1545
  • feat: Integrate TRTLLM varlen kernel for deepseek R1 prefill by @elfiegg in #1537
  • Add CONTRIBUTING.md by @sricketts in #1553
  • release: bump version to v0.2.14 by @yongwww in #1554
  • ci: add timeout for SPOT instance allocation by @yongwww in #1555
  • fix: add packaging dependency to resolve pypi workflow by @yongwww in #1557
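Among the changes above, #1538 makes the logging level configurable from an environment variable. A minimal sketch of that pattern in Python, assuming a standard `logging`-based setup (the variable name `FLASHINFER_LOGLEVEL` here is hypothetical; the name flashinfer actually reads may differ):

```python
import logging
import os

# Read the desired level from the environment; "FLASHINFER_LOGLEVEL" is a
# hypothetical variable name used only for illustration.
level_name = os.environ.get("FLASHINFER_LOGLEVEL", "INFO").upper()

logger = logging.getLogger("flashinfer")
# Fall back to INFO if the env var holds an unrecognized level name.
logger.setLevel(getattr(logging, level_name, logging.INFO))
```

Running with `FLASHINFER_LOGLEVEL=DEBUG` would then enable debug output without touching the code.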

Full Changelog: v0.2.13...v0.2.14
