flashinfer-ai/flashinfer — Release v0.6.1


What's Changed

  • Add Claude Code GitHub Workflow by @yzh119 in #2296
  • Add device version checks by @Anerudhan in #2307
  • Fix: FilteredTopKUnifiedKernel read value out of length by @HarryWu99 in #2308
  • refactor: decorate all operators with @flashinfer_api by @bkryu in #2311
  • feat: BF16 GEMM using CUTLASS backend for SM100 by @raayandhar in #2070
  • Super tiny remove unused argument by @fzyzcjy in #2335
  • Restructure README with updated features, GPU support table, and clearer organization by @sricketts in #2298
  • fix: guard batchWarpReduceSum with ENABLE_FP8 to fix compilation without FP8 by @yzh119 in #2328
  • Support both 3D and 4D kv_cache shapes in MLA APIs by @yzh119 in #2334
  • fix: explicitly set device to CPU for RNG state tensor by @cyx-6 in #2344
  • bugfix: fix multi-CTA top-k implementation when the k value differs across rows by @yzh119 in #2325
  • [ML3] Optimized Router Gemm by @dbari in #2323
  • Selective State Update kernel (mamba) by @ishovkun in #2301
  • Add deprecation and removal policy by @sricketts in #2349
  • feat: Input/output Dump + Replay Mode for API Logging Level 10 by @bkryu in #2206
  • bugfix: Ninja race condition fix by @bkryu in #2339
  • chore: bump version to v0.6.1 and exclude buggy apache-tvm-ffi releases by @yzh119 in #2347
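Among the changes above, #2334 adds support for both 3D and 4D kv_cache shapes in the MLA APIs. The sketch below illustrates the general idea of normalizing a 3D layout to a canonical 4D one; the helper name and the exact dimension order are assumptions for illustration, not flashinfer's actual API.

```python
def normalize_kv_cache_shape(shape):
    """Normalize a kv_cache shape to a 4D form
    (num_pages, page_size, num_heads, head_dim).

    A 3D shape (num_pages, page_size, head_dim) is treated as having a
    single KV head, as in MLA's compressed KV layout. Hypothetical
    helper for illustration only, not flashinfer's implementation.
    """
    if len(shape) == 4:
        return tuple(shape)
    if len(shape) == 3:
        num_pages, page_size, head_dim = shape
        return (num_pages, page_size, 1, head_dim)
    raise ValueError(f"expected a 3D or 4D kv_cache shape, got {len(shape)}D")
```

With this kind of normalization, downstream kernels only ever see one layout, so callers can pass whichever shape is most convenient.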
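The top-k bugfix in #2325 concerns the case where each row of a batch requests a different k. A pure-Python reference for that semantics (not the multi-CTA CUDA kernel itself) looks like:

```python
def topk_per_row(rows, ks):
    """Return the k largest values of each row, where k may differ per
    row. Pure-Python reference sketch of the per-row top-k semantics;
    the fixed kernel computes this in parallel across CTAs.
    """
    if len(rows) != len(ks):
        raise ValueError("need exactly one k per row")
    out = []
    for row, k in zip(rows, ks):
        if not 0 <= k <= len(row):
            raise ValueError(f"k={k} out of range for row of length {len(row)}")
        out.append(sorted(row, reverse=True)[:k])
    return out
```

The subtlety the fix addresses is precisely that k is per-row state: a kernel that broadcasts a single k across the batch returns wrong results whenever the rows disagree.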

New Contributors

Full Changelog: v0.6.0...v0.6.1
