flashinfer-ai/flashinfer v0.6.1 on GitHub

What's Changed

Add Claude Code GitHub Workflow by @yzh119 in #2296
Added the device version checks by @Anerudhan in #2307
Fix: FilteredTopKUnifiedKernel read value out of length by @HarryWu99 in #2308
refactor: decorate all operators with @flashinfer_api by @bkryu in #2311
feat: BF16 GEMM using CUTLASS backend for SM100 by @raayandhar in #2070
Super tiny remove unused argument by @fzyzcjy in #2335
Restructure README with updated features, GPU support table, and clearer organization by @sricketts in #2298
fix: guard batchWarpReduceSum with ENABLE_FP8 to fix compilation without FP8 by @yzh119 in #2328
Support both 3D and 4D kv_cache shapes in MLA APIs by @yzh119 in #2334
fix: explicitly set device to CPU for RNG state tensor by @cyx-6 in #2344
bugfix: fix multi-cta top-k implementation when k value is different for different rows by @yzh119 in #2325
[ML3] Optimized Router Gemm by @dbari in #2323
Selective State Update kernel (mamba) by @ishovkun in #2301
Add deprecation and removal policy by @sricketts in #2349
feat: Input/output Dump + Replay Mode for API Logging Level 10 by @bkryu in #2206
bugfix: Ninja race condition fix by @bkryu in #2339
chore: bump version to v0.6.1 and exclude buggy apache-tvm-ffi releases by @yzh119 in #2347

Full Changelog: v0.6.0...v0.6.1