What's Changed
- Add Claude Code GitHub Workflow by @yzh119 in #2296
- Add device version checks by @Anerudhan in #2307
- Fix: FilteredTopKUnifiedKernel out-of-bounds read by @HarryWu99 in #2308
- refactor: decorate all operators with @flashinfer_api by @bkryu in #2311
- feat: BF16 GEMM using CUTLASS backend for SM100 by @raayandhar in #2070
- Remove unused argument by @fzyzcjy in #2335
- Restructure README with updated features, GPU support table, and clearer organization by @sricketts in #2298
- fix: guard batchWarpReduceSum with ENABLE_FP8 to fix compilation without FP8 by @yzh119 in #2328
- Support both 3D and 4D kv_cache shapes in MLA APIs by @yzh119 in #2334
- fix: explicitly set device to CPU for RNG state tensor by @cyx-6 in #2344
- bugfix: fix multi-cta top-k implementation when the k value differs across rows by @yzh119 in #2325
- [ML3] Optimized Router Gemm by @dbari in #2323
- Selective State Update kernel (mamba) by @ishovkun in #2301
- Add deprecation and removal policy by @sricketts in #2349
- feat: Input/output Dump + Replay Mode for API Logging Level 10 by @bkryu in #2206
- bugfix: fix Ninja race condition by @bkryu in #2339
- chore: bump version to v0.6.1 and exclude buggy apache-tvm-ffi releases by @yzh119 in #2347

New Contributors
- @HarryWu99 made their first contribution in #2308
- @ishovkun made their first contribution in #2301

Full Changelog: v0.6.0...v0.6.1