ktransformers v0.6.2 Release Notes
🚀 Core Highlights
- Native DeepSeek-V4-Flash support via the kt-kernel MXFP4 MoE operator, consuming the model's native E2M1 + ue8m0 weights without offline conversion (see the format sketch after this list).
- Hybrid CPU/GPU inference path through SGLang, validated end-to-end on 8× RTX 5090 (consumer Blackwell, SM_120).
- New AVX2 / AVX-VNNI RAWINT4 MoE backend, extending kt-kernel coverage to consumer CPUs without AVX-512 / AMX.
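For context on the MXFP4 path above: kt-kernel's operator is a fused native kernel, but the number format itself is the OCP Microscaling MXFP4 layout, i.e. 4-bit E2M1 code points sharing one ue8m0 (power-of-two) scale per block. The sketch below is illustrative only; the function name and the 32-element block layout come from the MX spec and are not kt-kernel's API:

```python
# Minimal numpy sketch of MXFP4 (E2M1 + ue8m0) dequantization.
import numpy as np

# All 16 E2M1 code points: 1 sign bit, 2 exponent bits, 1 mantissa bit.
E2M1_LUT = np.array(
    [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,           # codes 0..7 (positive)
     -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0],  # codes 8..15 (negative)
    dtype=np.float32,
)

def dequant_mxfp4(codes: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """codes: uint8 E2M1 code points, shape (n_blocks, 32);
    scales: uint8 ue8m0 per-block exponents, shape (n_blocks,)."""
    block_scale = np.exp2(scales.astype(np.float32) - 127.0)  # ue8m0: 2**(e - 127)
    return E2M1_LUT[codes] * block_scale[:, None]
```

Because both the code points and the block scales are read as-is, V4-Flash's native weights can be consumed in place, which is what removes the offline conversion step.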
📌 Models, Hardware & Tooling
- Add DeepSeek-V4-Flash model entry, loader, and numerical validation script.
- Bump SGLang submodule to bring in V4-Flash support, SM_120 Triton fallbacks, and a flashinfer guard.
- Repoint the `sglang` extra to `post2` for compatibility.
📥 Installation
For most users:
```bash
pip install ktransformers==0.6.2
```
See doc/en/install.md for the general install guide.
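To confirm the install, a minimal check via standard-library package metadata:

```python
# Verify the installed ktransformers version.
from importlib.metadata import version

print(version("ktransformers"))  # expect 0.6.2
```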
From source (recommended for running V4-Flash on SM_120; pre-built wheels do not ship the Blackwell consumer-GPU fallbacks needed for V4-Flash):
```bash
git clone https://github.com/kvcache-ai/ktransformers.git
cd ktransformers
git submodule update --init --recursive
cd kt-kernel && ./install.sh
cd .. && ./install.sh  # builds the kvcache-ai SGLang fork
pip install --upgrade flashinfer-python flashinfer-cubin  # >= 0.6.9 required by V4-Flash MXFP4 MoE
```
Requires CUDA 12.8+.
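To verify the build environment meets these requirements, a quick check using only standard tooling (`torch.version.cuda` reflects the CUDA build of PyTorch, and the distribution name matches the pip command above):

```python
# Sanity-check the from-source prerequisites.
from importlib.metadata import version

import torch

print(torch.version.cuda)                   # expect 12.8 or newer
print(torch.cuda.get_device_capability(0))  # (12, 0) on consumer Blackwell (RTX 5090)
print(version("flashinfer-python"))         # expect >= 0.6.9 for V4-Flash MXFP4 MoE
```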
Additional notes for running DeepSeek-V4-Flash: if you encounter any errors, please check the DeepSeek-V4-Flash tutorial first; it is updated over time to record common errors.
📝 Docs & Community
- Add DeepSeek-V4-Flash tutorial: hardware matrix, full launch command for 8× RTX 5090, OpenAI-compatible API examples, and `kt chat` CLI usage (a minimal client sketch follows this list).
- Refresh README entry points and add KT SFT Quick Start.
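For a concrete OpenAI-compatible call, a minimal client sketch follows; the port and served model name are assumptions here, so take the real values from the tutorial's launch command:

```python
# Minimal chat call against a locally served OpenAI-compatible endpoint.
# base_url and model are placeholders; use the values from your launch command.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="DeepSeek-V4-Flash",  # hypothetical served-model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```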
🌟 Contributors
Thanks to all contributors who helped ship this release.
Full Changelog: v0.6.1...v0.6.2
CC: @JimmyPeilinLi @ouqingliang @ovowei @yyj6666667 @aliez-ren @jdai0