ktransformers v0.6.2 Release Notes
🚀 Core Highlights
- Native DeepSeek-V4-Flash support via the kt-kernel MXFP4 MoE operator, consuming the model's native E2M1 + ue8m0 weights without offline conversion (see the format sketch after this list).
- Hybrid CPU/GPU inference path through SGLang, validated end-to-end on 8× RTX 5090 (consumer Blackwell, SM_120).
- New AVX2 / AVX-VNNI RAWINT4 MoE backend, extending kt-kernel coverage to consumer CPUs without AVX-512 / AMX.
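For context on the MXFP4 path above: kt-kernel's operator is a fused native kernel, but the number format itself is the OCP Microscaling MXFP4 layout, i.e. 4-bit E2M1 code points sharing one ue8m0 (power-of-two) scale per block. The sketch below is illustrative only; the function name and the 32-element block layout come from the MX spec and are not kt-kernel's API:

```python
# Minimal numpy sketch of MXFP4 (E2M1 + ue8m0) dequantization.
import numpy as np

# All 16 E2M1 code points: 1 sign bit, 2 exponent bits, 1 mantissa bit.
E2M1_LUT = np.array(
    [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,           # codes 0..7 (positive)
     -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0],  # codes 8..15 (negative)
    dtype=np.float32,
)

def dequant_mxfp4(codes: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """codes: uint8 E2M1 code points, shape (n_blocks, 32);
    scales: uint8 ue8m0 per-block exponents, shape (n_blocks,)."""
    block_scale = np.exp2(scales.astype(np.float32) - 127.0)  # ue8m0: 2**(e - 127)
    return E2M1_LUT[codes] * block_scale[:, None]
```

Because both the code points and the block scales are read as-is, V4-Flash's native weights can be consumed in place, which is what removes the offline conversion step.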
📌 Models, Hardware & Tooling
- Add DeepSeek-V4-Flash model entry, loader, and numerical validation script.
- Bump SGLang submodule to bring in V4-Flash support, SM_120 Triton fallbacks, and a flashinfer guard.
- Repoint the `sglang` extra to `post2` for compatibility.
📥 Installation
For most users:
```bash
pip install ktransformers==0.6.2
```
See doc/en/install.md for the general install guide.
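To confirm the install, a minimal check via standard-library package metadata:

```python
# Verify the installed ktransformers version.
from importlib.metadata import version

print(version("ktransformers"))  # expect 0.6.2
```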
From source (recommended for running V4-Flash on SM_120; pre-built wheels do not ship the Blackwell consumer-GPU fallbacks needed for V4-Flash):
```bash
git clone https://github.com/kvcache-ai/ktransformers.git
cd ktransformers
git submodule update --init --recursive
cd kt-kernel && ./install.sh
cd .. && ./install.sh  # builds the kvcache-ai SGLang fork
pip install --upgrade flashinfer-python flashinfer-cubin  # >= 0.6.9 required by V4-Flash MXFP4 MoE
```
Requires CUDA 12.8+.
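To verify the build environment meets these requirements, a quick check using only standard tooling (`torch.version.cuda` reflects the CUDA build of PyTorch, and the distribution name matches the pip command above):

```python
# Sanity-check the from-source prerequisites.
from importlib.metadata import version

import torch

print(torch.version.cuda)                   # expect 12.8 or newer
print(torch.cuda.get_device_capability(0))  # (12, 0) on consumer Blackwell (RTX 5090)
print(version("flashinfer-python"))         # expect >= 0.6.9 for V4-Flash MXFP4 MoE
```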
Additional notes for running DeepSeek-V4-Flash: if you encounter any errors, please check the DeepSeek-V4-Flash tutorial first; it is updated over time to record common errors.
📝 Docs & Community
- Add DeepSeek-V4-Flash tutorial: hardware matrix, full launch command for 8× RTX 5090, OpenAI-compatible API examples, and `kt chat` CLI usage (a minimal client sketch follows this list).
- Refresh README entry points and add KT SFT Quick Start.
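For a concrete OpenAI-compatible call, a minimal client sketch follows; the port and served model name are assumptions here, so take the real values from the tutorial's launch command:

```python
# Minimal chat call against a locally served OpenAI-compatible endpoint.
# base_url and model are placeholders; use the values from your launch command.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="DeepSeek-V4-Flash",  # hypothetical served-model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```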
🌟 Contributors
Thanks to all contributors who helped ship this release.
Full Changelog: v0.6.1...v0.6.2
CC: @JimmyPeilinLi @ouqingliang @ovowei @yyj6666667 @aliez-ren @jdai0