github kvcache-ai/ktransformers v0.6.1
KTransformers v0.6.1


KTransformers v0.6.1 is a full refactor and major upgrade of the existing KT fine-tuning path for large-MoE LoRA SFT. This release rebuilds the KT SFT backend around kt-kernel, packages the SFT stack behind the ktransformers[sft] install extra, and keeps the LLaMA-Factory training entry point and YAML workflow familiar for existing users.

In the measured benchmark settings, KT SFT reaches 6-12x training throughput compared with the ZeRO-Offload baseline. Separately, CPU memory usage drops to roughly half that of the previous KT SFT path, with lower GPU memory pressure in the same benchmark context. These results are tied to the benchmark setup and should be read together with the model, hardware, context length, LoRA configuration, and baseline details.

🚀 Core Highlights

  • Refactored large-MoE SFT backend: rebuilt the KT SFT path around AMX MoE kernels with LoRA support, including SFT-specific AMX kernels, LoRA fused add tests, repacking tests, Python SFT wrappers, autograd integration, layer definitions, and weight helpers (#1936).
  • Pip-installable SFT stack: ktransformers[sft] now installs the KT SFT stack through regular Python packages: ktransformers, kt-kernel, transformers-kt, and accelerate-kt.
  • LLaMA-Factory workflow integration: KT SFT is designed to be used after installing LLaMA-Factory, while preserving the standard accelerate launch / src/train.py / YAML workflow. Companion LLaMA-Factory PR: hiyouga/LlamaFactory#10430.
  • Cleaner package boundary: the old kt-sft package has been archived; SFT now uses ktransformers[sft], while KT inference uses kt-kernel + sglang-kt (#1954, #1955).
  • Current torch baseline: package metadata and the validated public install path are aligned on torch==2.9.1, with torchaudio==2.9.1 and torchvision==0.24.1 for the current full-stack setup.
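Putting the highlights above together, a minimal end-to-end sketch of the pinned install plus the unchanged training launch (the YAML path below is a hypothetical placeholder, not a file shipped with the release):

```shell
# Pin the validated torch baseline for the current full-stack setup
pip install torch==2.9.1 torchaudio==2.9.1 torchvision==0.24.1

# Install the KT SFT stack through the packaged extra
pip install "ktransformers[sft]"

# Launch training via the familiar accelerate / src/train.py / YAML workflow
# (replace the YAML path with your own LLaMA-Factory config)
accelerate launch src/train.py path/to/your_sft_config.yaml
```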

📌 Models, Hardware & Tooling

SFT and Packaging

  • Prepare v0.6.1 SFT wheel packaging on main, including ktransformers[sft], kt-kernel packaging updates, release workflow updates, and a lightweight top-level package shim (#1945).
  • Align kt-kernel and related package metadata with the current release dependency baseline (#1948).
  • Flatten the ktransformers package shim for cleaner wheel packaging (#1955).
  • Archive the legacy kt-sft package under archive/kt-sft/ so the active package layout is easier to reason about (#1954).

Kernel and Hardware Improvements

  • Add AVX512F+BW fallback support for FP8 and BF16 under the AMX backend, improving CPU fallback coverage where AMX-specific paths are unavailable (#1908).
  • Add VNNI-256 support for GPTQ INT4 MoE, including a new gptq_int4_avxvnni path and per-commit accuracy test coverage (#1926).
  • Add SFT-specific AMX MoE kernels, tensor-parallel MoE SFT helpers, LoRA kernel tests, repacking tests, and Python SFT module wrappers (#1936).

Inference and Runtime Integration

  • Sync the bundled sglang submodule for KT layerwise prefill updates and later packaging fixes (#1920).
  • Update SGLang-KT release workflow to use a hosted runner, avoiding release blocking on unavailable self-hosted runners.

Model Enablement

  • Add GLM-5.1 tutorial and prerequisite notes for the kt-kernel path (#1916, #1932).
  • Refresh README and model documentation links for Kimi-K2.5, MiniMax-M2.5, Qwen3.5, DeepSeek-V3.2, and SFT docs.

📦 Installation

KT SFT with LLaMA-Factory

Use a LLaMA-Factory checkout that contains the KT examples and requirements/ktransformers.txt.

```shell
cd /path/to/LLaMA-Factory
pip install -e .
pip install -r requirements/ktransformers.txt
```

For direct package installation outside that requirements file:

```shell
pip install "ktransformers[sft]"
```
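To confirm the extra pulled in the companion packages named earlier (ktransformers, kt-kernel, transformers-kt, accelerate-kt), pip can list their metadata; this only checks installation, not functionality:

```shell
# Prints name/version metadata for each package; exits non-zero if any is missing
pip show ktransformers kt-kernel transformers-kt accelerate-kt
```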

KT Inference

KT inference uses the SGLang-KT path:

```shell
pip install kt-kernel sglang-kt
```

kt-kernel remains part of the inference install path, while LLaMA-Factory SFT continues to use ktransformers[sft].
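As a quick sanity check of the inference install, the packages can be probed from the command line; the importable module name kt_kernel is an assumption inferred from the package name, not confirmed by the release notes:

```shell
# Confirm the inference packages resolved
pip show kt-kernel sglang-kt

# Probe the kernel module (module name kt_kernel assumed from the package name)
python -c "import kt_kernel; print('kt-kernel import OK')"
```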

📝 Docs & Community

  • Refresh KT installation commands and package boundaries in README / README_ZH and SFT docs (#1958).
  • Add GOSIM 2026 announcement and update roadmap link to Q2 (#1937).
  • Add and update GLM-5.1 tutorial prerequisites and related docs (#1916, #1932).
  • Remove a broken symlink in archive/ktransformers/ (#1906).

🐛 Bug Fixes

  • Fix Qwen3 series gibberish output by correcting RoPE write-back in the bundled SGLang integration (#1959).
  • Fix kt-kernel CLI environment detection when NUMA node lists are empty (#1929).
  • Revert the CPUInfer stream bridge for ROCm after compatibility concerns (#1918, #1925).
  • Fix SGLang-KT packaging metadata and point the KTransformers SGLang extra to the corrected SGLang-KT release path (#1964).

🌟 Contributors

Thanks to all contributors who helped ship this release.

Full Changelog: v0.5.3...v0.6.1

CC: @JimmyPeilinLi @mrhaoxx @jdai0 @ouqingliang @ErvinXie @chenht2022 @KMSorSMS @ovowei @SkqLiao @yyj6666667 @james0zan
