KTransformers v0.6.1 is a full refactor and major upgrade of the existing KT fine-tuning path for large-MoE LoRA SFT. This release rebuilds the KT SFT backend around `kt-kernel`, packages the SFT stack behind the `ktransformers[sft]` extra, and keeps the LLaMA-Factory training entry and YAML workflow familiar for users.
In measured benchmark settings, KT SFT achieves 6-12x the training throughput of the ZeRO-Offload baseline. Separately, CPU memory usage drops to roughly half that of the previous KT SFT path, with lower GPU memory pressure in the same benchmark context. These results are tied to the benchmark setup and should be read together with the model, hardware, context length, LoRA config, and baseline details.
🚀 Core Highlights
- Refactored large-MoE SFT backend: rebuilt the KT SFT path around AMX MoE kernels with LoRA support, including SFT-specific AMX kernels, LoRA fused add tests, repacking tests, Python SFT wrappers, autograd integration, layer definitions, and weight helpers (#1936).
- Pip-installable SFT stack: `ktransformers[sft]` now installs the KT SFT stack through regular Python packages: `ktransformers`, `kt-kernel`, `transformers-kt`, and `accelerate-kt`.
- LLaMA-Factory workflow integration: KT SFT is designed to be used after installing LLaMA-Factory, while preserving the standard `accelerate launch` / `src/train.py` / YAML workflow. Companion LLaMA-Factory PR: hiyouga/LlamaFactory#10430.
- Cleaner package boundary: the old `kt-sft` package has been archived; SFT now uses `ktransformers[sft]`, while KT inference uses `kt-kernel` + `sglang-kt` (#1954, #1955).
- Current torch baseline: package metadata and the validated public install path are aligned on `torch==2.9.1`, with `torchaudio==2.9.1` and `torchvision==0.24.1` for the current full-stack setup.
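To make the preserved YAML workflow concrete, here is a minimal LoRA SFT config in LLaMA-Factory's usual style. This is an illustrative sketch, not a shipped example: the model path, dataset name, template, and output directory are placeholders, and the exact keys should be checked against the KT examples in your LLaMA-Factory checkout.

```yaml
# Illustrative LLaMA-Factory-style LoRA SFT config (placeholder values;
# consult the KT examples bundled with LLaMA-Factory for the real files).
model_name_or_path: /path/to/your-moe-model   # placeholder model path
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
dataset: identity                              # placeholder dataset name
template: default                              # placeholder chat template
output_dir: saves/kt-lora-sft                  # placeholder output dir
per_device_train_batch_size: 1
learning_rate: 1.0e-4
num_train_epochs: 1.0
```

A config like this is launched through the standard entry the release keeps intact, e.g. `accelerate launch src/train.py <config>.yaml`.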
📌 Models, Hardware & Tooling
SFT and Packaging
- Prepare v0.6.1 SFT wheel packaging on `main`, including `ktransformers[sft]`, `kt-kernel` packaging updates, release workflow updates, and a lightweight top-level package shim (#1945).
- Align `kt-kernel` and related package metadata with the current release dependency baseline (#1948).
- Flatten the `ktransformers` package shim for cleaner wheel packaging (#1955).
- Archive the legacy `kt-sft` package under `archive/kt-sft/` so the active package layout is easier to reason about (#1954).
Kernel and Hardware Improvements
- Add AVX512F+BW fallback support for FP8 and BF16 under the AMX backend, improving CPU fallback coverage where AMX-specific paths are unavailable (#1908).
- Add VNNI-256 support for GPTQ INT4 MoE, including a new `gptq_int4_avxvnni` path and per-commit accuracy test coverage (#1926).
- Add SFT-specific AMX MoE kernels, tensor-parallel MoE SFT helpers, LoRA kernel tests, repacking tests, and Python SFT module wrappers (#1936).
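Which of these kernel tiers applies depends on the host CPU's feature flags. The helper below is a hypothetical sketch (not part of `kt-kernel`) that reads `/proc/cpuinfo` on Linux and maps flags to the tiers described above; the tier names and flag choices are illustrative assumptions.

```python
# Hypothetical helper (NOT part of kt-kernel): inspect Linux CPU feature
# flags to estimate which MoE kernel tier the hardware can use.
from pathlib import Path


def cpu_flags() -> set[str]:
    """Return CPU feature flags from /proc/cpuinfo (empty set off-Linux)."""
    cpuinfo = Path("/proc/cpuinfo")
    if not cpuinfo.exists():
        return set()
    for line in cpuinfo.read_text().splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def kernel_tier(flags: set[str]) -> str:
    """Map feature flags to the tiers named in this release (illustrative)."""
    if "amx_tile" in flags:
        return "amx"                 # native AMX path
    if {"avx512f", "avx512bw"} <= flags:
        return "avx512-fallback"     # FP8/BF16 fallback under the AMX backend
    if "avx_vnni" in flags or "avxvnni" in flags:
        return "vnni-256"            # GPTQ INT4 VNNI-256 path
    return "unsupported"


print(kernel_tier(cpu_flags()))
```

The actual dispatch logic inside `kt-kernel` may differ; this only shows the kind of capability check the fallback coverage in #1908 and #1926 is about.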
Inference and Runtime Integration
- Sync the bundled `sglang` submodule for KT layerwise prefill updates and later packaging fixes (#1920).
- Update the SGLang-KT release workflow to use a hosted runner, avoiding release blocking on unavailable self-hosted runners.
Model Enablement
- Add GLM-5.1 tutorial and prerequisite notes for the kt-kernel path (#1916, #1932).
- Refresh README and model documentation links for Kimi-K2.5, MiniMax-M2.5, Qwen3.5, DeepSeek-V3.2, and SFT docs.
📦 Installation
KT SFT with LLaMA-Factory
Use a LLaMA-Factory checkout that contains the KT examples and `requirements/ktransformers.txt`.

```bash
cd /path/to/LLaMA-Factory
pip install -e .
pip install -r requirements/ktransformers.txt
```

For direct package installation outside that requirements file:

```bash
pip install "ktransformers[sft]"
```

KT Inference
KT inference uses the SGLang-KT path:

```bash
pip install kt-kernel sglang-kt
```

Keep `kt-kernel` in the inference installation path. LLaMA-Factory SFT continues to use `ktransformers[sft]`.
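After installing either path, a quick sanity check can confirm the expected packages are importable. This is a hypothetical post-install snippet, not something shipped with KTransformers; in particular, the importable module names (e.g. `kt_kernel` for the `kt-kernel` distribution) are assumptions and may differ from the actual packages.

```python
# Hypothetical post-install sanity check (not shipped with KTransformers):
# report which packages from the SFT install path are importable.
# Module names below are assumptions; adjust to the real import names.
import importlib.util

SFT_PACKAGES = ["ktransformers", "kt_kernel", "transformers", "accelerate"]


def missing_packages(names: list[str]) -> list[str]:
    """Return the subset of `names` that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]


missing = missing_packages(SFT_PACKAGES)
if missing:
    print("missing:", ", ".join(missing))
else:
    print("all SFT packages importable")
```

Running this inside the training environment catches a broken install before a long SFT job is launched.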
📝 Docs & Community
- Refresh KT installation commands and package boundaries in README / README_ZH and SFT docs (#1958).
- Add GOSIM 2026 announcement and update roadmap link to Q2 (#1937).
- Add and update GLM-5.1 tutorial prerequisites and related docs (#1916, #1932).
- Remove a broken symlink in `archive/ktransformers/` (#1906).
🐛 Bug Fixes
- Fix Qwen3 series gibberish output by correcting RoPE write-back in the bundled SGLang integration (#1959).
- Fix kt-kernel CLI environment detection when NUMA node lists are empty (#1929).
- Revert the CPUInfer stream bridge for ROCm after compatibility concerns (#1918, #1925).
- Fix SGLang-KT packaging metadata and point the KTransformers SGLang extra to the corrected SGLang-KT release path (#1964).
🌟 Contributors
Thanks to all contributors who helped ship this release.
Full Changelog: v0.5.3...v0.6.1
CC: @JimmyPeilinLi @mrhaoxx @jdai0 @ouqingliang @ErvinXie @chenht2022 @KMSorSMS @ovowei @SkqLiao @yyj6666667 @james0zan