Core Highlights
- Add Qwen3-MoE models and AMX SFT rules to kt-sft, including new attention operators, enabling Qwen3-MoE fine-tuning via LLaMA-Factory.
- Restructure the repo around kt-kernel and kt-sft.
- Unify the MoE backend under KTMoEWrapper, with expert deferral for CPU-side MoE inference.
- Ship pre-built Docker images for installation and deployment, greatly reducing environment mismatch issues and ensuring reproducible performance (a minimal pull-and-run sketch follows this list). See: https://hub.docker.com/repository/docker/approachingai/sglang-kt/general
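For orientation, here is a minimal sketch of pulling and starting one of the pre-built images via the Docker SDK for Python. The tag, model path, and mount point are placeholders rather than values from this release; plain `docker pull` / `docker run` commands work equally well.

```python
# Minimal sketch: pull a pre-built image and start a container with GPU access.
# The tag ("latest") and the host model path are assumptions -- check the Docker
# Hub page linked above for the actual tags published with this release.
import docker

client = docker.from_env()
client.images.pull("approachingai/sglang-kt", tag="latest")  # tag is assumed

container = client.containers.run(
    "approachingai/sglang-kt:latest",
    detach=True,
    # Expose all GPUs to the container (equivalent to `docker run --gpus all`).
    device_requests=[docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])],
    volumes={"/path/to/models": {"bind": "/models", "mode": "ro"}},  # placeholder path
    tty=True,
)
print("started container", container.short_id)
```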
Models, Hardware & Tooling
- Extend Kimi-K2 support to Kimi-K2-Thinking with FP8-to-BF16 conversion scripts, updated weight links, and dedicated inference and LoRA-SFT guides (a dequantization sketch follows this list).
- Make kt-kernel/kt-sft easier to build and run across CPUs/GPUs.
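For context on what the FP8-to-BF16 conversion does, below is a minimal sketch of block-wise dequantization, assuming DeepSeek-V3-style checkpoints (which Kimi-K2 follows), where each FP8 weight tensor carries a companion `weight_scale_inv` tensor with one scale per 128x128 block. The actual scripts in the repo may differ in naming and details.

```python
# Minimal sketch of block-wise FP8 -> BF16 weight dequantization.
# Assumes one scale per 128x128 weight block (DeepSeek-V3-style layout);
# the block size and tensor names are assumptions, not taken from this repo.
import torch

BLOCK = 128  # assumed quantization block size

def fp8_block_to_bf16(weight_fp8: torch.Tensor, scale_inv: torch.Tensor) -> torch.Tensor:
    """Dequantize a 2-D FP8 weight tensor to BF16 using per-block scales."""
    w = weight_fp8.to(torch.float32)
    out_dim, in_dim = w.shape
    # Expand each per-block scale to cover its 128x128 block, then crop to the
    # weight's exact shape (the last blocks may be partial).
    scales = scale_inv.repeat_interleave(BLOCK, dim=0).repeat_interleave(BLOCK, dim=1)
    scales = scales[:out_dim, :in_dim].to(torch.float32)
    return (w * scales).to(torch.bfloat16)
```

Applied to every FP8 weight and scale pair across the checkpoint shards, this kind of pass yields a BF16 checkpoint that can be loaded without FP8 kernel support.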
Docs & Community
- Polish the KTransformers-FT × LLaMA-Factory docs, add a hands-on KT-FT tutorial, and enhance kt-kernel's SGLang integration docs plus the Kimi-K2/Kimi-K2-Thinking tutorials.
- Reorganize web/docs structure and READMEs, and add contribution infrastructure.
Contributors
- Thanks to all contributors who helped ship this release.
- Special thanks to @poryfly for their outstanding contributions to the Qwen3-MoE integration, including key kernel adaptations and engineering refinements that significantly improved the overall usability and stability of the MoE pipeline.
Full Changelog: v0.4.1...v0.4.2
CC: @SkqLiao @JimmyPeilinLi @ovowei @KMSorSMS @poryfly @ouqingliang @Azure-Tang @Atream @chenht2022 @qiyuxinlin @ErvinXie @james0zan