Core Highlights
- Add Qwen3-MoE models and AMX SFT rules to kt-sft, including new attention operators, enabling Qwen3-MoE fine-tuning via LLaMA-Factory.
- Restructure the repo around kt-kernel and kt-sft.
- Unify the MoE backend under KTMoEWrapper, with expert deferral for CPU-side MoE inference.
- Ship pre-built Docker images for installation and deployment, greatly reducing environment mismatch issues and ensuring reproducible performance (a minimal pull-and-run sketch follows this list). See: https://hub.docker.com/repository/docker/approachingai/sglang-kt/general
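For orientation, here is a minimal sketch of pulling and starting one of the pre-built images via the Docker SDK for Python. The tag, model path, and mount point are placeholders rather than values from this release; plain `docker pull` / `docker run` commands work equally well.

```python
# Minimal sketch: pull a pre-built image and start a container with GPU access.
# The tag ("latest") and the host model path are assumptions -- check the Docker
# Hub page linked above for the actual tags published with this release.
import docker

client = docker.from_env()
client.images.pull("approachingai/sglang-kt", tag="latest")  # tag is assumed

container = client.containers.run(
    "approachingai/sglang-kt:latest",
    detach=True,
    # Expose all GPUs to the container (equivalent to `docker run --gpus all`).
    device_requests=[docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])],
    volumes={"/path/to/models": {"bind": "/models", "mode": "ro"}},  # placeholder path
    tty=True,
)
print("started container", container.short_id)
```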
Models, Hardware & Tooling
- Extend Kimi-K2 support to Kimi-K2-Thinking with FP8-to-BF16 conversion scripts, updated weight links, and dedicated inference and LoRA-SFT guides (a dequantization sketch follows this list).
- Make kt-kernel/kt-sft easier to build and run across CPUs/GPUs.
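For context on what the FP8-to-BF16 conversion does, below is a minimal sketch of block-wise dequantization, assuming DeepSeek-V3-style checkpoints (which Kimi-K2 follows), where each FP8 weight tensor carries a companion `weight_scale_inv` tensor with one scale per 128x128 block. The actual scripts in the repo may differ in naming and details.

```python
# Minimal sketch of block-wise FP8 -> BF16 weight dequantization.
# Assumes one scale per 128x128 weight block (DeepSeek-V3-style layout);
# the block size and tensor names are assumptions, not taken from this repo.
import torch

BLOCK = 128  # assumed quantization block size

def fp8_block_to_bf16(weight_fp8: torch.Tensor, scale_inv: torch.Tensor) -> torch.Tensor:
    """Dequantize a 2-D FP8 weight tensor to BF16 using per-block scales."""
    w = weight_fp8.to(torch.float32)
    out_dim, in_dim = w.shape
    # Expand each per-block scale to cover its 128x128 block, then crop to the
    # weight's exact shape (the last blocks may be partial).
    scales = scale_inv.repeat_interleave(BLOCK, dim=0).repeat_interleave(BLOCK, dim=1)
    scales = scales[:out_dim, :in_dim].to(torch.float32)
    return (w * scales).to(torch.bfloat16)
```

Applied to every FP8 weight and scale pair across the checkpoint shards, this kind of pass yields a BF16 checkpoint that can be loaded without FP8 kernel support.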
Docs & Community
- Polish the KTransformers-FT × LLaMA-Factory docs, add a hands-on KT-FT tutorial, and enhance kt-kernel's SGLang integration docs plus the Kimi-K2/Kimi-K2-Thinking tutorials.
- Reorganize web/docs structure and READMEs, and add contribution infrastructure.
Contributors
- Thanks to all contributors who helped ship this release.
- Special thanks to @poryfly for their outstanding contributions to the Qwen3-MoE integration, including key kernel adaptations and engineering refinements that significantly improved the overall usability and stability of the MoE pipeline.
Full Changelog: v0.4.1...v0.4.2
CC: @SkqLiao @JimmyPeilinLi @ovowei @KMSorSMS @poryfly @ouqingliang @Azure-Tang @Atream @chenht2022 @qiyuxinlin @ErvinXie @james0zan