KTransformers v0.4.2


🚀 Core Highlights

  • Add Qwen3-MoE models and AMX SFT rules to kt-sft, including new attention modules and operators, enabling Qwen3-MoE fine-tuning via LLaMA-Factory.
  • Restructure the repo around kt-kernel and kt-sft.
  • Unify the MoE backend under KTMoEWrapper, with expert deferral for CPU-side MoE inference.
  • Support Docker-based installation and deployment via pre-built images, greatly reducing environment mismatch issues and ensuring reproducible performance. Images: https://hub.docker.com/repository/docker/approachingai/sglang-kt/general

📌 Models, Hardware & Tooling

  • Extend Kimi-K2 to Kimi-K2-Thinking with FP8→BF16 conversion scripts, updated weights links, and dedicated inference + LoRA-SFT guides (a rough illustrative sketch of the conversion idea follows this list).
  • Make kt-kernel/kt-sft easier to build and run across CPUs/GPUs.
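
The FP8→BF16 conversion mentioned above ships as dedicated scripts in the repo; the snippet below is only a minimal sketch of the general idea, assuming a safetensors shard where an FP8 weight may carry an optional per-tensor scale under a hypothetical `_scale_inv` key suffix. Real Kimi-K2 / Kimi-K2-Thinking checkpoints may use blockwise scales and different key names, which the official scripts handle.

```python
# Illustrative FP8 -> BF16 conversion for a single safetensors shard.
# NOTE: a hedged sketch, NOT the repository's conversion script.
# The "_scale_inv" suffix and per-tensor scaling are assumptions.
import torch
from safetensors.torch import load_file, save_file

def convert_shard_fp8_to_bf16(src_path: str, dst_path: str) -> None:
    tensors = load_file(src_path)           # load all tensors on CPU
    converted = {}
    for name, tensor in tensors.items():
        if name.endswith("_scale_inv"):
            continue                         # consumed with its paired weight
        if tensor.dtype == torch.float8_e4m3fn:
            weight = tensor.to(torch.float32)
            scale = tensors.get(name + "_scale_inv")
            if scale is not None and scale.numel() == 1:
                weight = weight * scale.to(torch.float32)  # assumed per-tensor scale
            converted[name] = weight.to(torch.bfloat16)
        else:
            converted[name] = tensor         # leave non-FP8 tensors untouched
    save_file(converted, dst_path)

# Usage (placeholder paths):
# convert_shard_fp8_to_bf16("fp8/model-shard.safetensors",
#                           "bf16/model-shard.safetensors")
```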

πŸ“ Docs & Community

  • Polish KTransformers-FT × LLaMA-Factory docs, add a hands-on KT-FT tutorial, and enhance kt-kernel's SGLang integration docs plus Kimi-K2/Kimi-K2-Thinking tutorials.
  • Reorganize web/docs structure and READMEs, and add contribution infrastructure.

🌟 Contributors

  • Thanks to all contributors who helped ship this release.
  • Special thanks to @poryfly for their outstanding contribution to Qwen3-MoE integration, including key kernel adaptations and engineering refinements that significantly improved the usability and stability of the MoE pipeline.

Full Changelog: v0.4.1...v0.4.2

CC: @SkqLiao @JimmyPeilinLi @ovowei @KMSorSMS @poryfly @ouqingliang @Azure-Tang @Atream @chenht2022 @qiyuxinlin @ErvinXie @james0zan
