## Core Highlights: KT Fine-Tuning & Ecosystem
- Launch the native KT SFT (Supervised Fine-Tuning) workflow, supporting ultra-large-model fine-tuning (e.g., a 671B MoE) on 2–4 RTX 4090s
- Achieve seamless integration with LLaMA-Factory: enable the KT backend via a LLaMA-Factory YAML config, unlocking 671B-model tuning for LLaMA-Factory users
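As a sketch, a LLaMA-Factory SFT config with the KT backend switched on might look like the fragment below. The `use_kt` key is a hypothetical placeholder, not a confirmed option name (consult the integration guide for the real flags); the remaining keys are standard LLaMA-Factory training options shown for context:

```yaml
# Hypothetical sketch of a LLaMA-Factory SFT config using the KT backend.
# `use_kt` is a placeholder for whatever flag the integration guide specifies.
model_name_or_path: deepseek-ai/DeepSeek-V3   # example ultra-large MoE checkpoint
stage: sft
do_train: true
finetuning_type: lora
dataset: alpaca_en_demo
template: deepseek
output_dir: saves/deepseek-kt-sft
per_device_train_batch_size: 1
learning_rate: 1.0e-4
num_train_epochs: 1.0
use_kt: true   # placeholder: route training through the KT backend
```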
## Model & Hardware Expansion
- Add support for Kimi-K2 (including 0905 variant), SmallThinker, GLM4-MoE, Qwen3Next
- Introduce initial Ascend NPU support (with DeepSeek-R1 NPU tutorial)
- Revert FP16 usage on XPU to fix stability issues
## Kernel & Tooling Improvements
- Release KT-Kernel (the core compute engine), with an Expert Deferral mechanism for MoE layers
- Fix KT-Kernel bugs and optimize execution stability
- Update balance_serve.py to enhance multi-concurrency inference load balancing
- Resolve CMake build issues (env settings + AMX support)
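The Expert Deferral mechanism is only named above, not specified. As a toy illustration of the general idea (not KT-Kernel's actual implementation), a router could split each token's top-k experts into an "immediate" set and a lower-priority "deferred" set by routing score, so the deferred experts can be computed on slower hardware or overlapped with other work:

```python
# Toy sketch of "expert deferral" for MoE routing (illustrative only;
# function name and behavior are assumptions, not KT-Kernel's API).
import numpy as np

def route_with_deferral(router_logits, k=4, defer=2):
    """Pick top-k experts per token; mark the `defer` lowest-scoring of
    those k as deferrable. Returns (immediate_ids, deferred_ids)."""
    # Indices of the k largest logits per token (unordered partition first).
    topk = np.argpartition(router_logits, -k, axis=-1)[..., -k:]
    scores = np.take_along_axis(router_logits, topk, axis=-1)
    order = np.argsort(-scores, axis=-1)                 # descending by score
    ranked = np.take_along_axis(topk, order, axis=-1)    # best expert first
    return ranked[..., : k - defer], ranked[..., k - defer :]

logits = np.array([[0.1, 2.0, 0.5, 1.5, 0.2, 1.0]])  # 1 token, 6 experts
immediate, deferred = route_with_deferral(logits, k=4, defer=2)
print(immediate.tolist(), deferred.tolist())  # highest-scoring experts run now
```

The split keeps the highest-weight experts on the fast path while lower-weight ones tolerate extra latency, which is one plausible way to hide slow-device compute in a heterogeneous setup.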
## Documentation Updates
- Add KT SFT & LLaMA-Factory integration guides (with YAML examples)
- Refine Kimi-K2 docs (GGUF links, 0905 support dates) and add SmallThinker/GLM4-MoE tutorials
- Add SGLang integration docs and a citation section to the README
## Contributors
- Welcome to the first-time contributors, who delivered:
  - Ascend NPU adaptation + tutorial
  - DeepSeek-R1 NPU guide
  - KT SFT feature development

**Full Changelog**: v0.3.2...v0.4.1
- CC: @JimmyPeilinLi @yangqianrui @ovowei @KMSorSMS @Azure-Tang @Atream @chenht2022 @qiyuxinlin @ErvinXie @james0zan