## Core Highlights
- Add RL-DPO training support to kt-sft, enabling preference-based reinforcement learning fine-tuning on top of KTransformers' MoE stack (a generic sketch of the objective follows below).
  - Includes critical PEFT adaptations and bug fixes for RL workflows.
  - Example configurations and end-to-end usage: https://github.com/kvcache-ai/ktransformers/blob/main/doc/en/DPO_tutorial.md
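
For orientation, here is a minimal, generic sketch of the DPO objective that preference-based fine-tuning optimizes. This is not the kt-sft implementation; the function and tensor names are illustrative assumptions, and the per-sequence log-probabilities are assumed to be precomputed for the chosen/rejected responses under both the policy and a frozen reference model.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Generic DPO objective (not kt-sft code): push the policy to prefer the
    chosen response over the rejected one, measured relative to a frozen
    reference model."""
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    # -log sigmoid(beta * (policy margin - reference margin)), averaged over the batch
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()
```

In practice the policy is the PEFT-adapted model being trained while the reference model stays frozen; beta controls how strongly the policy is pulled away from the reference.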
- Improve large-scale MoE stability and efficiency (see the prefill sketch below):
  - Significantly reduce CPU memory usage during large-chunk prefill.
  - Fix Kimi-K2 MoE decode bugs related to buffer management.
  - Refine NUMA-aware buffer writing and memory-handling paths.
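
To make the large-chunk prefill item concrete, below is a heavily hedged sketch of the general chunked-prefill pattern: the prompt is fed in fixed-size chunks so transient activation buffers scale with the chunk size rather than the full prompt length. The `model.prefill` call and `past_key_values` argument are hypothetical names, not the KTransformers API, and the actual memory reduction in this release comes from internal buffer-management changes.

```python
def chunked_prefill(model, token_ids, chunk_size=4096):
    """Illustrative only: process the prompt in fixed-size chunks so per-step
    activation buffers are bounded by chunk_size instead of the prompt length.
    `model.prefill` and `past_key_values` are hypothetical names."""
    past_key_values = None
    for start in range(0, len(token_ids), chunk_size):
        chunk = token_ids[start:start + chunk_size]
        # Each call extends the KV cache with this chunk and returns the updated cache.
        past_key_values = model.prefill(chunk, past_key_values=past_key_values)
    return past_key_values
```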
## Models, Hardware & Tooling
- Model support updates:
  - Add GLM-4.6V support via refactored CPU weight conversion utilities.
  - Extend and stabilize Qwen3 / Qwen3-MoE support on NPU (Ascend), including attention, LN, MLP, cache, and expert operators.
- Deployment & installation (see the backend-detection sketch below):
  - Add Docker-based deployment support and automatic deployment workflows.
  - Improve CPU instruction-set handling (e.g., automatic BLIS detection on AMD CPUs).
  - Polish PyPI release workflows and installation instructions for smoother setup.
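
As an illustration of vendor-aware BLAS backend selection (not the project's actual detection code), here is a minimal sketch that prefers BLIS when the host CPU reports an AMD vendor ID; the function name and fallback behavior are assumptions.

```python
def prefer_blis_backend() -> bool:
    """Illustrative heuristic only: pick the BLIS BLAS backend when the CPU
    vendor is AMD; real build tooling typically probes more CPU features."""
    try:
        with open("/proc/cpuinfo") as f:
            return "AuthenticAMD" in f.read()
    except OSError:
        # Non-Linux host or unreadable cpuinfo: fall back to the default backend.
        return False
```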
## Docs & Community
- Update and polish Kimi-K2 / Kimi-K2-Thinking documentation, including installation steps, prefill strategy, and performance metrics.
- Add and refine NPU benchmarks, prerequisites, and Qwen3-NPU guides.
- Fix README assets, image links, and path issues, and reorganize the documentation structure.
## Contributors
- Thanks to all contributors who helped ship this release.
- Special thanks to @mrhaoxx and @poryfly for enabling RL-DPO support, and to all community members for kernel fixes, model adaptations, documentation, and tooling improvements.
Full Changelog: v0.4.3...v0.4.4
CC: @JimmyPeilinLi @mrhaoxx @ovowei @SkqLiao @KMSorSMS @poryfly @ouqingliang @Azure-Tang @Atream @chenht2022 @qiyuxinlin @ErvinXie @james0zan