KTransformers v0.4.4


🚀 Core Highlights

  • Add RL-DPO training support to kt-sft, enabling preference-based reinforcement-learning fine-tuning on top of KTransformers' MoE stack (a sketch of the DPO objective follows this list).

  • Improve large-scale MoE stability and efficiency:

    • Significantly reduce CPU memory usage during large-chunk prefill (a chunked-prefill sketch also follows this list).
    • Fix Kimi-K2 MoE decode bugs related to buffer management.
    • Refine NUMA-aware buffer writing and memory handling paths.
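
For context, direct preference optimization (DPO) trains the policy to prefer the chosen response over the rejected one, measured against a frozen reference model. Below is a minimal PyTorch sketch of that objective; the function name and the per-response log-probability tensors are illustrative assumptions, not kt-sft's actual API.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO objective (illustrative; not kt-sft's actual code)."""
    # Implicit rewards: log-ratio of policy vs. frozen reference, per response.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # -log sigmoid of the reward margin; minimized when the policy assigns
    # relatively higher likelihood to the chosen (preferred) response.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```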
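
Large-chunk prefill feeds a long prompt through the model in bounded slices, so peak activation memory scales with the chunk size rather than the full prompt length. A conceptual sketch, assuming a Hugging Face-style model interface with past_key_values; this is not KTransformers' internal code:

```python
import torch

def chunked_prefill(model, input_ids: torch.Tensor, chunk_size: int = 4096):
    """Prefill a long prompt chunk by chunk, keeping only the KV cache."""
    past_key_values = None
    for start in range(0, input_ids.shape[1], chunk_size):
        chunk = input_ids[:, start:start + chunk_size]
        with torch.no_grad():
            out = model(chunk, past_key_values=past_key_values, use_cache=True)
        # Only the KV cache persists across chunks; per-chunk activations
        # are freed, which bounds peak memory by chunk_size.
        past_key_values = out.past_key_values
    return past_key_values
```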

📌 Models, Hardware & Tooling

  • Model support updates

    • Add GLM-4.6V support via refactored CPU weight conversion utilities.
    • Extend and stabilize Qwen3 / Qwen3-MoE support on NPU (Ascend), including attention, LayerNorm, MLP, cache, and expert operators.
  • Deployment & installation

    • Add Docker-based deployment support and automatic deployment workflows.
    • Improve CPU instruction-set handling (e.g., automatic BLIS detection on AMD CPUs; a vendor-detection sketch follows this list).
    • Polish PyPI release workflows and installation instructions for smoother setup.
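
For illustration, BLAS-backend selection of this kind can key off the CPU vendor string. The sketch below reads /proc/cpuinfo on Linux and prefers AMD's BLIS library on AMD parts; it is an assumption about the approach, not the project's actual detection logic.

```python
def cpu_vendor() -> str:
    """Return the CPU vendor string from /proc/cpuinfo (Linux only)."""
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("vendor_id"):
                return line.split(":", 1)[1].strip()
    return "unknown"

# AMD CPUs report "AuthenticAMD"; prefer the BLIS-optimized BLAS there.
blas_backend = "blis" if cpu_vendor() == "AuthenticAMD" else "openblas"
print(f"Selected BLAS backend: {blas_backend}")
```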

๐Ÿ“ Docs & Community

  • Update and polish Kimi-K2 / Kimi-K2-Thinking documentation, including installation steps, prefill strategy, and performance metrics.
  • Add and refine NPU benchmarks, prerequisites, and Qwen3-NPU guides.
  • Fix README assets, image links, path issues, and reorganize documentation structure.

🌟 Contributors

  • Thanks to all contributors who helped ship this release.
  • Special thanks to @mrhaoxx and @poryfly for enabling RL-DPO support, and to all community members for kernel fixes, model adaptations, documentation, and tooling improvements.

Full Changelog: v0.4.3...v0.4.4

CC: @JimmyPeilinLi @mrhaoxx @ovowei @SkqLiao @KMSorSMS @poryfly @ouqingliang @Azure-Tang @Atream @chenht2022 @qiyuxinlin @ErvinXie @james0zan
