github kvcache-ai/ktransformers v0.5.2.post1
KTransformers v0.5.2

7 hours ago

🚀 Core Highlights

  • Simplified Installation with sglang Submodule: The kvcache-ai/sglang fork is now vendored as a git submodule and published to PyPI as sglang-kt. Installation drops from a multi-step manual process to a single command: ./install.sh for a source install, or pip install ktransformers (which auto-installs sglang-kt). CI now syncs the sglang submodule daily and publishes to PyPI automatically on each version bump.
  • New Model Support (Qwen3.5, GLM-5, MiniMax-M2.5, Qwen3-Coder-Next): Day-0 support for four new MoE models spanning a wide range of hardware requirements, from Qwen3-Coder-Next (80B-A3B, 1x RTX 4090) to Qwen3.5 (400B MoE, 4x RTX 4090). All four models support BF16 and FP8 precision backends with CPU-GPU heterogeneous inference.
  • Kimi-K2.5 Support & Mistral MoE Compatibility: Added Kimi-K2.5 deployment guides, including SFT fine-tuning integration; fallback expert prefix lookup for more robust weight loading; and Mistral MoE loader compatibility for broader model coverage.
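
The "fallback expert prefix lookup" mentioned above is not spelled out in these notes. As a rough illustration of the general idea only (not the KTransformers implementation), the sketch below tries a list of candidate key prefixes in order and returns the first one that matches any weight name in a checkpoint. The function name find_expert_prefix and the prefix templates are hypothetical; real checkpoints may name their expert weights differently.

```python
from typing import Iterable, Optional, Sequence

# Hypothetical prefix templates, tried in order. These are assumptions for
# illustration, not names taken from KTransformers or any specific checkpoint.
CANDIDATE_PREFIXES = (
    "model.layers.{layer}.mlp.experts",
    "language_model.model.layers.{layer}.mlp.experts",
)


def find_expert_prefix(
    weight_names: Iterable[str],
    layer: int,
    candidates: Sequence[str] = CANDIDATE_PREFIXES,
) -> Optional[str]:
    """Return the first candidate prefix that matches a weight name, else None."""
    names = list(weight_names)
    for template in candidates:
        prefix = template.format(layer=layer)
        # Require a trailing "." so the prefix ends on a key-component boundary.
        if any(name.startswith(prefix + ".") for name in names):
            return prefix
    return None
```

A loader built this way can accept checkpoints whose expert weights sit under a wrapper module (for example a multimodal model that nests the language model) without a separate code path per layout.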

📌 Models, Hardware & Tooling

  • Model support updates
    • Add Qwen3.5 (MoE-400B) with FP8 VL detection fix.
    • Add GLM-5 with BF16/FP8 precision support.
    • Add MiniMax-M2.5 with FP8 weight optimization.
    • Add Qwen3-Coder-Next (80B-A3B) for code generation.
    • Add Mistral MoE loader compatibility (#1873).
    • Add Kimi-K2.5 with fallback expert prefix lookup (#1822).
  • Kernel & hardware improvements
    • Fix k2-moe.hpp weight loading (#1830).
    • Fix wrapper import issue (#1819).
    • Improve CUDA code readability with explicit ele_per_blk variable (#1784).
  • Tooling & integration
    • Add top-level install.sh for one-click source installation (sglang + kt-kernel).
    • Publish sglang fork as sglang-kt on PyPI; kt-kernel auto-installs it as a dependency.
    • Add CI workflows: daily sglang submodule sync, automated sglang-kt PyPI publishing.
    • Align sglang-kt version with ktransformers (single version.py source of truth).
    • kt-cli enhancements (#1834).
    • Handle unquoted paths and special characters in model scanner (#1840).
    • Update Docker build for submodule-based sglang installation.
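
Since sglang-kt is now versioned together with ktransformers, one quick sanity check after installation is to compare the installed distribution versions. The sketch below uses the standard-library importlib.metadata; the distribution names are taken from these notes, and the helper names (installed_version, check_aligned) are hypothetical. It is a minimal check, assuming both packages report identical version strings; it is not part of the project's tooling.

```python
from importlib import metadata
from typing import Callable, Dict, Optional, Sequence, Tuple


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_aligned(
    dist_names: Sequence[str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Tuple[bool, Dict[str, Optional[str]]]:
    """Report whether every named distribution is installed at the same version."""
    versions = {name: lookup(name) for name in dist_names}
    values = list(versions.values())
    aligned = bool(values) and None not in values and len(set(values)) == 1
    return aligned, versions
```

Calling check_aligned(["ktransformers", "sglang-kt"]) returns a flag plus the per-package versions, so a mismatch or a missing package is visible at a glance. The lookup parameter exists only so the check can be exercised without the packages installed.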

📝 Docs & Community

🐛 Bug Fixes

  • Fix Qwen3.5 FP8 loading for VL detection (#1857).
  • Fix k2-moe.hpp weight loading issue (#1830).
  • Fix wrapper import issue (#1819).
  • Fix experts-sched-Tutorial.md (#1808).
  • Handle unquoted paths and special characters in model scanner (#1840).
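
The notes do not describe how the model scanner fix in #1840 works. As a generic illustration of the kind of handling involved (an assumption, not the actual patch), the sketch below normalizes a user-supplied path that may or may not be wrapped in quotes, and quotes it safely before embedding it in a shell command. The function names are hypothetical; shlex.quote and pathlib.Path are standard-library APIs.

```python
import shlex
from pathlib import Path


def normalize_model_path(raw: str) -> Path:
    """Strip surrounding whitespace and one pair of matching quotes, then expand ~."""
    text = raw.strip()
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "'\"":
        text = text[1:-1]
    return Path(text).expanduser()


def shell_safe(path: Path) -> str:
    """Quote a path so spaces and special characters survive shell parsing."""
    return shlex.quote(str(path))
```

With this approach a path such as /models/Qwen3 Coder (fp8) is accepted whether or not the user quoted it, and it is passed to any shell command as a single argument.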

🌟 Contributors

  • Thanks to all contributors who helped ship this release.

Full Changelog: v0.5.1...v0.5.2

CC: @ouqingliang @ErvinXie @chenht2022 @KMSorSMS @ovowei @SkqLiao @JimmyPeilinLi @mrhaoxx @james0zan
