KTransformers v0.5.1

🚀 Core Highlights

  • Optimized CPU-GPU Expert Scheduling: A flexible GPU expert mask system enables intelligent placement of MoE experts across CPU and GPU. The new scheduler supports multiple placement strategies (frequency-based, uniform, front-loading, random) and dynamic expert updates during inference, improving throughput by up to 30% at lower GPU expert ratios.
  • Native Precision MoE Support with CI: Expanded native precision support for FP8 and BF16 MoE models. Run Qwen3-BF16, GLM-4.7, GLM-4.7-FP8 and more models directly in their native precision without conversion overhead, now with comprehensive CI coverage.
  • Unified Fine-tuning & Inference Pipeline: New end-to-end tutorial for cost-effective large model fine-tuning and inference using AutoDL cloud infrastructure. Complete the full LoRA fine-tuning and inference loop for models from 14B to 235B with minimal GPU resources.

📌 Models, Hardware & Tooling

  • Model support updates
    • Add native precision support for MiniMax-M2, MiniMax-M2.1, MiMo, DeepSeek-V3.2, GLM-4.7-FP8.
    • Extend FP8 and BF16 MoE enablement path with CI validation.
  • Kernel & hardware improvements
    • Introduce GPU expert mask system for flexible per-layer expert placement control.
    • Add dual-stream CPU-GPU parallel optimization to hide CPU overhead when experts are fully on GPU.
    • Implement dynamic expert update for runtime adaptive optimization during layerwise prefill.
    • New parameters: --kt-num-gpu-experts (per-layer GPU expert count) and --kt-gpu-experts-ratio (global ratio, 0.0-1.0).
    • Add expert placement strategies: frequency, uniform, front-loading, random.
  • Tooling & integration
    • Add inference statistics and analysis functionality for GPU expert hit rate monitoring.
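The statistics functionality above tracks how often the router lands on a GPU-resident expert. A rough sketch of what such a hit-rate metric computes (function and argument names are hypothetical, not the actual ktransformers interface):

```python
def gpu_hit_rate(routed_expert_ids, gpu_experts):
    """Fraction of routing decisions that hit a GPU-resident expert.

    routed_expert_ids: flat sequence of expert ids chosen by the MoE router
    gpu_experts: set of expert ids currently placed on the GPU
    """
    if not routed_expert_ids:
        return 0.0
    hits = sum(1 for e in routed_expert_ids if e in gpu_experts)
    return hits / len(routed_expert_ids)
```

A low hit rate under the frequency strategy would signal that the recorded expert distribution has drifted from the current workload, which is the situation the dynamic expert update during layerwise prefill is meant to correct.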

🐛 Bug Fixes

  • Fix environment mismatch issues in AutoDL community image for fine-tuning and inference.
  • Fix various stability issues in kt-kernel.
  • Improve error handling and logging for expert distribution recording.

🌟 Contributors

  • Thanks to all contributors who helped ship this release.

Full Changelog: v0.5.0...v0.5.1

CC: @ouqingliang @ErvinXie @chenht2022 @KMSorSMS @ovowei @SkqLiao @JimmyPeilinLi @mrhaoxx @james0zan
