GitHub: kvcache-ai/ktransformers v0.5.0
KTransformers v0.5.0

🚀 Core Highlights

  • Native FP8 MoE Kernel: Adds native FP8 precision support for MoE inference through a new AVX-based kernel. FP8 models run directly, with no precision-conversion overhead, so the original model accuracy is preserved while making full use of the hardware.

  • kt-cli for Effortless Local Inference: A new command-line tool built for simplicity and ease of use. It covers model management, automatic configuration, chat/completions workflows, and built-in SGLang environment detection, so you can get started with local LLM inference in minutes.

  • Enhanced Layerwise Prefill: Layerwise prefill is faster thanks to expert-by-expert pipelining. The layerwise prefill architecture streams memory efficiently during prefill, improving throughput and reducing latency for long-context workloads.
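
To illustrate the general idea behind expert-by-expert pipelining, here is a minimal Python sketch. It is not KTransformers' implementation: it assumes two hypothetical callables, `load_expert(expert_id)`, standing in for moving one expert's weights into fast memory, and `compute(expert_id, weights)`, standing in for running that expert. A background thread prefetches the next expert while the current one is being computed, so loading and compute overlap instead of alternating.

```python
import queue
import threading
from typing import Callable, List, Sequence, TypeVar

W = TypeVar("W")  # weights type returned by the (hypothetical) loader
R = TypeVar("R")  # result type returned by the (hypothetical) compute step

_DONE = object()  # sentinel marking the end of the expert stream


def pipelined_experts(
    expert_ids: Sequence[int],
    load_expert: Callable[[int], W],
    compute: Callable[[int, W], R],
) -> List[R]:
    """Compute experts in order while the next expert is loaded in the background."""
    # A bounded queue limits how far loading can run ahead of compute,
    # which bounds how many experts' weights are resident at once.
    buf: "queue.Queue[object]" = queue.Queue(maxsize=1)

    def loader() -> None:
        try:
            for eid in expert_ids:
                buf.put((eid, load_expert(eid)))
        except BaseException as exc:  # forward loader errors to the consumer
            buf.put(exc)
            return
        buf.put(_DONE)

    # daemon=True so a failure in compute() cannot leave the process hanging
    # on a loader thread blocked in put().
    t = threading.Thread(target=loader, daemon=True)
    t.start()

    results: List[R] = []
    while True:
        item = buf.get()
        if item is _DONE:
            break
        if isinstance(item, BaseException):
            raise item
        eid, weights = item
        # While this expert is computed, the loader is already fetching the next one.
        results.append(compute(eid, weights))
    t.join()
    return results
```

For example, `pipelined_experts(range(4), lambda e: [e] * 4, lambda e, w: sum(w))` returns `[0, 4, 8, 12]`. The bounded queue is the key design choice here. It trades a little prefetch depth for a predictable memory footprint.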

📌 Models, Hardware & Tooling

  • Model support updates

    • Extend the FP8 enablement path, focusing on native FP8 MoE support and compatibility improvements.
    • Add native support for MiniMax-M2, MiniMax-M2.1, and DeepSeek-V3.2, plus related enablement work.
  • Kernel & hardware improvements

    • Add AVX-based FP8 MoE kernel.
    • Reduce DRAM requirements during CPU prefill for most models.
    • Improve layerwise prefill for better throughput.
  • Tooling & integration

    • Introduce kt-cli, a new unified CLI for model management, chat, automatic configuration and inference server management.
  • Deployment & installation

    • Refactor installation workflows/scripts for the new CLI/tooling path (including cleanup of legacy install steps).
    • Improve automatic CPU instruction-set detection.
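
As a rough illustration of CPU instruction-set auto-detection (not the logic KTransformers or kt-cli actually uses), the sketch below reads the `flags` line of `/proc/cpuinfo` on Linux and picks the best-supported kernel tier. The tier names (`amx`, `avx512`, `avx2`, `generic`) and their required-flag sets are hypothetical; the flag strings themselves (`avx2`, `fma`, `avx512f`, `avx512bw`, `avx512vl`, `amx_tile`, `amx_bf16`, `amx_int8`) are the ones the Linux kernel reports on x86.

```python
import platform
from typing import List, Set, Tuple

# Hypothetical kernel tiers, best first; each needs all listed /proc/cpuinfo flags.
TIERS: List[Tuple[str, Set[str]]] = [
    ("amx", {"amx_tile", "amx_bf16", "amx_int8"}),
    ("avx512", {"avx512f", "avx512bw", "avx512vl"}),
    ("avx2", {"avx2", "fma"}),
]


def parse_cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Return the flag set from the first exact 'flags' line (ignores 'vmx flags')."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def pick_tier(flags: Set[str]) -> str:
    """Pick the first (best) tier whose required flags are all present."""
    for name, required in TIERS:
        if required <= flags:
            return name
    return "generic"


def detect_tier(path: str = "/proc/cpuinfo") -> str:
    """Detect the tier on Linux; fall back to 'generic' elsewhere or on read errors."""
    if platform.system() != "Linux":
        return "generic"
    try:
        with open(path, encoding="utf-8") as f:
            return pick_tier(parse_cpu_flags(f.read()))
    except OSError:
        return "generic"
```

On a machine that reports `avx2` and `fma` but no AVX-512 or AMX flags, `detect_tier()` returns `"avx2"`. Checking tiers from best to worst keeps the selection a single ordered scan.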

๐Ÿ“ Docs & Community

  • Add an end-to-end MiniMax-M2.1 tutorial.
  • Refine the DPO tutorial.

🌟 Contributors

  • Thanks to all contributors who helped ship this release.

Full Changelog: v0.4.4...v0.5.0

CC: @ouqingliang @ErvinXie @chenht2022 @KMSorSMS @ovowei @SkqLiao @JimmyPeilinLi @mrhaoxx @james0zan
