## Core Highlights: KT Fine-Tuning & Ecosystem
- Launch the native KT SFT (Supervised Fine-Tuning) workflow, supporting ultra-large-model fine-tuning (e.g., a 671B MoE) on 2–4 RTX 4090s
- Achieve seamless integration with LLaMA-Factory: enable the KT backend via a LLaMA-Factory YAML config, unlocking 671B-model tuning for LLaMA-Factory users
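As a sketch, a LLaMA-Factory SFT config with the KT backend switched on might look like the fragment below. The `use_kt` key is a hypothetical placeholder, not a confirmed option name (consult the integration guide for the real flags); the remaining keys are standard LLaMA-Factory training options shown for context:

```yaml
# Hypothetical sketch of a LLaMA-Factory SFT config using the KT backend.
# `use_kt` is a placeholder for whatever flag the integration guide specifies.
model_name_or_path: deepseek-ai/DeepSeek-V3   # example ultra-large MoE checkpoint
stage: sft
do_train: true
finetuning_type: lora
dataset: alpaca_en_demo
template: deepseek
output_dir: saves/deepseek-kt-sft
per_device_train_batch_size: 1
learning_rate: 1.0e-4
num_train_epochs: 1.0
use_kt: true   # placeholder: route training through the KT backend
```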
## Model & Hardware Expansion
- Add support for Kimi-K2 (including 0905 variant), SmallThinker, GLM4-MoE, Qwen3Next
- Introduce initial Ascend NPU support (with DeepSeek-R1 NPU tutorial)
- Revert FP16 usage on XPU to fix stability issues
## Kernel & Tooling Improvements
- Release KT-Kernel (the core compute engine), with an Expert Deferral mechanism for MoE layers
- Fix KT-Kernel bugs and optimize execution stability
- Update balance_serve.py to enhance multi-concurrency inference load balancing
- Resolve CMake build issues (env settings + AMX support)
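The Expert Deferral mechanism is only named above, not specified. As a toy illustration of the general idea (not KT-Kernel's actual implementation), a router could split each token's top-k experts into an "immediate" set and a lower-priority "deferred" set by routing score, so the deferred experts can be computed on slower hardware or overlapped with other work:

```python
# Toy sketch of "expert deferral" for MoE routing (illustrative only;
# function name and behavior are assumptions, not KT-Kernel's API).
import numpy as np

def route_with_deferral(router_logits, k=4, defer=2):
    """Pick top-k experts per token; mark the `defer` lowest-scoring of
    those k as deferrable. Returns (immediate_ids, deferred_ids)."""
    # Indices of the k largest logits per token (unordered partition first).
    topk = np.argpartition(router_logits, -k, axis=-1)[..., -k:]
    scores = np.take_along_axis(router_logits, topk, axis=-1)
    order = np.argsort(-scores, axis=-1)                 # descending by score
    ranked = np.take_along_axis(topk, order, axis=-1)    # best expert first
    return ranked[..., : k - defer], ranked[..., k - defer :]

logits = np.array([[0.1, 2.0, 0.5, 1.5, 0.2, 1.0]])  # 1 token, 6 experts
immediate, deferred = route_with_deferral(logits, k=4, defer=2)
print(immediate.tolist(), deferred.tolist())  # highest-scoring experts run now
```

The split keeps the highest-weight experts on the fast path while lower-weight ones tolerate extra latency, which is one plausible way to hide slow-device compute in a heterogeneous setup.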
## Documentation Updates
- Add KT SFT & LLaMA-Factory integration guides (with YAML examples)
- Refine Kimi-K2 docs (GGUF links, 0905 support dates) and add SmallThinker/GLM4-MoE tutorials
- Add SGLang integration docs and a citation section to the README
## Contributors
- Welcome to the first-time contributors, who delivered:
  - Ascend NPU adaptation + tutorial
  - DeepSeek-R1 NPU guide
  - KT SFT feature development

**Full Changelog**: v0.3.2...v0.4.1
- CC: @JimmyPeilinLi @yangqianrui @ovowei @KMSorSMS @Azure-Tang @Atream @chenht2022 @qiyuxinlin @ErvinXie @james0zan