kvcache-ai/ktransformers v0.4.1
KTransformers-ft v0.4.1


πŸš€ Core Highlights: KT Fine-Tuning & Ecosystem

  • Launch the native KT SFT (Supervised Fine-Tuning) workflow, supporting fine-tuning of ultra-large models (e.g., a 671B MoE) on 2–4 RTX 4090s
  • Achieve seamless integration with LLaMA-Factory: enable the KT backend via a LLaMA-Factory YAML config, unlocking 671B model tuning for LLaMA-Factory users
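
The LLaMA-Factory integration above is driven by an ordinary LLaMA-Factory training YAML. The sketch below is illustrative only: the KT-specific keys (`use_kt`, `kt_optimize_rule`), the model name, and all file paths are assumptions for illustration; check the KT SFT guide for the confirmed option names.

```yaml
### model (example values, not confirmed defaults)
model_name_or_path: deepseek-ai/DeepSeek-V3   # assumption: any supported MoE checkpoint
trust_remote_code: true

### method
stage: sft
do_train: true
finetuning_type: lora

### KT backend (key names are assumptions; see the KT SFT guide)
use_kt: true
kt_optimize_rule: examples/kt_optimize_rules/my-sft-rule.yaml  # hypothetical path

### dataset
dataset: identity
template: deepseek

### output
output_dir: saves/kt-sft-example
per_device_train_batch_size: 1
```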

πŸ“Œ Model & Hardware Expansion

  • Add support for Kimi-K2 (including 0905 variant), SmallThinker, GLM4-MoE, Qwen3Next
  • Introduce initial Ascend NPU support (with DeepSeek-R1 NPU tutorial)
  • Revert FP16 usage on XPU to fix stability issues

πŸ”§ Kernel & Tooling Improvements

  • Release KT-Kernel (core compute engine), with Expert Deferral mechanism for MoE layers
  • Fix KT-Kernel bugs and optimize execution stability
  • Update balance_serve.py to enhance multi-concurrency inference load balancing
  • Resolve CMake build issues (environment-variable settings and AMX support)
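
The Expert Deferral idea mentioned above can be illustrated with a toy sketch. This is an assumption-laden simplification, not the KT-Kernel implementation: it merely shows the routing-level idea of running the highest-weighted selected experts immediately and handing the lowest-weighted ones back as deferred work that could be overlapped with later compute.

```python
import numpy as np

def moe_forward_with_deferral(x, experts, gate_logits, k=2, n_defer=1):
    # Toy illustration (assumption: the real KT-Kernel mechanism differs).
    # Run the top-scoring experts now; return the lowest-scoring selected
    # experts as deferred (weight, fn) pairs to execute alongside later layers.
    topk = np.argsort(gate_logits)[-k:]              # indices of the k best experts
    w = np.exp(gate_logits[topk] - gate_logits[topk].max())
    w = w / w.sum()                                  # softmax over selected experts
    order = np.argsort(w)                            # ascending by routing weight
    defer_pos, hot_pos = order[:n_defer], order[n_defer:]
    out = sum(w[p] * experts[topk[p]](x) for p in hot_pos)
    deferred = [(w[p], experts[topk[p]]) for p in defer_pos]
    return out, deferred

# Example: 4 experts that just scale the input
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
x = np.ones(3)
out, deferred = moe_forward_with_deferral(x, experts, np.array([0.1, 0.9, 0.2, 0.8]))
# Applying the deferred work later recovers the full MoE output
full = out + sum(w * f(x) for w, f in deferred)
```

The point of deferral is latency hiding: the partial output is available immediately, and the deferred experts' contribution can be computed while subsequent layers are already in flight.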

πŸ“ Documentation Updates

  • Add KT SFT & LLaMA-Factory integration guides (with YAML examples)
  • Refine Kimi-K2 docs (GGUF links, 0905 support dates) and add SmallThinker/GLM4-MoE tutorials
  • Include SGLang integration docs and citation section in README

