github kvcache-ai/ktransformers v0.2.3


We're excited to announce KTransformers v0.2.3! You can now build it from the GitHub source code. Release packages and Docker images are being built and uploaded; stay tuned!

Key Updates:

  1. Low-Precision Inference Optimization #754

    1. Added IQ1_S/IQ2_XXS quantized matmul support, now compatible with Unsloth's DeepSeek-R1 1.58-bit/2.51-bit dynamic quantized weights

    2. Released a DeepSeek-R1 mixed-precision model (IQ1 + FP8) with improved efficiency:

      • 19 GB VRAM and 140 GB system memory usage

      • An MMLU score of 83.6, slightly outperforming full-precision DeepSeek-V3

      • Ongoing benchmarks: View Details (Special thanks to @moonshadow-25 and @godrosev for their huge contributions to v0.2.3)
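The mixed-precision build works by placing different weight groups in different formats on different devices: the MoE expert weights, which account for most of DeepSeek-R1's parameters, can live in a 1-bit-class format in system memory while the attention and shared layers stay in FP8 on the GPU. A minimal sketch of that placement idea follows; the function name, tensor-name pattern, and routing rule are illustrative assumptions, not KTransformers' actual API:

```python
def choose_format(tensor_name: str) -> tuple[str, str]:
    """Return (quant_format, device) for one weight tensor.

    Hypothetical routing rule: MoE expert weights (the bulk of the
    model) get a 1-bit-class format and are kept in system RAM for
    CPU matmul; everything else (attention, dense, shared layers)
    stays in FP8 on the GPU.
    """
    if ".experts." in tensor_name:
        return ("IQ1_S", "cpu")
    return ("FP8", "cuda")


# Expert weights are routed to CPU/IQ1_S, attention weights to GPU/FP8.
print(choose_format("blk.12.experts.3.ffn_up.weight"))  # ('IQ1_S', 'cpu')
print(choose_format("blk.12.attn_q.weight"))            # ('FP8', 'cuda')
```

This split is what keeps VRAM usage low (only the FP8 layers are resident on the GPU) while the far larger expert weights consume system memory instead.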

  2. Long Context Handling Enhancement #750

    1. Implemented a chunked prefill mechanism that supports processing 139K-token contexts with DeepSeek-R1 on 24 GB of VRAM

    2. Note: since DeepSeek's native context window only supports 128K tokens, we will pause further optimization of extended context handling.
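Chunked prefill can be sketched as a loop that feeds the prompt through the model in fixed-size slices: peak activation memory then scales with the chunk size rather than the full prompt length, while the KV cache still grows to cover the whole context. The names `chunked_prefill` and `toy_forward` below are hypothetical, not the KTransformers implementation:

```python
def chunked_prefill(tokens, chunk_size, forward_fn):
    """Prefill a long prompt in fixed-size chunks.

    forward_fn(chunk, kv_cache) is a stand-in for one model forward
    pass: it attends over the existing cache and appends the new
    keys/values for `chunk`. Only one chunk's activations are live
    at a time, which is what bounds VRAM during prefill.
    """
    kv_cache = []
    for start in range(0, len(tokens), chunk_size):
        forward_fn(tokens[start:start + chunk_size], kv_cache)
    return kv_cache


# Toy forward step: just extend the cache with the chunk's tokens.
def toy_forward(chunk, kv_cache):
    kv_cache.extend(chunk)


# A 139K-token prompt processed in 8K-token slices still yields a
# full-length cache, but never materializes 139K tokens of activations.
cache = chunked_prefill(list(range(139_000)), 8_192, toy_forward)
```

The chunk size is the knob that trades prefill speed against peak VRAM: smaller chunks fit tighter memory budgets at the cost of more forward passes.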


Coming Next - v0.2.4 Preview

The upcoming v0.2.4 will be the final minor release in the 0.2 series, delivering the most crucial update yet, multi-concurrency support, which will transform KTransformers from "a toy project" into "a practical solution".

Scheduled for release within two weeks, this update will be followed by development of version 0.3, featuring:

  • AMX-powered optimizations for enhanced performance

  • Expanded hardware support, including AMD, XPU, MetaX (沐曦), Moore Threads (摩尔线程), and Ascend (昇腾) GPUs
