github kvcache-ai/ktransformers v0.2.3


We're excited to announce KTransformers v0.2.3! You can now build it from the GitHub source code. Release packages and Docker images are being built and uploaded; stay tuned!

Key Updates:

  1. Low-Precision Inference Optimization #754

    1. Added IQ1_S/IQ2_XXS quantized matmul support, now compatible with Unsloth's DeepSeek-R1 1.58-bit/2.51-bit dynamic quantized weights

    2. Released a DeepSeek-R1 mixed-precision model (IQ1 + FP8) with improved efficiency:

      • 19 GB VRAM and 140 GB system memory usage

      • An MMLU score of 83.6, slightly outperforming full-precision DeepSeek-V3

      • Ongoing benchmarks: View Details (Special thanks to @moonshadow-25 and @godrosev for their huge contributions to v0.2.3)
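The mixed-precision build works by placing different weight groups in different formats on different devices: the MoE expert weights, which account for most of DeepSeek-R1's parameters, can live in a 1-bit-class format in system memory while the attention and shared layers stay in FP8 on the GPU. A minimal sketch of that placement idea follows; the function name, tensor-name pattern, and routing rule are illustrative assumptions, not KTransformers' actual API:

```python
def choose_format(tensor_name: str) -> tuple[str, str]:
    """Return (quant_format, device) for one weight tensor.

    Hypothetical routing rule: MoE expert weights (the bulk of the
    model) get a 1-bit-class format and are kept in system RAM for
    CPU matmul; everything else (attention, dense, shared layers)
    stays in FP8 on the GPU.
    """
    if ".experts." in tensor_name:
        return ("IQ1_S", "cpu")
    return ("FP8", "cuda")


# Expert weights are routed to CPU/IQ1_S, attention weights to GPU/FP8.
print(choose_format("blk.12.experts.3.ffn_up.weight"))  # ('IQ1_S', 'cpu')
print(choose_format("blk.12.attn_q.weight"))            # ('FP8', 'cuda')
```

This split is what keeps VRAM usage low (only the FP8 layers are resident on the GPU) while the far larger expert weights consume system memory instead.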

  2. Long Context Handling Enhancement #750

    1. Implemented a chunked prefill mechanism that supports processing 139K-token contexts with DeepSeek-R1 on 24 GB of VRAM

    2. Note: since DeepSeek's native context window only supports 128K tokens, we will pause further optimization of extended context handling.
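Chunked prefill can be sketched as a loop that feeds the prompt through the model in fixed-size slices: peak activation memory then scales with the chunk size rather than the full prompt length, while the KV cache still grows to cover the whole context. The names `chunked_prefill` and `toy_forward` below are hypothetical, not the KTransformers implementation:

```python
def chunked_prefill(tokens, chunk_size, forward_fn):
    """Prefill a long prompt in fixed-size chunks.

    forward_fn(chunk, kv_cache) is a stand-in for one model forward
    pass: it attends over the existing cache and appends the new
    keys/values for `chunk`. Only one chunk's activations are live
    at a time, which is what bounds VRAM during prefill.
    """
    kv_cache = []
    for start in range(0, len(tokens), chunk_size):
        forward_fn(tokens[start:start + chunk_size], kv_cache)
    return kv_cache


# Toy forward step: just extend the cache with the chunk's tokens.
def toy_forward(chunk, kv_cache):
    kv_cache.extend(chunk)


# A 139K-token prompt processed in 8K-token slices still yields a
# full-length cache, but never materializes 139K tokens of activations.
cache = chunked_prefill(list(range(139_000)), 8_192, toy_forward)
```

The chunk size is the knob that trades prefill speed against peak VRAM: smaller chunks fit tighter memory budgets at the cost of more forward passes.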


Coming Next - v0.2.4 Preview

The upcoming v0.2.4 will be the final minor release in the 0.2 series, delivering the most crucial update yet, multi-concurrency support, which will transform KTransformers from "a toy project" into "a practical solution".

Scheduled for release within two weeks, this update will be followed by development of version 0.3, featuring:

  • AMX-powered optimizations for enhanced performance

  • Expanded hardware support, including AMD, XPU, MetaX (沐曦), Moore Threads (摩尔线程), and Ascend (昇腾) GPUs
