github kvcache-ai/ktransformers v0.3.1

7 months ago

🚀 New Features

⚡ Performance Improvements

  • DeepSeek-R1 Q4 decoding at 7.5 tokens/s
    Measured on a single-socket Xeon with DDR5-4800 MT/s memory and an Intel Arc A770 GPU; enabling dual-NUMA mode yields further speedups.

  • Easy benchmarking
    Try it yourself with the local_chat script to see these gains firsthand.
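    To compare your own runs against the 7.5 tokens/s figure, note that decode throughput is the number of generated tokens divided by elapsed wall-clock time. The Python sketch below assumes a hypothetical `step` callable that decodes one token. It is not part of the local_chat script or the ktransformers API, only an illustration of the arithmetic.

```python
import time
from typing import Callable


def tokens_per_second(num_tokens: int, elapsed_s: float) -> float:
    """Decode throughput: generated tokens divided by wall-clock seconds."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return num_tokens / elapsed_s


def time_decode(step: Callable[[], None], num_tokens: int) -> float:
    """Call a hypothetical one-token decode `step` num_tokens times and
    return the measured tokens/s. `step` is a stand-in, not a ktransformers API."""
    start = time.perf_counter()
    for _ in range(num_tokens):
        step()
    elapsed = time.perf_counter() - start
    return tokens_per_second(num_tokens, elapsed)


# At the reported 7.5 tokens/s, each decoded token takes about 133 ms.
print(f"{1000 / 7.5:.1f} ms per token")  # 133.3 ms per token
```

    Timing the whole decode loop, rather than individual calls, keeps timer overhead small relative to the per-token cost.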

🔜 What’s Next

  • balance_serve integration
    We're working to merge the Intel GPU operators into the balance_serve backend, giving end-to-end support and simpler maintenance.
