github ggml-org/llama.cpp b9489

latest release: b9490
3 hours ago

cuda: reserve space for quantize kv-cache at startup (#23907)

  • cuda: reserve space for quantize kv-cache at startup

  • address review comments

  • remove forward decl

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

  • remove assert in ggml-cuda.cu

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>



openEuler:

  • DISABLED
  • openEuler x86 (310p)
  • openEuler x86 (910b, ACL Graph)
  • openEuler aarch64 (310p)
  • openEuler aarch64 (910b, ACL Graph)
