github ggml-org/llama.cpp b9255

5 hours ago

hexagon: HMX quantized matmul rework (#23368)

  • hmx-mm: update debug logging in hmx-mm

  • hmx-mm: update dequant logic to use HVX_vector_x2/4

  • hmx-mm: remove non-pipelined version of the quantize matmul

It seems that we don't really need the non-pipelined version

  • hmx-mm: use activation depth mode and update naming

Co-authored-by: Kim-Chyan Gan <kgan@qti.qualcomm.com>

  • hex-mm: minor hmx matmul naming updates

  • hmx-mm: remove unused vars

  • snapdragon: scripts bump default ubatch-size to 1K

  • hexagon: combine HMX and power and clock settings into a single set_power call

  • hmx-mm: remove leftover of the scale repl helper

  • hexagon: fix editconf error


Co-authored-by: Kim-Chyan Gan <kgan@qti.qualcomm.com>
