ggml-org/llama.cpp b9255
on GitHub

one month ago

hexagon: HMX quantized matmul rework (#23368)

hmx-mm: update debug logging in hmx-mm
hmx-mm: update dequant logic to use HVX_vector_x2/4
hmx-mm: remove non-pipelined version of the quantize matmul

It seems that we don't really need the non-pipelined version.

hmx-mm: use activation depth mode and update naming

Co-authored-by: Kim-Chyan Gan <kgan@qti.qualcomm.com>

hex-mm: minor hmx matmul naming updates
hmx-mm: remove unused vars
snapdragon: scripts bump default ubatch-size to 1K
hexagon: combine HMX and power and clock settings into a single set_power call
hmx-mm: remove leftover of the scale repl helper
hexagon: fix editconf error

Co-authored-by: Kim-Chyan Gan <kgan@qti.qualcomm.com>

macOS/iOS:

Linux:

Android:

Android arm64 (CPU)

Windows:

openEuler:
