ggml-org/llama.cpp b8140


hexagon: refactor all ops to use local context struct (#19819)

  • hexagon: refactor set/get/sum-rows ops to use local context

  • hexagon: refactor ROPE and Softmax Ops to use local context

Improves performance slightly by precomputing loop-invariant values once and storing them in the context.
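The pattern can be sketched as follows; `scale_op_ctx`, `make_scale_ctx`, and `scale_row` are illustrative names, not the actual backend API:

```cpp
#include <cstddef>

// Hypothetical sketch of the refactor's pattern: gather loop-invariant,
// precomputable values into a per-op context once, then reuse the context
// for every row instead of recomputing per call.
struct scale_op_ctx {
    float  inv_scale; // precomputed reciprocal, avoids a divide per element
    size_t row_len;   // row length in elements
};

inline scale_op_ctx make_scale_ctx(float scale, size_t row_len) {
    return scale_op_ctx{1.0f / scale, row_len};
}

inline void scale_row(const scale_op_ctx & ctx, const float * src, float * dst) {
    for (size_t i = 0; i < ctx.row_len; ++i) {
        dst[i] = src[i] * ctx.inv_scale; // reuse the precomputed value
    }
}
```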

  • hexagon: refactor activation ops to use local context struct

  • hexagon: refactor unary ops to use local context struct and DMA/VTCM

  • hexagon: use aligned hvx_scale function

  • hexagon: remove unused fields from op_context

  • hexagon: rewrite ROPE to use DMA and VTCM scratchpad

  • hex-rope: keep N rows in scratchpad (instead of just two)

  • hex-rope: introduce rowidx cache

  • hex-rope: remove unused fields

  • hex-rope: rewrite dma prefetch logic to allow for multi-row fetch/compute

This also removes the need for the fastdiv helper.
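For context, "fastdiv" refers to the common trick of replacing an integer divide/modulo (e.g. mapping a flat element index to a row and column) with a precomputed multiply-and-shift. Once whole rows are fetched and processed in order, the position can instead be tracked incrementally, as in this illustrative sketch (names are hypothetical, not the actual llama.cpp code):

```cpp
#include <cstdint>

// Hypothetical sketch: when rows are fetched and processed sequentially,
// the (row, col) position can be advanced incrementally, so no per-element
// i / ncols and i % ncols (and hence no fastdiv magic numbers) is needed.
struct row_cursor {
    uint32_t ncols;   // row width in elements
    uint32_t row = 0; // current row index
    uint32_t col = 0; // current column index
};

inline void cursor_advance(row_cursor & c) {
    if (++c.col == c.ncols) { // wrap to the next row at the row boundary
        c.col = 0;
        ++c.row;
    }
}
```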

  • hex-rope: minor formatting

  • hex-rope: use indices and unroll the loops

  • hex-rope: more updates to cleanup rope-block handling

  • hexagon: cleanup supported type/dims checks

  • hexagon: all reduce funcs replicated across lanes

Since the reduction functions already leave their result replicated across lanes, there is no need to explicitly replicate the first value.
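On SIMD units such as HVX, a log2-step "butterfly" reduction naturally leaves the final sum in every lane, which is why no explicit broadcast of the first lane is needed afterwards. A scalar model of the idea (illustrative only, not the actual HVX intrinsics):

```cpp
#include <array>
#include <cstddef>

// Scalar model of a butterfly lane reduction: each step adds lanes that are
// `stride` apart; after log2(N) steps EVERY lane holds the full sum, so no
// separate broadcast of lane 0 is required. N must be a power of two.
template <std::size_t N>
std::array<float, N> reduce_sum_all_lanes(std::array<float, N> v) {
    for (std::size_t stride = N / 2; stride > 0; stride /= 2) {
        std::array<float, N> prev = v; // read the previous step's values
        for (std::size_t i = 0; i < N; ++i) {
            v[i] = prev[i] + prev[i ^ stride]; // pairwise exchange-and-add
        }
    }
    return v;
}
```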

  • snapdragon: update adb and windows scripts to use ubatch-size 256

The updated op support now handles larger ubatches.
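As an illustrative invocation (the `-ub`/`--ubatch-size` flag is llama.cpp's standard micro-batch option; the model path is a placeholder), the larger ubatch size can be set like this:

```shell
# Hypothetical example: run with a micro-batch (ubatch) size of 256,
# matching the updated snapdragon scripts. model.gguf is a placeholder path.
./llama-cli -m model.gguf -ub 256 -p "Hello"
```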

