ggml-org/llama.cpp b8140


hexagon: refactor all ops to use local context struct (#19819)

  • hexagon: refactor set/get/sum-rows ops to use local context

  • hexagon: refactor ROPE and Softmax Ops to use local context

Improves performance slightly by precomputing loop-invariant values once and storing them in the context.
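The pattern can be sketched as follows; `scale_op_ctx`, `make_scale_ctx`, and `scale_row` are illustrative names, not the actual backend API:

```cpp
#include <cstddef>

// Hypothetical sketch of the refactor's pattern: gather loop-invariant,
// precomputable values into a per-op context once, then reuse the context
// for every row instead of recomputing per call.
struct scale_op_ctx {
    float  inv_scale; // precomputed reciprocal, avoids a divide per element
    size_t row_len;   // row length in elements
};

inline scale_op_ctx make_scale_ctx(float scale, size_t row_len) {
    return scale_op_ctx{1.0f / scale, row_len};
}

inline void scale_row(const scale_op_ctx & ctx, const float * src, float * dst) {
    for (size_t i = 0; i < ctx.row_len; ++i) {
        dst[i] = src[i] * ctx.inv_scale; // reuse the precomputed value
    }
}
```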

  • hexagon: refactor activation ops to use local context struct

  • hexagon: refactor unary ops to use local context struct and DMA/VTCM

  • hexagon: use aligned hvx_scale function

  • hexagon: remove unused fields from op_context

  • hexagon: rewrite ROPE to use DMA and VTCM scratchpad

  • hex-rope: keep N rows in scratchpad (instead of just two)

  • hex-rope: introduce rowidx cache

  • hex-rope: remove unused fields

  • hex-rope: rewrite dma prefetch logic to allow for multi-row fetch/compute

This also removes the need for the fastdiv helper.
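For context, "fastdiv" refers to the common trick of replacing an integer divide/modulo (e.g. mapping a flat element index to a row and column) with a precomputed multiply-and-shift. Once whole rows are fetched and processed in order, the position can instead be tracked incrementally, as in this illustrative sketch (names are hypothetical, not the actual llama.cpp code):

```cpp
#include <cstdint>

// Hypothetical sketch: when rows are fetched and processed sequentially,
// the (row, col) position can be advanced incrementally, so no per-element
// i / ncols and i % ncols (and hence no fastdiv magic numbers) is needed.
struct row_cursor {
    uint32_t ncols;   // row width in elements
    uint32_t row = 0; // current row index
    uint32_t col = 0; // current column index
};

inline void cursor_advance(row_cursor & c) {
    if (++c.col == c.ncols) { // wrap to the next row at the row boundary
        c.col = 0;
        ++c.row;
    }
}
```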

  • hex-rope: minor formatting

  • hex-rope: use indices and unroll the loops

  • hex-rope: more updates to cleanup rope-block handling

  • hexagon: cleanup supported type/dims checks

  • hexagon: all reduce funcs replicated across lanes

Since the reduction functions already leave their result replicated across lanes, there is no need to explicitly replicate the first value.
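On SIMD units such as HVX, a log2-step "butterfly" reduction naturally leaves the final sum in every lane, which is why no explicit broadcast of the first lane is needed afterwards. A scalar model of the idea (illustrative only, not the actual HVX intrinsics):

```cpp
#include <array>
#include <cstddef>

// Scalar model of a butterfly lane reduction: each step adds lanes that are
// `stride` apart; after log2(N) steps EVERY lane holds the full sum, so no
// separate broadcast of lane 0 is required. N must be a power of two.
template <std::size_t N>
std::array<float, N> reduce_sum_all_lanes(std::array<float, N> v) {
    for (std::size_t stride = N / 2; stride > 0; stride /= 2) {
        std::array<float, N> prev = v; // read the previous step's values
        for (std::size_t i = 0; i < N; ++i) {
            v[i] = prev[i] + prev[i ^ stride]; // pairwise exchange-and-add
        }
    }
    return v;
}
```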

  • snapdragon: update adb and windows scripts to use ubatch-size 256

The updated op support now handles larger ubatches.
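As an illustrative invocation (the `-ub`/`--ubatch-size` flag is llama.cpp's standard micro-batch option; the model path is a placeholder), the larger ubatch size can be set like this:

```shell
# Hypothetical example: run with a micro-batch (ubatch) size of 256,
# matching the updated snapdragon scripts. model.gguf is a placeholder path.
./llama-cli -m model.gguf -ub 256 -p "Hello"
```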

