github ggml-org/llama.cpp b7740

hexagon: support for OP_CPY, host buffers now optional, hvx-utils refactoring and optimizations (#18822)

  • hexagon: disable repack buffers if host buffers are disabled, improved handling of env vars

  • hexagon: add support for OP_CPY fp16/fp32 -> fp16/fp32

Factored out all hvx_copy functions into the hvx-copy.h header and reduced code duplication.
Updated the HTP ops infra to support OP_CPY.

  • hexagon: cleanup and refactor hex/hvx/htp headers and helper libs

hex is basically all scalar/core platform stuff (L2, DMA, basic utils)
hvx is all hvx related utils, helpers, etc
htp is higher level stuff like Ops, etc

hvx-utils library got a nice round of cleanup and refactoring to reduce duplication

use hvx_vec_store_a where possible

  • hexagon: refactor HVX sigmoid functions to hvx-sigmoid.h

Moved sigmoid and tanh vector functions from hvx-utils.h to a new header
hvx-sigmoid.h. Implemented aligned and unaligned variants for sigmoid
array processing using a macro pattern similar to hvx-copy.h. Updated
act-ops.c to use the new aligned variant hvx_sigmoid_f32_aa. Removed
unused hvx-sigmoid.c.

  • hexagon: factor out hvx-sqrt.h

  • hexagon: minor update to hvx-utils.h

  • hexagon: remove spurious log

  • hexagon: factor out and optimize hvx_add/sub/mul

  • hexagon: remove _opt variants of add/sub/mul as they are simply the fully aligned versions

  • hexagon: refactor reduction functions to hvx-reduce.h

Moved hvx_self_max_f32 and hvx_self_sum_f32 from hvx-utils.h/.c to hvx-reduce.h.
Renamed them to hvx_reduce_max_f32 and hvx_reduce_sum_f32.
Added aligned (_a) and unaligned (_u) variants and used macros to unify logic.
Updated softmax-ops.c to use the new functions.
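The macro-unified aligned/unaligned reductions described above can be pictured with a portable scalar sketch. This is not the code in hvx-reduce.h: every name prefixed with sketch_ or SKETCH_ is hypothetical, the 128-byte vector size is an assumption, and plain loops stand in for HVX vector loads. In this stand-in the only difference between the _a and _u variants is an alignment assertion, whereas real code would presumably pick different load instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed HVX vector size in bytes (hypothetical constant for this sketch). */
#define SKETCH_VLEN 128

static inline int sketch_is_aligned(const void *p) {
    return ((uintptr_t) p % SKETCH_VLEN) == 0;
}

/* Binary operators that the shared loop body folds over the input. */
#define SKETCH_OP_SUM(a, b) ((a) + (b))
#define SKETCH_OP_MAX(a, b) ((a) > (b) ? (a) : (b))

/* One loop body shared by all reductions: start from `init`,
 * fold every element with `op`, and return the accumulator. */
#define SKETCH_REDUCE_LOOP_BODY(src, n, init, op) \
    float acc = (init);                           \
    for (size_t i = 0; i < (n); ++i) {            \
        acc = op(acc, (src)[i]);                  \
    }                                             \
    return acc

/* Aligned (_a) variants: the caller promises a vector-aligned src. */
static inline float sketch_reduce_sum_f32_a(const float *src, size_t n) {
    assert(sketch_is_aligned(src));
    SKETCH_REDUCE_LOOP_BODY(src, n, 0.0f, SKETCH_OP_SUM);
}

static inline float sketch_reduce_max_f32_a(const float *src, size_t n) {
    assert(n > 0 && sketch_is_aligned(src));
    SKETCH_REDUCE_LOOP_BODY(src, n, src[0], SKETCH_OP_MAX);
}

/* Unaligned (_u) variants: no alignment requirement on src. */
static inline float sketch_reduce_sum_f32_u(const float *src, size_t n) {
    SKETCH_REDUCE_LOOP_BODY(src, n, 0.0f, SKETCH_OP_SUM);
}

static inline float sketch_reduce_max_f32_u(const float *src, size_t n) {
    assert(n > 0);
    SKETCH_REDUCE_LOOP_BODY(src, n, src[0], SKETCH_OP_MAX);
}
```

A caller that knows its buffer is vector-aligned would pick the _a variant, and anything else would fall back to _u. Keeping the loop in a single macro means a fix to the reduction logic applies to every variant at once.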

  • hexagon: refactor the rest of arithmetic functions to hvx-arith.h

Moved hvx_sum_of_squares_f32, hvx_min_scalar_f32, and hvx_clamp_scalar_f32 from hvx-utils.c/h to hvx-arith.h. Implemented aligned/unaligned variants (_aa, _au, etc.) and used macros to reduce code duplication. Updated hvx_min_scalar_f32 and hvx_clamp_scalar_f32 to use dst, src, ..., n argument order. Updated call sites in act-ops.c.

Refactor Hexagon HVX arithmetic functions (min, clamp) to hvx-arith.h

Moved hvx_min_scalar_f32 and hvx_clamp_scalar_f32 from hvx-utils.c/h to hvx-arith.h. Implemented aligned/unaligned variants (_aa, _au, etc.) and used macros to reduce code duplication. Updated these functions to use dst, src, ..., n argument order and updated call sites in act-ops.c. hvx_sum_of_squares_f32 remains in hvx-utils.c as requested.
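As a rough illustration of the dst, src, ..., n argument order and the two-letter alignment suffixes, here is a scalar sketch in portable C. It is not the hvx-arith.h implementation: the sketch_ names are hypothetical, the 128-byte vector size is an assumption, and so is the reading of the suffix as "first letter for dst, second letter for src".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed HVX vector size in bytes (hypothetical constant for this sketch). */
#define SKETCH_VLEN 128

static inline int sketch_is_aligned(const void *p) {
    return ((uintptr_t) p % SKETCH_VLEN) == 0;
}

static inline float sketch_clampf(float x, float lo, float hi) {
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Generates one clamp variant per alignment combination.
 * Assumed suffix convention: first letter is dst, second is src;
 * 'a' means vector-aligned, 'u' means no alignment requirement.
 * Argument order is dst, src, scalar parameters, then element count n. */
#define SKETCH_DEFINE_CLAMP(suffix, dst_aligned, src_aligned)             \
    static inline void sketch_clamp_scalar_f32_##suffix(                  \
            float *dst, const float *src, float lo, float hi, size_t n) { \
        if (dst_aligned) { assert(sketch_is_aligned(dst)); }              \
        if (src_aligned) { assert(sketch_is_aligned(src)); }              \
        for (size_t i = 0; i < n; ++i) {                                  \
            dst[i] = sketch_clampf(src[i], lo, hi);                       \
        }                                                                 \
    }

SKETCH_DEFINE_CLAMP(aa, 1, 1)
SKETCH_DEFINE_CLAMP(au, 1, 0)
SKETCH_DEFINE_CLAMP(ua, 0, 1)
SKETCH_DEFINE_CLAMP(uu, 0, 0)
```

Putting dst first and n last mirrors memcpy-style signatures, so every helper in the family can be called the same way regardless of how many scalar parameters sit in the middle.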

  • hexagon: refactor hvx_sum_of_squares_f32

Modify hvx_sum_of_squares_f32 in ggml/src/ggml-hexagon/htp/hvx-reduce.h to use dst, src signature.
Implement _a (aligned) and _u (unaligned) variants for hvx_sum_of_squares_f32.
Update hvx_reduce_loop_body macro to support both returning and storing results via finalize_op.
Update existing reduction functions in hvx-reduce.h to use the updated macro.
Update rms_norm_htp_f32 in ggml/src/ggml-hexagon/htp/unary-ops.c to match the new signature.

  • hexagon: use hvx_splat instead of memset

  • hexagon: consistent use of f32/f16 in all function names to match the rest of GGML

  • hexagon: fix hvx_copy_f16_f32 on v75 and older

  • hexagon: update readme to include GGML_HEXAGON_EXPERIMENTAL

  • scripts: update snapdragon/adb scripts to enable host param
