hexagon: support for OP_CPY, host buffers now optional, hvx-utils refactoring and optimizations (#18822)
-
hexagon: disable repack buffers if host buffers are disabled, improved handling of env vars
-
hexagon: add support for OP_CPY fp16/fp32 -> fp16/fp32
Factored out all hvx_copy functions into the hvx-copy.h header and reduced code duplication.
Update HTP ops infra to support OP_CPY
- hexagon: cleanup and refactor hex/hvx/htp headers and helper libs
hex is basically all scalar/core platform stuff (L2, DMA, basic utils)
hvx is all hvx related utils, helpers, etc
htp is higher level stuff like Ops, etc
hvx-utils library got a nice round of cleanup and refactoring to reduce duplication
use hvx_vec_store_a where possible
- hexagon: refactor HVX sigmoid functions to hvx-sigmoid.h
Moved sigmoid and tanh vector functions from hvx-utils.h to a new header
hvx-sigmoid.h. Implemented aligned and unaligned variants for sigmoid
array processing using a macro pattern similar to hvx-copy.h. Updated
act-ops.c to use the new aligned variant hvx_sigmoid_f32_aa. Removed
unused hvx-sigmoid.c.
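As an illustration of the macro pattern mentioned above, here is a minimal host-side sketch in plain scalar C of how one macro can stamp out aligned and unaligned variants of an array function. It is not the code from hvx-sigmoid.h: every sketch_* and SKETCH_* name is hypothetical, the 128-byte alignment is an assumed vector width, no HVX intrinsics are used, and the sigmoid is replaced by a cheap rational stand-in so the example needs no math library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed vector width in bytes (hypothetical constant for this sketch). */
#define SKETCH_VLEN 128

static inline int sketch_is_aligned(const void *p) {
    return ((uintptr_t) p % SKETCH_VLEN) == 0;
}

/* Stand-in for the real sigmoid: 0.5 * x / (1 + |x|) + 0.5.
 * This is NOT the logistic function, only a cheap sigmoid-shaped curve. */
static inline float sketch_sigmoid(float x) {
    float ax = x < 0.0f ? -x : x;
    return 0.5f * x / (1.0f + ax) + 0.5f;
}

/* Per-pointer policy: 'a' asserts alignment, 'u' accepts any address. */
#define SKETCH_CHECK_a(p) assert(sketch_is_aligned(p))
#define SKETCH_CHECK_u(p) ((void) (p))

/* One macro generates every dst/src alignment combination. */
#define SKETCH_DEFINE_SIGMOID(dst_al, src_al)                          \
    static inline void sketch_sigmoid_f32_##dst_al##src_al(            \
            float *dst, const float *src, size_t n) {                  \
        SKETCH_CHECK_##dst_al(dst);                                    \
        SKETCH_CHECK_##src_al(src);                                    \
        for (size_t i = 0; i < n; i++) {                               \
            dst[i] = sketch_sigmoid(src[i]);                           \
        }                                                              \
    }

SKETCH_DEFINE_SIGMOID(a, a) /* sketch_sigmoid_f32_aa */
SKETCH_DEFINE_SIGMOID(a, u) /* sketch_sigmoid_f32_au */
SKETCH_DEFINE_SIGMOID(u, a) /* sketch_sigmoid_f32_ua */
SKETCH_DEFINE_SIGMOID(u, u) /* sketch_sigmoid_f32_uu */
```

In this sketch the variants differ only in the alignment check. In a real vector implementation the suffix would presumably also select aligned or unaligned loads and stores, which is where the aligned path gets its speed.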
-
hexagon: factor out hvx-sqrt.h
-
hexagon: minor update to hvx-utils.h
-
hexagon: remove spurious log
-
hexagon: factor out and optimize hvx_add/sub/mul
-
hexagon: remove _opt variants of add/sub/mul as they are simply the fully aligned versions
-
hexagon: refactor reduction functions to hvx-reduce.h
Moved hvx_self_max_f32 and hvx_self_sum_f32 from hvx-utils.h/.c to hvx-reduce.h.
Renamed them to hvx_reduce_max_f32 and hvx_reduce_sum_f32.
Added aligned (_a) and unaligned (_u) variants and used macros to unify logic.
Updated softmax-ops.c to use the new functions.
- hexagon: refactor the rest of arithmetic functions to hvx-arith.h
Moved hvx_sum_of_squares_f32, hvx_min_scalar_f32, and hvx_clamp_scalar_f32 from hvx-utils.c/h to hvx-arith.h. Implemented aligned/unaligned variants (_aa, _au, etc.) and used macros to reduce code duplication. Updated hvx_min_scalar_f32 and hvx_clamp_scalar_f32 to use dst, src, ..., n argument order. Updated call sites in act-ops.c.
Refactor Hexagon HVX arithmetic functions (min, clamp) to hvx-arith.h
Moved hvx_min_scalar_f32 and hvx_clamp_scalar_f32 from hvx-utils.c/h to hvx-arith.h. Implemented aligned/unaligned variants (_aa, _au, etc.) and used macros to reduce code duplication. Updated these functions to use dst, src, ..., n argument order and updated call sites in act-ops.c. hvx_sum_of_squares_f32 remains in hvx-utils.c as requested.
- hexagon: refactor hvx_sum_of_squares_f32
  - Modify hvx_sum_of_squares_f32 in ggml/src/ggml-hexagon/htp/hvx-reduce.h to use dst, src signature.
  - Implement _a (aligned) and _u (unaligned) variants for hvx_sum_of_squares_f32.
  - Update hvx_reduce_loop_body macro to support both returning and storing results via finalize_op.
  - Update existing reduction functions in hvx-reduce.h to use the updated macro.
  - Update rms_norm_htp_f32 in ggml/src/ggml-hexagon/htp/unary-ops.c to match the new signature.
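The idea of a shared reduction loop body whose last step either returns or stores the result can be sketched in plain scalar C as follows. This is an assumption-laden illustration, not the macro from hvx-reduce.h: all sketch_* and SKETCH_* names are hypothetical, the (dst, src, n) argument list is an assumed reading of the "dst, src signature", and the loop is scalar rather than HVX.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical shared loop body. It expects 'src' and 'n' (and 'dst' for
 * the storing finalizer) to be in scope at the point of expansion. */
#define SKETCH_REDUCE_LOOP_BODY(init, step_op, finalize_op) \
    {                                                       \
        float acc = (init);                                 \
        for (size_t i = 0; i < n; i++) {                    \
            acc = step_op(acc, src[i]);                     \
        }                                                   \
        finalize_op(acc);                                   \
    }

/* Per-element accumulation steps. */
#define SKETCH_STEP_SUM(acc, x)   ((acc) + (x))
#define SKETCH_STEP_MAX(acc, x)   ((acc) > (x) ? (acc) : (x))
#define SKETCH_STEP_SUMSQ(acc, x) ((acc) + (x) * (x))

/* Two ways to finish: hand the value back, or write it through dst. */
#define SKETCH_FINALIZE_RETURN(acc) return (acc)
#define SKETCH_FINALIZE_STORE(acc)  (*dst = (acc))

/* Returning style, as in the reduce_sum / reduce_max functions. */
static inline float sketch_reduce_sum_f32(const float *src, size_t n) {
    SKETCH_REDUCE_LOOP_BODY(0.0f, SKETCH_STEP_SUM, SKETCH_FINALIZE_RETURN)
}

static inline float sketch_reduce_max_f32(const float *src, size_t n) {
    SKETCH_REDUCE_LOOP_BODY(-FLT_MAX, SKETCH_STEP_MAX, SKETCH_FINALIZE_RETURN)
}

/* Storing style: the result goes to *dst instead of being returned. */
static inline void sketch_sum_of_squares_f32(float *dst, const float *src,
                                             size_t n) {
    SKETCH_REDUCE_LOOP_BODY(0.0f, SKETCH_STEP_SUMSQ, SKETCH_FINALIZE_STORE)
}
```

Passing the finalizer as a macro argument keeps a single loop definition for both calling conventions, so a caller such as an RMS norm routine could receive the sum of squares through a destination pointer while softmax-style callers keep using return values.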
-
hexagon: use hvx_splat instead of memset
-
hexagon: consistent use of f32/f16 in all function names to match the rest of GGML
-
hexagon: fix hvx_copy_f16_f32 on v75 and older
-
hexagon: update readme to include GGML_HEXAGON_EXPERIMENTAL
-
scripts: update snapdragon/adb scripts to enable host param
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: