llama.cpp release b8824 (ggml-org/llama.cpp)


hexagon: optimize HMX matmul operations (#21071)

  • optimize hmx_mat_mul functions by calculating row and column tiles upfront

  • refactor core_dot_chunk_fp16 to use size_t for tile counts and improve readability

  • wip

  • set scale outside of loop

  • wip

  • refactor core_mma_chunk_fp16 and mat_mul_qk_0_d16a32 to use size_t for tile counts

  • wip

  • wip

  • refactor transfer_output_chunk_fp16_to_fp32 to use size_t for dimensions

  • refactor core_dot_chunk_fp16 to use size_t for tile row stride calculation

  • wip

  • refactor hmx_mat_mul functions to use hvx_vec_splat_f16 for column scales initialization

  • refactor hmx_mat_mul_permuted_w16a32_batched to streamline scale setting and locking

  • refactor core_dot_chunk_fp16 to improve tile stride calculations for output

  • refactor hmx_mat_mul functions to use Q6_V_vsplat_R for column scales initialization

  • fix compile error

  • wip

  • optimize row and column tile indexing in core_mma_chunk_fp16 function

  • wip

  • Revert "wip"

This reverts commit cde679e.

  • Add size limit check for HAP_mmap in htp_iface_mmap and drop_mmap functions

  • wip
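Several of the commits above share one pattern: compute the row and column tile counts once, up front, as `size_t`, and move the output scale out of the innermost dot-product loop so it is applied once per output element. A minimal C sketch of that pattern follows; the function name, `TILE` size, and plain-float arithmetic here are illustrative only and do not reflect the actual `hmx_mat_mul` / HMX intrinsic code:

```c
#include <assert.h>
#include <stddef.h>

#define TILE 4  /* illustrative tile edge; real HMX tile shapes differ */

/* C = scale * (A x B); A is m x k, B is k x n, all row-major.
 * Tile counts are computed up front as size_t, and the scale is
 * applied once per output element, outside the reduction loop. */
static void tiled_mat_mul(const float *a, const float *b, float *c,
                          size_t m, size_t n, size_t k, float scale)
{
    const size_t row_tiles = (m + TILE - 1) / TILE; /* computed upfront */
    const size_t col_tiles = (n + TILE - 1) / TILE; /* not per iteration */

    for (size_t rt = 0; rt < row_tiles; rt++) {
        for (size_t ct = 0; ct < col_tiles; ct++) {
            const size_t r_end = (rt + 1) * TILE < m ? (rt + 1) * TILE : m;
            const size_t c_end = (ct + 1) * TILE < n ? (ct + 1) * TILE : n;
            for (size_t r = rt * TILE; r < r_end; r++) {
                for (size_t col = ct * TILE; col < c_end; col++) {
                    float acc = 0.0f;
                    for (size_t i = 0; i < k; i++)
                        acc += a[r * k + i] * b[i * n + col];
                    /* scale hoisted out of the i-loop above */
                    c[r * n + col] = acc * scale;
                }
            }
        }
    }
}
```

On Hexagon the per-column scales would instead be broadcast into a vector register (the commits mention `hvx_vec_splat_f16` and later `Q6_V_vsplat_R` for this), but the hoisting idea is the same.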

