ggml-org/llama.cpp b7845


ggml-cpu: aarch64: q6_K repack gemm and gemv (and generic) implementations (i8mm) #18860 (#18888)

  • Boilerplate for q6_K repack

  • q6_K repack to q6_Kx8 implementation (layout sketched after this list)

Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>

  • q6_K generic gemv and gemm

  • WIP: gemm_q6_K 8x8

  • Still WIP: loading of q8s, q6h and q6l

  • First working version of q6_K gemm (the i8mm building block is sketched after this list)

  • Moved q6 loads outside of the sb block; unrolled the inner loop

  • Replaced modulo with mask (sketched after this list)

  • First implementation of GEMV

  • ggml_vdotq_s32 -> vdotq_s32

  • Reduce width of accumulators in q6_K gemv

  • Use bsums instead of calculating the bias (sketched after this list). Preload scales to use vget_lane. Unroll.

  • Reuse scales in GEMM (same optimization as in GEMV)

  • Added TODOs for bsum and a different qh repack

  • Arch fallback

  • VSLIQ for merging qh and ql (sketched after this list)

  • Removed TODO, already tested

  • Apply suggestions

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

  • Removed unused import

Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
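
The q6_Kx8 repack interleaves the blocks of 8 consecutive weight rows so that the GEMM micro-kernel can load operands for all 8 rows contiguously. A minimal sketch of what such a layout can look like, following the pattern of ggml's existing x8 repack types; the x8 field grouping is an assumption, not the actual ggml-cpu definition:

```c
#include <stdint.h>

#define QK_K 256
typedef uint16_t ggml_half;

// Standard q6_K block: 256 quants, 6 bits each, split into low and high bits.
typedef struct {
    uint8_t   ql[QK_K / 2];      // low 4 bits of each quant
    uint8_t   qh[QK_K / 4];      // high 2 bits of each quant
    int8_t    scales[QK_K / 16]; // per-16-element sub-block scales
    ggml_half d;                 // super-block scale
} block_q6_K;

// Hypothetical repacked block covering the same column range of 8 consecutive
// rows, so one contiguous load feeds all 8 rows of the GEMM micro-kernel.
typedef struct {
    uint8_t   ql[8 * QK_K / 2];
    uint8_t   qh[8 * QK_K / 4];
    int8_t    scales[8 * QK_K / 16];
    ggml_half d[8];              // one super-block scale per row
} block_q6_Kx8;
```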
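The i8mm in the title refers to the Armv8.6 int8 matrix-multiply extension: its SMMLA instruction (the vmmlaq_s32 intrinsic) takes two registers, each holding two 8-byte rows, and accumulates all four pairwise dot products into a 2x2 int32 tile. A self-contained sketch of that building block, assuming plain row-major int8 inputs rather than the PR's repacked layout:

```c
#include <arm_neon.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_MATMUL_INT8)  // compile with -march=armv8.2-a+i8mm
// Computes a 2x2 int32 tile C = A (2xK) * B^T (2xK), K a multiple of 8.
static void gemm_2x2_i8mm(const int8_t *a, const int8_t *b, int k,
                          int32_t c[4]) {
    int32x4_t acc = vdupq_n_s32(0);
    for (int i = 0; i < k; i += 8) {
        // Pack 8 values from each of the two rows into one 16-byte register.
        int8x16_t va = vcombine_s8(vld1_s8(a + 0 * k + i),
                                   vld1_s8(a + 1 * k + i));
        int8x16_t vb = vcombine_s8(vld1_s8(b + 0 * k + i),
                                   vld1_s8(b + 1 * k + i));
        acc = vmmlaq_s32(acc, va, vb);  // SMMLA: accumulate 2x8 * 8x2 -> 2x2
    }
    vst1q_s32(c, acc);  // c = {A0.B0, A0.B1, A1.B0, A1.B1}
}
#endif
```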
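"Replaced modulo with mask" is the standard power-of-two trick, sketched here with an assumed period of 8:

```c
// Index wrap for a power-of-two period: i % 8 and i & 7 agree for i >= 0,
// and the mask form avoids an integer division in the hot loop.
static inline int wrap8(int i) {
    return i & 7;  // same as i % 8 for non-negative i
}
```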
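"Bsums instead of calc bias" relies on q6_K storing its quants with an implicit -32 offset (x = d * sc * (q - 32)) and on block_q8_K carrying per-sub-block sums of the activations in its bsums field. Since dot(q - 32, y) = dot(q, y) - 32 * sum(y), the bias can be folded in once per sub-block instead of being subtracted per element. A scalar sketch under an assumed flat parameter layout (the final d6 * d8 float scaling is omitted):

```c
#include <stdint.h>

#define QK_K 256

static int32_t q6K_dot_bsums(const uint8_t *q6,     // 256 merged 6-bit quants, no -32 applied
                             const int8_t  *q8,     // 256 q8_K quants
                             const int8_t  *scales, // 16 sub-block scales (q6_K)
                             const int16_t *bsums)  // 16 sub-block sums of q8 (q8_K)
{
    int32_t acc = 0, bias = 0;
    for (int sb = 0; sb < QK_K / 16; ++sb) {
        int32_t dot = 0;
        for (int j = 0; j < 16; ++j) {
            dot += q6[sb * 16 + j] * q8[sb * 16 + j];  // raw quants in the inner loop
        }
        acc  += scales[sb] * dot;
        bias += scales[sb] * bsums[sb];  // bsums[sb] = sum of the q8 sub-block
    }
    return acc - 32 * bias;  // fold the q6_K offset in once at the end
}
```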
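"VSLIQ for merging qh and ql" uses the NEON shift-left-and-insert intrinsic to combine the two bit halves of a 6-bit quant in one instruction: SLI overwrites the destination's high bits with the shifted source while keeping the destination's low 4 bits, so ql needs no masking. A sketch, assuming ql carries the low nibble and qh the 2 high bits in its low bits:

```c
#include <arm_neon.h>

static inline uint8x16_t merge_q6(uint8x16_t ql, uint8x16_t qh) {
    uint8x16_t hi = vandq_u8(qh, vdupq_n_u8(0x03));  // keep only the 2 high bits
    // SLI keeps bits [3:0] of ql and inserts hi << 4 into bits [7:4],
    // yielding (hi << 4) | (ql & 0x0F) without masking ql.
    return vsliq_n_u8(ql, hi, 4);
}
```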

Release binaries are available for macOS/iOS, Linux, Windows, and openEuler.
