ggml-cpu: aarch64: q6_K repack gemm and gemv (and generic) implementations (i8mm) #18860 (#18888)
- Boilerplate for q6_K repack
- q6_K repack to q6_Kx8 implementation
Signed-off-by: Alberto Cabrera alberto.cabrera@liquid.ai
- q6_K generic gemv and gemm
- wip, gemm_q6_K 8x8
- Still WIP: loading of q8s, q6h and q6l
- first working version of q6_K gemm
- Moved q6 loads outside of sb block, Unrolled inner loop
- Replaced modulo with mask
- First implementation of GEMV
- ggml_vdotq_s32 -> vdotq_s32
- Reduce width of accumulators in q6_K gemv
- Bsums instead of calc bias. Preload scales to use vget_lane. Unroll.
- Reuse scales in GEMM (same GEMV opt)
- Added todos for bsum and different qh repack
- Arch fallback
VSLIQ for merging qh adn ql
-
Removed TODO, already tested
-
Apply suggestions
Co-authored-by: Georgi Gerganov ggerganov@gmail.com
- Removed unused import
Signed-off-by: Alberto Cabrera alberto.cabrera@liquid.ai
Co-authored-by: Georgi Gerganov ggerganov@gmail.com
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: