ggml-cpu: FA add GEMM microkernel (#19422)
- ggml-cpu: FA add GEMM microkernel
- add guard for sizeless vector types
- fix case where DV % GGML_F32_EPR != 0
- move memset out of the loop
- move another memset out of the loop
- use RM=4 for arm
- simd_gemm: convert everything to int
- convert everything to size_t to avoid warnings
- fixup
- add pragma for ignoring aggressive loop optimizations
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: