ggml-hexagon: gelu optimization (#18151)
- feat: working gelu with src0 put on vtcm
- feat: gelu ping-pong for both in and out
- fix: fix compile error
- break: distinguish dma ddr->vtcm and vtcm->ddr operation
- fix: fix dma queue size
- break: update dma api to either pop src or dst ptr
- fix: fix activation vtcm allocation issue for src1 when swapped
- refactor: ping-pong gelu logic to avoid unnecessary if else
- dma: improved queue interface and prefetch handling
- gelu: fix N+2 block prefetch
Co-authored-by: Max Krasnyansky <maxk@qti.qualcomm.com>
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: