ggml-webgpu: compute pass batching and removing profiling overhead (#21873)
- Update register tiling matmul to use f32 accumulation
- fix profiling code
- Fix register tiling matmul for chrome, i'm blaming dawn
- Update batch tuning value for iOS
- compile fix
- Fix use of new load function
- Move to a single query set for GPU profiling
- Move to batching compute passes when not profiling
- Refactor build_multi
- remove iOS throttling now that we're batching compute passes
macOS/iOS:
- macOS Apple Silicon (arm64)
- macOS Apple Silicon (arm64, KleidiAI enabled)
- macOS Intel (x64)
- iOS XCFramework
Linux:
- Ubuntu x64 (CPU)
- Ubuntu arm64 (CPU)
- Ubuntu s390x (CPU)
- Ubuntu x64 (Vulkan)
- Ubuntu arm64 (Vulkan)
- Ubuntu x64 (ROCm 7.2)
- Ubuntu x64 (OpenVINO)
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: