ggml webgpu: fix workgroup dispatch limit for large batch sizes (#19965)
- ggml-webgpu: fix workgroup dispatch limit for large batch sizes
WebGPU limits each dispatch to 65535 workgroups per dimension. Large MUL_MAT
operations whose batch sizes exceeded this limit would fail.
  - add compute_2d_workgroups() helper to split total workgroup ID across
    X/Y dimensions
  - update mul_mat_reg_tile.wgsl to reconstruct linear workgroup ID from 2D
    dispatch
  - update mul_mat_subgroup_matrix.wgsl to reconstruct linear workgroup ID
    from 2D dispatch
  - update mul_mat.wgsl to compute global index from 2D workgroup
    coordinates
  - refactor all three mul_mat dispatch paths to use the shared helper
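
The split and the linear-ID reconstruction described above can be sketched on the host side as follows. This is a minimal illustration, not the code from the commit: `split_workgroups_2d`, `linear_workgroup_id`, and the `Dispatch2D` struct are hypothetical names, and the real `compute_2d_workgroups()` may use a different signature or a different split strategy.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: workgroup counts for the X and Y dispatch dimensions.
struct Dispatch2D {
    uint32_t x;
    uint32_t y;
};

// Split a linear workgroup count into an X/Y grid where neither dimension
// exceeds max_per_dim (65535 is WebGPU's default per-dimension limit).
inline Dispatch2D split_workgroups_2d(uint64_t total, uint32_t max_per_dim = 65535) {
    assert(total > 0 && total <= uint64_t(max_per_dim) * max_per_dim);
    if (total <= max_per_dim) {
        return { uint32_t(total), 1 };  // fits in a 1D dispatch
    }
    // Fill X completely and round Y up so that x * y >= total.
    uint32_t y = uint32_t((total + max_per_dim - 1) / max_per_dim);
    return { max_per_dim, y };
}

// Mirror of what a shader would do to recover the linear workgroup ID
// from its 2D workgroup coordinates and the X dispatch width.
inline uint64_t linear_workgroup_id(uint32_t wg_x, uint32_t wg_y, uint32_t num_wg_x) {
    return uint64_t(wg_y) * num_wg_x + wg_x;
}
```

With this rounding-up split, a request for 131072 workgroups becomes a 65535 x 3 grid, so the grid can contain more workgroups than were requested.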
- ggml-webgpu: add bounds checking for over-dispatched workgroups
2D workgroup dispatch can over-dispatch when total workgroups don't
divide evenly into the 65535 per-dimension limit. Extra workgroups
would compute invalid batch indices, causing memory corruption.
  - add batch_idx bounds check to mul_mat_reg_tile.wgsl and
    mul_mat_subgroup_matrix.wgsl to prevent over-dispatched workgroups
    from accessing invalid memory
  - fixes test failures with large batch sizes (e.g., bs=[128, 1024])
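
To make the over-dispatch concrete, here is a small host-side model of the guard. It is a sketch under stated assumptions: the function names and the `wg_per_batch` parameter are hypothetical, and the actual shaders derive `batch_idx` from their own tiling parameters. In WGSL the equivalent would be an early `return` when the computed batch index is out of range.

```cpp
#include <cstdint>

// Number of surplus workgroups launched by an x-by-y grid when only
// `total` workgroups are actually needed (assumes x * y >= total).
inline uint64_t surplus_workgroups(uint32_t x, uint32_t y, uint64_t total) {
    return uint64_t(x) * y - total;
}

// Host-side model of the shader guard: a workgroup may proceed only if the
// batch index derived from its linear ID is a valid batch.
// wg_per_batch is a hypothetical parameter: workgroups assigned to each batch.
inline bool workgroup_in_bounds(uint32_t wg_x, uint32_t wg_y, uint32_t num_wg_x,
                                uint32_t wg_per_batch, uint32_t num_batches) {
    uint64_t linear    = uint64_t(wg_y) * num_wg_x + wg_x;  // reconstruct linear ID
    uint64_t batch_idx = linear / wg_per_batch;
    return batch_idx < num_batches;  // surplus workgroups fail this check
}
```

For example, assuming one workgroup per batch, 128 * 1024 = 131072 batches dispatched as a 65535 x 3 grid leave 65533 surplus workgroups, all of which must exit before touching memory.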
- ggml-webgpu: add back TODO for splitting large sizes into batches
- Optimize 2D workgroup provisioning
- Set some parameters that increase speed
Co-authored-by: Reese Levine <reeselevine1@gmail.com>
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: