vulkan: fix SSM_CONV PP scaling with large ubatch sizes (#20379)
- vulkan: optimize SSM_CONV workgroup dispatch for large ubatch
Tile tokens into 2D workgroups (32x16) to reduce workgroup launch
overhead at large ubatch sizes. Add vec4 fast path for nc=4 (common
d_conv size). Fixes prompt-processing (PP) performance degradation with ubatch > 512.
Ref: #18725
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
- vulkan: remove unused shared memory declaration in SSM_CONV
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
Co-authored-by: Progeny Alpha ProgenyAlpha@users.noreply.github.com
Co-authored-by: Claude Opus 4.6 noreply@anthropic.com
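The tiling and vec4 changes in the first commit can be modelled on the CPU. What follows is a hypothetical Python model, not the llama.cpp Vulkan shader. It makes several assumptions. SSM_CONV is taken to compute a per-channel causal convolution, y[t][c] = sum_k sx[c][t+k] * w[c][k], over nc taps. The "32x16" tile is assumed to cover 32 channels by 16 tokens. The row-per-channel data layout and every function name here are made up for illustration. The `nc == 4` branch stands in for the vec4 fast path as a fixed 4-term dot product.

```python
import math

# Assumption: each workgroup covers 32 channels x 16 tokens (hypothetical mapping).
TILE_C, TILE_T = 32, 16


def dot4(a, b):
    # Fixed-width 4-term dot product, standing in for a GLSL vec4 dot().
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]


def ssm_conv_ref(sx, w, n_t, d_inner, nc):
    # Reference: y[t][c] = sum_k sx[c][t + k] * w[c][k].
    # sx[c] holds nc - 1 + n_t inputs for channel c; w[c] holds nc taps.
    return [[sum(sx[c][t + k] * w[c][k] for k in range(nc))
             for c in range(d_inner)] for t in range(n_t)]


def workgroup_count(n_t, d_inner):
    # 2D grid: one workgroup per (channel tile, token tile).
    return math.ceil(d_inner / TILE_C) * math.ceil(n_t / TILE_T)


def ssm_conv_tiled(sx, w, n_t, d_inner, nc):
    y = [[0.0] * d_inner for _ in range(n_t)]
    for gc in range(math.ceil(d_inner / TILE_C)):      # workgroup id x
        for gt in range(math.ceil(n_t / TILE_T)):      # workgroup id y
            for lc in range(TILE_C):                   # local invocation x
                for lt in range(TILE_T):               # local invocation y
                    c = gc * TILE_C + lc
                    t = gt * TILE_T + lt
                    if c >= d_inner or t >= n_t:
                        continue  # idle invocation in a partial edge tile
                    if nc == 4:
                        # Fast path: one 4-wide dot product per output.
                        y[t][c] = dot4(sx[c][t:t + 4], w[c])
                    else:
                        # Generic path: loop over nc taps.
                        y[t][c] = sum(sx[c][t + k] * w[c][k] for k in range(nc))
    return y
```

Under these assumptions, a 512-token ubatch over 4096 channels needs `workgroup_count(512, 4096)` = 128 × 32 = 4096 workgroups. A hypothetical layout that launches a separate row of 128 workgroups for every token would need 65,536, which is 16x more.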
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: