opencl: enable the general fp mm for non-cont input and as a fallback for specialized kqv kernel for adreno (#18970)
- opencl: add `copy_to_contiguous` and utilize mm kernels
- opencl: only copy to cont for f32 and f16 tensors
- opencl: use cont mm for fallback when dst is large
- opencl: use nb local to copy-to-cont
- opencl: use local offset as well
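
For readers unfamiliar with the copy-to-contiguous idea named in the commit list, here is a minimal host-side sketch in C. It is not the OpenCL kernel from this change. The function name `copy_to_contiguous_f32` and its parameters are hypothetical, borrowing only the ggml convention of `ne` for element counts and `nb` for byte strides. It assumes a 2-D f32 view addressed by a byte offset and per-dimension byte strides, and gathers it into a dense row-major buffer that a general matrix-multiply routine could then consume.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of llama.cpp): gather a strided 2-D f32 view
 * into a dense row-major buffer.
 *   src_base : start of the parent allocation
 *   offset   : byte offset of the view's first element inside the parent
 *   ne0, ne1 : number of elements along dim 0 (fastest) and dim 1
 *   nb0, nb1 : byte strides along dim 0 and dim 1 of the source view
 *   dst      : output buffer with room for ne0 * ne1 floats
 */
static void copy_to_contiguous_f32(const void *src_base, size_t offset,
                                   size_t ne0, size_t ne1,
                                   size_t nb0, size_t nb1,
                                   float *dst)
{
    assert(src_base != NULL && dst != NULL);

    const char *src = (const char *)src_base + offset;

    for (size_t i1 = 0; i1 < ne1; ++i1) {
        for (size_t i0 = 0; i0 < ne0; ++i0) {
            float v;
            /* memcpy avoids alignment assumptions on the strided source */
            memcpy(&v, src + i1 * nb1 + i0 * nb0, sizeof v);
            dst[i1 * ne0 + i0] = v; /* dense layout: row stride == ne0 */
        }
    }
}
```

For example, a 2x2 view starting at row 1, column 1 of a 3x4 parent would be passed with `offset = 5 * sizeof(float)`, `nb0 = sizeof(float)`, and `nb1 = 4 * sizeof(float)`. Taking the strides and offset from the view rather than from the parent is the point of this sketch.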
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: