CUDA & CPU: support F32 kernel type for CONV_TRANSPOSE_2D (#17094)
- Refactor CUDA 2D transpose implementation to support multiple kernel types and improve parameter handling
  - Introduced a `conv2d_transpose_params` struct for better parameter management.
  - Updated `conv2d_transpose_kernel` to be templated for different kernel types (float and half).
  - Modified `ggml_cuda_conv_2d_transpose_p0` to handle both F16 and F32 kernel types.
  - Enhanced test cases to validate functionality for both kernel types.
- Refactor test cases for 2D convolution transpose to support dynamic kernel types
  - Updated `test_conv_transpose_2d` structure to improve parameter handling by reordering constructor arguments.
  - Enhanced test case generation to iterate over kernel types, allowing for flexible testing of different configurations.
  - Removed hardcoded kernel type instances in favor of a loop for better maintainability and scalability.
- Refactor ggml_compute_forward_conv_transpose_2d to support both F16 and F32 tensor types.
- Refactor conv2d transpose kernel to use a template for kernel type, enhancing flexibility for different data types. Update test cases to include both F16 and F32 tensor types for comprehensive coverage.
- Update ggml/src/ggml-cuda/conv2d-transpose.cu
Co-authored-by: Aman Gupta amangupta052@gmail.com
- Update ggml/src/ggml-cpu/ggml-cpu.c
Co-authored-by: Aman Gupta amangupta052@gmail.com
- Refactor conv2d transpose implementation by removing the conv2d_transpose_params struct and dispatching with direct kernel launch.
- Enhance CPU conv2d transpose implementation by introducing a templated kernel type for improved flexibility with F16 and F32 data types.
Co-authored-by: Aman Gupta amangupta052@gmail.com
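
The idea of templating the transposed convolution on the kernel element type can be illustrated with a small CPU reference. This is only a sketch and not the ggml code: `Half`, `to_float`, and `conv_transpose_2d_ref` are hypothetical names, the layout is simplified to a single channel with zero padding, and the F16 decoding is a plain bit-level conversion used in place of ggml's own helpers.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for an IEEE 754 binary16 value (not ggml's ggml_fp16_t).
struct Half { std::uint16_t bits; };

// F32 kernels need no conversion.
inline float to_float(float v) { return v; }

// Decode binary16 bits: 1 sign bit, 5 exponent bits (bias 15), 10 mantissa bits.
inline float to_float(Half h) {
    const int sign = (h.bits >> 15) & 0x1;
    const int exp  = (h.bits >> 10) & 0x1F;
    const int mant = h.bits & 0x3FF;
    float value;
    if (exp == 0) {
        value = std::ldexp(static_cast<float>(mant), -24);             // zero or subnormal
    } else if (exp == 31) {
        value = mant ? std::numeric_limits<float>::quiet_NaN()
                     : std::numeric_limits<float>::infinity();
    } else {
        value = std::ldexp(static_cast<float>(mant | 0x400), exp - 25); // normal number
    }
    return sign ? -value : value;
}

// Single-channel transposed 2D convolution with zero padding.
// KernelT selects the kernel storage type; accumulation is always in float.
template <typename KernelT>
std::vector<float> conv_transpose_2d_ref(const std::vector<float>& input, int in_w, int in_h,
                                         const std::vector<KernelT>& kernel, int k_w, int k_h,
                                         int stride) {
    const int out_w = (in_w - 1) * stride + k_w;
    const int out_h = (in_h - 1) * stride + k_h;
    std::vector<float> out(static_cast<std::size_t>(out_w) * out_h, 0.0f);
    for (int iy = 0; iy < in_h; ++iy) {
        for (int ix = 0; ix < in_w; ++ix) {
            const float x = input[iy * in_w + ix];
            // Each input element scatters a scaled copy of the kernel into the output.
            for (int ky = 0; ky < k_h; ++ky) {
                for (int kx = 0; kx < k_w; ++kx) {
                    const int oy = iy * stride + ky;
                    const int ox = ix * stride + kx;
                    out[oy * out_w + ox] += x * to_float(kernel[ky * k_w + kx]);
                }
            }
        }
    }
    return out;
}
```

Calling `conv_transpose_2d_ref` with a `std::vector<float>` kernel or a `std::vector<Half>` kernel instantiates the same loop body for both types, so only the element conversion differs between the F32 and F16 paths.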
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: