github ggml-org/llama.cpp b8532


CUDA & CPU: support F32 kernel type for CONV_TRANSPOSE_2D (#17094)

  • Refactored the CUDA 2D transpose implementation to support multiple kernel types: introduced a conv2d_transpose_params struct for parameter management, templated conv2d_transpose_kernel on the kernel type (float and half), and updated ggml_cuda_conv_2d_transpose_p0 to handle both F16 and F32 kernel types.
  • Refactored ggml_compute_forward_conv_transpose_2d on the CPU to support both F16 and F32 tensor types.
  • Reworked the test_conv_transpose_2d structure to take the kernel type as a constructor argument and to generate test cases by iterating over kernel types, replacing hardcoded kernel-type instances with a loop for better maintainability and coverage of both F16 and F32.

  • Update ggml/src/ggml-cuda/conv2d-transpose.cu

Co-authored-by: Aman Gupta amangupta052@gmail.com

  • Update ggml/src/ggml-cpu/ggml-cpu.c

Co-authored-by: Aman Gupta amangupta052@gmail.com

  • Refactored the conv2d transpose implementation by removing the conv2d_transpose_params struct and dispatching via direct kernel launches.
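
    The dispatch described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual ggml code: `kernel_type`, `f16`, `apply`, and `dispatch` are stand-ins (with a float-backed `f16` wrapper in place of CUDA's `half`) showing how a host function can select the float or half instantiation of a templated kernel directly from a runtime type tag.

    ```cpp
    #include <cstdio>
    #include <stdexcept>

    // Hypothetical type tag standing in for ggml's tensor type enum.
    enum class kernel_type { F32, F16 };

    // Stand-in half type: in the real CUDA code this would be `half`;
    // a float-backed wrapper keeps the sketch self-contained.
    struct f16 { float v; };
    inline float to_float(float x) { return x; }
    inline float to_float(f16 x)   { return x.v; }

    // Templated "kernel": scales an input by a kernel weight of type KernelT.
    template <typename KernelT>
    float apply(float input, KernelT weight) {
        return input * to_float(weight);
    }

    // Host-side dispatch: pick the instantiation from the runtime type,
    // analogous to launching the float or half kernel directly.
    float dispatch(kernel_type kt, float input, float weight) {
        switch (kt) {
            case kernel_type::F32: return apply<float>(input, weight);
            case kernel_type::F16: return apply<f16>(input, f16{weight});
        }
        throw std::runtime_error("unsupported kernel type");
    }

    int main() {
        // Both instantiations compute the same result on these inputs.
        std::printf("%.1f %.1f\n",
                    dispatch(kernel_type::F32, 3.0f, 2.0f),
                    dispatch(kernel_type::F16, 3.0f, 2.0f));
        return 0;
    }
    ```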

  • Enhanced the CPU conv2d transpose implementation by introducing a templated kernel type for flexibility across F16 and F32 data types.


Co-authored-by: Aman Gupta amangupta052@gmail.com

macOS/iOS:

Linux:

Windows:

openEuler:
