ggml-org/llama.cpp b9075

cuda: fuse snake activation (mul, sin, sqr, mul, add) (#22667)

  • cuda: fuse snake activation (mul, sin, sqr, mul, add)

Add ggml_cuda_op_snake_fused with F32 / F16 / BF16 templates. The
matcher recognizes the naive 5-op decomposition emitted by audio
decoders (BigVGAN, Vocos) for the snake activation,
y = x + sin(a*x)^2 * inv_b, and rewrites it into a single elementwise
kernel (sketched below).
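
For reference, a minimal sketch of the 5-op graph such a matcher would look for, written against the public ggml API; the tensor names (x, alpha, inv_b) and the surrounding ctx are illustrative assumptions, not code from this PR:

```cpp
// Hedged sketch: the naive snake decomposition (mul, sin, sqr, mul, add)
// as an audio decoder might emit it through the ggml graph API.
// x: input activations; alpha / inv_b: per-channel parameters (names are illustrative).
struct ggml_tensor * ax = ggml_mul(ctx, x, alpha);   // a * x
struct ggml_tensor * s  = ggml_sin(ctx, ax);         // sin(a * x)
struct ggml_tensor * s2 = ggml_sqr(ctx, s);          // sin(a * x)^2
struct ggml_tensor * t  = ggml_mul(ctx, s2, inv_b);  // sin(a * x)^2 * inv_b
struct ggml_tensor * y  = ggml_add(ctx, x, t);       // x + sin(a * x)^2 * inv_b
```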

Add test_snake_fuse, which compares the naive CPU path against the
fused CUDA kernel across F32 / F16 / BF16.

  • cuda: address review feedback from @am17an

Use ggml_cuda_cast for F32/F16/BF16 conversions and rename
kernel_snake to snake_kernel to match upstream conventions.

  • cuda: snake fusion: use fastdiv on T_len (Suggested-by: @am17an)
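
As a rough illustration of where that division shows up, here is a minimal sketch of a fused elementwise snake kernel, assuming per-channel alpha / inv_b tensors laid out so that T_len is the innermost (time) dimension; the kernel name, signature, and indexing are hypothetical, and the plain i / t_len below is the integer division that fastdiv would replace:

```cuda
#include <cuda_fp16.h>
#include <cuda_bf16.h>
#include <cstdint>

// Hedged sketch of a fused snake kernel: y = x + sin(a*x)^2 * inv_b in one pass.
// Assumes alpha / inv_b are per-channel and t_len is the length of the time axis.
template <typename T>
__global__ void snake_fused_sketch(const T * x, const T * alpha, const T * inv_b,
                                   T * y, const int64_t n, const int64_t t_len) {
    const int64_t i = (int64_t) blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) {
        return;
    }
    const int64_t c  = i / t_len;          // channel index (the fastdiv candidate)
    const float   xf = (float) x[i];
    const float   s  = sinf((float) alpha[c] * xf);
    y[i] = (T) (xf + s * s * (float) inv_b[c]);
}
```

Launching it with a plain 1D grid, e.g. snake_fused_sketch<float><<<(n + 255) / 256, 256>>>(x, alpha, inv_b, y, n, t_len), covers every element once.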

  • Update tests/test-backend-ops.cpp

Co-authored-by: Aman Gupta <amangupta052@gmail.com>

  • cuda: snake fusion check add->type matches x->type

Address review feedback from @am17an

  • cuda: snake fusion check add->type matches x->type

Moved for readability (functionally equivalent).
Address review feedback from @am17an


Co-authored-by: Aman Gupta <amangupta052@gmail.com>

Prebuilt binaries are attached for macOS/iOS, Linux, Android, Windows, and openEuler.
