github ggml-org/llama.cpp b7761


ggml webgpu: support for backend sampling (#18880)

  • ggml webgpu: add SOFTPLUS unary operator

Implements SOFTPLUS (log(1 + exp(x))) with f16/f32 support. Uses f32
precision for intermediate calculations to prevent f16 overflow.

  • Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)

  • Register pipelines and device support

  • Follow Vulkan backend numerical stability pattern

  • ggml webgpu: add EXPM1 unary operator

Implements EXPM1 (exp(x) - 1) with f16/f32 support.

  • Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)

  • Register pipelines and device support

  • ggml webgpu: add FLOOR unary operator

Implements FLOOR (rounds down to nearest integer) with f16/f32 support.

  • Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)

  • Register pipelines and device support

  • ggml webgpu: add CEIL unary operator

Implements CEIL (rounds up to nearest integer) with f16/f32 support.

  • Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)

  • Register pipelines and device support

  • ggml webgpu: add ROUND unary operator

Implements ROUND (rounds to nearest integer) with f16/f32 support.

  • Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)

  • Register pipelines and device support

  • ggml webgpu: add TRUNC unary operator

Implements TRUNC (truncates towards zero) with f16/f32 support.

  • Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)

  • Register pipelines and device support

  • docs : update WebGPU support for unary operators (FLOOR, CEIL, ROUND, TRUNC, EXPM1, SOFTPLUS)

  • Updates to webgpu get_memory

  • Add argmax

  • Add argmax, cumsum, sum, sum_rows

  • Add necessary CPY/GET_ROWS operators

  • Support for argsort using multi-pass strategy

  • Update set_rows for i32 indices, move to pre-wgsl

  • Port unary operators to pre-wgsl and support FILL

  • Implement PAD

  • Add support for top-k

  • clean up, scope pipeline init mutex

  • fix newline

  • Add support for log

  • Update LOG for better precision, and ops doc


Co-authored-by: Abhijit Ramesh abhijitramesh2k@gmail.com
