github ggml-org/llama.cpp b7761


ggml webgpu: support for backend sampling (#18880)

  • ggml webgpu: add SOFTPLUS unary operator

Implements SOFTPLUS (log(1 + exp(x))) with f16/f32 support. Uses f32
precision for intermediate calculations to prevent f16 overflow.

  • Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)

  • Register pipelines and device support

  • Follow Vulkan backend numerical stability pattern

  • ggml webgpu: add EXPM1 unary operator

Implements EXPM1 (exp(x) - 1) with f16/f32 support.

  • Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)

  • Register pipelines and device support

  • ggml webgpu: add FLOOR unary operator

Implements FLOOR (rounds down to nearest integer) with f16/f32 support.

  • Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)

  • Register pipelines and device support

  • ggml webgpu: add CEIL unary operator

Implements CEIL (rounds up to nearest integer) with f16/f32 support.

  • Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)

  • Register pipelines and device support

  • ggml webgpu: add ROUND unary operator

Implements ROUND (rounds to nearest integer) with f16/f32 support.

  • Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)

  • Register pipelines and device support

  • ggml webgpu: add TRUNC unary operator

Implements TRUNC (truncates towards zero) with f16/f32 support.

  • Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)

  • Register pipelines and device support

  • docs : update WebGPU support for unary operators (FLOOR, CEIL, ROUND, TRUNC, EXPM1, SOFTPLUS)

  • Updates to webgpu get_memory

  • Add argmax

  • Add argmax, cumsum, sum, sum_rows

  • Add necessary CPY/GET_ROWS operators

  • Support for argsort using multi-pass strategy

  • Update set_rows for i32 indices, move to pre-wgsl

  • Port unary operators to pre-wgsl and support FILL

  • Implement PAD

  • Add support for top-k

  • clean up, scope pipeline init mutex

  • fix newline

  • Add support for log

  • Update LOG for better precision, and ops doc


Co-authored-by: Abhijit Ramesh abhijitramesh2k@gmail.com
