github ggml-org/llama.cpp b7851

ggml webgpu: Split shared state (webgpu_context) into global state and per-thread state (#18976)

  • Squashed commit of the following:

commit b3c6bf4
Author: Abhijit Ramesh abhijitramesh2k@gmail.com
Date: Mon Dec 1 18:29:00 2025 -0800

ggml webgpu: fix xielu parameter passing (#11)

The XIELU operation was incorrectly using static_cast to convert
float parameters to uint32_t, which converted numeric values instead
of preserving IEEE 754 bit patterns. This caused incorrect values
to be interpreted by the GPU shader.

* Use reinterpret_cast to preserve float bit patterns when passing
  through uint32_t params buffer
* Update WGSL shader parameter types from u32 to f32
* Re-enable XIELU support (was disabled due to numerical issues)

Fixes NMSE test failures for XIELU operation on WebGPU backend.
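
The difference between the two conversions can be sketched in a few lines. This is an illustration only, not the code from ggml-webgpu.cpp: the function names are hypothetical, and it copies the bits with std::memcpy (a strictly conforming alternative) where the commit itself names reinterpret_cast.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Value conversion: 0.5f becomes the integer 0, so the fraction is lost.
// (Hypothetical name; this is the kind of conversion the commit removes.)
inline uint32_t float_to_u32_by_value(float x) {
    return static_cast<uint32_t>(x);
}

// Bit-pattern copy: the 32 bits of the float are stored unchanged,
// so a shader that declares the slot as f32 reads the original value.
inline uint32_t float_to_u32_bits(float x) {
    static_assert(sizeof(float) == sizeof(uint32_t), "expects a 32-bit float");
    uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    return bits;
}

// Inverse of float_to_u32_bits, mirroring the f32 read on the GPU side.
inline float u32_bits_to_float(uint32_t bits) {
    float x;
    std::memcpy(&x, &bits, sizeof(x));
    return x;
}

// Small self-check: the bit copy round-trips, the value conversion does not.
inline void xielu_param_self_check() {
    assert(u32_bits_to_float(float_to_u32_bits(0.5f)) == 0.5f);
    assert(float_to_u32_by_value(0.5f) == 0u);
}
```

On a platform with IEEE 754 single-precision floats, float_to_u32_bits(1.0f) yields 0x3F800000, whereas the value conversion yields 1.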

commit 5ca9b5e
Author: neha-ha 137219201+neha-ha@users.noreply.github.com
Date: Tue Nov 18 12:17:00 2025 -0800

Refactored pipelines and workgroup calculations (#10)

* refactored pipelines

* refactored workgroup calculation

* removed commented out block of prior maps

* Clean up ceiling division pattern

---------

Co-authored-by: Neha Abbas <nehaabbas@eduroam-169-233-141-223.ucsc.edu>
Co-authored-by: Reese Levine <reeselevine1@gmail.com>
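
For readers unfamiliar with the ceiling division pattern mentioned above, a minimal sketch follows. The helper names and the workgroup size used in the usage note are assumptions made for illustration; the commit message does not show the actual helper.

```cpp
#include <cassert>
#include <cstdint>

// Smallest n such that n * divisor >= total.
// Written as quotient plus remainder check so that it cannot overflow,
// unlike the common (total + divisor - 1) / divisor form near UINT32_MAX.
constexpr uint32_t ceil_div(uint32_t total, uint32_t divisor) {
    return total / divisor + (total % divisor != 0 ? 1u : 0u);
}

// Hypothetical helper: number of workgroups needed to cover n_elements
// when each workgroup processes wg_size elements.
inline uint32_t workgroups_for(uint32_t n_elements, uint32_t wg_size) {
    assert(wg_size > 0);
    return ceil_div(n_elements, wg_size);
}
```

With a workgroup size of 256, 257 elements need two workgroups and 256 elements need one.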

Author: James Contini jamescontini@gmail.com
Date: Wed Oct 29 23:13:06 2025 -0700

formatted embed wgsl and ggml-webgpu.cpp

commit e1f6bae
Author: James Contini jamescontini@gmail.com
Date: Wed Oct 29 23:08:37 2025 -0700

implemented REPL_Template support and removed bug in unary operators kernel

commit 8c70b8f
Author: James Contini jamescontini@gmail.com
Date: Wed Oct 15 16:14:20 2025 -0700

responded and dealt with PR comments

commit f9282c6
Author: James Contini jamescontini@gmail.com
Date: Sun Oct 12 13:41:41 2025 -0700

removed unnecessary check for whether node->src[1] exists for unary operators

commit 4cf28d7
Author: James Contini jamescontini@gmail.com
Date: Sun Oct 12 13:32:45 2025 -0700

All operators (including xielu) working

commit 74c6add
Author: James Contini jamescontini@gmail.com
Date: Fri Oct 10 13:16:48 2025 -0700

fixed autoconfig

commit 3627499
Author: James Contini jamescontini@gmail.com
Date: Fri Oct 10 13:10:46 2025 -0700

removed vestigial files

commit cb08583
Author: James Contini jamescontini@gmail.com
Date: Fri Oct 10 12:59:32 2025 -0700

abides by editor-config

commit 5360e28
Author: James Contini jamescontini@gmail.com
Date: Fri Oct 10 12:45:57 2025 -0700

rms_norm double declaration bug atoned

commit 7b09baa
Merge: 8a6ec84 74b8fc1
Author: James Contini jamescontini@gmail.com
Date: Fri Oct 10 11:50:03 2025 -0700

resolving merge conflicts

commit 8a6ec84
Author: James Contini jamescontini@gmail.com
Date: Wed Oct 8 18:06:47 2025 -0700

unary operators pass ggml tests

commit c3ae382
Author: James Contini jamescontini@gmail.com
Date: Wed Oct 1 16:22:40 2025 -0700

neg passes backend test

commit aa1c9b2
Author: James Contini jamescontini@gmail.com
Date: Tue Sep 30 23:55:27 2025 -0700

neg f16xf32xip builds and runs, haven't actually run a model that uses neg kernel yet though

Co-authored-by: James Contini jamescontini@gmail.com
Co-authored-by: Neha Abbas neabbas@ucsc.edu
Co-authored-by: Abhijit Ramesh abhijitramesh2k@gmail.com

  • Remove extra code and format

  • Add ops documentation (finally)

  • ggml webgpu: add SOFTPLUS unary operator

Implements SOFTPLUS (log(1 + exp(x))) with f16/f32 support. Uses f32
precision for intermediate calculations to prevent f16 overflow.

  • Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)

  • Register pipelines and device support

  • Follow Vulkan backend numerical stability pattern
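
A host-side sketch of a numerically stable SOFTPLUS is shown here for reference. It is not the WGSL shader: the function names are hypothetical, and the cutoff of 20 is an assumption, since the notes do not state which threshold the WebGPU or Vulkan kernels use. The reason for f32 intermediates is that exp(x) exceeds the largest finite f16 value (65504) once x is above roughly 11.09.

```cpp
#include <cassert>
#include <cmath>

// softplus(x) = log(1 + exp(x)), evaluated in f32.
// For large x the result is indistinguishable from x in f32, so returning x
// directly avoids computing a huge exp(x). The cutoff 20.0f is an assumption.
inline float softplus_f32(float x) {
    if (x > 20.0f) {
        return x;
    }
    return std::log1p(std::exp(x));
}

// Small self-check of the two branches.
inline void softplus_self_check() {
    assert(softplus_f32(100.0f) == 100.0f);           // large-x shortcut
    assert(std::fabs(softplus_f32(0.0f) - 0.693147f)  // log(2)
           < 1e-5f);
}
```

An f16 variant would convert its input to f32, call the function above, and convert the result back to f16.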

  • ggml webgpu: add EXPM1 unary operator

Implements EXPM1 (exp(x) - 1) with f16/f32 support.

  • Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)

  • Register pipelines and device support

  • ggml webgpu: add FLOOR unary operator

Implements FLOOR (rounds down to nearest integer) with f16/f32 support.

  • Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)

  • Register pipelines and device support

  • ggml webgpu: add CEIL unary operator

Implements CEIL (rounds up to nearest integer) with f16/f32 support.

  • Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)

  • Register pipelines and device support

  • ggml webgpu: add ROUND unary operator

Implements ROUND (rounds to nearest integer) with f16/f32 support.

  • Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)

  • Register pipelines and device support

  • ggml webgpu: add TRUNC unary operator

Implements TRUNC (truncates towards zero) with f16/f32 support.

  • Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)

  • Register pipelines and device support
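
The four rounding operators differ mainly on negative inputs and on halfway cases. The sketch below compares them using the C++ standard library on the host; the struct and function names are hypothetical. The tie-breaking rule of the WebGPU ROUND kernel is not stated in these notes, so the sketch uses std::round, which rounds halfway cases away from zero.

```cpp
#include <cassert>
#include <cmath>

// Results of the four rounding modes for one input (hypothetical type).
struct Rounded {
    float floor_v;  // FLOOR: toward negative infinity
    float ceil_v;   // CEIL:  toward positive infinity
    float round_v;  // ROUND: nearest, halfway cases away from zero here
    float trunc_v;  // TRUNC: toward zero
};

inline Rounded round_all(float x) {
    return { std::floor(x), std::ceil(x), std::round(x), std::trunc(x) };
}

// Small self-check on a negative halfway value.
inline void rounding_self_check() {
    const Rounded r = round_all(-1.5f);
    assert(r.floor_v == -2.0f);
    assert(r.ceil_v == -1.0f);
    assert(r.round_v == -2.0f);
    assert(r.trunc_v == -1.0f);
}
```

For -1.5 the four modes give -2, -1, -2, and -1 respectively, which shows that FLOOR and TRUNC only agree for non-negative inputs.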

  • docs : update WebGPU support for unary operators (FLOOR, CEIL, ROUND, TRUNC, EXPM1, SOFTPLUS)

  • Updates to webgpu get_memory

  • Move shared state (webgpu_context) and device creation out of registration context, device context, and buffer context, and move into backend context

  • Small cleanup

  • Move Instance, Device, Adapter, Device creation, and capabilities to global state while moving Queue, pipelines, and buffers to per-thread state.

  • Cleanups

  • More cleanup

  • Move staging_buf mutex to global context

  • Resolve merge

  • Resolve merge

  • Resolve merge

  • Clean up merge errors, delete forward declaration, and run clang-format

  • Rename device_init to backend_init

  • Move webgpu_context to backend_context

  • Move buffer context members into global context and refactor function calls

  • Run clang-format

  • Remove comments

  • Move parameter buffers to per-thread, add single memset_tensor param buf

  • Fix CI compilation issue

  • Fix builds for emscripten not supporting subgroups

  • cleanup

  • cleanup


Co-authored-by: Reese Levine reeselevine1@gmail.com
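
The split between global and per-thread state described in the list above can be pictured roughly as follows. This is a sketch under stated assumptions, not the backend's real layout: the Fake* types stand in for WebGPU handles so the example compiles without a WebGPU library, all struct and field names are hypothetical, and only the name backend_init comes from the notes (its signature here is invented).

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-ins for WebGPU handles (hypothetical).
struct FakeDevice   { int id = 0; };
struct FakeQueue    { int device_id = 0; };
struct FakePipeline { std::string name; };

// Shared by every backend instance: created once, locked where mutable.
struct GlobalState {
    FakeDevice device;                      // instance, adapter, device, capabilities
    std::mutex staging_mutex;               // staging buffer mutex lives globally
    std::vector<unsigned char> staging_buf;
};

// Owned by a single backend instance, so no locking is needed for its members.
struct PerThreadState {
    std::shared_ptr<GlobalState> global;    // shared, not copied
    FakeQueue queue;
    std::unordered_map<std::string, FakePipeline> pipelines;
    std::vector<unsigned int> param_buf;    // parameter buffers are per-thread
};

// Hypothetical signature: builds per-thread state on top of the shared state.
inline PerThreadState backend_init(const std::shared_ptr<GlobalState> & global) {
    assert(global != nullptr);
    PerThreadState st;
    st.global = global;
    st.queue.device_id = global->device.id;
    return st;
}
```

Two backends initialized from the same GlobalState share one device but keep separate pipeline maps and parameter buffers, so registering a pipeline in one does not affect the other.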
