ggml webgpu: Split shared state (webgpu_context) into global state and per-thread state (#18976)
- Squashed commit of the following:
commit b3c6bf4
Author: Abhijit Ramesh abhijitramesh2k@gmail.com
Date: Mon Dec 1 18:29:00 2025 -0800
ggml webgpu: fix xielu parameter passing (#11)
The XIELU operation was incorrectly using static_cast to convert
float parameters to uint32_t, which converted numeric values instead
of preserving IEEE 754 bit patterns. This caused incorrect values
to be interpreted by the GPU shader.
* Use reinterpret_cast to preserve float bit patterns when passing
through uint32_t params buffer
* Update WGSL shader parameter types from u32 to f32
* Re-enable XIELU support (was disabled due to numerical issues)
Fixes NMSE test failures for XIELU operation on WebGPU backend.
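The difference between the two conversions described in this commit can be illustrated with a minimal sketch. The helper names below are hypothetical and are not taken from ggml-webgpu.cpp. The sketch uses `std::memcpy` rather than a pointer `reinterpret_cast`, as an assumption about what is portable: both preserve the IEEE 754 bit pattern, but `memcpy` avoids strict-aliasing concerns.

```cpp
#include <cstdint>
#include <cstring>

// Numeric conversion: 1.5f becomes the integer 1, so the shader would
// later read a completely different value from the params buffer.
inline uint32_t as_u32_numeric(float v) {
    return static_cast<uint32_t>(v);
}

// Bit-preserving conversion: the 4 bytes of the float are copied unchanged
// into a uint32_t slot (hypothetical helper, shown with memcpy).
inline uint32_t as_u32_bits(float v) {
    static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");
    uint32_t bits;
    std::memcpy(&bits, &v, sizeof(bits));
    return bits;
}

// Inverse of as_u32_bits: what a reader of the buffer does when it
// interprets the slot as f32 again.
inline float from_u32_bits(uint32_t bits) {
    float v;
    std::memcpy(&v, &bits, sizeof(v));
    return v;
}
```

With these helpers, `as_u32_numeric(1.5f)` yields `1`, while `as_u32_bits(1.5f)` yields `0x3FC00000`, which round-trips back to `1.5f`.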
commit 5ca9b5e
Author: neha-ha 137219201+neha-ha@users.noreply.github.com
Date: Tue Nov 18 12:17:00 2025 -0800
Refactored pipelines and workgroup calculations (#10)
* refactored pipelines
* refactored workgroup calculation
* removed commented out block of prior maps
* Clean up ceiling division pattern
---------
Co-authored-by: Neha Abbas <nehaabbas@eduroam-169-233-141-223.ucsc.edu>
Co-authored-by: Reese Levine <reeselevine1@gmail.com>
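For readers unfamiliar with the ceiling division pattern mentioned in the commit above, here is a minimal sketch of how a workgroup count is typically derived from an element count. The function name and signature are hypothetical and are not the backend's actual helper.

```cpp
#include <cstdint>

// Hypothetical helper: number of workgroups needed so that
// wg_count * wg_size >= total. Written as quotient plus remainder check
// so it cannot overflow the way (total + wg_size - 1) / wg_size can
// when total is close to UINT32_MAX.
constexpr uint32_t ceil_div(uint32_t total, uint32_t wg_size) {
    return total / wg_size + (total % wg_size != 0 ? 1u : 0u);
}
```

For example, 257 elements with a workgroup size of 256 need two workgroups, while exactly 256 elements need one.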
Author: James Contini jamescontini@gmail.com
Date: Wed Oct 29 23:13:06 2025 -0700
formatted embed wgsl and ggml-webgpu.cpp
commit e1f6bae
Author: James Contini jamescontini@gmail.com
Date: Wed Oct 29 23:08:37 2025 -0700
implemented REPL_Template support and removed bug in unary operators kernel
commit 8c70b8f
Author: James Contini jamescontini@gmail.com
Date: Wed Oct 15 16:14:20 2025 -0700
responded and dealt with PR comments
commit f9282c6
Author: James Contini jamescontini@gmail.com
Date: Sun Oct 12 13:41:41 2025 -0700
removed unnecessary checking if node->src[1] exists for unary operators
commit 4cf28d7
Author: James Contini jamescontini@gmail.com
Date: Sun Oct 12 13:32:45 2025 -0700
All operators (including xielu) working
commit 74c6add
Author: James Contini jamescontini@gmail.com
Date: Fri Oct 10 13:16:48 2025 -0700
fixed autoconfig
commit 3627499
Author: James Contini jamescontini@gmail.com
Date: Fri Oct 10 13:10:46 2025 -0700
removed vestigial files
commit cb08583
Author: James Contini jamescontini@gmail.com
Date: Fri Oct 10 12:59:32 2025 -0700
abides by editor-config
commit 5360e28
Author: James Contini jamescontini@gmail.com
Date: Fri Oct 10 12:45:57 2025 -0700
rms_norm double declaration bug atoned
commit 7b09baa
Merge: 8a6ec84 74b8fc1
Author: James Contini jamescontini@gmail.com
Date: Fri Oct 10 11:50:03 2025 -0700
resolving merge conflicts
commit 8a6ec84
Author: James Contini jamescontini@gmail.com
Date: Wed Oct 8 18:06:47 2025 -0700
unary operators pass ggml tests
commit c3ae382
Author: James Contini jamescontini@gmail.com
Date: Wed Oct 1 16:22:40 2025 -0700
neg passes backend test
commit aa1c9b2
Author: James Contini jamescontini@gmail.com
Date: Tue Sep 30 23:55:27 2025 -0700
neg f16xf32xip builds and runs; haven't actually run a model that uses the neg kernel yet, though
Co-authored-by: James Contini jamescontini@gmail.com
Co-authored-by: Neha Abbas neabbas@ucsc.edu
Co-authored-by: Abhijit Ramesh abhijitramesh2k@gmail.com
- Remove extra code and format
- Add ops documentation (finally)
- ggml webgpu: add SOFTPLUS unary operator
  Implements SOFTPLUS (log(1 + exp(x))) with f16/f32 support. Uses f32
  precision for intermediate calculations to prevent f16 overflow.
- Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
- Register pipelines and device support
- Follow Vulkan backend numerical stability pattern
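To show why the intermediate precision matters for SOFTPLUS, here is a minimal CPU-side sketch. It is not the WGSL shader and not necessarily the formulation used by the Vulkan backend. The rearranged form below is one common numerically stable way to evaluate log(1 + exp(x)), picked here as an assumption for illustration. In f16, exp(x) already exceeds the largest finite value (65504) for x above roughly 11.09. The same overflow happens in f32 for larger inputs, which the sketch demonstrates.

```cpp
#include <cmath>

// Direct evaluation: exp(x) overflows to infinity for large x.
inline float softplus_naive(float x) {
    return std::log(1.0f + std::exp(x));
}

// Assumed stable form: max(x, 0) + log1p(exp(-|x|)).
// The exponential argument is never positive, so it cannot overflow.
inline float softplus_stable(float x) {
    return std::fmax(x, 0.0f) + std::log1p(std::exp(-std::fabs(x)));
}
```

For x = 100 the naive form returns infinity in f32, while the stable form returns 100, which is the correct value to within f32 precision.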
- ggml webgpu: add EXPM1 unary operator
  Implements EXPM1 (exp(x) - 1) with f16/f32 support.
- Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
- Register pipelines and device support
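The inplace and non-inplace variants listed above differ only in where the result is written. A minimal CPU reference sketch of the two shapes for EXPM1 follows. The function names are hypothetical, and the code uses the standard library's `std::expm1` rather than anything from the WebGPU shader.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Non-inplace variant: reads src and writes a separate destination buffer.
inline std::vector<float> expm1_ref(const std::vector<float>& src) {
    std::vector<float> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        dst[i] = std::expm1(src[i]);
    }
    return dst;
}

// Inplace variant: overwrites each element of the same buffer.
inline void expm1_ref_inplace(std::vector<float>& buf) {
    for (float& v : buf) {
        v = std::expm1(v);
    }
}
```

A reference of this kind could be used to compare backend output against expected values for both variants.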
- ggml webgpu: add FLOOR unary operator
  Implements FLOOR (rounds down to nearest integer) with f16/f32 support.
- Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
- Register pipelines and device support
- ggml webgpu: add CEIL unary operator
  Implements CEIL (rounds up to nearest integer) with f16/f32 support.
- Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
- Register pipelines and device support
- ggml webgpu: add ROUND unary operator
  Implements ROUND (rounds to nearest integer) with f16/f32 support.
- Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
- Register pipelines and device support
- ggml webgpu: add TRUNC unary operator
  Implements TRUNC (truncates towards zero) with f16/f32 support.
- Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
- Register pipelines and device support
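The four rounding operators above differ mainly on negative and fractional inputs. The sketch below shows the C++ standard library semantics side by side. The struct and function names are hypothetical and the code is a CPU illustration, not the shader. Tie-breaking on halfway values (such as 2.5) can differ between libraries and shading languages, so that case is worth checking separately against the backend.

```cpp
#include <cmath>

// Results of the four rounding modes for one input (hypothetical type).
struct Rounded {
    float floor_v;  // toward negative infinity
    float ceil_v;   // toward positive infinity
    float round_v;  // to nearest; std::round sends halfway cases away from zero
    float trunc_v;  // toward zero
};

inline Rounded round_all(float x) {
    return Rounded{ std::floor(x), std::ceil(x), std::round(x), std::trunc(x) };
}
```

For x = -1.7 this gives floor = -2, ceil = -1, round = -2, and trunc = -1. For x = 1.7, floor and trunc agree (1), as do ceil and round (2).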
- docs : update WebGPU support for unary operators (FLOOR, CEIL, ROUND, TRUNC, EXPM1, SOFTPLUS)
- Updates to webgpu get_memory
- Move shared state (webgpu_context) and device creation out of registration context, device context, and buffer context, and move into backend context
- Small cleanup
- Move Instance, Device, Adapter, Device creation, and capabilities to global state while moving Queue, pipelines, and buffers to per-thread state.
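The global versus per-thread split described above can be pictured with a minimal sketch. All type and function names here are hypothetical stand-ins, not the real WebGPU handles or the structs in ggml-webgpu.cpp. The mutex in the global struct is included only as an illustrative assumption about how shared resources might be guarded.

```cpp
#include <cstdint>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-ins for WebGPU handles.
struct FakeDevice { std::string name; };
struct FakeQueue  { int id; };

// Created once and shared by every thread: device-level objects and
// capabilities, plus a lock for anything that remains shared.
struct GlobalState {
    FakeDevice device;
    uint32_t   max_wg_size = 256;  // stands in for device capabilities
    std::mutex shared_mutex;       // assumption: guards shared resources
};

// Owned by one thread: its queue and scratch buffers, plus a reference
// back to the shared global state.
struct ThreadState {
    std::shared_ptr<GlobalState> global;
    FakeQueue                    queue;
    std::vector<uint32_t>        param_buf;
};

// Function-local static initialization is thread-safe since C++11.
inline std::shared_ptr<GlobalState> get_global() {
    static std::shared_ptr<GlobalState> g = std::make_shared<GlobalState>();
    return g;
}

inline ThreadState make_thread_state(int id) {
    return ThreadState{ get_global(), FakeQueue{ id }, {} };
}
```

In this arrangement, two `ThreadState` objects point at the same `GlobalState` but never touch each other's `param_buf`, so only access to the global part needs locking.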
- Cleanups
- More cleanup
- Move staging_buf mutex to global context
- Resolve merge
- Resolve merge
- Resolve merge
- Clean up merge errors, delete forward declaration, and run clang-format
- Rename device_init to backend_init
- Move webgpu_context to backend_context
- Move buffer context members into global context and refactor function calls
- Run clang-format
- Remove comments
- Move parameter buffers to per-thread, add single memset_tensor param buf
- Fix CI compilation issue
- Fix builds for emscripten not supporting subgroups
- cleanup
- cleanup
Co-authored-by: Reese Levine reeselevine1@gmail.com
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: