github ggml-org/llama.cpp b7962

ggml-webgpu: JIT compile binary operators and handle binding overlaps (#19310)

  • ggml webgpu: port binary operators to use pre-wgsl

  • Add binary.wgsl: unified shader with conditionals for all 4 ops

  • Add gen_binary_shaders.cpp: build tool for using pre_wgsl preprocessor

  • Remove bin_op.tmpl.wgsl and the Python-templated binary.wgsl (replaced by the new unified shader)

  • Update CMake to generate binary operator shaders at build time

  • ggml-webgpu: migrate binary ops to JIT compilation with overlap handling

  • port binary operators from ahead-of-time (AOT) compilation to pre-wgsl JIT compilation

  • add src1=dst overlap handling for binary ops

  • use compile-time workgroup size defines instead of runtime overrides

  • ggml-webgpu: complete overlap handling for binary ops

  • add support for the in-place and overlap cases in binding setup

  • restructure conditional logic to handle all overlap cases

  • ensure all buffer bindings are correctly assigned for edge cases

  • ggml-webgpu: remove unused binary overlap cases

Remove src0==src1 binary overlap case that never occurs in practice.

  • keep INPLACE (src0==dst), OVERLAP (src1==dst), DEFAULT

  • remove unused src0==src1 and all-same variant

  • refactor wgsl to eliminate duplication

Prebuilt binaries: macOS/iOS, Linux, Windows, openEuler.
