ggml-org/llama.cpp release b8882


ggml-webgpu(shader): support conv2d kernels. (#21964)

  • ggml(webgpu): fix busy-polling in waitAny under Emscripten after #20618, and remove the noisy webgpu log

  • Merge with upstream

  • Fix GET_ROWS packed integer NaN when using f16 as memory buffer in shader quants

  • Update Unary wgsl EXP and EXPM1 for f16 stability

  • Fix GET_ROWS IQ4_XS struct for NaN f16 canonicalization

  • Fix numerical precision for unary sqrt when working with f16

  • Fix NaN canonicalization for packed integers using f16

  • Update err threshold for binary div ops when using f16

  • backend: Keep one Dawn/WebGPU instance alive for the lifetime of the static backend

  • clean: uncomment existing code logs

  • clean: remove unnecessary debug info

  • Refactor and generalize dequant helpers

  • Remove deprecated quant structs

  • Refactor shader defines to reduce repetition

  • Remove error override for F16 type

  • fix: restore the proper initialization of ctx that was accidentally removed

  • clean: clean legacy and format code

  • fix: did not modify test ops

  • shader(conv2d): add conv2d shader kernels and pass f32 and f16 tests

  • shader(conv2d): fix the out of bounds memory access in the weight indexing

  • shader(conv2d): clean unused variables and optimize the computation

  • merge: use the new entries function

  • clean: address the formatting issues

  • clean: address the warning issues

  • clean: fix the shader editorconfig-checker issues

  • clean: fix the shader editorconfig-checker issues with UTF-8


Co-authored-by: Jeremy J. Hartmann <jeremy@mtion.tv>

Release assets: macOS/iOS, Linux, Android, Windows, openEuler.
