ggml-org/llama.cpp release b8882


ggml-webgpu(shader): support conv2d kernels. (#21964)

  • ggml(webgpu): fix busy-polling in waitAny under Emscripten after #20618, and remove the noisy webgpu log

  • Merge with upstream

  • Fix GET_ROWS packed integer NaN when using f16 as memory buffer in shader quants

  • Update Unary wgsl EXP and EXPM1 for f16 stability

  • Fix GET_ROWS IQ4_XS struct for NaN f16 canonicalization

  • Fix numerical precision for unary sqrt when working with f16

  • Fix NaN canonicalization for packed integers using f16

  • Update err threshold for binary div ops when using f16

  • backend: Keep one Dawn/WebGPU instance alive for the lifetime of the static backend

  • clean: uncomment existing code logs

  • clean: remove unnecessary debug info

  • Refactor and generalize dequant helpers

  • Remove deprecated quant structs

  • Refactor shader defines to reduce repetition

  • Remove error override for F16 type

  • fix: restore the proper initialization of ctx that was accidentally removed

  • clean: clean legacy and format code

  • fix: did not modify test ops

  • shader(conv2d): add conv2d shader kernels and pass f32 and f16 tests

  • shader(conv2d): fix the out of bounds memory access in the weight indexing

  • shader(conv2d): clean unused variables and optimize the computation

  • merge: use the new entries function

  • clean: address the formatting issues

  • clean: address the warning issues

  • clean: fix the shader editorconfig-checker issues

  • clean: fix the shader editorconfig-checker issues with UTF-8


Co-authored-by: Jeremy J. Hartmann <jeremy@mtion.tv>

Release assets: macOS/iOS, Linux, Android, Windows, openEuler.
