github ggml-org/llama.cpp b8749


ggml-webgpu: address quantization precision and backend lifecycle management (#21521)

  • ggml(webgpu): fix busy-polling in waitAny under Emscripten (after #20618), and remove the busy-wait webgpu log

  • Merge with upstream

  • Fix GET_ROWS packed integer NaN when using f16 as memory buffer in shader quants

  • Update Unary wgsl EXP and EXPM1 for f16 stability

  • Fix GET_ROWS IQ4_XS struct for f16 NaN canonicalization

  • Fix numerical precision for unary sqrt when working with f16

  • Fix NaN canonicalization for packed integers using f16

  • Update err threshold for binary div ops when using f16

  • backend: Keep one Dawn/WebGPU instance alive for the lifetime of the static backend

  • clean: uncomment existing code logs

  • clean: remove unnecessary debug info

  • Refactor and generalize dequant helpers

  • Remove deprecated quant structs

  • Refactor shader defines to reduce repetition

  • Remove error override for F16 type

  • fix: fix the accidental removal of the proper initialization of ctx

  • clean: clean legacy and format code

  • fix: do not modify test ops


Co-authored-by: Jeremy J. Hartmann jeremy@mtion.tv

macOS/iOS:

Linux:

Windows:

openEuler:
