ggml-org/llama.cpp release b7247


Warning

Release format update: Linux releases will soon be published as .tar.gz archives instead of .zip. Update your deployment scripts accordingly.

ggml webgpu: add support for emscripten builds (#17184)

  • Faster tensors (#8)

Add fast matrix and matrix/vector multiplication.

  • Use map for shader replacements instead of pair of strings

  • Wasm (#9)

  • webgpu : fix build on emscripten

  • more debugging stuff

  • test-backend-ops: force single thread on wasm

  • fix single-thread case for init_tensor_uniform

  • use jspi

  • add pthread

  • test: remember to set n_thread for cpu backend

  • Add buffer label and enable dawn-specific toggles to turn off some checks

  • Intermediate state

  • Fast working f16/f32 vec4

  • Working float fast mul mat

  • Clean up naming of mul_mat to match logical model, start work on q mul_mat

  • Setup for subgroup matrix mat mul

  • Basic working subgroup matrix

  • Working subgroup matrix tiling

  • Handle weirder sg matrix sizes (but still a multiple of the sg matrix size)

  • Working start to gemv

  • working f16 accumulation with shared memory staging

  • Print out available subgroup matrix configurations

  • Vectorize dst stores for sg matrix shader

  • Gemv working scalar

  • Minor set_rows optimization (#4)

  • updated optimization, fixed errors

  • non vectorized version now dispatches one thread per element

  • Simplify

  • Change logic for set_rows pipelines


Co-authored-by: Neha Abbas nehaabbas@macbookpro.lan
Co-authored-by: Neha Abbas nehaabbas@ReeseLevines-MacBook-Pro.local
Co-authored-by: Reese Levine reeselevine1@gmail.com
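The commits above build the fast mul_mat path around a tiled accumulation scheme with shared-memory staging. As a rough illustration of that structure, here is a hypothetical CPU sketch: the output is computed in TILE x TILE blocks, and input tiles are first staged into small local buffers (the CPU analogue of workgroup shared memory) before the inner accumulation loop. The tile size and layout are illustrative, not the shader's actual parameters.

```cpp
#include <cstddef>

constexpr std::size_t TILE = 4;  // illustrative tile size

// C[M x N] = A[M x K] * B[K x N], all row-major. Dimensions are assumed to
// be multiples of TILE for brevity; the real shaders also handle remainders
// (see "Handle weirder sg matrix sizes").
void mul_mat_tiled(const float *A, const float *B, float *C,
                   std::size_t M, std::size_t N, std::size_t K) {
    for (std::size_t i0 = 0; i0 < M; i0 += TILE) {
        for (std::size_t j0 = 0; j0 < N; j0 += TILE) {
            float acc[TILE][TILE] = {};          // per-output-tile accumulator
            for (std::size_t k0 = 0; k0 < K; k0 += TILE) {
                // stage one tile of A and one tile of B into "shared" storage
                float As[TILE][TILE], Bs[TILE][TILE];
                for (std::size_t i = 0; i < TILE; ++i) {
                    for (std::size_t k = 0; k < TILE; ++k) {
                        As[i][k] = A[(i0 + i) * K + (k0 + k)];
                        Bs[k][i] = B[(k0 + k) * N + (j0 + i)];
                    }
                }
                // accumulate the staged tile product
                for (std::size_t i = 0; i < TILE; ++i)
                    for (std::size_t j = 0; j < TILE; ++j)
                        for (std::size_t k = 0; k < TILE; ++k)
                            acc[i][j] += As[i][k] * Bs[k][j];
            }
            // write the finished output tile
            for (std::size_t i = 0; i < TILE; ++i)
                for (std::size_t j = 0; j < TILE; ++j)
                    C[(i0 + i) * N + (j0 + j)] = acc[i][j];
        }
    }
}
```

On a GPU the staging step is what lets a whole workgroup reuse each loaded tile; the subgroup-matrix variant replaces the innermost scalar loop with a hardware matrix-multiply instruction.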

  • Comment on dawn toggles

  • Working subgroup matrix code for (semi)generic sizes

  • Remove some comments

  • Cleanup code

  • Update dawn version and move to portable subgroup size

  • Try to fix new dawn release

  • Update subgroup size comment

  • Only check for subgroup matrix configs if they are supported

  • Add toggles for subgroup matrix/f16 support on nvidia+vulkan

  • Make row/col naming consistent

  • Refactor shared memory loading

  • Move sg matrix stores to correct file

  • Working q4_0

  • Formatting

  • Work with emscripten builds

  • Fix test-backend-ops emscripten for f16/quantized types

  • Use emscripten memory64 to support get_memory

  • Add build flags and try ci


Co-authored-by: Xuan Son Nguyen son@huggingface.co
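The "Working q4_0" commit brings a quantized type to the WebGPU mul_mat path. As a hedged sketch of what dequantizing that format involves: Q4_0 packs 32 weights per block as 4-bit values with a single per-block scale, and a stored value q in [0, 15] decodes to d * (q - 8). The struct below is hypothetical and simplified (a float scale instead of the real format's fp16), but follows the common Q4_0 nibble layout.

```cpp
#include <cstdint>
#include <cstddef>

constexpr std::size_t QK4_0 = 32;  // weights per block

struct block_q4_0_sketch {
    float   d;               // per-block scale (fp16 in the real format)
    uint8_t qs[QK4_0 / 2];   // 32 x 4-bit quants, two per byte
};

// Decode one block: low nibble of qs[j] is element j, high nibble is
// element j + 16 (layout assumed here for illustration).
void dequantize_q4_0(const block_q4_0_sketch &b, float out[QK4_0]) {
    for (std::size_t j = 0; j < QK4_0 / 2; ++j) {
        const int lo = b.qs[j] & 0x0F;
        const int hi = b.qs[j] >> 4;
        out[j]             = b.d * (lo - 8);
        out[j + QK4_0 / 2] = b.d * (hi - 8);
    }
}
```

In the shader this decode runs inline while loading tiles, so quantized mul_mat reuses the same tiling structure as the float path.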

  • Remove extra whitespace

  • Move wasm single-thread logic out of test-backend-ops for cpu backend

  • Disable multiple threads for emscripten single-thread builds in ggml_graph_plan

  • Fix .gitignore

  • Add memory64 option and remove unneeded macros for setting threads to 1


Co-authored-by: Xuan Son Nguyen son@huggingface.co
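The single-thread handling described above (forcing one thread when pthreads are unavailable) can be sketched as follows. The helper function is hypothetical; in ggml the equivalent check lives inside ggml_graph_plan. Emscripten defines __EMSCRIPTEN_PTHREADS__ only when building with -pthread, so a non-pthread wasm build must clamp the planned thread count to one.

```cpp
#include <algorithm>

// True only for Emscripten builds compiled without pthread support.
#if defined(__EMSCRIPTEN__) && !defined(__EMSCRIPTEN_PTHREADS__)
#define SKETCH_SINGLE_THREAD_BUILD 1
#else
#define SKETCH_SINGLE_THREAD_BUILD 0
#endif

// Hypothetical helper: resolve the thread count graph planning should use.
int effective_n_threads(int requested) {
    int n = std::max(requested, 1);  // never plan for zero threads
    if (SKETCH_SINGLE_THREAD_BUILD) {
        n = 1;                       // no pthreads under single-thread wasm
    }
    return n;
}
```

This keeps the clamp in one place instead of scattering per-call macros, matching the "remove unneeded macros for setting threads to 1" commit.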

Prebuilt release archives are available for macOS/iOS, Linux, Windows, and openEuler.
