ggml-org/llama.cpp release b7247


Warning

Release format update: Linux releases will soon be published as .tar.gz archives instead of .zip. Update your deployment scripts accordingly.

ggml webgpu: add support for emscripten builds (#17184)

  • Faster tensors (#8)

Add fast matrix and matrix/vector multiplication.

  • Use map for shader replacements instead of pair of strings

  • Wasm (#9)

  • webgpu : fix build on emscripten

  • more debugging stuff

  • test-backend-ops: force single thread on wasm

  • fix single-thread case for init_tensor_uniform

  • use jspi

  • add pthread

  • test: remember to set n_thread for cpu backend

  • Add buffer label and enable dawn-specific toggles to turn off some checks

  • Intermediate state

  • Fast working f16/f32 vec4

  • Working float fast mul mat

  • Clean up naming of mul_mat to match logical model, start work on q mul_mat

  • Setup for subgroup matrix mat mul

  • Basic working subgroup matrix

  • Working subgroup matrix tiling

  • Handle weirder sg matrix sizes (but still a multiple of the sg matrix size)

  • Working start to gemv

  • working f16 accumulation with shared memory staging

  • Print out available subgroup matrix configurations

  • Vectorize dst stores for sg matrix shader

  • Gemv working scalar

  • Minor set_rows optimization (#4)

  • updated optimization, fixed errors

  • non vectorized version now dispatches one thread per element

  • Simplify

  • Change logic for set_rows pipelines


Co-authored-by: Neha Abbas nehaabbas@macbookpro.lan
Co-authored-by: Neha Abbas nehaabbas@ReeseLevines-MacBook-Pro.local
Co-authored-by: Reese Levine reeselevine1@gmail.com
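The commits above build the fast mul_mat path around a tiled accumulation scheme with shared-memory staging. As a rough illustration of that structure, here is a hypothetical CPU sketch: the output is computed in TILE x TILE blocks, and input tiles are first staged into small local buffers (the CPU analogue of workgroup shared memory) before the inner accumulation loop. The tile size and layout are illustrative, not the shader's actual parameters.

```cpp
#include <cstddef>

constexpr std::size_t TILE = 4;  // illustrative tile size

// C[M x N] = A[M x K] * B[K x N], all row-major. Dimensions are assumed to
// be multiples of TILE for brevity; the real shaders also handle remainders
// (see "Handle weirder sg matrix sizes").
void mul_mat_tiled(const float *A, const float *B, float *C,
                   std::size_t M, std::size_t N, std::size_t K) {
    for (std::size_t i0 = 0; i0 < M; i0 += TILE) {
        for (std::size_t j0 = 0; j0 < N; j0 += TILE) {
            float acc[TILE][TILE] = {};          // per-output-tile accumulator
            for (std::size_t k0 = 0; k0 < K; k0 += TILE) {
                // stage one tile of A and one tile of B into "shared" storage
                float As[TILE][TILE], Bs[TILE][TILE];
                for (std::size_t i = 0; i < TILE; ++i) {
                    for (std::size_t k = 0; k < TILE; ++k) {
                        As[i][k] = A[(i0 + i) * K + (k0 + k)];
                        Bs[k][i] = B[(k0 + k) * N + (j0 + i)];
                    }
                }
                // accumulate the staged tile product
                for (std::size_t i = 0; i < TILE; ++i)
                    for (std::size_t j = 0; j < TILE; ++j)
                        for (std::size_t k = 0; k < TILE; ++k)
                            acc[i][j] += As[i][k] * Bs[k][j];
            }
            // write the finished output tile
            for (std::size_t i = 0; i < TILE; ++i)
                for (std::size_t j = 0; j < TILE; ++j)
                    C[(i0 + i) * N + (j0 + j)] = acc[i][j];
        }
    }
}
```

On a GPU the staging step is what lets a whole workgroup reuse each loaded tile; the subgroup-matrix variant replaces the innermost scalar loop with a hardware matrix-multiply instruction.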

  • Comment on dawn toggles

  • Working subgroup matrix code for (semi)generic sizes

  • Remove some comments

  • Cleanup code

  • Update dawn version and move to portable subgroup size

  • Try to fix new dawn release

  • Update subgroup size comment

  • Only check for subgroup matrix configs if they are supported

  • Add toggles for subgroup matrix/f16 support on nvidia+vulkan

  • Make row/col naming consistent

  • Refactor shared memory loading

  • Move sg matrix stores to correct file

  • Working q4_0

  • Formatting

  • Work with emscripten builds

  • Fix test-backend-ops emscripten for f16/quantized types

  • Use emscripten memory64 to support get_memory

  • Add build flags and try ci


Co-authored-by: Xuan Son Nguyen son@huggingface.co
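The "Working q4_0" commit brings a quantized type to the WebGPU mul_mat path. As a hedged sketch of what dequantizing that format involves: Q4_0 packs 32 weights per block as 4-bit values with a single per-block scale, and a stored value q in [0, 15] decodes to d * (q - 8). The struct below is hypothetical and simplified (a float scale instead of the real format's fp16), but follows the common Q4_0 nibble layout.

```cpp
#include <cstdint>
#include <cstddef>

constexpr std::size_t QK4_0 = 32;  // weights per block

struct block_q4_0_sketch {
    float   d;               // per-block scale (fp16 in the real format)
    uint8_t qs[QK4_0 / 2];   // 32 x 4-bit quants, two per byte
};

// Decode one block: low nibble of qs[j] is element j, high nibble is
// element j + 16 (layout assumed here for illustration).
void dequantize_q4_0(const block_q4_0_sketch &b, float out[QK4_0]) {
    for (std::size_t j = 0; j < QK4_0 / 2; ++j) {
        const int lo = b.qs[j] & 0x0F;
        const int hi = b.qs[j] >> 4;
        out[j]             = b.d * (lo - 8);
        out[j + QK4_0 / 2] = b.d * (hi - 8);
    }
}
```

In the shader this decode runs inline while loading tiles, so quantized mul_mat reuses the same tiling structure as the float path.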

  • Remove extra whitespace

  • Move wasm single-thread logic out of test-backend-ops for cpu backend

  • Disable multiple threads for emscripten single-thread builds in ggml_graph_plan

  • Fix .gitignore

  • Add memory64 option and remove unneeded macros for setting threads to 1


Co-authored-by: Xuan Son Nguyen son@huggingface.co
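The single-thread handling described above (forcing one thread when pthreads are unavailable) can be sketched as follows. The helper function is hypothetical; in ggml the equivalent check lives inside ggml_graph_plan. Emscripten defines __EMSCRIPTEN_PTHREADS__ only when building with -pthread, so a non-pthread wasm build must clamp the planned thread count to one.

```cpp
#include <algorithm>

// True only for Emscripten builds compiled without pthread support.
#if defined(__EMSCRIPTEN__) && !defined(__EMSCRIPTEN_PTHREADS__)
#define SKETCH_SINGLE_THREAD_BUILD 1
#else
#define SKETCH_SINGLE_THREAD_BUILD 0
#endif

// Hypothetical helper: resolve the thread count graph planning should use.
int effective_n_threads(int requested) {
    int n = std::max(requested, 1);  // never plan for zero threads
    if (SKETCH_SINGLE_THREAD_BUILD) {
        n = 1;                       // no pthreads under single-thread wasm
    }
    return n;
}
```

This keeps the clamp in one place instead of scattering per-call macros, matching the "remove unneeded macros for setting threads to 1" commit.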

Prebuilt release archives are available for macOS/iOS, Linux, Windows, and openEuler.
