github ggml-org/llama.cpp b8091


ggml webgpu: shader library organization (#19530)

  • Basic JIT compilation for mul_mat, get_rows, and scale (#17)

  • scale JIT working

  • Preliminary working JIT for get_rows and mul_mat; needs refining

  • Simplified the mul_mat preprocessing switch statement

  • get_rows fixes and mul_mat refinement

  • Formatting and final edits

  • Removed extraneous debug prints

  • Fixed get_rows and workgroup dispatch in mul_mat; no more garbled output

  • Small fix

  • Further changes, working

  • get_rows and mul_mat JIT fixed and working

  • Update formatting

  • Formatting

  • Add header


Co-authored-by: Neha Abbas nehaabbas@ReeseLevines-MacBook-Pro.local
Co-authored-by: Reese Levine reeselevine1@gmail.com
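
The JIT bullets above boil down to specializing shader source text at runtime instead of shipping one precompiled variant per type combination. As a minimal sketch (not the actual llama.cpp implementation; `specialize_shader`, `shader_cache`, and the `{{KEY}}` placeholder syntax are all hypothetical), the idea is textual substitution into a WGSL template plus a cache so each variant is built once:

```cpp
#include <string>
#include <unordered_map>

// Specialize a shader template by textual substitution: every occurrence
// of a {{KEY}} placeholder is replaced with its mapped value.
// Names and placeholder syntax here are illustrative, not llama.cpp's.
std::string specialize_shader(std::string src,
                              const std::unordered_map<std::string, std::string> & repls) {
    for (const auto & [key, value] : repls) {
        const std::string placeholder = "{{" + key + "}}";
        size_t pos = 0;
        while ((pos = src.find(placeholder, pos)) != std::string::npos) {
            src.replace(pos, placeholder.size(), value);
            pos += value.size();
        }
    }
    return src;
}

// Cache specialized variants so each (shader, specialization) pair is
// generated only once; a real backend would store a compiled pipeline,
// here the "compiled" artifact is just the specialized source string.
struct shader_cache {
    std::unordered_map<std::string, std::string> cache;

    const std::string & get(const std::string & key,
                            const std::string & templ,
                            const std::unordered_map<std::string, std::string> & repls) {
        auto it = cache.find(key);
        if (it == cache.end()) {
            it = cache.emplace(key, specialize_shader(templ, repls)).first;
        }
        return it->second;
    }
};
```

The cache key would typically encode the operation plus the tensor types involved (e.g. `"mul_mat_f16_f32"`), so a repeated graph evaluation reuses already-built pipelines.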

  • Start work on all-encompassing shader library

  • Refactor argmax and set_rows

  • Refactor all shaders except flash attention and matrix multiplication

  • Flash attention and matrix multiplication moved to the new format

  • Clean up preprocessing

  • Formatting

  • Remove duplicate constants

  • Split large shaders into multiple static strings


Co-authored-by: neha-ha 137219201+neha-ha@users.noreply.github.com
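
The last bullet, splitting large shaders into multiple static strings, is a common workaround for compiler limits on string-literal length and for sharing common chunks between shaders. A minimal sketch (segment names and the helper are hypothetical, not llama.cpp's API) is to keep per-section source fragments as static strings and rejoin them before compilation:

```cpp
#include <string>
#include <vector>

// A large shader's source held as static segments (for example, shared
// declarations in one chunk and the entry point in another); the WGSL
// content below is a placeholder, not a real kernel.
static const char * mul_mat_decls = "struct Params { m: u32, n: u32, k: u32 };\n";
static const char * mul_mat_body  = "fn mul_mat_main() { /* ... */ }\n";

// Rejoin the segments, in order, into one compilable source string.
std::string assemble_shader(const std::vector<const char *> & parts) {
    std::string src;
    for (const char * p : parts) {
        src += p;
    }
    return src;
}
```

Shared fragments (constants, helper functions) can then appear in several shaders' part lists without being duplicated in the source tree.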
