ggml-org/llama.cpp release b8276


ggml webgpu: faster normal quant and some k-quant matrix operations, better shader parameter handling (#20173)

  • K-quant speedup (#20)

  • Basic JIT compilation for mul_mat, get_rows, and scale (#17) (see the JIT sketch after this list)

  • scale JIT working

  • Preliminary working JIT for get_rows and mul_mat; needs refining

  • Simplified the mul_mat preprocessing switch statement

  • get_rows fixes, mul_mat refinement

  • Formatting and final edits

  • Removed some extraneous debug prints

  • Fixed get_rows and fixed workgroup dispatch in mul_mat; no more gibberish output

  • small fix

  • Further changes; working state

  • get_rows and mul_mat JIT fixed and working

  • Update formatting

  • Formatting

  • Add header


Co-authored-by: Neha Abbas nehaabbas@ReeseLevines-MacBook-Pro.local
Co-authored-by: Reese Levine reeselevine1@gmail.com
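
The "JIT compilation" items above refer to assembling WGSL shader source at runtime for the requested operation and variant, compiling it on first use, and caching the resulting pipeline instead of pre-building every combination. Below is a minimal sketch of that pattern for the scale shader, assuming Dawn-style C++ bindings (webgpu/webgpu_cpp.h); the function name, cache key, shader text, and workgroup-size handling are illustrative rather than the backend's actual code, and descriptor names can differ between webgpu.h revisions.

```cpp
#include <webgpu/webgpu_cpp.h>
#include <string>
#include <unordered_map>

// Illustrative cache: one compiled pipeline per (op, variant) key, so each
// runtime-generated shader is built only once.
static std::unordered_map<std::string, wgpu::ComputePipeline> g_pipeline_cache;

// Hypothetical helper: build (or fetch) the scale pipeline for a given workgroup size.
static wgpu::ComputePipeline get_scale_pipeline(wgpu::Device device, uint32_t wg_size) {
    std::string key = "scale/wg" + std::to_string(wg_size);
    auto it = g_pipeline_cache.find(key);
    if (it != g_pipeline_cache.end()) {
        return it->second;
    }

    // Assemble WGSL source at runtime; mul_mat and get_rows would splice in a
    // per-quant-type decode function the same way.
    std::string src =
        "struct Params { n : u32, scale : f32 }\n"
        "@group(0) @binding(0) var<storage, read_write> dst : array<f32>;\n"
        "@group(0) @binding(1) var<uniform> params : Params;\n"
        "@compute @workgroup_size(" + std::to_string(wg_size) + ")\n"
        "fn main(@builtin(global_invocation_id) gid : vec3<u32>) {\n"
        "    if (gid.x < params.n) { dst[gid.x] = dst[gid.x] * params.scale; }\n"
        "}\n";

    // Compile the module and create the compute pipeline on demand.
    wgpu::ShaderModuleWGSLDescriptor wgsl_desc{};
    wgsl_desc.code = src.c_str();
    wgpu::ShaderModuleDescriptor module_desc{};
    module_desc.nextInChain = &wgsl_desc;
    wgpu::ShaderModule module = device.CreateShaderModule(&module_desc);

    wgpu::ComputePipelineDescriptor pipeline_desc{};
    pipeline_desc.compute.module     = module;
    pipeline_desc.compute.entryPoint = "main";
    wgpu::ComputePipeline pipeline = device.CreateComputePipeline(&pipeline_desc);

    g_pipeline_cache.emplace(key, pipeline);
    return pipeline;
}
```

Keying the cache on the generated parameters means only the shader variants a given model actually exercises ever get compiled, instead of shipping and instantiating every combination up front.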

  • Start work on all-encompassing shader library

  • Refactor argmax and set_rows

  • Refactor all shaders except flash attention and mat mul

  • All k-quants added and merged; output correct (no gibberish)

  • vec memory fix

  • q6_k matching the Metal backend on my machine; tests passing

  • Set tile size for q6_k separately (see the tile-size sketch after this list)

  • Separate out fast shaders


Co-authored-by: neha-ha 137219201+neha-ha@users.noreply.github.com
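
As a rough illustration of "set tile size for q6_k separately": the matrix-multiply path can pick a tile and workgroup configuration per quantization type, giving q6_k its own shape while other types share a default. This is a sketch only; the ggml_type names come from ggml.h, but mul_mat_tile_for_type and the numeric values are placeholders, not the backend's actual tuning.

```cpp
#include "ggml.h"
#include <cstdint>

// Illustrative tile configuration for a mul_mat dispatch.
struct tile_config {
    uint32_t tile_m;   // output rows covered per workgroup
    uint32_t tile_n;   // output columns covered per workgroup
    uint32_t wg_size;  // threads per workgroup
};

// Hypothetical helper: most quant types share a default, q6_k gets its own tile.
static tile_config mul_mat_tile_for_type(ggml_type type) {
    switch (type) {
        case GGML_TYPE_Q6_K:
            // q6_k decodes 6-bit super-blocks and is tuned with its own tile shape.
            return { 32, 32, 128 };   // placeholder values
        case GGML_TYPE_Q4_K:
        case GGML_TYPE_Q5_K:
            return { 64, 32, 256 };   // placeholder values
        default:
            return { 64, 64, 256 };   // placeholder values
    }
}
```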

  • Move towards writeBuffer for params (see the sketch after this list)

  • Move away from multiple buffers for set_rows errors; remove the host buffer for parameter buffers; minor cleanups

  • Remove extra file

  • Formatting


Co-authored-by: neha-ha 137219201+neha-ha@users.noreply.github.com
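
"Move towards writeBuffer for params" points at uploading the small per-dispatch parameter structs with a direct queue write instead of staging them through a host-mapped buffer plus a copy command. A minimal sketch, again assuming Dawn-style C++ bindings; the Params layout and helper names here are illustrative, not the backend's actual structures.

```cpp
#include <webgpu/webgpu_cpp.h>
#include <cstdint>

// Illustrative per-dispatch shader parameters.
struct Params {
    uint32_t ne0;
    uint32_t ne1;
    float    scale;
    uint32_t pad;   // padding; uniform data is commonly kept 16-byte aligned
};

// Create a uniform buffer that can be updated with queue writes.
static wgpu::Buffer create_params_buffer(wgpu::Device device) {
    wgpu::BufferDescriptor desc{};
    desc.size  = sizeof(Params);
    desc.usage = wgpu::BufferUsage::Uniform | wgpu::BufferUsage::CopyDst;
    return device.CreateBuffer(&desc);
}

// One call replaces: create mapped staging buffer -> memcpy -> unmap ->
// encode CopyBufferToBuffer -> submit.
static void upload_params(wgpu::Queue queue, wgpu::Buffer buf, const Params & p) {
    queue.WriteBuffer(buf, 0, &p, sizeof(p));
}
```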

Prebuilt binaries are provided for macOS/iOS, Linux, Windows, and openEuler.
