ggml-org/llama.cpp release b7841


opencl: add flattened q6_K mv (#19054)

This release adds a flattened q6_K matrix-vector multiplication kernel (kernel_mul_mv_q6_K_f32_flat) to the OpenCL backend. Commits in this PR:

  • opencl: flatten q6_K and add kernel_mul_mv_q6_K_f32_flat

  • opencl: clean up

  • opencl: refactor q6_K mv - put loop body in block_q_6_K_dot_y_flat

  • opencl: tweak the workgroup size a bit

  • opencl: output 4 values per subgroup for kernel_mul_mv_q6_K_f32_flat

  • opencl: proper alignment for q6_K

  • opencl: boundary handling for flattened q6_K mv

  • opencl: rename q6_K mv kernel file

  • opencl: put flattened q6_K mv in its own file

  • opencl: use lower k in file name

  • opencl: use K in variable names

Prebuilt binaries are provided for macOS/iOS, Linux, Windows, and openEuler.
