ggml-org/llama.cpp — release b8493

Latest release: b8495 (3 hours ago)

opencl: add q6_K gemm and gemv kernels for Adreno (#20089)

  • opencl: add q6_K noshuffle kernels, initial q6_K gemv, some host code

  • opencl: add q6_K transpose

  • opencl: fix cvt kernel name

  • opencl: add call to q6_K gemv

  • opencl: fix q6_K scale transpose

  • opencl: fix loading for gemv q6_K, refactor

  • opencl: fix transpose_8_buf kernel assignment, refactor

  • opencl: refactor q6_K transpose

  • opencl: add gemm_noshuffle_q6_k_f32

  • opencl: fix qh loading

  • opencl: refactor q6_K gemv host side, release bufs and imgs

  • opencl: refactor

  • opencl: fix q6_K dequant and scale selection

  • opencl: workaround compiler bug, fix dump_tensor

  • opencl: refactor q6_K convert kernels

  • opencl: unpack transformed q6_K in get_tensor

  • opencl: refactor, handle non-uniform workgroups

  • opencl: support non-vector subgroup bcast
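The commits above build OpenCL GEMM/GEMV support for the q6_K quantization format, whose super-block layout (256 weights split into a low-4-bit array, a high-2-bit array, sixteen int8 sub-block scales, and an fp16 super-scale) the new convert and dequant kernels have to unpack. As a point of reference, a minimal CPU-side sketch of q6_K dequantization, assuming ggml's `block_q6_K` layout but with the fp16 super-scale simplified to a plain `float`, might look like:

```c
#include <assert.h>
#include <string.h>

#define QK_K 256  /* weights per q6_K super-block */

/* Sketch of ggml's block_q6_K; in ggml proper, d is fp16. */
typedef struct {
    unsigned char ql[QK_K/2];      /* lower 4 bits of each 6-bit quant */
    unsigned char qh[QK_K/4];      /* upper 2 bits of each 6-bit quant */
    signed char   scales[QK_K/16]; /* per-16-element sub-block scales  */
    float         d;               /* super-block scale (simplified)   */
} block_q6_K;

/* Dequantize one 256-element super-block into y. */
static void dequantize_q6_K(const block_q6_K *x, float *y) {
    const float d = x->d;
    const unsigned char *ql = x->ql;
    const unsigned char *qh = x->qh;
    const signed char   *sc = x->scales;
    for (int n = 0; n < QK_K; n += 128) {
        for (int l = 0; l < 32; ++l) {
            const int is = l / 16;
            /* reassemble each 6-bit value from its low-4 and high-2
               parts, then recentre from [0, 63] to [-32, 31] */
            const int q1 = (int)((ql[l +  0] & 0xF) | (((qh[l] >> 0) & 3) << 4)) - 32;
            const int q2 = (int)((ql[l + 32] & 0xF) | (((qh[l] >> 2) & 3) << 4)) - 32;
            const int q3 = (int)((ql[l +  0] >>  4) | (((qh[l] >> 4) & 3) << 4)) - 32;
            const int q4 = (int)((ql[l + 32] >>  4) | (((qh[l] >> 6) & 3) << 4)) - 32;
            y[l +  0] = d * sc[is + 0] * q1;
            y[l + 32] = d * sc[is + 2] * q2;
            y[l + 64] = d * sc[is + 4] * q3;
            y[l + 96] = d * sc[is + 6] * q4;
        }
        y  += 128;
        ql += 64;
        qh += 32;
        sc += 8;
    }
}
```

The "noshuffle" and transpose commits repack this layout on the host so the Adreno kernels can load it with coalesced accesses; the sketch shows only the logical unpacking, not that repacked order.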

Release binaries: macOS/iOS, Linux, Windows, openEuler.
