github ggml-org/llama.cpp b7548


vulkan: optimize decodeFuncB in coopmat2 mul_mat_id shader (#18349)

  • vulkan: Use BK=32 for coopmat2 mul_mat_id

  • vulkan: optimize decodeFuncB in coopmat2 mul_mat_id shader

Disable robustness, remove the OOB check in decodeFuncB, and initialize the
row_ids to zero to avoid OOB access.
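The idea can be sketched in plain C: once the explicit OOB check is gone, the
whole row_ids tile must hold defined values, so it is zero-filled before the
valid entries are written. The names (`load_row_ids`, `expert_rows`) and the
BN value below are hypothetical stand-ins for the shader's shared-memory
logic, not the actual shader code.

```c
#include <string.h>

#define BN 64  /* hypothetical tile height, standing in for the shader's BN */

/* Illustrative stand-in for the shared row_ids tile. Zero-initializing the
 * whole tile means that, with the OOB check removed from the decode path,
 * a load past the number of valid rows still reads a defined value (row 0)
 * instead of out-of-bounds memory. */
static int row_ids[BN];

static void load_row_ids(const int *expert_rows, int n_valid) {
    memset(row_ids, 0, sizeof(row_ids));  /* zero-init first */
    for (int i = 0; i < n_valid; ++i)
        row_ids[i] = expert_rows[i];      /* then fill the valid entries */
}
```

Any index in [0, BN) is now safe to read, which is what makes dropping both
robustness and the bounds check sound.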

Previously the B matrix was sliced/offset to ic * BN, only for decodeFuncB to
adjust the coordinate back down into the range [0, BN). Instead, slice with a
row offset of zero and remove the '& (BN - 1)'. This lets the compiler merge
(common) some of the shared memory loads.
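A minimal C sketch of the indexing change, with hypothetical names
(`decode_masked`, `decode_direct`) and BN value rather than the real shader
functions: when the slice carries the ic * BN offset, the decode step has to
wrap the coordinate with a power-of-two mask; with a zero row offset the
coordinate indexes the tile directly, and identical loads become obvious
common subexpressions to the compiler.

```c
#define BN 64  /* hypothetical tile height; must be a power of two for the mask */

/* Before: the slice was offset by ic * BN, so the decode step mapped the
 * incoming coordinate back into [0, BN) with '& (BN - 1)'. */
static int decode_masked(const int *row_ids, int coord) {
    return row_ids[coord & (BN - 1)];
}

/* After: slicing with a row offset of zero delivers a coordinate that is
 * already in range, so the mask disappears and repeated loads of the same
 * element are easier for the compiler to merge. */
static int decode_direct(const int *row_ids, int coord) {
    return row_ids[coord];
}
```

Both return the same value for in-range coordinates; only the extra masking
arithmetic is removed.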

