github ggml-org/llama.cpp b9088


[SYCL] Add BF16 support to GET_ROWS operation (#21391)

Add GGML_TYPE_BF16 to the SYCL backend's GET_ROWS operation, both in
supports_op and in the kernel dispatch. This fixes a performance
regression where models using BF16 embedding tensors (e.g., Gemma4's
per_layer_token_embd.weight) fell back to the CPU for the GET_ROWS op,
causing a full GPU-to-CPU tensor transfer on every token.

The fix reuses the existing get_rows_sycl_float template with
sycl::ext::oneapi::bfloat16, matching the pattern already used for
sycl::half (F16) and float (F32).

