github ggml-org/llama.cpp b8724


sycl : add flash-attn support for head size 512 (#21654)


This patch extends the SYCL Flash Attention implementation to support head sizes (DKQ/DV) of 512.

Changes:

  • Added DKQ/DV 512 cases to both tile and vector Flash Attention kernels.
  • Updated kernel selection logic to allow vector kernels for head sizes up to 512 (previously 256).
  • Removed unused/redundant AMD and RDNA-specific configuration functions in fattn-tile.hpp.
  • Refactored ggml_backend_sycl_buffer_init_tensor to use a switch statement for clearer tensor extra buffer initialization.
  • Added necessary template instances for the new 512 head size across various quantization types.
  • Removed the defunct mxfp4 reorder from the buffer-type setup.

