GitHub: ggml-org/llama.cpp b8833

ggml-webgpu: fix compiler warnings and refactor FlashAttention encoding (#21052)

  • Update workflows to remove dependence on llvmpipe

  • Try setting Dawn_DIR

  • Remove C++20 initializers (see the designated-initializer sketch after this list)

  • Move to proper GUID

  • Try avoiding segfaults on vulkan backend process exit

  • Remove compiler warnings on parameter casting (see the casting sketch after this list)

  • Fix soft_max and update reg_tile accumulation to f32 for better precision (see the accumulation sketch after this list)

  • Refactor flash_attn a bit

  • Remove C++20 initializers and format

  • Increase div precision for NVIDIA

  • Revert div precision and comment out ggml-ci node for now

  • Formatting

  • Try debugging on a failing CI node

  • Revert "Try debugging on a failing CI node" (reverts commit 1971e33)
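
The two bullets about removing C++20 initializers most likely refer to designated initializers, which are a C++20 feature in C++. A minimal sketch of the pattern, using a hypothetical flash_attn_params struct rather than the actual ggml-webgpu types:

```cpp
// Hypothetical struct for illustration; not the actual ggml-webgpu code.
struct flash_attn_params {
    int   n_heads = 0;
    float scale   = 1.0f;
};

flash_attn_params make_params() {
    // C++20 designated initializers would read:
    //     flash_attn_params p = { .n_heads = 32, .scale = 0.125f };
    // The C++17-compatible equivalent assigns members explicitly:
    flash_attn_params p;
    p.n_heads = 32;
    p.scale   = 0.125f;
    return p;
}
```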
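
The parameter-casting warnings are most plausibly implicit narrowing conversions (for example size_t to uint32_t) flagged by -Wconversion-style options, where the usual fix is an explicit cast. A hedged sketch with a hypothetical helper, not the real Dawn/WebGPU dispatch API:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical helper for illustration; not the real Dawn/WebGPU API.
static void dispatch_workgroups(uint32_t x, uint32_t y, uint32_t z) {
    std::printf("dispatch %u x %u x %u\n", x, y, z);
}

void encode_pass(std::size_t n_rows) {
    // Passing a size_t where uint32_t is expected triggers an
    // implicit-conversion warning; the explicit cast documents the
    // intentional narrowing and silences it.
    dispatch_workgroups(static_cast<uint32_t>(n_rows), 1u, 1u);
}
```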
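
The reg_tile change (accumulate in f32 for better precision) follows a general rule: keep running sums in a wider type than the operands. Portable C++ has no native f16, so the sketch below shows the same effect one step up the ladder, summing float terms into a float versus a double accumulator; the shader change applies the identical idea with f16 operands and f32 sums.

```cpp
#include <cstdio>

int main() {
    // Summing many terms into a narrow accumulator compounds rounding
    // error; a wider accumulator keeps the result accurate.
    const int n = 10'000'000;
    float  acc_f32 = 0.0f;  // narrow accumulator (analogous to f16 in WGSL)
    double acc_f64 = 0.0;   // wide accumulator (analogous to f32 in WGSL)
    for (int i = 0; i < n; ++i) {
        const float term = 0.1f;
        acc_f32 += term;
        acc_f64 += static_cast<double>(term);
    }
    // The narrow sum drifts visibly from the expected 1,000,000.
    std::printf("f32 accumulator: %.1f\n", acc_f32);
    std::printf("f64 accumulator: %.1f\n", acc_f64);
    return 0;
}
```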

Prebuilt release binaries cover macOS/iOS, Linux, Android, Windows, and openEuler (download links omitted here).
