GitHub: ggml-org/llama.cpp b8833

ggml-webgpu: fix compiler warnings and refactor FlashAttention encoding (#21052)

  • Update workflows to remove dependence on llvmpipe

  • Try setting Dawn_DIR

  • Remove C++20 initializers (see the designated-initializer sketch after this list)

  • Move to proper GUID

  • Try avoiding segfaults on vulkan backend process exit

  • Remove compiler warnings on parameter casting (see the casting sketch after this list)

  • Fix soft_max and update reg_tile accumulation to f32 for better precision (see the accumulation sketch after this list)

  • Refactor flash_attn a bit

  • Remove C++20 initializers and format

  • Increase div precision for NVIDIA

  • Revert div precision and comment out ggml-ci node for now

  • Formatting

  • Try debugging on a failing CI node

  • Revert "Try debugging on a failing CI node" (reverts commit 1971e33)
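
The two bullets about removing C++20 initializers most likely refer to designated initializers, which are a C++20 feature in C++. A minimal sketch of the pattern, using a hypothetical flash_attn_params struct rather than the actual ggml-webgpu types:

```cpp
// Hypothetical struct for illustration; not the actual ggml-webgpu code.
struct flash_attn_params {
    int   n_heads = 0;
    float scale   = 1.0f;
};

flash_attn_params make_params() {
    // C++20 designated initializers would read:
    //     flash_attn_params p = { .n_heads = 32, .scale = 0.125f };
    // The C++17-compatible equivalent assigns members explicitly:
    flash_attn_params p;
    p.n_heads = 32;
    p.scale   = 0.125f;
    return p;
}
```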
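
The parameter-casting warnings are most plausibly implicit narrowing conversions (for example size_t to uint32_t) flagged by -Wconversion-style options, where the usual fix is an explicit cast. A hedged sketch with a hypothetical helper, not the real Dawn/WebGPU dispatch API:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical helper for illustration; not the real Dawn/WebGPU API.
static void dispatch_workgroups(uint32_t x, uint32_t y, uint32_t z) {
    std::printf("dispatch %u x %u x %u\n", x, y, z);
}

void encode_pass(std::size_t n_rows) {
    // Passing a size_t where uint32_t is expected triggers an
    // implicit-conversion warning; the explicit cast documents the
    // intentional narrowing and silences it.
    dispatch_workgroups(static_cast<uint32_t>(n_rows), 1u, 1u);
}
```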
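
The reg_tile change (accumulate in f32 for better precision) follows a general rule: keep running sums in a wider type than the operands. Portable C++ has no native f16, so the sketch below shows the same effect one step up the ladder, summing float terms into a float versus a double accumulator; the shader change applies the identical idea with f16 operands and f32 sums.

```cpp
#include <cstdio>

int main() {
    // Summing many terms into a narrow accumulator compounds rounding
    // error; a wider accumulator keeps the result accurate.
    const int n = 10'000'000;
    float  acc_f32 = 0.0f;  // narrow accumulator (analogous to f16 in WGSL)
    double acc_f64 = 0.0;   // wide accumulator (analogous to f32 in WGSL)
    for (int i = 0; i < n; ++i) {
        const float term = 0.1f;
        acc_f32 += term;
        acc_f64 += static_cast<double>(term);
    }
    // The narrow sum drifts visibly from the expected 1,000,000.
    std::printf("f32 accumulator: %.1f\n", acc_f32);
    std::printf("f64 accumulator: %.1f\n", acc_f64);
    return 0;
}
```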

Prebuilt release binaries cover macOS/iOS, Linux, Android, Windows, and openEuler (download links omitted here).
