ggml-org/llama.cpp — release b8891 (latest release: b8892, 3 hours ago)

ggml-webgpu: Add fused RMS_NORM + MUL (#21983)

  • Fuse RMS_NORM + MUL into a single kernel.

  • Add GGML_WEBGPU_DISABLE_FUSION to allow disabling kernel fusion.

  • Decouple num_fused_ops from webgpu_context; miscellaneous cleanup.

  • Fix eps handling and remove disable_fusion.

  • Avoid C++20 initializers.
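The fusion above combines two element-wise passes into one. A minimal sketch of the arithmetic being fused, in plain Python rather than the actual WebGPU kernel (function names here are illustrative, not from the llama.cpp codebase): RMS_NORM divides each element by the root mean square of the row (plus eps), and MUL then scales the result by a weight vector; the fused version folds the multiply into the normalization loop so the data is read and written once.

```python
import math

def rms_norm(x, eps=1e-6):
    # RMS normalization: x / sqrt(mean(x^2) + eps)
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale for v in x]

def rms_norm_mul_unfused(x, w, eps=1e-6):
    # Two passes: normalize, then element-wise multiply by the weight.
    normed = rms_norm(x, eps)
    return [n * wi for n, wi in zip(normed, w)]

def rms_norm_mul_fused(x, w, eps=1e-6):
    # One pass: apply the normalization scale and the weight together,
    # as a fused kernel would, avoiding an intermediate tensor.
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * wi for v, wi in zip(x, w)]
```

Both variants compute the same result; the fused form matters on GPU backends because it eliminates one round trip through memory for the intermediate normalized tensor.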

Prebuilt binaries are available for macOS/iOS, Linux, Android, Windows, and openEuler.
