github ggml-org/llama.cpp b8811


ggml-webgpu: compute pass batching and removing profiling overhead (#21873)

  • Update register tiling matmul to use f32 accumulation

  • Fix profiling code

  • Fix register tiling matmul for Chrome (likely a Dawn issue)

  • Update the batch tuning value for iOS

  • Fix compilation

  • Fix use of the new load function

  • Move to a single query set for GPU profiling

  • Batch compute passes when profiling is disabled

  • Refactor build_multi

  • Remove iOS throttling now that compute passes are batched
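
The move to f32 accumulation in the register tiling matmul addresses a classic precision issue: when a dot product accumulates many terms in half precision, the running sum eventually stalls because each new term is smaller than half a unit in the last place of the accumulator. A minimal stdlib-Python sketch (the function names here are illustrative, not llama.cpp's) shows the effect by rounding the accumulator to IEEE binary16 after every step:

```python
import struct

def to_f16(x: float) -> float:
    """Round a float to IEEE half precision and back (stdlib 'e' format)."""
    return struct.unpack('e', struct.pack('e', x))[0]

def dot_f16_acc(a, b):
    """Dot product with a half-precision accumulator: rounds every step."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_f16(acc + to_f16(x * y))
    return acc

def dot_f32_acc(a, b):
    """Dot product accumulating in higher precision (no per-step f16 rounding)."""
    return sum(x * y for x, y in zip(a, b))

# 100k small products sum to ~100.0, but the f16 accumulator stalls once
# the term falls below half a ulp of the running sum (f16 has an 11-bit
# significand), ending far below the true value.
a = [0.001] * 100_000
b = [1.0] * 100_000
print(dot_f16_acc(a, b))  # stalls well below 100.0
print(dot_f32_acc(a, b))  # ~100.0
```

The same argument applies per-tile in a GPU matmul kernel: keeping the accumulator registers in f32 while loading f16 operands preserves the sum at negligible extra register cost.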
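
The headline change, batching compute passes, amortizes per-pass encoder overhead: instead of beginning and ending a compute pass around every dispatch, many dispatches are recorded into one open pass, which is closed only when a batch limit is reached (or, in practice, when a non-dispatch command forces a break). A toy model, with hypothetical names not taken from llama.cpp, counts how many passes get begun either way:

```python
from dataclasses import dataclass

@dataclass
class PassRecorder:
    """Toy model of batching GPU dispatches into compute passes.

    Hypothetical sketch, not llama.cpp's actual API: dispatches append to
    an open pass, and the pass is closed once `batch_size` dispatches have
    been recorded.
    """
    batch_size: int
    passes_begun: int = 0
    open_count: int = 0

    def dispatch(self):
        if self.open_count == 0:
            self.passes_begun += 1  # begin a new compute pass
        self.open_count += 1
        if self.open_count >= self.batch_size:
            self.end_pass()

    def end_pass(self):
        self.open_count = 0  # close the current pass, if any

# One pass per dispatch (what per-dispatch profiling effectively requires):
unbatched = PassRecorder(batch_size=1)
for _ in range(64):
    unbatched.dispatch()

# Batched recording when profiling is disabled:
batched = PassRecorder(batch_size=16)
for _ in range(64):
    batched.dispatch()

print(unbatched.passes_begun, batched.passes_begun)  # 64 4
```

This also explains the last bullet: with pass setup cost amortized, the earlier iOS-specific throttling workaround is no longer needed.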
