ggml-org/llama.cpp — release b8189


ggml webgpu: Clean up per-thread parameter buffer pool and job submission logic (#19772)

  • Allow webgpu_buf_pool to resize if needed, and replace inflight_threads with num_kernels for submission tracking

  • Run clang-format

  • Keep track of num batched kernels that have not been submitted yet

  • Run clang-format

  • Increase buf pool max size

  • Increase param buf pool init size

  • Remove webgpu buf pool resizing

  • Merge with master

  • Add buffer pool growth

  • Move buffer pool growth outside of lock

  • Reduce max pool size to 32

  • Run clang-format

  • Only resize param buf pool
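
The commits above revolve around two ideas: a per-thread parameter-buffer pool that starts small, grows on demand up to a cap of 32, and performs allocation outside the lock. The sketch below is not the actual llama.cpp implementation — `ParamBufPool` and `Buffer` are hypothetical stand-ins (a real pool would hold WebGPU buffer handles) — it only illustrates the growth-outside-the-lock pattern the commit messages describe.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a WebGPU parameter buffer handle.
struct Buffer { size_t size; };

// Sketch of a growable buffer pool with a hard cap, assuming an
// initial size and max size like the PR's "increase param buf pool
// init size" / "reduce max pool size to 32" changes.
class ParamBufPool {
public:
    ParamBufPool(size_t init_size, size_t max_size) : max_size_(max_size) {
        for (size_t i = 0; i < init_size; ++i) free_.push_back(alloc());
        total_ = init_size;
    }

    // Acquire a buffer. If the free list is empty, allocate a new one
    // WITHOUT holding the lock ("move buffer pool growth outside of
    // lock"); the mutex only guards the shared free list and counter.
    Buffer get() {
        {
            std::lock_guard<std::mutex> lk(mu_);
            if (!free_.empty()) {
                Buffer b = free_.back();
                free_.pop_back();
                return b;
            }
        }
        Buffer b = alloc();  // slow path: allocate outside the lock
        std::lock_guard<std::mutex> lk(mu_);
        if (total_ < max_size_) ++total_;  // grow the pool up to the cap
        return b;
    }

    // Return a buffer; buffers beyond the cap are simply dropped.
    void put(Buffer b) {
        std::lock_guard<std::mutex> lk(mu_);
        if (free_.size() < max_size_) free_.push_back(b);
    }

    size_t total() const { return total_; }

private:
    static Buffer alloc() { return Buffer{256}; }  // placeholder allocation

    std::mutex mu_;
    std::vector<Buffer> free_;
    size_t total_ = 0;
    size_t max_size_;
};
```

Doing the allocation outside the critical section keeps other threads from blocking on a potentially slow driver call, at the cost of occasionally allocating one buffer more than strictly needed; the cap in `put` keeps the pool bounded either way.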

