ggml-org/llama.cpp — release b8189


ggml webgpu: Clean up per-thread parameter buffer pool and job submission logic (#19772)

  • Allow webgpu_buf_pool to resize if needed, and replace inflight_threads with num_kernels for submission tracking

  • Run clang-format

  • Keep track of num batched kernels that have not been submitted yet

  • Run clang-format

  • Increase buf pool max size

  • Increase param buf pool init size

  • Remove webgpu buf pool resizing

  • Merge with master

  • Add buffer pool growth

  • Move buffer pool growth outside of lock

  • Reduce max pool size to 32

  • Run clang-format

  • Only resize param buf pool
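
The commits above revolve around two ideas: a per-thread parameter-buffer pool that starts small, grows on demand up to a cap of 32, and performs allocation outside the lock. The sketch below is not the actual llama.cpp implementation — `ParamBufPool` and `Buffer` are hypothetical stand-ins (a real pool would hold WebGPU buffer handles) — it only illustrates the growth-outside-the-lock pattern the commit messages describe.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a WebGPU parameter buffer handle.
struct Buffer { size_t size; };

// Sketch of a growable buffer pool with a hard cap, assuming an
// initial size and max size like the PR's "increase param buf pool
// init size" / "reduce max pool size to 32" changes.
class ParamBufPool {
public:
    ParamBufPool(size_t init_size, size_t max_size) : max_size_(max_size) {
        for (size_t i = 0; i < init_size; ++i) free_.push_back(alloc());
        total_ = init_size;
    }

    // Acquire a buffer. If the free list is empty, allocate a new one
    // WITHOUT holding the lock ("move buffer pool growth outside of
    // lock"); the mutex only guards the shared free list and counter.
    Buffer get() {
        {
            std::lock_guard<std::mutex> lk(mu_);
            if (!free_.empty()) {
                Buffer b = free_.back();
                free_.pop_back();
                return b;
            }
        }
        Buffer b = alloc();  // slow path: allocate outside the lock
        std::lock_guard<std::mutex> lk(mu_);
        if (total_ < max_size_) ++total_;  // grow the pool up to the cap
        return b;
    }

    // Return a buffer; buffers beyond the cap are simply dropped.
    void put(Buffer b) {
        std::lock_guard<std::mutex> lk(mu_);
        if (free_.size() < max_size_) free_.push_back(b);
    }

    size_t total() const { return total_; }

private:
    static Buffer alloc() { return Buffer{256}; }  // placeholder allocation

    std::mutex mu_;
    std::vector<Buffer> free_;
    size_t total_ = 0;
    size_t max_size_;
};
```

Doing the allocation outside the critical section keeps other threads from blocking on a potentially slow driver call, at the cost of occasionally allocating one buffer more than strictly needed; the cap in `put` keeps the pool bounded either way.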

