ggml-org/llama.cpp b9291
on GitHub

one month ago

SYCL: improve MoE prefill throughput (#23142)

change k_copy_src1_to_contiguous so that it uses a precomputed contiguous mapping in which all rows "owned" by an expert sit in one slice with a known start and end
switch from the O(n_as * n_routed_rows) contraption to a counting-sort-based procedure with O(n_as + n_routed_rows) complexity
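
The grouping idea described above can be illustrated with a minimal host-side sketch in plain C++. This is not the code from the PR: the names `ExpertSlices`, `group_rows_by_expert`, and `expert_of_row` are hypothetical, and the sketch assumes each routed row carries a single expert id in the range [0, n_as). It only shows how a counting sort yields one contiguous slice per expert in O(n_as + n_routed_rows) time.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type (not from the PR): the rows owned by expert e
// occupy sorted_rows[offsets[e] .. offsets[e + 1]).
struct ExpertSlices {
    std::vector<std::size_t>  offsets;      // size n_as + 1
    std::vector<std::int32_t> sorted_rows;  // size n_routed_rows
};

// Groups routed rows by expert with a counting sort.
// Assumption: every id in expert_of_row lies in [0, n_as).
ExpertSlices group_rows_by_expert(const std::vector<std::int32_t>& expert_of_row,
                                  std::size_t n_as) {
    ExpertSlices out;
    out.offsets.assign(n_as + 1, 0);
    out.sorted_rows.resize(expert_of_row.size());

    // Pass 1: count rows per expert, O(n_routed_rows).
    for (std::int32_t e : expert_of_row) {
        out.offsets[static_cast<std::size_t>(e) + 1]++;
    }

    // Pass 2: exclusive prefix sum turns counts into slice starts, O(n_as).
    for (std::size_t e = 0; e < n_as; ++e) {
        out.offsets[e + 1] += out.offsets[e];
    }

    // Pass 3: scatter each row index into its expert's slice, O(n_routed_rows).
    // cursor[e] is the next free position inside expert e's slice.
    std::vector<std::size_t> cursor(out.offsets.begin(), out.offsets.end() - 1);
    for (std::size_t r = 0; r < expert_of_row.size(); ++r) {
        const std::size_t e = static_cast<std::size_t>(expert_of_row[r]);
        out.sorted_rows[cursor[e]++] = static_cast<std::int32_t>(r);
    }
    return out;
}
```

With such a mapping, a copy kernel can read the start and end of an expert's slice directly instead of scanning all routed rows once per expert, which is where the O(n_as * n_routed_rows) cost would otherwise come from.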

macOS/iOS:

Linux:

Android:

Android arm64 (CPU)

Windows:

openEuler:
