ggml-org/llama.cpp release b8082

cuda : enable CUDA graphs for MMID 1 <= BS <= 4 (#19645)

  • cuda : enable CUDA graphs for MMID BS <= 4

  • cont : add stream capture check (Co-authored-by: Oliver Simons <osimons@nvidia.com>)

  • cont : add MMVQ_MMID_MAX_BATCH_SIZE (Co-authored-by: Oliver Simons <osimons@nvidia.com>)

Prebuilt binaries are attached for macOS/iOS, Linux, Windows, and openEuler.
