ggml-org/llama.cpp b8873


openvino: driver setup, CI split, thread safety, and NPU optimizations (#21944)

  • Thread safety per request only (see the mutex sketch below)
  • Fix RoPE YaRN case
  • Fix sticky stateful config
  • Use i4/i8 directly for symmetric quant (see the quantization sketch below)
  • Use weightless caching
  • Add WeightlessCacheAttribute to reduce NPU memory usage (see the caching sketch below)
  • GELU tanh support (#125) (see the GELU sketch below)
  • imrope (interleaved M-RoPE) support (#126)
  • fix(openvino): explicit ov::Tensor frees in ggml_backend_openvino_free (see the teardown sketch below)
  • Add GPU/NPU support in the OV Dockerfile
  • Add build-openvino.yml CI workflow
  • Add concurrency to ov-gpu CI runs; move OV CI to build-openvino.yml
  • Fix thread safety of the shared runtime context
  • Fix editorconfig


Co-authored-by: Mustafa Cavus mustafa.cavus@intel.com
Co-authored-by: Dan Hoffman dhoff749@gmail.com
Co-authored-by: Ravi Panchumarthy ravi.panchumarthy@intel.com
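
The two thread-safety items above amount to guarding the shared OpenVINO runtime context while letting each request run inference without locks. Below is a minimal C++ sketch of that pattern against the public OpenVINO API; the struct and function names are hypothetical, not the backend's actual code.

```cpp
#include <mutex>
#include <openvino/openvino.hpp>

// Hypothetical sketch: the shared runtime context (here, the compiled
// model) is guarded by a mutex, but each thread owns its ov::InferRequest,
// so synchronization happens per request creation only, not per inference.
struct ov_shared_ctx {
    ov::CompiledModel compiled;
    std::mutex        mtx;

    ov::InferRequest make_request() {
        std::lock_guard<std::mutex> lock(mtx); // serialize shared-state access
        return compiled.create_infer_request();
    }
};
```

Each worker thread calls make_request() once and then runs infer() on its own request with no further locking.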
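
"Use i4/i8 directly for symmetric quant" relies on the fact that symmetric quantization has no zero point, so quantized values are already plain signed integers. The sketch below shows ordinary symmetric int8 quantization for reference; it is illustrative, not the backend's conversion code.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Symmetric int8 quantization: q = round(x / scale), scale = max|x| / 127.
// With no zero point, the quantized weights are directly usable as i8
// (the i4 case is analogous, with a maximum level of 7).
static std::vector<int8_t> quantize_symmetric_i8(const std::vector<float> & x,
                                                 float & scale) {
    float amax = 0.0f;
    for (float v : x) amax = std::max(amax, std::fabs(v));
    scale = (amax > 0.0f) ? amax / 127.0f : 1.0f;

    std::vector<int8_t> q(x.size());
    for (size_t i = 0; i < x.size(); ++i) {
        q[i] = (int8_t) std::lround(x[i] / scale);
    }
    return q;
}
```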
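
The weightless-caching items correspond to OpenVINO's cache-mode property: in OPTIMIZE_SIZE mode the cache blob stores the compiled graph but references weights from the original model instead of duplicating them. The snippet below shows how that mode is enabled through the public API; whether the backend sets exactly this property (and the model path and device used here) is an assumption.

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    core.set_property(ov::cache_dir("ov_cache")); // illustrative cache path

    auto model = core.read_model("model.xml");    // illustrative model file

    // OPTIMIZE_SIZE requests "weightless" caching: the cached blob keeps
    // the compiled graph and refers back to the original weights, which
    // shrinks the cache and reduces memory pressure on devices like the NPU.
    auto compiled = core.compile_model(
        model, "NPU", ov::cache_mode(ov::CacheMode::OPTIMIZE_SIZE));
    (void) compiled;
    return 0;
}
```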
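
"GELU tanh support" refers to the tanh approximation of GELU that ggml exposes as a unary op. For reference, the standard formula as a self-contained function (this is the textbook approximation, not the backend's mapping code):

```cpp
#include <cmath>

// Tanh approximation of GELU:
//   gelu(x) ~= 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
static float gelu_tanh(float x) {
    const float k = 0.7978845608028654f; // sqrt(2/pi)
    return 0.5f * x * (1.0f + std::tanh(k * (x + 0.044715f * x * x * x)));
}
```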
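
The explicit ov::Tensor frees in ggml_backend_openvino_free are about deterministic teardown: tensor handles owned by the backend context are dropped before the context itself is destroyed, so device memory is released promptly. The sketch below shows the general idea only; the context layout and function name are hypothetical.

```cpp
#include <map>
#include <openvino/openvino.hpp>

// Hypothetical backend context that caches ov::Tensor handles.
struct openvino_backend_ctx {
    std::map<const void *, ov::Tensor> tensors;
};

// Clearing the map drops the ov::Tensor references explicitly, releasing
// the underlying device memory before the context is deleted.
static void openvino_backend_free(openvino_backend_ctx * ctx) {
    ctx->tensors.clear();
    delete ctx;
}
```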

Prebuilt binaries are available for macOS/iOS, Linux, Android, Windows, and openEuler.
