ggml-org/llama.cpp b8873


openvino: driver setup, CI split, thread safety, and NPU optimizations (#21944)

  • Thread safety per request only (see the mutex sketch below)
  • Fix RoPE YaRN case
  • Fix sticky stateful config
  • Use i4/i8 directly for symmetric quant (see the quantization sketch below)
  • Use weightless caching
  • Add WeightlessCacheAttribute to reduce NPU memory usage (see the caching sketch below)
  • GELU tanh support (#125) (see the GELU sketch below)
  • imrope (interleaved M-RoPE) support (#126)
  • fix(openvino): explicit ov::Tensor frees in ggml_backend_openvino_free (see the teardown sketch below)
  • Add GPU/NPU support in the OV Dockerfile
  • Add build-openvino.yml CI workflow
  • Add concurrency to ov-gpu CI runs; move OV CI to build-openvino.yml
  • Fix thread safety of the shared runtime context
  • Fix editorconfig


Co-authored-by: Mustafa Cavus mustafa.cavus@intel.com
Co-authored-by: Dan Hoffman dhoff749@gmail.com
Co-authored-by: Ravi Panchumarthy ravi.panchumarthy@intel.com
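
The two thread-safety items above amount to guarding the shared OpenVINO runtime context while letting each request run inference without locks. Below is a minimal C++ sketch of that pattern against the public OpenVINO API; the struct and function names are hypothetical, not the backend's actual code.

```cpp
#include <mutex>
#include <openvino/openvino.hpp>

// Hypothetical sketch: the shared runtime context (here, the compiled
// model) is guarded by a mutex, but each thread owns its ov::InferRequest,
// so synchronization happens per request creation only, not per inference.
struct ov_shared_ctx {
    ov::CompiledModel compiled;
    std::mutex        mtx;

    ov::InferRequest make_request() {
        std::lock_guard<std::mutex> lock(mtx); // serialize shared-state access
        return compiled.create_infer_request();
    }
};
```

Each worker thread calls make_request() once and then runs infer() on its own request with no further locking.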
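
"Use i4/i8 directly for symmetric quant" relies on the fact that symmetric quantization has no zero point, so quantized values are already plain signed integers. The sketch below shows ordinary symmetric int8 quantization for reference; it is illustrative, not the backend's conversion code.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Symmetric int8 quantization: q = round(x / scale), scale = max|x| / 127.
// With no zero point, the quantized weights are directly usable as i8
// (the i4 case is analogous, with a maximum level of 7).
static std::vector<int8_t> quantize_symmetric_i8(const std::vector<float> & x,
                                                 float & scale) {
    float amax = 0.0f;
    for (float v : x) amax = std::max(amax, std::fabs(v));
    scale = (amax > 0.0f) ? amax / 127.0f : 1.0f;

    std::vector<int8_t> q(x.size());
    for (size_t i = 0; i < x.size(); ++i) {
        q[i] = (int8_t) std::lround(x[i] / scale);
    }
    return q;
}
```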
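
The weightless-caching items correspond to OpenVINO's cache-mode property: in OPTIMIZE_SIZE mode the cache blob stores the compiled graph but references weights from the original model instead of duplicating them. The snippet below shows how that mode is enabled through the public API; whether the backend sets exactly this property (and the model path and device used here) is an assumption.

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    core.set_property(ov::cache_dir("ov_cache")); // illustrative cache path

    auto model = core.read_model("model.xml");    // illustrative model file

    // OPTIMIZE_SIZE requests "weightless" caching: the cached blob keeps
    // the compiled graph and refers back to the original weights, which
    // shrinks the cache and reduces memory pressure on devices like the NPU.
    auto compiled = core.compile_model(
        model, "NPU", ov::cache_mode(ov::CacheMode::OPTIMIZE_SIZE));
    (void) compiled;
    return 0;
}
```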
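
"GELU tanh support" refers to the tanh approximation of GELU that ggml exposes as a unary op. For reference, the standard formula as a self-contained function (this is the textbook approximation, not the backend's mapping code):

```cpp
#include <cmath>

// Tanh approximation of GELU:
//   gelu(x) ~= 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
static float gelu_tanh(float x) {
    const float k = 0.7978845608028654f; // sqrt(2/pi)
    return 0.5f * x * (1.0f + std::tanh(k * (x + 0.044715f * x * x * x)));
}
```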
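
The explicit ov::Tensor frees in ggml_backend_openvino_free are about deterministic teardown: tensor handles owned by the backend context are dropped before the context itself is destroyed, so device memory is released promptly. The sketch below shows the general idea only; the context layout and function name are hypothetical.

```cpp
#include <map>
#include <openvino/openvino.hpp>

// Hypothetical backend context that caches ov::Tensor handles.
struct openvino_backend_ctx {
    std::map<const void *, ov::Tensor> tensors;
};

// Clearing the map drops the ov::Tensor references explicitly, releasing
// the underlying device memory before the context is deleted.
static void openvino_backend_free(openvino_backend_ctx * ctx) {
    ctx->tensors.clear();
    delete ctx;
}
```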

Prebuilt binaries are available for macOS/iOS, Linux, Android, Windows, and openEuler.
