openvino: driver setup, CI split, thread safety, and NPU optimizations (#21944)

- Thread safety per request only
- Fix RoPE YaRN case
- Fix sticky stateful config
- Use i4/i8 directly for symmetric quantization
- Use weightless caching
- Add WeightlessCacheAttribute to reduce NPU memory usage
- GELU tanh support (#125)
- IMRoPE support (#126)
- fix(openvino): explicit ov::Tensor frees in ggml_backend_openvino_free
- Add GPU and NPU support in OV Dockerfile
- Add build-openvino.yml CI
- Fix sticky stateful config
- Add concurrency to ov-gpu CI runs; move OV CI to build-openvino.yml
- Fix thread safety of shared runtime context
- RoPE type abstraction for frontend translations
- Fix editorconfig

Co-authored-by: Mustafa Cavus <mustafa.cavus@intel.com>
Co-authored-by: Dan Hoffman <dhoff749@gmail.com>
Co-authored-by: Ravi Panchumarthy <ravi.panchumarthy@intel.com>

macOS/iOS:
- macOS Apple Silicon (arm64)
- macOS Apple Silicon (arm64, KleidiAI enabled)
- macOS Intel (x64)
- iOS XCFramework
Linux:
- Ubuntu x64 (CPU)
- Ubuntu arm64 (CPU)
- Ubuntu s390x (CPU)
- Ubuntu x64 (Vulkan)
- Ubuntu arm64 (Vulkan)
- Ubuntu x64 (ROCm 7.2)
- Ubuntu x64 (OpenVINO)
Android:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: