Details
graph : utilize ggml_build_forward_select() to avoid reallocations (#18898)
- graph : avoid branches between embedding and token inputs
- models : make deepstack graphs (e.g. Qwen3 VL) have constant topology
- ci : enable -DGGML_SCHED_NO_REALLOC=ON for server CI
- cont : pad token embeddings to n_embd_inp
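
A minimal sketch of the padding idea in the last item, written in plain C++ with no ggml calls. It assumes that token embeddings are `n_embd` wide, that the graph input is `n_embd_inp` wide with `n_embd_inp >= n_embd`, and that the extra columns are filled with zeros. The commit message confirms none of these details, and `pad_rows` is a hypothetical helper, not a llama.cpp or ggml function. The point is only that after padding, both input kinds have the same shape, so the graph does not need a shape-dependent branch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only (not part of llama.cpp or ggml).
// Takes a row-major [n_tokens x n_embd] matrix and returns a row-major
// [n_tokens x n_embd_inp] matrix in which each row keeps its original values
// and the trailing (n_embd_inp - n_embd) columns are zero.
std::vector<float> pad_rows(const std::vector<float> & src,
                            std::size_t n_tokens,
                            std::size_t n_embd,
                            std::size_t n_embd_inp) {
    assert(n_embd_inp >= n_embd);            // assumption: input width is never narrower
    assert(src.size() == n_tokens * n_embd); // caller must pass a full matrix

    std::vector<float> dst(n_tokens * n_embd_inp, 0.0f); // zero-filled destination
    for (std::size_t t = 0; t < n_tokens; ++t) {
        for (std::size_t i = 0; i < n_embd; ++i) {
            dst[t * n_embd_inp + i] = src[t * n_embd + i];
        }
    }
    return dst;
}
```

For example, padding two tokens of width 2 to width 3 turns the rows `{1, 2}` and `{3, 4}` into `{1, 2, 0}` and `{3, 4, 0}`.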
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: