llama.cpp b7819 (ggml-org/llama.cpp)


graph : utilize ggml_build_forward_select() to avoid reallocations (#18898)

  • graph : avoid branches between embedding and token inputs

  • models : make deepstack graphs (e.g. Qwen3 VL) have constant topology

  • ci : enable -DGGML_SCHED_NO_REALLOC=ON for server CI

  • cont : pad token embeddings to n_embd_inp
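The change above replaces per-case graph branching with a run-time select, so the graph's topology no longer depends on whether the batch carries token ids or precomputed embeddings. The sketch below is purely conceptual: the structs and ops are illustrative stand-ins, not the actual ggml API, and `ggml_build_forward_select()`'s real signature is defined in the ggml headers.

```c
#include <assert.h>

/* Conceptual sketch only, NOT the ggml API. It illustrates why a run-time
 * select node keeps the graph topology constant, so a scheduler that keys
 * its buffer allocations on the topology never has to reallocate. */

enum op { OP_INPUT, OP_LOOKUP, OP_SELECT, OP_MATMUL };

struct node  { enum op op; int src0, src1; };
struct graph { struct node nodes[8]; int n; };

static int add_node(struct graph *g, enum op op, int s0, int s1) {
    g->nodes[g->n] = (struct node){ op, s0, s1 };
    return g->n++;
}

/* Branching build: the node set depends on the input kind, so switching
 * between token and embedding batches changes the topology. */
static struct graph build_branching(int use_embd) {
    struct graph g = {0};
    int inp;
    if (use_embd) {
        inp = add_node(&g, OP_INPUT, -1, -1);       /* embeddings in      */
    } else {
        int tok = add_node(&g, OP_INPUT, -1, -1);   /* token ids in       */
        inp = add_node(&g, OP_LOOKUP, tok, -1);     /* embedding lookup   */
    }
    add_node(&g, OP_MATMUL, inp, -1);
    return g;
}

/* Select-based build: both input paths are always present and a select
 * node picks one at run time, so the topology is constant. */
static struct graph build_constant(void) {
    struct graph g = {0};
    int embd = add_node(&g, OP_INPUT, -1, -1);      /* embeddings in      */
    int tok  = add_node(&g, OP_INPUT, -1, -1);      /* token ids in       */
    int sel  = add_node(&g, OP_SELECT, embd, tok);  /* run-time choice    */
    add_node(&g, OP_MATMUL, sel, -1);
    return g;
}
```

With the branching builder the two input kinds produce graphs of different sizes, which is exactly what forces a scheduler reallocation; the select-based builder always emits the same nodes, which is what `-DGGML_SCHED_NO_REALLOC=ON` can then verify in CI.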

Prebuilt binaries are available for macOS/iOS, Linux, Windows, and openEuler.
