llama.cpp b7819 (ggml-org/llama.cpp)


graph : utilize ggml_build_forward_select() to avoid reallocations (#18898)

  • graph : avoid branches between embedding and token inputs

  • models : make deepstack graphs (e.g. Qwen3 VL) have constant topology

  • ci : enable -DGGML_SCHED_NO_REALLOC=ON for server CI

  • cont : pad token embeddings to n_embd_inp
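The change above replaces per-case graph branching with a run-time select, so the graph's topology no longer depends on whether the batch carries token ids or precomputed embeddings. The sketch below is purely conceptual: the structs and ops are illustrative stand-ins, not the actual ggml API, and `ggml_build_forward_select()`'s real signature is defined in the ggml headers.

```c
#include <assert.h>

/* Conceptual sketch only, NOT the ggml API. It illustrates why a run-time
 * select node keeps the graph topology constant, so a scheduler that keys
 * its buffer allocations on the topology never has to reallocate. */

enum op { OP_INPUT, OP_LOOKUP, OP_SELECT, OP_MATMUL };

struct node  { enum op op; int src0, src1; };
struct graph { struct node nodes[8]; int n; };

static int add_node(struct graph *g, enum op op, int s0, int s1) {
    g->nodes[g->n] = (struct node){ op, s0, s1 };
    return g->n++;
}

/* Branching build: the node set depends on the input kind, so switching
 * between token and embedding batches changes the topology. */
static struct graph build_branching(int use_embd) {
    struct graph g = {0};
    int inp;
    if (use_embd) {
        inp = add_node(&g, OP_INPUT, -1, -1);       /* embeddings in      */
    } else {
        int tok = add_node(&g, OP_INPUT, -1, -1);   /* token ids in       */
        inp = add_node(&g, OP_LOOKUP, tok, -1);     /* embedding lookup   */
    }
    add_node(&g, OP_MATMUL, inp, -1);
    return g;
}

/* Select-based build: both input paths are always present and a select
 * node picks one at run time, so the topology is constant. */
static struct graph build_constant(void) {
    struct graph g = {0};
    int embd = add_node(&g, OP_INPUT, -1, -1);      /* embeddings in      */
    int tok  = add_node(&g, OP_INPUT, -1, -1);      /* token ids in       */
    int sel  = add_node(&g, OP_SELECT, embd, tok);  /* run-time choice    */
    add_node(&g, OP_MATMUL, sel, -1);
    return g;
}
```

With the branching builder the two input kinds produce graphs of different sizes, which is exactly what forces a scheduler reallocation; the select-based builder always emits the same nodes, which is what `-DGGML_SCHED_NO_REALLOC=ON` can then verify in CI.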

Prebuilt binaries are available for macOS/iOS, Linux, Windows, and openEuler.
