ggml-org/llama.cpp b7864
on GitHub


one month ago

Details

spec : add self-speculative decoding (no draft model required) + refactor (#18471)

server: introduce self-speculative decoding
server: moved self-call into speculative.cpp
can_speculate() includes self-speculation

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

server: can_speculate() tests self-spec
server: replace can_speculate() with slot.can_speculate()

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

common: use %zu format specifier for size_t in logging

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

server: can_speculate() requires a task instance
common: ngram map, config self-speculative decoding
common: add enum common_speculative_type
common: add vector of speculative states
common: add option --spec-draftless
server: cleanup (remove slot.batch_spec, rename)
common: moved self-spec impl to ngram-map
common: cleanup (use common_speculative_state_draft)
spec : refactor
cont : naming
spec: remove --spec-config
doc: (draftless) speculative decoding
common: print performance in spec decoding
minor : cleanup
common : better names
minor : cleanup + fix build
minor: comments
CODEOWNERS: add common/ngram-map.* (#18471)
common : rename speculative.draftless_type -> speculative.type
ngram-map : fix uninitialized values
ngram-map : take into account the input can become shorter
ngram-map : revert len check for now
arg : change --spec-draftless -> --spec-type
spec : add common_speculative_state::accept()
spec : refactor + add common_speculative_begin()
spec : fix begin() call with mtmd
spec : additional refactor + remove common_speculative_params

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
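
For readers unfamiliar with the idea behind the "ngram map" and "no draft model required" entries above, here is a minimal sketch of n-gram based draftless speculation. It is not the code from common/ngram-map.* and makes no claim about how that file works: the names `token`, `propose_draft` and `count_accepted` are hypothetical, and the lookup is a plain linear scan chosen for clarity. The assumption is that a draft can be proposed by finding the most recent earlier occurrence of the last n tokens in the sequence itself and copying what followed, after which the target model verifies the draft and keeps the longest matching prefix.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

using token = std::int32_t; // hypothetical token id type, for illustration only

// Propose up to max_draft tokens without a draft model: look for the most
// recent earlier occurrence of the trailing n-gram of `history` and copy the
// tokens that followed it. Returns an empty draft when there is no match.
static std::vector<token> propose_draft(const std::vector<token> & history,
                                        std::size_t n, std::size_t max_draft) {
    std::vector<token> draft;
    if (n == 0 || history.size() <= n) {
        return draft;
    }
    const std::size_t key = history.size() - n; // start of the trailing n-gram
    // Scan candidate start positions from newest to oldest; the trailing
    // n-gram itself (position `key`) is excluded.
    for (std::size_t i = key; i-- > 0; ) {
        if (std::equal(history.begin() + i, history.begin() + i + n,
                       history.begin() + key)) {
            const std::size_t first = i + n;
            const std::size_t last  = std::min(history.size(), first + max_draft);
            draft.assign(history.begin() + first, history.begin() + last);
            break;
        }
    }
    return draft;
}

// Verification step: given the tokens the target model actually produced at
// the drafted positions, count how many leading draft tokens are accepted.
static std::size_t count_accepted(const std::vector<token> & draft,
                                  const std::vector<token> & verified) {
    std::size_t k = 0;
    while (k < draft.size() && k < verified.size() && draft[k] == verified[k]) {
        ++k;
    }
    return k;
}
```

With the history `{1, 2, 3, 4, 5, 1, 2, 3}`, `n = 3` and `max_draft = 4`, the trailing n-gram `{1, 2, 3}` matches at position 0, so the proposed draft is `{4, 5, 1, 2}`. If the target model then produces `{4, 5, 9, 9}` at those positions, two draft tokens are accepted. Repetitive inputs such as code or quoted text are where a scheme like this would be expected to help most.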
