github ggml-org/llama.cpp b7864

latest release: b7865
2 hours ago

spec : add self‑speculative decoding (no draft model required) + refactor (#18471)

  • server: introduce self-speculative decoding

  • server: moved self-call into speculative.cpp

  • can_speculate() includes self-speculation

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

  • server: can_speculate() tests self-spec

  • server: replace can_speculate() with slot.can_speculate()

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

  • common: use %zu format specifier for size_t in logging

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

  • server: can_speculate() requires a task instance

  • common: ngram map, config self-speculative decoding

  • common: add enum common_speculative_type

  • common: add vector of speculative states

  • common: add option --spec-draftless

  • server: cleanup (remove slot.batch_spec, rename)

  • common: moved self-spec impl to ngram-map

  • common: cleanup (use common_speculative_state_draft)

  • spec : refactor

  • cont : naming

  • spec: remove --spec-config

  • doc: (draftless) speculative decoding

  • common: print performance in spec decoding

  • minor : cleanup

  • common : better names

  • minor : cleanup + fix build

  • minor: comments

  • CODEOWNERS: add common/ngram-map.* (#18471)

  • common : rename speculative.draftless_type -> speculative.type

  • ngram-map : fix uninitialized values

  • ngram-map : take into account the input can become shorter

  • ngram-map : revert len check for now

  • arg : change --spec-draftless -> --spec-type

  • spec : add common_speculative_state::accept()

  • spec : refactor + add common_speculative_begin()

  • spec : fix begin() call with mtmd

  • spec : additional refactor + remove common_speculative_params


Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
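
For readers new to the idea, the general principle behind draftless (self-)speculative decoding with an n-gram lookup can be sketched as follows. This is only an illustration of the technique, not the code in common/ngram-map.* or speculative.cpp: the names `token`, `ngram_draft` and `count_accepted` are hypothetical, and the real implementation may select and score candidates differently. The sketch assumes the drafter looks for the most recent earlier occurrence of the last n tokens and proposes whatever followed it, and that the target model then keeps the longest prefix of the proposal that agrees with its own output.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using token = std::int32_t; // hypothetical token id type, for illustration only

// Propose up to n_draft tokens without a draft model: find the most recent
// earlier occurrence of the trailing n-gram of `history` and copy the tokens
// that followed it. Returns an empty draft when no earlier occurrence exists.
std::vector<token> ngram_draft(const std::vector<token> & history,
                               std::size_t n, std::size_t n_draft) {
    std::vector<token> draft;
    if (n == 0 || history.size() <= n) {
        return draft;
    }
    const std::size_t tail = history.size() - n; // start of the trailing n-gram

    // scan candidate start positions from the most recent to the oldest
    for (std::size_t i = tail; i-- > 0; ) {
        bool match = true;
        for (std::size_t j = 0; j < n; ++j) {
            if (history[i + j] != history[tail + j]) {
                match = false;
                break;
            }
        }
        if (!match) {
            continue;
        }
        // copy the continuation of the earlier occurrence as the draft
        for (std::size_t k = i + n; k < history.size() && draft.size() < n_draft; ++k) {
            draft.push_back(history[k]);
        }
        break;
    }
    return draft;
}

// Number of leading draft tokens that agree with the tokens the target model
// produced when it verified the draft.
std::size_t count_accepted(const std::vector<token> & draft,
                           const std::vector<token> & verified) {
    std::size_t n = 0;
    while (n < draft.size() && n < verified.size() && draft[n] == verified[n]) {
        ++n;
    }
    return n;
}
```

With the history {1, 2, 3, 4, 1, 2}, n = 2 and n_draft = 3, the trailing n-gram {1, 2} also occurs at the start of the history, so the sketch proposes {3, 4, 1}. If the model's verified output were {3, 4, 9}, two of the three drafted tokens would be accepted. Repetitive inputs (code, quoted text, structured output) are where this kind of lookup tends to find matches.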
