ggml-org/llama.cpp b9235




llama : MTP clean-up (#23269)

* llama : disable equal splits for recurrent memory with partial rollback
* spec : re-enable p-min with MTP drafts
* spec : re-enable ngram spec in combination with RS rollback
* spec : fix ngram-map-* params
* spec : fix acceptance logic in combined ngram + draft configs
* graph : fix reuse for combined token + embd batches
* spec : log parameters for each speculative implementation

  - add LOG_INF in each constructor with implementation type and parameters
  - extract device string logic into common_speculative_get_devices_str()
  - move 'adding speculative implementation' log from init into constructors

  Assisted-by: llama.cpp:local pi

* spec : extend --spec-default with ngram-map-k4v

  Assisted-by: llama.cpp:local pi

* minor : fix n_embd log
* args : set draft.n_max default to 3 + regen docs
* spec : relax ngram-mod rejection threshold to 0.25 @ 5 low
* logs : improve
* docs : update speculative decoding CLI argument documentation

  - Add missing draft model CPU scheduling and tensor override parameters
  - Update --spec-type to include all available types (excluding draft-eagle3 WIP)
  - Fix default values to match implementation (n_max=3, n_min=0, p_min=0.0)
  - Remove deprecated options (spec-draft-ctx-size, spec-draft-replace)
  - Add environment variables for new parameters

  Assisted-by: llama.cpp:local pi

* arg : revert adding ngram-map-k4v to the default spec config
* cont : fix name
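Several of the notes above touch the draft-and-verify loop of speculative decoding: the p-min cutoff for drafts, the acceptance logic, and the documented defaults n_max=3, n_min=0, p_min=0.0. The following is a minimal Python sketch of that loop, assuming greedy drafting and greedy verification over toy "models" (functions from a token prefix to a probability list). All names here (draft_tokens, verify, the toy draft/target models) are hypothetical and are not llama.cpp's API; treating a draft shorter than n_min as discarded is also an assumption made for illustration.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-ins: a "model" maps a token prefix to a probability
# distribution over a tiny vocabulary. Not llama.cpp's API.
Dist = Sequence[float]
Model = Callable[[Sequence[int]], Dist]


def argmax(dist: Dist) -> int:
    """Index of the most probable token."""
    return max(range(len(dist)), key=lambda i: dist[i])


def draft_tokens(draft: Model, prefix: List[int],
                 n_max: int = 3, n_min: int = 0, p_min: float = 0.0) -> List[int]:
    """Greedily draft up to n_max tokens, stopping early when the draft
    model's top probability falls below p_min.
    Assumption: a draft shorter than n_min is discarded entirely."""
    out: List[int] = []
    for _ in range(n_max):
        dist = draft(prefix + out)
        tok = argmax(dist)
        if dist[tok] < p_min:  # low-confidence token: stop drafting here
            break
        out.append(tok)
    return out if len(out) >= n_min else []


def verify(target: Model, prefix: List[int], drafted: List[int]) -> List[int]:
    """Greedy verification: accept drafted tokens while they match the
    target model's argmax, then append one token chosen by the target."""
    accepted: List[int] = []
    for tok in drafted:
        if tok != argmax(target(prefix + accepted)):
            break  # first mismatch rejects this token and the rest
        accepted.append(tok)
    # The target always contributes one token (correction or bonus token).
    accepted.append(argmax(target(prefix + accepted)))
    return accepted


# Toy models over a 4-token vocabulary.
def target(prefix: Sequence[int]) -> Dist:
    nxt = (prefix[-1] + 1) % 4
    return [0.9 if i == nxt else 0.1 / 3 for i in range(4)]


def draft(prefix: Sequence[int]) -> Dist:
    # Agrees with the target, except it predicts 0 after 2 (target says 3).
    nxt = 0 if prefix[-1] == 2 else (prefix[-1] + 1) % 4
    return [0.6 if i == nxt else 0.4 / 3 for i in range(4)]


if __name__ == "__main__":
    d = draft_tokens(draft, [0])   # [1, 2, 0]
    print(verify(target, [0], d))  # [1, 2, 3]
```

With p_min=0.0 the cutoff never fires, so every draft runs to n_max tokens; raising p_min (for example to 0.7 in this toy setup, where the draft's top probability is 0.6) stops drafting immediately, and each step then yields only the single target token.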

