ggml-org/llama.cpp b9745 on GitHub

2 hours ago

Details

spec : Support Step3.5/3.7 flash mtp3 (#24340)

add mtp_layer_offset + include nextn flags in graph reuse
add llama_set_mtp_layer_offset + llama_model_n_nextn_layer API
offset head select + require all MTP blocks
speculative multi-head process()
speculative multi-head draft()
gather outputs via inp_out_ids
cleanup
fix core
minor cleanup
merged draft_multi_head into draft()
mtp rename nextn
Apply suggestions from code review

Co-authored-by: Aman Gupta <amangupta052@gmail.com>

clean-up comments
fix for multi seq
apply suggestions && chain-heads comment
add a reference for chain_heads discussion

Co-authored-by: Aman Gupta <amangupta052@gmail.com>

macOS/iOS:

macOS Apple Silicon (arm64)
macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED
macOS Intel (x64)
iOS XCFramework

Linux:

Android:

Android arm64 (CPU)

Windows:

openEuler:

DISABLED
openEuler x86 (310p)
openEuler x86 (910b, ACL Graph)
openEuler aarch64 (310p)
openEuler aarch64 (910b, ACL Graph)

UI:

UI
