github ggml-org/llama.cpp b9745


spec : Support Step3.5/3.7 flash mtp3 (#24340)

  • add mtp_layer_offset + include nextn flags in graph reuse

  • add llama_set_mtp_layer_offset + llama_model_n_nextn_layer API

  • offset head select + require all MTP blocks

  • speculative multi-head process()

  • speculative multi-head draft()

  • gather outputs via inp_out_ids

  • cleanup

  • fix core

  • minor cleanup

  • merged draft_multi_head into draft()

  • mtp rename nextn

  • Apply suggestions from code review

Co-authored-by: Aman Gupta <amangupta052@gmail.com>

  • clean-up comments

  • fix for multi seq

  • apply suggestions && chain-heads comment

  • add a reference for chain_heads discussion


Co-authored-by: Aman Gupta <amangupta052@gmail.com>

openEuler:

  • DISABLED
  • openEuler x86 (310p)
  • openEuler x86 (910b, ACL Graph)
  • openEuler aarch64 (310p)
  • openEuler aarch64 (910b, ACL Graph)
