github ggml-org/llama.cpp b9411


model : support for DeepseekV32ForCausalLM with generic DeepSeek Sparse Attention (DSA) implementation (#23346)

  • llama : support DeepSeek V3.2 model family (with DSA lightning indexer)

  • convert : handle DeepseekV32ForCausalLM architecture

  • ggml : support for f16 GGML_OP_FILL

  • memory : separate hparams argument in llama_kv_cache constructor

  • memory : add llama_kv_cache_dsa memory (KV cache + lightning indexer cache)

  • llama : support for LLM_ARCH_DEEPSEEK32

  • model : llama_model_deepseek32 implementation

  • model : merge two scale operations into one in DSA lightning indexer implementation

  • chore : remove unused code

  • model : support NVFP4 in DeepSeek V3.2

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

  • memory : refactoring TODO

Co-authored-by: ggerganov <ggerganov@users.noreply.github.com>

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: ggerganov <ggerganov@users.noreply.github.com>

openEuler:

  • DISABLED
  • openEuler x86 (310p)
  • openEuler x86 (910b, ACL Graph)
  • openEuler aarch64 (310p)
  • openEuler aarch64 (910b, ACL Graph)
