ggml-org/llama.cpp b9411 on GitHub

model : support for DeepseekV32ForCausalLM with generic DeepSeek Sparse Attention (DSA) implementation (#23346)

llama : support DeepSeek V3.2 model family (with DSA lightning indexer)
convert : handle DeepseekV32ForCausalLM architecture
ggml : support for f16 GGML_OP_FILL
memory : separate hparams argument in llama_kv_cache constructor
memory : add llama_kv_cache_dsa memory (KV cache + lightning indexer cache)
llama : support for LLM_ARCH_DEEPSEEK32
model : llama_model_deepseek32 implementation
model : merge two scale operations into one in DSA lightning indexer implementation
chore : remove unused code
model : support NVFP4 in DeepSeek V3.2
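
For readers unfamiliar with DSA, the idea behind the lightning indexer can be sketched as follows. This is a minimal illustration based on the published DeepSeek V3.2 description, not the llama.cpp implementation: for the current token, a small indexer scores every cached token as a weighted sum over indexer heads of ReLU(q · k), and only the top-k scoring tokens take part in the main attention. All names here (`indexer_scores`, `select_top_k`, the toy vectors) are hypothetical and chosen for illustration only.

```python
def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def indexer_scores(q_heads, w, keys):
    """Score each cached token for the current query token.

    q_heads: one indexer query vector per indexer head (current token)
    w:       one scalar weight per indexer head
    keys:    one indexer key vector per cached token
    Returns one score per cached token: sum_j w[j] * ReLU(q_heads[j] . key).
    """
    return [
        sum(w_j * max(0.0, dot(q_j, k_s)) for q_j, w_j in zip(q_heads, w))
        for k_s in keys
    ]


def select_top_k(scores, k):
    """Indices of the k highest-scoring tokens, returned in position order."""
    top = sorted(range(len(scores)), key=lambda s: scores[s], reverse=True)[:k]
    return sorted(top)


# Toy data (assumed values): 2 indexer heads, 4 cached tokens, head dim 2.
q_heads = [[1.0, 0.0], [0.0, 1.0]]
w = [0.5, 2.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0], [2.0, 2.0]]

scores = indexer_scores(q_heads, w, keys)
selected = select_top_k(scores, 2)  # tokens the main attention would read
```

In this sketch the main attention would then read only the KV cache entries at the `selected` positions. That is why a DSA-style memory has to hold indexer keys alongside the regular KV cache.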

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: ggerganov <ggerganov@users.noreply.github.com>