github ggml-org/llama.cpp b7703


model: try to improve Qwen3 Next (#18683)

  • qwen3next: simplify qkvz projection

  • use ggml_swiglu_split

  • revert swiglu_split, but remove redundant repeat()

  • fix missing reshape

  • rm 2 redundant transposes

  • move mul_mat(k,q) to outside of chunking

  • rm redundant cont

  • improve g_cs_chunk

  • add comments about no cont

  • use std::pair instead of ggml_concat

  • vectorize key_gdiff calculation

  • rm unused tensor

  • avoid ggml_concat inside loop

  • bring back ggml_concat as it may not work on other backends

  • nits

