model: try to improve Qwen3 Next (#18683)
- qwen3next: simplify qkvz projection
- use ggml_swiglu_split
- revert swiglu_split, but remove redundant repeat()
- fix missing reshape
- remove 2 redundant transposes
- move mul_mat(k,q) outside of the chunking
- remove redundant cont
- improve g_cs_chunk
- add comments explaining why no cont is needed
- use std::pair instead of ggml_concat
- vectorize key_gdiff calculation
- remove unused tensor
- avoid ggml_concat inside loop
- bring back ggml_concat, as it may not work on other backends
- nits
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)