github ggml-org/llama.cpp b7703


model: try to improve Qwen3 Next (#18683)

  • qwen3next: simplify qkvz projection

  • use ggml_swiglu_split

  • revert swiglu_split, but remove redundant repeat()

  • fix missing reshape

  • rm 2 redundant transposes

  • move mul_mat(k,q) to outside of chunking

  • rm redundant cont

  • improve g_cs_chunk

  • add comments about no cont

  • use std::pair instead of ggml_concat

  • vectorize key_gdiff calculation

  • rm unused tensor

  • avoid ggml_concat inside loop

  • bring back ggml_concat as it may not work on other backends

  • nits

