ggml-org/llama.cpp — release b8053


models : optimize qwen3next graph (#19375)

  • models : optimizing qwen3next graph

  • cont

  • wip (×10 intermediate work-in-progress commits)

  • cont : remove redundant q, g chunking

  • minor

  • minor

  • avoid passing masks around

  • avoid concats during chunking

  • naming + shapes

  • update names and use prefix to disable CUDA graphs

Prebuilt binaries: macOS/iOS, Linux, Windows, openEuler
