ggml-org/llama.cpp — release b8053


models : optimize qwen3next graph (#19375)

  • models : optimizing qwen3next graph

  • cont

  • wip (×10 intermediate work-in-progress commits)

  • cont : remove redundant q, g chunking

  • minor

  • minor

  • avoid passing masks around

  • avoid concats during chunking

  • naming + shapes

  • update names and use prefix to disable CUDA graphs

Prebuilt binaries: macOS/iOS, Linux, Windows, openEuler
