github ggml-org/llama.cpp b7432

18 hours ago

Warning

Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.
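
For deployment scripts that need to keep working across the switch, one option is to pick the extractor from the archive's file extension rather than assuming .zip. The following is a minimal sketch, not part of the release itself: the function name extract_release and the archive names in the usage note are hypothetical, and it relies only on Python's standard tarfile and zipfile modules.

```python
import tarfile
import zipfile
from pathlib import Path


def extract_release(archive: Path, dest: Path) -> list[str]:
    """Extract a release archive in either .tar.gz or .zip form.

    Hypothetical helper: returns the member names found in the archive.
    """
    dest.mkdir(parents=True, exist_ok=True)
    name = archive.name.lower()

    if name.endswith((".tar.gz", ".tgz")):
        with tarfile.open(archive, "r:gz") as tar:
            names = tar.getnames()
            # Use the safer "data" extraction filter where this Python
            # version provides it; fall back to the default otherwise.
            if hasattr(tarfile, "data_filter"):
                tar.extractall(dest, filter="data")
            else:
                tar.extractall(dest)
            return names

    if name.endswith(".zip"):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
            return zf.namelist()

    # Fail loudly on anything else instead of guessing the format.
    raise ValueError(f"unsupported archive format: {archive.name}")
```

Calling extract_release(Path("release.tar.gz"), Path("out")) or extract_release(Path("release.zip"), Path("out")) then goes through the same code path in the script, so the script does not need to change again on the day the archive format switches. Both file names here are placeholders, not actual asset names.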

Details

Optimization: Qwen3 Next autoregressive pass (#17996)

  • It's Qwen3 Next, the lean mean token generation machine!

  • Apply patches from thread

  • Remove recurrent version, only keep chunked and autoregressive

  • Remove unnecessary conts and asserts

  • Remove more extra conts and asserts

  • Cleanup masking

macOS/iOS:

Linux:

Windows:

openEuler:
