ggml-org/llama.cpp b7432
on GitHub


6 months ago

Warning

Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.
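
For deployment scripts that need to survive the switch, one option is to pick the extractor from the file name rather than hard-coding .zip. The sketch below is only an illustration and not part of the llama.cpp project: the function name extract_release and the accepted suffixes are assumptions, and real release asset names may differ.

```python
import tarfile
import zipfile
from pathlib import Path


def extract_release(archive: Path, dest: Path) -> None:
    """Unpack a release archive that may be either .zip or .tar.gz.

    Hypothetical helper: the suffix check is an assumption about how
    release assets are named, not something the release notes specify.
    """
    name = archive.name.lower()
    is_tar_gz = name.endswith((".tar.gz", ".tgz"))
    is_zip = name.endswith(".zip")
    if not (is_tar_gz or is_zip):
        raise ValueError(f"unsupported archive format: {archive.name}")

    dest.mkdir(parents=True, exist_ok=True)
    if is_tar_gz:
        with tarfile.open(archive, mode="r:gz") as tar:
            # Use the safer "data" extraction filter where this Python
            # version provides it; fall back to the default otherwise.
            if hasattr(tarfile, "data_filter"):
                tar.extractall(dest, filter="data")
            else:
                tar.extractall(dest)
    else:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
```

Dispatching on the suffix lets the same script handle older .zip releases and newer .tar.gz ones during the transition.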

Details

Optimization: Qwen3 Next autoregressive pass (#17996)

It's Qwen3 Next, the lean mean token generation machine!
Apply patches from thread
Remove recurrent version, only keep chunked and autoregressive
Remove unnecessary conts and asserts
Remove more extra conts and asserts
Cleanup masking

