github ggml-org/llama.cpp b8278


llama-quant : correct n_attention_wv usage (#20357)

  • llama-quant : correct n_attention_wv usage

In #19770, I introduced a regression in how the quantize_state_impl counter values were initialized: I was incrementing and using n_attention_wv in the same loop, when it should already be final by the time we decide tensor types in llama_tensor_get_type_impl (for use_more_bits).

I never observed a difference in any of my tests

  • it was only after @bartowski kindly pointed this out that I realized
    it was incorrect. (Thanks!)
  • simplify

Prebuilt binaries are available for macOS/iOS, Linux, Windows, and openEuler.
