llama-quant : correct n_attention_wv usage (#20357)
- llama-quant : correct `n_attention_wv` usage
In #19770, I introduced a regression in how the `quantize_state_impl` counter values were initialized: I was incrementing and using `n_attention_wv` in the same loop, when it should already be its final value by the time we're deciding tensor types in `llama_tensor_get_type_impl` (for `use_more_bits`).
I never observed a difference in any of my tests; it was only after @bartowski kindly pointed this out that I realized it was incorrect. (Thanks!)
- simplify
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: