ggml-org/llama.cpp b8929 on GitHub

Details

llama-quant : default ftype param Q5_1 --> Q8_0 (#20828)

Change the default ftype in llama_model_quantize_params from
LLAMA_FTYPE_MOSTLY_Q5_1 to LLAMA_FTYPE_MOSTLY_Q8_0.

In case some external program naively uses the default quantization
params, we should probably default to a known-good type like Q8_0 rather
than Q5_1, which is rather old.

macOS/iOS:

macOS Apple Silicon (arm64)
macOS Apple Silicon (arm64, KleidiAI enabled)
macOS Intel (x64)
iOS XCFramework

Linux:

Ubuntu x64 (CPU)
Ubuntu arm64 (CPU)
Ubuntu s390x (CPU)
Ubuntu x64 (Vulkan)
Ubuntu arm64 (Vulkan)
Ubuntu x64 (ROCm 7.2)
Ubuntu x64 (OpenVINO)
Ubuntu x64 (SYCL FP32)
Ubuntu x64 (SYCL FP16)

Android:

Android arm64 (CPU)

Windows:

Windows x64 (CPU)
Windows arm64 (CPU)
Windows x64 (CUDA 12) - CUDA 12.4 DLLs
Windows x64 (CUDA 13) - CUDA 13.1 DLLs
Windows x64 (Vulkan)
Windows x64 (SYCL)
Windows x64 (HIP)

openEuler:

openEuler x86 (310p)
openEuler x86 (910b, ACL Graph)
openEuler aarch64 (310p)
openEuler aarch64 (910b, ACL Graph)