ggml-org/llama.cpp b8557

Released 2 hours ago

hexagon: support for IQ4_NL and MXFP4 (#21018)

  • ggml-hexagon: add IQ4_NL and MXFP4 HMX matmul support
  • Add IQ4_NL quantization type support to the Hexagon backend (buffer
    set/get tensor repack, mul_mat, mul_mat_id dispatch)
  • Implement HVX IQ4_NL vec_dot kernels (1x1, 2x1, 2x2) with
    LUT-based 4-bit index to int8 kvalue dequantization
  • Add MXFP4 HMX dequantization path with E8M0 scale conversion,
    including batch-4 fast path and single-tile fallback
  • Unify quantized row size / scale offset logic to handle Q4_0,
    Q8_0, IQ4_NL, and MXFP4 in the DMA fetch path
  • ggml-hexagon: fix SKIP_QUANTIZE src1 address mismatch in mixed-quant models
  • Fix pragma indentation
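The two dequantization schemes the bullets refer to can be illustrated with a minimal scalar sketch (not the actual vectorized HVX/HMX kernels). The IQ4_NL lookup table below matches ggml's `kvalues_iq4nl`; the helper function names and the simplified per-byte interface are illustrative assumptions, not code from this PR:

```cpp
#include <cmath>
#include <cstdint>

// IQ4_NL maps each 4-bit index to a non-linear int8 "kvalue" via a 16-entry
// LUT (table as defined in ggml); the HVX kernels do this lookup vectorized.
static const int8_t kvalues_iq4nl[16] = {
    -127, -104, -83, -65, -49, -35, -22, -10,
       1,   13,  25,  38,  53,  69,  89, 113,
};

// Illustrative helper: dequantize one packed byte (two 4-bit indices) with a
// per-block scale d. Low nibble first, high nibble second.
static void dequant_iq4nl_byte(uint8_t packed, float d, float out[2]) {
    out[0] = d * static_cast<float>(kvalues_iq4nl[packed & 0x0F]);
    out[1] = d * static_cast<float>(kvalues_iq4nl[packed >> 4]);
}

// E8M0 scale conversion, as used by MXFP4 block scales: an 8-bit biased
// exponent (bias 127) encoding a pure power of two, so the conversion is
// just 2^(e - 127). (The OCP MX spec reserves 0xFF as NaN, omitted here.)
static float e8m0_to_float(uint8_t e) {
    return std::ldexp(1.0f, static_cast<int>(e) - 127);
}
```

Because an E8M0 scale is a power of two, applying it is exact (an exponent shift) rather than a rounding multiply, which is what makes the batch-4 HMX fast path cheap to implement.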

