Details
hexagon: support for IQ4_NL and MXFP4 (#21018)
- ggml-hexagon: add IQ4_NL and MXFP4 HMX matmul support
  - Add IQ4_NL quantization type support to Hexagon backend (buffer set/get tensor repack, mul_mat, mul_mat_id dispatch)
  - Implement HVX IQ4_NL vec_dot kernels (1x1, 2x1, 2x2) with LUT-based 4-bit index to int8 kvalue dequantization
  - Add MXFP4 HMX dequantization path with E8M0 scale conversion, including batch-4 fast path and single-tile fallback
  - Unify quantized row size / scale offset logic to handle Q4_0, Q8_0, IQ4_NL, and MXFP4 in the DMA fetch path
- ggml-hexagon: fix SKIP_QUANTIZE src1 address mismatch in mixed-quant models
- Fix the pragma indent
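
For readers unfamiliar with the two formats named in the changelog, here is a minimal scalar sketch of the underlying arithmetic. It is not the HVX/HMX code from this change. The helper names `iq4nl_dequant_block` and `e8m0_to_fp32` are hypothetical, and the block scale is passed as a plain `float` rather than the packed fp16 a real block would store. The 16-entry codebook and the nibble layout follow ggml's CPU reference for IQ4_NL. The E8M0 conversion uses the standard Microscaling interpretation, 2^(e - 127); the backend kernels may fold further factors into the scale.

```c
#include <stdint.h>
#include <string.h>

#define QK4_NL 32 /* values per IQ4_NL block, as in ggml */

/* Non-linear int8 codebook ("kvalues") that 4-bit IQ4_NL indices select from. */
static const int8_t kvalues_iq4nl[16] = {
    -127, -104, -83, -65, -49, -35, -22, -10, 1, 13, 25, 38, 53, 69, 89, 113,
};

/* Hypothetical helper: dequantize one 32-value block.
 * Assumption: d is the block scale already converted to float.
 * Low nibbles fill out[0..15], high nibbles fill out[16..31]. */
static void iq4nl_dequant_block(float d, const uint8_t qs[QK4_NL / 2],
                                float out[QK4_NL]) {
    for (int j = 0; j < QK4_NL / 2; ++j) {
        out[j]              = d * (float) kvalues_iq4nl[qs[j] & 0x0F];
        out[j + QK4_NL / 2] = d * (float) kvalues_iq4nl[qs[j] >> 4];
    }
}

/* Hypothetical helper: convert an E8M0 scale byte to fp32.
 * E8M0 is a bare 8-bit exponent with bias 127 and no sign or mantissa. */
static float e8m0_to_fp32(uint8_t e) {
    uint32_t bits;
    if (e == 0xFF) {
        bits = 0x7FC00000u;       /* 0xFF encodes NaN */
    } else if (e == 0) {
        bits = 0x00400000u;       /* 2^-127 is subnormal in fp32 */
    } else {
        bits = (uint32_t) e << 23; /* same bias as fp32: drop into exponent field */
    }
    float f;
    memcpy(&f, &bits, sizeof f);  /* type-pun without violating aliasing rules */
    return f;
}
```

A vector kernel would typically perform the same table lookup with a byte-shuffle instruction across many nibbles at once; the scalar loop above only illustrates the mapping.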
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: