github ggml-org/llama.cpp b8885


mtmd, llama : Update HunyuanVL vision-language model support (#22037)

  • mtmd, llama : add HunyuanVL vision-language model support
  • add LLM_ARCH_HUNYUAN_VL with M-RoPE (XD-RoPE) support
  • add PROJECTOR_TYPE_HUNYUANVL with PatchMerger vision encoder
  • add HunyuanVL-specific M-RoPE position encoding for image tokens
  • add GGUF conversion for HunyuanVL vision and text models
  • add smoke test in tools/mtmd/tests.sh
  • fix: HunyuanVL XD-RoPE h/w section order

  • fix: remove redundant code

  • convert : fix HunyuanOCR / HunyuanVL conversion

  • Tested locally: both HunyuanOCR and HunyuanVL-4B convert to GGUF successfully and produce correct inference output on Metal (F16 / Q8_0).
  • clip : fix -Werror=misleading-indentation in bilinear resize

  • fix CI: convert_hf_to_gguf type check error

  • convert_hf_to_gguf.py: give HunyuanVLTextModel.__init__ an explicit dir_model: Path parameter so ty can infer the type for load_hparams instead of reporting Unknown | None.
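The type-check fix above follows a common pattern: giving the constructor an explicit, annotated parameter instead of forwarding untyped arguments lets a static checker infer precise types downstream. The sketch below is illustrative only, assuming simplified class bodies; only the class name, `__init__`, `dir_model`, and `load_hparams` come from the bullet above, everything else is a placeholder.

```python
from pathlib import Path

class TextModelBase:
    def __init__(self, dir_model: Path):
        # annotated parameter: the checker knows dir_model is Path, not Unknown | None
        self.dir_model = dir_model
        self.hparams = self.load_hparams(dir_model)

    @staticmethod
    def load_hparams(dir_model: Path) -> dict:
        # placeholder: the real converter reads config.json from dir_model
        return {"model_dir": str(dir_model)}

class HunyuanVLTextModel(TextModelBase):
    # spelling out dir_model: Path (rather than relying on *args) keeps
    # the argument type passed to load_hparams fully inferred
    def __init__(self, dir_model: Path):
        super().__init__(dir_model)
```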

Co-authored-by: wendadawen wendadawen@tencent.com
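The M-RoPE (XD-RoPE) position encoding mentioned in the bullets assigns separate position ids per rotary axis: text tokens advance all axes together, while image patches get their row/column index on the height/width axes. This is a minimal sketch of that general scheme; the function name, layout, and axis ordering are assumptions for illustration, not llama.cpp's actual implementation.

```python
# Hypothetical sketch of multi-axis (M-RoPE-style) position ids for a
# sequence of n_text text tokens followed by a grid_h x grid_w image.
def mrope_positions(n_text, grid_h, grid_w):
    pos_t, pos_h, pos_w = [], [], []
    # text tokens: temporal, height, and width axes advance in lockstep
    for i in range(n_text):
        pos_t.append(i); pos_h.append(i); pos_w.append(i)
    base = n_text
    # image patches: shared temporal id, h/w ids taken from the 2-D grid
    for r in range(grid_h):
        for c in range(grid_w):
            pos_t.append(base)
            pos_h.append(base + r)
            pos_w.append(base + c)
    return pos_t, pos_h, pos_w
```

For two text tokens and a 2x2 patch grid, all four patches share temporal id 2 while their h/w ids spread over the grid, which is what lets the model distinguish spatial layout from sequence order.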

