github ggml-org/llama.cpp b8530

latest release: b8531
2 hours ago

mtmd: Add DeepSeekOCR Support (#17400)

  • mtmd: llama.cpp DeepSeekOCR support
    init commit

  • loading sam tensors

  • mtmd: fix vision model processing

  • deepseek-ocr clip-vit model impl

  • mtmd: add DeepSeek-OCR LM support with standard attention

  • mtmd: successfully runs DeepSeek-OCR LM in llama-cli

  • mtmd: Fix RoPE type for DeepSeek-OCR LM.

  • loading LM
    testing Vision model loading

  • sam warmup working

  • sam erroneous return corrected

  • clip-vit: corrected cls_embd concat

  • clip-vit: model convert qkv_proj split

  • corrected combining of image encoders' results

  • fix: update callback for ffn_moe_weighted and add callback for attn_out in deepseek2 model

  • concat image_newline and image_seperator tokens

  • visual_model warmup (technically) works

  • window partitioning using standard ggml ops

  • sam implementation without using CPU only ops

  • clip: fixed warnings

  • Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into sf/deepseek-ocr

  • mtmd: fix get_rel_pos

  • mtmd: fixed the wrong scaler for get_rel_pos

  • image encoding technically works but the output can't be checked since image decoding fails

  • mtmd: minor changes

  • mtmd: add native resolution support

  • image encoding debugged
  • fixed issues mainly related to wrong config values such as n_patches
  • configs need to be corrected in the converter
  • mtmd: correct token order

  • dynamic resizing
  • mtmd: quick fix token order

  • mtmd: fix dangling pointer

  • mtmd: SAM numerically works

  • mtmd: debug CLIP-L (vit_pre_ln)

  • mtmd: debug CLIP-L & first working DeepSeek-OCR model

  • mtmd : add --dsocr-mode CLI argument for DeepSeek-OCR resolution control & all native resolution modes work

  • mtmd: simplify SAM patch embedding

  • mtmd: adapt Pillow image resizing function

  • mtmd: simplify DeepSeek-OCR dynamic resolution preprocessing

  • mtmd: remove --dsocr-mode argument

  • mtmd: refactor code & remove unused helper functions

  • mtmd: fix tensor names for image newlines and view separator

  • clean up

  • reverting automatically removed spaces

  • reverting automatically removed spaces

  • mtmd: fixed bad ocr check in Deepseek2 (LM)

  • mtmd: support combined QKV projection in build_vit

  • using common build_attn in sam

  • corrected code-branch when flash-attn disabled
    enabling usage of --flash-attn option

  • mtmd: minor fix

  • minor formatting and style

  • fixed flake8 lint issues

  • minor editorconfig-check fixes

  • minor editorconfig-check fixes

  • mtmd: simplify get_rel_pos

  • mtmd: make sam hparams configurable

  • mtmd: add detailed comments for resize_bicubic_pillow

  • mtmd: fixed wrong input setting

  • mtmd: convert model in FP16

  • mtmd: minor fix

  • mtmd: remove tweak to llama-mtmd-cli & deepseek-ocr template

  • fix: test-1.jpg OCR issue with small (640) resolution
    set the minimum resolution to base (1024) and the maximum to large (1280) for dynamic resolution

  • minor: editorconfig-check fix

  • merge with changes from #17909
    added new opt to tests.sh to disable flash-attn

  • minor: editorconfig-check fix

  • testing deepseek-ocr
    quick and dirty test script comparing results of Qwen2.5-VL vs DeepSeek-OCR

  • quick and (potentially) dirty merge with #17909

  • refactoring, one single builder function and static helpers

  • added deepseek-ocr test to tests.sh

  • minor formatting fixes

  • check with fixed expected results

  • minor formatting

  • editorconfig-check fix

  • merge with changes from #18042

  • minor

  • added GLM-4.6V to big tests
  • added missing deps for python test
  • convert: minor fix

  • mtmd: format code

  • convert: quick fix

  • convert: quick fix

  • minor python formatting

  • fixed merge build issue

  • merge resolved

  • fixed issues in convert
  • tested several deepseek models
  • minor fix

  • minor

  • Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

  • removed clip_is_deepseekocr
  • removed redundant RESIZE_ALGO_BICUBIC_PILLOW resize-algo
  • simplified image-preprocessing
  • removed/simplified debug functions
  • cleaning commented out code
  • fixing instability issues by reintroducing resize_bicubic_pillow

  • use f16 model for deepseek-ocr test
  • ignore llama-arch test for deepseek-ocr
  • rename fc_w --> mm_fc_w

  • add links to OCR discussion

  • cleaner loading code

  • add missing .weight to some tensors

  • add default jinja template (to be used by server)

  • move test model to ggml-org

  • rolling back upscale change

  • Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>


Co-authored-by: bluebread <hotbread70127@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
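
For readers unfamiliar with the "window partitioning using standard ggml ops" entry in the log above: SAM-style image encoders split the feature map into fixed-size, non-overlapping windows before attention and merge them back afterwards. The following is a minimal NumPy sketch of that general technique, built only from pad, reshape and transpose. It is not the code from this PR; the function names, the (H, W, C) layout and the zero padding are assumptions made for illustration.

```python
import numpy as np


def window_partition(x, window_size):
    """Split an (H, W, C) feature map into non-overlapping square windows.

    The map is zero-padded at the bottom/right so both sides become
    multiples of window_size. Returns the windows with shape
    (num_windows, window_size, window_size, C) and the padded (Hp, Wp).
    """
    h, w, c = x.shape
    pad_h = (window_size - h % window_size) % window_size
    pad_w = (window_size - w % window_size) % window_size
    x = np.pad(x, ((0, pad_h), (0, pad_w), (0, 0)))
    hp, wp = h + pad_h, w + pad_w
    # (Hp, Wp, C) -> (nH, ws, nW, ws, C) -> (nH, nW, ws, ws, C)
    x = x.reshape(hp // window_size, window_size, wp // window_size, window_size, c)
    windows = x.transpose(0, 2, 1, 3, 4).reshape(-1, window_size, window_size, c)
    return windows, (hp, wp)


def window_unpartition(windows, window_size, padded_hw, hw):
    """Inverse of window_partition: merge windows and crop the padding."""
    hp, wp = padded_hw
    h, w = hw
    c = windows.shape[-1]
    x = windows.reshape(hp // window_size, wp // window_size, window_size, window_size, c)
    # (nH, nW, ws, ws, C) -> (nH, ws, nW, ws, C) -> (Hp, Wp, C)
    x = x.transpose(0, 2, 1, 3, 4).reshape(hp, wp, c)
    return x[:h, :w, :]


if __name__ == "__main__":
    feat = np.arange(5 * 7 * 2, dtype=np.float32).reshape(5, 7, 2)
    wins, padded = window_partition(feat, 4)
    restored = window_unpartition(wins, 4, padded, feat.shape[:2])
    print(wins.shape, padded, np.array_equal(restored, feat))
```

Because every step is a plain pad, reshape or transpose, the same sequence maps naturally onto generic tensor ops that run on any backend, which is presumably the motivation behind the "sam implementation without using CPU only ops" entry.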

