Details
mtmd: Add DeepSeekOCR Support (#17400)
-
mtmd: llama.cpp DeepSeekOCR support
init commit
-
loading sam tensors
-
mtmd: fix vision model processing
-
deepseek-ocr clip-vit model impl
-
mtmd: add DeepSeek-OCR LM support with standard attention
-
mtmd: successfully runs DeepSeek-OCR LM in llama-cli
-
mtmd: Fix RoPE type for DeepSeek-OCR LM.
-
loading LM
testing Vision model loading
-
sam warmup working
-
sam erroneous return corrected
-
clip-vit: corrected cls_embd concat
-
clip-vit: model convert qkv_proj split
-
corrected combining of image encoders' results
-
fix: update callback for ffn_moe_weighted and add callback for attn_out in deepseek2 model
-
concat image_newline and image_seperator tokens
-
visual_model warmup (technically) works
-
window partitioning using standard ggml ops
-
sam implementation without using CPU only ops
-
clip: fixed warnings
-
Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into sf/deepseek-ocr
-
mtmd: fix get_rel_pos
-
mtmd: fixed the wrong scaler for get_rel_pos
-
image encoding technically works but the output can't be checked since image decoding fails
-
mtmd: minor changes
-
mtmd: add native resolution support
-
- image encoding debugged
- fixed issues mainly related to wrong config values such as n_patches
- configs need to be corrected in the converter
-
mtmd: correct token order
-
- dynamic resizing
- changes are concerning PR sfallah#4
-
mtmd: quick fix token order
-
mtmd: fix dangling pointer
-
mtmd: SAM numerically works
-
mtmd: debug CLIP-L (vit_pre_ln)
-
mtmd: debug CLIP-L & first working DeepSeek-OCR model
-
mtmd: add --dsocr-mode CLI argument for DeepSeek-OCR resolution control; all native resolution modes work
-
mtmd: simplify SAM patch embedding
-
mtmd: adapt Pillow image resizing function
-
mtmd: simplify DeepSeek-OCR dynamic resolution preprocessing
-
mtmd: remove --dsocr-mode argument
-
mtmd: refactor code & remove unused helper functions
-
mtmd: fix tensor names for image newlines and view separator
-
clean up
-
reverting automatically removed spaces
-
reverting automatically removed spaces
-
mtmd: fixed bad ocr check in Deepseek2 (LM)
-
mtmd: support combined QKV projection in build_vit
-
using common build_attn in sam
-
corrected code-branch when flash-attn disabled
enabling usage of --flash-attn option
-
mtmd: minor fix
-
minor formatting and style
-
fixed flake8 lint issues
-
minor editorconfig-check fixes
-
minor editorconfig-check fixes
-
mtmd: simplify get_rel_pos
-
mtmd: make sam hparams configurable
-
mtmd: add detailed comments for resize_bicubic_pillow
-
mtmd: fixed wrong input setting
-
mtmd: convert model in FP16
-
mtmd: minor fix
-
mtmd: remove tweak to llama-mtmd-cli & deepseek-ocr template
-
fix: test-1.jpg OCR issue with small (640) resolution
setting min-resolution base (1024) max large (1280) for dynamic-resolution
-
minor: editorconfig-check fix
-
merge with changes from #17909
added new opt to tests.sh to disable flash-attn
-
minor: editorconfig-check fix
-
testing deepseek-ocr
quick and dirty test script comparing results of Qwen2.5-VL vs DeepSeek-OCR
-
quick and (potentially) dirty merge with #17909
-
refactoring, one single builder function and static helpers
-
added deepseek-ocr test to tests.sh
-
minor formatting fixes
-
check with fixed expected results
-
minor formatting
-
editorconfig-check fix
-
merge with changes from #18042
-
minor
- added GLM-4.6V to big tests
- added missing deps for python test
-
convert: minor fix
-
mtmd: format code
-
convert: quick fix
-
convert: quick fix
-
minor python formatting
-
fixed merge build issue
-
merge resolved
- fixed issues in convert
- tested several deepseek models
-
minor fix
-
minor
-
Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
-
- removed clip_is_deepseekocr
- removed redundant RESIZE_ALGO_BICUBIC_PILLOW resize-algo
- simplified image-preprocessing
- removed/simplified debug functions
-
- cleaning commented out code
-
fixing instability issues by reintroducing resize_bicubic_pillow
-
- use f16 model for deepseek-ocr test
- ignore llama-arch test for deepseek-ocr
-
rename fc_w --> mm_fc_w
-
add links to OCR discussion
-
cleaner loading code
-
add missing .weight to some tensors
-
add default jinja template (to be used by server)
-
move test model to ggml-org
-
rolling back upscale change
-
Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: bluebread <hotbread70127@gmail.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: