mtmd : chat : Fix extra \n between text and media marker (#19595)
- mtmd : chat : Fix extra \n between text and media marker
Thanks to @tugot17 for detecting and reporting the issue.
For vision models (e.g. LFM2.5-VL-1.6B and Qwen/Qwen3-VL-4B-Instruct), llama-mtmd-cli produces output identical to the HF implementation.
However, llama-server doesn't. I traced it down to an extra newline inserted after `<__media__>`.
This happens in `to_json_oaicompat`, which treats media markers as text and joins all parts with a `\n` separator.
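A minimal sketch of that joining behavior, assuming a simplified `join_parts` helper (the name and shape are illustrative, not the actual code in `common/chat.cpp`):

```cpp
#include <string>
#include <vector>

// Illustrative sketch of the problem: when the media marker is stored
// as a plain text part, the unconditional "\n" separator also lands
// next to it.
static std::string join_parts(const std::vector<std::string> & parts) {
    std::string out;
    for (size_t i = 0; i < parts.size(); ++i) {
        if (i > 0) {
            out += "\n"; // separator is inserted between *all* parts
        }
        out += parts[i];
    }
    return out;
}

// join_parts({"<__media__>", "What is on this image?"})
//   -> "<__media__>\nWhat is on this image?"  (stray \n after the marker)
```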
The PR introduces a new type, `media_marker`, and uses it for media markers. Extra logic is added to prevent the insertion of newlines before and after media markers.
With this change, the number of input tokens is identical to the HF implementation, and as a result the output is also identical.
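A minimal sketch of the corrected separator logic; `media_marker` comes from the PR itself, while `part` and `join_parts` are hypothetical names used only for illustration:

```cpp
#include <string>
#include <vector>

// Parts now carry a type, and the "\n" separator is skipped next to
// media markers. Hypothetical sketch, not the exact PR code.
enum class part_type { text, media_marker };

struct part {
    part_type   type;
    std::string content;
};

static std::string join_parts(const std::vector<part> & parts) {
    std::string out;
    for (size_t i = 0; i < parts.size(); ++i) {
        const bool prev_is_marker = i > 0 && parts[i - 1].type == part_type::media_marker;
        const bool cur_is_marker  = parts[i].type == part_type::media_marker;
        if (i > 0 && !prev_is_marker && !cur_is_marker) {
            out += "\n"; // only separate two adjacent text parts
        }
        out += parts[i].content;
    }
    return out;
}

// join_parts({{part_type::media_marker, "<__media__>"},
//             {part_type::text, "What is on this image?"}})
//   -> "<__media__>What is on this image?"  (no stray newline)
```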
I explored other ways to address the issue:
- remove the `\n` between text parts in `to_json_oaicompat` completely
- merge text messages in `server-common.cpp` before sending them to `to_json_oaicompat`
Please propose alternative ways of fixing this issue.
- Refactor to use explicit per-type ifs
- Update common/chat.cpp
Co-authored-by: Piotr Wilkin (ilintar) piotr.wilkin@syndatis.com
- Update `common_chat_templates_apply_legacy`
Co-authored-by: Piotr Wilkin (ilintar) piotr.wilkin@syndatis.com
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: