ggml-org/llama.cpp b7793


server: /v1/responses (partial) (#18486)

  • Carry over changes from the previous PR

  • Make the instructions (system) field the first message (see the request-conversion sketch after this list)

  • Convert [input_message] (text/image/file)

  • Rename convert_responses_to_chatcmpl(body) -> response_body

  • Initial tool call support

  • Erase instructions field from chatcmpl body

  • Feed reasoning texts to chat template

  • Use std::vector instead of opaque json array

  • Make output_item.added events consistent

  • Move server_task_result_cmpl_partial::update from header to source

  • Match ID of output_item.added and .done events

  • Add function_call only if there is no "fc_" prefix

  • Add function call output to the non-streaming API

  • Test if ID is persistent

  • Add doc

  • Fix style - use trailing comma

  • Rewrite state management

  • Catch up with upstream/master

  • Fix style - "type" is the first item of SSE data

  • Explicitly check "instructions" from response_body

  • Make lambdas static

  • Check if reasoning content exists

  • Add oai_resp_id to task_result_state (also initialized in the ctor), server_task_result_cmpl_partial, and server_task_result_cmpl_final

  • Reject input_file since it is not supported by chatcmpl

  • Add "fc_" prefix to non-straming function call id as coderabbit pointed out

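Similarly, the streaming-related items (matching IDs on output_item.added/.done, "type" as the first key of the SSE data, and the "fc_" prefix on function call IDs) can be pictured with the sketch below. The event names and payload shapes are simplified assumptions drawn from the commit messages, not a specification of the server's actual output.

```python
import json

def sse_event(payload: dict) -> str:
    # Keep "type" as the first key of the SSE data object.
    ordered = {"type": payload["type"],
               **{k: v for k, v in payload.items() if k != "type"}}
    return "data: " + json.dumps(ordered) + "\n\n"

# Function call items carry an "fc_" prefix on their ID, and the .added and
# .done events for one output item share the same ID.
item_id = "fc_0123456789"
added = sse_event({
    "type": "response.output_item.added",
    "item": {"id": item_id, "type": "function_call",
             "name": "get_weather", "arguments": ""},
})
done = sse_event({
    "type": "response.output_item.done",
    "item": {"id": item_id, "type": "function_call",
             "name": "get_weather", "arguments": "{\"location\": \"Paris\"}"},
})
```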

Co-authored-by: openingnow <>

Prebuilt binaries are available for macOS/iOS, Linux, Windows, and openEuler.
