ggml-org/llama.cpp b7793


server: /v1/responses (partial) (#18486)

  • Carry over changes from the previous PR

  • Make the instructions (system) field the first message (see the request-conversion sketch after this list)

  • Convert [input_message] (text/image/file)

  • Rename convert_responses_to_chatcmpl(body) -> response_body

  • Initial tool call support

  • Erase instructions field from chatcmpl body

  • Feed reasoning texts to chat template

  • Use std::vector instead of opaque json array

  • Make output_item.added events consistent

  • Move server_task_result_cmpl_partial::update from header to source

  • Match ID of output_item.added and .done events

  • Add function_call only if there is no "fc_" prefix

  • Add function call output to the non-streaming API

  • Test if ID is persistent

  • Add doc

  • Fix style - use trailing comma

  • Rewrite state management

  • Catch up with upstream/master

  • Fix style - "type" is the first item of SSE data

  • Explicitly check "instructions" from response_body

  • Make lambdas static

  • Check if reasoning content exists

  • Add oai_resp_id to task_result_state (also initialized in the ctor), server_task_result_cmpl_partial, and server_task_result_cmpl_final

  • Reject input_file since it is not supported by chatcmpl

  • Add "fc_" prefix to non-straming function call id as coderabbit pointed out

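Similarly, the streaming-related items (matching IDs on output_item.added/.done, "type" as the first key of the SSE data, and the "fc_" prefix on function call IDs) can be pictured with the sketch below. The event names and payload shapes are simplified assumptions drawn from the commit messages, not a specification of the server's actual output.

```python
import json

def sse_event(payload: dict) -> str:
    # Keep "type" as the first key of the SSE data object.
    ordered = {"type": payload["type"],
               **{k: v for k, v in payload.items() if k != "type"}}
    return "data: " + json.dumps(ordered) + "\n\n"

# Function call items carry an "fc_" prefix on their ID, and the .added and
# .done events for one output item share the same ID.
item_id = "fc_0123456789"
added = sse_event({
    "type": "response.output_item.added",
    "item": {"id": item_id, "type": "function_call",
             "name": "get_weather", "arguments": ""},
})
done = sse_event({
    "type": "response.output_item.done",
    "item": {"id": item_id, "type": "function_call",
             "name": "get_weather", "arguments": "{\"location\": \"Paris\"}"},
})
```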

Co-authored-by: openingnow <>

Prebuilt binaries are available for macOS/iOS, Linux, Windows, and openEuler.
