ggml-org/llama.cpp b7644

server : add thinking content blocks to Anthropic Messages API (#18551)

  • server : add thinking content blocks to Anthropic Messages API

Add support for returning reasoning/thinking content in Anthropic API
responses when the server runs with --reasoning-format deepseek and the
request enables the thinking parameter.

  • Non-streaming: adds a thinking block before the text block in the content array
  • Streaming: emits thinking_delta events with the correct block indices
  • Partial streaming: tracks reasoning state across chunks via the
    anthropic_has_reasoning member variable

Tested with the bartowski/DeepSeek-R1-Distill-Qwen-7B-GGUF model.
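
A minimal request sketch follows, assuming the server was started with
--reasoning-format deepseek, listens on localhost:8080, and exposes the
Anthropic-compatible endpoint at /v1/messages; the "thinking" request options
mirror the upstream Anthropic Messages API, and the model name and token
budget are placeholders.

```python
# Hedged sketch: endpoint path, port, model name, and thinking options are
# assumptions, not taken from the llama.cpp documentation.
import requests

resp = requests.post(
    "http://localhost:8080/v1/messages",
    json={
        "model": "DeepSeek-R1-Distill-Qwen-7B",
        "max_tokens": 512,
        "thinking": {"type": "enabled", "budget_tokens": 256},
        "messages": [{"role": "user", "content": "What is 17 * 23?"}],
    },
    timeout=120,
)
resp.raise_for_status()

# With reasoning enabled, the content array should start with a "thinking"
# block, followed by the regular "text" block.
for block in resp.json()["content"]:
    if block["type"] == "thinking":
        print("[thinking]", block["thinking"])
    elif block["type"] == "text":
        print("[text]", block["text"])
```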

  • server : fix Anthropic API streaming for thinking content blocks

Add signature field and fix duplicate content_block_start events in
Anthropic Messages API streaming responses for reasoning models.
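
A sketch of consuming that stream, assuming the server follows the upstream
Anthropic SSE event protocol (content_block_start, content_block_delta,
content_block_stop, message_stop) and that deltas for the thinking block
arrive as thinking_delta / signature_delta; endpoint, port, and model name
are placeholders.

```python
import json
import requests

with requests.post(
    "http://localhost:8080/v1/messages",
    json={
        "model": "DeepSeek-R1-Distill-Qwen-7B",
        "max_tokens": 512,
        "stream": True,
        "thinking": {"type": "enabled", "budget_tokens": 256},
        "messages": [{"role": "user", "content": "Why is the sky blue?"}],
    },
    stream=True,
    timeout=120,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue
        event = json.loads(line[len("data: "):])
        etype = event.get("type")
        if etype == "content_block_start":
            # One start event per block index; the duplicate-start bug this
            # commit fixes would have emitted this twice for the same index.
            print(f"\n-- block {event['index']}: {event['content_block']['type']} --")
        elif etype == "content_block_delta":
            delta = event["delta"]
            if delta["type"] == "thinking_delta":
                print(delta["thinking"], end="", flush=True)
            elif delta["type"] == "signature_delta":
                pass  # signature for the thinking block; ignored here
            elif delta["type"] == "text_delta":
                print(delta["text"], end="", flush=True)
        elif etype == "message_stop":
            print()
            break
```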

  • server: refactor Anthropic streaming state to avoid raw pointer

Replace raw pointer to task_result_state with direct field copies:

  • Copy state fields in update() before processing the chunk
  • Use local copies in to_json_anthropic() instead of dereferencing the pointer
  • Pre-compute state updates for the next chunk in update()

This makes the data flow clearer and avoids unsafe pointer patterns.
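
The pattern can be illustrated with a small conceptual sketch; it is written
in Python for brevity, the actual server code is C++, and the names
TaskResultState, StreamChunk, update(), and to_json() are stand-ins rather
than the real identifiers. The point is the data flow: update() snapshots
everything the chunk will need into plain fields, so serialization never
reads (or dangles on) the shared state.

```python
from dataclasses import dataclass

@dataclass
class TaskResultState:
    """Shared, mutable per-task streaming state (hypothetical stand-in)."""
    anthropic_has_reasoning: bool = False
    content_block_index: int = 0

@dataclass
class StreamChunk:
    delta_text: str
    is_reasoning: bool
    # Local copies filled in by update(); to_json() reads only these.
    has_reasoning: bool = False
    block_index: int = 0

    def update(self, state: TaskResultState) -> None:
        # Advance the shared state for this chunk: leaving the reasoning
        # phase closes the thinking block and moves to the next block index.
        if state.anthropic_has_reasoning and not self.is_reasoning:
            state.anthropic_has_reasoning = False
            state.content_block_index += 1
        elif self.is_reasoning:
            state.anthropic_has_reasoning = True
        # Copy the fields this chunk needs instead of keeping a pointer to
        # `state`, which may change before serialization runs.
        self.has_reasoning = state.anthropic_has_reasoning
        self.block_index = state.content_block_index

    def to_json(self) -> dict:
        # Works entirely from the local copies taken in update().
        if self.has_reasoning:
            delta = {"type": "thinking_delta", "thinking": self.delta_text}
        else:
            delta = {"type": "text_delta", "text": self.delta_text}
        return {
            "type": "content_block_delta",
            "index": self.block_index,
            "delta": delta,
        }
```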

Prebuilt binaries are attached to the release for macOS/iOS, Linux, Windows, and openEuler (download links omitted here).
