### Patch Changes

- #451 `2a62e23` Thanks @mchenco! - Fix reasoning content being concatenated into assistant message content in multi-turn conversations

  Previously, reasoning parts in assistant messages were concatenated into the `content` string when building message history. This caused models like `kimi-k2.5` and `deepseek-r1` to receive their own internal reasoning as if it were spoken text, corrupting the conversation history and resulting in empty text responses or leaked special tokens on subsequent turns.

  Reasoning parts are now sent as the `reasoning` field on the assistant message object, which is the field name vLLM expects on input for reasoning models (`kimi-k2.5`, `glm-4.7-flash`).
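  The change can be sketched as follows (hypothetical type and function names; the actual provider code differs): when serializing assistant history, reasoning parts are collected into a separate `reasoning` field rather than joined into `content`.

  ```typescript
  // Hypothetical part and message shapes for illustration only;
  // the real provider types differ.
  type Part =
    | { type: "text"; text: string }
    | { type: "reasoning"; text: string };

  interface AssistantMessage {
    role: "assistant";
    content: string;
    reasoning?: string; // field vLLM reads on input for reasoning models
  }

  // Build the wire-format assistant message from response parts.
  // Before the fix, reasoning text was concatenated into `content`;
  // after the fix, it is carried separately on `reasoning`.
  function toAssistantMessage(parts: Part[]): AssistantMessage {
    const text = parts
      .filter((p) => p.type === "text")
      .map((p) => p.text)
      .join("");
    const reasoning = parts
      .filter((p) => p.type === "reasoning")
      .map((p) => p.text)
      .join("");
    return {
      role: "assistant",
      content: text,
      ...(reasoning ? { reasoning } : {}),
    };
  }
  ```

  With this shape, a subsequent turn sends only the spoken `content` as conversational text, so the model never sees its own chain of thought replayed as part of the dialogue.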