Details
common : fix Step-3.5-Flash format detection and thinking support (#19635)
- common : fix Step-3.5-Flash format detection and thinking support
Step-3.5-Flash uses the same XML-style tool call format as Qwen3-Coder
(<tool_call><function=...><parameter=...>) but its Jinja template lacks
the bare and plural markers that the detection
logic previously required. This caused it to fall through to Hermes 2
Pro, which doesn't call func_args_not_string(), so arguments stayed as
JSON strings and templates using arguments|items crashed.
Additionally, the Qwen3-Coder-XML format handler had no thinking support.
Models like Step-3.5-Flash that unconditionally emit in their
generation prompt need the same thinking_forced_open handling that
Nemotron v3 and Hermes 2 Pro already have, otherwise reasoning_content
is never separated from content in API responses.
Changes:
- Relax Qwen3-Coder XML detection to only require the 3 shared markers
- Tighten Nemotron v3 branch to also require bare and plural
, preventing Step-3.5-Flash from being misrouted via - Add thinking_forced_open support to Qwen3-Coder-XML init function
- Add / to preserved tokens
- Fix build_grammar_xml_tool_call to handle thinking_forced_open in the
grammar root rule, allowing before tool calls - Add Step-3.5-Flash chat template and format detection test
Builds on: #19283
- chat : route Step-3.5-Flash to Nemotron v3 PEG parser, add tests
Step-3.5-Flash uses the same XML tool call format as Qwen3-Coder and
Nemotron 3 Nano (<tool_call>/<function=...>/<parameter=...>) but with
unconditional output. Route it to the Nemotron v3 PEG parser
for streaming and schema-aware parameter parsing.
Detection: templates with + XML tool tags use Nemotron v3 PEG
parser; templates without (Qwen3-Coder) use GBNF grammar.
Tests cover: basic messages, tool calls with/without thinking content,
parallel tool calls, code string parameters, optional
closing tags, and JSON schema response format.
- chat : remove dead thinking code from qwen3_coder_xml
Remove thinking handling code that became unreachable after routing
Step-3.5-Flash to the Nemotron v3 PEG parser. Qwen3-Coder has no
in its template, so the thinking_forced_open logic, preserved
tokens, and grammar prefix were dead paths.
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: