Support for the OpenAI Responses API
Most reasoning-capable OpenAI models now use the `/v1/responses` endpoint instead of `/v1/chat/completions`. This enables interleaved reasoning across tool calls for GPT-5 class models. #1435
- New `Responses` and `AsyncResponses` model classes driving the OpenAI Responses API. The existing `Chat` and `AsyncChat` classes are unchanged, so other plugins that import them keep working.
- The following models now use the Responses API by default: `o1`, `o3-mini`, `o3`, `o4-mini`, `gpt-5`, `gpt-5-mini`, `gpt-5-nano`, `gpt-5.1`, `gpt-5.2`, `gpt-5.4`, `gpt-5.4-mini`, `gpt-5.4-nano`, `gpt-5.5` (and their pinned date variants).
- Use `-o chat_completions 1` to fall back to the older `/v1/chat/completions` code path for any of these models.
- Encrypted reasoning items are captured as `provider_metadata` on `ReasoningPart` objects and round-tripped back to OpenAI on subsequent turns.
- Reasoning summaries are now requested with `"summary": "auto"`, so visible reasoning text is streamed back where the model produces it, unless `--hide-reasoning` or `hide_reasoning=True` is set. This means OpenAI prompts run using `llm prompt` that return reasoning tokens will display those tokens on standard error.
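The round-tripping of encrypted reasoning items can be sketched with a stand-in dataclass. `ReasoningPart` and `provider_metadata` are the names from these notes; the field layout, the `encrypted_content` key, and the JSON persistence step are illustrative assumptions, not the library's actual internals:

```python
import json
from dataclasses import dataclass, field


@dataclass
class ReasoningPart:
    # Visible reasoning summary text, if the provider returned one
    summary: str = ""
    # Opaque provider payload (e.g. OpenAI's encrypted reasoning item),
    # stored verbatim so it can be sent back on the next turn
    provider_metadata: dict = field(default_factory=dict)


# First turn: capture the encrypted item from the provider response
part = ReasoningPart(
    summary="Considered the question",
    provider_metadata={"encrypted_content": "gAAAA-opaque-blob"},
)

# Persist the conversation (e.g. to a log) and load it back
restored = ReasoningPart(**json.loads(json.dumps(part.__dict__)))

# Second turn: the opaque payload survives the round trip unchanged
assert restored.provider_metadata == part.provider_metadata
```

The point of treating the payload as opaque is that only OpenAI can decrypt it; the client's job is merely to store it losslessly and echo it back.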
CLI
- New `llm -m model --options` flag to list the options supported by a given model. #1441
- The `-R/--no-reasoning` option has been renamed to `-R/--hide-reasoning`.
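Listing a model's options amounts to walking the fields of its options schema. A minimal sketch of that idea, using a plain dataclass as a stand-in (the `Options` class and its fields here are hypothetical, not a real model's schema):

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Options:
    # Illustrative option fields; each real model defines its own schema
    temperature: Optional[float] = None
    max_tokens: Optional[int] = None


# Walk the schema fields to produce an options listing
listing = [(f.name, f.type) for f in fields(Options)]
for name, annotation in listing:
    print(f"{name}: {annotation}")
```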
Python API
- New `hide_reasoning=True` keyword argument on `model.prompt()`, `conversation.prompt()`, `model.chain()`, `conversation.chain()`, and their async counterparts, exposed to model plugins as `prompt.hide_reasoning`. Model plugins can use this to decide whether to request visible reasoning summaries from their providers. #1442
- New `options=` dict keyword argument on `Model.prompt()`, `Conversation.prompt()`, `Response.reply()`, and their async equivalents, matching the pattern already used by `.chain()`. The previous `**kwargs` form continues to work for backwards compatibility, but it is no longer documented and will be removed in the future. #1432
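The two calling conventions for options can be illustrated with a tiny stand-in class. This is a sketch of the pattern only, not `llm`'s implementation; the `Model.prompt()` body below is hypothetical:

```python
class Model:
    """Stand-in illustrating options= alongside legacy **kwargs."""

    def prompt(self, text, options=None, **kwargs):
        # New, documented style: options passed as an explicit dict
        merged = dict(options or {})
        # Old style: bare keyword arguments, kept for backwards
        # compatibility but slated for removal
        merged.update(kwargs)
        return {"prompt": text, "options": merged}


model = Model()
# New form
new_style = model.prompt("Tell a joke", options={"temperature": 0.5})
# Previous **kwargs form still works and produces the same result
old_style = model.prompt("Tell a joke", temperature=0.5)
assert new_style == old_style
```

Routing both spellings into one merged dict is what lets the deprecation happen later without breaking existing callers today.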
Bug fixes
- `add_tool_call()` calls that were not also recorded as stream events are now correctly emitted as `ToolCallPart` objects when assembling response parts, so they survive serialization via `response.to_dict()`. #1433
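The shape of this fix can be sketched with a minimal stand-in. `ToolCallPart`, `add_tool_call()`, and `to_dict()` are names from these notes; everything else (the class bodies, the merge logic) is illustrative, not the library's actual code:

```python
from dataclasses import dataclass


@dataclass
class ToolCallPart:
    name: str
    arguments: dict


class Response:
    def __init__(self):
        self.stream_parts = []  # parts seen as stream events
        self.tool_calls = []    # calls recorded via add_tool_call()

    def add_tool_call(self, name, arguments):
        self.tool_calls.append(ToolCallPart(name, arguments))

    def parts(self):
        # The fix: also emit tool calls recorded via add_tool_call()
        # that never appeared as stream events
        assembled = list(self.stream_parts)
        for call in self.tool_calls:
            if call not in assembled:
                assembled.append(call)
        return assembled

    def to_dict(self):
        return {"parts": [p.__dict__ for p in self.parts()]}


r = Response()
r.add_tool_call("search", {"q": "weather"})
# The tool call survives serialization even with no stream event
assert r.to_dict() == {"parts": [{"name": "search", "arguments": {"q": "weather"}}]}
```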