This alpha introduces a major backwards-compatible refactor. Models can now be prompted with a list of messages, in the style of the OpenAI Chat Completions API, and the response can now be iterated over as a sequence of mixed content types - for example reasoning tokens interleaved with text tokens and tool calls.
For more background on this release, take a look at the annotated release notes on my blog.
Prompt inputs and response outputs are now expressed as a list of `Message` objects, each containing typed `Part` objects (text, reasoning, tool calls, tool results, attachments).
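The shape of that message/part structure can be sketched with stand-in dataclasses. These are illustrative only, not the real `llm` classes, which carry additional fields such as attachments and provider metadata:

```python
from dataclasses import dataclass, field

@dataclass
class Part:
    type: str      # "text", "reasoning", "tool_call", "tool_result", "attachment"
    content: str

@dataclass
class Message:
    role: str      # "user", "assistant", "system", or "tool"
    parts: list[Part] = field(default_factory=list)

# A prompt is a list of messages; a response assembles another such list,
# possibly mixing reasoning parts with text parts and tool calls.
prompt = [
    Message("system", [Part("text", "You are terse.")]),
    Message("user", [Part("text", "What is 2 + 2?")]),
]

assert prompt[1].role == "user"
assert prompt[0].parts[0].type == "text"
```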
The `llm` CLI tool can now display reasoning tokens while executing a prompt.
Plugin authors should read the expanded Advanced model plugins documentation, which now covers `StreamEvent`, consuming `prompt.messages`, and round-tripping opaque provider metadata such as Anthropic extended-thinking signatures and Gemini `thoughtSignature` values.
## Structured messages and streaming events
- New `llm.Message` value type and constructor helpers `llm.user()`, `llm.assistant()`, `llm.system()`, and `llm.tool_message()` for building structured prompt inputs. The helpers accept strings, `Attachment` instances, or nested `Part` lists.
- New `messages=` keyword argument on `model.prompt()`, `conversation.prompt()`, `model.chain()`, `conversation.chain()`, and their async counterparts. The `prompt=`, `system=`, `attachments=`, and `tool_results=` keywords still work and synthesize into the same `Message` list internally.
- New `response.stream_events()` and `response.astream_events()` methods yielding typed `StreamEvent` objects (`type` is one of `"text"`, `"reasoning"`, `"tool_call_name"`, `"tool_call_args"`, `"tool_result"`, plus a `redacted=True` marker for opaque reasoning). Iterating over `response` directly continues to yield only text strings.
- New `response.messages()` method (async: `await response.messages()`) returning the assembled `list[Message]` produced by the model. Calling it forces execution if the response prompt has not yet been executed.
- New `response.reply(prompt=None, **kwargs)` method that continues the conversation from any `Response`, regardless of origin. When the previous response made tool calls and `tool_results=` was not passed, `reply()` automatically executes the pending tool calls and threads the results into the next turn. On async responses `reply()` is awaitable.
- New `response.to_dict()` and `Response.from_dict(data, *, model=None)` for JSON-safe serialization of a full conversation turn: model id, input chain, assembled output (including reasoning parts and provider metadata), options, and audit fields. Reasoning signatures and `thoughtSignature` values round-trip via `provider_metadata`, so multi-turn extended thinking works across process boundaries.
- New `llm/serialization.py` module exposing `MessageDict`, `PartDict`, `ResponseDict`, `PromptDict`, `UsageDict`, `AttachmentDict`, and the per-`Part` TypedDicts. Every `to_dict()`/`from_dict()` method is annotated with the matching TypedDict.
- `Response.prompt.messages` is now the canonical structured input across the entire conversation chain. `Conversation.prompt` and `AsyncConversation.prompt` pre-compute the full chain (prior input + prior output + new turn) before constructing the next `Prompt`, so `response.prompt.messages` is always exactly what the model was sent.
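The event-consumption pattern described above can be illustrated with a minimal stand-in. The `StreamEvent` dataclass below is hypothetical - the real objects come from `response.stream_events()` - but it shows the intended routing: filter by `type`, skip `redacted` reasoning, and accumulate visible text separately:

```python
from dataclasses import dataclass

@dataclass
class StreamEvent:
    type: str               # "text", "reasoning", "tool_call_name", ...
    content: str
    redacted: bool = False  # marks opaque reasoning with no visible text

# Simulated event stream of the kind response.stream_events() would yield.
events = [
    StreamEvent("reasoning", "User wants a sum."),
    StreamEvent("text", "The answer "),
    StreamEvent("text", "is 4."),
]

# Route visible reasoning to one channel (the CLI dims it on stderr)
# and accumulate response text separately.
reasoning = "".join(e.content for e in events if e.type == "reasoning" and not e.redacted)
text = "".join(e.content for e in events if e.type == "text")

print(reasoning)  # -> User wants a sum.
print(text)       # -> The answer is 4.
```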
## CLI
- `llm prompt` and `llm chat` now display visible reasoning text on stderr in a dim style while the response streams.
- New `-R`/`--no-reasoning` flag for `llm prompt` and `llm chat` to suppress the reasoning stream.
- `llm logs` now renders any visible reasoning emitted during a response under a `## Reasoning` heading above the response.
- New `reasoning` column on the `responses` table, populated from the visible-reasoning text.