Patch Changes
-
#1667
919bfaa Thanks @threepointone! - fix(think): make the parallel-tool auto-continuation barrier event-driven (#1650, follow-up to #1649)

#1649 added a barrier so auto-continuation waits for all of a step's parallel client-tool results before firing, but bounded the wait with a fixed 60s timeout that fired through on expiry. That timeout was the wrong primary mechanism: a human-in-the-loop tool with no execute (an ask_user/display_ui-style prompt) emitted in parallel with a fast tool legitimately parks at input-available for minutes, so the barrier would fire through and repair the still-open tool to errored while the user was answering. Orphans (a client disconnecting mid-batch) also pinned the isolate alive via keepAlive for the full 60s.

Auto-continuation is only ever triggered by a tool-result/approval event, so the barrier is now purely event-driven. When the coalesce timer fires on an incomplete batch, Think drains the in-flight applies, re-checks, and, if a sibling is still unanswered, returns without firing and without holding the isolate, leaving the pending continuation in place. The next sibling's result re-arms the timer (or, after eviction, re-creates the pending state from the persisted transcript) and re-runs the check; the continuation fires exactly once when the final sibling lands. A legitimately slow human answer never fires through to a spurious error, a true orphan never auto-continues and never pins the isolate, and the case is self-healing across hibernation. This removes AUTO_CONTINUATION_PENDING_TOOL_TIMEOUT_MS (and its console.warn) from the Think path entirely.

Because the barrier now keys off events rather than polling message state, it also handles the case where the result that completes a parallel batch is an errored one: the client sends autoContinue: false for an errored tool result, so that event no longer schedules a continuation. When a sibling has already opted in (a pending continuation exists), such a result now re-arms the barrier check so the batch still continues exactly once, without ever creating a continuation for a standalone errored tool.

Crucially, this also fixes #1649's headline race (MissingToolResultsError): the model emits parallel tool calls sequentially within one step, so a fast client tool can resolve and round-trip a result to the server while the model is still streaming the slower siblings. At that point those siblings exist nowhere (not in the persisted transcript, not in the in-flight accumulator), so no batch check can see them and the barrier fires prematurely. The continuation then repairs the later-materialized siblings to errored. The barrier now holds while the assistant turn is streaming (_streamingAssistant != null) and re-checks when the stream finalizes (_onStreamingTurnFinalized), which also covers the all-fast batch whose every result landed mid-stream, where there is no later tool-result event to re-arm it.

@cloudflare/ai-chat keeps the bounded-wait barrier for now (its barrier runs inside the queued continuation turn and can't return-and-wait without occupying the chat-turn queue); making it event-driven requires moving the batch gate before queueing, tracked alongside the think↔ai-chat unification (#1642).

-
#1671
ebd0bf2 Thanks @threepointone! - fix(think): don't re-arm the auto-continuation barrier when an RPC stall routes into bounded recovery (#1667 follow-up)

The RPC streaming path (_streamResultToRpcCallback) re-armed the auto-continuation coalesce timer in its finally even on the stream-stall recovery early-returns (scheduled/exhausted), unlike the WebSocket _streamResult recovery paths, which deliberately do a plain _streamingAssistant = null without re-arming. When a parallel tool batch had a pending continuation at the moment the stall watchdog fired, that re-arm could fire a second continuation alongside the alarm-scheduled recovery continuation: a spurious double model invocation on the turn queue. The RPC recovery early-returns now mirror the WebSocket path (plain clear, no re-arm); the scheduled recovery continuation re-runs the turn and its own stream finalize re-triggers the held barrier exactly once.
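
The behaviour described in these two entries can be illustrated with a toy model. This is only a sketch under stated assumptions, not Think's implementation: BarrierSketch, FinalizeReason and every method name below are hypothetical, and the coalesce timer is replaced by a synchronous check to keep the example short. It shows the barrier holding while the turn streams, waiting for the next event instead of a timeout, re-checking on an errored result only when a sibling has already opted in, and not re-arming on the stall-recovery path.

```typescript
// Toy model of an event-driven batch barrier. All names are hypothetical and
// are not Think's internals; the real coalesce timer is modelled as a
// synchronous check.
type FinalizeReason = "completed" | "stall-recovery";

class BarrierSketch {
  private expected: string[] = [];      // tool calls emitted in this step
  private answered = new Set<string>(); // tool calls that have a result
  private pending = false;              // a continuation has been requested
  private streaming = false;            // assistant turn is still streaming
  fired = 0;                            // continuations actually started

  beginStream(): void {
    this.streaming = true;
  }

  emitToolCall(id: string): void {
    this.expected.push(id);
  }

  // A result event. Errored results arrive with autoContinue = false: they
  // never create a pending continuation, but they do re-run the check when a
  // sibling has already opted in.
  onToolResult(id: string, autoContinue: boolean): void {
    this.answered.add(id);
    if (autoContinue) this.pending = true;
    if (this.pending) this.check();
  }

  // A normal finalize re-checks the held barrier. The stall-recovery path
  // only clears the streaming flag; the recovery turn's own finalize is what
  // re-triggers the check.
  finalizeStream(reason: FinalizeReason): void {
    this.streaming = false;
    if (reason === "completed") this.check();
  }

  private check(): void {
    if (!this.pending) return;  // nothing has opted in
    if (this.streaming) return; // later siblings may not exist yet: hold
    const complete = this.expected.every((id) => this.answered.has(id));
    if (!complete) return;      // wait for the next event; there is no timeout
    this.pending = false;
    this.fired += 1;            // one continuation per completed batch
  }
}

// Usage: a fast tool answers mid-stream while a human-in-the-loop tool is open.
const barrier = new BarrierSketch();
barrier.beginStream();
barrier.emitToolCall("fast");
barrier.onToolResult("fast", true);     // lands mid-stream: held
barrier.emitToolCall("ask_user");
barrier.finalizeStream("completed");    // ask_user still open: does not fire
barrier.onToolResult("ask_user", true); // final sibling lands: fires
```

In this model an orphaned batch simply never fires, because nothing runs between events; that mirrors the "returns without firing and without holding the isolate" behaviour only loosely, since persistence and eviction are not modelled here.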