github cloudflare/agents @cloudflare/think@0.8.2

7 hours ago

Patch Changes

  • #1667 919bfaa Thanks @threepointone! - fix(think): make the parallel-tool auto-continuation barrier event-driven (#1650, follow-up to #1649)

    #1649 added a barrier so auto-continuation waits for all of a step's parallel client-tool results before firing, but bounded the wait with a fixed 60s timeout that fired through on expiry. That timeout was the wrong primary mechanism: a human-in-the-loop tool with no execute (an ask_user/display_ui-style prompt) emitted in parallel with a fast tool legitimately parks at input-available for minutes, so the barrier would fire through and repair the still-open tool to errored while the user was answering. Orphans (a client disconnecting mid-batch) also pinned the isolate alive via keepAlive for the full 60s.

    Auto-continuation is only ever triggered by a tool-result/approval event, so the barrier is now purely event-driven. When the coalesce timer fires on an incomplete batch, Think drains the in-flight applies, re-checks, and — if a sibling is still unanswered — returns without firing and without holding the isolate, leaving the pending continuation in place. The next sibling's result re-arms the timer (or, after eviction, re-creates the pending state from the persisted transcript) and re-runs the check; the continuation fires exactly once when the final sibling lands. A legitimately slow human answer never fires through to a spurious error, a true orphan never auto-continues and never pins the isolate, and the case is self-healing across hibernation. This removes AUTO_CONTINUATION_PENDING_TOOL_TIMEOUT_MS (and its console.warn) from the Think path entirely.

    Because the barrier now keys off events rather than polling message state, it also handles the case where the result that completes a parallel batch is an errored one: the client sends autoContinue: false for an errored tool result, so that event no longer schedules a continuation. When a sibling has already opted in (a pending continuation exists), such a result now re-arms the barrier check so the batch still continues exactly once — without ever creating a continuation for a standalone errored tool.

    Crucially, this also fixes #1649's headline race (MissingToolResultsError). The model emits parallel tool calls sequentially within one step, so a fast client tool can resolve and round-trip a result to the server while the model is still streaming the slower siblings. At that point those siblings exist nowhere (not in the persisted transcript, not in the in-flight accumulator), so no batch check could see them: the barrier fired prematurely, and the continuation then repaired the later-materialized siblings to errored. The barrier now holds while the assistant turn is streaming (_streamingAssistant != null) and re-checks when the stream finalizes (_onStreamingTurnFinalized). The finalize re-check also covers the all-fast batch whose every result landed mid-stream, where there is no later tool-result event to re-arm it.

    @cloudflare/ai-chat keeps the bounded-wait barrier for now (its barrier runs inside the queued continuation turn and can't return-and-wait without occupying the chat-turn queue); making it event-driven requires moving the batch gate before queueing, tracked alongside the think↔ai-chat unification (#1642).
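
    The barrier rules in this entry can be modeled in a few lines. This is a minimal sketch under stated assumptions, not Think's implementation: BatchBarrierSketch and all of its members are hypothetical names, the coalesce timer is collapsed into a synchronous check, and persistence and hibernation are left out.

    ```typescript
    // Hypothetical model of the event-driven barrier. None of these names are
    // Think's API; the coalesce timer is collapsed into a synchronous check().
    type ToolState = "input-available" | "output-available" | "output-error";

    class BatchBarrierSketch {
      private tools: Record<string, ToolState> = {};
      private pending = false; // a sibling has opted in to auto-continuation
      private streaming = false; // the assistant turn is still streaming
      private continuations = 0; // how many continuations have fired

      // Read through a method so callers always see the current count.
      fired(): number {
        return this.continuations;
      }

      startStream(): void {
        this.streaming = true;
      }

      // A tool call becomes visible only once the model has streamed it.
      emitToolCall(id: string): void {
        this.tools[id] = "input-available";
      }

      // Stream finalized: re-check, which covers the all-fast batch.
      finalizeStream(): void {
        this.streaming = false;
        this.check();
      }

      // A tool-result event. Errored results arrive with autoContinue = false.
      onToolResult(id: string, errored: boolean, autoContinue: boolean): void {
        this.tools[id] = errored ? "output-error" : "output-available";
        if (autoContinue) this.pending = true;
        // Without a pending continuation, an errored result schedules nothing.
        if (this.pending) this.check();
      }

      private check(): void {
        if (!this.pending || this.streaming) return; // hold; a later event re-checks
        const open = Object.keys(this.tools).some(
          (id) => this.tools[id] === "input-available"
        );
        if (open) return; // a sibling is unanswered: return without firing
        this.pending = false;
        this.tools = {};
        this.continuations += 1; // fires once, when the final sibling lands
      }
    }
    ```

    There is no timeout anywhere in the sketch: an unanswered sibling simply leaves the pending flag set until the next event arrives, which is why a slow human answer cannot fire through and an orphaned batch never continues.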

  • #1671 ebd0bf2 Thanks @threepointone! - fix(think): don't re-arm the auto-continuation barrier when an RPC stall routes into bounded recovery (#1667 follow-up)

    The RPC streaming path (_streamResultToRpcCallback) re-armed the auto-continuation coalesce timer in its finally block even on the stream-stall recovery early-returns (scheduled/exhausted). The WebSocket _streamResult recovery paths, by contrast, deliberately do a plain _streamingAssistant = null without re-arming. When a parallel tool batch had a pending continuation at the moment the stall watchdog fired, that re-arm could fire a second continuation alongside the alarm-scheduled recovery continuation, causing a spurious double model invocation on the turn queue. The RPC recovery early-returns now mirror the WebSocket path (plain clear, no re-arm); the scheduled recovery continuation re-runs the turn, and its own stream finalize re-triggers the held barrier exactly once.
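
    The shape of this fix can be illustrated with a try/finally that only re-arms on a normal finalize. This is a hedged sketch: TurnStateSketch, streamTurn, and the StreamOutcome values are hypothetical stand-ins, not the names used inside _streamResultToRpcCallback.

    ```typescript
    // Hypothetical sketch; these names are illustrative, not Think's internals.
    type StreamOutcome = "completed" | "recovery-scheduled" | "recovery-exhausted";

    class TurnStateSketch {
      streamingAssistant: object | null = {}; // non-null while a turn streams
      rearmCount = 0; // times the barrier's coalesce timer was re-armed

      rearmBarrier(): void {
        this.rearmCount += 1;
      }
    }

    function streamTurn(
      state: TurnStateSketch,
      run: () => StreamOutcome
    ): StreamOutcome {
      let rearm = false;
      try {
        const outcome = run();
        // Only a normally finalized stream re-arms the barrier. On the recovery
        // early-returns, the scheduled recovery continuation owns the next step.
        // (A thrown error also skips the re-arm here; that is an assumption of
        // this sketch, not something the release notes state.)
        rearm = outcome === "completed";
        return outcome;
      } finally {
        state.streamingAssistant = null; // always clear the streaming marker
        if (rearm) state.rearmBarrier();
      }
    }
    ```

    Deciding the re-arm before the finally block runs keeps the cleanup unconditional while making the re-arm an explicit choice per exit path, which is the property an unconditional re-arm in finally loses.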
