Patch Changes
-
#1657
7bff8d7 Thanks @threepointone! - fix(think): apply client-tool results that arrive mid-stream so they aren't dropped (#1649 follow-up)

The serialization fix in #1657 stopped parallel results from clobbering each other, but a deeper window remained: during a streaming turn the assistant message lives only in the in-flight `StreamAccumulator` until `_persistAssistantMessage` writes it at the turn boundary. The `tool-input-available` chunk is broadcast to the client mid-stream, so a fast client can resolve the tool and send `cf_agent_tool_result` before the message is ever persisted. `_applyToolUpdateToMessages` only scanned durable storage, so the apply silently no-op'd, the end-of-stream persist then wrote `input-available`, and the auto-continuation's transcript repair errored the call with "The tool call was interrupted before a result was recorded."

`_applyToolUpdateToMessages` now applies the update to the in-flight accumulator (in place, so it rides into the eventual persist) in addition to durable storage, mirroring `@cloudflare/ai-chat`'s `_streamingMessage` handling. The accumulator is exposed via `_streamingAssistant` for the duration of each streaming turn and cleared on every exit path and on `resetTurnState`. Applying to both locations is monotonic, so a stall-recovery partial persist can't downgrade an already-applied result back to `input-available`.
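A minimal sketch of the dual apply described above, under simplified assumptions. The types and the `patchMessage`/`applyToolUpdate` helpers are hypothetical illustrations, not the Think internals, and "monotonic" is interpreted here as "a part that has already left `input-available` is never rewritten".

```typescript
// Hypothetical, simplified shapes; the real message types are richer.
type ToolState = "input-available" | "output-available" | "output-error";

interface ToolPart {
  toolCallId: string;
  state: ToolState;
  output?: unknown;
}

interface AssistantMessage {
  id: string;
  parts: ToolPart[];
}

interface ToolUpdate {
  toolCallId: string;
  state: Exclude<ToolState, "input-available">;
  output?: unknown;
}

// Patch one message in place. Returns true if the message owns the tool call.
// Monotonic (assumption): only a part still at "input-available" is updated.
function patchMessage(message: AssistantMessage, update: ToolUpdate): boolean {
  const part = message.parts.find((p) => p.toolCallId === update.toolCallId);
  if (!part) return false;
  if (part.state === "input-available") {
    part.state = update.state;
    part.output = update.output;
  }
  return true;
}

// Apply to the in-flight message (when a turn is streaming) and to every
// persisted message, so the result survives whichever copy is written last.
function applyToolUpdate(
  persisted: AssistantMessage[],
  streaming: AssistantMessage | undefined,
  update: ToolUpdate
): boolean {
  let applied = false;
  if (streaming) applied = patchMessage(streaming, update) || applied;
  for (const message of persisted) {
    applied = patchMessage(message, update) || applied;
  }
  return applied;
}
```

With this shape, a result that arrives before anything is persisted still lands on the in-flight message, and a later apply for the same call leaves the settled part untouched.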
-
#1665
13d6db0 Thanks @threepointone! - Avoid starting empty submission and workflow notification drains during agent startup, preventing short-lived facet initializations from leaving background keep-alive work behind.
-
#1661
41315b6 Thanks @threepointone! - Unwedge sessions corrupted by a malformed `tool_use.input`, and make the failure observable.
-
Read-side repair gap. Transcript repair already normalized a `null`/`undefined`/stringified-JSON tool input, but left an empty string `""`, an array, and other non-object primitives untouched, so a session that persisted one of those shapes before the write-side guard shipped kept 400ing forever with `tool_use.input: Input should be an object` (Anthropic rejects array inputs the same way it rejects `""`/`null`). `_normalizeToolInput` now delegates to the shared `normalizeToolInput`, collapsing any non-object input to `{}` so the pre-send repair pass rescues the session on its next turn.
-
Observability. An AI-SDK provider error surfaces as a stream error part, not a thrown exception, so it took the in-band `error` branch that emitted `message:error` but never `chat:request:failed`. That branch now also emits `chat:request:failed` (`stage: "stream"`), so observers and turn-count telemetry see the post-`beforeTurn`, in-stream failure class without needing to know whether the error threw or arrived as a chunk.
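The read-side repair above can be sketched roughly as follows. `normalizeToolInputSketch` is a hypothetical stand-in for the shared `normalizeToolInput`, whose real implementation is not shown in these notes. Parsing stringified JSON before falling back to `{}` is an assumption based on the description.

```typescript
type ToolInput = Record<string, unknown>;

// True for non-null, non-array objects: the only shape accepted as tool input.
function isObjectRecord(value: unknown): value is ToolInput {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hypothetical stand-in for the shared helper (not the actual implementation).
function normalizeToolInputSketch(input: unknown): ToolInput {
  if (isObjectRecord(input)) return input;
  if (typeof input === "string") {
    try {
      // Assumption: stringified JSON is recovered when it decodes to an object.
      const parsed: unknown = JSON.parse(input);
      if (isObjectRecord(parsed)) return parsed;
    } catch {
      // Not JSON (this includes the empty string): fall through to {}.
    }
  }
  // null, undefined, "", arrays, numbers, booleans all collapse to {}.
  return {};
}
```

Collapsing to `{}` loses the malformed value, but it yields a request the provider accepts, so the session can continue instead of failing on every turn.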
-
-
#1657
7bff8d7 Thanks @threepointone! - fix(think): serialize parallel client-tool result/approval applies so siblings aren't clobbered (#1649 follow-up)

The auto-continuation barrier added in #1651 stopped premature continuation, but a deeper race remained in Think. Each `tool-result`/`tool-approval` WebSocket message fired an independent read-modify-write of the whole assistant message, and `_applyToolUpdateToMessages` awaits a storage read before its write. When the model fanned out parallel tool calls, the concurrent applies all read the same `input-available` snapshot, each patched only its own part, and the last write clobbered its siblings back to `input-available`. The continuation barrier then timed out and the transcript-repair backstop errored the lost calls with "The tool call was interrupted before a result was recorded."

Applies are now chained off a serialization tail so each read-modify-write commits atomically in arrival order. `_pendingInteractionPromise` still tracks the newest link, so the barrier's single-slot wake-up transitively waits for every predecessor.

The same serialization is applied to `@cloudflare/ai-chat` defensively: its apply is currently synchronous (no await between the message read and the SQLite write), so it does not exhibit this clobber today, but the queue keeps the invariant safe if that ever changes.
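The serialization tail can be illustrated with a small promise chain. This is a sketch with hypothetical names (`SerializedApplier`, `ToyStore`, `applyResult`), not the Think implementation. The toy store only imitates an asynchronous read followed by a whole-message write.

```typescript
// Hypothetical sketch of a serialization tail.
class SerializedApplier {
  private tail: Promise<void> = Promise.resolve();

  // Chain `task` behind everything queued so far. The returned promise
  // settles with this task, and therefore only after all predecessors ran.
  enqueue(task: () => Promise<void>): Promise<void> {
    const run = this.tail.then(task);
    // Keep the chain usable even if this task rejects.
    this.tail = run.catch(() => undefined);
    return run;
  }
}

// Toy store with an asynchronous read, standing in for a storage read
// that happens before the write.
class ToyStore {
  private parts: Record<string, string> = {
    a: "input-available",
    b: "input-available"
  };

  async read(): Promise<Record<string, string>> {
    await Promise.resolve(); // yield, so concurrent callers can interleave
    return { ...this.parts }; // snapshot copy
  }

  write(next: Record<string, string>): void {
    this.parts = next;
  }

  snapshot(): Record<string, string> {
    return { ...this.parts };
  }
}

// Whole-message read-modify-write: patch one part, write everything back.
async function applyResult(store: ToyStore, id: string): Promise<void> {
  const message = await store.read();
  message[id] = "output-available";
  store.write(message);
}
```

Calling `applyResult` for `a` and `b` concurrently lets both read the same snapshot, so the later write drops the earlier result. Routing both through `enqueue` runs them in arrival order, and awaiting only the promise returned by the newest `enqueue` call is enough to wait for every earlier one.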
-
#1659
f99f890 Thanks @threepointone! - Fix two chat-recovery failures that could leave a turn wedged at a half-finished assistant message after a deploy/eviction, with no terminal banner.
-
Server-tool recovery deadlock. When a server-side tool's `execute()` was interrupted by an eviction, the recovered turn's orphaned tool part was left at `input-available`, but no client `tool-result` will ever arrive for a server tool, so `waitUntilStable` could never converge. The recovery continuation burned its whole attempt budget on a wait that could not succeed. `waitUntilStable` now treats an `input-available` part as pending only when it is genuinely client-resolvable (a registered client tool whose result the SPA can replay, or an `approval-requested` part). A dead server-tool orphan no longer blocks stability, so recovery converges and the existing transcript-repair pass flips the orphan to an errored result and the model continues the turn.
-
Silent seal on a thrown recovery callback. A non-reset error thrown by `_chatRecoveryContinue`/`_chatRecoveryRetry` was re-thrown and then swallowed by the scheduler, which deleted the one-shot recovery alarm row, terminating the turn with no `onExhausted` event and no terminal banner. The recovery callbacks now terminalize a non-reset throw through the same exhaustion path (firing `onExhausted` with reason `recovery_error` and delivering the `terminalMessage`), while still re-throwing a genuine Durable Object code-update reset so the platform re-runs recovery on the fresh isolate. The terminal banner is also now broadcast before the bookkeeping storage writes in the exhaustion path, and those writes are best-effort, so a storage failure during give-up can no longer suppress the user-visible terminalization.
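The stability rule from the server-tool deadlock fix can be sketched as a predicate. The names `isClientPending` and `isStable` and the part shape are hypothetical. The sketch assumes the set of registered client tool names is known when stability is evaluated.

```typescript
type PartState =
  | "input-available"
  | "approval-requested"
  | "output-available"
  | "output-error";

// Hypothetical, simplified tool part.
interface ToolPartLike {
  toolName: string;
  state: PartState;
}

// A part is pending only if the client can still resolve it.
function isClientPending(
  part: ToolPartLike,
  clientTools: ReadonlySet<string>
): boolean {
  if (part.state === "approval-requested") return true;
  if (part.state === "input-available") {
    // A server-tool orphan has no client that could ever answer it.
    return clientTools.has(part.toolName);
  }
  return false;
}

// The turn is stable once no part is waiting on the client.
function isStable(
  parts: ToolPartLike[],
  clientTools: ReadonlySet<string>
): boolean {
  return !parts.some((part) => isClientPending(part, clientTools));
}
```

Under this rule an orphaned server-tool part at `input-available` no longer holds the wait open, which leaves it for the transcript-repair pass to mark as errored.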
-