github cloudflare/agents agents@0.14.2

latest release: @cloudflare/ai-chat@0.8.2
6 hours ago

Patch Changes

  • #1684 ab6dd95 Thanks @threepointone! - warn when chatRecovery is configured in onStart() (applied too late for wake recovery)

    On every Durable Object wake, the SDK evaluates chat-recovery budgets, and may seal an interrupted turn and fire onExhausted, before the user's onStart() runs (_checkRunFibers() is ordered ahead of onStart()). A chatRecovery config produced inside onStart() is therefore not yet in place at the moment recovery decides, and the built-in defaults are used instead, so a configured maxRecoveryWork / shouldKeepRecovering / onExhausted silently never applies to the recovery that matters.

    This is now documented on ChatRecoveryConfig and the chatRecovery fields of Think / AIChatAgent, and the SDK logs a one-time warning if it detects chatRecovery being reassigned during onStart(). The warning fires both for a custom config object and for chatRecovery = true (enabling recovery / its defaults too late); assigning false (disabling) in onStart() is intentionally not warned, since recovery already ran with the pre-onStart() value and disabling it afterward is a benign no-op for that wake. The fix is to assign chatRecovery as a class field or in the constructor.
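    The ordering problem can be illustrated with a small simulation. This is only a sketch: FakeAgent, wake(), DEFAULTS, and budgetSeenByRecovery are hypothetical stand-ins invented for illustration, not the SDK's classes. The only thing taken from the note above is the order (recovery check first, onStart() second).

```typescript
// Hypothetical stand-ins for illustration only; none of this is the real SDK.
type RecoveryConfig = { maxRecoveryWork: number };

const DEFAULTS: RecoveryConfig = { maxRecoveryWork: Infinity };

class FakeAgent {
  chatRecovery: RecoveryConfig | boolean = true;
  // Records the budget that the recovery check actually used on this wake.
  budgetSeenByRecovery: number | undefined;

  // Mirrors the described ordering: recovery check first, onStart() second.
  wake(): void {
    this.checkRunFibers();
    this.onStart();
  }

  private checkRunFibers(): void {
    const cfg =
      typeof this.chatRecovery === "object" ? this.chatRecovery : DEFAULTS;
    this.budgetSeenByRecovery = cfg.maxRecoveryWork;
  }

  onStart(): void {}
}

// Assigns the config in onStart(): recovery has already decided by then.
class TooLate extends FakeAgent {
  onStart(): void {
    this.chatRecovery = { maxRecoveryWork: 500 };
  }
}

// Assigns the config as a class field: in place before the first wake.
class OnTime extends FakeAgent {
  chatRecovery: RecoveryConfig | boolean = { maxRecoveryWork: 500 };
}

const late = new TooLate();
late.wake();

const onTime = new OnTime();
onTime.wake();
```

    In this model, late.budgetSeenByRecovery is the default (Infinity) while onTime.budgetSeenByRecovery is the configured 500, which is the difference the new warning is meant to surface.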

  • #1672 f96a2ba Thanks @threepointone! - fix(chat-recovery): a turn making forward progress now survives unbounded deploy churn; add a work budget + shouldKeepRecovering runaway guard

    Durable chat recovery used to bound a single incident with a non-resetting 15-minute wall-clock ceiling (CHAT_RECOVERY_MAX_WINDOW_MS). That ceiling was overloaded — it served as both a recovery-duration bound and a runaway-loop guard — and it terminated healthy, actively-progressing turns that simply took longer than 15 minutes of wall-clock to finish while being repeatedly interrupted by a dense deploy window, sealing them with reason="max_recovery_window_exceeded" and discarding completed work.

    The two jobs are now decoupled (see design/rfc-chat-recovery-work-budget.md):

    • Duration is no longer a bound for a progressing turn. The non-resetting wall-clock ceiling is removed. A turn that keeps producing content survives unbounded deploy churn. Stuck turns are still sealed by the no-progress window (5 min, resets on progress); tight no-progress alarm loops by the attempt cap.
    • New runaway-loop guard, keyed to work, not time. The existing durable, monotonic, reconnect-immune progress counter is reused as a work meter. chatRecovery.maxRecoveryWork caps the produced content/tool units since an incident opened; exceeding it seals with reason="work_budget_exceeded". Defaults to Infinity — the SDK ships the mechanism but imposes no implicit cap, so it never terminates a progressing turn on its own.
    • New caller predicate. chatRecovery.shouldKeepRecovering(ctx) is consulted per recovery attempt from the second onward (only when no hard bound has already sealed the incident); returning false seals with reason="recovery_aborted". This is where integrators express token/cost/step budgets the SDK should not hardcode. A throwing predicate is logged and treated as "keep recovering".
    • The no-progress timeout is now configurable. chatRecovery.noProgressTimeoutMs (default 5 min, resets on progress) is the primary stuck-turn bound, now overridable per agent instead of a hardcoded constant.

    New public types from agents/chat: ChatRecoveryProgressContext. New ChatRecoveryConfig fields: maxRecoveryWork, shouldKeepRecovering, noProgressTimeoutMs. ChatRecoveryExhaustedContext.reason gains work_budget_exceeded and recovery_aborted; max_recovery_window_exceeded is retained as an open-string value but is no longer emitted.
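    A rough model of how the three bounds could interact on a single recovery attempt is sketched below. It is an assumption-laden illustration, not the SDK's engine: the ProgressCtx fields, the decide() function, and the "no_progress" label are hypothetical. Only the config field names, the two new reason strings, the 5-minute and Infinity defaults, the "second attempt onward" rule, and the "throwing predicate means keep recovering" rule come from the notes above.

```typescript
// Hypothetical context shape; the real ChatRecoveryProgressContext may differ.
type ProgressCtx = {
  attempt: number; // 1-based recovery attempt
  workUnits: number; // content/tool units produced since the incident opened
  msSinceProgress: number; // time since the progress counter last advanced
};

type Config = {
  maxRecoveryWork?: number;
  noProgressTimeoutMs?: number;
  shouldKeepRecovering?: (ctx: ProgressCtx) => boolean;
};

// "no_progress" is an assumed label; the other two reasons are from the notes.
type Decision =
  | { keep: true }
  | {
      keep: false;
      reason: "no_progress" | "work_budget_exceeded" | "recovery_aborted";
    };

function decide(cfg: Config, ctx: ProgressCtx): Decision {
  const timeout = cfg.noProgressTimeoutMs ?? 5 * 60 * 1000;
  const budget = cfg.maxRecoveryWork ?? Infinity;

  // Hard bounds are checked first.
  if (ctx.msSinceProgress >= timeout) {
    return { keep: false, reason: "no_progress" };
  }
  if (ctx.workUnits > budget) {
    return { keep: false, reason: "work_budget_exceeded" };
  }

  // The caller predicate is consulted from the second attempt onward.
  if (ctx.attempt >= 2 && cfg.shouldKeepRecovering) {
    try {
      if (cfg.shouldKeepRecovering(ctx) === false) {
        return { keep: false, reason: "recovery_aborted" };
      }
    } catch (err) {
      // A throwing predicate is logged and treated as "keep recovering".
      console.warn("shouldKeepRecovering threw; continuing recovery", err);
    }
  }
  return { keep: true };
}
```

    With an empty config, decide() never seals a turn that keeps making progress, however much work it has produced, which matches the "no implicit cap" default described above.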

    Both @cloudflare/ai-chat and @cloudflare/think (which carries its own copy of the recovery engine) are updated identically. Defaults are unchanged except that a progressing turn is no longer terminated by wall-clock age.

  • #1668 d40cc8a Thanks @ghostwriternr! - Fix RPC resource leaks in workflows.

    Workflows that use waitForApproval() or ThinkWorkflow.prompt() now release their RPC stubs promptly, preventing resource leaks and the associated "RPC stub was not disposed" warnings in your logs.

  • #1679 c8d1d32 Thanks @threepointone! - fix(sub-agents): a facet sub-agent no longer touches the root DO's WebSockets, fixing a production-only "Cannot perform I/O on behalf of a different Durable Object (Native)" crash (#1677)

    A sub-agent (facet) that called setState(), broadcast(), or otherwise enumerated connections — directly or indirectly via the internal _broadcastProtocol() — could crash in production with Cannot perform I/O on behalf of a different Durable Object. ... (I/O type: Native). It reproduced when the root Agent held a live (hibernatable) WebSocket connection and the child facet was freshly bootstrapped; it never reproduced in wrangler dev/miniflare, which made it hard to catch.

    Root cause: the Agent overrides of getConnections() and getConnection() fell through to super.getConnections() / super.getConnection() for facets too. On a facet, that resolves to the host/root DO's hibernatable WebSockets, and reading their attachments from the facet's I/O context is a cross-DO native I/O access that workerd aborts. setState() tripped it only incidentally, because _broadcastProtocol() enumerates connections to compute its exclude list before sending anything.

    Fix: a facet's client connections are all virtual (real sockets owned by the root and bridged in), so getConnections()/getConnection() now return only the facet's virtual sub-agent connections and never fall through to the host DO's sockets. Delivery of facet state updates to clients connected directly to the sub-agent is unchanged.
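    The shape of the fix can be sketched with stand-in classes. Everything here is hypothetical (Conn, HostSockets, SketchAgent, and the reads counter exist only for illustration). The sketch shows just the branching: a facet answers from its own virtual connections and never reads the host's sockets.

```typescript
// Hypothetical stand-ins; the real Agent and connection types differ.
type Conn = { id: string };

class HostSockets {
  reads = 0;
  private sockets: Conn[];

  constructor(sockets: Conn[]) {
    this.sockets = sockets;
  }

  list(): Conn[] {
    // Counts every read. In production, a read from a facet's I/O context
    // is the cross-DO access that workerd aborts.
    this.reads++;
    return this.sockets;
  }
}

class SketchAgent {
  virtualConnections: Conn[] = [];
  private host: HostSockets;
  private isFacet: boolean;

  constructor(host: HostSockets, isFacet: boolean) {
    this.host = host;
    this.isFacet = isFacet;
  }

  getConnections(): Conn[] {
    // Facet: only virtual sub-agent connections, no fall-through to the host.
    if (this.isFacet) return [...this.virtualConnections];
    return this.host.list();
  }
}

const host = new HostSockets([{ id: "root-ws" }]);
const facet = new SketchAgent(host, true);
facet.virtualConnections.push({ id: "bridged-1" });
const facetConns = facet.getConnections();
```

    After this runs, facetConns contains only the bridged connection and host.reads is still 0, which is the property the fix is meant to guarantee.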

  • #1670 5d64940 Thanks @threepointone! - Fix: a deploy that interrupts an in-flight runAgentTool child no longer abandons the still-running child as interrupted.

    Parent recovery re-attaches to a still-running child and tails it to its real terminal. Previously that re-attach used a flat 120s wall-clock budget that was not reset by the child's forward progress, so a healthy child whose recovery legitimately ran longer than the budget was sealed interrupted (and its already-completed work re-run from scratch), even while it was actively streaming.

    The re-attach budget is now progress-keyed: it bounds how long the parent waits with no forward progress from the child (resetting on every forwarded chunk), so a genuinely hung/silent child still seals interrupted after one no-progress window and can never block recovery forever, while a healthy child that keeps streaming is followed through to terminal. The parent re-arms (opens a fresh tail) only when the child's stream closes cleanly while it is still advancing — i.e. a re-evicted-but-progressing child. A full no-progress window (the child went silent) seals no-progress immediately even if the child streamed earlier in that window; it no longer grants a bonus window. This is both the honest stall signal and what keeps at most one pending tail reader alive per re-attach (no per-cycle reader accumulation).
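    The difference between the old flat budget and the new progress-keyed one can be shown on a recorded timeline. This is a simplified sketch under stated assumptions: TailEvent, followFlat, and followProgressKeyed are hypothetical names, times are milliseconds since re-attach, events are assumed to be sorted, and the re-arm on clean stream close is not modeled.

```typescript
// Hypothetical event model: timestamps are ms since the parent re-attached.
type TailEvent = { atMs: number; kind: "chunk" | "terminal" };
type Outcome = "terminal" | "interrupted";

// Old behavior (sketch): one wall-clock budget, never reset by progress.
function followFlat(events: TailEvent[], budgetMs: number): Outcome {
  for (const ev of events) {
    if (ev.atMs > budgetMs) return "interrupted";
    if (ev.kind === "terminal") return "terminal";
  }
  return "interrupted"; // stream ended without a terminal event
}

// New behavior (sketch): the budget bounds the gap since the last chunk.
function followProgressKeyed(
  events: TailEvent[],
  noProgressTimeoutMs: number
): Outcome {
  let lastProgressMs = 0;
  for (const ev of events) {
    if (ev.atMs - lastProgressMs > noProgressTimeoutMs) return "interrupted";
    if (ev.kind === "terminal") return "terminal";
    lastProgressMs = ev.atMs; // every forwarded chunk resets the window
  }
  return "interrupted"; // child went silent
}

// A healthy child that streams for 5 minutes with gaps under 2 minutes.
const healthy: TailEvent[] = [
  { atMs: 60000, kind: "chunk" },
  { atMs: 150000, kind: "chunk" },
  { atMs: 240000, kind: "chunk" },
  { atMs: 300000, kind: "terminal" },
];

// A child that streams once and then goes silent.
const silent: TailEvent[] = [{ atMs: 10000, kind: "chunk" }];

const flatOutcome = followFlat(healthy, 120000);
const keyedOutcome = followProgressKeyed(healthy, 120000);
const silentOutcome = followProgressKeyed(silent, 120000);
```

    Under the flat 120s budget the healthy child is sealed interrupted at its second chunk, while the progress-keyed budget follows it to terminal. The silent child is still sealed in both models.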

    @cloudflare/think and @cloudflare/ai-chat additionally finalize a child facet's own agent-tool run row as soon as its recovered turn settles — regardless of whether recovery took the continue path (_chatRecoveryContinue) or the pre-stream retry path (_chatRecoveryRetry) — so a re-attached parent collects the terminal result immediately instead of waiting out a full no-progress window after the child has already finished.

    This release also adds:

    • Typed interrupted cause. RunAgentToolResult, the agentTool() AgentToolFailure envelope, the onAgentToolFinish lifecycle result, and the agent-tool-event wire event (kind "interrupted") now carry a machine-readable reason (AgentToolInterruptedReason: "no-progress" | "window-exceeded" | "not-tailable" | "inspect-timeout" | "inspect-failed" | "recovery-deadline") and a childStillRunning boolean on interrupted results, so callers (and UIs) can branch on why a run was abandoned (and whether the child is still running) instead of pattern-matching the human-readable error prose. retryable stays coarse (always true for interrupted); refine with reason / childStillRunning. These fields are persisted (schema bump), so they survive a reconnect replay — a client that reconnects after an interrupt reconstructs the same reason / childStillRunning a live client saw, rather than undefined. The persisted cause is cleared when a soft interrupted row is later repaired to completed/error.
    • Configurable re-attach budgets. Two new public AgentStaticOptions, agentToolReattachNoProgressTimeoutMs (default 120000, the progress-keyed no-progress budget) and agentToolReattachMaxWindowMs (default Infinity, meaning no implicit wall-clock cap), let an Agent tune re-attach. The hard ceiling defaults to uncapped to mirror chat-recovery's maxRecoveryWork: Infinity. A re-attached parent follows a healthy, still-advancing child for as long as it makes progress, exactly as it would on the live (never-evicted) path, so it never abandons a long-running-but-healthy child that simply outlasts a fixed wall clock under deploy churn. A hung/silent child is bounded by the no-progress budget; a content-runaway is bounded uniformly (live and recovery) by the child's own maxRecoveryWork / shouldKeepRecovering. Integrators that want a hard wall-clock cap (and the window-exceeded child teardown it triggers) can set agentToolReattachMaxWindowMs to a finite value. Symmetrically, setting agentToolReattachNoProgressTimeoutMs to Infinity now means "never seal on no-progress" (a silent-but-alive child is followed until its stream closes or the hard ceiling fires) instead of silently skipping the wait; 0 remains the "don't wait, collect only an already-terminal child" sentinel.
    • Give-up teardown (ceiling only). When the parent gives up at the hard window-exceeded ceiling — where the child has had its full recovery window and is truly exhausted — it now cancels the child (childStillRunning: false) so it stops consuming a fiber / keep-alive. no-progress give-ups stay soft (childStillRunning: true): the child is left running so a re-issue can still re-attach and repair it if it self-heals, preserving the repair-on-re-issue path. In both @cloudflare/think and @cloudflare/ai-chat, cancelAgentToolRun also aborts an in-flight chat-recovery turn (not just the original in-isolate run) and releases live tails — Think sweeps its _submissionAbortControllers, ai-chat its request AbortRegistry (abortAllRequests) — so a torn-down child stops grinding instead of finishing an orphaned recovered turn.
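
    Branching on the typed cause instead of the error prose might look like the sketch below. The AgentToolInterruptedReason union is copied from the notes above. InterruptedResult is an assumed, simplified shape (only retryable, reason, and childStillRunning are named in the notes), and nextStep() with its three outcomes is a hypothetical caller policy, not SDK behavior.

```typescript
// Union as listed in the release notes.
type AgentToolInterruptedReason =
  | "no-progress"
  | "window-exceeded"
  | "not-tailable"
  | "inspect-timeout"
  | "inspect-failed"
  | "recovery-deadline";

// Assumed, simplified shape of an interrupted result (not the SDK's type).
type InterruptedResult = {
  retryable: true; // stays coarse: always true for interrupted
  reason?: AgentToolInterruptedReason;
  childStillRunning?: boolean;
};

// Hypothetical caller policy, for illustration only.
type NextStep = "reattach" | "rerun" | "report";

function nextStep(r: InterruptedResult): NextStep {
  // Soft give-up: the child was left running, so a re-issue can re-attach.
  if (r.childStillRunning === true) return "reattach";
  // Hard ceiling: the child was cancelled, so a re-issue starts fresh.
  if (r.reason === "window-exceeded") return "rerun";
  // Anything else is surfaced to the user in this sketch.
  return "report";
}

const soft = nextStep({
  retryable: true,
  reason: "no-progress",
  childStillRunning: true,
});
const hard = nextStep({
  retryable: true,
  reason: "window-exceeded",
  childStillRunning: false,
});
```

    Here soft resolves to "reattach" and hard to "rerun". A real integration would pick its own policy per reason.
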
  • #1680 8f9500a Thanks @threepointone! - Remove the now-redundant _suppressProtocolBroadcasts facet-bootstrap guard.

    This flag was added in #1425 to stop _broadcastProtocol() from enumerating the parent DO's WebSockets during facet bootstrap (the cross-DO Native I/O crash, #1410/#1677). The proper fix in #1679 makes getConnections()/broadcast() facet-safe at the source: on a facet they return only virtual sub-agent connections and route through the parent bridge, never touching the parent's own sockets. With that, suppressing broadcasts during bootstrap is unnecessary, and removing it also lets legitimate state sync run during the bootstrap window.

    The separate request/WebSocket/email native-handle clearing from #1425 is retained, since #1679 does not cover that vector.

  • #1675 d915bc6 Thanks @threepointone! - The skill runner now imports just-bash and @cloudflare/codemode statically instead of dynamically, and both have moved from optional peer dependencies to regular dependencies of agents. The dynamic imports were ineffective in bundled Workers (the bundler includes them eagerly regardless) and triggered INEFFECTIVE_DYNAMIC_IMPORT warnings when bundled alongside @cloudflare/think, which imports them statically. @cloudflare/think also now statically imports its internal ExtensionManager instead of dynamically, removing the third such warning.

  • #1662 df6c0d6 Thanks @threepointone! - Add opt-in recovery for mid-turn context-window overflow.

    Compaction only fires between turns (Session.compactAfter checks the threshold on appendMessage). A single long, tool-heavy turn grows the prompt step-by-step inside one streamText loop and can exceed the model's context window mid-turn, before the next pre-turn check — the provider then 400s ("prompt is too long" / context_length_exceeded) and the turn dies terminally. Think deliberately ships no provider-specific error matching, so it could neither detect nor recover from this.

    This adds opt-in, provider-agnostic recovery (all default off — no behavior change unless enabled), configured through a single contextOverflow property on Think:

    • classifyChatError(error, ctx) — the app maps a raw error (or the in-stream error string) to a ChatErrorClassification ("context_overflow" | "rate_limit" | "transient" | "fatal" | "unknown"). Same framework-owns-the-mechanism / app-owns-the-provider-knowledge split as tokenCounter. The classification is also threaded to onChatError/observers via ChatErrorContext.classification. The bundled, exported defaultContextOverflowClassifier covers the common providers (Anthropic, OpenAI, Google, Bedrock, …) for apps that do not need custom classification.
    • contextOverflow.reactive + contextOverflow.maxRetries — when a turn fails with a context_overflow the app classified, Think discards the truncated partial, runs session.compact(), and re-runs the turn (bounded) from the compacted history instead of dying. The partial is intentionally not persisted: the retry restarts the turn from scratch, so keeping the cut-off partial would orphan a half-finished assistant message beside the recovered answer (and duplicate any tool work the retry re-issues). A no-op compaction or a spent budget surfaces the overflow terminally through onChatError with classification: "context_overflow" — never a silent end, never an infinite loop. Wired into the WebSocket, chat()/RPC, and programmatic (saveMessages/submitMessages) turn paths.
    • contextOverflow.proactive — a { maxInputTokens, headroom?, maxCompactions? } pre-step guard: when the previous step's model-reported usage.inputTokens crosses maxInputTokens * (headroom ?? 0.9), Think compacts in place and feeds the recompacted history into the upcoming step, heading off the provider 400 before it happens. Keys off model-reported usage (every provider reports it), not provider error strings. Bounded per step loop by its own maxCompactions (default 1, independent of the reactive maxRetries budget).
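
    The three pieces above can be modeled in a few lines. This is a synchronous sketch, not Think's implementation: classify(), shouldCompactBeforeStep(), and runTurnWithRecovery() are hypothetical helpers. The classification union, the two example error strings, the 0.9 default headroom, and the "no-op compaction or spent budget is terminal" rule come from the notes. Treating "crosses" as greater-than-or-equal is an assumption.

```typescript
type ChatErrorClassification =
  | "context_overflow"
  | "rate_limit"
  | "transient"
  | "fatal"
  | "unknown";

// Hypothetical app-side classifier using the two error strings quoted above.
function classify(message: string): ChatErrorClassification {
  const m = message.toLowerCase();
  if (m.includes("prompt is too long") || m.includes("context_length_exceeded")) {
    return "context_overflow";
  }
  return "unknown";
}

// Proactive guard: compare the previous step's reported input tokens
// against maxInputTokens * headroom (">=" is an assumption).
function shouldCompactBeforeStep(
  inputTokens: number,
  cfg: { maxInputTokens: number; headroom?: number }
): boolean {
  return inputTokens >= cfg.maxInputTokens * (cfg.headroom ?? 0.9);
}

type TurnResult =
  | { ok: true; text: string }
  | { ok: false; classification: ChatErrorClassification };

// Reactive recovery: compact and re-run, bounded by maxRetries.
// compact() returns false when it was a no-op.
function runTurnWithRecovery(
  runTurn: () => string,
  compact: () => boolean,
  maxRetries: number
): TurnResult {
  for (let retries = 0; ; retries++) {
    try {
      return { ok: true, text: runTurn() };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      const classification = classify(message);
      const canRetry =
        classification === "context_overflow" &&
        retries < maxRetries &&
        compact();
      // Spent budget, no-op compaction, or another error class: terminal.
      if (!canRetry) return { ok: false, classification };
    }
  }
}
```

    The loop always terminates: each pass either returns a result or consumes one unit of the retry budget, so a persistent overflow surfaces as a terminal "context_overflow" rather than looping.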

    Also adds a chat:context:compacted observability event (agents) emitted (once) on both proactive and reactive compaction.

    Notes:

    • Provider context-overflow errors always surface as in-stream error parts (confirmed against the AI SDK: streamText re-enqueues even top-level rejections as { type: "error" } fullStream parts, and toUIMessageStream passes them through without throwing), so the in-stream seam catches them on every path; the thrown-error catch path does not need separate wiring.
    • Recovery effectiveness depends on the app's compaction config — a no-op compaction cannot rescue an over-budget turn (handled gracefully: terminal, not a loop). A one-time warning fires if contextOverflow.reactive is enabled but classifyChatError was never overridden.
  • #1675 d915bc6 Thanks @threepointone! - The agents/vite plugin now stubs turndown by default. turndown (pulled in transitively by just-bash for the workspace bash tool and skill runner) runs a top-level require() in its Node DOM fallback, which throws ReferenceError: require is not defined at Worker startup — even when the bash tool is never used. The plugin replaces it with an inert stub so Workers deploys stay clean. Opt out with agents({ stubTurndown: false }) if your app uses turndown directly.
