github cloudflare/agents @cloudflare/think@0.11.1

4 hours ago

Patch Changes

  • #1826 1bbd9bc Thanks @threepointone! - Add a tight, OOM-specific retry budget to chat recovery so a memory-limit crash loop seals fast and attributably (#1825).

    When a recovery turn hits a Durable Object memory-limit reset (the isolate exceeded its 128 MB limit), recovery now classifies it as a distinct, deterministic failure rather than a deploy-style transient. A memory reset OOMs again on re-run because the cause is the turn's working set, not the platform, so it must NOT be deferred and retried forever like a code-update/connection-lost transient. Each such crash bumps a durable per-incident oomAttempts counter. Recovery retries a small number of times (new chatRecovery.maxOomRetries, default 3), in case the OOM was a transient spike, then seals with reason="out_of_memory". This is far tighter than the generic maxRecoveryWork backstop because an OOM is attributable and every retry re-runs the model.

    This complements the finite maxRecoveryWork default: the OOM budget is the fast path for memory resets that surface as catchable errors thrown from recovery bookkeeping (e.g. storage/SQL rejections after the reset), while maxRecoveryWork remains a backstop for the hard-kill case where no in-isolate code runs to record the OOM.
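The OOM budget described above can be pictured as a small counter check. The following is a minimal sketch, not the library's implementation: only oomAttempts, maxOomRetries, and the "out_of_memory" reason come from the notes, while the type names, the function name, and the exact boundary (sealing once the crash count exceeds the budget) are assumptions made for illustration.

```typescript
// Hypothetical sketch of an OOM-specific retry budget.
// Assumption: the turn seals once the number of OOM crashes exceeds
// maxOomRetries; the real boundary condition may differ.
type RecoveryDecision =
  | { action: "retry"; oomAttempts: number }
  | { action: "seal"; reason: "out_of_memory"; oomAttempts: number };

interface IncidentState {
  // Durable per-incident counter (kept in memory here for illustration).
  oomAttempts: number;
}

function onOomCrash(state: IncidentState, maxOomRetries = 3): RecoveryDecision {
  // Every memory-limit crash bumps the counter before deciding.
  state.oomAttempts += 1;
  if (state.oomAttempts > maxOomRetries) {
    return { action: "seal", reason: "out_of_memory", oomAttempts: state.oomAttempts };
  }
  return { action: "retry", oomAttempts: state.oomAttempts };
}

// Usage: four consecutive OOM crashes against the default budget of 3.
const incident: IncidentState = { oomAttempts: 0 };
const actions = [1, 2, 3, 4].map(() => onOomCrash(incident).action);
console.log(actions.join(","));
```

Under these assumptions the first three crashes are retried and the fourth seals the turn, which bounds the number of model re-runs per incident.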

    Adds an alarm-boundary circuit breaker (agents) as the universal backstop for the case the in-DO budgets can't catch (#1825): a memory-limit reset that bypasses them entirely — thrown before the budget code runs (e.g. boot-time state hydration OOMs), or whose own small writes also OOM under memory pressure. Left unhandled, such an error propagates out of alarm() and the platform auto-retries the alarm forever, re-running the doomed, billable turn each cycle. Agent.alarm() now intercepts ONLY Durable Object memory-limit resets at the outermost frame — where the heavy turn has unwound and GC has reclaimed its footprint, so the seal/purge writes can land where mid-turn ones OOMed. A durable strike counter tolerates a few resets (new static options.maxAlarmMemoryLimitStrikes, default 3) — backing off the looping rows so the retry is not a hot loop — then seals the recovery (out_of_memory) and surgically purges only the looping schedule rows, leaving unrelated scheduled tasks intact. A new alarm:memory_limit_reset observability event is emitted. Everything except memory-limit resets re-throws exactly as before.
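As a rough illustration of the circuit breaker above, the decision made at the outermost alarm frame could be modeled as a pure function. This is a sketch under stated assumptions: the function and type names, the exponential backoff formula, and the strike boundary are hypothetical, and the message check only mirrors the "exceeded its memory limit" fragment mentioned in these notes rather than calling the real exported predicate.

```typescript
// Hypothetical decision logic for an alarm-boundary circuit breaker.
type AlarmDecision =
  | { kind: "rethrow" }
  | { kind: "backoff"; strikes: number; delayMs: number }
  | { kind: "seal"; reason: "out_of_memory"; strikes: number };

// Stand-in for the library's predicate; assumes the error message
// contains the shared fragment quoted in the release notes.
function looksLikeMemoryLimitReset(error: unknown): boolean {
  return error instanceof Error && error.message.includes("exceeded its memory limit");
}

function decideOnAlarmError(
  error: unknown,
  priorStrikes: number, // value read from the durable strike counter
  maxStrikes = 3,
  baseDelayMs = 1000 // assumed backoff base, not from the source
): AlarmDecision {
  // Anything that is not a memory-limit reset keeps the old behavior.
  if (!looksLikeMemoryLimitReset(error)) return { kind: "rethrow" };

  const strikes = priorStrikes + 1;
  if (strikes > maxStrikes) {
    // Caller would seal the recovery and purge only the looping rows.
    return { kind: "seal", reason: "out_of_memory", strikes };
  }
  // Tolerated strike: push the looping rows out so the retry is not hot.
  return { kind: "backoff", strikes, delayMs: baseDelayMs * 2 ** (strikes - 1) };
}

// Usage: the fourth consecutive memory-limit reset trips the breaker.
const oom = new Error("Durable Object exceeded its memory limit and was reset.");
console.log(decideOnAlarmError(oom, 3).kind);
```

Keeping the decision separate from the storage writes reflects the point made above: the writes happen only after the heavy turn has unwound, so the decision itself needs almost no memory.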

    Also broadens and exports the isDurableObjectMemoryLimitReset(error) predicate from agents (a sibling to isDurableObjectCodeUpdateReset / isPlatformTransientError): it now matches the shared "exceeded its memory limit" fragment so truncated/reworded surfacings (observed in real #1825 logs) still classify.

  • #1826 1bbd9bc Thanks @threepointone! - Fix never-ending chat-recovery retries when a Durable Object isolate runs out of memory mid-turn (#1825).

    chatRecovery.maxRecoveryWork now defaults to a generous finite backstop (1000) instead of Infinity. An isolate that exceeds its memory limit and is reset mid-stream has usually already streamed a little content, which bumps the durable progress counter. On the next wake, recovery reads that as forward progress and resets both progress-keyed bounds: the attempt cap (maxAttempts) and the no-progress window (noProgressTimeoutMs). Because each crash lands inside the alarm-debounce window, the attempt counter is pinned too. With the work budget disabled (Infinity), no instrument could ever seal the turn, so recovery re-ran the turn (and its LLM calls) forever. The work meter is the one signal that keeps climbing across such a loop, so a finite default seals a runaway with reason="work_budget_exceeded" instead of looping.

    Work only accrues from the first interruption until the turn completes, so a normal interrupted turn never approaches the cap. If a very long agentic turn legitimately produces a large amount of content under heavy interruption, raise maxRecoveryWork (or set it to Infinity to restore the previous fully-unbounded behavior, ideally paired with a shouldKeepRecovering predicate that bounds the runaway via real token/cost accounting).
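To see why only the work meter can stop this loop, here is a simplified simulation. It is a sketch, not the library's bookkeeping: the Meter shape, the "attempts_exhausted" label, the work units (10 per crash), and the omission of the debounce and no-progress window are all assumptions. Only maxAttempts, maxRecoveryWork, and the "work_budget_exceeded" reason come from the notes.

```typescript
// Hypothetical model of progress-keyed bounds versus a work budget.
interface Bounds {
  maxAttempts: number;
  maxRecoveryWork: number;
}
interface Meter {
  attempts: number;
  lastProgress: number;
  work: number;
}
type Outcome = "retry" | "attempts_exhausted" | "work_budget_exceeded";

function onWake(m: Meter, progress: number, workDelta: number, b: Bounds): Outcome {
  if (progress > m.lastProgress) {
    // Forward progress resets the attempt counter.
    m.attempts = 0;
    m.lastProgress = progress;
  }
  m.attempts += 1;
  // The work meter is never reset during the incident.
  m.work += workDelta;
  if (m.work > b.maxRecoveryWork) return "work_budget_exceeded";
  if (m.attempts > b.maxAttempts) return "attempts_exhausted";
  return "retry";
}

// Each simulated crash streams a little content first, so progress
// advances by one and 10 units of work accrue on every wake.
function simulateCrashLoop(
  maxRecoveryWork: number,
  wakeLimit: number
): { wakes: number; outcome: Outcome } {
  const m: Meter = { attempts: 0, lastProgress: 0, work: 0 };
  const b: Bounds = { maxAttempts: 3, maxRecoveryWork };
  for (let wake = 1; wake <= wakeLimit; wake++) {
    const outcome = onWake(m, wake, 10, b);
    if (outcome !== "retry") return { wakes: wake, outcome };
  }
  return { wakes: wakeLimit, outcome: "retry" };
}

console.log(simulateCrashLoop(1000, 10000));
console.log(simulateCrashLoop(Infinity, 10000));
```

In this model the finite budget seals the loop once accumulated work passes 1000, while the Infinity setting is still retrying when the simulation's own wake limit is reached, because the attempt counter never gets past 1.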

  • #1821 de6a695 Thanks @threepointone! - Add an opt-in, read-only HTTP fetch capability for Think agents via the new @cloudflare/think/tools/fetch export and a fetchTools property on Think.

    createFetchTools() generates a generic, allowlisted fetch_url tool plus one fetch_<name> tool per named service-binding/Fetcher target. It is GET-only with Workers-grounded SSRF defenses (private/loopback/link-local/*.internal blocking, URL normalization, credential rejection), separate download/model/workspace size limits (maxBytes, maxModelChars, response: "workspace" spill), an allowlist-aware redirect policy with cross-origin header stripping, a model header allowlist, and a tool:fetch observability event. Disabled by default.
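The kind of pre-flight check that such SSRF defenses imply can be sketched with the standard URL API. This is an illustrative sketch only and does not reproduce createFetchTools(): the function names, the reason strings, and the allowlist shape are hypothetical, IPv6 coverage is limited to the loopback address, and it ignores DNS-based bypasses and redirects that a real defense must also handle.

```typescript
// Hypothetical URL pre-flight check for a GET-only fetch tool.
type Verdict = { ok: true; url: URL } | { ok: false; reason: string };

// Matches dotted-quad IPv4 in private, loopback, link-local,
// or "this network" ranges.
function isPrivateIPv4(host: string): boolean {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254)
  );
}

function checkUrl(raw: string, allowedHosts: ReadonlySet<string>): Verdict {
  let url: URL;
  try {
    // The URL parser also normalizes the host (case, IPv4 spellings).
    url = new URL(raw);
  } catch {
    return { ok: false, reason: "invalid_url" };
  }
  if (url.protocol !== "https:" && url.protocol !== "http:") {
    return { ok: false, reason: "scheme" };
  }
  // Reject embedded credentials such as https://user:pw@host/.
  if (url.username !== "" || url.password !== "") {
    return { ok: false, reason: "credentials" };
  }
  const host = url.hostname.toLowerCase();
  if (
    host === "localhost" ||
    host === "[::1]" ||
    host.endsWith(".internal") ||
    isPrivateIPv4(host)
  ) {
    return { ok: false, reason: "blocked_host" };
  }
  if (!allowedHosts.has(host)) {
    return { ok: false, reason: "not_allowlisted" };
  }
  return { ok: true, url };
}

// Usage with a one-entry allowlist.
const allowed = new Set(["example.com"]);
console.log(checkUrl("https://example.com/docs", allowed).ok);
console.log(checkUrl("http://127.0.0.1/admin", allowed).ok);
```

Checking blocked hosts before the allowlist keeps the two concerns separate, so an overly broad allowlist entry cannot re-enable a private address.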

  • #1823 b58b5a3 Thanks @threepointone! - Improve Think's tool-call lifecycle hooks (follow-ups from #1343):

    • Preserve preliminary streaming through beforeToolCall. Tools whose execute is an async generator (async function* execute(...)) now stream their preliminary tool-results to the model even though Think wraps execute to consult beforeToolCall first. Non-streaming tools keep a scalar wrapper, so they never emit a synthetic preliminary chunk. The non-canonical async () => makeIterator() form (a Promise<AsyncIterable>) still collapses to its last yielded value, matching the raw AI SDK.
    • Per-tool typing on the lifecycle contexts. When an explicit TOOLS generic is passed, narrowing on ctx.toolName now narrows ctx.input on beforeToolCall and — new — ctx.output on afterToolCall's success branch to that tool's inferred output type. Dynamic tools stay unknown. Behavior with the default ToolSet is unchanged.
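The per-tool narrowing described in the second bullet follows the usual discriminated-union pattern. The sketch below assumes two made-up tools (getWeather and search) and hand-written context types; the real Think contexts and their derivation from the TOOLS generic are not reproduced here.

```typescript
// Hypothetical tool map: names and shapes are invented for illustration.
type ToolIO = {
  getWeather: { input: { city: string }; output: { tempC: number } };
  search: { input: { query: string }; output: { hits: string[] } };
};

// One union member per tool, discriminated by toolName.
type BeforeCtx = {
  [K in keyof ToolIO]: { toolName: K; input: ToolIO[K]["input"] };
}[keyof ToolIO];

type AfterCtx = {
  [K in keyof ToolIO]: { toolName: K; success: true; output: ToolIO[K]["output"] };
}[keyof ToolIO];

function describeCall(ctx: BeforeCtx): string {
  if (ctx.toolName === "getWeather") {
    // ctx.input is narrowed to { city: string } here.
    return "weather for " + ctx.input.city;
  }
  // Only the search member remains, so ctx.input is { query: string }.
  return "search for " + ctx.input.query;
}

function summarizeResult(ctx: AfterCtx): string {
  if (ctx.toolName === "search") {
    // ctx.output is narrowed to { hits: string[] } here.
    return ctx.output.hits.length + " hits";
  }
  return ctx.output.tempC + " C";
}

console.log(describeCall({ toolName: "getWeather", input: { city: "Lisbon" } }));
console.log(summarizeResult({ toolName: "search", success: true, output: { hits: ["a", "b"] } }));
```

A dynamic tool has no entry in such a map, which is consistent with the note that dynamic tools stay unknown.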
