github cloudflare/ai workers-ai-provider@3.3.0

5 hours ago

Minor Changes

  • #590 fe0d182 Thanks @threepointone! - The AI Gateway delegate gains cross-vendor server-side fallback
    (fallback: { mode: "server" }) — multiple vendors in one gateway run, with the
    winner selected via cf-aig-step.

    The gateway delegate now reaches header parity with the run path: the gateway
    path forwards cacheKey, eventId, requestTimeoutMs, and retries from the
    gateway options as cf-aig-* headers, and DelegateCallOptions gains two new
    universal-endpoint controls — byokAlias (cf-aig-byok-alias, select a stored
    BYOK key by alias) and zdr (cf-aig-zdr, per-request Zero Data Retention
    override for Unified Billing, applied on both transports).

    Internally, the provider registry, cf-aig-* header building, resumable-stream
    engine, and Workers AI SSE helpers are now shared across the Cloudflare AI
    packages (bundled inline — no new dependency for you to install).

  • #593 1c6afd0 Thanks @threepointone! - Native Workers AI failures are now surfaced as AI SDK APICallErrors so the AI
    SDK's built-in retry (maxRetries) can engage on transient errors.

    Previously the binding path (env.AI.run) threw plain Errors and the REST
    path threw a generic Error, so the AI SDK never retried them. Most notably,
    the common "out of capacity" failure (internal code 3040, HTTP 429) and
    transient 5xx errors failed the call outright.

    • Binding path: errors thrown by env.AI.run are normalized into an
      APICallError across every Workers AI model — chat, embedding, image, speech,
      transcription, and reranking. The Workers AI internal error code is parsed from
      the message (or a numeric code property) and mapped to the documented HTTP
      status (e.g. 3040/3036 → 429, 3007/3008 → 408, 5007 → 400), and
      APICallError derives isRetryable from that status (retryable on
      408/409/429/5xx). Unrecognized errors get no status and stay non-retryable
      (prior behavior). AbortError/TimeoutError cancellations propagate
      unchanged.
    • REST path: non-OK responses now throw an APICallError carrying the real
      statusCode, response headers (so Retry-After is honored), and body, instead
      of a generic Error. The error message keeps the same
      Workers AI API error (<status> <statusText>): <body> shape.

    This means transient capacity/5xx errors are now automatically retried with
    exponential backoff by generateText/streamText (default 2 retries; tune via
    maxRetries). Set maxRetries: 0 to opt out.
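
To make the retry behavior described above concrete, here is a minimal, self-contained sketch of the idea. It is not the provider's actual source: the helper names (extractCode, statusFor, isRetryable, withRetries), the 4-digit message pattern, and the delay values are assumptions for illustration. Only the code-to-status pairs and the retryable statuses (408/409/429/5xx) come from the notes above.

```typescript
// Hypothetical sketch, not the provider's implementation.
// Only the code -> status pairs named in the release notes are listed here.
const CODE_TO_STATUS: Record<number, number> = {
  3040: 429,
  3036: 429,
  3007: 408,
  3008: 408,
  5007: 400,
};

// Assumption: the internal code is either a numeric `code` property or a
// standalone 4-digit number somewhere in the error message.
function extractCode(err: unknown): number | undefined {
  if (typeof err !== "object" || err === null) return undefined;
  const e = err as { code?: unknown; message?: unknown };
  if (typeof e.code === "number") return e.code;
  if (typeof e.message === "string") {
    const match = e.message.match(/\b(\d{4})\b/);
    if (match) return Number(match[1]);
  }
  return undefined;
}

// Unrecognized errors get no status, which keeps them non-retryable.
function statusFor(err: unknown): number | undefined {
  const code = extractCode(err);
  return code === undefined ? undefined : CODE_TO_STATUS[code];
}

// Retryable on 408/409/429/5xx, as stated in the notes.
function isRetryable(status: number | undefined): boolean {
  return (
    status !== undefined &&
    (status === 408 || status === 409 || status === 429 || status >= 500)
  );
}

// Simplified retry loop with exponential backoff. The initial delay and the
// doubling factor are illustrative values, not the AI SDK's real settings.
async function withRetries<T>(
  fn: () => Promise<T>,
  maxRetries = 2,
  initialDelayMs = 10,
): Promise<T> {
  let delayMs = initialDelayMs;
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Non-retryable errors (including aborts, which have no mapped status)
      // and exhausted budgets propagate unchanged.
      if (attempt >= maxRetries || !isRetryable(statusFor(err))) throw err;
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      delayMs *= 2;
    }
  }
}
```

With the default budget of 2 retries, a call that hits a capacity error twice and then succeeds makes three attempts in total. Passing 0 as the budget makes the first failure surface immediately, which mirrors setting maxRetries: 0 on generateText/streamText.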
