github cloudflare/ai workers-ai-provider@3.2.0

11 hours ago

Minor Changes

  • #573 4f19489 Thanks @threepointone! - Add AI Gateway routing for third-party catalog models to createWorkersAI, with capability-driven transport selection, the full provider registry, a bring-your-own-provider wrapper, typed errors, and client/server fallback.

    Experimental. This is a substantial new surface for the package — well beyond its original job of wrapping Workers AI — and several behaviors rely on undocumented AI Gateway internals (the cf-aig-run-id resume buffer, per-provider run-path wire formats). Treat the entire third-party / gateway surface as experimental: the API may change, and provider coverage maturity varies (only the run-catalog providers are live-verified end-to-end). It does not affect the existing stable Workers AI / AI Search APIs.

    createWorkersAI is the single public entry point. Pass an optional providers array (wire-format plugins from the sub-paths below). When it is set, a "<provider>/<model>" catalog slug passed to the provider (or .chat) is routed through AI Gateway automatically, while @cf/... ids continue to build Workers AI models.

    Each slug is resolved against a registry of every AI Gateway provider, and the transport is picked from the requested options. The run path (env.AI.run) is used for resumable streaming via cf-aig-run-id, which is the default on the unified-billing run catalog. The gateway path (env.AI.gateway(id).run([…])) is used for BYOK providers, server-side fallback, and caching. Incompatible option combinations (e.g. resume: true with fallback.mode: "server", or resume/transport: "run" on a BYOK provider) throw a clear GatewayDelegateError; combinations that disable resume emit a loud warning.

    This is fully additive: leaving providers unset preserves the prior behavior exactly, and passing a catalog slug without it throws a helpful error. The chat factory's settings argument is typed from the model id literal: a "<provider>/<model>" slug autocompletes DelegateCallOptions, while a @cf/... id autocompletes WorkersAIChatSettings. The gateway option is optional for catalog routing. When it is unset, requests use the account's "default" AI Gateway; set gateway (here or per call) to target a specific one.
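    The transport-selection rules above can be sketched as a small decision function. This is a hypothetical illustration, not the package's code. CallOptionsSketch, ProviderInfoSketch, selectTransport, and the cache flag are assumed names. The sketch also makes two generalizations. It treats any resume: true request that cannot stay on the run path as an error, extending the documented resume: true + fallback.mode: "server" example. It also treats an explicit transport: "run" combined with gateway-only options as an error.

```typescript
// Hypothetical stand-ins; the real DelegateCallOptions and registry entries may differ.
interface CallOptionsSketch {
  resume?: boolean;
  transport?: "run" | "gateway";
  byok?: boolean;
  cache?: boolean; // assumed flag name for gateway caching
  fallback?: { mode: "client" | "server" };
}

interface ProviderInfoSketch {
  runCatalog: boolean; // on the unified-billing run catalog (env.AI.run)
  gatewayPath: boolean; // reachable via env.AI.gateway(id).run([...])
}

class GatewayDelegateErrorSketch extends Error {}

type Transport = "run" | "gateway";

function selectTransport(
  provider: ProviderInfoSketch,
  opts: CallOptionsSketch,
  warn: (message: string) => void = console.warn,
): Transport {
  // Features that only the gateway path offers.
  const needsGateway =
    opts.byok === true || opts.cache === true || opts.fallback?.mode === "server";
  const wantsGateway = needsGateway || opts.transport === "gateway";

  // Resume needs the run path, so gateway-only requests conflict with resume: true.
  if (opts.resume === true && wantsGateway) {
    throw new GatewayDelegateErrorSketch("resume: true cannot be combined with gateway-only options");
  }
  // BYOK providers are not on the run catalog.
  if (!provider.runCatalog && (opts.resume === true || opts.transport === "run")) {
    throw new GatewayDelegateErrorSketch('resume / transport: "run" is unavailable for a BYOK provider');
  }
  // Run-path-only providers cannot use gateway-only features.
  if (!provider.gatewayPath && wantsGateway) {
    throw new GatewayDelegateErrorSketch("this provider is run-path only");
  }
  // Assumption: an explicit transport: "run" also conflicts with gateway-only features.
  if (opts.transport === "run" && needsGateway) {
    throw new GatewayDelegateErrorSketch('transport: "run" conflicts with gateway-only options');
  }

  if (wantsGateway || !provider.runCatalog) {
    // Leaving the run path on a run-catalog provider drops resumability, so warn.
    if (provider.runCatalog) warn("routing via the gateway path: resumable streaming is disabled");
    return "gateway";
  }
  return "run";
}
```

    The ordering matters. Hard conflicts are checked first, so a request never warns and then fails. The warning fires only when a run-catalog provider is pushed off its default path.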

    New sub-path exports:

    • workers-ai-provider/openai, workers-ai-provider/anthropic, workers-ai-provider/google — provider plugins keyed by wire format. One openai plugin serves the OpenAI-compatible long tail (deepseek, xai/grok, groq, mistral, perplexity, cerebras, openrouter, fireworks) plus the unified-catalog chat providers alibaba (Qwen) and minimax. @ai-sdk/openai, @ai-sdk/anthropic, and @ai-sdk/google are optional peer dependencies; install only the ones whose wire formats you use. The openai plugin is required for the run path (see below). Providers whose gateway-path URL isn't reproducible from the shared builder (cohere, baseten, parallel, azure-openai, google-vertex) and provider-native/non-chat providers are bring-your-own-provider only.
    • workers-ai-provider/gateway: createGatewayFetch / createGatewayProvider wrap any @ai-sdk/* provider so its traffic flows through AI Gateway (provider id detected from the request URL, or set explicitly). Use them for provider-native or non-chat providers the slug routing can't auto-wire (bedrock, replicate, audio/image), or for full control of the underlying provider.
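    Detecting the provider id from the request URL can be sketched as a host lookup. This is an illustration, not the package's implementation. The host table, the AI Gateway provider ids it maps to (such as grok and google-ai-studio), and detectProviderIdSketch are all assumptions. The package's own detectProviderByUrl and GATEWAY_PROVIDERS cover many more providers and may match hosts differently.

```typescript
// Hypothetical host -> AI Gateway provider id table (a small subset, for illustration).
const HOST_TABLE: ReadonlyArray<readonly [string, string]> = [
  ["api.openai.com", "openai"],
  ["api.anthropic.com", "anthropic"],
  ["generativelanguage.googleapis.com", "google-ai-studio"],
  ["api.x.ai", "grok"],
  ["api.groq.com", "groq"],
  ["api.deepseek.com", "deepseek"],
  ["api.mistral.ai", "mistral"],
];

// An explicitly configured id wins; otherwise match the request host against the table.
function detectProviderIdSketch(requestUrl: string, explicitId?: string): string | undefined {
  if (explicitId) return explicitId;
  const host = new URL(requestUrl).hostname.toLowerCase();
  for (const [knownHost, providerId] of HOST_TABLE) {
    if (host === knownHost) return providerId;
  }
  return undefined; // unknown host: the caller should require an explicit id
}
```

    Returning undefined for unknown hosts, rather than guessing, pushes unrecognized providers toward an explicit id.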

    The transport types, error classes (WorkersAIGatewayError, WorkersAIFallbackError, GatewayDelegateError), the registry helpers, DelegateCallOptions, and createResumableStream are re-exported from the package root.

    Features:

    • Provider registry (GATEWAY_PROVIDERS, findProviderBySlug, detectProviderByUrl) maps slugs to gateway provider ids, wire formats, billing model, and run-catalog membership. Covers every provider in the AI Gateway directory (OpenAI, Anthropic, Google AI Studio/Vertex, xAI, Groq, DeepSeek, Mistral, Perplexity, Cerebras, OpenRouter, Cohere, Baseten, Parallel, Azure OpenAI, Amazon Bedrock, HuggingFace, Replicate, Fal, Ideogram, Cartesia, Deepgram, ElevenLabs — plus Fireworks), with URL host patterns so createGatewayFetch auto-detects each from the wrapped provider's request URL. Also includes the unified-catalog chat providers alibaba (Qwen) and minimax on the resumable run catalog (verified live: OpenAI-wire, cf-aig-run-id on streams); these are run-path only (gatewayPath: false — not native gateway providers), so caching, server-side fallback, and transport: "gateway" are rejected with a clear GatewayDelegateError instead of failing upstream.
    • Metadata & logging: metadata (custom log attributes for spend attribution) and collectLog are first-class call options on both transports. On the run path they fold into the typed gateway options; on the gateway path they become cf-aig-metadata / cf-aig-collect-log headers (bigint metadata values are coerced to strings). Call-level metadata merges over (and wins against) any metadata set via gateway: { metadata }.
    • BYOK — set byok: true (+ supply the key via extraHeaders) to forward the upstream provider key on the gateway path; otherwise provider auth headers are stripped so unified billing / the gateway's stored key applies.
    • Client-side fallback (fallback.mode: "client") keeps resume per leg — a failed pre-stream dispatch falls through to the next model; if all fail, a WorkersAIFallbackError carries the per-attempt tree. Server-side fallback (fallback.mode: "server") routes same-vendor fallbacks through the gateway path.
    • Typed errors: WorkersAIGatewayError (with a coarse code, a recoverable hint, and the parsed CF/provider envelope) and WorkersAIFallbackError (attempt tree). Helpers classifyStatus / extractErrorMessage are exported.
    • Abort + gateway options are passed through on both transports.
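    The gateway-path metadata rules above can be sketched as a header builder: call-level keys win, and bigint values become strings. buildGatewayHeadersSketch is a hypothetical helper. Serializing collectLog as the string "true" or "false" is an assumption about the cf-aig-collect-log header value.

```typescript
type MetadataValue = string | number | boolean | bigint;

function buildGatewayHeadersSketch(
  gatewayMetadata: Record<string, MetadataValue> | undefined,
  callMetadata: Record<string, MetadataValue> | undefined,
  collectLog?: boolean,
): Record<string, string> {
  // Call-level metadata is spread last, so its keys override gateway-level ones.
  const merged = { ...(gatewayMetadata ?? {}), ...(callMetadata ?? {}) };
  const headers: Record<string, string> = {};
  if (Object.keys(merged).length > 0) {
    // JSON.stringify throws on bigint, so the replacer converts bigints to strings first.
    headers["cf-aig-metadata"] = JSON.stringify(merged, (_key: string, value: unknown) =>
      typeof value === "bigint" ? value.toString() : value,
    );
  }
  if (collectLog !== undefined) headers["cf-aig-collect-log"] = String(collectLog);
  return headers;
}
```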

    On the run path, the response stream is wrapped so a transient mid-stream drop reconnects through the gateway resume endpoint (resume?from=N) transparently — the @ai-sdk parser never sees the break. from is an SSE event index, so the wrapper emits only complete events and realigns on the boundary after a drop (no duplicated or truncated bytes). When the gateway buffer expires (404, ~5.5 min TTL), an onResumeExpired policy controls whether the stream errors ("error", the default) or ends with partial output ("accept-partial").
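    The event-boundary bookkeeping behind resume?from=N can be sketched as a framer that emits only complete SSE events and counts them. This is illustrative, not the package's implementation. SseEventFramerSketch and handleResumeExpired are hypothetical names. The sketch assumes a "\n\n" delimiter, although SSE also allows "\r\n\r\n". It also assumes N is the count of events already delivered, which is the index of the first missing event.

```typescript
class SseEventFramerSketch {
  private buffer = "";
  private delivered = 0; // complete events handed downstream so far

  // Append raw text and return only the complete events it finishes.
  push(chunk: string): string[] {
    this.buffer += chunk;
    const events: string[] = [];
    let end = this.buffer.indexOf("\n\n");
    while (end !== -1) {
      events.push(this.buffer.slice(0, end + 2));
      this.buffer = this.buffer.slice(end + 2);
      end = this.buffer.indexOf("\n\n");
    }
    this.delivered += events.length;
    return events;
  }

  // After a drop, discard the half-received event and report where to resume.
  resumeOffsetAfterDrop(): number {
    this.buffer = "";
    return this.delivered;
  }
}

function resumeQuery(from: number): string {
  return `resume?from=${from}`;
}

type ResumeExpiredPolicy = "error" | "accept-partial";

// Called when the resume request returns 404 (gateway buffer expired).
function handleResumeExpired(policy: ResumeExpiredPolicy = "error"): "end-stream" {
  if (policy === "accept-partial") return "end-stream"; // keep the partial output
  throw new Error("AI Gateway resume buffer expired");
}
```

    Because a partial event is dropped rather than forwarded, the downstream parser sees each event exactly once, whether or not a reconnect happened.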

    For cross-invocation recovery (e.g. a new Durable Object invocation after eviction), createResumableStream is exported. It can be called with no initial body and a fromEvent offset, in which case it re-attaches by resuming directly from that event index. An onProgress(eventOffset) callback (also surfaced on the delegate as a call option) reports the live SSE event offset, so callers can persist { runId, eventOffset } and re-attach later.
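    The following sketches how to persist the onProgress offset and read it back in a later invocation. CheckpointStoreSketch is a hypothetical synchronous key-value interface (a Map satisfies it), and makeOnProgress / loadReattachPoint are illustrative names. Passing the stored eventOffset unchanged as fromEvent is an assumption based on the description above.

```typescript
interface ResumeCheckpoint {
  runId: string;
  eventOffset: number;
}

// Hypothetical storage interface; Map<string, ResumeCheckpoint> satisfies it.
interface CheckpointStoreSketch {
  get(key: string): ResumeCheckpoint | undefined;
  set(key: string, value: ResumeCheckpoint): unknown;
}

// Build an onProgress callback that records the latest live SSE event offset.
function makeOnProgress(store: CheckpointStoreSketch, key: string, runId: string) {
  return (eventOffset: number): void => {
    store.set(key, { runId, eventOffset });
  };
}

// In a later invocation, recover what a re-attach needs: the run id and fromEvent.
function loadReattachPoint(
  store: CheckpointStoreSketch,
  key: string,
): { runId: string; fromEvent: number } | undefined {
  const checkpoint = store.get(key);
  return checkpoint ? { runId: checkpoint.runId, fromEvent: checkpoint.eventOffset } : undefined;
}
```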

    Run-path wire format (per provider): on the resumable run path (env.AI.run), Cloudflare's unified catalog normalizes most providers to the OpenAI chat-completions wire format. So google/… is parsed with the openai plugin on the run path, even though the gateway path uses the native google plugin. Anthropic, however, is passed through natively (content[].text, native tool shape), so anthropic/… is parsed with the anthropic plugin on both paths. The registry records this as runWireFormat (defaults to "openai"). Include the openai plugin for the OpenAI-wire run-path providers (openai, google, xai/grok, groq), and the anthropic plugin to use anthropic/…. If a transport needs a plugin that is missing, the delegate throws a clear GatewayDelegateError naming that plugin.
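    The per-path plugin choice reduces to a lookup on the registry entry. RegistryEntrySketch, requirePluginSketch, and the example slugs in the usage are hypothetical. Only two rules come from the notes above: runWireFormat defaults to "openai", and the gateway path uses the provider's native wire format.

```typescript
type WireFormat = "openai" | "anthropic" | "google";

interface RegistryEntrySketch {
  wireFormat: WireFormat; // native format used on the gateway path
  runWireFormat?: WireFormat; // format on the run path; defaults to "openai"
}

function requirePluginSketch(
  slug: string,
  entry: RegistryEntrySketch,
  transport: "run" | "gateway",
  installed: ReadonlySet<WireFormat>,
): WireFormat {
  const needed = transport === "run" ? entry.runWireFormat ?? "openai" : entry.wireFormat;
  if (!installed.has(needed)) {
    // Name the exact sub-path the caller has to add.
    throw new Error(`${slug} on the ${transport} path needs the ${needed} plugin (workers-ai-provider/${needed})`);
  }
  return needed;
}
```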

Patch Changes

  • #563 231c19b Thanks @slegarraga! - Validate file parts in chat messages before sending them to Workers AI.

    Previously every file part in a user message was unconditionally wrapped as
    an image_url, regardless of its mediaType. Non-image files (e.g.
    application/pdf, audio/*, video/*, application/octet-stream) were
    forwarded as if they were valid vision inputs, and a missing mediaType
    silently defaulted to image/png, producing a corrupt data URL.

    Now convertToWorkersAIChatMessages:

    • throws an UnsupportedFunctionalityError when a file part has a
      non-image/* mediaType, or no mediaType at all, instead of forwarding
      broken multimodal content;
    • matches the image/ prefix case-insensitively (per RFC 2045), so media
      types such as IMAGE/JPEG are accepted while the caller's original casing
      is preserved in the emitted data URL;
    • preserves the provided image mediaType instead of defaulting missing
      media types to image/png.

    This is a behavior change: inputs that previously "succeeded" with broken or
    defaulted media types now throw a clear, catchable error. Type-correct callers
    (the AI SDK always sets mediaType on file parts) are unaffected for valid
    image inputs.
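    Here is a minimal sketch of the validation rules. It assumes an OpenAI-style image_url content part and a file part whose data is already base64. toImageContentPart and UnsupportedFunctionalityErrorSketch are hypothetical stand-ins; per the notes, the package itself throws UnsupportedFunctionalityError.

```typescript
class UnsupportedFunctionalityErrorSketch extends Error {}

interface FilePartSketch {
  type: "file";
  mediaType?: string;
  data: string; // assumed: base64-encoded bytes
}

interface ImageUrlPartSketch {
  type: "image_url";
  image_url: { url: string };
}

function toImageContentPart(part: FilePartSketch): ImageUrlPartSketch {
  const mediaType = part.mediaType;
  // A missing mediaType is rejected instead of defaulting to image/png.
  if (!mediaType) {
    throw new UnsupportedFunctionalityErrorSketch("file part is missing a mediaType");
  }
  // Media types are case-insensitive (RFC 2045), so compare on a lowercased copy.
  if (!mediaType.toLowerCase().startsWith("image/")) {
    throw new UnsupportedFunctionalityErrorSketch(`unsupported file mediaType: ${mediaType}`);
  }
  // Keep the caller's original casing in the emitted data URL.
  return { type: "image_url", image_url: { url: `data:${mediaType};base64,${part.data}` } };
}
```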

  • #575 65e0735 Thanks @threepointone! - Map the AI SDK's forced single-tool choice to the documented named-function form.

    Previously toolChoice: { type: "tool", toolName } was downgraded to
    tool_choice: "required" (with the tool list filtered to the single function).
    Workers AI treats "required" as advisory: on long contexts and reasoning
    models (e.g. @cf/google/gemma-4-26b-a4b-it, @cf/qwen/qwq-32b,
    @cf/qwen/qwen3-30b-a3b-fp8) the model would "fail open" and answer in prose
    instead of calling the requested tool.

    Now the provider sends the OpenAI-style named-function form
    tool_choice: { type: "function", function: { name } }, which Workers AI
    enforces server-side, and keeps the full tool list (matching OpenAI semantics
    and preserving tool-result context fidelity).

    Note: forcing a tool on a reasoning model with insufficient max_tokens is
    validated server-side and now surfaces as a clear error (Workers AI 8006)
    rather than silently producing no tool call.
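    The mapping can be sketched as follows. The ToolChoiceSketch union mirrors the AI SDK's tool-choice options (auto, none, required, or a named tool). Passing the first three through unchanged is an assumption; these notes only describe the forced-tool branch. The tool list is not touched by this function, which matches the note about keeping the full list.

```typescript
type ToolChoiceSketch =
  | { type: "auto" }
  | { type: "none" }
  | { type: "required" }
  | { type: "tool"; toolName: string };

type WireToolChoice =
  | "auto"
  | "none"
  | "required"
  | { type: "function"; function: { name: string } };

function mapToolChoiceSketch(choice: ToolChoiceSketch): WireToolChoice {
  if (choice.type === "tool") {
    // Named-function form, which Workers AI enforces server-side.
    return { type: "function", function: { name: choice.toolName } };
  }
  return choice.type; // assumed pass-through for auto / none / required
}
```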

    Additionally, recover forced tool calls that gpt-oss models leak as text.
    When a tool is forced, gpt-oss (harmony format) sometimes emits the tool call
    as raw JSON in message.content with an empty tool_calls array and
    finish_reason: "stop". The provider now detects this — only when a tool was
    forced and the leaked JSON's name matches a requested tool — and
    reinterprets it as a structured tool call (with finishReason: "tool-calls"
    and a warning), across both generateText and streamText. Ambiguous leaks
    (harmony channel/role names, hallucinated names) are left untouched to avoid
    fabricating bogus calls.
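    The recovery guard can be sketched like this. recoverLeakedToolCall and its input and output shapes are hypothetical. The sketch also guesses that the leaked JSON carries its arguments under an arguments key; the notes only say that the leaked JSON's name is checked against the requested tools.

```typescript
interface LeakCheckInput {
  content: string; // message.content
  toolCallCount: number; // length of message.tool_calls
  finishReason: string;
  toolWasForced: boolean;
  requestedTools: ReadonlySet<string>;
}

interface RecoveredToolCall {
  toolName: string;
  input: string; // JSON-encoded arguments
  finishReason: "tool-calls";
  warning: string;
}

function recoverLeakedToolCall(c: LeakCheckInput): RecoveredToolCall | undefined {
  // Only the documented symptom: a forced tool, empty tool_calls, finish_reason "stop".
  if (!c.toolWasForced || c.toolCallCount > 0 || c.finishReason !== "stop") return undefined;
  let parsed: unknown;
  try {
    parsed = JSON.parse(c.content.trim());
  } catch {
    return undefined; // ordinary prose, not leaked JSON
  }
  if (typeof parsed !== "object" || parsed === null) return undefined;
  const { name, arguments: args } = parsed as { name?: unknown; arguments?: unknown };
  // Names that are not requested tools (e.g. harmony channel names) are left alone.
  if (typeof name !== "string" || !c.requestedTools.has(name)) return undefined;
  return {
    toolName: name,
    input: JSON.stringify(args ?? {}),
    finishReason: "tool-calls",
    warning: `recovered tool call "${name}" that was emitted as text`,
  };
}
```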

  • #570 104c4a7 Thanks @threepointone! - Refresh Workers AI model references from the deprecated @cf/moonshotai/kimi-k2.5 to the current @cf/moonshotai/kimi-k2.7-code in the README and inline source documentation.

  • #576 a360e7a Thanks @threepointone! - Keep structured-output name/description instead of dropping them on native Workers AI models.

    Output.object({ schema, name, description }) and generateObject({ schema, schemaName, schemaDescription }) pass a name/description alongside the JSON
    schema. On the native @cf/... path the provider previously forwarded only the
    bare schema as response_format.json_schema and silently discarded both.

    Native Workers AI expects json_schema to be a bare JSON Schema, not
    OpenAI's { name, schema, strict } envelope, so we can't just wrap it (that
    would break native models). Instead the name is folded into the schema's
    standard title keyword and the description into its description keyword —
    the payload stays a valid bare schema while the guidance reaches the model.
    Existing schema-level title/description are never overwritten and the input
    schema is not mutated.
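    Here is a minimal sketch of the folding step, treating the schema as a plain JSON object. foldSchemaGuidance is a hypothetical name. Its behavior follows the description above: the name becomes title, the description becomes description, existing keywords are never overwritten, and the input is never mutated.

```typescript
type JsonSchemaSketch = Record<string, unknown>;

function foldSchemaGuidance(
  schema: JsonSchemaSketch,
  name?: string,
  description?: string,
): JsonSchemaSketch {
  // Shallow copy so the caller's schema object is left untouched.
  const folded: JsonSchemaSketch = { ...schema };
  // Existing schema-level keywords take precedence over the call-level guidance.
  if (name !== undefined && folded.title === undefined) folded.title = name;
  if (description !== undefined && folded.description === undefined) folded.description = description;
  return folded;
}
```

    The result is still a bare JSON Schema (no { name, schema, strict } envelope), so it remains valid for native models.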

    Note on issue #559: the reported failure was OpenAI partner models (e.g.
    openai/gpt-5.4-mini) rejecting requests with "Missing required parameter:
    'response_format.json_schema.name'". Partner-model slugs are no longer handled
    by this code path at all: they route through the AI Gateway delegate and the
    real @ai-sdk/* providers, which build the required json_schema.name envelope
    themselves (configure them via createWorkersAI({ binding, providers: [openai] })).
    This change covers the remaining native-model gap where that guidance was
    being dropped.

    See #559.
