[3.8.41] — 2026-06-29
✨ New Features
- feat(relay): selectable relay backend (TS / Bifrost / auto): the OpenAI-compatible relay endpoint can now route its hot path through a native Bifrost sidecar without clients changing URLs. `OMNIROUTE_RELAY_BACKEND` / `RELAY_ROUTING_BACKEND=ts | bifrost | auto`: defaults to the existing TypeScript relay; `auto` selects Bifrost when `BIFROST_BASE_URL` is set (and `BIFROST_ENABLED` ≠ `0`) and falls back to TS automatically if the sidecar is unreachable; `bifrost` keeps strict failure behavior. Auth, per-IP/token rate limits, prompt-injection checks, and model allowlists still run in the Next relay route before dispatch (control plane stays in the app); responses carry `X-Routing-Backend` / `X-Routing-Fallback`. Regression guards: `tests/unit/api/v1/relay-routing-backend.test.ts`, `tests/unit/api/v1/bifrost-sidecar.test.ts`. (#5315, #5316, thanks @KooshaPari)
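
The backend selection described above can be sketched as a small pure function. This is only an illustration under stated assumptions: the helper names, the precedence between the two environment variable names, and the treatment of unknown values as `ts` are hypothetical and not taken from the actual relay code.

```typescript
type RelayBackend = "ts" | "bifrost" | "auto";
type Env = Record<string, string | undefined>;

interface RoutingDecision {
  backend: "ts" | "bifrost"; // would be reported via X-Routing-Backend
  fallback: boolean;         // would be reported via X-Routing-Fallback
}

// Hypothetical helper: read the configured mode, defaulting to "ts".
// Assumption: OMNIROUTE_RELAY_BACKEND wins over RELAY_ROUTING_BACKEND.
function configuredBackend(env: Env): RelayBackend {
  const raw = (env["OMNIROUTE_RELAY_BACKEND"] ?? env["RELAY_ROUTING_BACKEND"] ?? "ts")
    .trim()
    .toLowerCase();
  return raw === "bifrost" || raw === "auto" ? raw : "ts";
}

// Hypothetical helper: pick the backend for one request.
function resolveBackend(env: Env, sidecarReachable: boolean): RoutingDecision {
  const mode = configuredBackend(env);
  if (mode === "ts") return { backend: "ts", fallback: false };
  // Strict mode: never fall back, so sidecar failures surface to the caller.
  if (mode === "bifrost") return { backend: "bifrost", fallback: false };
  // Auto mode: Bifrost only when a base URL is set and it is not disabled.
  const enabled = Boolean(env["BIFROST_BASE_URL"]) && env["BIFROST_ENABLED"] !== "0";
  if (!enabled) return { backend: "ts", fallback: false };
  return sidecarReachable
    ? { backend: "bifrost", fallback: false }
    : { backend: "ts", fallback: true };
}
```

In this sketch, `auto` with an unreachable sidecar yields `{ backend: "ts", fallback: true }`, while `bifrost` never falls back.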
🔧 Bug Fixes
- translator (claude): synthesize a minimal
`user` turn when an OpenAI→Claude request carries only `system`/`developer` messages, so the request stops failing with `[400]: messages: at least one message is required`. `openaiToClaudeRequest` hoists every system/developer turn into Claude's top-level `system` field and filters them out of `messages`; an all-system input (OpenCode compaction / title-generation requests) left `messages: []`, which the Messages API rejects, surfacing in OpenCode as a mid-task `stream error` that drops the conversation. The guard fires only when `messages` would otherwise be empty (system instructions still drive the response), so non-empty requests are unaffected. (#5342, thanks @wild-feather)
- providers (gemini): drop retired Google AI Studio model ids and align the catalog to what the live GenAI API actually serves (verified 2026-06-29 against the official deprecations page). Removes long-retired
`gemini-1.5-pro`/`gemini-1.5-flash`, the shut-down `gemini-2.0-flash`/`gemini-2.0-flash-lite`, and dead experimentals; renames `gemini-3.1-flash-lite-preview` → the GA `gemini-3.1-flash-lite`; swaps the retired `text-embedding-004` for the live `gemini-embedding-001`/`gemini-embedding-2`; and adds graceful `modelDeprecation` forwards so legacy/renamed ids redirect to the GA model instead of 404ing. Native AI-Studio-direct image/video/music registration is intentionally out of scope (needs real executor work; those models stay reachable via Antigravity/Vertex/aggregators). (#5337, thanks @backryun)
- services (dashboard): fix the embedded-services dashboard failures (#5298): service supervisors are now lazily initialized from
`/api/services/[name]/logs` so `cliproxy`/`9router` logs no longer 404 before bootstrap registers a supervisor; lifecycle buttons send JSON (empty install bodies default to `version: "latest"`, malformed JSON still returns `400 Invalid JSON body`); lifecycle and log-stream failures surface as actionable UI errors instead of silently showing no logs; Tailscale CGNAT `100.64.0.0/10` peers count as private-LAN local for local-only service access; a parent `/dashboard/context` → `/dashboard/context/settings` redirect stops RSC prefetch 404s; and `/api/v1/providers/{cliproxyapi,9router}/models` return synced embedded-service models instead of `invalid_provider`. (#5299, #5298, thanks @KooshaPari)
- thinking (claude): fix three independent defects in Claude adaptive-thinking on the OpenAI-compatible path (Cursor → Claude OAuth). (A) the dashboard Thinking-Budget setting was dropped on every restart:
`setThinkingBudgetConfig` was never called at boot, so a saved `{mode:"adaptive"…}` silently reverted to passthrough; it's now hydrated from settings in `server-init`. (B) the Claude executor force-injected adaptive thinking after translation, ignoring the operator's budget; it now honors `mode:"auto"` (strip) while keeping the default (passthrough) behavior byte-identical so native Claude Code is unaffected, and remaps an operator `thinking.type:"enabled"` to the `adaptive` shape Opus 4.7/4.8 require (`enabled` → 400). (D) on replay, signature-less `reasoning_content` was reconstructed as a `thinking` block carrying a fabricated signature → Anthropic `400 "Invalid signature in thinking block"`; it now emits a signature-less `redacted_thinking` block (real signatures are still preserved verbatim). Regression guards: `tests/unit/thinking-budget-hydration-5312.test.ts`, `base-thinking-budget-config-5312.test.ts`, `openai-to-claude-redacted-replay-5312.test.ts` (existing #5123/#4479/#2454 suites stay green). The `</think>` content-marker channel mismatch (RC-C, shared with #5245) is tracked as a follow-up pending a live Anthropic validation. (#5312, thanks @vitalNohj)
- opencode (proxy pool): the OpenCode Free per-account proxy modal now offers the global Proxy Pool dropdown (by-id reference) instead of forcing manual Host/Port/credentials on every account (Gap 1 of #5217). A Saved / Custom toggle: "Saved" picks a pre-saved proxy from
`GET /api/settings/proxies` and stores `{fingerprint, proxyId}`, so updating that pool proxy applies to every account using it; "Custom" keeps the manual inputs (stored inline) as an escape hatch. Resolution happens server-side (`resolveAccountProxiesFromRegistry`) so the executor still receives a resolved proxy unchanged; existing inline entries keep working and an unknown/deleted `proxyId` degrades safely to direct. Regression guards: `tests/unit/noauth-proxy-resolution.test.ts`, `tests/unit/ui/noauth-account-card.test.tsx`. (#5217 Gap 1, thanks @daniij)
- thinking (claude): let reasoning_content-native clients (e.g. Cursor) opt out of the `</think>` close-marker so it no longer leaks an orphan `</think>` into visible `content` (RC-C of #5312, shared with #5245). The marker-suppression machinery already existed (UA allowlist, #5348) but Cursor's UA was deliberately excluded; this adds an explicit request header `x-omniroute-thinking-marker: off` (also `on`/`keep` to force-keep) that overrides the UA policy. With the header absent the behavior is byte-identical: Claude Code/Cursor-composer clients that scan `content` for the marker (#4633) still receive it. Regression guard: `tests/unit/think-close-marker-suppress-5245.test.ts` (#5123 case-b + #4479 stay green). (#5312, #5245, thanks @vitalNohj, @wild-feather)
- cors: browser/Electron clients (e.g. Wayland AI) can now use OmniRoute as an OpenAI-compatible provider out-of-the-box. The token-authenticated API surface (`/v1/*`, `/v1beta/*`) now returns a permissive `Access-Control-Allow-Origin` (echoes the request `Origin`, `*` when absent) by default, matching 9router and the OpenAI-compatible ecosystem, so a renderer `fetch` can read the response instead of failing CORS-blocked as "site not found" / empty catalog (while `curl`, which does not enforce CORS, worked). This is safe: those routes auth via `Authorization`/`x-api-key` headers browsers never auto-attach (no credentialed-session/CSRF exposure), and `Access-Control-Allow-Credentials` is never paired with the echo/wildcard. Cookie-authed MANAGEMENT/dashboard routes stay exactly fail-closed; `CORS_ALLOW_ALL`/`CORS_ALLOWED_ORIGINS` still take precedence. Regression guards: `tests/unit/cors/origins.test.ts`, `tests/unit/authz/pipeline.test.ts`. (Bug 2 of #5242, thanks @jonlwheat2-gif)
- grok-web: forward the Cloudflare clearance cookies and stop mislabeling IP-reputation blocks as a bad cookie. "Check cookie" returned
`Invalid SSO cookie` even with a valid, complete browser session, but the cookie parser was never the problem (it robustly extracts `sso`/`sso-rw` from a full DevTools header). Two real gaps fixed: (1) `buildGrokCookieHeader` now forwards `cf_clearance` and `__cf_bm` when pasted (it dropped them before; AIClient2API forwards them too), strictly additive: a bare `sso` blob still yields exactly `sso=…`; (2) when the user supplied a `cf_clearance`, a 401 / invalid-credentials-403 from grok.com is now surfaced as an IP-reputation/anti-bot block (`cf_clearance` is IP+TLS+UA-pinned and can't be replayed from a different machine) instead of the misleading "Invalid SSO cookie — re-paste". A bare cookie with no clearance still gets the re-paste hint. Regression guards in `web-cookie-auth.test.ts` + `provider-validation-specialty.test.ts`. (#5350, thanks @SeaXen)
- cli (serve): opt-in native HTTPS/TLS for
`omniroute serve`, so strict-CSP Electron apps and browsers can reach OmniRoute over `https://` instead of plain `http://localhost`. Provide `--tls-cert <path> --tls-key <path>` (or `OMNIROUTE_TLS_CERT`/`OMNIROUTE_TLS_KEY`) and the standalone server terminates TLS on the same listener (no extra port/proxy); WebSocket upgrade (live dashboard + `/v1` streaming) works over `wss://` unchanged since `https.Server extends http.Server`. With no TLS flags the HTTP path is byte-identical to before; supplying only one of cert/key, or an unreadable path, logs a warning and stays on HTTP (never half-enables, never crashes). Auto-generated self-signed certs for localhost are a follow-up; for now provide an explicit cert/key (or front OmniRoute with a TLS terminator). Regression guard: `tests/unit/tls-options.test.ts`. (Bug 1C of #5242, thanks @jonlwheat2-gif)
- opencode/observability: make OpenCode Free account/proxy rotation visible and fix two real defects surfaced alongside it. (1) the per-request rotation selection log (`dispatch via account … through proxy …`) was `debug` (hidden at default `APP_LOG_LEVEL=info`); it is promoted to `info` so the shuffle/cooldown lifecycle is auditable (token stays masked). (2) `[ProxyEgress]` reported `proxy=direct` even when an account proxy was applied, because the egress logger ran outside the executor's nested proxy context; the effective applied proxy is now captured (via an applied-proxy sink threaded through the proxy AsyncLocalStorage) and reflected in the egress log. (3) `[callLogs] too many SQL variables`: `deleteCallLogRowsByIds` deleted up to 5000 ids in one `IN (…)`, exceeding SQLite's ~999 bound-param cap and aborting log trimming/retention; ids are now chunked (≤500 per statement). Regression guards: `tests/unit/call-log-trim-sql-vars-5217.test.ts`, `apply-executor-proxy-info-5217.test.ts`, extended `opencode-proxy-rotation-4954.test.ts`. The Proxy Pool dropdown (by-id) UI (Gap 1) is a follow-up requiring browser validation. (#5217, thanks @daniij)
- chatgpt-web: wire tool/function calling into the
`chatgpt-web` provider. It was the only web-session executor that never read `body.tools`: both response builders hardcoded `finish_reason:"stop"` and emitted only content, so tool calls were silently dropped (the model answered in prose). It now uses the shared `webTools` prompt-emulation shim (a `<tool>`-contract system message + `<tool>{…}</tool>` response parsing) exactly like its 9 sibling executors (qwen-web, perplexity-web, …); it was simply omitted from the #3259 rollout. Tool mode buffers and emits `tool_calls` + `finish_reason:"tool_calls"` (gated off the image-gen path); plain chat is unchanged. Regression guard: `tests/unit/chatgpt-web-tools-5240.test.ts`. (#5240, thanks @Rougler)
- oauth/dashboard: fix the persistent/false Antigravity "Token Expired" badge (continuation of #3679/#3850). Two causes: (1) new OAuth connections never set
`tokenExpiresAt` (only `expiresAt`), so the dashboard badge, which prefers `tokenExpiresAt || expiresAt`, fell back to the original grant clock and could flash a false "Token Expired" until the first background refresh. Creation now mirrors `expiresAt` into `tokenExpiresAt` across all 5 OAuth create paths (a shared `buildOAuthConnectionCreatePayload`), consistent with every refresh path, which already writes both. (2) when a refresh-capable connection has no usable refresh token, the health-check sweep silently skipped it, leaving `testStatus="active"` forever while the cosmetic badge showed expired; it now surfaces a terminal `testStatus="expired"` ("needs re-auth"), tightly gated so it never clobbers non-refresh providers or already-terminal/cooldown states. Regression guards: `tests/unit/oauth-connection-tokenexpiresat-5326.test.ts`, `tests/unit/token-health-no-refresh-token-expired-5326.test.ts`. (#5326)
- routing: auto-disable a depleted API key on upstream
`402 "Insufficient account balance"` for API Key Round-Robin connections (multiple keys in one connection's `extraApiKeys`). The per-connection path already terminalized 402 (→ `credits_exhausted`), but the per-KEY health tracker (`recordKeyHealthStatus`) only recorded failures for `401`, so a 402-depleted key stayed in rotation and kept getting retried. Now a 402 marks the current key invalid immediately (terminal, since the balance won't recover mid-session) via a new `recordKeyTerminal`, so the rotator skips it and fails over to the next healthy key; the state persists across restarts. Also added `insufficient balance` / `insufficient_balance` / `insufficient account balance` to the credits-exhausted body signals so non-402 out-of-credit responses terminalize too. Regression guard: `tests/unit/key-health-402-disable-5239.test.ts`. (#5239, thanks @muflifadla38)
- cli:
`omniroute serve` no longer discards a user-set `NODE_OPTIONS=--max-old-space-size=…`. It used to unconditionally overwrite `NODE_OPTIONS` (and pass an explicit `--max-old-space-size` CLI arg) with the calibrated default, so a user who exported `--max-old-space-size=8192` still ran at the old cap and OOM'd (#5238 reporter set 8192, crashed at ~505MB). Now it mirrors the Electron and standalone launchers: if `NODE_OPTIONS` already pins the heap, that value wins (and the duplicate CLI arg is suppressed); otherwise the calibrated `--max-old-space-size` is appended, preserving unrelated flags. Regression guard: `tests/unit/serve-node-options-preserve-5238.test.ts`. (Defect C of #5238; the `b.mask`/OOM-root parts are tracked separately.)
- dashboard: restore the
`{active}/{total} active` model-count badge in a provider's Available Models toolbar (provider detail page). It was dropped during the v3.8.13 god-file decomposition (#3327): the `ModelVisibilityToolbar` still received `activeCount`/`totalCount` but they were orphaned as unused `_`-prefixed params and the rendering `<span>` was never carried over (the `modelsActiveCount` i18n key stayed). Re-wired the existing props to the existing key; zero data-layer or i18n change. Regression guard: `modelVisibilityToolbarActiveCount.test.tsx`. (#5264)
- rerank:
`/v1/rerank` no longer rejects SiliconFlow and DeepInfra Qwen3-Reranker models with `400 "Invalid rerank model"` even though `/v1/models` lists them. The model-ID parser was never the problem (it already splits on the first slash, so `siliconflow/Qwen/Qwen3-Reranker-8B` parses correctly); `siliconflow` and `deepinfra` were just missing from the rerank provider registry. Added both: SiliconFlow as Cohere-compatible, DeepInfra via a new `deepinfra` adapter (model in the URL path `POST /v1/inference/<model>`, `{queries,documents}` request, positional `{scores}` response mapped to Cohere `results[]`). Regression guard: `tests/unit/rerank-providers-5332.test.ts`. (#5332, thanks @maikokan)
- authz/dashboard: stop rejecting every dashboard mutation with
`403 INVALID_ORIGIN` when the dashboard is reached over a LAN IP / non-localhost host. The origin-pinning check (#5278) only accepted the configured `*_PUBLIC_BASE_URL` (typically `http://localhost:20128`) plus the internal `request.url` origin, which Next.js standalone reports as the bind host, not the real `Host`. So opening the dashboard at e.g. `http://192.168.0.15:20128` made the browser's same-origin `Origin` match no candidate, and every POST/PUT/DELETE (save API key, save provider, test connection) failed while GETs still worked. Two fixes: (a) the request `Host` (or a trusted `X-Forwarded-Host`) is now accepted as a valid mutation origin, gated by two independent checks: the token-stamped socket peer must be loopback/private-LAN and the Host itself must be a loopback/private-LAN IP literal, so a DNS-rebinding domain (which classifies as `remote`) can never become a trusted origin and the protocol is pinned to the actual connection; (b) the `INVALID_ORIGIN` response now carries an actionable message (set `OMNIROUTE_PUBLIC_BASE_URL`) and the dashboard surfaces API `error.message` via a shared `extractApiErrorMessage` helper instead of rendering the raw error object. Regression guards: `tests/unit/authz/public-origin.test.ts` (direct LAN/loopback + DNS-rebinding defense), `tests/unit/api-error-message-5340.test.ts`. (#5340)
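
The call-log trimming fix in the opencode/observability entry above (ids chunked to at most 500 per statement) can be illustrated with a minimal sketch. The `call_logs` table name, the `id` column, and both helper names are assumptions for illustration only; the real `deleteCallLogRowsByIds` may be structured differently.

```typescript
interface Statement {
  sql: string;
  params: string[];
}

// Hypothetical helper: split a list into batches of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical helper: build one DELETE per batch so that no statement
// carries more bound parameters than `maxPerStatement`.
// Assumption: the table is named call_logs and keyed by id.
function buildDeleteStatements(ids: readonly string[], maxPerStatement = 500): Statement[] {
  return chunk(ids, maxPerStatement).map((batch) => ({
    sql: `DELETE FROM call_logs WHERE id IN (${batch.map(() => "?").join(", ")})`,
    params: batch,
  }));
}
```

With 1200 ids this produces three statements carrying 500, 500, and 200 parameters, each below the bound-parameter cap mentioned above.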
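
The translator (claude) guard described earlier in this list can be sketched in a similar way. This is a simplified illustration, not the actual `openaiToClaudeRequest`: the function name, the reduced message types, and the placeholder text of the synthesized turn are all assumptions.

```typescript
interface OpenAIMessage {
  role: "system" | "developer" | "user" | "assistant";
  content: string;
}

interface ClaudeMessage {
  role: "user" | "assistant";
  content: string;
}

interface ClaudeRequest {
  system?: string;
  messages: ClaudeMessage[];
}

// Hypothetical, simplified translator: hoist system/developer turns into the
// top-level `system` field and keep the remaining turns as `messages`.
function toClaudeRequestSketch(input: readonly OpenAIMessage[]): ClaudeRequest {
  const systemParts: string[] = [];
  const messages: ClaudeMessage[] = [];
  for (const m of input) {
    if (m.role === "system" || m.role === "developer") {
      systemParts.push(m.content);
    } else {
      messages.push({ role: m.role, content: m.content });
    }
  }
  // Guard: an all-system input would leave `messages` empty, which the
  // Messages API rejects, so synthesize a minimal user turn.
  // Assumption: the placeholder text below is illustrative only.
  if (messages.length === 0) {
    messages.push({ role: "user", content: "Continue." });
  }
  return systemParts.length > 0
    ? { system: systemParts.join("\n\n"), messages }
    : { messages };
}
```

The guard only triggers for an otherwise empty `messages` array, so requests that already contain a user or assistant turn pass through unchanged.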
📝 Maintenance
- chore(dead-code): repo-wide sweep of unused exported symbols and a matching dead-code baseline ratchet — trimmed unused exported helpers, validation/settings/encryption-config schemas, utility/domain/static-constant/formatting helpers, runtime test helpers, the request-timeout fetch wrapper, event-bus, semantic-cache (maintenance + expiry), correlation-middleware, MCP-scope, service-registry, build-profile, api-key-format, authz-class, models.dev-context, embedding-cache, provider-limits-scheduler, search-validator, webhook-example, agent-skills-repo-URL and command-code-auth-cleanup exports. Pure dead-code removal validated by
`typecheck:core` (no remaining referencing site), with no behavior change. (#5321, #5322, #5324, #5325, #5328, #5329, #5330, #5331, #5333, #5334, #5335, #5336, #5338, #5339, #5353, #5354, #5355, #5356, #5357, #5359, #5362, #5364, #5365, #5366, #5368, #5369, #5371, thanks @JxnLexn)
What's Changed
- Release v3.8.41 by @diegosouzapw in #5327
Full Changelog: v3.8.40...v3.8.41