diegosouzapw/OmniRoute v3.8.41 on GitHub

[3.8.41] — 2026-06-29

✨ New Features

feat(relay): selectable relay backend (TS / Bifrost / auto) — the OpenAI-compatible relay endpoint can now route its hot path through a native Bifrost sidecar without clients changing URLs. OMNIROUTE_RELAY_BACKEND / RELAY_ROUTING_BACKEND = ts | bifrost | auto: defaults to the existing TypeScript relay; auto selects Bifrost when BIFROST_BASE_URL is set (and BIFROST_ENABLED ≠ 0) and falls back to TS automatically if the sidecar is unreachable; bifrost keeps strict failure behavior. Auth, per-IP/token rate limits, prompt-injection checks, and model allowlists still run in the Next relay route before dispatch (control plane stays in the app); responses carry X-Routing-Backend / X-Routing-Fallback. Regression guards: tests/unit/api/v1/relay-routing-backend.test.ts, tests/unit/api/v1/bifrost-sidecar.test.ts. (#5315, #5316 — thanks @KooshaPari)
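The selection rules above can be sketched as a small pure function. This is an illustrative sketch only, not the project's code: `resolveRelayBackend`, `RelayDecision`, the precedence between the two variable names, and treating unknown values as `ts` are all assumptions.

```typescript
type Env = Record<string, string | undefined>;

interface RelayDecision {
  backend: "ts" | "bifrost";
  fellBack: boolean; // true only when auto mode wanted Bifrost but used TS
}

// Hypothetical helper: `sidecarReachable` stands in for whatever health
// probe the real relay performs against the Bifrost sidecar.
function resolveRelayBackend(env: Env, sidecarReachable: boolean): RelayDecision {
  // Assumption: OMNIROUTE_RELAY_BACKEND wins over RELAY_ROUTING_BACKEND.
  const raw = (env["OMNIROUTE_RELAY_BACKEND"] ?? env["RELAY_ROUTING_BACKEND"] ?? "ts")
    .trim()
    .toLowerCase();

  if (raw === "bifrost") {
    // Strict mode: no silent fallback to the TypeScript relay.
    if (!sidecarReachable) throw new Error("Bifrost sidecar is unreachable");
    return { backend: "bifrost", fellBack: false };
  }

  if (raw === "auto") {
    const baseUrl = env["BIFROST_BASE_URL"] ?? "";
    const configured = baseUrl.length > 0 && env["BIFROST_ENABLED"] !== "0";
    if (!configured) return { backend: "ts", fellBack: false };
    return sidecarReachable
      ? { backend: "bifrost", fellBack: false }
      : { backend: "ts", fellBack: true };
  }

  // Default (and, by assumption, any unrecognized value): TypeScript relay.
  return { backend: "ts", fellBack: false };
}
```

Keeping the decision pure makes the auto-mode fallback easy to test without a running sidecar.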

🔧 Bug Fixes

translator (claude): synthesize a minimal user turn when an OpenAI→Claude request carries only system/developer messages, so the request stops failing with [400]: messages: at least one message is required. openaiToClaudeRequest hoists every system/developer turn into Claude's top-level system field and filters them out of messages; an all-system input (OpenCode compaction / title-generation requests) left messages: [], which the Messages API rejects — surfacing in OpenCode as a mid-task stream error that drops the conversation. The guard fires only when messages would otherwise be empty (system instructions still drive the response), so non-empty requests are unaffected. (#5342 — thanks @wild-feather)
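A minimal sketch of the empty-messages guard described above, assuming string-only message content. The type names, `toClaudeRequestSketch`, the `"\n\n"` join, and the placeholder text are hypothetical; the real `openaiToClaudeRequest` handles far more cases.

```typescript
type OpenAIRole = "system" | "developer" | "user" | "assistant";

interface OpenAIMessage {
  role: OpenAIRole;
  content: string;
}

interface ClaudeRequestSketch {
  system?: string;
  messages: { role: "user" | "assistant"; content: string }[];
}

// Hypothetical placeholder; the text the real translator synthesizes may differ.
const PLACEHOLDER_USER_TEXT = "Continue.";

function toClaudeRequestSketch(input: OpenAIMessage[]): ClaudeRequestSketch {
  const systemParts: string[] = [];
  const messages: ClaudeRequestSketch["messages"] = [];

  for (const m of input) {
    if (m.role === "system" || m.role === "developer") {
      // Hoisted into the top-level system field, never sent as a message.
      systemParts.push(m.content);
    } else {
      messages.push({ role: m.role, content: m.content });
    }
  }

  // Guard: fires only when hoisting left the message list empty.
  if (messages.length === 0) {
    messages.push({ role: "user", content: PLACEHOLDER_USER_TEXT });
  }

  const request: ClaudeRequestSketch = { messages };
  if (systemParts.length > 0) request.system = systemParts.join("\n\n");
  return request;
}
```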
providers (gemini): drop retired Google AI Studio model ids and align the catalog to what the live GenAI API actually serves (verified 2026-06-29 against the official deprecations page). Removes long-retired gemini-1.5-pro/gemini-1.5-flash, the shut-down gemini-2.0-flash/gemini-2.0-flash-lite, and dead experimentals; renames gemini-3.1-flash-lite-preview → the GA gemini-3.1-flash-lite; swaps the retired text-embedding-004 for the live gemini-embedding-001/gemini-embedding-2; and adds graceful modelDeprecation forwards so legacy/renamed ids redirect to the GA model instead of 404ing. Native AI-Studio-direct image/video/music registration is intentionally out of scope (needs real executor work; those models stay reachable via Antigravity/Vertex/aggregators). (#5337 — thanks @backryun)
services (dashboard): fix the embedded-services dashboard failures (#5298) — service supervisors are now lazily initialized from /api/services/[name]/logs so cliproxy/9router logs no longer 404 before bootstrap registers a supervisor; lifecycle buttons send JSON (empty install bodies default to version: "latest", malformed JSON still returns 400 Invalid JSON body); lifecycle and log-stream failures surface as actionable UI errors instead of silently showing no logs; Tailscale CGNAT 100.64.0.0/10 peers count as private-LAN local for local-only service access; a parent /dashboard/context → /dashboard/context/settings redirect stops RSC prefetch 404s; and /api/v1/providers/{cliproxyapi,9router}/models return synced embedded-service models instead of invalid_provider. (#5299, #5298 — thanks @KooshaPari)
thinking (claude): fix three independent defects in Claude adaptive-thinking on the OpenAI-compatible path (Cursor → Claude OAuth). (A) the dashboard Thinking-Budget setting was dropped on every restart — setThinkingBudgetConfig was never called at boot, so a saved {mode:"adaptive"…} silently reverted to passthrough; it's now hydrated from settings in server-init. (B) the Claude executor force-injected adaptive thinking after translation, ignoring the operator's budget — it now honors mode:"auto" (strip) while keeping the default (passthrough) behavior byte-identical so native Claude Code is unaffected, and remaps an operator thinking.type:"enabled" to the adaptive shape Opus 4.7/4.8 require (enabled → 400). (D) on replay, signature-less reasoning_content was reconstructed as a thinking block carrying a fabricated signature → Anthropic 400 "Invalid signature in thinking block"; it now emits a signature-less redacted_thinking block (real signatures are still preserved verbatim). Regression guards: tests/unit/thinking-budget-hydration-5312.test.ts, base-thinking-budget-config-5312.test.ts, openai-to-claude-redacted-replay-5312.test.ts (existing #5123/#4479/#2454 suites stay green). The </think> content-marker channel mismatch (RC-C, shared with #5245) is tracked as a follow-up pending a live Anthropic validation. (#5312 — thanks @vitalNohj)
opencode (proxy pool): the OpenCode Free per-account proxy modal now offers the global Proxy Pool dropdown (by-id reference) instead of forcing manual Host/Port/credentials on every account — Gap 1 of #5217. A Saved / Custom toggle: "Saved" picks a pre-saved proxy from GET /api/settings/proxies and stores {fingerprint, proxyId}, so updating that pool proxy applies to every account using it; "Custom" keeps the manual inputs (stored inline) as an escape hatch. Resolution happens server-side (resolveAccountProxiesFromRegistry) so the executor still receives a resolved proxy unchanged; existing inline entries keep working and an unknown/deleted proxyId degrades safely to direct. Regression guards: tests/unit/noauth-proxy-resolution.test.ts, tests/unit/ui/noauth-account-card.test.tsx. (#5217 Gap 1 — thanks @daniij)
thinking (claude): let reasoning_content-native clients (e.g. Cursor) opt out of the </think> close-marker so it no longer leaks an orphan </think> into visible content (RC-C of #5312, shared with #5245). The marker-suppression machinery already existed (UA allowlist, #5348) but Cursor's UA was deliberately excluded; this adds an explicit request header x-omniroute-thinking-marker: off (also on/keep to force-keep) that overrides the UA policy. With the header absent the behavior is byte-identical — Claude Code/Cursor-composer clients that scan content for the marker (#4633) still receive it. Regression guard: tests/unit/think-close-marker-suppress-5245.test.ts (#5123 case-b + #4479 stay green). (#5312, #5245 — thanks @vitalNohj, @wild-feather)
cors: browser/Electron clients (e.g. Wayland AI) can now use OmniRoute as an OpenAI-compatible provider out-of-the-box. The token-authenticated API surface (/v1/*, /v1beta/*) now returns a permissive Access-Control-Allow-Origin (echoes the request Origin, * when absent) by default — matching 9router and the OpenAI-compatible ecosystem — so a renderer fetch can read the response instead of failing CORS-blocked as "site not found" / empty catalog (while curl, which sends no preflight, worked). This is safe: those routes auth via Authorization/x-api-key headers browsers never auto-attach (no credentialed-session/CSRF exposure), and Access-Control-Allow-Credentials is never paired with the echo/wildcard. Cookie-authed MANAGEMENT/dashboard routes stay exactly fail-closed; CORS_ALLOW_ALL/CORS_ALLOWED_ORIGINS still take precedence. Regression guards: tests/unit/cors/origins.test.ts, tests/unit/authz/pipeline.test.ts. (Bug 2 of #5242 — thanks @jonlwheat2-gif)
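The default header policy for the token-authenticated surface might look roughly like the sketch below. `tokenApiCorsHeaders` is a hypothetical name, the `Vary: Origin` header is an added assumption (common practice when echoing an origin), and the `CORS_ALLOW_ALL` / `CORS_ALLOWED_ORIGINS` overrides are deliberately not modeled.

```typescript
type HeaderMap = Record<string, string>;

// Returns CORS headers for /v1/* and /v1beta/*, or null for any other path
// (management/dashboard routes are outside this sketch and stay fail-closed).
function tokenApiCorsHeaders(path: string, requestOrigin: string | undefined): HeaderMap | null {
  const isTokenApi = path.startsWith("/v1/") || path.startsWith("/v1beta/");
  if (!isTokenApi) return null;

  if (requestOrigin !== undefined && requestOrigin.length > 0) {
    // Echo the caller's origin. Access-Control-Allow-Credentials is never set.
    return {
      "Access-Control-Allow-Origin": requestOrigin,
      Vary: "Origin", // assumption: keeps caches from mixing per-origin responses
    };
  }

  // No Origin header on the request: answer with the wildcard.
  return { "Access-Control-Allow-Origin": "*" };
}
```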
grok-web: forward the Cloudflare clearance cookies and stop mislabeling IP-reputation blocks as a bad cookie. "Check cookie" returned Invalid SSO cookie even with a valid, complete browser session — but the cookie parser was never the problem (it robustly extracts sso/sso-rw from a full DevTools header). Two real gaps fixed: (1) buildGrokCookieHeader now forwards cf_clearance and __cf_bm when pasted (it dropped them before; AIClient2API forwards them too) — strictly additive, a bare sso blob still yields exactly sso=…; (2) when the user supplied a cf_clearance, a 401 / invalid-credentials-403 from grok.com is now surfaced as an IP-reputation/anti-bot block (cf_clearance is IP+TLS+UA-pinned and can't be replayed from a different machine) instead of the misleading "Invalid SSO cookie — re-paste". A bare cookie with no clearance still gets the re-paste hint. Regression guards in web-cookie-auth.test.ts + provider-validation-specialty.test.ts. (#5350 — thanks @SeaXen)
cli (serve): opt-in native HTTPS/TLS for omniroute serve — so strict-CSP Electron apps and browsers can reach OmniRoute over https:// instead of plain http://localhost. Provide --tls-cert <path> --tls-key <path> (or OMNIROUTE_TLS_CERT/OMNIROUTE_TLS_KEY) and the standalone server terminates TLS on the same listener (no extra port/proxy); WebSocket upgrade (live dashboard + /v1 streaming) works over wss:// unchanged since https.Server handles upgrade requests the same way http.Server does. With no TLS flags the HTTP path is byte-identical to before; supplying only one of cert/key, or an unreadable path, logs a warning and stays on HTTP (never half-enables, never crashes). Auto-generated self-signed certs for localhost are a follow-up; for now provide an explicit cert/key (or front OmniRoute with a TLS terminator). Regression guard: tests/unit/tls-options.test.ts. (Bug 1C of #5242 — thanks @jonlwheat2-gif)
opencode/observability: make OpenCode Free account/proxy rotation visible and fix two real defects surfaced alongside it. (1) the per-request rotation selection log (dispatch via account … through proxy …) was debug (hidden at default APP_LOG_LEVEL=info) — promoted to info so the shuffle/cooldown lifecycle is auditable (token stays masked). (2) [ProxyEgress] reported proxy=direct even when an account proxy was applied, because the egress logger ran outside the executor's nested proxy context — the effective applied proxy is now captured (via an applied-proxy sink threaded through the proxy AsyncLocalStorage) and reflected in the egress log. (3) [callLogs] too many SQL variables — deleteCallLogRowsByIds deleted up to 5000 ids in one IN (…), exceeding SQLite's ~999 bound-param cap and aborting log trimming/retention; ids are now chunked (≤500 per statement). Regression guards: tests/unit/call-log-trim-sql-vars-5217.test.ts, apply-executor-proxy-info-5217.test.ts, extended opencode-proxy-rotation-4954.test.ts. The Proxy Pool dropdown (by-id) UI (Gap 1) is a follow-up requiring browser validation. (#5217 — thanks @daniij)
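The chunking fix in (3) can be illustrated as follows. This is a sketch under assumptions: the `call_logs` table name, the `id` column, and both helper names are hypothetical, and the statements are only built here, not executed against SQLite.

```typescript
const MAX_IDS_PER_STATEMENT = 500; // stays well under SQLite's bound-parameter cap

interface DeleteStatement {
  sql: string;
  params: string[];
}

// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// One DELETE per batch, each with its own run of "?" placeholders.
function buildDeleteStatements(
  ids: readonly string[],
  size: number = MAX_IDS_PER_STATEMENT,
): DeleteStatement[] {
  return chunk(ids, size).map((batch) => ({
    sql: `DELETE FROM call_logs WHERE id IN (${batch.map(() => "?").join(", ")})`,
    params: batch,
  }));
}
```

With 5000 ids this yields ten statements of 500 parameters each instead of a single statement with 5000.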
chatgpt-web: wire tool/function calling into the chatgpt-web provider. It was the only web-session executor that never read body.tools — both response builders hardcoded finish_reason:"stop" and emitted only content, so tool calls were silently dropped (the model answered in prose). It now uses the shared webTools prompt-emulation shim (a <tool>-contract system message + <tool>{…}</tool> response parsing) exactly like its 9 sibling executors (qwen-web, perplexity-web, …) — it was simply omitted from the #3259 rollout. Tool mode buffers and emits tool_calls + finish_reason:"tool_calls" (gated off the image-gen path); plain chat is unchanged. Regression guard: tests/unit/chatgpt-web-tools-5240.test.ts. (#5240 — thanks @Rougler)
oauth/dashboard: fix the persistent/false Antigravity "Token Expired" badge (continuation of #3679/#3850). Two causes: (1) new OAuth connections never set tokenExpiresAt (only expiresAt), so the dashboard badge — which prefers tokenExpiresAt || expiresAt — fell back to the original grant clock and could flash a false "Token Expired" until the first background refresh. Creation now mirrors expiresAt into tokenExpiresAt across all 5 OAuth create paths (a shared buildOAuthConnectionCreatePayload), consistent with every refresh path which already writes both. (2) when a refresh-capable connection has no usable refresh token, the health-check sweep silently skipped it, leaving testStatus="active" forever while the cosmetic badge showed expired; it now surfaces a terminal testStatus="expired" ("needs re-auth"), tightly gated so it never clobbers non-refresh providers or already-terminal/cooldown states. Regression guards: tests/unit/oauth-connection-tokenexpiresat-5326.test.ts, tests/unit/token-health-no-refresh-token-expired-5326.test.ts. (#5326)
routing: auto-disable a depleted API key on upstream 402 "Insufficient account balance" for API Key Round-Robin connections (multiple keys in one connection's extraApiKeys). The per-connection path already terminalized 402 (→ credits_exhausted), but the per-KEY health tracker (recordKeyHealthStatus) only recorded failures for 401, so a 402-depleted key stayed in rotation and kept getting retried. Now a 402 marks the current key invalid immediately (terminal — balance won't recover mid-session) via a new recordKeyTerminal, so the rotator skips it and falls over to the next healthy key; the state persists across restarts. Also added insufficient balance/insufficient_balance/insufficient account balance to the credits-exhausted body signals so non-402 out-of-credit responses terminalize too. Regression guard: tests/unit/key-health-402-disable-5239.test.ts. (#5239 — thanks @muflifadla38)
cli: omniroute serve no longer discards a user-set NODE_OPTIONS=--max-old-space-size=…. It used to unconditionally overwrite NODE_OPTIONS (and pass an explicit --max-old-space-size CLI arg) with the calibrated default, so a user who exported --max-old-space-size=8192 still ran at the old cap and OOM'd (#5238 reporter set 8192, crashed at ~505MB). Now it mirrors the Electron and standalone launchers: if NODE_OPTIONS already pins the heap, that value wins (and the duplicate CLI arg is suppressed); otherwise the calibrated --max-old-space-size is appended, preserving unrelated flags. Regression guard: tests/unit/serve-node-options-preserve-5238.test.ts. (Defect C of #5238; the b.mask/OOM-root parts are tracked separately.)
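The merge rule reads roughly like the sketch below. `mergeNodeOptions` is a hypothetical name, and the detection regex (which also accepts the underscore spelling of the flag) is an assumption about how a pinned heap might be recognized.

```typescript
// Returns the NODE_OPTIONS string the child process should receive.
// `calibratedMb` stands in for the launcher's computed default heap size.
function mergeNodeOptions(existing: string | undefined, calibratedMb: number): string {
  const current = (existing ?? "").trim();

  // If the user already pinned the heap, their value wins untouched.
  const pinsHeap = /(^|\s)--max[-_]old[-_]space[-_]size(=|\s|$)/.test(current);
  if (pinsHeap) return current;

  // Otherwise append the calibrated flag, preserving unrelated flags.
  const heapFlag = `--max-old-space-size=${calibratedMb}`;
  return current.length > 0 ? `${current} ${heapFlag}` : heapFlag;
}
```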
dashboard: restore the {active}/{total} active model-count badge in a provider's Available Models toolbar (provider detail page). It was dropped during the v3.8.13 god-file decomposition (#3327) — the ModelVisibilityToolbar still received activeCount/totalCount but they were orphaned as unused _-prefixed params and the rendering <span> was never carried over (the modelsActiveCount i18n key stayed). Re-wired the existing props to the existing key; zero data-layer or i18n change. Regression guard: modelVisibilityToolbarActiveCount.test.tsx. (#5264)
rerank: /v1/rerank no longer rejects SiliconFlow and DeepInfra Qwen3-Reranker models with 400 "Invalid rerank model" even though /v1/models lists them. The model-ID parser was never the problem (it already splits on the first slash, so siliconflow/Qwen/Qwen3-Reranker-8B parses correctly) — siliconflow and deepinfra were just missing from the rerank provider registry. Added both: SiliconFlow as Cohere-compatible, DeepInfra via a new deepinfra adapter (model in the URL path POST /v1/inference/<model>, {queries,documents} request, positional {scores} response mapped to Cohere results[]). Regression guard: tests/unit/rerank-providers-5332.test.ts. (#5332 — thanks @maikokan)
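Two pieces of this entry lend themselves to a short sketch: the first-slash model-ID split and the mapping from positional scores to Cohere-style results. The function and type names are hypothetical, and sorting by descending score is an assumption based on how Cohere's rerank response is ordered.

```typescript
interface ParsedModelId {
  provider: string;
  model: string;
}

interface CohereStyleResult {
  index: number; // position of the document in the original request
  relevance_score: number;
}

// Split on the FIRST slash only, so the model part may itself contain slashes.
function parseModelId(id: string): ParsedModelId {
  const slash = id.indexOf("/");
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`Invalid model id: ${id}`);
  }
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}

// Turn a positional score array into results[] with the original indices kept.
function scoresToCohereResults(scores: readonly number[]): CohereStyleResult[] {
  return scores
    .map((relevance_score, index) => ({ index, relevance_score }))
    .sort((a, b) => b.relevance_score - a.relevance_score);
}
```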
authz/dashboard: stop rejecting every dashboard mutation with 403 INVALID_ORIGIN when the dashboard is reached over a LAN IP / non-localhost host. The origin-pinning check (#5278) only accepted the configured *_PUBLIC_BASE_URL (typically http://localhost:20128) plus the internal request.url origin — which Next.js standalone reports as the bind host, not the real Host. So opening the dashboard at e.g. http://192.168.0.15:20128 made the browser's same-origin Origin match no candidate, and every POST/PUT/DELETE (save API key, save provider, test connection) failed while GETs still worked. Two fixes: (a) the request Host (or a trusted X-Forwarded-Host) is now accepted as a valid mutation origin, gated by two independent checks — the token-stamped socket peer must be loopback/private-LAN and the Host itself must be a loopback/private-LAN IP literal, so a DNS-rebinding domain (which classifies as remote) can never become a trusted origin and the protocol is pinned to the actual connection; (b) the INVALID_ORIGIN response now carries an actionable message (set OMNIROUTE_PUBLIC_BASE_URL) and the dashboard surfaces API error .message via a shared extractApiErrorMessage helper instead of rendering the raw error object. Regression guards: tests/unit/authz/public-origin.test.ts (direct LAN/loopback + DNS-rebinding defense), tests/unit/api-error-message-5340.test.ts. (#5340)
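The second of the two gates in (a), that the Host must be a loopback or private-LAN IP literal, can be sketched for IPv4 as below. This is an assumption-laden illustration: the function name is hypothetical, IPv6 is not handled, and hostnames (including `localhost`) are rejected here because only IP literals qualify. The socket-peer check is not shown.

```typescript
// True only for IPv4 literals in loopback (127/8) or RFC 1918 private ranges.
// An optional ":port" suffix is tolerated; any DNS name returns false.
function isLoopbackOrPrivateIpv4Host(host: string): boolean {
  const match = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})(?::\d{1,5})?$/.exec(host.trim());
  if (match === null) return false; // not an IPv4 literal, e.g. a rebinding domain

  const octets = match.slice(1, 5).map((part) => Number(part));
  if (octets.some((o) => o > 255)) return false;

  const a = octets[0] ?? -1;
  const b = octets[1] ?? -1;

  if (a === 127) return true; // loopback
  if (a === 10) return true; // 10.0.0.0/8
  if (a === 172 && b >= 16 && b <= 31) return true; // 172.16.0.0/12
  if (a === 192 && b === 168) return true; // 192.168.0.0/16
  return false;
}
```

Because the check works on the literal text of the Host header, a domain that merely resolves to a private address never passes.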

📝 Maintenance

chore(dead-code): repo-wide sweep of unused exported symbols and a matching dead-code baseline ratchet — trimmed unused exported helpers, validation/settings/encryption-config schemas, utility/domain/static-constant/formatting helpers, runtime test helpers, the request-timeout fetch wrapper, event-bus, semantic-cache (maintenance + expiry), correlation-middleware, MCP-scope, service-registry, build-profile, api-key-format, authz-class, models.dev-context, embedding-cache, provider-limits-scheduler, search-validator, webhook-example, agent-skills-repo-URL and command-code-auth-cleanup exports. Pure dead-code removal validated by typecheck:core (no remaining referencing site) — no behavior change. (#5321, #5322, #5324, #5325, #5328, #5329, #5330, #5331, #5333, #5334, #5335, #5336, #5338, #5339, #5353, #5354, #5355, #5356, #5357, #5359, #5362, #5364, #5365, #5366, #5368, #5369, #5371 — thanks @JxnLexn)

What's Changed

Release v3.8.41 by @diegosouzapw in #5327

Full Changelog: v3.8.40...v3.8.41