diegosouzapw/OmniRoute v3.8.31 on GitHub

✨ New Features

perf(dashboard): combos UI leaf-split, Next.js config tuning, 1-click Redis & Bifrost sidecar — delivers four of the five performance/UX tracks from the #3932 thread: the combos dashboard page is split into focused leaf components (smaller bundles, faster reloads), next.config is tuned for the standalone build, Redis can be provisioned in one click, and a Bifrost sidecar option is wired in. (The fifth track — chatLogHelpers extraction — was already covered upstream and dropped.) (#4381 — thanks @KooshaPari)

🐛 Fixed

fix(embeddings): NVIDIA NIM asymmetric embedding models inject the required input_type — NVIDIA NIM asymmetric embedders (e.g. nvidia/nv-embedqa-e5-v5) reject requests without an input_type parameter with 400 "'input_type' parameter is required", but OmniRoute only forwarded input_type when the client supplied it — so callers (and OpenAI-style SDKs that don't emit the field) got a hard failure. The embedding registry now carries a model-level default (input_type: "query") for the asymmetric NVIDIA model, and the embeddings handler injects a model's default params into the upstream body only when the client didn't already send them — a client-supplied input_type (e.g. "passage") is respected unchanged, and symmetric models that carry no default are unaffected. (#4341 — thanks @hydraromania)
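The "inject only when the client didn't send it" rule from the entry above can be sketched as follows. This is a minimal illustration, not OmniRoute's actual handler: `MODEL_DEFAULTS` and `withModelDefaults` are hypothetical names, and only the model id and the `input_type: "query"` default come from the entry.

```typescript
type Params = Record<string, unknown>;

// Hypothetical registry: only the asymmetric model carries default params.
const MODEL_DEFAULTS: Record<string, Params> = {
  "nvidia/nv-embedqa-e5-v5": { input_type: "query" },
};

// Copy the client body, then fill in each model default the client left unset.
function withModelDefaults(model: string, body: Params): Params {
  const merged: Params = { ...body };
  for (const [key, value] of Object.entries(MODEL_DEFAULTS[model] ?? {})) {
    if (merged[key] === undefined) merged[key] = value;
  }
  return merged;
}
```

A body that already carries `input_type: "passage"` passes through unchanged, and a model with no registry entry gets no extra fields.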
fix(api): migrate the deprecated Codex [features].codex_hooks flag to [features].hooks — Codex renamed the codex_hooks feature flag to hooks; recent Codex CLI versions ignore the old key and print a deprecation notice. When OmniRoute rewrites an existing ~/.codex/config.toml (configuring/resetting the Codex provider) it now carries the user's intent forward by renaming [features].codex_hooks → [features].hooks (preserving its value, never clobbering an already-present hooks) and dropping the deprecated key. No-op when the flag is absent. (#4342 — thanks @Bian-Sh)
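The rename rule above can be sketched on an already-parsed `[features]` table. `migrateCodexHooks` is a hypothetical name, and TOML parsing and serialization are assumed to happen elsewhere.

```typescript
type FeaturesTable = Record<string, unknown>;

// Rename codex_hooks to hooks, never clobber an existing hooks value,
// and always drop the deprecated key.
function migrateCodexHooks(features: FeaturesTable): FeaturesTable {
  if (!("codex_hooks" in features)) return features; // no-op when the flag is absent
  const migrated: FeaturesTable = { ...features };
  const legacyValue = migrated["codex_hooks"];
  delete migrated["codex_hooks"];
  if (!("hooks" in migrated)) migrated["hooks"] = legacyValue;
  return migrated;
}
```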
fix(translator): same-format response path no longer leaks a data: null SSE event — the streaming response translator's same-format fast path returned [chunk] unconditionally, so the end-of-stream null/flush signal (chunk === null) propagated as a literal [null]. Downstream this surfaced as an empty data: null SSE event between chunks and crashed strict clients (e.g. Factory Droid BYOK on /v1/responses). The fast path now drops the null flush (returns []) while still passing real chunks through unchanged. (#4344 — thanks @thaitryhand)
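A minimal sketch of the fast-path change described above, with hypothetical function names: the flush signal becomes an empty array, so the SSE serializer has nothing to emit for it.

```typescript
// Same-format fast path: forward real chunks, swallow the null flush signal.
function sameFormatPassThrough<T>(chunk: T | null): T[] {
  return chunk === null ? [] : [chunk];
}

// Serialize translated chunks as SSE events; an empty array emits nothing.
function toSseEvents<T>(chunks: T[]): string {
  return chunks.map((c) => `data: ${JSON.stringify(c)}\n\n`).join("");
}
```

With the old behavior (`[chunk]` unconditionally), the same serializer would have produced `data: null` for the flush.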
fix(translator): strip client-only assistant echo fields on the OpenAI target path (Mistral 422) — strict OpenAI-compatible upstreams (e.g. mistral/codestral-latest) reject client-only assistant "echo" fields sent back as input history with 422 extra_forbidden (the report hit messages[].assistant.reasoning_content via Codex /responses). Only reasoning_content was being stripped on the OpenAI target path; the sibling echo fields reasoning, refusal, annotations and cache_control leaked through and tripped the 422. They are now all dropped on the non-reasoner OpenAI target path. audio is deliberately preserved (OpenAI audio models reference a prior assistant audio response by id on multi-turn; Mistral never emits audio, so nothing is lost there). (#4350 — thanks @xxy9468615)
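The stripping rule above might look roughly like this. The field list comes from the entry; the function name and message type are assumptions, and the sketch ignores the reasoner/non-reasoner distinction the real path makes.

```typescript
type Message = { role: string; [field: string]: unknown };

// Fields named in the entry above; audio is intentionally not in this list.
const CLIENT_ONLY_ECHO_FIELDS = [
  "reasoning_content",
  "reasoning",
  "refusal",
  "annotations",
  "cache_control",
];

// Return a copy of the history with echo fields removed from assistant turns.
function stripAssistantEchoFields(messages: Message[]): Message[] {
  return messages.map((msg) => {
    if (msg.role !== "assistant") return msg;
    const cleaned: Message = { ...msg };
    for (const field of CLIENT_ONLY_ECHO_FIELDS) delete cleaned[field];
    return cleaned;
  });
}
```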
fix(translator): accept AI SDK-style { type: "image", image: "data:…" } content parts — several OpenAI-input translators only recognized images shaped as image_url.url (or an object with .source/.url), so an AI SDK-style part where image is a bare data-URL string was silently dropped before reaching a vision provider (OpenCode is one affected client; the gap is generic). The OpenAI→Claude, OpenAI→Kiro and OpenAI→Gemini/Antigravity translators now parse a string image data URL into each provider's native image shape (Claude {source:{type:"base64"}}, Kiro images[].source.bytes, Gemini inlineData). (#4345 — thanks @mugnimaestra)
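To illustrate the conversion above, here is a hedged sketch that parses a base64 data URL and maps it to the Claude and Gemini image shapes. The helper names are hypothetical, the Kiro shape is omitted, and the real translators may handle more data-URL variants than this regex does.

```typescript
type ParsedDataUrl = { mediaType: string; base64: string };

// Parse "data:<media type>;base64,<payload>"; return null for anything else.
function parseImageDataUrl(image: string): ParsedDataUrl | null {
  const match = /^data:([^;,]+);base64,(.+)$/.exec(image);
  if (!match) return null;
  const [, mediaType, base64] = match;
  if (mediaType === undefined || base64 === undefined) return null;
  return { mediaType, base64 };
}

// Claude-style base64 image block.
function toClaudeImage(p: ParsedDataUrl) {
  return {
    type: "image",
    source: { type: "base64", media_type: p.mediaType, data: p.base64 },
  };
}

// Gemini-style inlineData part.
function toGeminiInlineData(p: ParsedDataUrl) {
  return { inlineData: { mimeType: p.mediaType, data: p.base64 } };
}
```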
fix(translator): Gemini accepts HTTP/HTTPS image URLs instead of silently dropping them — the OpenAI→Gemini request helper (convertOpenAIContentToParts) discarded remote image_url parts (emitting only a console.warn) because Gemini's inlineData needs base64 and the synchronous helper can't fetch+encode upstream. It now uses Gemini's native fileData: { fileUri } part for HTTP/HTTPS URLs (the model fetches the asset itself), so vision requests carrying a URL — not a data: URI — reach Gemini intact. (#4373 — ported from 9router#344, thanks @diegosouzapw)
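The routing described above can be sketched as a synchronous function that never fetches the image. `imageUrlToGeminiPart` is a hypothetical stand-in for the relevant branch of `convertOpenAIContentToParts`, whose real signature is not shown in these notes.

```typescript
type GeminiPart =
  | { inlineData: { mimeType: string; data: string } }
  | { fileData: { fileUri: string } };

// Map an OpenAI image_url value to a Gemini part without fetching anything.
function imageUrlToGeminiPart(url: string): GeminiPart | null {
  const match = /^data:([^;,]+);base64,(.+)$/.exec(url);
  if (match) {
    const [, mimeType, data] = match;
    if (mimeType !== undefined && data !== undefined) {
      return { inlineData: { mimeType, data } };
    }
  }
  // Remote URLs: let Gemini fetch the asset itself via fileData.
  if (/^https?:\/\//i.test(url)) return { fileData: { fileUri: url } };
  return null; // unsupported scheme
}
```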
fix(executors): strip stream_options for qwen non-streaming / thinking Claude-Code requests — Claude-Code-compatible providers force the executor-level stream flag on while the outgoing body keeps the caller's original stream: false, so DefaultExecutor.transformRequest injected stream_options: { include_usage: true } onto a body that still said stream: false, and qwen rejected it with 400 "'stream_options' only set this when you set stream: true". The executor now strips stream_options whenever the body's effective stream is false. (#4374 — ported from 9router#663, thanks @anuragg-saxenaa / @diegosouzapw)
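A minimal sketch of the guard above. It assumes an absent `stream` field counts as non-streaming, which the entry does not state explicitly; the function name is hypothetical.

```typescript
type RequestBody = {
  stream?: boolean;
  stream_options?: { include_usage: boolean };
  [key: string]: unknown;
};

// Drop stream_options unless the outgoing body itself says stream: true.
function stripStreamOptionsWhenNotStreaming(body: RequestBody): RequestBody {
  if (body.stream === true) return body;
  const copy: RequestBody = { ...body };
  delete copy.stream_options;
  return copy;
}
```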
fix(executors): don't inject thinking when tool_choice forces a tool (native Claude) — the Claude-Code wire-image emulation injects thinking: { type: "adaptive" } for non-Haiku Claude models, but Anthropic rejects thinking when tool_choice forces a specific tool ({type:"any"|"tool"}) with 400 "Thinking may not be enabled when tool_choice forces tool use.". Any Opus/Sonnet call that pins a tool (e.g. Claude Code's message_user, or agent harnesses that force a tool) hit a hard 400; the injection is now suppressed when tool_choice forces a tool. (#4389 — thanks @NomenAK)
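The suppression rule above, sketched with hypothetical names. Detecting Haiku by substring match on the model id is an assumption made for illustration only.

```typescript
type ToolChoice = { type: "auto" | "any" | "tool" | "none"; name?: string };
type ClaudeBody = {
  model: string;
  tool_choice?: ToolChoice;
  thinking?: { type: string };
  [key: string]: unknown;
};

// tool_choice of type "any" or "tool" forces tool use.
function forcesToolUse(choice: ToolChoice | undefined): boolean {
  return choice?.type === "any" || choice?.type === "tool";
}

// Inject adaptive thinking only for non-Haiku models that do not force a tool.
function maybeInjectThinking(body: ClaudeBody): ClaudeBody {
  const isHaiku = body.model.toLowerCase().includes("haiku"); // assumed detection
  if (isHaiku || forcesToolUse(body.tool_choice)) return body;
  return { ...body, thinking: { type: "adaptive" } };
}
```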
fix(codex): request reasoning summaries on Codex Responses requests — Codex/OpenAI Responses can return reasoning-token accounting and empty reasoning items unless visible reasoning summaries are requested, so Codex CLI / pi.dev paths missed visible thinking text. OmniRoute now requests reasoning.summary: "auto" (and includes reasoning.encrypted_content) when reasoning is enabled — preserving an explicit client reasoning.summary and existing include entries, and skipping it for reasoning.effort: "none". (#4359 — thanks @xz-dev)
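One way to express the rules above in code. This sketch treats the presence of a `reasoning` object as "reasoning is enabled", which is an assumption, and the function name is hypothetical.

```typescript
type Reasoning = { effort?: string; summary?: string };
type ResponsesBody = {
  reasoning?: Reasoning;
  include?: string[];
  [key: string]: unknown;
};

// Request visible summaries unless reasoning is absent or explicitly "none".
function requestReasoningSummary(body: ResponsesBody): ResponsesBody {
  const reasoning = body.reasoning;
  if (!reasoning || reasoning.effort === "none") return body;
  // Keep existing include entries and add the encrypted content once.
  const include = new Set(body.include ?? []);
  include.add("reasoning.encrypted_content");
  return {
    ...body,
    reasoning: { ...reasoning, summary: reasoning.summary ?? "auto" },
    include: Array.from(include),
  };
}
```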
fix(sse): default the combo per-target timeout to 120s for fast failover — a combo's per-target timeout inherited the full FETCH_TIMEOUT_MS (600s default) when the combo didn't set targetTimeoutMs, so a single hung/slow target (e.g. an openai-compatible upstream returning 524/504) could stall the whole combo for up to 10 minutes before failing over. A new DEFAULT_COMBO_TARGET_TIMEOUT_MS = 120_000 is used as the default-when-unset in resolveComboTargetTimeoutMs (backward-compatible 3rd arg, wired in phaseComboSetup); an explicit ceiling/opt-out is preserved. (#4365 — thanks @diegosouzapw)
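The default-when-unset behavior above, as a rough sketch. The real `resolveComboTargetTimeoutMs` signature is not shown in these notes, so `pickTargetTimeoutMs` and its parameters are hypothetical. Capping at the fetch timeout is an assumption, and the explicit opt-out mentioned in the entry is not modeled.

```typescript
const DEFAULT_COMBO_TARGET_TIMEOUT_MS = 120_000;

// Use the combo's own timeout when set, else the 120s default,
// never exceeding the overall fetch timeout (assumed ceiling).
function pickTargetTimeoutMs(
  targetTimeoutMs: number | undefined,
  fetchTimeoutMs: number,
  defaultMs: number = DEFAULT_COMBO_TARGET_TIMEOUT_MS,
): number {
  return Math.min(targetTimeoutMs ?? defaultMs, fetchTimeoutMs);
}
```

With `FETCH_TIMEOUT_MS` at its 600s default, an unset combo resolves to 120s instead of inheriting the full 10 minutes.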
fix(cli): Tailscale login honors TAILSCALE_AUTHKEY for non-interactive sign-in — startTailscaleLogin built tailscale up without ever reading process.env.TAILSCALE_AUTHKEY, so on a pre-authenticated / headless daemon the login waited for an interactive auth URL and timed out (~15s). When TAILSCALE_AUTHKEY is set it is now passed via --auth-key= (as a spawn argv element — no shell interpolation) so the daemon authenticates non-interactively; when unset, behavior is unchanged. (#4343 — thanks @ipeterpetrus)
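A small sketch of the argv construction above. `buildTailscaleUpArgs` is a hypothetical helper; the real `startTailscaleLogin` may pass additional flags.

```typescript
// Build the argument vector for `tailscale up`. The key is a single argv
// element, so no shell ever interprets it.
function buildTailscaleUpArgs(env: Record<string, string | undefined>): string[] {
  const args = ["up"];
  const authKey = env["TAILSCALE_AUTHKEY"];
  if (authKey) args.push(`--auth-key=${authKey}`);
  return args;
}
```

The result would then be handed to something like `spawn("tailscale", buildTailscaleUpArgs(process.env))` from `node:child_process`, which takes the arguments as an array rather than a shell string.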
fix(dashboard): OAuth modal shows the real error on a non-JSON server response — the OAuth connect/reauth modal called await res.json() unconditionally, so when a build/OAuth endpoint returned a plain-text error (e.g. a 500 Internal Server Error page) the modal threw Unexpected token 'I'… and hid the real failure. Two shared helpers (parseResponseBody / getErrorMessage in src/shared/utils/api.ts) now read the body safely (JSON when it is JSON, raw text otherwise) and surface a clean message either way; all modal fetch sites use them. (#4351 — thanks @DNNYF)
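The safe-read idea above can be sketched on the already-read body text. The names and signatures here are hypothetical and simpler than the real `parseResponseBody` / `getErrorMessage` helpers, and the assumption that JSON errors carry a string `error` field is for illustration only.

```typescript
type ParsedBody = { json: unknown; text: string };

// Try JSON first; fall back to the raw text instead of throwing.
function readBodyText(text: string): ParsedBody {
  try {
    return { json: JSON.parse(text) as unknown, text };
  } catch {
    return { json: null, text };
  }
}

// Prefer a string `error` field, then the raw text, then a fallback.
function errorMessageFrom(body: ParsedBody, fallback: string): string {
  const json = body.json;
  if (typeof json === "object" && json !== null && "error" in json) {
    const err = (json as { error: unknown }).error;
    if (typeof err === "string") return err;
  }
  return body.text.trim() || fallback;
}
```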
fix(dashboard): a disabled connection's last error is now visible — the provider card's error badge counts a disabled connection (isActive === false) that has an error (its effective status is still error/expired/unavailable), but the connection row hid the lastError text for disabled rows — so the operator saw the error count without being able to see what failed. The row now shows the error text whenever there is one, regardless of the active toggle. (#4352 — thanks @ntdung6868)
fix(providers): the "Test Connection One-by-One" OAuth probe can no longer hang the queue forever — the OAuth connection-test path called bare fetch(url, { method, headers }) with no AbortController/signal/timeout, so when a provider's probe endpoint accepted the socket but never responded, the awaited fetch never settled and the one-by-one test queue stalled indefinitely (the API-key path was already bounded via validateProviderApiKey's timeoutMs). Both the initial probe and the post-refresh retry are now bounded with AbortSignal.timeout(30s) — matching the API-key path's 30s budget — and a timed-out probe resolves as a failure with a clear Test timed out after 30s message in the same shape as every other test error. (#4347 — thanks @ntdung6868)
fix(providers): a deactivated account is labeled distinctly from a revoked token — a Codex connection whose OAuth refresh is fully healthy but whose ChatGPT account has been deactivated by the provider gets a 401 from the upstream API. The connection test labeled that the same as a bad credential (Token invalid or revoked → upstream_auth_error), so the operator couldn't tell a deactivated account from a revoked token. The test now reads the 401/403 body and, when it indicates account deactivation, classifies it as account_deactivated — which the dashboard already renders as "Account Deactivated". A plain auth 401 is unchanged. (#4353 — thanks @ntdung6868)
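A hedged sketch of the classification above. The notes do not say what marker in the response body signals deactivation, so the `deactivated` substring check is purely an assumption, as is treating a non-deactivation 403 the same as a plain 401.

```typescript
type AuthFailure = "account_deactivated" | "upstream_auth_error";

// Classify a 401/403 by inspecting the body; other statuses are not auth failures.
function classifyAuthFailure(status: number, bodyText: string): AuthFailure | null {
  if (status !== 401 && status !== 403) return null;
  // Assumed marker: the real check may look at a structured error code instead.
  return /deactivated/i.test(bodyText) ? "account_deactivated" : "upstream_auth_error";
}
```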
fix(db): cascade-delete orphaned model aliases when a provider is removed — deleting a custom provider removed its connections and node but left behind the imported model-alias rows (stored as key=<alias>, value="<providerId>/<model>"). Those stale aliases then blocked re-importing the same provider — the import dedup treated them as "already exists", so no new models appeared. A new deleteModelAliasesForProvider(providerId) DB helper drops every alias whose stored value begins with <providerId>/ (leaving other providers and user-defined settings aliases untouched), and the provider-node DELETE handler now calls it after removing the connections and node, so a fresh import is unblocked. (#4348 — thanks @nguyenvanhuy0612)
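The prefix match above, sketched against an in-memory map rather than the real database. `dropAliasesForProvider` is a hypothetical stand-in for `deleteModelAliasesForProvider`, whose actual storage layer is not shown here.

```typescript
// Delete every alias whose stored value starts with "<providerId>/".
// Returns the number of aliases removed.
function dropAliasesForProvider(aliases: Map<string, string>, providerId: string): number {
  const prefix = `${providerId}/`; // trailing slash avoids matching "acme2" for "acme"
  let removed = 0;
  for (const [alias, target] of Array.from(aliases)) {
    if (target.startsWith(prefix)) {
      aliases.delete(alias);
      removed++;
    }
  }
  return removed;
}
```

Matching on the full `<providerId>/` prefix is what leaves other providers and user-defined aliases untouched.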
fix(api): persist max_input_tokens / max_output_tokens when adding a custom model — POST /api/provider-models silently dropped the per-model token limits set in the "add custom model" form: the handler destructured the rest of the body but never read max_input_tokens / max_output_tokens, and addCustomModel() had no parameter for them, so the values were thrown away on write. The DB layer (inputTokenLimit / outputTokenLimit) and the /v1/models catalog already round-trip these fields — only the write path was missing. The validation schema now accepts the two optional limits, the handler forwards them, and addCustomModel() persists them so a custom model's context/output window survives into the catalog. (#4349 — thanks @codename-zen)
fix(plugin): the OpenCode static-catalog plugin prefixes combo/raw model keys with the provider id — OpenCode's static-catalog reader misdetected the omniroute provider: combo keys emitted as combo/MASTER were parsed as provider combo ("No credentials for provider: omniroute"), while a bare-MASTER form was misread as a model with no resolvable provider, and mixed omniroute/MASTER + bare-raw keys were rejected by OpenCode's schema. The plugin now emits every combo and raw model key prefixed with the omniroute provider id, emits the provider id explicitly, and drops the legacy combo/ prefix — so the static-catalog reader detects the provider and the auth loader returns the right credentials (the catalog-fetch timeout was also raised so a cold-start server doesn't publish an empty stub). (#4384 — thanks @herjarsa)
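The key normalization above might be sketched like this. `toCatalogKey` is a hypothetical name, and the exact output format of the real plugin is inferred from the entry rather than confirmed.

```typescript
const PROVIDER_ID = "omniroute";

// Drop the legacy "combo/" prefix, then make sure the key carries the provider id.
function toCatalogKey(key: string): string {
  const bare = key.startsWith("combo/") ? key.slice("combo/".length) : key;
  return bare.startsWith(`${PROVIDER_ID}/`) ? bare : `${PROVIDER_ID}/${bare}`;
}
```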

🔒 Security

fix(security): scope the OAuth callback postMessage to a trusted-origin allowlist — the OAuth callback at /callback previously posted { code, state, … } to window.opener.postMessage(…, "*") whenever the opener was cross-origin, so a hostile page that opened the well-known redirect URI in a popup could receive the OAuth code/state and complete the flow as the user. The wildcard fallback is replaced with iteration over a fixed allowlist (same-origin + Codex's localhost:1455 / 127.0.0.1:1455 loopback helper); the browser silently drops postMessage to any opener whose origin isn't listed. (#4372 — ported from 9router#998, thanks @aeonframework / @diegosouzapw)
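The allowlist iteration above can be sketched as follows. The function names and the `Opener` type are hypothetical, and the `http://` scheme on the loopback origins is an assumption; in the browser, `opener` would be `window.opener` and `selfOrigin` would be `window.location.origin`.

```typescript
type Opener = { postMessage(message: unknown, targetOrigin: string): void };

// Same-origin plus the Codex loopback helper; never the "*" wildcard.
function trustedOpenerOrigins(selfOrigin: string): string[] {
  return [selfOrigin, "http://localhost:1455", "http://127.0.0.1:1455"];
}

// Post once per trusted origin; the browser drops any message whose
// targetOrigin does not match the opener's actual origin.
function postOAuthResult(opener: Opener, payload: unknown, selfOrigin: string): void {
  for (const origin of trustedOpenerOrigins(selfOrigin)) {
    opener.postMessage(payload, origin);
  }
}
```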
fix(mitm): exact host membership in the MITM hosts test (CodeQL false positive) — tests/unit/mitm-tool-hosts.test.ts checked host membership with Array.includes(host), which CodeQL's js/incomplete-url-substring-sanitization heuristic misreads as a String.includes() URL-substring sanitization test (HIGH false positive). Switched to .some((h) => h === host) — identical semantics, no flagged pattern. (#4386)

📝 Maintenance

docs: one-time feature-documentation catch-up (v3.8.20 → v3.8.30) — reconciled the docs with every user-facing feature shipped since v3.8.20: a new README ✨ What's New section; new guides for CLI integrations, MITM TPROXY transparent decrypt and delegated Anthropic Context Editing; refreshed AUTO-COMBO (auto/<category>:<tier> + Arena-ELO), API_REFERENCE (x-omniroute-no-memory), MEMORY (int8 quantization, off-by-default), RESILIENCE (model-lockout success-decay), RTK, AGENTBRIDGE, TRAFFIC_INSPECTOR, GUARDRAILS, CLOUD_AGENT, ENVIRONMENT; regenerated PROVIDER_REFERENCE (231 providers) and synced the provider count in README/CLAUDE/AGENTS. Going forward this runs every release (generate-release step 6b). (#4391)
refactor(chatCore): extract the checkHeapPressureGuard leaf (god-file decomposition start) — first increment of decomposing chatCore.ts (~5127 LOC, the hottest path — every chat request flows through handleChatCore). The V8 heap-pressure guard at the top of handleChatCore (rejects with 503 when heapUsed exceeds the shed threshold) is moved to a self-contained, co-located utils/heapPressure.ts::checkHeapPressureGuard(...) with no behavior change. (#4371 — thanks @diegosouzapw)
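A guard of the kind described above could look like this as a pure function. The real `checkHeapPressureGuard(...)` parameters are elided in these notes, so the signature and result type here are assumptions.

```typescript
type HeapGuardResult =
  | { shed: false }
  | { shed: true; status: 503; reason: string };

// Shed the request only when heap usage strictly exceeds the threshold.
function checkHeapPressure(heapUsedBytes: number, shedThresholdBytes: number): HeapGuardResult {
  if (heapUsedBytes <= shedThresholdBytes) return { shed: false };
  return {
    shed: true,
    status: 503,
    reason: `heap used ${heapUsedBytes} exceeds shed threshold ${shedThresholdBytes}`,
  };
}
```

In Node.js the first argument would typically come from `process.memoryUsage().heapUsed`; keeping the guard pure makes it easy to unit test without touching the real heap.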
refactor(combo): de-dup the exhausted-target skip predicate across both dispatchers — the byte-identical #1731/#1731v2 pre-check (skip a target already exhausted on the provider/connection within a request) lived in both combo dispatchers; extracted to a shared combo/comboPredicates.ts helper. (#4362 — thanks @diegosouzapw)
refactor(combo): de-dup the upstream-error exhaustion classification across both dispatchers — both dispatchers ran a near-identical post-error block classifying the upstream error and updating the exhaustion Sets (#1731 provider exhausted / #1731v2 connection error / transient rate-limited); extracted to a shared combo/targetExhaustion.ts::applyComboTargetExhaustion(...). (#4366 — thanks @diegosouzapw)
chore(cli): localize CLI / scraping copy and stabilize fetch, memory & coverage handling — localizes CLI and scraping UX copy plus the Adapta onboarding tutorial (and corrects the CLI Code page title), makes fetch retries honor the start timeout, tightens SSE/response typing, respects configured memory token limits during search, and reduces CI coverage-merge memory by merging V8 data incrementally. (#4383 — thanks @JxnLexn)
test(combo): reset circuit breakers between stream-readiness cases (restore green) — a stream-readiness fallback case had been failing on the release branch since the cycle-open tip because of a test-isolation leak: earlier combo-dispatch cases in the same file deliberately fail glm (tripping the module-level provider circuit breaker), and that OPEN state leaked into the next test, so combo.ts skipped the model. The test now resets the circuit breakers between cases. (#4396 — thanks @diegosouzapw)
chore(quality): reconcile the complexity ratchet baseline (1896 → 1900) — absorbs the small complexity-metric increase from the v3.8.31 /review-prs merge batch into quality-baseline.json so the ratchet reflects the shipped code (no production change). (#4410 — thanks @diegosouzapw)
test/gate: reconcile release-time drift surfaced by the full CI gate — three already-merged changes left the release branch's full-CI gate red (the per-PR fast gates don't run it): the Gemini convertOpenAIContentToParts tests were realigned to the #4373 HTTP/HTTPS-URL fileData pass-through (they still asserted the old warn-and-drop behavior), the t11 any-budget for open-sse/executors/base.ts was raised to 2 with a justification (#4389 compares tool_choice against the string literal "any", not a TS any type), and the #4384 opencode-plugin combos test's net-assert reduction (dropping the obsolete combo/ namespace) was allowlisted. No production behavior change. (thanks @diegosouzapw)

What's Changed

test: clear CodeQL js/incomplete-url-substring-sanitization FP (#660) by @diegosouzapw in #4387
Release v3.8.31 by @diegosouzapw in #4377

Full Changelog: v3.8.30...v3.8.31