✨ New Features
- perf(dashboard): combos UI leaf-split, Next.js config tuning, 1-click Redis & Bifrost sidecar. Delivers four of the five performance/UX tracks from the #3932 thread: the combos dashboard page is split into focused leaf components (smaller bundles, faster reloads), `next.config` is tuned for the standalone build, Redis can be provisioned in one click, and a Bifrost sidecar option is wired in. (The fifth track, chatLogHelpers extraction, was already covered upstream and dropped.) (#4381, thanks @KooshaPari)
🐛 Fixed
- fix(embeddings): NVIDIA NIM asymmetric embedding models inject the required `input_type`. NVIDIA NIM asymmetric embedders (e.g. `nvidia/nv-embedqa-e5-v5`) reject requests without an `input_type` parameter with `400 "'input_type' parameter is required"`, but OmniRoute only forwarded `input_type` when the client supplied it, so callers (and OpenAI-style SDKs that don't emit the field) got a hard failure. The embedding registry now carries a model-level default (`input_type: "query"`) for the asymmetric NVIDIA model, and the embeddings handler injects a model's default params into the upstream body only when the client didn't already send them: a client-supplied `input_type` (e.g. `"passage"`) is respected unchanged, and symmetric models that carry no default are unaffected. (#4341, thanks @hydraromania)
- fix(api): migrate the deprecated Codex `[features].codex_hooks` flag to `[features].hooks`. Codex renamed the `codex_hooks` feature flag to `hooks`; recent Codex CLI versions ignore the old key and print a deprecation notice. When OmniRoute rewrites an existing `~/.codex/config.toml` (configuring/resetting the Codex provider) it now carries the user's intent forward by renaming `[features].codex_hooks` → `[features].hooks` (preserving its value, never clobbering an already-present `hooks`) and dropping the deprecated key. No-op when the flag is absent. (#4342, thanks @Bian-Sh)
- fix(translator): same-format response path no longer leaks a `data: null` SSE event. The streaming response translator's same-format fast path returned `[chunk]` unconditionally, so the end-of-stream null/flush signal (`chunk === null`) propagated as a literal `[null]`. Downstream this surfaced as an empty `data: null` SSE event between chunks and crashed strict clients (e.g. Factory Droid BYOK on `/v1/responses`). The fast path now drops the null flush (returns `[]`) while still passing real chunks through unchanged. (#4344, thanks @thaitryhand)
- fix(translator): strip client-only assistant echo fields on the OpenAI target path (Mistral 422). Strict OpenAI-compatible upstreams (e.g. `mistral/codestral-latest`) reject client-only assistant "echo" fields sent back as input history with `422 extra_forbidden` (the report hit `messages[].assistant.reasoning_content` via Codex/responses). Only `reasoning_content` was being stripped on the OpenAI target path; the sibling echo fields `reasoning`, `refusal`, `annotations` and `cache_control` leaked through and tripped the 422. They are now all dropped on the non-reasoner OpenAI target path. `audio` is deliberately preserved (OpenAI audio models reference a prior assistant audio response by id on multi-turn; Mistral never emits audio, so nothing is lost there). (#4350, thanks @xxy9468615)
- fix(translator): accept AI SDK-style `{ type: "image", image: "data:…" }` content parts. Several OpenAI-input translators only recognized images shaped as `image_url.url` (or an object with `.source`/`.url`), so an AI SDK-style part where `image` is a bare data-URL string was silently dropped before reaching a vision provider (OpenCode is one affected client; the gap is generic). The OpenAI→Claude, OpenAI→Kiro and OpenAI→Gemini/Antigravity translators now parse a string `image` data URL into each provider's native image shape (Claude `{source:{type:"base64"}}`, Kiro `images[].source.bytes`, Gemini `inlineData`). (#4345, thanks @mugnimaestra)
- fix(translator): Gemini accepts HTTP/HTTPS image URLs instead of silently dropping them. The OpenAI→Gemini request helper (`convertOpenAIContentToParts`) discarded remote `image_url` parts (emitting only a `console.warn`) because Gemini's `inlineData` needs base64 and the synchronous helper can't fetch+encode upstream. It now uses Gemini's native `fileData: { fileUri }` part for HTTP/HTTPS URLs (the model fetches the asset itself), so vision requests carrying a URL, not a `data:` URI, reach Gemini intact. (#4373, ported from 9router#344, thanks @diegosouzapw)
- fix(executors): strip `stream_options` for qwen non-streaming / thinking Claude-Code requests. Claude-Code-compatible providers force the executor-level `stream` flag on while the outgoing body keeps the caller's original `stream: false`, so `DefaultExecutor.transformRequest` injected `stream_options: { include_usage: true }` onto a body that still said `stream: false`, and qwen rejected it with `400 "'stream_options' only set this when you set stream: true"`. The executor now strips `stream_options` whenever the body's effective `stream` is false. (#4374, ported from 9router#663, thanks @anuragg-saxenaa / @diegosouzapw)
- fix(executors): don't inject `thinking` when `tool_choice` forces a tool (native Claude). The Claude-Code wire-image emulation injects `thinking: { type: "adaptive" }` for non-Haiku Claude models, but Anthropic rejects `thinking` when `tool_choice` forces a specific tool (`{type:"any"|"tool"}`) with `400 "Thinking may not be enabled when tool_choice forces tool use."`. Any Opus/Sonnet call that pins a tool (e.g. Claude Code's `message_user`, or agent harnesses that force a tool) hit a hard 400; the injection is now suppressed when `tool_choice` forces a tool. (#4389, thanks @NomenAK)
- fix(codex): request reasoning summaries on Codex Responses requests. Codex/OpenAI Responses can return reasoning-token accounting and empty reasoning items unless visible reasoning summaries are requested, so Codex CLI / pi.dev paths missed visible thinking text. OmniRoute now requests `reasoning.summary: "auto"` (and includes `reasoning.encrypted_content`) when reasoning is enabled, preserving an explicit client `reasoning.summary` and existing `include` entries, and skipping it for `reasoning.effort: "none"`. (#4359, thanks @xz-dev)
- fix(sse): default the combo per-target timeout to 120s for fast failover. A combo's per-target timeout inherited the full `FETCH_TIMEOUT_MS` (600s default) when the combo didn't set `targetTimeoutMs`, so a single hung/slow target (e.g. an openai-compatible upstream returning 524/504) could stall the whole combo for up to 10 minutes before failing over. A new `DEFAULT_COMBO_TARGET_TIMEOUT_MS = 120_000` is used as the default-when-unset in `resolveComboTargetTimeoutMs` (backward-compatible 3rd arg, wired in `phaseComboSetup`); an explicit ceiling/opt-out is preserved. (#4365, thanks @diegosouzapw)
- fix(cli): Tailscale login honors `TAILSCALE_AUTHKEY` for non-interactive sign-in. `startTailscaleLogin` built `tailscale up` without ever reading `process.env.TAILSCALE_AUTHKEY`, so on a pre-authenticated / headless daemon the login waited for an interactive auth URL and timed out (~15s). When `TAILSCALE_AUTHKEY` is set it is now passed via `--auth-key=` (as a spawn argv element, with no shell interpolation) so the daemon authenticates non-interactively; when unset, behavior is unchanged. (#4343, thanks @ipeterpetrus)
- fix(dashboard): OAuth modal shows the real error on a non-JSON server response. The OAuth connect/reauth modal called `await res.json()` unconditionally, so when a build/OAuth endpoint returned a plain-text error (e.g. a `500 Internal Server Error` page) the modal threw `Unexpected token 'I'…` and hid the real failure. Two shared helpers (`parseResponseBody` / `getErrorMessage` in `src/shared/utils/api.ts`) now read the body safely (JSON when it is JSON, raw text otherwise) and surface a clean message either way; all modal fetch sites use them. (#4351, thanks @DNNYF)
- fix(dashboard): a disabled connection's last error is now visible. The provider card's error badge counts a disabled connection (`isActive === false`) that has an error (its effective status is still error/expired/unavailable), but the connection row hid the `lastError` text for disabled rows, so the operator saw the error count without being able to see what failed. The row now shows the error text whenever there is one, regardless of the active toggle. (#4352, thanks @ntdung6868)
- fix(providers): the "Test Connection One-by-One" OAuth probe can no longer hang the queue forever. The OAuth connection-test path called bare `fetch(url, { method, headers })` with no `AbortController`/`signal`/timeout, so when a provider's probe endpoint accepted the socket but never responded, the awaited fetch never settled and the one-by-one test queue stalled indefinitely (the API-key path was already bounded via `validateProviderApiKey`'s `timeoutMs`). Both the initial probe and the post-refresh retry are now bounded with `AbortSignal.timeout(30s)`, matching the API-key path's 30s budget, and a timed-out probe resolves as a failure with a clear `Test timed out after 30s` message in the same shape as every other test error. (#4347, thanks @ntdung6868)
- fix(providers): a deactivated account is labeled distinctly from a revoked token. A Codex connection whose OAuth refresh is fully healthy but whose ChatGPT account has been deactivated by the provider gets a `401` from the upstream API. The connection test labeled that the same as a bad credential (`Token invalid or revoked` → `upstream_auth_error`), so the operator couldn't tell a deactivated account from a revoked token. The test now reads the `401`/`403` body and, when it indicates account deactivation, classifies it as `account_deactivated`, which the dashboard already renders as "Account Deactivated". A plain auth `401` is unchanged. (#4353, thanks @ntdung6868)
- fix(db): cascade-delete orphaned model aliases when a provider is removed. Deleting a custom provider removed its connections and node but left behind the imported model-alias rows (stored as `key=<alias>`, `value="<providerId>/<model>"`). Those stale aliases then blocked re-importing the same provider: the import dedup treated them as "already exists", so no new models appeared. A new `deleteModelAliasesForProvider(providerId)` DB helper drops every alias whose stored value begins with `<providerId>/` (leaving other providers and user-defined settings aliases untouched), and the provider-node DELETE handler now calls it after removing the connections and node, so a fresh import is unblocked. (#4348, thanks @nguyenvanhuy0612)
- fix(api): persist `max_input_tokens`/`max_output_tokens` when adding a custom model. `POST /api/provider-models` silently dropped the per-model token limits set in the "add custom model" form: the handler destructured the rest of the body but never read `max_input_tokens`/`max_output_tokens`, and `addCustomModel()` had no parameter for them, so the values were thrown away on write. The DB layer (`inputTokenLimit`/`outputTokenLimit`) and the `/v1/models` catalog already round-trip these fields; only the write path was missing. The validation schema now accepts the two optional limits, the handler forwards them, and `addCustomModel()` persists them so a custom model's context/output window survives into the catalog. (#4349, thanks @codename-zen)
- fix(plugin): the OpenCode static-catalog plugin prefixes combo/raw model keys with the provider id. OpenCode's static-catalog reader misdetected the `omniroute` provider: combo keys emitted as `combo/MASTER` were parsed as provider `combo` ("No credentials for provider: omniroute"), while a bare-`MASTER` form was misread as a model with no resolvable provider, and mixed `omniroute/MASTER` + bare-raw keys were rejected by OpenCode's schema. The plugin now emits every combo and raw model key prefixed with the `omniroute` provider id, emits the provider id explicitly, and drops the legacy `combo/` prefix, so the static-catalog reader detects the provider and the auth loader returns the right credentials (the catalog-fetch timeout was also raised so a cold-start server doesn't publish an empty stub). (#4384, thanks @herjarsa)
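
For readers who want a concrete picture of the embeddings fix (#4341), the "inject a model default only when the client omitted it" rule can be sketched roughly as below. This is an illustration under assumptions, not OmniRoute's actual code: `EmbeddingModelEntry`, `defaultParams`, `applyModelDefaults` and the `example/symmetric-model` entry are hypothetical names; only the NVIDIA model id and the `input_type: "query"` default come from the notes above.

```typescript
// Hypothetical shapes: the real registry and handler are not shown in these notes.
type EmbeddingBody = Record<string, unknown>;

interface EmbeddingModelEntry {
  id: string;
  // Model-level defaults, injected only when the client omitted the field.
  defaultParams?: Record<string, unknown>;
}

const registry: EmbeddingModelEntry[] = [
  { id: "nvidia/nv-embedqa-e5-v5", defaultParams: { input_type: "query" } },
  { id: "example/symmetric-model" }, // hypothetical symmetric model, no defaults
];

// Returns a copy of the client body with missing defaults filled in.
// Fields the client already sent are never overwritten.
function applyModelDefaults(
  model: EmbeddingModelEntry,
  body: EmbeddingBody
): EmbeddingBody {
  const merged: EmbeddingBody = { ...body };
  const defaults = model.defaultParams || {};
  for (const key of Object.keys(defaults)) {
    if (merged[key] === undefined) {
      merged[key] = defaults[key];
    }
  }
  return merged;
}
```

With this shape, a body without `input_type` gains `"query"` for the NVIDIA model, a body that already carries `input_type: "passage"` is forwarded unchanged, and a model with no `defaultParams` passes through untouched.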
🔒 Security
- fix(security): scope the OAuth callback `postMessage` to a trusted-origin allowlist. The OAuth callback at `/callback` previously posted `{ code, state, … }` to `window.opener.postMessage(…, "*")` whenever the opener was cross-origin, so a hostile page that opened the well-known redirect URI in a popup could receive the OAuth code/state and complete the flow as the user. The wildcard fallback is replaced with iteration over a fixed allowlist (same-origin + Codex's `localhost:1455`/`127.0.0.1:1455` loopback helper); the browser silently drops `postMessage` to any opener whose origin isn't listed. (#4372, ported from 9router#998, thanks @aeonframework / @diegosouzapw)
- fix(mitm): exact host membership in the MITM hosts test (CodeQL false positive). `tests/unit/mitm-tool-hosts.test.ts` checked host membership with `Array.includes(host)`, which CodeQL's `js/incomplete-url-substring-sanitization` heuristic misreads as a `String.includes()` URL-substring sanitization test (HIGH false positive). Switched to `.some((h) => h === host)`: identical semantics, no flagged pattern. (#4386)
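
The allowlist approach from #4372 can be pictured with the minimal sketch below. It is not the shipped callback code: `MessageTarget`, `buildAllowedOrigins`, `postOAuthResult` and `fakeOpener` are hypothetical names, and the `http://` scheme on the loopback origins is an assumption (the notes only give host and port). The `fakeOpener` test double imitates the browser rule that a message is delivered only when `targetOrigin` matches the receiving window's origin.

```typescript
// Minimal shape of window.opener, so the sketch also runs outside a browser.
interface MessageTarget {
  postMessage(message: unknown, targetOrigin: string): void;
}

// Assumption: the Codex loopback helper is served over plain http.
function buildAllowedOrigins(selfOrigin: string): string[] {
  return [selfOrigin, "http://localhost:1455", "http://127.0.0.1:1455"];
}

// Post once per allowlisted origin and never to "*". An opener whose origin
// is not in the list receives nothing.
function postOAuthResult(
  opener: MessageTarget,
  payload: { code: string; state: string },
  allowedOrigins: string[]
): void {
  for (const origin of allowedOrigins) {
    opener.postMessage(payload, origin);
  }
}

// Test double: records a message only if targetOrigin matches its own origin
// (or is the wildcard), mirroring the browser's delivery check.
function fakeOpener(actualOrigin: string, received: unknown[]): MessageTarget {
  return {
    postMessage(message: unknown, targetOrigin: string): void {
      if (targetOrigin === "*" || targetOrigin === actualOrigin) {
        received.push(message);
      }
    },
  };
}
```

The design point is that the sender never has to know which origin the opener actually has: it offers the payload to each trusted origin in turn and relies on the browser to discard the mismatches.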
📝 Maintenance
- docs: one-time feature-documentation catch-up (v3.8.20 → v3.8.30). Reconciled the docs with every user-facing feature shipped since v3.8.20: a new README ✨ What's New section; new guides for CLI integrations, MITM TPROXY transparent decrypt and delegated Anthropic Context Editing; refreshed AUTO-COMBO (`auto/<category>:<tier>` + Arena-ELO), API_REFERENCE (`x-omniroute-no-memory`), MEMORY (int8 quantization, off-by-default), RESILIENCE (model-lockout success-decay), RTK, AGENTBRIDGE, TRAFFIC_INSPECTOR, GUARDRAILS, CLOUD_AGENT, ENVIRONMENT; regenerated PROVIDER_REFERENCE (231 providers) and synced the provider count in README/CLAUDE/AGENTS. Going forward this runs every release (generate-release step 6b). (#4391)
- refactor(chatCore): extract the `checkHeapPressureGuard` leaf (god-file decomposition start). First increment of decomposing `chatCore.ts` (~5127 LOC, the hottest path: every chat request flows through `handleChatCore`). The V8 heap-pressure guard at the top of `handleChatCore` (rejects with 503 when `heapUsed` exceeds the shed threshold) is moved to a self-contained, co-located `utils/heapPressure.ts::checkHeapPressureGuard(...)` with no behavior change. (#4371, thanks @diegosouzapw)
- refactor(combo): de-dup the exhausted-target skip predicate across both dispatchers. The byte-identical `#1731`/`#1731v2` pre-check (skip a target already exhausted on the provider/connection within a request) lived in both combo dispatchers; extracted to a shared `combo/comboPredicates.ts` helper. (#4362, thanks @diegosouzapw)
- refactor(combo): de-dup the upstream-error exhaustion classification across both dispatchers. Both dispatchers ran a near-identical post-error block classifying the upstream error and updating the exhaustion Sets (`#1731` provider exhausted / `#1731v2` connection error / transient rate-limited); extracted to a shared `combo/targetExhaustion.ts::applyComboTargetExhaustion(...)`. (#4366, thanks @diegosouzapw)
- chore(cli): localize CLI / scraping copy and stabilize fetch, memory & coverage handling. Localizes CLI and scraping UX copy plus the Adapta onboarding tutorial (and corrects the CLI Code page title), makes fetch retries honor the start timeout, tightens SSE/response typing, respects configured memory token limits during search, and reduces CI coverage-merge memory by merging V8 data incrementally. (#4383, thanks @JxnLexn)
- test(combo): reset circuit breakers between stream-readiness cases (restore green). A stream-readiness fallback case failed on the release branch since the cycle-open tip due to test isolation: earlier combo-dispatch cases in the same file deliberately fail `glm` (tripping the module-level provider circuit breaker), and that OPEN state leaked into the next test so `combo.ts` skipped the model. The test now resets the circuit breakers between cases. (#4396, thanks @diegosouzapw)
- chore(quality): reconcile the complexity ratchet baseline (1896 → 1900). Absorbs the small complexity-metric increase from the v3.8.31 `/review-prs` merge batch into `quality-baseline.json` so the ratchet reflects the shipped code (no production change). (#4410, thanks @diegosouzapw)
- test/gate: reconcile release-time drift surfaced by the full CI gate. Three already-merged changes left the release branch's full-CI gate red (the per-PR fast gates don't run it): the Gemini `convertOpenAIContentToParts` tests were realigned to the #4373 HTTP/HTTPS-URL `fileData` pass-through (they still asserted the old warn-and-drop behavior), the `t11` any-budget for `open-sse/executors/base.ts` was raised to 2 with a justification (#4389 compares `tool_choice` against the string literal `"any"`, not a TS `any` type), and the #4384 opencode-plugin combos test's net-assert reduction (dropping the obsolete `combo/` namespace) was allowlisted. No production behavior change. (thanks @diegosouzapw)
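
As a rough picture of what a self-contained heap-pressure guard like the one extracted in #4371 might look like, here is a minimal sketch. The signature, the `ShedResponse` shape and the message text are assumptions (the notes elide the real arguments as `(...)`); only the "reject with 503 when `heapUsed` exceeds the shed threshold" behavior is taken from the entry above.

```typescript
// Hypothetical result shape for a shed (rejected) request.
interface ShedResponse {
  status: 503;
  message: string;
}

// Hypothetical signature: takes the current heap usage and the shed threshold
// as plain numbers so the guard stays pure and easy to unit-test.
// In Node, a caller would typically pass process.memoryUsage().heapUsed.
function checkHeapPressure(
  heapUsedBytes: number,
  shedThresholdBytes: number
): ShedResponse | null {
  if (heapUsedBytes > shedThresholdBytes) {
    return { status: 503, message: "Server under memory pressure" };
  }
  // Below or at the threshold: let the request proceed.
  return null;
}
```

Keeping the guard a pure function of two numbers is one way to make the extraction a no-behavior-change refactor that can be tested without touching the rest of the chat path.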
What's Changed
- test: clear CodeQL js/incomplete-url-substring-sanitization FP (#660) by @diegosouzapw in #4387
- Release v3.8.31 by @diegosouzapw in #4377
Full Changelog: v3.8.30...v3.8.31