diegosouzapw/OmniRoute v3.8.38 on GitHub

✨ New Features

feat(sidebar): colored menu icons — sidebar menu icons now render with a per-item accent color: curated colors for known items (SIDEBAR_ICON_ACCENTS) plus a deterministic hash-based fallback (getSidebarIconAccent) so every item gets a stable, distinct color across sessions. (#3812 — thanks @rafacpti23)
feat(providers): add Factory (factory.ai) as a subscription gateway provider — factory (Factory Droids' hosted gateway) is now a first-class routing provider on the OpenAI-compatible https://api.factory.ai/v1 endpoint with Bearer apikey auth; the key is supplied from the Dashboard connection (not env). (#5065 — thanks @KooshaPari)
feat(providers): add Grok Build (xAI) provider with OAuth import-token flow — grok-cli (alias gc) routes through Grok's CLI chat proxy; users paste their ~/.grok/auth.json (or the JWT), with automatic refresh_token rotation. The public xAI client_id is embedded via resolvePublicCred("grok_id") (Hard Rule #11), never a literal. (#5020 — thanks @fulorgnas)
feat(dashboard): click-to-edit model alias in the provider page — click an alias to edit it inline (Enter/blur saves, Escape cancels), instead of only being able to delete and re-add it. (#5119 — thanks @waguriagentic)
feat(providers): add ZenMux Free (session-cookie free-tier) provider — zenmux-free (alias zmf) with a dedicated executor translating ZenMux's Anthropic-style SSE to OpenAI format; ships 12 free-tier models (DeepSeek V3.2, GLM 4.7 Flash Free, etc.). (#5105 — thanks @mrnasil)
feat(providers): allow local/private provider URLs by default (Allow Local Provider URLs flag) — adding/validating an OpenAI-compatible provider on a loopback/LAN address (e.g. http://127.0.0.1:3264/api) was rejected by the SSRF guard with "Blocked private or local provider URL", even though OmniRoute is local-first. A new OMNIROUTE_ALLOW_LOCAL_PROVIDER_URLS feature flag (default ON, toggle in Settings → Feature Flags) now scopes the provider-validation guard to allow local/private hosts while still blocking cloud-metadata endpoints (169.254.169.254, metadata.google.internal). Disable it to restore strict public-only blocking. Webhook/remote-image SSRF defaults are unchanged. (#5066, thanks @daniij)
feat(blackbox): refresh provider model catalog with latest models. (thanks @ptkelanatechsolutions)
feat(kiro): inline <thinking> stream splitter — when <thinking_mode>enabled</thinking_mode> is present, assistantResponseEvent content is now split into separate delta.content / delta.reasoning_content SSE chunks (new open-sse/executors/kiroThinking.ts module wired into KiroExecutor.transformEventStreamToSSE).
feat(cursor): parse Cursor Composer DeepSeek-style inline tool calls — Composer cu/composer-2.5* models embed tool invocations in their visible text using <｜tool▁calls▁begin｜>…<｜tool▁calls▁end｜> markers instead of structured protobuf frames; a new streaming parser (composerToolCalls.ts) intercepts these in both streaming and non-streaming paths, suppresses the markers from the client-visible content, and emits proper OpenAI tool_calls deltas so downstream clients handle them natively. (thanks @noestelar)
feat(proxy): support auth-less host:port batch import and surface proxy-test failures. (thanks @dimaslanjaka)
feat(video): Alibaba DashScope video provider (wan2.7-t2v) — adds the alibaba video provider (DashScope async task → poll → MP4) wired through the standard apikey credential path, so text-to-video requests can route to Alibaba's wan2.7-t2v model. (thanks @josevictorferreira)
feat(cc): per-connection "summarized thinking display" toggle for Claude-Code-compatible providers — exposes a connection-level toggle that drives the existing Copilot summarized-thinking marker, so operators can opt a CC-compatible connection into summarized reasoning display from the UI (schema + request defaults + provider modals, with i18n). (thanks @rdself)
feat(compression): compression playground in the studio (Play + Compare tabs) — /dashboard/compression/studio gains a synthetic playground: paste text → per-engine lanes (each deterministic engine run alone via /api/compression/preview) plus a combined waterfall ordered by stackPriority, and a free A/B Compare grid with on-demand, USD-capped fidelity verdicts (/api/compression/compare + compare/verify). The preview route now uses the real cl100k tokenizer, returns engineBreakdown, and accepts an ordered pipeline[]; new compare / compare/verify / retrieve routes; the live WS feed moved to /dashboard/compression/live. Management-only. (#5080)
feat(dashboard): expose Fusion judgeModel + fusionTuning in the combo editor — the Fusion strategy editor now surfaces the judge model (synthesizes the panel answers; defaults to the first panel model) plus the quorum-grace tuning fields (minPanel, stragglerGraceMs, panelHardTimeoutMs) that open-sse/services/fusion.ts already reads. Schema-validated + bounded; empty tuning is never persisted. (#5074)
feat(compression): opt-in per-step fidelity gate for the stacked pipeline — each compression step can now be guarded by a pure fidelity checker (4 invariants, fail-open) so a lossy engine that would degrade the prompt past a threshold is rejected and its lane skipped instead of silently shipping. Configurable via fidelityGate (advanced thresholds intentionally API-omitted), with a per-lane rejection breakdown surfaced in the studio playground toggle. (#5143)
feat(compression): fuzzy near-duplicate dedup (session-dedup 2nd pass) — the session-dedup engine gains a second fuzzy pass that collapses near-duplicate (not just byte-identical) segments, with a playground toggle to compare on/off. (#5143)
feat(quota): opt-in Codex/Claude auto-ping keepalive — an opt-in background keepalive can periodically ping Codex/Claude connections to keep their session/quota state warm, reducing cold-start failures on the first real request. (#5102)
feat(ops): SRE playbooks + ops helper scripts — salvaged from a closed stale PR; adds operator runbooks and ops helper scripts. (#5138 — thanks @KooshaPari / @diegosouzapw)
feat(mcp): web-session robustness — cookie dedup + browser-pool observability — the MCP web-session path now de-duplicates cookies when (re)hydrating a session (avoiding conflicting duplicate Cookie headers) and exposes browser-pool observability (pool size / in-use / acquisition metrics) for the headless web providers. (#5121, builds on #3368)
feat(compression): Ionizer engine — lossy JSON-array sampling reversible via CCR — a new compression engine that down-samples large JSON arrays to a representative subset and records a Compact Change Representation (CCR) so the omitted rows can be reconstructed, trading exactness for a large token reduction on tabular/array-heavy payloads. (#5148)
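
The colored-sidebar entry above describes curated colors plus a deterministic hash-based fallback. A minimal sketch of that idea follows; it is not the project's getSidebarIconAccent implementation. The curated map contents, the FNV-1a hash, the HSL output format, and the name sidebarAccentSketch are all assumptions made for illustration.

```typescript
// Hypothetical curated accents; the real SIDEBAR_ICON_ACCENTS values are not shown in these notes.
const CURATED_ACCENTS_SKETCH = new Map<string, string>([
  ["dashboard", "#3b82f6"],
  ["providers", "#10b981"],
]);

// FNV-1a 32-bit hash (an assumed choice): the same input always yields the same unsigned integer.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep the value in unsigned 32-bit range
  }
  return hash;
}

// Curated color when one exists, otherwise a hue derived from the item id.
function sidebarAccentSketch(itemId: string): string {
  const curated = CURATED_ACCENTS_SKETCH.get(itemId);
  if (curated !== undefined) return curated;
  const hue = fnv1a32(itemId) % 360;
  return `hsl(${hue}, 65%, 50%)`;
}

console.log(sidebarAccentSketch("providers"));     // curated entry
console.log(sidebarAccentSketch("some-new-item")); // hash-derived hue, identical on every run
```

Because the fallback depends only on the item id, the color is stable across sessions without storing anything. Two ids can still hash to the same hue, so a sketch like this gives stability rather than a strict uniqueness guarantee.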

🔧 Bug Fixes

fix(proxy): make the SOCKS5 handshake timeout operator-tunable (SOCKS_HANDSHAKE_TIMEOUT_MS) — under high concurrency against a single residential gateway host, the SOCKS5 connect handshake could exceed the hardcoded 10s even though the proxy was reachable, surfacing as a false [Proxy Fast-Fail] Proxy unreachable (the pool size is already tunable via OMNIROUTE_PROXY_DISPATCHER_CONNECTIONS). The handshake timeout now reads SOCKS_HANDSHAKE_TIMEOUT_MS (default unchanged at 10000, capped at 120000) so a concurrency-heavy deployment can raise it without a code change. Mitigation for #5109 (the full concurrency-100 collapse still needs the reporter's live load-test confirmation). (#5109)
fix(api): resolve GET /v1/models/{id} case-insensitively — clients that normalise the model id (e.g. OpenCode requesting minimax/minimax-m3 for the canonical catalog entry minimax/MiniMax-M3) missed the single-model lookup, which was case-sensitive, and fell back to advertising context_length: 0. findModelById now prefers an exact-case match and falls back to a case-insensitive match, so the real entry (and its context window) is returned regardless of casing. (#5082)
fix(services): embed WS proxy honours LIVE_WS_HOST; reject empty messages early — two headless/Docker deployment fixes (#5110). The embed WebSocket proxy (:20131) only read EMBED_WS_PROXY_HOST, so behind a reverse proxy/tunnel it stayed bound to 127.0.0.1 even with LIVE_WS_HOST=0.0.0.0 set and the Live dashboard showed "WebSocket disconnected"; it now falls back to LIVE_WS_HOST (default still loopback). Separately, a request with an explicitly empty messages: [] array was forwarded upstream and bounced back as a confusing raw 400/502; handleChat now rejects it up front with a clear messages: at least one message is required (Responses-API input requests are unaffected). (#5110)
fix(proxy): repair one-click Deno & Cloudflare relay deployments — the /api/settings/proxy/test endpoint only recognized the vercel relay type, so testing a deployed Deno or Cloudflare relay returned proxy.type must be http, https, or socks5 and never reached the relay; it now routes all relay types through isRelayType(). On installs with STORAGE_ENCRYPTION_KEY the relay-auth token is read via extractRelayAuth (encrypted relayAuthEnc form), fixing the silent 401 that left publicIp null. The Cloudflare Worker upload now sends the script part as application/javascript (the API rejects application/javascript+module; ES-module semantics come from main_module), and the proxy-registry schema accepts the deno/cloudflare types + deno-relay/cloudflare-relay sources so editing a deployed relay no longer 400s. (#5128)
fix(kiro): retire claude-sonnet-4.5 from the Kiro catalog + pin the exact Kiro 400 error — claude-sonnet-4.5 left the Kiro free-tier lineup (current active models: Opus 4.8/4.7/4.6, Sonnet 4.6, Haiku 4.5), so it is removed from the Kiro registry entry and the free-model catalog. A regression test now pins Kiro's verbatim [400] Invalid model. Please select a different model to continue. to the isModelUnavailableError model-unavailable classification. A 400 on every model (including current ones) points to a server-side Kiro tier/region gate, not an OmniRoute catalog bug. (#5140, closes #4484)
fix(dashboard): preserve every rendered field when loading/saving Resilience settings — ResilienceTab renders comboCooldownWait and quotaShareConcurrencyLimit, but both the initial-load and save paths rewrote component state without those fields, so after a successful /api/resilience response the cards received undefined and the page fell back to the generic "failed to load" state. A shared toResilienceResponse() mapper now keeps all rendered fields, and PATCH /api/resilience returns quotaShareConcurrencyLimit to match GET and the UI contract. (#5139 — thanks @rdself)
fix(quota): hydrate the in-memory quota cache from snapshots + scope auto-combo candidates — after a restart the quota cache was empty, so a known-exhausted connection looked healthy until re-queried; isAccountQuotaExhausted now lazily hydrates from persisted quota_snapshots. Auto-combo candidate expansion is also scoped to the connections each combo target actually allows, instead of pulling in every connection for the provider. (#5015 — thanks @JxnLexn)
fix(resilience): harden quota cutoff, Gemini audio MIME, and model-lockout cooldown — stored quota hard-cutoff values are no longer coerced to enabled=true from arbitrary strings; Gemini audio input parts have their MIME type validated/normalized before forwarding; and model lockout now honours the configured maxCooldownMs ceiling. (#5093 — thanks @KooshaPari)
fix(streaming): harden long OpenAI-compatible SSE streams — a late pipeline-wind-down error can no longer overwrite an already-recorded successful stream (streamCompletionRecorded guard), client disconnects finalize as 499 client_disconnected instead of poisoning provider/account failure state, JSON bodies that are actually SSE (wrong application/json content-type) are sniffed and re-streamed, and reasoning fields (reasoning/reasoning_content + OpenRouter/Gemini encrypted reasoning_details) are preserved through the JSON-as-SSE fallback. (#5124 — thanks @rdself)
fix(usage): dedupe request-usage logging and debounce stats events — saveRequestUsage now guards against duplicate inserts (natural key: timestamp + provider + model + connection + api-key + token counts), back-fills a missing endpoint, and only emits usageRecorded when a row was actually inserted; stats update/pending event bursts are collapsed into a single debounced notification to reduce churn. (#4940 — thanks @nguyenxvotanminh3)
fix(sse): convert the native Gemini request body to OpenAI format in the Antigravity MITM handler — contents / systemInstruction / generationConfig / thinkingConfig are now translated to OpenAI chat-completions format before forwarding to /v1/chat/completions, so thinking-capable models (e.g. ag/claude-opus-4-6-thinking) no longer fail with provider-side 400 "invalid argument" errors. (#4845 — thanks @anuragg-saxenaa)
fix(db): translate the two pt-BR SQLite driver-fallback log lines to English — [DB] Pré-inicializando sql.js WASM… and [DB] Drivers síncronos indisponíveis… were the only non-English server log strings, mixing languages in the logs. Now [DB] Pre-initializing sql.js WASM (synchronous drivers unavailable)… / [DB] Synchronous drivers unavailable — falling back to sql.js (WASM), guarded by a test that scans the driver path for accented log strings. (#5103)
fix(diagnostics): non-streaming Claude responses no longer false-502 as empty_choices — the v3.8.37 malformed-200 detector (#4942) only understood OpenAI choices and Responses-API output shapes, so a /v1/messages response that stays in Claude shape ({type:"message", content:[…]}) fell through to empty_choices → 502 (cascading to "All models failed" in a combo). Most visibly, an extended-thinking turn whose buffered body is a single empty thinking block with a valid signature (Claude Code's non-streaming Bash classifier) 502'd on every call. detectMalformedNonStream now understands the Claude shape: text/tool_use blocks and thinking blocks carrying a signature count as valid output, while a genuinely empty content:[] is still flagged. (#5108, thanks @insoln)
fix(combo): empty-content 502 now fails over within the same request instead of exhausting the provider — a leg that answers HTTP 200 with no usable completion is rewritten to 502 "Provider returned empty content", but the combo exhaustion classifier treated that synthetic 502 as a connection-level failure (#1731v2) and marked the whole provider/connection exhausted, skipping every remaining same-provider leg in that request. The connection is actually healthy (it just returned an empty body), so empty-content 502s are now classified as model-level transient failures: the request advances to the next leg and the rest of that provider's legs stay eligible. Genuine gateway 502s still trip connection exhaustion. (#5085, thanks @andrea-kingautomation)
fix(dashboard): surface the detailed credential-validation error instead of a bare "invalid" badge — the inline "Check" in the Add-Connection modal discarded the error message returned by /api/providers/validate and showed only an invalid badge. For web providers (claude-web / chatgpt-web) the real cause is often an environment error the backend already reports (e.g. TLS impersonation client failed to start: EACCES … mkdir tls-client-node/bin), so users were left guessing. The modal now renders the full reason next to the badge. (#5088, thanks @tkhs101)
fix(executors): strip client_metadata from forwarded body for Cerebras and Mistral — Cerebras returns 400 (wrong_api_format) and Mistral returns 422 (extra_forbidden) when the passthrough body carries client_metadata (an OpenAI Codex / Claude CLI field with no equivalent on these upstreams). The default executor now drops it for these two providers before sending downstream; other providers (notably openai/codex) keep it. (thanks @saurabh321gupta)
fix(codebuddy): only send reasoning params when the client requests reasoning. (thanks @anki1kr)
fix(sse): keep streaming for forceStream providers even when the client asks for a non-streaming JSON response. Providers marked forceStream:true reject stream:false upstream (HTTP 400); resolveStreamFlag now guards against this, so stream-only providers keep streaming even when the client sends Accept: application/json or stream:false. (thanks @anki1kr)
fix(sse): prevent non-JSON SSE lines and duplicate [DONE] from breaking clients. (thanks @qianze0628)
fix(sse): dedupe case-variant Anthropic headers in the executor buildHeaders path — Node/undici's fetch merges anthropic-version and Anthropic-Version into a single "v, v" value that the Anthropic API rejects, so both case variants are now collapsed to one canonical lowercase header (same for anthropic-beta). (thanks @Delcado19)
oauth(kiro): support Kiro IDC (organization) token import — when the ~/.aws/sso/cache token carries a clientIdHash, auto-import now reads the linked client registration file to obtain clientId/clientSecret, probes the Kiro IDE profile.json for profileArn (ARN region normalized to us-east-1 for the runtime gateway), and refreshes via the regional AWS OIDC endpoint instead of the social path; the import schema and modal forward these credentials so manual imports also work for IDC tokens. (thanks @enjoyer-hub)
fix(translator): preserve client cache_control breakpoints when routing Claude-format requests (e.g. Claude Code) to Alibaba DashScope's OpenAI-compatible providers (alibaba / alibaba-cn). The Claude→OpenAI translation previously stripped the markers from the system and message text blocks, so DashScope's explicit caching never engaged and every request was a cache miss. Cache hints now survive when preservation is requested for caching-capable OpenAI-format providers. (thanks @sacrtap)
fix(tts): resolve Gemini TTS models from catalog and add gemini-3.1-flash-tts-preview as the new default Vertex TTS model. (thanks @nguyenha935)
fix(sse): don't cool down a healthy connection on a self-inflicted upstream timeout (504) — when OmniRoute's own deadline elapses (surfaced as TimeoutError/BodyTimeoutError → 504), the connection is no longer disabled/failed-over, so a slow-but-healthy provider isn't penalised for our timeout. Genuine upstream 5xx/429 still trigger cooldown; antigravity keeps its own policy. (thanks @costaeder)
fix(translator): forward image tool_result blocks as image_url instead of stringifying base64. (thanks @alican532)
fix(sse): robust Anthropic /v1/messages streaming — real ping keepalive + client-disconnect guard — slow first tokens on reasoning models could trip strict clients' idle-read watchdog; the route now keeps the stream warm with a real event: ping (Anthropic clients ignore SSE comments) from the very first frame, and a client disconnect (AbortError / controller-closed) no longer counts as a provider failure (no failover/cooldown). (thanks @costaeder)
fix: preserve model hidden flags (isHidden) across model sync — replaceCustomModels pruned the compat-override list to the new custom-model ids, silently wiping the isHidden flag of eye-hidden SYNCED models on every periodic sync / import (all hidden models turned back on). The redundant cleanup is removed (per-model removal already handles its own compat cleanup), so eye-hidden models stay hidden across re-sync. (#5086 — thanks @herjarsa)
fix(models): derive model-discovery config from the registry modelsUrl — providers absent from the hardcoded PROVIDER_MODELS_CONFIG but carrying a registry modelsUrl (e.g. MiniMax) now get an auto-derived Bearer /v1/models discovery config, so "discover models" works instead of returning nothing. (thanks @herjarsa)
fix(compression): resolve worker + rule/filter assets via runtime anchors (standalone bundle) — the LLMLingua worker and the RTK rule/filter loaders relied on fileURLToPath(import.meta.url), which the standalone bundle freezes to the build-machine path, so the worker never spawned and rule/filter packs failed to resolve. They now anchor on process.cwd()/argv[1] (with pathToFileURL for the worker URL). (thanks @fulorgnas)
fix(api): sanitize error responses on seven management routes (Rule #12 hardening) — cli-tools/backups, cli-tools/guide-settings/[toolId], logs/export, models/catalog, providers/test-batch, settings/import-json and usage/proxy-logs no longer return raw error.message; they wrap caught errors in sanitizeErrorMessage(...), and the routes are removed from the check-error-helper allowlist. (thanks @JxnLexn)
fix(sse): keep output_text-only Responses bodies from being dropped/false-502'd — some upstreams return a shorthand Responses body whose answer is only in output_text with an empty output[]. sanitizeResponsesApiResponse discarded the text, so the response then tripped the malformed-200 guard. The sanitizer now synthesizes an output[] message item from a non-empty output_text (complements the Claude-native fix in #5108; both stem from #4942).
fix(executors): preserve a lone caller-supplied Anthropic-Version header casing — the case-variant dedupe (#4846) unconditionally rewrote Anthropic-Version/Anthropic-Beta to lowercase even when only one variant was present, clobbering the caller's header. Dedupe now runs only when both case variants coexist (the actual undici-merge collision it was meant to fix).
fix(responses): default text.format to { type: "text" } for openai-compatible responses providers — some Responses-compatible upstreams (e.g. LM Studio) reject a text object missing text.format with a 400 missing_required_parameter; the default executor now fills the Responses-API default before forwarding (guarded to openai-compatible-*responses*, never overwriting an existing format). (thanks @StevanusPangau)
fix(translator): stop stripping client-provided reasoning_content for reasoning-replay providers — the #4849 agentic-context strip (which drops reasoning_content from tool-call assistant turns to avoid O(n²) token growth) ran unconditionally, so replay providers (DeepSeek V4, Kimi K2, Qwen-Thinking, etc.) lost the client's reasoning and the reasoning-replay cache then overwrote it with a stale cached value (and such upstreams 400 without the original reasoning). The strip now skips reasoning-replay targets while non-reasoning providers keep the O(n²) protection. (#5122)
fix(providers): add MiniMax M3 & Nemotron 3 Ultra to the Cline catalog — the two models were missing from Cline's provider catalog and could not be selected; both are now registered. (#5136, closes #3321)
fix(dashboard): key model-visibility toggle on the canonical providerId — the per-model visibility toggle keyed off a display id, so toggling a model on one provider alias could mis-target another; it now keys on the canonical providerId. (#5091 — thanks @Theadd)
fix(diagnostics): recognize the Claude API format in detectMalformedNonStream — adds a salvaged null guard so a Claude-shaped non-streaming body is no longer misclassified. (#5141 — thanks @herjarsa / @diegosouzapw)
fix(logging): track the final connection IDs in failover logs — failover log lines now record the connection that actually served (or last failed) the request, instead of only the first attempt. (#5016 — thanks @JxnLexn)
fix(sse): ignore disconnect races during in-band stream error handling — a client disconnect that races with in-band upstream error handling no longer surfaces as a spurious provider failure. (#5007 — thanks @JxnLexn)
fix(dashboard): surface the server error on handleToggleCombo failure — a failed combo toggle now shows the backend error instead of silently no-op'ing. (#5138 — thanks @KooshaPari / @diegosouzapw)
fix(quota): track provider quota reset windows + enrich the Codex playground — observed quota reset windows are tracked and surfaced, and the Codex playground gains the enriched quota metadata. (#5141 — thanks @Witroch4 / @diegosouzapw)
fix(sidebar): drop the orphan settings accent color — removed a dangling accent-color entry that broke typecheck:core. (#5142)
fix(sse): preserve non-stream reasoning fields for compatible clients — non-streaming responses now keep the upstream reasoning fields (reasoning / reasoning_content and OpenRouter/Gemini reasoning_details) instead of stripping them in responseSanitizer, so clients that render reasoning on buffered responses no longer lose it. (#5155 — thanks @rdself)
fix(i18n): add missing English UI labels — fills in untranslated English strings that were surfacing as raw keys in the dashboard. (#5153 — thanks @rdself)
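
The GET /v1/models/{id} fix above (#5082) describes a two-step lookup: exact-case first, case-insensitive second. The sketch below illustrates that order under stated assumptions; the ModelEntrySketch shape, the function name findModelByIdSketch, and the context_length value are hypothetical and do not come from the OmniRoute source.

```typescript
// Hypothetical catalog entry shape, reduced to the two fields the entry mentions.
interface ModelEntrySketch {
  id: string;
  context_length: number;
}

// Prefer an exact-case match, then fall back to a case-insensitive one.
function findModelByIdSketch(
  catalog: readonly ModelEntrySketch[],
  requestedId: string,
): ModelEntrySketch | undefined {
  const exact = catalog.find((m) => m.id === requestedId);
  if (exact !== undefined) return exact;
  const lowered = requestedId.toLowerCase();
  return catalog.find((m) => m.id.toLowerCase() === lowered);
}

// The context_length below is a placeholder, not the model's real window.
const catalogSketch: ModelEntrySketch[] = [
  { id: "minimax/MiniMax-M3", context_length: 100000 },
];

const hit = findModelByIdSketch(catalogSketch, "minimax/minimax-m3");
console.log(hit?.id); // prints "minimax/MiniMax-M3"
```

Checking the exact-case match first keeps behaviour unchanged for catalogs that happen to contain two ids differing only by case; the fallback only applies when no exact match exists.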

🔒 Security

fix(security): exact-host Anthropic baseUrl check — the Anthropic base-URL guard used a substring match that a crafted host could partially satisfy; it now requires an exact host match (resolves CodeQL js/incomplete-url-substring-sanitization alert #674). (#5130)
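
To illustrate why a substring match is weaker than an exact host match, here is a minimal sketch. It is not the project's guard: the function names are hypothetical, and api.anthropic.com is assumed as the expected host.

```typescript
// Assumed expected host for this illustration.
const EXPECTED_ANTHROPIC_HOST = "api.anthropic.com";

// Substring matching: accepts any URL that merely contains the host text.
function substringHostCheck(baseUrl: string): boolean {
  return baseUrl.includes(EXPECTED_ANTHROPIC_HOST);
}

// Exact matching: parse the URL and compare the hostname as a whole.
function exactHostCheck(baseUrl: string): boolean {
  let parsed: URL;
  try {
    parsed = new URL(baseUrl);
  } catch {
    return false; // unparseable input is never treated as Anthropic
  }
  return parsed.hostname === EXPECTED_ANTHROPIC_HOST;
}

const crafted = "https://api.anthropic.com.evil.example/v1";
console.log(substringHostCheck(crafted)); // prints true
console.log(exactHostCheck(crafted));     // prints false
console.log(exactHostCheck("https://api.anthropic.com/v1")); // prints true
```

Parsing first also handles the case where the expected host appears only in a path or query string, which a substring test would accept.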

📝 Maintenance

refactor(store): remove dead legacy store modules — salvaged cleanup of unused legacy store code. (#5138 — thanks @JxnLexn / @diegosouzapw)
test(combo): deterministic routing-decision matrix for all 17 strategies — a deterministic E2E matrix pins the routing decision of every combo strategy. (#5146)
chore: baseline reconciliations (complexity / file-size / cognitive), golden-snapshot + apikey-count alignment for new providers, orphan-test relocation, release base-red repairs, CHANGELOG i18n mirror sync, and an actions/cache 5→6 bump. (#5145, #5144, #5125, #5126, #5120, #5117, #5112)
test: gated live smoke for combo strategies (in-process + VPS HTTP) and refreshed release expectations to match current code. (#5151, #5150 — thanks @KooshaPari / @diegosouzapw)