✨ New Features
- feat(sidebar): colored menu icons — sidebar menu icons now render with a per-item accent color: curated colors for known items (`SIDEBAR_ICON_ACCENTS`) plus a deterministic hash-based fallback (`getSidebarIconAccent`) so every item gets a stable, distinct color across sessions. (#3812 — thanks @rafacpti23)
- feat(providers): add Factory (factory.ai) as a subscription gateway provider — `factory` (Factory Droids' hosted gateway) is now a first-class routing provider on the OpenAI-compatible `https://api.factory.ai/v1` endpoint with Bearer apikey auth; the key is supplied from the Dashboard connection (not env). (#5065 — thanks @KooshaPari)
- feat(providers): add Grok Build (xAI) provider with OAuth import-token flow — `grok-cli` (alias `gc`) routes through Grok's CLI chat proxy; users paste their `~/.grok/auth.json` (or the JWT), with automatic `refresh_token` rotation. The public xAI client_id is embedded via `resolvePublicCred("grok_id")` (Hard Rule #11), never a literal. (#5020 — thanks @fulorgnas)
- feat(dashboard): click-to-edit model alias in the provider page — click an alias to edit it inline (Enter/blur saves, Escape cancels), instead of only being able to delete and re-add it. (#5119 — thanks @waguriagentic)
- feat(providers): add ZenMux Free (session-cookie free-tier) provider — `zenmux-free` (alias `zmf`) with a dedicated executor translating ZenMux's Anthropic-style SSE to OpenAI format; ships 12 free-tier models (DeepSeek V3.2, GLM 4.7 Flash Free, etc.). (#5105 — thanks @mrnasil)
- feat(providers): allow local/private provider URLs by default (`Allow Local Provider URLs` flag) — adding/validating an OpenAI-compatible provider on a loopback/LAN address (e.g. `http://127.0.0.1:3264/api`) was rejected by the SSRF guard with "Blocked private or local provider URL", even though OmniRoute is local-first. A new `OMNIROUTE_ALLOW_LOCAL_PROVIDER_URLS` feature flag (default ON, toggle in Settings → Feature Flags) now scopes the provider-validation guard to allow local/private hosts while still blocking cloud-metadata endpoints (169.254.169.254, metadata.google.internal). Disable it to restore strict public-only blocking. Webhook/remote-image SSRF defaults are unchanged. (#5066, thanks @daniij)
- feat(blackbox): refresh provider model catalog with latest models. (thanks @ptkelanatechsolutions)
- kiro: inline `<thinking>` stream splitter — when `<thinking_mode>enabled</thinking_mode>` is present, `assistantResponseEvent` content is now split into separate `delta.content`/`delta.reasoning_content` SSE chunks (new `open-sse/executors/kiroThinking.ts` module wired into `KiroExecutor.transformEventStreamToSSE`).
- feat(cursor): parse Cursor Composer DeepSeek-style inline tool calls — Composer `cu/composer-2.5*` models embed tool invocations in their visible text using `<|tool▁calls▁begin|>…<|tool▁calls▁end|>` markers instead of structured protobuf frames; a new streaming parser (`composerToolCalls.ts`) intercepts these in both streaming and non-streaming paths, suppresses the markers from the client-visible content, and emits proper OpenAI `tool_calls` deltas so downstream clients handle them natively. (thanks @noestelar)
- feat(proxy): support auth-less `host:port` batch import and surface proxy-test failures. (thanks @dimaslanjaka)
- feat(video): Alibaba DashScope video provider (`wan2.7-t2v`) — adds the `alibaba` video provider (DashScope async task → poll → MP4) wired through the standard apikey credential path, so text-to-video requests can route to Alibaba's `wan2.7-t2v` model. (thanks @josevictorferreira)
- feat(cc): per-connection "summarized thinking display" toggle for Claude-Code-compatible providers — exposes a connection-level toggle that drives the existing Copilot summarized-thinking marker, so operators can opt a CC-compatible connection into summarized reasoning display from the UI (schema + request defaults + provider modals, with i18n). (thanks @rdself)
- feat(compression): compression playground in the studio (Play + Compare tabs) — `/dashboard/compression/studio` gains a synthetic playground: paste text → per-engine lanes (each deterministic engine run alone via `/api/compression/preview`) plus a combined waterfall ordered by `stackPriority`, and a free A/B Compare grid with on-demand, USD-capped fidelity verdicts (`/api/compression/compare` + `compare/verify`). The preview route now uses the real cl100k tokenizer, returns `engineBreakdown`, and accepts an ordered `pipeline[]`; new `compare`/`compare/verify`/`retrieve` routes; the live WS feed moved to `/dashboard/compression/live`. Management-only. (#5080)
- feat(dashboard): expose Fusion `judgeModel` + `fusionTuning` in the combo editor — the Fusion strategy editor now surfaces the judge model (synthesizes the panel answers; defaults to the first panel model) plus the quorum-grace tuning fields (`minPanel`, `stragglerGraceMs`, `panelHardTimeoutMs`) that `open-sse/services/fusion.ts` already reads. Schema-validated + bounded; empty tuning is never persisted. (#5074)
- feat(compression): opt-in per-step fidelity gate for the stacked pipeline — each compression step can now be guarded by a pure fidelity checker (4 invariants, fail-open) so a lossy engine that would degrade the prompt past a threshold is rejected and its lane skipped instead of silently shipping. Configurable via `fidelityGate` (advanced thresholds intentionally API-omitted), with a per-lane rejection breakdown surfaced in the studio playground toggle. (#5143)
- feat(compression): fuzzy near-duplicate dedup (session-dedup 2nd pass) — the session-dedup engine gains a second fuzzy pass that collapses near-duplicate (not just byte-identical) segments, with a playground toggle to compare on/off. (#5143)
- feat(quota): opt-in Codex/Claude auto-ping keepalive — an opt-in background keepalive can periodically ping Codex/Claude connections to keep their session/quota state warm, reducing cold-start failures on the first real request. (#5102)
- feat(ops): SRE playbooks + ops helper scripts — salvaged from a closed stale PR; adds operator runbooks and ops helper scripts. (#5138 — thanks @KooshaPari / @diegosouzapw)
- feat(mcp): web-session robustness — cookie dedup + browser-pool observability — the MCP web-session path now de-duplicates cookies when (re)hydrating a session (avoiding conflicting duplicate `Cookie` headers) and exposes browser-pool observability (pool size / in-use / acquisition metrics) for the headless web providers. (#5121, builds on #3368)
- feat(compression): Ionizer engine — lossy JSON-array sampling reversible via CCR — a new compression engine that down-samples large JSON arrays to a representative subset and records a Compact Change Representation (CCR) so the omitted rows can be reconstructed, trading exactness for a large token reduction on tabular/array-heavy payloads. (#5148)
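
For readers curious how a "curated colors plus deterministic hash-based fallback" scheme like the sidebar accent feature can work, here is a minimal sketch. It is not the project's `getSidebarIconAccent`; the names `CURATED_ACCENTS`, `FALLBACK_PALETTE`, `fnv1a` and `pickAccent`, the color values, and the choice of the FNV-1a hash are all assumptions made for illustration.

```typescript
// Hypothetical curated map; the real SIDEBAR_ICON_ACCENTS entries are not listed in these notes.
const CURATED_ACCENTS: Record<string, string> = {
  dashboard: "#3b82f6",
  providers: "#10b981",
};

// Hypothetical fallback palette used for items without a curated color.
const FALLBACK_PALETTE: readonly string[] = [
  "#ef4444",
  "#f59e0b",
  "#8b5cf6",
  "#06b6d4",
  "#ec4899",
];

// FNV-1a 32-bit string hash (an assumed choice; any stable hash would do).
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    // Multiply by the FNV prime and keep the result as an unsigned 32-bit value.
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Curated color when one exists, otherwise a palette slot chosen by the hash.
// The same item id always maps to the same color, with no stored state.
function pickAccent(itemId: string): string {
  const curated = CURATED_ACCENTS[itemId];
  if (curated !== undefined) return curated;
  const slot = fnv1a(itemId) % FALLBACK_PALETTE.length;
  return FALLBACK_PALETTE[slot] ?? "#888888";
}
```

Because the fallback depends only on the item id, the color stays the same across reloads and sessions. With a small palette, two different ids can still land on the same slot.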
🔧 Bug Fixes
- fix(proxy): make the SOCKS5 handshake timeout operator-tunable (`SOCKS_HANDSHAKE_TIMEOUT_MS`) — under high concurrency against a single residential gateway host, the SOCKS5 connect handshake could exceed the hardcoded 10s even though the proxy was reachable, surfacing as a false `[Proxy Fast-Fail] Proxy unreachable` (the pool size is already tunable via `OMNIROUTE_PROXY_DISPATCHER_CONNECTIONS`). The handshake timeout now reads `SOCKS_HANDSHAKE_TIMEOUT_MS` (default unchanged at `10000`, capped at `120000`) so a concurrency-heavy deployment can raise it without a code change. Mitigation for #5109 (the full concurrency-100 collapse still needs the reporter's live load-test confirmation). (#5109)
- fix(api): resolve `GET /v1/models/{id}` case-insensitively — clients that normalise the model id (e.g. OpenCode requesting `minimax/minimax-m3` for the canonical catalog entry `minimax/MiniMax-M3`) missed the single-model lookup, which is case-sensitive, and fell back to advertising `context_length: 0`. `findModelById` now prefers an exact-case match and falls back to a case-insensitive match, so the real entry (and its context window) is returned regardless of casing. (#5082)
- fix(services): embed WS proxy honours `LIVE_WS_HOST`; reject empty `messages` early — two headless/Docker deployment fixes (#5110). The embed WebSocket proxy (:20131) only read `EMBED_WS_PROXY_HOST`, so behind a reverse proxy/tunnel it stayed bound to `127.0.0.1` even with `LIVE_WS_HOST=0.0.0.0` set and the Live dashboard showed "WebSocket disconnected"; it now falls back to `LIVE_WS_HOST` (default still loopback). Separately, a request with an explicitly empty `messages: []` array was forwarded upstream and bounced back as a confusing raw `400`/`502`; `handleChat` now rejects it up front with a clear `messages: at least one message is required` (Responses-API `input` requests are unaffected). (#5110)
- fix(proxy): repair one-click Deno & Cloudflare relay deployments — the `/api/settings/proxy/test` endpoint only recognized the `vercel` relay type, so testing a deployed Deno or Cloudflare relay returned `proxy.type must be http, https, or socks5` and never reached the relay; it now routes all relay types through `isRelayType()`. On installs with `STORAGE_ENCRYPTION_KEY` the relay-auth token is read via `extractRelayAuth` (encrypted `relayAuthEnc` form), fixing the silent `401` that left `publicIp` null. The Cloudflare Worker upload now sends the script part as `application/javascript` (the API rejects `application/javascript+module`; ES-module semantics come from `main_module`), and the proxy-registry schema accepts the `deno`/`cloudflare` types + `deno-relay`/`cloudflare-relay` sources so editing a deployed relay no longer 400s. (#5128)
- fix(kiro): retire `claude-sonnet-4.5` from the Kiro catalog + pin the exact Kiro 400 error — `claude-sonnet-4.5` left the Kiro free-tier lineup (current active models: Opus 4.8/4.7/4.6, Sonnet 4.6, Haiku 4.5), so it is removed from the Kiro registry entry and the free-model catalog. A regression test now pins Kiro's verbatim `[400] Invalid model. Please select a different model to continue.` to the `isModelUnavailableError` model-unavailable classification. A 400 on every model (including current ones) points to a server-side Kiro tier/region gate, not an OmniRoute catalog bug. (#5140, closes #4484)
- fix(dashboard): preserve every rendered field when loading/saving Resilience settings — `ResilienceTab` renders `comboCooldownWait` and `quotaShareConcurrencyLimit`, but both the initial-load and save paths rewrote component state without those fields, so after a successful `/api/resilience` response the cards received `undefined` and the page fell back to the generic "failed to load" state. A shared `toResilienceResponse()` mapper now keeps all rendered fields, and `PATCH /api/resilience` returns `quotaShareConcurrencyLimit` to match GET and the UI contract. (#5139 — thanks @rdself)
- fix(quota): hydrate the in-memory quota cache from snapshots + scope auto-combo candidates — after a restart the quota cache was empty, so a known-exhausted connection looked healthy until re-queried; `isAccountQuotaExhausted` now lazily hydrates from persisted `quota_snapshots`. Auto-combo candidate expansion is also scoped to the connections each combo target actually allows, instead of pulling in every connection for the provider. (#5015 — thanks @JxnLexn)
- fix(resilience): harden quota cutoff, Gemini audio MIME, and model-lockout cooldown — stored quota hard-cutoff values are no longer coerced to `enabled=true` from arbitrary strings; Gemini audio input parts have their MIME type validated/normalized before forwarding; and model lockout now honours the configured `maxCooldownMs` ceiling. (#5093 — thanks @KooshaPari)
- fix(streaming): harden long OpenAI-compatible SSE streams — a late pipeline-wind-down error can no longer overwrite an already-recorded successful stream (`streamCompletionRecorded` guard), client disconnects finalize as `499 client_disconnected` instead of poisoning provider/account failure state, JSON bodies that are actually SSE (wrong `application/json` content-type) are sniffed and re-streamed, and reasoning fields (`reasoning`/`reasoning_content` + OpenRouter/Gemini encrypted `reasoning_details`) are preserved through the JSON-as-SSE fallback. (#5124 — thanks @rdself)
- fix(usage): dedupe request-usage logging and debounce stats events — `saveRequestUsage` now guards against duplicate inserts (natural key: timestamp + provider + model + connection + api-key + token counts), back-fills a missing `endpoint`, and only emits `usageRecorded` when a row was actually inserted; stats `update`/`pending` event bursts are collapsed into a single debounced notification to reduce churn. (#4940 — thanks @nguyenxvotanminh3)
- fix(sse): convert the native Gemini request body to OpenAI format in the Antigravity MITM handler — `contents`/`systemInstruction`/`generationConfig`/`thinkingConfig` are now translated to OpenAI chat-completions format before forwarding to `/v1/chat/completions`, so thinking-capable models (e.g. `ag/claude-opus-4-6-thinking`) no longer fail with provider-side 400 "invalid argument" errors. (#4845 — thanks @anuragg-saxenaa)
- fix(db): translate the two pt-BR SQLite driver-fallback log lines to English — `[DB] Pré-inicializando sql.js WASM…` and `[DB] Drivers síncronos indisponíveis…` were the only non-English server log strings, mixing languages in the logs. Now `[DB] Pre-initializing sql.js WASM (synchronous drivers unavailable)…` / `[DB] Synchronous drivers unavailable — falling back to sql.js (WASM)`, guarded by a test that scans the driver path for accented log strings. (#5103)
- fix(diagnostics): non-streaming Claude responses no longer false-502 as `empty_choices` — the v3.8.37 malformed-200 detector (#4942) only understood OpenAI `choices` and Responses-API `output` shapes, so a `/v1/messages` response that stays in Claude shape (`{type:"message", content:[…]}`) fell through to `empty_choices` → 502 (cascading to "All models failed" in a combo). Most visibly, an extended-thinking turn whose buffered body is a single empty thinking block with a valid `signature` (Claude Code's non-streaming Bash classifier) 502'd on every call. `detectMalformedNonStream` now understands the Claude shape: text/tool_use blocks and thinking blocks carrying a signature count as valid output, while a genuinely empty `content:[]` is still flagged. (#5108, thanks @insoln)
- fix(combo): empty-content 502 now fails over within the same request instead of exhausting the provider — a leg that answers HTTP 200 with no usable completion is rewritten to `502 "Provider returned empty content"`, but the combo exhaustion classifier treated that synthetic 502 as a connection-level failure (#1731v2) and marked the whole provider/connection exhausted, skipping every remaining same-provider leg in that request. The connection is actually healthy (it just returned an empty body), so empty-content 502s are now classified as model-level transient failures: the request advances to the next leg and the rest of that provider's legs stay eligible. Genuine gateway 502s still trip connection exhaustion. (#5085, thanks @andrea-kingautomation)
- fix(dashboard): surface the detailed credential-validation error instead of a bare "invalid" badge — the inline "Check" in the Add-Connection modal discarded the `error` message returned by `/api/providers/validate` and showed only an `invalid` badge. For web providers (claude-web / chatgpt-web) the real cause is often an environment error the backend already reports (e.g. `TLS impersonation client failed to start: EACCES … mkdir tls-client-node/bin`), so users were left guessing. The modal now renders the full reason next to the badge. (#5088, thanks @tkhs101)
- fix(executors): strip `client_metadata` from forwarded body for Cerebras and Mistral — Cerebras returns 400 (wrong_api_format) and Mistral returns 422 (extra_forbidden) when the passthrough body carries `client_metadata` (an OpenAI Codex / Claude CLI field with no equivalent on these upstreams). The default executor now drops it for these two providers before sending downstream; other providers (notably `openai`/`codex`) keep it. (thanks @saurabh321gupta)
- fix(codebuddy): only send reasoning params when the client requests reasoning. (thanks @anki1kr)
- fix(sse): keep streaming for forceStream providers when a JSON client requests it. Providers marked `forceStream:true` reject `stream:false` upstream (HTTP 400); `resolveStreamFlag` now guards against this so stream-only providers keep streaming even when the client sends `Accept: application/json` or `stream:false`. (thanks @anki1kr)
- fix(sse): prevent non-JSON SSE lines and duplicate `[DONE]` from breaking clients. (thanks @qianze0628)
- fix(sse): dedupe case-variant Anthropic headers in the executor `buildHeaders` path — Node/undici's `fetch` merges `anthropic-version` and `Anthropic-Version` into a single `"v, v"` value that the Anthropic API rejects, so both case variants are now collapsed to one canonical lowercase header (same for `anthropic-beta`). (thanks @Delcado19)
- oauth(kiro): support Kiro IDC (organization) token import — when the `~/.aws/sso/cache` token carries a `clientIdHash`, auto-import now reads the linked client registration file to obtain `clientId`/`clientSecret`, probes the Kiro IDE `profile.json` for `profileArn` (ARN region normalized to `us-east-1` for the runtime gateway), and refreshes via the regional AWS OIDC endpoint instead of the social path; the import schema and modal forward these credentials so manual imports also work for IDC tokens. (thanks @enjoyer-hub)
- fix(translator): preserve client `cache_control` breakpoints when routing Claude-format requests (e.g. Claude Code) to Alibaba DashScope's OpenAI-compatible providers (`alibaba`/`alibaba-cn`). The Claude→OpenAI translation previously stripped the markers from the system and message text blocks, so DashScope's explicit caching never engaged and every request was a cache miss. Cache hints now survive when preservation is requested for caching-capable OpenAI-format providers. (thanks @sacrtap)
- fix(tts): resolve Gemini TTS models from catalog and add `gemini-3.1-flash-tts-preview` as the new default Vertex TTS model. (thanks @nguyenha935)
- fix(sse): don't cool down a healthy connection on a self-inflicted upstream timeout (504) — when OmniRoute's own deadline elapses (surfaced as `TimeoutError`/`BodyTimeoutError` → 504), the connection is no longer disabled/failed-over, so a slow-but-healthy provider isn't penalised for our timeout. Genuine upstream 5xx/429 still trigger cooldown; antigravity keeps its own policy. (thanks @costaeder)
- fix(translator): forward image `tool_result` blocks as `image_url` instead of stringifying base64. (thanks @alican532)
- fix(sse): robust Anthropic `/v1/messages` streaming — real ping keepalive + client-disconnect guard — slow first tokens on reasoning models could trip strict clients' idle-read watchdog; the route now keeps the stream warm with a real `event: ping` (Anthropic clients ignore SSE comments) from the very first frame, and a client disconnect (AbortError / controller-closed) no longer counts as a provider failure (no failover/cooldown). (thanks @costaeder)
- fix: preserve model hidden flags (`isHidden`) across model sync — `replaceCustomModels` pruned the compat-override list to the new custom-model ids, silently wiping the `isHidden` flag of eye-hidden SYNCED models on every periodic sync / import (all hidden models turned back on). The redundant cleanup is removed (per-model removal already handles its own compat cleanup), so eye-hidden models stay hidden across re-sync. (#5086 — thanks @herjarsa)
- fix(models): derive model-discovery config from the registry `modelsUrl` — providers absent from the hardcoded `PROVIDER_MODELS_CONFIG` but carrying a registry `modelsUrl` (e.g. MiniMax) now get an auto-derived Bearer `/v1/models` discovery config, so "discover models" works instead of returning nothing. (thanks @herjarsa)
- fix(compression): resolve worker + rule/filter assets via runtime anchors (standalone bundle) — the LLMLingua worker and the RTK rule/filter loaders relied on `fileURLToPath(import.meta.url)`, which the standalone bundle freezes to the build-machine path, so the worker never spawned and rule/filter packs failed to resolve. They now anchor on `process.cwd()`/`argv[1]` (with `pathToFileURL` for the worker URL). (thanks @fulorgnas)
- fix(api): sanitize error responses on seven management routes (Rule #12 hardening) — `cli-tools/backups`, `cli-tools/guide-settings/[toolId]`, `logs/export`, `models/catalog`, `providers/test-batch`, `settings/import-json` and `usage/proxy-logs` no longer return raw `error.message`; they wrap caught errors in `sanitizeErrorMessage(...)`, and the routes are removed from the `check-error-helper` allowlist. (thanks @JxnLexn)
- fix(sse): keep `output_text`-only Responses bodies from being dropped/false-502'd — some upstreams return a shorthand Responses body whose answer is only in `output_text` with an empty `output[]`. `sanitizeResponsesApiResponse` discarded the text, so the response then tripped the malformed-200 guard. The sanitizer now synthesizes an `output[]` message item from a non-empty `output_text` (complements the Claude-native fix in #5108; both stem from #4942).
- fix(executors): preserve a lone caller-supplied `Anthropic-Version` header casing — the case-variant dedupe (#4846) unconditionally rewrote `Anthropic-Version`/`Anthropic-Beta` to lowercase even when only one variant was present, clobbering the caller's header. Dedupe now runs only when both case variants coexist (the actual undici-merge collision it was meant to fix).
- fix(responses): default `text.format` to `{ type: "text" }` for openai-compatible responses providers — some Responses-compatible upstreams (e.g. LM Studio) reject a `text` object missing `text.format` with a 400 `missing_required_parameter`; the default executor now fills the Responses-API default before forwarding (guarded to `openai-compatible-*responses*`, never overwriting an existing format). (thanks @StevanusPangau)
- fix(translator): stop stripping client-provided `reasoning_content` for reasoning-replay providers — the #4849 agentic-context strip (which drops `reasoning_content` from tool-call assistant turns to avoid O(n²) token growth) ran unconditionally, so replay providers (DeepSeek V4, Kimi K2, Qwen-Thinking, etc.) lost the client's reasoning and the reasoning-replay cache then overwrote it with a stale cached value (and such upstreams 400 without the original reasoning). The strip now skips reasoning-replay targets while non-reasoning providers keep the O(n²) protection. (#5122)
- fix(providers): add MiniMax M3 & Nemotron 3 Ultra to the Cline catalog — the two models were missing from Cline's provider catalog and could not be selected; both are now registered. (#5136, closes #3321)
- fix(dashboard): key model-visibility toggle on the canonical `providerId` — the per-model visibility toggle keyed off a display id, so toggling a model on one provider alias could mis-target another; it now keys on the canonical `providerId`. (#5091 — thanks @Theadd)
- fix(diagnostics): recognize the Claude API format in `detectMalformedNonStream` — salvaged null-guard so a Claude-shaped non-streaming body is no longer misclassified. (#5141 — thanks @herjarsa / @diegosouzapw)
- fix(logging): track the final connection IDs in failover logs — failover log lines now record the connection that actually served (or last failed) the request, instead of only the first attempt. (#5016 — thanks @JxnLexn)
- fix(sse): ignore disconnect races during in-band stream error handling — a client disconnect that races with in-band upstream error handling no longer surfaces as a spurious provider failure. (#5007 — thanks @JxnLexn)
- fix(dashboard): surface the server error on `handleToggleCombo` failure — a failed combo toggle now shows the backend error instead of silently no-op'ing. (#5138 — thanks @KooshaPari / @diegosouzapw)
- fix(quota): track provider quota reset windows + enrich the Codex playground — observed quota reset windows are tracked and surfaced, and the Codex playground gains the enriched quota metadata. (#5141 — thanks @Witroch4 / @diegosouzapw)
- fix(sidebar): drop the orphan `settings` accent color — removed a dangling accent-color entry that broke `typecheck:core`. (#5142)
- fix(sse): preserve non-stream reasoning fields for compatible clients — non-streaming responses now keep the upstream reasoning fields (`reasoning`/`reasoning_content` and OpenRouter/Gemini `reasoning_details`) instead of stripping them in `responseSanitizer`, so clients that render reasoning on buffered responses no longer lose it. (#5155 — thanks @rdself)
- fix(i18n): add missing English UI labels — fills in untranslated English strings that were surfacing as raw keys in the dashboard. (#5153 — thanks @rdself)
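
The "prefer an exact-case match, then fall back to a case-insensitive match" lookup described in the `GET /v1/models/{id}` fix above can be sketched roughly as follows. This is an illustration only, not the project's `findModelById`: the `CatalogModel` shape, the function name `findModelByIdSketch`, and the `context_length` value in the sample catalog are assumptions.

```typescript
// Hypothetical catalog entry shape; only the fields needed for the lookup.
interface CatalogModel {
  id: string;
  context_length: number;
}

// Two-pass lookup: exact-case first, so that two ids differing only by case
// still resolve to the one the caller literally asked for.
function findModelByIdSketch(
  catalog: readonly CatalogModel[],
  requestedId: string,
): CatalogModel | undefined {
  const exact = catalog.find((model) => model.id === requestedId);
  if (exact !== undefined) return exact;

  // Fallback: compare lowercased ids so normalised client ids still match.
  const wanted = requestedId.toLowerCase();
  return catalog.find((model) => model.id.toLowerCase() === wanted);
}

// Sample data; the context length here is a placeholder, not a real figure.
const sampleCatalog: CatalogModel[] = [
  { id: "minimax/MiniMax-M3", context_length: 100000 },
];

// A client that lowercases the id still gets the canonical entry back.
const resolved = findModelByIdSketch(sampleCatalog, "minimax/minimax-m3");
console.log(resolved?.id);
```

Doing the exact pass first keeps the previous behaviour for callers that already send the canonical casing, and only widens the match when that pass finds nothing.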
🔒 Security
- fix(security): exact-host Anthropic `baseUrl` check — the Anthropic base-URL guard used a substring match that a crafted host could partially satisfy; it now requires an exact host match (resolves CodeQL `js/incomplete-url-substring-sanitization` alert #674). (#5130)
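
To illustrate why a substring match on a base URL is weaker than an exact host match, here is a small sketch contrasting the two. It is not the project's guard: the function names and the `ANTHROPIC_HOST` constant are hypothetical, and the sketch assumes the intended host is `api.anthropic.com`.

```typescript
// Assumed target host for this illustration.
const ANTHROPIC_HOST = "api.anthropic.com";

// Substring-style check (the weaker pattern): any URL that merely contains
// the host text passes, wherever that text appears.
function looksLikeAnthropicBySubstring(baseUrl: string): boolean {
  return baseUrl.includes(ANTHROPIC_HOST);
}

// Exact-host check: parse the URL and compare the hostname as a whole.
// The URL parser lowercases the hostname, so casing differences still match.
function isExactAnthropicHost(baseUrl: string): boolean {
  let parsed: URL;
  try {
    parsed = new URL(baseUrl);
  } catch {
    // Unparseable input is treated as not matching.
    return false;
  }
  return parsed.hostname === ANTHROPIC_HOST;
}

// A crafted host that embeds the expected name as a prefix.
const crafted = "https://api.anthropic.com.evil.example/v1";
console.log(looksLikeAnthropicBySubstring(crafted)); // substring check accepts it
console.log(isExactAnthropicHost(crafted)); // exact-host check rejects it
```

The same comparison also rejects URLs that carry the host name only in the path or query string, since those never reach `hostname`.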
📝 Maintenance
- refactor(store): remove dead legacy store modules — salvaged cleanup of unused legacy store code. (#5138 — thanks @JxnLexn / @diegosouzapw)
- test(combo): deterministic routing-decision matrix for all 17 strategies — a deterministic E2E matrix pins the routing decision of every combo strategy. (#5146)
- chore: baseline reconciliations (complexity / file-size / cognitive), golden-snapshot + apikey-count alignment for new providers, orphan-test relocation, release base-red repairs, CHANGELOG i18n mirror sync, and an `actions/cache` 5→6 bump. (#5145, #5144, #5125, #5126, #5120, #5117, #5112)
- test: gated live smoke for combo strategies (in-process + VPS HTTP) and refreshed release expectations to match current code. (#5151, #5150 — thanks @KooshaPari / @diegosouzapw)