diegosouzapw/OmniRoute v3.8.25 on GitHub

[3.8.25] — 2026-06-14

✨ New Features

feat(compression): pluggable compression engines + async pipeline + Compression Studios — a new prompt-compression subsystem with selectable engines (Lite / Aggressive / Ultra), an asynchronous compression pipeline wired into the chat core, and "Compression Studios" tooling for inspecting and tuning compression. (#3848)
feat(compression-ui): unified compression configuration UI — a Compression Hub with per-engine pages (Lite / Aggressive / Ultra), a combos editor, a dedicated sidebar entry, and live-WS default-on. (#3860)
feat(security): prompt-injection guard across every LLM route + red-team suite — the prompt-injection guard now runs on all LLM routes (chat, responses, embeddings, images, audio, rerank, search, moderations, videos, music) with a shared input sanitizer and a promptfoo-based red-team suite (Quality Gates Phase 8 · Block D). (#3857)
feat(kiro): live per-account model discovery — Kiro now discovers each account/tier's entitled models via CodeWhisperer ListAvailableModels (region-matched, with a static-catalog fallback). (#3836 — thanks @artickc)
feat(gemini/vertex): surface Veo video models in dynamic discovery — Veo video models (predictLongRunning) now appear in Gemini/Vertex dynamic model discovery. (#3839 — thanks @artickc)
feat(mimocode): per-account proxy for multi-account round-robin — each mimocode account can route through its own proxy (resolved per account by fingerprint via runWithProxyContext), with a "Distribute proxies" UI helper. (#3837 — thanks @pizzav-xyz)
feat(intelligence): expose Arena ELO sync as a feature flag — the LM Arena ELO leaderboard sync is now toggleable (ARENA_ELO_SYNC_ENABLED, DB-override + env fallback). (#3821 — thanks @rdself)

🐛 Fixed

test(oauth): prove refresh_token preservation for the real gemini-cli / antigravity dispatch — the #3679/#3766 regression test used a synthetic provider that routes through the generic tokenUrl path, so the fix was never proven for the actual Google-family providers, which dispatch through refreshGoogleToken() against the hardcoded OAUTH_ENDPOINTS.google.token. Added a test that drives checkConnection through the real gemini-cli/antigravity path (redirecting the Google token endpoint to a local server returning invalid_grant) and asserts the refresh_token is preserved (not nulled) — confirming these connections are not spuriously destroyed on a failed refresh. (#3850 — thanks @3xa228148)
fix(oauth): clear setup message for GitLab Duo instead of "Internal server error" — adding a GitLab Duo connection without a registered OAuth client returned an opaque Internal server error at the Add Connection step. buildAuthUrl threw when GITLAB_DUO_OAUTH_CLIENT_ID was missing, and the route swallowed it into a generic 500. It now returns null (mirroring the Qoder provider) and the authorize route surfaces an actionable message: register an OAuth app at https://gitlab.com/-/profile/applications with redirect URI http://localhost:20128/callback and scopes ai_features read_user, then set GITLAB_DUO_OAUTH_CLIENT_ID. (#3861 — thanks @sidinsearch)
fix(db): persist the "Keep latest backups" retention setting — changing the backup-retention count in Settings → Database backup retention had no effect: it always snapped back to 20 on refresh (and editing .env post-start was ignored too, since process.env isn't reloaded). getDbBackupMaxFiles() only read the DB_BACKUP_MAX_FILES env var — there was no setter and no persisted value. The value now round-trips through a dedicated key_value store (getDbBackupMaxFiles precedence: env override → persisted UI value → default 20), and the "Clean old backups" action persists the chosen count. Existing installs keep the historical default of 20 until explicitly changed. (#3834 — thanks @netstratego)
fix(sse): clamp Gemini thinking budget to the model's real cap (reasoning_effort/effort=high 400) — translating OpenAI reasoning_effort=high (and Claude-Code output_config.effort=high) to a Gemini target sent a hardcoded thinkingBudget: 32768, which exceeds Flash-tier Gemini's real max of 24576 → upstream HTTP 400 (the thinkingLevel=high path already used 24576 and worked on the same model). gemini-2.5-flash now declares its real thinkingBudgetCap (24576) so the existing capThinkingBudget() chokepoint actually clamps, and the Claude→Gemini output_config.effort path — which previously sent the raw value with no cap at all — now routes through the same clamp (pro-tier, real cap 32768, is left untouched). (#3842 — thanks @andrea-kingautomation)
fix(intelligence): run pricing + models.dev sync from the live startup path — like the Arena ELO sync (v3.8.24), the external pricing sync (PRICING_SYNC_ENABLED) and the models.dev capability sync (Settings → AI toggle) were only initialized from server-init.ts, which the Next.js standalone runtime never executes, and models.dev had no caller at all. As a result, their toggles were inert in production. Both are now initialized from instrumentation-node.ts (self-gated, opt-in preserved, non-blocking, never fatal). (thanks @diegosouzapw)
test(proxy): guard the per-connection 'direct' bypass over a global proxy + clearer label — the per-connection "Proxy Off" toggle (proxyEnabled: false) already overrides a configured global proxy (resolveProxyForConnection short-circuits to level: "direct" before the global step). Added an explicit regression test proving the bypass beats a global assignment (and round-trips on re-enable), and relabeled the UI to "Direct (bypass proxy)" so operators recognize it. Closes the verification gap in #2996. (thanks @diegosouzapw)
feat(connections): per-connection "disable cooldown" opt-out — a connection can now opt out of the transient cooldown (providerSpecificData.disableCooling, with a toggle in the Edit Connection modal). When set, a recoverable failure still records the error/backoff but does not take the connection out of rotation, so it stays eligible for selection — useful for a primary key you never want parked on a blip. Terminal states (banned / expired / credits_exhausted) still apply. (#2997 — thanks @diegosouzapw)
fix(combo): restore sessionless combo stickiness + reasoning-aware readiness (504 / TPS regression after v3.8.14) — #3399 (v3.8.16) replaced the <omniModel>-tag combo pinning with a server-side context-cache pin gated on a client sessionId. Clients that send no session id (most OpenAI-compatible tools) lost combo stickiness, so combos re-ran strategy selection every turn → upstream prompt-cache misses → cold high-reasoning starts (~78s) → intermittent [504] Upstream request did not return response headers + TPS collapse (only on combos). The pin now falls back to a stable per-conversation fingerprint (extractSessionAffinityKey(body)) when no session id is present — only when context_cache_protection is on, so #3399's anti-leak behaviour is preserved. Separately, the stream-readiness window now grants the +30s reasoning budget unconditionally for high-reasoning Codex GPT-5.x (small high-reasoning prompts were 504-ing at the 80s base regardless of stickiness). (#3825 — thanks @bypanghu)
test(combo): cover the skipProviderBreaker consumer gate — the producer was tested but the consumer (whether a failed combo target trips the whole-provider circuit breaker) was not; the breaker decision is now an exported pure predicate (shouldRecordProviderBreakerFailure, behaviour-identical) with direct tests asserting a connection_cooldown 503 does not trip the breaker while a plain 503 does. Closes another deferred test gap from #2743. (thanks @diegosouzapw)
fix(providers): surface the real Devin error + correct the Windsurf auth instructions — Devin chat returned a generic 502 "Invalid SSE response for non-streaming request" that swallowed the real cause (e.g. "Devin CLI not found"): an error-only SSE chunk (no choices) is now propagated with its sanitized message. The Windsurf "Visit windsurf.com/show-auth-token" instruction (the bare URL shows no token without an IDE-supplied ?state=) now directs users to the Windsurf: Provide Auth Token command-palette flow. (#3324 — thanks @mikmaneggahommie)
fix(grok-web): clearer 403 message for anti-bot / IP-reputation blocks — a Grok Web subscription validating from a flagged datacenter/VPS IP got a 403 that read like an invalid cookie, sending users to chase a cookie that was actually fine. A non-auth 403 (Cloudflare challenge / anti-bot body) now returns a message stating the cookie is likely OK and the block is IP-reputation-based — retry from a residential IP or configure a proxy (auth-shaped 403s keep the re-paste guidance). (#3474 — thanks @friedtofu1608)
fix(db): make the mass-pending-migrations safety threshold env-overridable — restoring a backup DB from an older version could trip "Detected N pending migrations … threshold is 50" with no way to override the hardcoded 50. The threshold is now configurable via OMNIROUTE_MAX_PENDING_MIGRATIONS (resolved at startup; 0 disables the check). (#3416 — thanks @samuraiIT)
test(proxy): cover the Vercel-relay proxyFetch path — net-new tests for buildVercelRelayHeaders and the vercel-type relay short-circuit (x-relay-target/-path/-auth, TCP-skip, missing-auth fail-closed), closing one of the deferred test gaps tracked in #2743. (thanks @diegosouzapw)
fix(cli): surface omniroute runtime repair in the native-module error messages — after a Node major upgrade, better-sqlite3's prebuilt binary mismatches the ABI and the service can crash-loop; the error only mentioned npm rebuild better-sqlite3 (which fails for global / no-toolchain installs). The startup + SQLite error hints now also point to the existing self-heal command omniroute runtime repair (rebuilds into a user-writable runtime), and a top-level omniroute repair alias was added. (#3476 — thanks @Rahulsharma0810)
fix(antigravity): per-request Pro-family upstream-id fallback chain (gemini-3.1-pro-high 400) — Antigravity silently renamed the Gemini 3.1 Pro-high upstream id, so gemini-3.1-pro-high started returning HTTP 400 (while -low still worked) and the live id can't be determined statically (competitor proxies disagree). The executor now retries alternative ids on a 400 (gemini-3.1-pro-high → gemini-pro-agent → gemini-3-pro-high, analogous for pro-low), bounded and only on a 400, with zero extra cost on the happy path; the 1:1 tier-passthrough invariant is preserved (the chain is request-time, not a static alias remap). (#3786 — thanks @aliaksandrsen)
fix(sse): retry once on an early stream close (STREAM_EARLY_EOF) for single-model requests — flaky OpenAI-compatible upstreams (e.g. NVIDIA NIM with minimax-m3 / qwen3.5 / glm-5.1) intermittently send HTTP 200 then close the SSE with zero useful frames, surfacing as a 502 "Stream ended before producing useful content". Only Antigravity got an early-close retry; every other provider returned the 502 immediately on the non-combo single-model path. A bounded single retry (early-close only, not readiness-timeout, and without marking the account unavailable) now extends that behavior to the single-model path for all providers. (The separate qwen-web validation SSRF part of the same report was already fixed in v3.8.24, #3767.) (#3758 — thanks @Svatosalav)
fix(models): preserve eye-hidden models across auto-sync / import — hiding models via the visibility (eye) toggle to keep only a combo's models was undone on every model import or auto-sync, which re-showed all of them. The sync re-import treated "hidden" identically to "deleted" and dropped both; a distinct isDeleted marker now separates the trash/delete path (still dropped on re-import, #3199) from the eye toggle (preserved as listed-but-hidden), and eye-hidden models are no longer re-aliased into the routable catalog on sync. (#3782 — thanks @xenstar)
fix(providers): correct the lmarena cookie hint (session → arena-auth-prod-v1) — the lmarena credential hint asked for a cookie named session, but lmarena.ai's real auth cookie is arena-auth-prod-v1, so users who pasted only session=… hit validation failures. The credential name, placeholder and storage keys now use the correct name (the legacy session key is retained for back-compat with already-saved credentials). (#3810 — thanks @xspylol)
fix(reasoning): normalize OpenAI-compatible max effort to xhigh by default — OpenAI-compatible providers do not accept literal max, but some upstreams (for example DeepSeek through OpenRouter) support xhigh; max now maps to xhigh unless the target model explicitly opts out of xhigh, with Claude alias variants still honoring the canonical Claude opt-out list. (#3826 — thanks @rdself)
fix(combo): return the replay response on the round-robin streaming path — a round-robin combo with a streaming target returned a body already locked by the readiness peek, surfacing as a 500 "ReadableStream is locked"; the round-robin path now returns the replay clone like the priority path does. (#3811 — thanks @0xtbug)
fix(claude): strip the reasoning-effort suffix from Claude model ids — Claude ids carrying an effort suffix (…-low … …-max) 404'd upstream and tripped the circuit breaker into a misleading "rate-limited" state; the suffix is now stripped before dispatch. (#3807 — thanks @zhiru)
fix(sse): flush routed SSE chunks promptly (ping/zombie readiness filter) — combo stream-readiness now filters ping/zombie frames so routed SSE chunks stream out without waiting on the readiness window. (#3759 — thanks @rdself)
fix(models): don't auto-hide transient (rate-limited / timeout) failures on Test All — a parallel Test All across many models could rate-limit an account and auto-hide every model that 429'd / timed out (dropping them from /v1/models); transient failures now surface an error state but stay visible. (#3849 — thanks @lukmanc405)
fix(quota): surface OpenCode Go's missing quota-API as a latched diagnostic — OpenCode Go keys whose quota endpoints return 404/401 no longer hammer the dead endpoints; the gap is latched with a clear message and an OMNIROUTE_OPENCODE_GO_QUOTA_URL override hint. (#3838 — thanks @adivekar-utexas)
fix(pricing): add the missing Kiro model pricing rows — Kiro models the registry serves (e.g. claude-sonnet-4.6) had no pricing row and reported $0.00; the rows were added. (#3835 — thanks @artickc)
fix(ui): render country flags via flagcdn SVGs for Windows compatibility — Windows doesn't render regional-indicator flag emoji; flags now use flagcdn SVGs with an emoji fallback. (#3814 — thanks @rafacpti23)
fix(ui): expand the request log table with a vertical resize handle — the request log table now shows ~10 rows and can be resized vertically. (#3820 — thanks @rafacpti23)
fix(i18n): translate the missing embeddedServices keys across 37 locales — the embeddedServices strings showed __MISSING__ in 37 locales; they are now translated. (#3819 — thanks @rafacpti23)
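
To illustrate the backup-retention precedence described in the fix(db) entry above (#3834), the lookup order could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the project's code: `resolveBackupMaxFiles`, `parsePositiveInt`, and the `persisted` parameter are hypothetical names, and treating invalid values as unset is an assumption. Only `DB_BACKUP_MAX_FILES`, the order (env override, then persisted UI value, then default), and the default of 20 come from the notes.

```typescript
// Illustrative sketch only; the real getDbBackupMaxFiles() may differ.
const DEFAULT_MAX_BACKUP_FILES = 20;

// Parse a positive integer; missing or invalid input counts as "unset" (assumption).
function parsePositiveInt(raw: string | undefined | null): number | undefined {
  if (raw === undefined || raw === null || raw.trim() === "") return undefined;
  const n = Number(raw);
  return Number.isInteger(n) && n > 0 ? n : undefined;
}

// Precedence from the release notes: env override -> persisted UI value -> default 20.
function resolveBackupMaxFiles(
  env: Record<string, string | undefined>,
  persisted: string | null, // hypothetical: value read back from the key_value store
): number {
  return (
    parsePositiveInt(env["DB_BACKUP_MAX_FILES"]) ??
    parsePositiveInt(persisted) ??
    DEFAULT_MAX_BACKUP_FILES
  );
}
```

Passing the environment in as a plain record keeps the function pure, so the three precedence levels can be exercised without touching the real process environment.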
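
The thinking-budget clamp from the fix(sse) entry above (#3842) amounts to taking the minimum of the requested budget and the model's declared cap. The sketch below assumes a simple lookup table; `THINKING_BUDGET_CAPS` and `clampThinkingBudget` are hypothetical names (the project's own chokepoint is `capThinkingBudget()`, whose signature is not shown in these notes), and `gemini-2.5-pro` is used here only as an assumed example of a pro-tier id. The cap values 24576 and 32768 are the ones quoted in the entry.

```typescript
// Caps quoted in the release notes; the table shape itself is an assumption.
const THINKING_BUDGET_CAPS: Record<string, number> = {
  "gemini-2.5-flash": 24576,
  "gemini-2.5-pro": 32768, // assumed pro-tier id, for illustration only
};

// Clamp a requested budget to the model's cap; models without a declared cap pass through.
function clampThinkingBudget(model: string, requested: number): number {
  const cap = THINKING_BUDGET_CAPS[model];
  return cap === undefined ? requested : Math.min(requested, cap);
}
```

With this shape, an effort=high translation that asks for 32768 on a Flash-tier model is reduced to 24576, while the same request on a pro-tier model is left unchanged.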
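
For the configurable migration threshold in the fix(db) entry above (#3416), the resolution logic might look like the sketch below. The variable name `OMNIROUTE_MAX_PENDING_MIGRATIONS`, the default of 50, and "0 disables the check" come from the notes; the function names, the fallback to the default on invalid input, and the strict greater-than comparison are assumptions.

```typescript
const DEFAULT_MAX_PENDING_MIGRATIONS = 50;

// Resolve the threshold once at startup; invalid values fall back to the default (assumption).
function resolveMaxPendingMigrations(env: Record<string, string | undefined>): number {
  const raw = env["OMNIROUTE_MAX_PENDING_MIGRATIONS"];
  if (raw === undefined || raw.trim() === "") return DEFAULT_MAX_PENDING_MIGRATIONS;
  const n = Number(raw);
  return Number.isInteger(n) && n >= 0 ? n : DEFAULT_MAX_PENDING_MIGRATIONS;
}

// A threshold of 0 disables the safety check entirely.
function tripsMigrationGuard(pending: number, threshold: number): boolean {
  return threshold > 0 && pending > threshold;
}
```

An operator restoring an old backup could, under these assumptions, either raise the threshold above the pending count or set it to 0 for that one startup.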
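
The per-request fallback chain in the fix(antigravity) entry above (#3786) can be pictured as a pure "what do I try next?" decision. The chain ids below are the ones listed in the entry for pro-high; `PRO_HIGH_CHAIN` and `nextFallbackId` are hypothetical names, and the real executor's structure is not shown in these notes.

```typescript
// Pro-high chain as listed in the release notes.
const PRO_HIGH_CHAIN: readonly string[] = [
  "gemini-3.1-pro-high",
  "gemini-pro-agent",
  "gemini-3-pro-high",
];

// Return the next upstream id to try, or undefined when no retry should happen.
// Retries are bounded by the chain length and happen only on HTTP 400.
function nextFallbackId(
  chain: readonly string[],
  currentId: string,
  status: number,
): string | undefined {
  if (status !== 400) return undefined; // happy path and other errors: no extra request
  const index = chain.indexOf(currentId);
  if (index === -1 || index === chain.length - 1) return undefined; // unknown id or chain exhausted
  return chain[index + 1];
}
```

Because the decision is made per request from the response status, a successful first attempt costs nothing extra, which matches the "zero extra cost on the happy path" claim in the entry.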
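
The hidden-versus-deleted distinction in the fix(models) entry above (#3782) could be sketched as a merge step during re-import. Only the `isDeleted` marker is named in the notes; the `StoredModel` shape, the `isHidden` field, and `mergeSyncedModels` are hypothetical and stand in for whatever the sync code actually uses.

```typescript
interface StoredModel {
  id: string;
  isHidden?: boolean;  // hypothetical field for the eye toggle
  isDeleted?: boolean; // marker named in the release notes
}

// Merge a fresh upstream model list with locally stored state.
function mergeSyncedModels(
  incomingIds: readonly string[],
  stored: readonly StoredModel[],
): StoredModel[] {
  const byId = new Map(stored.map((m) => [m.id, m] as const));
  const result: StoredModel[] = [];
  for (const id of incomingIds) {
    const previous = byId.get(id);
    if (previous?.isDeleted) continue; // trash/delete path: still dropped on re-import
    // Eye toggle: the model stays listed, and its hidden state survives the sync.
    result.push({ id, isHidden: previous?.isHidden === true });
  }
  return result;
}
```

Keeping the two flags separate is what lets a sync drop deleted models while leaving eye-hidden ones listed but hidden.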

🔒 Security & Hardening

fix(security): CCR cross-tenant IDOR — per-principal scope store + bounded memory — the compression CCR scope store was shared across principals, allowing cross-tenant reads; it is now scoped per-principal with bounded memory. (#3859)
feat(supply-chain): build provenance, SBOM, Trivy scan & OpenSSF Scorecard (advisory) — added npm build provenance, a CycloneDX SBOM, Trivy image scanning, and an OpenSSF Scorecard workflow (Quality Gates Phase 8 · Block A, advisory). (#3824)

🧹 Internal / Quality / Docs

Consolidate the email-privacy control into Settings → Appearance — the per-page email-privacy toggles were replaced by a single global switch. (#3822 — thanks @rdself)
docs(ui): clarify the routing-settings copy (strategy sync + sticky limit) — (#3843 — thanks @adivekar-utexas)
Quality Gates — Phase 7 & 8 — promoted the dead-code / cognitive-complexity / type-coverage ratchets to blocking, installed advisory CI scanners (gitleaks / osv / actionlint / zizmor), and added property + golden + SSE-correctness tests and a runtime-resilience (chaos / heap-growth / k6 soak) suite. (#3809, #3858, #3808, #3854)
fix(docs): add MDX frontmatter to SUPPLY_CHAIN.md — the new security doc lacked the title: frontmatter that MDX pages require, which broke the production Build + Docker Hub publish; the frontmatter was added. (#3864)
chore(deps): bump aquasecurity/trivy-action 0.28.0 → 0.36.0 (#3862)
chore(quality): reconcile the file-size ratchet baseline for Prettier-inflated v3.8.25 fixes + chat.ts growth — the per-file size baseline was re-frozen to absorb the formatting/line-count growth from this cycle's chat-core and combo fixes (manual edits, never an automatic upward ratchet). (#3823, #3833 — thanks @diegosouzapw)
test(suite): green the unit suite at release time — align stale tests to this cycle's intended behavior + de-flake two new suites — release-gate housekeeping: updated tests that lagged behind intended behavior changes (OpenCode Go latched quota message #3838, the email-privacy control consolidated into Settings #3822, SOCKS5 default-on proxy-type message, the [id] provider-detail strangler-fig decomposition #3501, Vertex Express-mode keys, Antigravity discovery using a current user-callable model id) and the same-provider 503 fall-through resilience test; de-flaked the compression benchmark reproducibility test (sequential passes) and the ServiceSupervisor crash test (poll instead of fixed sleep). No production code changed. Also documented OMNIROUTE_MAX_PENDING_MIGRATIONS (#3416) in .env.example + ENVIRONMENT.md. (thanks @diegosouzapw)

What's Changed

fix(intelligence): run pricing + models.dev sync from the live startup path by @diegosouzapw in #3806
fix(claude): strip reasoning-effort suffix from Claude model ids (VS Code Effort slider 404) by @zhiru in #3807
fix: stream routed SSE chunks promptly by @rdself in #3759
fix(providers): correct lmarena cookie hint to arena-auth-prod-v1 (#3810) by @diegosouzapw in #3815
fix(models): preserve eye-hidden models across auto-sync (#3782) by @diegosouzapw in #3816
fix(sse): retry once on STREAM_EARLY_EOF for single-model requests (#3758) by @diegosouzapw in #3817
fix(antigravity): per-request Pro-family upstream-id fallback chain (#3786) by @diegosouzapw in #3818
chore(quality): re-baseline chat.ts file-size (#3758 follow-up) by @diegosouzapw in #3823
fix(cli): surface 'omniroute runtime repair' in native-module errors (#3476) by @diegosouzapw in #3828
test(proxy): cover Vercel-relay proxyFetch path (#2743 gap c) by @diegosouzapw in #3831
fix(db): env-overridable mass-pending-migrations threshold (#3416) by @diegosouzapw in #3827
fix(grok-web): clearer 403 message for anti-bot/IP-reputation blocks (#3474) by @diegosouzapw in #3830
fix(providers): surface real Devin error + fix Windsurf auth instructions (#3324) by @diegosouzapw in #3829
test(combo): cover skipProviderBreaker consumer gate (#2743 gap d) by @diegosouzapw in #3832
chore(quality): reconcile file-size baseline (prettier inflation) by @diegosouzapw in #3833
fix(reasoning): map max effort to xhigh by default by @rdself in #3826
Expose Arena ELO sync in feature flags by @rdself in #3821
fix(combo): sessionless combo stickiness + reasoning-aware readiness (#3825) by @diegosouzapw in #3847
Consolidate email privacy control by @rdself in #3822
fix(combo): return replay response in round-robin to avoid locked stream 500 by @0xtbug in #3811
fix(ui): use flagcdn SVGs for Windows flags compatibility in language selector by @rafacpti23 in #3814
fix(ui): expand request log table to show ~10 rows with vertical resize by @rafacpti23 in #3820
fix(i18n): translate MISSING embeddedServices keys in 37 locales by @rafacpti23 in #3819
fix(pricing): add missing Kiro model pricing rows (Sonnet 4.6, auto-kiro, deepseek-3.2, minimax-m2.5, glm-5) by @artickc in #3835
fix(models): don't auto-hide rate-limited/timeout models on Test All by @lukmanc405 in #3849
fix(quota): surface OpenCode Go missing-quota-API as a latched warning by @adivekar-utexas in #3838
feat(gemini/vertex): surface Veo video models in dynamic discovery (Gemini, Vertex, Vertex Express) by @artickc in #3839
feat(kiro): live per-account model discovery via ListAvailableModels (Builder ID + IAM Identity Center) by @artickc in #3836
clarify routing settings copy by @adivekar-utexas in #3843
feat(connections): per-connection disable-cooldown opt-out (#2997) by @diegosouzapw in #3852
test(proxy): guard per-connection direct bypass over global proxy (#2996) by @diegosouzapw in #3853
feat(compression): compression engines + async pipeline + Combo/Compression Studios by @diegosouzapw in #3848
Phase 8 · Block C — runtime resilience (chaos + heap-growth + k6 soak) by @diegosouzapw in #3854
Phase 8 · Block D — injection guard on all LLM routes + red-team by @diegosouzapw in #3857
Phase 8 · Block A — supply-chain (provenance, SBOM, Trivy, Scorecard) advisory by @diegosouzapw in #3824
fix(security): CCR cross-tenant IDOR — scope store per-principal + bound memory by @diegosouzapw in #3859
Phase 8 · Block B — correctness suite (property + golden + SSE-correctness) by @diegosouzapw in #3808
Phase 7 — install advisory scanners in CI (gitleaks/osv/actionlint/zizmor) by @diegosouzapw in #3858
Phase 7 finalize — 3 ratchets advisory→blocking + deliberate re-baseline v3.8.25 by @diegosouzapw in #3809
Release v3.8.25 by @diegosouzapw in #3805
fix(docs): SUPPLY_CHAIN.md MDX frontmatter — unblocks main Build by @diegosouzapw in #3864
fix(sse): clamp Gemini thinking budget to model cap (#3842) by @diegosouzapw in #3865
feat(mimocode): per-account proxy support for multi-account round-robin by @pizzav-xyz in #3837
chore(deps): bump aquasecurity/trivy-action from 0.28.0 to 0.36.0 in /.github/workflows in the github_actions group across 1 directory by @dependabot[bot] in #3862
Release v3.8.25 by @diegosouzapw in #3863
Release v3.8.25 by @diegosouzapw in #3866

New Contributors

@lukmanc405 made their first contribution in #3849

Full Changelog: v3.8.24...v3.8.25