[3.8.25] — 2026-06-14
✨ New Features
- feat(compression): pluggable compression engines + async pipeline + Compression Studios — a new prompt-compression subsystem with selectable engines (Lite / Aggressive / Ultra), an asynchronous compression pipeline wired into the chat core, and "Compression Studios" tooling for inspecting and tuning compression. (#3848)
- feat(compression-ui): unified compression configuration UI — a Compression Hub with per-engine pages (Lite / Aggressive / Ultra), a combos editor, a dedicated sidebar entry, and live-WS default-on. (#3860)
- feat(security): prompt-injection guard across every LLM route + red-team suite — the prompt-injection guard now runs on all LLM routes (chat, responses, embeddings, images, audio, rerank, search, moderations, videos, music) with a shared input sanitizer and a promptfoo-based red-team suite (Quality Gates Phase 8 · Block D). (#3857)
- feat(kiro): live per-account model discovery — Kiro now discovers each account/tier's entitled models via CodeWhisperer `ListAvailableModels` (region-matched, with a static-catalog fallback). (#3836 — thanks @artickc)
- feat(gemini/vertex): surface Veo video models in dynamic discovery — Veo video models (`predictLongRunning`) now appear in Gemini/Vertex dynamic model discovery. (#3839 — thanks @artickc)
- feat(mimocode): per-account proxy for multi-account round-robin — each mimocode account can route through its own proxy (resolved per account by fingerprint via `runWithProxyContext`), with a "Distribute proxies" UI helper. (#3837 — thanks @pizzav-xyz)
- feat(intelligence): expose Arena ELO sync as a feature flag — the LM Arena ELO leaderboard sync is now toggleable (`ARENA_ELO_SYNC_ENABLED`, DB-override + env fallback). (#3821 — thanks @rdself)
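The "DB-override + env fallback" resolution mentioned for the Arena ELO flag can be pictured with a minimal TypeScript sketch. The helper name `resolveFeatureFlag`, the accepted truthy spellings, and the default of `false` are assumptions made for illustration; they are not taken from the OmniRoute source.

```typescript
// Hypothetical helper: resolves a boolean feature flag where a persisted
// (DB) value, when present, wins over the environment variable.
function resolveFeatureFlag(
  dbValue: boolean | null | undefined,
  envValue: string | undefined,
  defaultValue: boolean = false, // assumed default; the notes do not state one
): boolean {
  // 1. An explicit DB override always takes precedence.
  if (typeof dbValue === "boolean") return dbValue;

  // 2. Otherwise fall back to the environment variable, if it is set.
  if (envValue !== undefined && envValue.trim() !== "") {
    // Assumed truthy spellings; the real parser may differ.
    return ["1", "true", "yes", "on"].includes(envValue.trim().toLowerCase());
  }

  // 3. Nothing is configured anywhere.
  return defaultValue;
}
```

Under these assumptions, a call such as `resolveFeatureFlag(storedValue, process.env.ARENA_ELO_SYNC_ENABLED)` would let a toggle saved from the UI win over whatever the environment says, while installs that never touched the toggle keep following the env var.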
🐛 Fixed
- test(oauth): prove refresh_token preservation for the real gemini-cli / antigravity dispatch — the #3679/#3766 regression test used a synthetic provider that routes through the generic `tokenUrl` path, so the fix was never proven for the actual Google-family providers, which dispatch through `refreshGoogleToken()` against the hardcoded `OAUTH_ENDPOINTS.google.token`. Added a test that drives `checkConnection` through the real `gemini-cli` / `antigravity` path (redirecting the Google token endpoint to a local server returning `invalid_grant`) and asserts the `refresh_token` is preserved (not nulled) — confirming these connections are not spuriously destroyed on a failed refresh. (#3850 — thanks @3xa228148)
- fix(oauth): clear setup message for GitLab Duo instead of "Internal server error" — adding a GitLab Duo connection without a registered OAuth client returned an opaque `Internal server error` at the Add Connection step. `buildAuthUrl` threw when `GITLAB_DUO_OAUTH_CLIENT_ID` was missing, and the route swallowed it into a generic 500. It now returns `null` (mirroring the Qoder provider) and the authorize route surfaces an actionable message: register an OAuth app at `https://gitlab.com/-/profile/applications` with redirect URI `http://localhost:20128/callback` and scopes `ai_features read_user`, then set `GITLAB_DUO_OAUTH_CLIENT_ID`. (#3861 — thanks @sidinsearch)
- fix(db): persist the "Keep latest backups" retention setting — changing the backup-retention count in Settings → Database backup retention had no effect: it always snapped back to 20 on refresh (and editing `.env` post-start was ignored too, since `process.env` isn't reloaded). `getDbBackupMaxFiles()` only read the `DB_BACKUP_MAX_FILES` env var — there was no setter and no persisted value. The value now round-trips through a dedicated `key_value` store (`getDbBackupMaxFiles` precedence: env override → persisted UI value → default 20), and the "Clean old backups" action persists the chosen count. Existing installs keep the historical default of 20 until explicitly changed. (#3834 — thanks @netstratego)
- fix(sse): clamp Gemini thinking budget to the model's real cap (`reasoning_effort` / `effort=high` 400) — translating OpenAI `reasoning_effort=high` (and Claude-Code `output_config.effort=high`) to a Gemini target sent a hardcoded `thinkingBudget: 32768`, which exceeds Flash-tier Gemini's real max of 24576 → upstream HTTP 400 (the `thinkingLevel=high` path already used 24576 and worked on the same model). `gemini-2.5-flash` now declares its real `thinkingBudgetCap` (24576) so the existing `capThinkingBudget()` chokepoint actually clamps, and the Claude→Gemini `output_config.effort` path — which previously sent the raw value with no cap at all — now routes through the same clamp (pro-tier, real cap 32768, is left untouched). (#3842 — thanks @andrea-kingautomation)
- fix(intelligence): run pricing + models.dev sync from the live startup path — like the Arena ELO sync (v3.8.24), the external pricing sync (`PRICING_SYNC_ENABLED`) and the models.dev capability sync (Settings → AI toggle) were only initialized from `server-init.ts`, which the Next standalone runtime never executes — and models.dev had no caller at all. Their toggles were inert in production. Both are now initialized from `instrumentation-node.ts` (self-gated, opt-in preserved, non-blocking, never fatal). (thanks @diegosouzapw)
- test(proxy): guard the per-connection 'direct' bypass over a global proxy + clearer label — the per-connection "Proxy Off" toggle (`proxyEnabled: false`) already overrides a configured global proxy (`resolveProxyForConnection` short-circuits to `level: "direct"` before the global step). Added an explicit regression test proving the bypass beats a global assignment (and round-trips on re-enable), and relabeled the UI to "Direct (bypass proxy)" so operators recognize it. Closes the verification gap in #2996. (thanks @diegosouzapw)
- feat(connections): per-connection "disable cooldown" opt-out — a connection can now opt out of the transient cooldown (`providerSpecificData.disableCooling`, with a toggle in the Edit Connection modal). When set, a recoverable failure still records the error/backoff but does not take the connection out of rotation, so it stays eligible for selection — useful for a primary key you never want parked on a blip. Terminal states (banned / expired / credits_exhausted) still apply. (#2997 — thanks @diegosouzapw)
- fix(combo): restore sessionless combo stickiness + reasoning-aware readiness (504 / TPS regression after v3.8.14) — #3399 (v3.8.16) replaced the `<omniModel>`-tag combo pinning with a server-side context-cache pin gated on a client `sessionId`. Clients that send no session id (most OpenAI-compatible tools) lost combo stickiness, so combos re-ran strategy selection every turn → upstream prompt-cache misses → cold high-reasoning starts (~78s) → intermittent `[504] Upstream request did not return response headers` + TPS collapse (only on combos). The pin now falls back to a stable per-conversation fingerprint (`extractSessionAffinityKey(body)`) when no session id is present — only when `context_cache_protection` is on, so #3399's anti-leak behaviour is preserved. Separately, the stream-readiness window now grants the +30s reasoning budget unconditionally for high-reasoning Codex GPT-5.x (small high-reasoning prompts were 504-ing at the 80s base regardless of stickiness). (#3825 — thanks @bypanghu)
- test(combo): cover the `skipProviderBreaker` consumer gate — the producer was tested but the consumer (whether a failed combo target trips the whole-provider circuit breaker) was not; the breaker decision is now an exported pure predicate (`shouldRecordProviderBreakerFailure`, behaviour-identical) with direct tests asserting a `connection_cooldown` 503 does not trip the breaker while a plain 503 does. Closes another deferred test gap from #2743. (thanks @diegosouzapw)
- fix(providers): surface the real Devin error + correct the Windsurf auth instructions — Devin chat returned a generic 502 "Invalid SSE response for non-streaming request" that swallowed the real cause (e.g. "Devin CLI not found"): an error-only SSE chunk (no `choices`) is now propagated with its sanitized message. The Windsurf "Visit windsurf.com/show-auth-token" instruction (the bare URL shows no token without an IDE-supplied `?state=`) now directs users to the `Windsurf: Provide Auth Token` command-palette flow. (#3324 — thanks @mikmaneggahommie)
- fix(grok-web): clearer 403 message for anti-bot / IP-reputation blocks — a Grok Web subscription validating from a flagged datacenter/VPS IP got a 403 that read like an invalid cookie, sending users to chase a cookie that was actually fine. A non-auth 403 (Cloudflare challenge / anti-bot body) now returns a message stating the cookie is likely OK and the block is IP-reputation-based — retry from a residential IP or configure a proxy (auth-shaped 403s keep the re-paste guidance). (#3474 — thanks @friedtofu1608)
- fix(db): make the mass-pending-migrations safety threshold env-overridable — restoring a backup DB from an older version could trip "Detected N pending migrations … threshold is 50" with no way to override the hardcoded `50`. The threshold is now configurable via `OMNIROUTE_MAX_PENDING_MIGRATIONS` (resolved at startup; `0` disables the check). (#3416 — thanks @samuraiIT)
- test(proxy): cover the Vercel-relay `proxyFetch` path — net-new tests for `buildVercelRelayHeaders` and the `vercel`-type relay short-circuit (`x-relay-target/-path/-auth`, TCP-skip, missing-auth fail-closed), closing one of the deferred test gaps tracked in #2743. (thanks @diegosouzapw)
- fix(cli): surface `omniroute runtime repair` in the native-module error messages — after a Node major upgrade, `better-sqlite3`'s prebuilt binary mismatches the ABI and the service can crash-loop; the error only mentioned `npm rebuild better-sqlite3` (which fails for global / no-toolchain installs). The startup + SQLite error hints now also point to the existing self-heal command `omniroute runtime repair` (rebuilds into a user-writable runtime), and a top-level `omniroute repair` alias was added. (#3476 — thanks @Rahulsharma0810)
- fix(antigravity): per-request Pro-family upstream-id fallback chain (`gemini-3.1-pro-high` 400) — Antigravity silently renamed the Gemini 3.1 Pro-high upstream id, so `gemini-3.1-pro-high` started returning HTTP 400 (while `-low` still worked) and the live id can't be determined statically (competitor proxies disagree). The executor now retries alternative ids on a 400 (`gemini-3.1-pro-high` → `gemini-pro-agent` → `gemini-3-pro-high`, analogous for pro-low), bounded and only on a 400, with zero extra cost on the happy path; the 1:1 tier-passthrough invariant is preserved (the chain is request-time, not a static alias remap). (#3786 — thanks @aliaksandrsen)
- fix(sse): retry once on an early stream close (`STREAM_EARLY_EOF`) for single-model requests — flaky OpenAI-compatible upstreams (e.g. NVIDIA NIM with minimax-m3 / qwen3.5 / glm-5.1) intermittently send HTTP 200 then close the SSE with zero useful frames, surfacing as a 502 "Stream ended before producing useful content". Only Antigravity got an early-close retry; every other provider returned the 502 immediately on the non-combo single-model path. A bounded one-retry (early-close only — not readiness-timeout — and without marking the account unavailable) now generalizes it. (The separate qwen-web validation SSRF part of the same report was already fixed in v3.8.24, #3767.) (#3758 — thanks @Svatosalav)
- fix(models): preserve eye-hidden models across auto-sync / import — hiding models via the visibility (eye) toggle to keep only a combo's models was undone on every model import or auto-sync, which re-showed all of them. The sync re-import treated "hidden" identically to "deleted" and dropped both; a distinct `isDeleted` marker now separates the trash/delete path (still dropped on re-import, #3199) from the eye toggle (preserved as listed-but-hidden), and eye-hidden models are no longer re-aliased into the routable catalog on sync. (#3782 — thanks @xenstar)
- fix(providers): correct the lmarena cookie hint (`session` → `arena-auth-prod-v1`) — the lmarena credential hint asked for a cookie named `session`, but lmarena.ai's real auth cookie is `arena-auth-prod-v1`, so users who pasted only `session=…` hit validation failures. The credential name, placeholder and storage keys now use the correct name (the legacy `session` key is retained for back-compat with already-saved credentials). (#3810 — thanks @xspylol)
- fix(reasoning): normalize OpenAI-compatible `max` effort to `xhigh` by default — OpenAI-compatible providers do not accept literal `max`, but some upstreams (for example DeepSeek through OpenRouter) support `xhigh`; `max` now maps to `xhigh` unless the target model explicitly opts out of `xhigh`, with Claude alias variants still honoring the canonical Claude opt-out list. (#3826 — thanks @rdself)
- fix(combo): return the replay response on the round-robin streaming path — a round-robin combo with a streaming target returned a body already locked by the readiness peek, surfacing as a 500 "ReadableStream is locked"; the round-robin path now returns the replay clone like the priority path does. (#3811 — thanks @0xtbug)
- fix(claude): strip the reasoning-effort suffix from Claude model ids — Claude ids carrying an effort suffix (`…-low` … `…-max`) 404'd upstream and tripped the circuit breaker into a misleading "rate-limited" state; the suffix is now stripped before dispatch. (#3807 — thanks @zhiru)
- fix(sse): flush routed SSE chunks promptly (ping/zombie readiness filter) — combo stream-readiness now filters ping/zombie frames so routed SSE chunks stream out without waiting on the readiness window. (#3759 — thanks @rdself)
- fix(models): don't auto-hide transient (rate-limited / timeout) failures on Test All — a parallel Test All across many models could rate-limit an account and auto-hide every model that 429'd / timed out (dropping them from `/v1/models`); transient failures now surface an error state but stay visible. (#3849 — thanks @lukmanc405)
- fix(quota): surface OpenCode Go's missing quota-API as a latched diagnostic — OpenCode Go keys whose quota endpoints return 404/401 no longer hammer the dead endpoints; the gap is latched with a clear message and an `OMNIROUTE_OPENCODE_GO_QUOTA_URL` override hint. (#3838 — thanks @adivekar-utexas)
- fix(pricing): add the missing Kiro model pricing rows — Kiro models the registry serves (e.g. `claude-sonnet-4.6`) had no pricing row and reported $0.00; the rows were added. (#3835 — thanks @artickc)
- fix(ui): render country flags via flagcdn SVGs for Windows compatibility — Windows doesn't render regional-indicator flag emoji; flags now use flagcdn SVGs with an emoji fallback. (#3814 — thanks @rafacpti23)
- fix(ui): expand the request log table with a vertical resize handle — the request log table now shows ~10 rows and can be resized vertically. (#3820 — thanks @rafacpti23)
- fix(i18n): translate the missing `embeddedServices` keys across 37 locales — the `embeddedServices` strings showed `__MISSING__` in 37 locales; they are now translated. (#3819 — thanks @rafacpti23)
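Several of the fixes above come down to small, pure decisions. As one illustration, the Gemini thinking-budget clamp could look roughly like the TypeScript sketch below. Only the two cap values (24576 for `gemini-2.5-flash`, 32768 for pro-tier) come from these notes; the `MODEL_CAPS` table, its `gemini-2.5-pro` key, and the function name `clampThinkingBudget` are hypothetical stand-ins for the real `thinkingBudgetCap` declaration and `capThinkingBudget()` chokepoint.

```typescript
// Hypothetical per-model cap table. The numbers are from the release notes;
// the table shape and the "gemini-2.5-pro" key are assumptions.
const MODEL_CAPS: Record<string, number | undefined> = {
  "gemini-2.5-flash": 24576,
  "gemini-2.5-pro": 32768,
};

// Clamp a requested thinking budget to the model's declared cap.
// A model with no declared cap passes through unchanged, which is how an
// oversized budget can reach the upstream when the declaration is missing.
function clampThinkingBudget(model: string, requested: number): number {
  const cap = MODEL_CAPS[model];
  return cap === undefined ? requested : Math.min(requested, cap);
}
```

In this sketch a `high` effort translated to 32768 is reduced to 24576 for the Flash entry and left as-is for the pro entry, matching the behaviour the fix describes. The design point is that every translation path has to go through the same function; a path that builds the budget itself bypasses the cap.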
🔒 Security & Hardening
- fix(security): CCR cross-tenant IDOR — per-principal scope store + bounded memory — the compression CCR scope store was shared across principals, allowing cross-tenant reads; it is now scoped per-principal with bounded memory. (#3859)
- feat(supply-chain): build provenance, SBOM, Trivy scan & OpenSSF Scorecard (advisory) — added npm build provenance, a CycloneDX SBOM, Trivy image scanning, and an OpenSSF Scorecard workflow (Quality Gates Phase 8 · Block A, advisory). (#3824)
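For the CCR scope-store fix listed in this section, the general idea of "per-principal scope + bounded memory" can be sketched in TypeScript as below. This is an illustrative pattern only: the class name `ScopedStore`, its methods, and the oldest-first eviction policy are assumptions, and nothing here is taken from the actual patch.

```typescript
// Hypothetical sketch of a per-principal, size-bounded scope store.
class ScopedStore<V> {
  private readonly scopes = new Map<string, Map<string, V>>();
  private readonly maxEntriesPerPrincipal: number;

  constructor(maxEntriesPerPrincipal: number) {
    this.maxEntriesPerPrincipal = maxEntriesPerPrincipal;
  }

  set(principalId: string, key: string, value: V): void {
    let scope = this.scopes.get(principalId);
    if (!scope) {
      scope = new Map<string, V>();
      this.scopes.set(principalId, scope);
    }
    // Re-insert so the key becomes the newest entry in insertion order.
    scope.delete(key);
    scope.set(key, value);
    // Evict the oldest entries once the per-principal bound is exceeded.
    while (scope.size > this.maxEntriesPerPrincipal) {
      const oldest = scope.keys().next().value;
      if (oldest === undefined) break;
      scope.delete(oldest);
    }
  }

  // Reads only ever consult the caller's own scope, never another principal's.
  get(principalId: string, key: string): V | undefined {
    return this.scopes.get(principalId)?.get(key);
  }
}
```

Because every read is keyed by the caller's principal first, one tenant guessing another tenant's key finds nothing. Note that this sketch bounds entries per principal only; bounding the number of principals would need an additional limit.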
🧹 Internal / Quality / Docs
- Consolidate the email-privacy control into Settings → Appearance — the per-page email-privacy toggles were replaced by a single global switch. (#3822 — thanks @rdself)
- docs(ui): clarify the routing-settings copy (strategy sync + sticky limit) — (#3843 — thanks @adivekar-utexas)
- Quality Gates — Phase 7 & 8 — promoted the dead-code / cognitive-complexity / type-coverage ratchets to blocking, installed advisory CI scanners (gitleaks / osv / actionlint / zizmor), and added property + golden + SSE-correctness tests and a runtime-resilience (chaos / heap-growth / k6 soak) suite. (#3809, #3858, #3808, #3854)
- fix(docs): add MDX frontmatter to `SUPPLY_CHAIN.md` — the new security doc lacked the `title:` frontmatter that MDX pages require, which broke the production Build + Docker Hub publish; the frontmatter was added. (#3864)
- chore(deps): bump `aquasecurity/trivy-action` 0.28.0 → 0.36.0 (#3862)
- chore(quality): reconcile the file-size ratchet baseline for Prettier-inflated v3.8.25 fixes + `chat.ts` growth — the per-file size baseline was re-frozen to absorb the formatting/line-count growth from this cycle's chat-core and combo fixes (manual edits, never an automatic upward ratchet). (#3823, #3833 — thanks @diegosouzapw)
- test(suite): green the unit suite at release time — align stale tests to this cycle's intended behavior + de-flake two new suites — release-gate housekeeping: updated tests that lagged behind intended behavior changes (OpenCode Go latched quota message #3838, the email-privacy control consolidated into Settings #3822, SOCKS5 default-on proxy-type message, the `[id]` provider-detail strangler-fig decomposition #3501, Vertex Express-mode keys, Antigravity discovery using a current user-callable model id) and the same-provider 503 fall-through resilience test; de-flaked the compression benchmark reproducibility test (sequential passes) and the ServiceSupervisor crash test (poll instead of fixed sleep). No production code changed. Also documented `OMNIROUTE_MAX_PENDING_MIGRATIONS` (#3416) in `.env.example` + `ENVIRONMENT.md`. (thanks @diegosouzapw)
What's Changed
- fix(intelligence): run pricing + models.dev sync from the live startup path by @diegosouzapw in #3806
- fix(claude): strip reasoning-effort suffix from Claude model ids (VS Code Effort slider 404) by @zhiru in #3807
- fix: stream routed SSE chunks promptly by @rdself in #3759
- fix(providers): correct lmarena cookie hint to arena-auth-prod-v1 (#3810) by @diegosouzapw in #3815
- fix(models): preserve eye-hidden models across auto-sync (#3782) by @diegosouzapw in #3816
- fix(sse): retry once on STREAM_EARLY_EOF for single-model requests (#3758) by @diegosouzapw in #3817
- fix(antigravity): per-request Pro-family upstream-id fallback chain (#3786) by @diegosouzapw in #3818
- chore(quality): re-baseline chat.ts file-size (#3758 follow-up) by @diegosouzapw in #3823
- fix(cli): surface 'omniroute runtime repair' in native-module errors (#3476) by @diegosouzapw in #3828
- test(proxy): cover Vercel-relay proxyFetch path (#2743 gap c) by @diegosouzapw in #3831
- fix(db): env-overridable mass-pending-migrations threshold (#3416) by @diegosouzapw in #3827
- fix(grok-web): clearer 403 message for anti-bot/IP-reputation blocks (#3474) by @diegosouzapw in #3830
- fix(providers): surface real Devin error + fix Windsurf auth instructions (#3324) by @diegosouzapw in #3829
- test(combo): cover skipProviderBreaker consumer gate (#2743 gap d) by @diegosouzapw in #3832
- chore(quality): reconcile file-size baseline (prettier inflation) by @diegosouzapw in #3833
- fix(reasoning): map max effort to xhigh by default by @rdself in #3826
- Expose Arena ELO sync in feature flags by @rdself in #3821
- fix(combo): sessionless combo stickiness + reasoning-aware readiness (#3825) by @diegosouzapw in #3847
- Consolidate email privacy control by @rdself in #3822
- fix(combo): return replay response in round-robin to avoid locked stream 500 by @0xtbug in #3811
- fix(ui): use flagcdn SVGs for Windows flags compatibility in language selector by @rafacpti23 in #3814
- fix(ui): expand request log table to show ~10 rows with vertical resize by @rafacpti23 in #3820
- fix(i18n): translate MISSING embeddedServices keys in 37 locales by @rafacpti23 in #3819
- fix(pricing): add missing Kiro model pricing rows (Sonnet 4.6, auto-kiro, deepseek-3.2, minimax-m2.5, glm-5) by @artickc in #3835
- fix(models): don't auto-hide rate-limited/timeout models on Test All by @lukmanc405 in #3849
- fix(quota): surface OpenCode Go missing-quota-API as a latched warning by @adivekar-utexas in #3838
- feat(gemini/vertex): surface Veo video models in dynamic discovery (Gemini, Vertex, Vertex Express) by @artickc in #3839
- feat(kiro): live per-account model discovery via ListAvailableModels (Builder ID + IAM Identity Center) by @artickc in #3836
- clarify routing settings copy by @adivekar-utexas in #3843
- feat(connections): per-connection disable-cooldown opt-out (#2997) by @diegosouzapw in #3852
- test(proxy): guard per-connection direct bypass over global proxy (#2996) by @diegosouzapw in #3853
- feat(compression): compression engines + async pipeline + Combo/Compression Studios by @diegosouzapw in #3848
- Phase 8 · Block C — runtime resilience (chaos + heap-growth + k6 soak) by @diegosouzapw in #3854
- Phase 8 · Block D — injection guard on all LLM routes + red-team by @diegosouzapw in #3857
- Phase 8 · Block A — supply-chain (provenance, SBOM, Trivy, Scorecard) advisory by @diegosouzapw in #3824
- fix(security): CCR cross-tenant IDOR — scope store per-principal + bound memory by @diegosouzapw in #3859
- Phase 8 · Block B — correctness suite (property + golden + SSE-correctness) by @diegosouzapw in #3808
- Phase 7 — install advisory scanners in CI (gitleaks/osv/actionlint/zizmor) by @diegosouzapw in #3858
- Phase 7 finalize — 3 ratchets advisory→blocking + deliberate re-baseline v3.8.25 by @diegosouzapw in #3809
- Release v3.8.25 by @diegosouzapw in #3805
- fix(docs): SUPPLY_CHAIN.md MDX frontmatter — unblocks main Build by @diegosouzapw in #3864
- fix(sse): clamp Gemini thinking budget to model cap (#3842) by @diegosouzapw in #3865
- feat(mimocode): per-account proxy support for multi-account round-robin by @pizzav-xyz in #3837
- chore(deps): bump aquasecurity/trivy-action from 0.28.0 to 0.36.0 in /.github/workflows in the github_actions group across 1 directory by @dependabot[bot] in #3862
- Release v3.8.25 by @diegosouzapw in #3863
- Release v3.8.25 by @diegosouzapw in #3866
New Contributors
- @lukmanc405 made their first contribution in #3849
Full Changelog: v3.8.24...v3.8.25