[3.8.23] — 2026-06-13
✨ New Features
- Emergency budget fallback: opt-out env switch OMNIROUTE_EMERGENCY_FALLBACK (#3741 — thanks @zoispag): adds an OMNIROUTE_EMERGENCY_FALLBACK environment variable that disables the budget-exhaustion emergency reroute to nvidia/openai/gpt-oss-120b entirely when set to false or 0. Default behavior (enabled) is unchanged.
- Auto-Combo: live model intelligence scoring via Arena ELO + models.dev (#3660 — thanks @pizzav-xyz): replaces the static fitness lookup with a 5-layer resolution chain (user override → Arena ELO → models.dev tiers → hardcoded map → neutral fallback). A sync pipeline auto-fetches Arena AI leaderboard ELO scores and derives intelligence tiers from models.dev capabilities; combo picks now update as leaderboard rankings change without any manual configuration.
- Vertex AI: dynamic model discovery (#3712 — thanks @artickc): the vertex provider now queries the Generative Language models API at runtime to surface the full account catalog, including image-generation models (Imagen, gemini-*-image), embeddings, and partner models, instead of returning only the small hardcoded registry list.
- Vertex AI: self-tracked USD spend on the Limits page (#3724 — thanks @artickc): since the Google Cloud Billing API is inaccessible via the proxy credential, Vertex connections now track their own cumulative USD spend locally (based on token-cost accounting) and display it on the Limits page as "$ used since account added."
- Gemini: rate-limit metadata for known per-model RPM/RPD caps (#3686 — thanks @hartmark): injects known rate-limit headers (RPM/RPD) for Gemini models that carry per-model limits (e.g. Gemma 4's 15 RPM / generous RPD), so the cooldown engine applies them correctly instead of locking out the whole account on daily-limit hits.
- Model Lockout: full settings UI with success-decay recovery (#3629 — thanks @Chewji9875): end-to-end wiring of the per-model lockout feature: settings UI (enable/disable, configure thresholds), backend integration, structured error classification, and a success-decay mechanism that gradually recovers a locked model's fitness as successful calls accumulate. Lockout now applies to all providers when enabled, not just per-model-quota providers.
- Provider display modes: All / Configured / Compact (#3743 — thanks @rdself): adds a three-state display mode control to the Providers page. "All" shows every registered provider; "Configured" shows only providers with at least one connection; "Compact" shows configured providers in a condensed card layout for denser views.
- API key cost drilldown + quota % used (#3742 — thanks @Witroch4): the API Keys page now shows a per-key cost breakdown and the percentage of quota consumed for each key.
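
The 5-layer resolution chain from the Auto-Combo entry above can be pictured as an ordered lookup that stops at the first layer with a score for the model. The following is only a rough sketch under stated assumptions: the type names, the `resolveIntelligence` function, the 0..1 score scale, and the neutral value of 0.5 are all hypothetical and do not come from the OmniRoute codebase.

```typescript
// Hypothetical shapes: each layer is modeled as a plain score table keyed by model id.
interface IntelligenceSources {
  userOverrides: Map<string, number>;  // layer 1: user override
  arenaElo: Map<string, number>;       // layer 2: Arena ELO (assumed pre-normalized to 0..1)
  modelsDevTiers: Map<string, number>; // layer 3: tiers derived from models.dev
  hardcoded: Map<string, number>;      // layer 4: static hardcoded map
}

interface ResolvedScore {
  score: number;
  layer: string; // which layer answered, useful for debugging
}

// Assumed neutral value used when no layer knows the model (layer 5).
const NEUTRAL_SCORE = 0.5;

function resolveIntelligence(modelId: string, sources: IntelligenceSources): ResolvedScore {
  // Order matters: earlier layers take precedence over later ones.
  const layers: Array<[string, Map<string, number>]> = [
    ["user-override", sources.userOverrides],
    ["arena-elo", sources.arenaElo],
    ["models-dev", sources.modelsDevTiers],
    ["hardcoded", sources.hardcoded],
  ];

  for (const [layer, table] of layers) {
    const score = table.get(modelId);
    if (score !== undefined) {
      return { score, layer };
    }
  }

  // No layer had an entry: fall back to the neutral score.
  return { score: NEUTRAL_SCORE, layer: "neutral" };
}
```

With this shape, a user override always wins over a leaderboard score, and an unknown model still receives a usable neutral score rather than failing the lookup.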
🔧 Bug Fixes
- @omniroute/opencode-plugin bundled in the npm tarball + omniroute setup opencode CLI command (#3726 — thanks @herjarsa): the plugin was never compiled as part of the publish pipeline, requiring manual extraction. It now ships pre-built inside the omniroute package and is installed via omniroute setup opencode (copies the plugin into ~/.config/opencode/plugins/omniroute/, updates opencode.json idempotently). Also fixes provider.models baseURL resolution: it checks _provider.options.baseURL as a third fallback so partner/tiered providers no longer return zero models. (#3711)
- MiMoCode 403 "Illegal access" fixed (#3728 — thanks @felipesartori): the Xiaomi free endpoint gates requests on a recognized MiMoCode system-prompt signature; OmniRoute forwarded raw requests without the marker, causing a 403 on every call. The executor now injects the required anti-abuse signature.
- "Test all models" flow: i18n crash, status icons, auto-hide (#3729 — thanks @felipesartori): three bugs in the provider-detail test-all-models flow: providerText() crash because the testAllResults template requires {ok, total} but callers passed {ok, error}; missing online/offline status icons on model rows; results panel not auto-hiding after run completes.
- OAuth token-refresh invalidation loop fixed (#3692 — thanks @diegosouzapw): refreshClaudeOAuthToken returned null instead of the error sentinel on non-canonical 400 bodies, causing the caller to retry every 60 seconds (observed as 1,352 consecutive refresh attempts on one Claude account). Fixed alongside hardening of safeResolveProxy (proxy resolution errors now warn instead of silently falling back to DIRECT) and adding egress-IP visibility to safeLogEvents.
- safeLogEvents async hotfix (thanks @diegosouzapw): PR #3692 introduced a lazy await import(proxyEgress) inside a sync safeLogEvents, an ES syntax error that broke every consumer loading chatHelpers via tsx and caused 14 tests to fail at module load. Made safeLogEvents async; void-ed the single chat.ts call site.
- Kiro: quota tracking for IAM Identity Center accounts (#3722 — thanks @artickc): getKiroUsage returned "0 used" for IAM Identity Center accounts (and kiro-cli imports) because those connections frequently lack a persisted profileArn. Now falls back to a name-based profile lookup so quota displays correctly.
- Empty Claude SSE stream now surfaces a real error (#3689 — thanks @TechNickAI): when a Claude stream completed with lifecycle events but no content block, the proxy returned a synthetic "[Proxy Error] The upstream API returned an empty response" as a successful assistant message. Now emits a proper SSE error event; the missing-finalizer synthetic path is preserved for streams that already produced content.
- Vertex AI Express-mode API keys (#3690 — thanks @artickc): the Vertex executor rejected every non-JSON credential with "Vertex AI requires a valid Service Account JSON." Now accepts Express-mode API key strings (AIza*) alongside Service Account JSON, routing them through the correct token endpoint.
- Anthropic: strip top_p when temperature is set (#3691 — thanks @zhiru): Anthropic API rejects requests containing both temperature and top_p; VS Code's Claude extension sends both in every request, causing 400s on all routed calls. The OpenAI→Claude translator now drops top_p when temperature is present.
- Combo reasoning token buffer: conservative application + feature flag (#3700 — thanks @rdself): tightens the #3588 buffer (only applies when the model is explicitly thinking-capable, has a non-default known output cap, and the full buffered value fits inside that cap) and adds a reasoningTokenBufferEnabled feature flag in combo defaults so users can fully disable it from Settings.
- Emergency budget fallback: cross-provider credential leak fixed (#3699 — thanks @diegosouzapw): the executor-level emergency hop re-sent the failing provider's API key to the emergency provider's endpoint (e.g. the OpenAI Authorization header going to integrate.api.nvidia.com). Now orchestrated exclusively by the routing layer, which resolves credentials for the emergency provider via account selection and no longer fires inside combo targets.
- /v1/messages/count_tokens now honors the connection's proxy assignment (#3699 — thanks @diegosouzapw): token count calls went DIRECT regardless of configured proxies, leaking the host IP for proxy-isolated setups. Now wraps execution in runWithProxyContext, exactly like chat execution.
- Gemini: context-mode fallback for signatureless tool calls (#3688 — thanks @diegosouzapw): fixes HTTP 400 on multi-turn thinking-model tool calls when thought_signature is unavailable; the standard Gemini provider now falls back to context mode instead of sending the unsigned call.
- Antigravity: preserve gemini-3.1-pro High/Low budget tiers (#3696 — thanks @diegosouzapw): upstream accepts the suffixed ids; stop collapsing to bare gemini-3.1-pro.
- Stream combo: fail over on empty/content-filtered response (#3685 — thanks @diegosouzapw): streaming combos now route to the next target instead of surfacing a blank reply.
- Qwen Web: migrated to v2 chat API (#3723 — thanks @diegosouzapw): the legacy /api/chat/completions endpoint was retired upstream, returning 504 HTML from Alibaba's gateway for all requests. The executor now uses the two-step v2 flow (/api/v2/chats/new → /api/v2/chat/completions?chat_id=), replays the full browser cookie jar (cna + ssxmod_itna/itna2 + token) required by Alibaba's WAF instead of only a Bearer token, parses phase-based SSE (think→reasoning, answer→content), and refreshes the model catalog to current ids (qwen3.7-max, qwen3.7-plus, qwen3.6-plus; legacy ids kept as aliases). 17 unit tests. (Closes #3288)
- Responses API: stream defaults to false when omitted (spec compliance) (#3708 — thanks @diegosouzapw): /v1/responses requests that omit stream no longer 502 (STREAM_EARLY_EOF) when the upstream returns a valid JSON response. resolveStreamFlag now applies the OpenAI Responses API spec default (stream=false) in addition to the existing Anthropic Messages API default. Previously, only sourceFormat=claude triggered this path, leaving sourceFormat=openai-responses to fall through to the wildcard-Accept heuristic (Accept: */* → streaming intent), which made spec-compliant upstreams that return JSON appear as a dead stream. Codex CLI (always sends stream: true) and explicit SSE clients (Accept: text/event-stream) are unaffected.
- Semantic cache: scope to requesting API key (#3740 — thanks @diegosouzapw): two callers with different API keys sending the same prompt and model no longer receive each other's cached responses. generateSignature now includes the api_key_id dimension in the SHA-256 hash; unauthenticated requests (no API key) remain isolated from keyed requests. Existing cache entries (generated without the key dimension) are cleared by migration 098.
- Model-family fallback: dot-notation model IDs now resolve correctly (thanks @diegosouzapw): getNextFamilyFallback normalizes dots to hyphens for the initial lookup but also falls back to the bare model name, supporting IDs like gemini-3.1-pro-high whose dots are part of the literal name. Previously, gemini-3.1-pro-high silently returned null and bypassed the entire family.
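
The stream default described in the Responses API entry above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual resolveStreamFlag: the function name `resolveStreamFlagSketch`, the `StreamHints` shape, the order of the checks, and the behavior for requests without an Accept header are all hypothetical.

```typescript
// Hypothetical request formats; only "claude" and "openai-responses" are named in the changelog.
type SourceFormat = "openai" | "openai-responses" | "claude";

interface StreamHints {
  stream?: boolean;           // explicit `stream` field from the request body, if present
  sourceFormat: SourceFormat; // which API shape the client used
  accept?: string;            // raw Accept header value, if present
}

// Formats whose spec treats an omitted `stream` as a non-streaming JSON response.
const SPEC_DEFAULT_NON_STREAMING: ReadonlySet<SourceFormat> = new Set<SourceFormat>([
  "claude",
  "openai-responses",
]);

function resolveStreamFlagSketch(hints: StreamHints): boolean {
  // 1. An explicit flag always wins (for example, a client that sends stream: true).
  if (typeof hints.stream === "boolean") {
    return hints.stream;
  }

  const accept = (hints.accept ?? "").toLowerCase();

  // 2. Explicit SSE clients keep streaming (assumed to take precedence over the spec default).
  if (accept.includes("text/event-stream")) {
    return true;
  }

  // 3. Spec default: an omitted `stream` means a non-streaming response for these formats.
  if (SPEC_DEFAULT_NON_STREAMING.has(hints.sourceFormat)) {
    return false;
  }

  // 4. Remaining formats: a wildcard Accept is read as streaming intent.
  return accept.includes("*/*");
}
```

Under these assumptions, a /v1/responses request with `Accept: */*` and no `stream` field resolves to non-streaming, so a JSON upstream reply is no longer treated as a dead stream.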
♻️ Code Quality
- Dashboard god-component (#3501): Phases 1g → 1t complete, ≤800 LOC target reached (#3717, #3721, #3725, #3727 — thanks @diegosouzapw): four extraction phases bring ProviderDetailPageClient.tsx from 4,062 to 781 LOC, meeting the ≤800 target set at the start of the refactor. Extracted OAuth flow helpers, quota display, traffic-inspector panel, logs viewer, combo-target editor, and remaining inline UI into standalone components under providers/[id]/components/.
🌍 Internationalization
- zh-CN: comprehensive Simplified Chinese translation improvements (#3736 — thanks @sdfsdfw2): broad pass on Simplified Chinese UI strings for accuracy and consistency.
📝 Maintenance
- CI: bump GitHub Actions artifacts/cache actions to latest (thanks @diegosouzapw): actions/download-artifact 4→8 (#3733), actions/cache 4→5 (#3734), actions/upload-artifact 4→7 (#3735).
- File-size ratchet baseline reconciled (#3705 — thanks @diegosouzapw): freezes 27 inherited/previously-grown files at their current LOC and registers providerLimits.ts in the gate; ongoing shrink tracked via #3501.
- docs: add FUNDING.yml and README Support section (#3698 — thanks @diegosouzapw)
- docs(changelog): restore #3590 bullet lost on the v3.8.20 release branch (thanks @diegosouzapw): the fix reached main pre-tag via cherry-pick #3591, but its changelog bullet only existed on release/v3.8.20 after the squash-merge; restored per the 2026-06-12 release-branch leftover audit.
✅ Tests
- Combo strategy fallback coverage (tests/unit/combo-strategy-fallbacks.test.ts, 11 tests): fill-first / p2c / random / cost-optimized / strict-random fallback paths (previously happy-path only), price-tie stability, stale strict-random deck degradation, unknown-strategy normalization to priority, and circuit-breaker HALF_OPEN recovery inside the combo loop + preScreenTargets (lazy-recovery contract).
- #1731 fast-skip suite restored (tests/integration/combo-provider-exhaustion.test.ts): the five skipped tests were rewritten against the current routing policy (quota-exhausted 429 marks the provider for the request; transient 429 retries other connections; connection errors skip per-connection; nothing persists across requests) and re-enabled; 8/8 green.
- Proxy context passthrough (tests/integration/proxy-context-passthrough.test.ts): combo targets each execute under their own connection's proxy; count_tokens runs inside the connection's proxy context.
What's Changed
- Release v3.8.23 by @diegosouzapw in #3693
Full Changelog: v3.8.22...v3.8.23