✨ New Features
- feat(oauth): remote Antigravity login via local helper + paste-credentials — Antigravity (and other Google "native/desktop" OAuth providers) use Google's `firstparty/nativeappconsent`, which only releases the auth code when the loopback redirect (127.0.0.1:<port>) is reachable from the approving browser. On a remote VPS install that loopback lives on the server, so the consent hangs forever and never emits a code — the "paste the callback URL" fallback has nothing to paste (a Google-side constraint, identical in upstream 9router). A new `omniroute login antigravity` CLI helper runs the OAuth on the user's own machine (where 127.0.0.1 works), exchanges the code, and prints a single-line `omniroute-cred-v1.…` credential blob; the dashboard's Antigravity Connect → Step 2 field now accepts that blob (alongside callback URLs) and persists the connection via a new `paste-credentials` action (server-side onboarding, provider-allowlisted, with the blob's embedded provider required to match the route). The SSH local-forward tunnel is documented as a zero-tooling alternative. See `docs/guides/REMOTE-MODE.md`. (#5203)
- feat(agent-bridge): graceful cert-install fallback for containers / headless — when the MITM root CA can't be installed into the system trust store automatically (Docker / headless / no sudo / read-only trust store), the Agent Bridge no longer hard-fails on start with a generic "Certificate install failed". It now starts in skip mode and the dashboard surfaces a platform-specific manual-install guide (plus a CA download link) so the operator can trust the certificate by hand. The trust-cert endpoints return a structured `{ skippable, manualGuide }` response (HTTP 200) for environment failures instead of a 500; an explicit user cancellation is still reported distinctly. (#4546 — thanks @phuchptty)
- feat(compression): CCR ranged/grep/stats retrieval (ReDoS-safe, backward-compat) — extends the `omniroute_ccr_retrieve` MCP tool and `/api/compression/retrieve` endpoint with optional `range` (byte/line slice), `grep` (ReDoS-safe literal or bounded-pattern match against stored lines), and `stats` (byte/line/word counts) parameters so agents pull exactly the slice or summary they need instead of re-expanding the entire stored block. All parameters are optional — no parameters returns the full block byte-identical to the existing behavior; the CCR store written by ionizer/fuzzy/headroom is fully compatible. Sixth item of the compression roadmap. (#5187)
- feat(compression): TOON best-of-N candidate encoder + encoder A/B table — adds `@toon-format/toon` as a candidate encoder in the headroom compression engine via a best-of-N scheme: both GCF and TOON run per prompt and the shorter result is kept, rather than hard-swapping encoders (GCF already encodes the headroom block and TOON is not a lossless universal win). An encoder A/B comparison table (GCF vs TOON vs JSON — bytes and cl100k tokens) is now surfaced in the compression studio. Fifth item of the compression feature-extraction roadmap (bench: #5080, gate: #5127, fuzzy/gate: #5143, ionizer: #5148). (#5163)
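The best-of-N selection described in the TOON entry above can be pictured roughly as follows. This is a minimal sketch, not the headroom engine's code: `Encoder`, `bestOfN`, and the two stand-in encoders are hypothetical names used only for illustration, the real GCF and TOON encoders are not called, and comparing candidates by string length is an assumption (the notes only say the shorter result is kept).

```typescript
// Hypothetical encoder shape; the real GCF/TOON encoders are not used here.
type Encoder = { name: string; encode: (value: unknown) => string };
type Candidate = { name: string; output: string };

// Run every candidate encoder and keep the shortest output.
// Ties go to the encoder listed first.
function bestOfN(value: unknown, encoders: Encoder[]): Candidate {
  let best: Candidate | undefined;
  for (const enc of encoders) {
    const output = enc.encode(value);
    if (best === undefined || output.length < best.output.length) {
      best = { name: enc.name, output };
    }
  }
  if (best === undefined) {
    throw new Error("at least one encoder is required");
  }
  return best;
}

// Stand-in encoders (assumptions): indented JSON vs. compact JSON.
const pretty: Encoder = { name: "pretty", encode: (v) => JSON.stringify(v, null, 2) };
const compact: Encoder = { name: "compact", encode: (v) => JSON.stringify(v) };

const winner = bestOfN({ a: 1, b: [1, 2] }, [pretty, compact]);
console.log(winner.name, winner.output.length);
```

Because every candidate runs on each prompt, a new encoder can only tie or improve on the existing result under the chosen size measure, which is the reason the entry gives for preferring this scheme over a hard swap.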
🔧 Bug Fixes
- fix(oauth): Antigravity refresh no longer nulls the stored refresh_token on an empty upstream response — Google's OAuth token endpoint uses non-rotating refresh tokens: a refresh response normally OMITS `refresh_token` and occasionally returns it as an empty string. The Antigravity executor's `refreshCredentials` used `typeof tokens.refresh_token === "string" ? tokens.refresh_token : credentials.refreshToken`, and because `typeof "" === "string"` is true, an empty-string response overwrote the good token with `""` — nulling it on first refresh. The check now treats a non-string or empty value as absent and preserves the stored token, matching the canonical `refreshGoogleToken` (`tokens.refresh_token || refreshToken`) semantics. (#3850 — thanks @3xa228148)
- fix(api): LAN/Tailscale dashboard access — `ws:` CSP scheme, GET-exempt version route, surface combo field errors — three failures when opening the dashboard from a non-loopback host: (1) CSP `connect-src` allowed the `ws:` scheme only for loopback origins, blocking the dashboard's `ws://<lan-host>:*` Live WebSocket from LAN/Tailscale clients; the bare `ws:` scheme is now permitted (symmetric with the bare `wss:` already allowed), kept declarative in `next.config.mjs` with no global middleware (the project has none by design); (2) `GET /api/system/version` was blocked by `LOCAL_ONLY_API_PREFIXES` for all methods despite only `POST` spawning child processes (git/npm/pm2) — a new `LOCAL_ONLY_API_GET_EXEMPTIONS` set exempts safe read methods for this path while keeping `POST/PUT/PATCH/DELETE` strictly loopback-only; (3) `COMBO_002` validation errors only surfaced the generic message — `firstField`/`firstMessage` are now extracted from the first Zod issue and included in the response body. (#5083 — thanks @KooshaPari for the diagnosis and original PR #5084)
- fix(sse): defer `</think>` close so it never leaks before `tool_calls` in Claude→OpenAI streaming — when a Claude thinking block was followed by a tool_use block, the translator unconditionally emitted a `content: "</think>"` chunk at `content_block_stop`, injecting a spurious assistant text chunk immediately before the `tool_calls` delta and corrupting OpenAI-compatible clients (e.g. Kimi Coding). The close marker is now deferred: it is flushed at the first `text_delta` that follows the thinking block (preserving the #4633 / decolua/9router#454 behavior for Claude Code / Cursor) or at stream finish when no tool_calls were collected. Tool-use streams never get a `text_delta` after the thinking block, so `</think>` is never emitted into content before `tool_calls`. (#5123)
- fix(sse): normalize array user-message content in the Command Code executor to prevent upstream 400 — when a client sends a user turn whose `content` is an array of content parts (e.g. `[{type:"text",text:"…"}, …]`), the raw array was forwarded verbatim to the Command Code upstream, which requires `messages[N].content` for the `user` role to be a plain string — resulting in `expected string, received array` / HTTP 400 on DeepSeek V4-Pro and other Command Code models. The user branch of `convertMessages` now calls `normalizeContentText()` (already used by system, assistant, and tool branches) so multi-part user content is joined to a string before dispatch. Partially addresses (#5166); the 0-output-token symptom on reasoning-only models is tracked separately.
- fix(mcp): return HTTP 404 (not 400) for an unknown/expired Streamable HTTP session id — when an MCP session is terminated or idles out and the client reuses the stale `Mcp-Session-Id` header, the Streamable HTTP transport replied with HTTP 400. The MCP spec (2025-03-26 and 2025-11-25, Session Management) mandates HTTP 404 Not Found in that case, and spec-compliant clients only re-initialize a session on 404 — so the 400 was non-recoverable. The handler now returns 404 for a present-but-unknown session id, while a missing session id on a non-initialize request correctly stays 400. (#5169 — thanks @czer323)
- fix(api): blocking "Auto (Zero-Config)" in Security settings now removes `auto/*` from `/v1/models` — the built-in `auto/*` combo advertiser (#4164 / #4235) at the top of the models catalog ignored `settings.blockedProviders`, so checking Auto (Zero-Config) under Security → Blocked Providers had no effect and the model picker kept listing every `auto/*` entry. The injection loop now skips the entire `auto/*` block when the system provider `auto` (its id and alias are both `auto`) is blocked, consistent with how every other provider is filtered from the catalog. (#5192 — thanks @WslzGmzs)
- fix(cli): auto-calibrate the server V8 heap from physical RAM instead of a fixed 512MB default — the server was spawned with a hard-coded `--max-old-space-size=512` (`omniroute serve`) or with no heap flag at all (Electron desktop, which then inherited the runtime's low ~512MB default), so RAM-rich machines still OOM-crashed under load (`FATAL ERROR: Ineffective mark-compacts near heap limit … ~500MB` at code=134) with many providers/accounts and large model catalogs (one report: 16GB RAM, 65 providers, ~100 accounts, ~2600 models). A new `calibrateHeapFallbackMb(os.totalmem())` helper derives the default heap as ~35% of physical RAM, clamped to `[512, 4096]`, and is wired into both `bin/cli/commands/serve.mjs` and `electron/main.js`. An explicit `OMNIROUTE_MEMORY_MB` (or a pre-set `--max-old-space-size`) still wins, so the #2939 override contract is unchanged. (#5172, #5160, #5152 — thanks @manchairwang, @Xyzjesus)
- fix(oauth): Antigravity login no longer hangs — fire-and-forget onboarding + bounded post-exchange — the dashboard's Antigravity OAuth login spun indefinitely because `postExchange` awaited the `onboardUser` retry loop inline (up to 10 × 5 s per attempt, each fetch with no timeout), blocking the `/exchange` response forever. Matching the upstream 9router web flow: `onboardUser` now runs fire-and-forget in a background task; the `/exchange` endpoint is bounded by a 10 s hard timeout so it always returns; a progress endpoint lets the dashboard poll onboarding completion state. (#5193)
- fix(antigravity): retry Antigravity accounts by quota family before escalating the combo — when one Antigravity account returns a quota or rate-limit `429` for a Gemini model (e.g. `gemini-3.5-flash-medium`), combo orchestration could prematurely advance to the next combo model instead of trying other eligible Antigravity accounts for the same quota family. Antigravity quota-family awareness is now added to the fallback path so a `429` on one account triggers a bounded same-model retry across other Antigravity accounts sharing that quota bucket before the combo degrades to a lower-tier model. (#5180 — thanks @Ardem2025)
- fix(translator): accept Claude Messages shape in the non-stream malformed-200 guard — when a Claude client (e.g. Claude Code) is routed to a non-Claude provider, the translated non-streaming response body is in Claude Messages shape (`type: "message", content[]`) produced by `convertOpenAINonStreamingToClaude`. `detectMalformedNonStream` only recognized OpenAI `choices[].message` and Responses API `output[]`, so this shape fell through to `empty_choices` → 502. The guard now recognizes the Claude Messages shape: text, tool_use, and thinking blocks carrying a `signature` count as valid output, while a genuinely empty `content: []` is still flagged. (#5156 — thanks @NomenAK)
- fix(sse): resolve nameless deepseek-web `<tool>` blocks via parameter-schema match — when `chat.deepseek.com` emits a `<tool>` block with no `<name>` child, no JSON body `name`/`type` key, and no tag suffix, every name-resolution path in `extractCall` returned `null` and the raw XML leaked to the client as plain text. A conservative schema-based fallback now compares the block's extracted parameter names against each declared tool's schema keys; if exactly one tool matches, its name is used. Zero or ambiguous (>1) matches still return `null` so no calls are misattributed. (#5154, #5173)
- fix(stream): normalize provider safety finish reasons to `content_filter` — Gemini and Antigravity can return safety/prohibited terminal reasons (`SAFETY`, `RECITATION`, `BLOCKLIST`, `PROHIBITED_CONTENT`) that OpenAI-compatible downstream clients do not recognize. A shared finish-reason normalization helper now maps these to the standard `content_filter` value, applied in both the streaming and JSON collection paths for both providers. (#5197 — thanks @rdself)
- fix(responses): normalize non-array Responses API `input` before routing — the OpenAI Responses API accepts `input` as a string, object, or list, but OmniRoute only handled list-shaped payloads; a string or object `input` was silently dropped on the Responses→Chat Completions path. The translator now normalizes `input` to a list before dispatch; the Codex-native Responses path also normalizes before forwarding (preventing upstream `400 Input must be a list`); and the prompt-injection and PII sanitizer extraction paths are guarded against object-valued `input` so security checks do not throw. (#5204 — thanks @wilsonicdev)
- fix(zenmux): normalize vendor-prefixed GLM system roles for Z.AI models — ZenMux exposes Z.AI GLM via vendor-prefixed OpenAI-compatible IDs such as `z-ai/glm-5.2`. The existing GLM detection only matched bare `glm-*`/`glm` ids, so `zenmux/z-ai/glm-5.2` kept system messages in place; Z.AI rejects compressed histories ending with a system turn before `assistant(tool_calls) → tool` sequences. The fix extends GLM detection to cover `z-ai/glm-*` prefixes and routes them through the existing `normalizeSystemRole` path. (#5158 — thanks @Thinkscape)
- fix(xai): add OAuth connection test probe + normalize xAI reasoning effort aliases — xAI rejects unsupported reasoning effort values (`max`, `xhigh`) with HTTP 400 after a provider update; the xAI translator now maps `max` and `xhigh` to `high` before forwarding. Additionally, xAI OAuth connections had no dashboard test configuration, so provider tests returned `"Provider test not supported"`; a dedicated OAuth test probe is now wired for xAI accounts with regression coverage for the effort normalization. (#5157 — thanks @nguyenxvotanminh3)
- fix(serve): honour `HOSTNAME` from `.env` instead of hardcoding `0.0.0.0` — `bin/cli/commands/serve.mjs` spread `process.env` into the child-process environment but immediately overwrote `HOSTNAME` with a literal `"0.0.0.0"`, silently discarding any user-configured bind address even though `HOSTNAME` is documented in `.env.example` and `docs/reference/ENVIRONMENT.md`. `dist/server.js` already read `process.env.HOSTNAME` correctly; only the CLI wrapper was overriding it. The fix applies `process.env.HOSTNAME || "0.0.0.0"` so the env value takes effect. (#5134, #5170 — thanks @anki1kr / @Angelo90810)
- fix(cli): force `NODE_ENV` to match dev/start run mode in the custom Next server — when `.env.example` ships `NODE_ENV=production`, starting `npm run dev` via `scripts/dev/run-next.mjs` forwarded that value to the programmatic `next()` entry, which — unlike the `next` CLI — does not normalize it to match the run mode. The resulting production flag caused PostCSS to skip Tailwind's CSS transform, surfacing as `Module parse failed: Unexpected character '@'` on `globals.css`. The custom server now explicitly forces `NODE_ENV=development` for the `dev` path and `NODE_ENV=production` for the `start` path regardless of `.env`. (#5189 — thanks @backryun)
- fix(cli): raise dev server Node heap limit to 8 GB to prevent OOM — `npm run dev` crashed with `FATAL ERROR: Ineffective mark-compacts near heap limit — Allocation failed - JavaScript heap out of memory` while compiling heavy dashboard routes because `node scripts/dev/run-next.mjs` ran on V8's ~4 GB default with no `--max-old-space-size` flag. The `dev` npm script now passes `--max-old-space-size=8192` at invocation time (the only point where this flag can be set for that process). (#5198 — thanks @backryun)
- fix(cli): re-enable Turbopack as the default `npm run dev` bundler — PR #4092 forced webpack because an earlier Turbopack 16.2.x panic (`internal error: entered unreachable code: there must be a path to a root` in `turbopack-core/module_graph`) blocked the OmniRoute module graph. That panic no longer reproduces on the pinned Next 16.2.9, so `OMNIROUTE_USE_TURBOPACK` is flipped from `0` to `1` in `.env.example`, aligning it with `docs/reference/ENVIRONMENT.md` which had already documented the default as `1`. (#5206 — thanks @backryun)
- fix(auth): allow synthetic no-auth fallback for mimocode — mimocode connections without explicit credentials were blocked before reaching the executor. The auth layer now permits a synthetic no-auth fallback for the mimocode provider so credential-free access patterns work as intended. (#5205 — thanks @KooshaPari)
- fix(combo): reject empty Responses API `output: []` as a fail-over trigger — a non-streaming Responses API body with `object: "response"` and `output: []` was accepted as a valid HTTP 200 by the combo response-quality validator, allowing a combo target to stop rather than fail over to the next leg. The non-stream validator now inspects Responses-API-shaped bodies before the generic `output` shortcut and rejects an empty `output: []` as `empty_choices`; structural non-empty output (e.g. `function_call`) remains valid. (#5207 — thanks @KooshaPari)
- fix(proxy): close cached dispatchers when clearing the proxy cache — cached proxy and direct-retry dispatchers were not closed on cache clear, leaking open connection handles. The cache-clear path now calls `close()` on all evicted dispatchers; dispatcher cache and lifecycle helpers have been extracted from the oversized proxy-dispatcher module into a dedicated helper for reuse. (#5202 — thanks @KooshaPari)
- fix(proxy): coalesce concurrent fast-fail health probes per proxy URL — under high concurrency each simultaneous request opened its own TCP health probe for the same proxy URL, creating a thundering-herd burst. Concurrent proxy fast-fail checks are now coalesced so only one TCP probe runs per proxy URL at a time; the completed-result health cache is preserved so subsequent same-URL checks return immediately. (#5109, #5208 — thanks @KooshaPari)
- fix(pwa): prefer cached navigation before showing the offline page — the service worker was too eager to display `/offline` on transient navigation failures. It now caches successful navigation responses and consults the cached route or app shell before falling back to `/offline`; `/offline` remains the final fallback when no cached navigation or app shell exists. (#5165, #5209 — thanks @KooshaPari)
- fix(request-logger): never render a negative percentage in the compression badge — when every prompt token was compressed (`totalIn = 0, compressed > 0`), the compression pill displayed `(-100%)` because the badge format hard-coded a leading `-` before the percentage value. The badge now omits the negative sign in this case, correctly representing the saving as a positive ratio. (#5201 — thanks @KooshaPari)
- fix(dashboard): use amber for home update-step warning icon — the warning-state icon in the home update steps (`HomePageClient.tsx`) used `text-yellow-500` (Tailwind `#eab308`), which has poor contrast on light backgrounds (~1.9:1, below WCAG AA) and is inconsistent with the `amber` warning convention used by every sibling element in the same component. Switched to `text-amber-500` — a one-line `className` change with no behavior change. (#5176)
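The refresh-token fix (#3850) in the list above comes down to treating an empty string the same as a missing field. The sketch below contrasts the old and new checks; `TokenResponse`, `pickRefreshTokenOld`, and `pickRefreshToken` are hypothetical names for illustration and are not the executor's actual code.

```typescript
// Hypothetical response shape; only the field relevant to the fix is modeled.
interface TokenResponse {
  access_token: string;
  refresh_token?: unknown;
}

// Old check: typeof "" === "string", so an empty string replaced the stored token.
function pickRefreshTokenOld(tokens: TokenResponse, stored: string): string {
  return typeof tokens.refresh_token === "string" ? tokens.refresh_token : stored;
}

// New check: a non-string or empty value counts as absent, so the stored token survives.
function pickRefreshToken(tokens: TokenResponse, stored: string): string {
  const candidate = tokens.refresh_token;
  return typeof candidate === "string" && candidate.length > 0 ? candidate : stored;
}

const emptyResponse: TokenResponse = { access_token: "new-access", refresh_token: "" };
console.log(JSON.stringify(pickRefreshTokenOld(emptyResponse, "stored-token"))); // ""
console.log(JSON.stringify(pickRefreshToken(emptyResponse, "stored-token")));    // "stored-token"
```

For string inputs the new check behaves like the `tokens.refresh_token || refreshToken` expression quoted in the entry, while also rejecting non-string values.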
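For the heap auto-calibration entry (#5172), the arithmetic of "~35% of physical RAM, clamped to [512, 4096]" can be sketched as below. This is an assumption-laden illustration rather than the shipped `calibrateHeapFallbackMb`: the function name `heapFallbackMb`, the use of `Math.floor`, and the byte-to-MB conversion are guesses, and the `OMNIROUTE_MEMORY_MB` override is not modeled.

```typescript
// Bounds and fraction taken from the release note; everything else is assumed.
const MIN_HEAP_MB = 512;
const MAX_HEAP_MB = 4096;
const RAM_FRACTION = 0.35;

// totalMemBytes is what Node's os.totalmem() reports (bytes of physical RAM).
function heapFallbackMb(totalMemBytes: number): number {
  const totalMb = totalMemBytes / (1024 * 1024);
  const target = Math.floor(totalMb * RAM_FRACTION);
  return Math.min(MAX_HEAP_MB, Math.max(MIN_HEAP_MB, target));
}

const GIB = 1024 * 1024 * 1024;
console.log(heapFallbackMb(1 * GIB));  // 512  (358 MB raised to the floor)
console.log(heapFallbackMb(8 * GIB));  // 2867
console.log(heapFallbackMb(16 * GIB)); // 4096 (5734 MB capped at the ceiling)
```

Under these assumptions the 16GB machine from the report would get the 4096 MB cap instead of the former 512 MB default.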
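The finish-reason normalization entry (#5197) describes a small mapping. A minimal sketch follows, assuming reasons arrive upper-cased exactly as listed and that any other value passes through unchanged; `normalizeFinishReason` is a hypothetical name, and the shared helper in the codebase may handle more cases.

```typescript
// The four provider reasons named in the release note.
const SAFETY_FINISH_REASONS = new Set<string>([
  "SAFETY",
  "RECITATION",
  "BLOCKLIST",
  "PROHIBITED_CONTENT",
]);

// Map provider safety reasons to the OpenAI-style "content_filter";
// leave every other reason untouched (assumption).
function normalizeFinishReason(reason: string): string {
  return SAFETY_FINISH_REASONS.has(reason) ? "content_filter" : reason;
}

console.log(normalizeFinishReason("RECITATION")); // content_filter
console.log(normalizeFinishReason("stop"));       // stop
```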
📝 Maintenance
- test(combo): deterministic context-relay universal-handoff coverage — covers the universal (provider-agnostic) session-handoff path in `context-relay` (`combo.ts:2099–2139`), which previously had only a definition-order assertion and a `TODO(phase-2)`. The test drives the real pipeline via session seams (`x-session-id` → `relayOptions.sessionId` → `maybeGenerateUniversalHandoff`) without live infrastructure. (#5168)
- test(combo): end-to-end quota-share DRR routing-decision coverage (matrix parity) — adds the missing E2E test for the `quota-share` strategy, driving the real `handleChat` → chatCore → `selectQuotaShareTarget` → executor pipeline via in-process seams and asserting which connection is dispatched. The DRR selector already had 29 unit tests; this closes the E2E gap and brings quota-share to parity with the 17-strategy public matrix. (#5179)
- test(combo): deterministic context-relay codex quota-handoff coverage (closes last gap) — covers the codex-specific handoff block of `context-relay` (`combo.ts:2143–2183`), which #5168 left documented-but-untested because it requires a `codex` connection. All seams (`fetchCodexQuota`, handoff generation, session relay) are mocked deterministically without live infra. (#5195)
- test(ci): wire antigravity-quota-family test under `test:vitest` (fix test-discovery orphan) — `open-sse/services/__tests__/antigravity-quota-family.test.ts` (introduced by #5180) was not collected by any active runner, causing `check:test-discovery` to report a new orphan and gate every subsequent PR on the release branch. The file is now added to `vitest.mcp.config.ts` `include` and the corresponding orphan-allowlist entry is removed. (#5196)
- test(security): regression guard — PII redaction stays opt-in (default off) + Hard Rule #20 — adds a test asserting both `PII_REDACTION_ENABLED` and `PII_RESPONSE_SANITIZATION` feature-flag `defaultValue` fields are `"false"` and that data passes through all three application points (`piiMasker`, `piiSanitizer`, `streamingPiiTransform`) untouched when both flags are off, encoding Hard Rule #20 as a CI-enforced contract and fixing a misleading doc implication that PII masking was on by default. (#5159)
- docs(i18n): add Traditional Chinese (zh-TW) README + update zh-CN — adds a new Traditional Chinese translation (`docs/i18n/zh-TW/README.md`) and updates the Simplified Chinese README to the current English baseline; the language index (`docs/i18n/README.md`) and root `README.md` badge row are updated accordingly. (#5162 — thanks @lunkerchen)
- docs(i18n): full sync of zh-TW and zh-CN README to canonical English v3.8.39 — brings both translations to full parity, adding the complete What's New section, compression real-token examples, and all sections updated in the v3.8.38/39 English README. (#5171 — thanks @lunkerchen)
- docs(combo): sync combo/routing-strategy docs to current state + document test coverage — removes a stale ordinal from the Fusion bullet in `README.md`; adds a new Testing & Coverage section to `docs/routing/AUTO-COMBO.md` documenting the deterministic strategy matrix (`npm run test:combo:matrix`), quota-share DRR E2E coverage, and context-relay handoff tests delivered across the v3.8.39 cycle. (#5185)
What's Changed
- Release v3.8.39 by @diegosouzapw in #5164
- fix(docker): copy open-sse workspace manifest before npm ci (v3.8.39 image) by @diegosouzapw in #5223
Full Changelog: v3.8.38...v3.8.39