diegosouzapw/OmniRoute v3.8.39 on GitHub

✨ New Features

feat(oauth): remote Antigravity login via local helper + paste-credentials — Antigravity (and other Google "native/desktop" OAuth providers) use Google's firstparty/nativeapp consent, which only releases the auth code when the loopback redirect (127.0.0.1:<port>) is reachable from the approving browser. On a remote VPS install that loopback lives on the server, so the consent hangs forever and never emits a code — the "paste the callback URL" fallback has nothing to paste (a Google-side constraint, identical in upstream 9router). A new omniroute login antigravity CLI helper runs the OAuth on the user's own machine (where 127.0.0.1 works), exchanges the code, and prints a single-line omniroute-cred-v1.… credential blob; the dashboard's Antigravity Connect → Step 2 field now accepts that blob (alongside callback URLs) and persists the connection via a new paste-credentials action (server-side onboarding, provider-allowlisted, with the blob's embedded provider required to match the route). The SSH local-forward tunnel is documented as a zero-tooling alternative. See docs/guides/REMOTE-MODE.md. (#5203)
feat(agent-bridge): graceful cert-install fallback for containers / headless — when the MITM root CA can't be installed into the system trust store automatically (Docker / headless / no sudo / read-only trust store), the Agent Bridge no longer hard-fails on start with a generic "Certificate install failed". It now starts in skip mode and the dashboard surfaces a platform-specific manual-install guide (plus a CA download link) so the operator can trust the certificate by hand. The trust-cert endpoints return a structured { skippable, manualGuide } response (HTTP 200) for environment failures instead of a 500; an explicit user cancellation is still reported distinctly. (#4546 — thanks @phuchptty)
feat(compression): CCR ranged/grep/stats retrieval (ReDoS-safe, backward-compat) — extends the omniroute_ccr_retrieve MCP tool and /api/compression/retrieve endpoint with optional range (byte/line slice), grep (ReDoS-safe literal or bounded-pattern match against stored lines), and stats (byte/line/word counts) parameters so agents pull exactly the slice or summary they need instead of re-expanding the entire stored block. All parameters are optional — no parameters returns the full block byte-identical to the existing behavior; the CCR store written by ionizer/fuzzy/headroom is fully compatible. Sixth item of the compression roadmap. (#5187)
feat(compression): TOON best-of-N candidate encoder + encoder A/B table — adds @toon-format/toon as a candidate encoder in the headroom compression engine via a best-of-N scheme: both GCF and TOON run per prompt and the shorter result is kept, rather than hard-swapping encoders (GCF already encodes the headroom block and TOON is not a lossless universal win). An encoder A/B comparison table (GCF vs TOON vs JSON — bytes and cl100k tokens) is now surfaced in the compression studio. Fifth item of the compression feature-extraction roadmap (bench: #5080, gate: #5127, fuzzy/gate: #5143, ionizer: #5148). (#5163)

🔧 Bug Fixes

fix(oauth): Antigravity refresh no longer nulls the stored refresh_token on an empty upstream response — Google's OAuth token endpoint uses non-rotating refresh tokens: a refresh response normally OMITS refresh_token and occasionally returns it as an empty string. The Antigravity executor's refreshCredentials used typeof tokens.refresh_token === "string" ? tokens.refresh_token : credentials.refreshToken, and because typeof "" === "string" is true, an empty-string response overwrote the good token with "" — nulling it on first refresh. The check now treats a non-string or empty value as absent and preserves the stored token, matching the canonical refreshGoogleToken (tokens.refresh_token || refreshToken) semantics. (#3850 — thanks @3xa228148)
fix(api): LAN/Tailscale dashboard access — ws: CSP scheme, GET-exempt version route, surface combo field errors — three failures when opening the dashboard from a non-loopback host: (1) CSP connect-src allowed the ws: scheme only for loopback origins, blocking the dashboard's ws://<lan-host>:* Live WebSocket from LAN/Tailscale clients; the bare ws: scheme is now permitted (symmetric with the bare wss: already allowed), kept declarative in next.config.mjs with no global middleware (the project has none by design); (2) GET /api/system/version was blocked by LOCAL_ONLY_API_PREFIXES for all methods despite only POST spawning child processes (git/npm/pm2) — a new LOCAL_ONLY_API_GET_EXEMPTIONS set exempts safe read methods for this path while keeping POST/PUT/PATCH/DELETE strictly loopback-only; (3) COMBO_002 validation errors only surfaced the generic message — firstField/firstMessage are now extracted from the first Zod issue and included in the response body. (#5083 — thanks @KooshaPari for the diagnosis and original PR #5084)
fix(sse): defer </think> close so it never leaks before tool_calls in Claude→OpenAI streaming — when a Claude thinking block was followed by a tool_use block, the translator unconditionally emitted a content: "</think>" chunk at content_block_stop, injecting a spurious assistant text chunk immediately before the tool_calls delta and corrupting OpenAI-compatible clients (e.g. Kimi Coding). The close marker is now deferred: it is flushed at the first text_delta that follows the thinking block (preserving the #4633 / decolua/9router#454 behavior for Claude Code / Cursor) or at stream finish when no tool_calls were collected. Tool-use streams never get a text_delta after the thinking block, so </think> is never emitted into content before tool_calls. (#5123)
fix(sse): normalize array user-message content in the Command Code executor to prevent upstream 400 — when a client sends a user turn whose content is an array of content parts (e.g. [{type:"text",text:"…"}, …]), the raw array was forwarded verbatim to the Command Code upstream, which requires messages[N].content for the user role to be a plain string — resulting in expected string, received array / HTTP 400 on DeepSeek V4-Pro and other Command Code models. The user branch of convertMessages now calls normalizeContentText() (already used by system, assistant, and tool branches) so multi-part user content is joined to a string before dispatch. Partially addresses (#5166); the 0-output-token symptom on reasoning-only models is tracked separately.
fix(mcp): return HTTP 404 (not 400) for an unknown/expired Streamable HTTP session id — when an MCP session is terminated or idles out and the client reuses the stale Mcp-Session-Id header, the Streamable HTTP transport replied with HTTP 400. The MCP spec (2025-03-26 and 2025-11-25, Session Management) mandates HTTP 404 Not Found in that case, and spec-compliant clients only re-initialize a session on 404 — so the 400 was non-recoverable. The handler now returns 404 for a present-but-unknown session id, while a missing session id on a non-initialize request correctly stays 400. (#5169 — thanks @czer323)
fix(api): blocking "Auto (Zero-Config)" in Security settings now removes auto/* from /v1/models — the built-in auto/* combo advertiser (#4164 / #4235) at the top of the models catalog ignored settings.blockedProviders, so checking Auto (Zero-Config) under Security → Blocked Providers had no effect and the model picker kept listing every auto/* entry. The injection loop now skips the entire auto/* block when the system provider auto (its id and alias are both auto) is blocked, consistent with how every other provider is filtered from the catalog. (#5192 — thanks @WslzGmzs)
fix(cli): auto-calibrate the server V8 heap from physical RAM instead of a fixed 512MB default — the server was spawned with a hard-coded --max-old-space-size=512 (omniroute serve) or with no heap flag at all (Electron desktop, which then inherited the runtime's low ~512MB default), so RAM-rich machines still OOM-crashed under load (FATAL ERROR: Ineffective mark-compacts near heap limit … ~500MB at code=134) with many providers/accounts and large model catalogs (one report: 16GB RAM, 65 providers, ~100 accounts, ~2600 models). A new calibrateHeapFallbackMb(os.totalmem()) helper derives the default heap as ~35% of physical RAM, clamped to [512, 4096], and is wired into both bin/cli/commands/serve.mjs and electron/main.js. An explicit OMNIROUTE_MEMORY_MB (or a pre-set --max-old-space-size) still wins, so the #2939 override contract is unchanged. (#5172, #5160, #5152 — thanks @manchairwang, @Xyzjesus)
fix(oauth): Antigravity login no longer hangs — fire-and-forget onboarding + bounded post-exchange — the dashboard's Antigravity OAuth login spun indefinitely because postExchange awaited the onboardUser retry loop inline (up to 10 × 5 s per attempt, each fetch with no timeout), blocking the /exchange response forever. Matching the upstream 9router web flow: onboardUser now runs fire-and-forget in a background task; the /exchange endpoint is bounded by a 10 s hard timeout so it always returns; a progress endpoint lets the dashboard poll onboarding completion state. (#5193)
fix(antigravity): retry Antigravity accounts by quota family before escalating the combo — when one Antigravity account returns a quota or rate-limit 429 for a Gemini model (e.g. gemini-3.5-flash-medium), combo orchestration could prematurely advance to the next combo model instead of trying other eligible Antigravity accounts for the same quota family. Antigravity quota-family awareness is now added to the fallback path so a 429 on one account triggers a bounded same-model retry across other Antigravity accounts sharing that quota bucket before the combo degrades to a lower-tier model. (#5180 — thanks @Ardem2025)
fix(translator): accept Claude Messages shape in the non-stream malformed-200 guard — when a Claude client (e.g. Claude Code) is routed to a non-Claude provider, the translated non-streaming response body is in Claude Messages shape (type: "message", content[]) produced by convertOpenAINonStreamingToClaude. detectMalformedNonStream only recognized OpenAI choices[].message and Responses API output[], so this shape fell through to empty_choices → 502. The guard now recognizes the Claude Messages shape: text, tool_use, and thinking blocks carrying a signature count as valid output, while a genuinely empty content: [] is still flagged. (#5156 — thanks @NomenAK)
fix(sse): resolve nameless deepseek-web <tool> blocks via parameter-schema match — when chat.deepseek.com emits a <tool> block with no <name> child, no JSON body name/type key, and no tag suffix, every name-resolution path in extractCall returned null and the raw XML leaked to the client as plain text. A conservative schema-based fallback now compares the block's extracted parameter names against each declared tool's schema keys; if exactly one tool matches, its name is used. Zero or ambiguous (>1) matches still return null so no calls are misattributed. (#5154, #5173)
fix(stream): normalize provider safety finish reasons to content_filter — Gemini and Antigravity can return safety/prohibited terminal reasons (SAFETY, RECITATION, BLOCKLIST, PROHIBITED_CONTENT) that OpenAI-compatible downstream clients do not recognize. A shared finish-reason normalization helper now maps these to the standard content_filter value, applied in both the streaming and JSON collection paths for both providers. (#5197 — thanks @rdself)
fix(responses): normalize non-array Responses API input before routing — the OpenAI Responses API accepts input as a string, object, or list, but OmniRoute only handled list-shaped payloads; a string or object input was silently dropped on the Responses→Chat Completions path. The translator now normalizes input to a list before dispatch; the Codex-native Responses path also normalizes before forwarding (preventing upstream 400 Input must be a list); and the prompt-injection and PII sanitizer extraction paths are guarded against object-valued input so security checks do not throw. (#5204 — thanks @wilsonicdev)
fix(zenmux): normalize vendor-prefixed GLM system roles for Z.AI models — ZenMux exposes Z.AI GLM via vendor-prefixed OpenAI-compatible IDs such as z-ai/glm-5.2. The existing GLM detection only matched bare glm-*/glm ids, so zenmux/z-ai/glm-5.2 kept system messages in place; Z.AI rejects compressed histories ending with a system turn before assistant(tool_calls) → tool sequences. The fix extends GLM detection to cover z-ai/glm-* prefixes and routes them through the existing normalizeSystemRole path. (#5158 — thanks @Thinkscape)
fix(xai): add OAuth connection test probe + normalize xAI reasoning effort aliases — xAI rejects unsupported reasoning effort values (max, xhigh) with HTTP 400 after a provider update; the xAI translator now maps max and xhigh to high before forwarding. Additionally, xAI OAuth connections had no dashboard test configuration, so provider tests returned "Provider test not supported"; a dedicated OAuth test probe is now wired for xAI accounts with regression coverage for the effort normalization. (#5157 — thanks @nguyenxvotanminh3)
fix(serve): honour HOSTNAME from .env instead of hardcoding 0.0.0.0 — bin/cli/commands/serve.mjs spread process.env into the child-process environment but immediately overwrote HOSTNAME with a literal "0.0.0.0", silently discarding any user-configured bind address even though HOSTNAME is documented in .env.example and docs/reference/ENVIRONMENT.md. dist/server.js already read process.env.HOSTNAME correctly; only the CLI wrapper was overriding it. The fix applies process.env.HOSTNAME || "0.0.0.0" so the env value takes effect. (#5134, #5170 — thanks @anki1kr / @Angelo90810)
fix(cli): force NODE_ENV to match dev/start run mode in the custom Next server — when .env.example ships NODE_ENV=production, starting npm run dev via scripts/dev/run-next.mjs forwarded that value to the programmatic next() entry, which — unlike the next CLI — does not normalize it to match the run mode. The resulting production flag caused PostCSS to skip Tailwind's CSS transform, surfacing as Module parse failed: Unexpected character '@' on globals.css. The custom server now explicitly forces NODE_ENV=development for the dev path and NODE_ENV=production for the start path regardless of .env. (#5189 — thanks @backryun)
fix(cli): raise dev server Node heap limit to 8 GB to prevent OOM — npm run dev crashed with FATAL ERROR: Ineffective mark-compacts near heap limit — Allocation failed - JavaScript heap out of memory while compiling heavy dashboard routes because node scripts/dev/run-next.mjs ran on V8's ~4 GB default with no --max-old-space-size flag. The dev npm script now passes --max-old-space-size=8192 at invocation time (the only point where this flag can be set for that process). (#5198 — thanks @backryun)
fix(cli): re-enable Turbopack as the default npm run dev bundler — PR #4092 forced webpack because an earlier Turbopack 16.2.x panic (internal error: entered unreachable code: there must be a path to a root in turbopack-core/module_graph) blocked the OmniRoute module graph. That panic no longer reproduces on the pinned Next 16.2.9, so OMNIROUTE_USE_TURBOPACK is flipped from 0 to 1 in .env.example, aligning it with docs/reference/ENVIRONMENT.md which had already documented the default as 1. (#5206 — thanks @backryun)
fix(auth): allow synthetic no-auth fallback for mimocode — mimocode connections without explicit credentials were blocked before reaching the executor. The auth layer now permits a synthetic no-auth fallback for the mimocode provider so credential-free access patterns work as intended. (#5205 — thanks @KooshaPari)
fix(combo): reject empty Responses API output: [] as a fail-over trigger — a non-streaming Responses API body with object: "response" and output: [] was accepted as a valid HTTP 200 by the combo response-quality validator, allowing a combo target to stop rather than fail over to the next leg. The non-stream validator now inspects Responses-API-shaped bodies before the generic output shortcut and rejects an empty output: [] as empty_choices; structural non-empty output (e.g. function_call) remains valid. (#5207 — thanks @KooshaPari)
fix(proxy): close cached dispatchers when clearing the proxy cache — cached proxy and direct-retry dispatchers were not closed on cache clear, leaking open connection handles. The cache-clear path now calls close() on all evicted dispatchers; dispatcher cache and lifecycle helpers have been extracted from the oversized proxy-dispatcher module into a dedicated helper for reuse. (#5202 — thanks @KooshaPari)
fix(proxy): coalesce concurrent fast-fail health probes per proxy URL — under high concurrency each simultaneous request opened its own TCP health probe for the same proxy URL, creating a thundering-herd burst. Concurrent proxy fast-fail checks are now coalesced so only one TCP probe runs per proxy URL at a time; the completed-result health cache is preserved so subsequent same-URL checks return immediately. (#5109, #5208 — thanks @KooshaPari)
fix(pwa): prefer cached navigation before showing the offline page — the service worker was too eager to display /offline on transient navigation failures. It now caches successful navigation responses and consults the cached route or app shell before falling back to /offline; /offline remains the final fallback when no cached navigation or app shell exists. (#5165, #5209 — thanks @KooshaPari)
fix(request-logger): never render a negative percentage in the compression badge — when every prompt token was compressed (totalIn = 0, compressed > 0), the compression pill displayed (-100%) because the badge format hard-coded a leading - before the percentage value. The badge now omits the negative sign in this case, correctly representing the saving as a positive ratio. (#5201 — thanks @KooshaPari)
fix(dashboard): use amber for home update-step warning icon — the warning-state icon in the home update steps (HomePageClient.tsx) used text-yellow-500 (Tailwind #eab308), which has poor contrast on light backgrounds (~1.9:1, below WCAG AA) and is inconsistent with the amber warning convention used by every sibling element in the same component. Switched to text-amber-500 — a one-line className change with no behavior change. (#5176)

📝 Maintenance

test(combo): deterministic context-relay universal-handoff coverage — covers the universal (provider-agnostic) session-handoff path in context-relay (combo.ts:2099–2139), which previously had only a definition-order assertion and a TODO(phase-2). The test drives the real pipeline via session seams (x-session-id → relayOptions.sessionId → maybeGenerateUniversalHandoff) without live infrastructure. (#5168)
test(combo): end-to-end quota-share DRR routing-decision coverage (matrix parity) — adds the missing E2E test for the quota-share strategy, driving the real handleChat → chatCore → selectQuotaShareTarget → executor pipeline via in-process seams and asserting which connection is dispatched. The DRR selector already had 29 unit tests; this closes the E2E gap and brings quota-share to parity with the 17-strategy public matrix. (#5179)
test(combo): deterministic context-relay codex quota-handoff coverage (closes last gap) — covers the codex-specific handoff block of context-relay (combo.ts:2143–2183), which #5168 left documented-but-untested because it requires a codex connection. All seams (fetchCodexQuota, handoff generation, session relay) are mocked deterministically without live infra. (#5195)
test(ci): wire antigravity-quota-family test under test:vitest (fix test-discovery orphan) — open-sse/services/__tests__/antigravity-quota-family.test.ts (introduced by #5180) was not collected by any active runner, causing check:test-discovery to report a new orphan and gate every subsequent PR on the release branch. The file is now added to vitest.mcp.config.ts include and the corresponding orphan-allowlist entry is removed. (#5196)
test(security): regression guard — PII redaction stays opt-in (default off) + Hard Rule #20 — adds a test asserting both PII_REDACTION_ENABLED and PII_RESPONSE_SANITIZATION feature-flag defaultValue fields are "false" and that data passes through all three application points (piiMasker, piiSanitizer, streamingPiiTransform) untouched when both flags are off, encoding Hard Rule #20 as a CI-enforced contract and fixing a misleading doc implication that PII masking was on by default. (#5159)
docs(i18n): add Traditional Chinese (zh-TW) README + update zh-CN — adds a new Traditional Chinese translation (docs/i18n/zh-TW/README.md) and updates the Simplified Chinese README to the current English baseline; the language index (docs/i18n/README.md) and root README.md badge row are updated accordingly. (#5162 — thanks @lunkerchen)
docs(i18n): full sync of zh-TW and zh-CN README to canonical English v3.8.39 — brings both translations to full parity, adding the complete What's New section, compression real-token examples, and all sections updated in the v3.8.38/39 English README. (#5171 — thanks @lunkerchen)
docs(combo): sync combo/routing-strategy docs to current state + document test coverage — removes a stale ordinal from the Fusion bullet in README.md; adds a new Testing & Coverage section to docs/routing/AUTO-COMBO.md documenting the deterministic strategy matrix (npm run test:combo:matrix), quota-share DRR E2E coverage, and context-relay handoff tests delivered across the v3.8.39 cycle. (#5185)

What's Changed

Release v3.8.39 by @diegosouzapw in #5164

Full Changelog: v3.8.38...v3.8.39

What's Changed

Release v3.8.39 by @diegosouzapw in #5164
fix(docker): copy open-sse workspace manifest before npm ci (v3.8.39 image) by @diegosouzapw in #5223

Full Changelog: v3.8.38...v3.8.39