github diegosouzapw/OmniRoute v3.8.24

one hour ago

[3.8.24] — 2026-06-13

✨ New Features

  • feat(plugins): custom plugin marketplace support — the plugin registry now fetches from a custom URL set in system settings (pluginMarketplaceUrl), falling back to the local seed registry when none is configured. Adds a GET /api/plugins/marketplace endpoint and a revamped Marketplace UI. (#3656 — thanks @oyi77)
  • feat(api-keys): strict-mode controls for the Claude Code default routing pathClaude Code default is now an explicit cc/* model permission, so an API key can allow the default path while blocking specific model families (e.g. Fable) in strict mode. Previously the default path received dynamic/unprefixed models (sonnet, opus, claude-opus-4-8[1m], …) that no single permission represented, so it broke under strict permissions. (#3776 — thanks @Witroch4)
  • feat(flags): expose the emergency budget fallback in the Feature Flags pageOMNIROUTE_EMERGENCY_FALLBACK is now a runtime boolean (enabled by default, applied without restart) resolved through the feature-flag stack, so a DB override can toggle it while still honoring the raw env fallback. Follow-up to #3741 by @zoispag. (#3752 — thanks @rdself)
  • feat(reasoning): preserve xhigh reasoning effort by defaultxhigh now passes through unless a model explicitly sets supportsXHighEffort: false, with the existing max normalization kept separate. (#3756 — thanks @rdself)
  • feat(codex): inject OmniRoute memory into Codex Responses WebSocket requests — retrieved memory is injected into the Responses WebSocket prepare request via the instructions field, with the retrieval query derived from the latest user input (skipping tool/reasoning payloads) and duplicate-safe injection. (#3749 — thanks @kkkayye)
  • feat(dashboard): free provider rankings page — a new dashboard page (with sidebar entry) that ranks free providers (no-auth / OAuth / API-key) by their models' Arena ELO / intelligence scores, joining the provider registry with the model_intelligence table via fuzzy model-name matching. Pure computation over existing data — no external calls. (#3799 — thanks @pizzav-xyz)

🔒 Security

  • security(proxy): IPv6-only egress enforcement + closing IP-leak paths (L1/L2/L3) — de-brackets IPv6 literals at the SOCKS host and the proxy-health tcpCheck (so socks5://[2001:db8::1] and any v6-literal proxy connect instead of dying with ENOTFOUND), adds a per-proxy family policy (auto/ipv4/ipv6), and enforces it end-to-end across SOCKS5/HTTP/HTTPS × global/provider/key × literal/hostname. 16 commits, TDD+BDD, 73 tests. (#3777)
  • security(marketplace): harden the custom-URL SSRF guard against three bypasses found by automated security review — IPv6/AAAA records (only dns.resolve4 was checked, so a private AAAA record or an IPv6 literal slipped through), redirect-following (a public URL could 30x to an internal one), and DNS-rebinding TOCTOU. The guard now resolves A+AAAA via the canonical isPrivateHost, routes the fetch through safeOutboundFetch (public-only, blocks redirects to private hosts), and re-validates on fetch. Reachable only with management auth + a custom pluginMarketplaceUrl. Follow-up to #3656. (#3774)
  • security: resolve all open CodeQL + Dependabot alerts — CodeQL js/insufficient-password-hash (the semantic-cache apiKeyId is now a plaintext key prefix, ${apiKeyId}.${digest}, instead of being folded into the SHA-256 digest, clearing the false positive while preserving per-key cache isolation) and a URL-substring check tightened to an exact host match; Dependabot esbuild < 0.28.1 pinned via override in both workspaces. (#3778) The remaining js/incomplete-url-substring-sanitization instances in the api-key proxy-context test were also cleared by asserting on the parsed URL host/port instead of a substring includes. (thanks @diegosouzapw)

🐛 Fixed

  • fix(dashboard): surface the Plugins page (plugin manager + marketplace) in the sidebar — the plugins page (/dashboard/plugins), which hosts the custom plugin marketplace shipped in #3656, had no menu entry and was reachable only by typing the URL. It now appears under Agentic Features. (thanks @diegosouzapw)

  • fix(proxy): add the IP-family selector (auto / IPv4-only / IPv6-only) to the proxy form — the per-proxy family egress policy from #3777 was backend-only (the dashboard had no control, so every proxy stayed on auto). The proxy registry form now exposes the selector and the create/update schema accepts it, so IPv6-only egress can be enabled from the UI. (thanks @diegosouzapw)

  • fix(combo): deep audit of the combo + quota-shared routing system — repairs 5 dead/broken rules (streaming-USD cost recording, quota-pool usage provider resolution, provider-diversity wiring, maxComboDepth threading, and scoring clamp/NaN-safety incl. connectionDensity) and revives the dead tierAffinity/specificityMatch scoring factors — root cause was a require() that throws under ESM, so both factors silently collapsed to 0.5; now a static import. Validates every auto-router strategy (cost / latency / sla-aware / lkgp / selectWithStrategy + aliases) and the predictive-TTFT decision, adds E2E coverage (3-hop priority failover, per-target timeout failover, real strategy:auto dispatch), and introduces opt-in complexity-aware routing (2026) layered over the existing specificity detector. Per-target credential+proxy isolation verified clean (AsyncLocalStorage). 4 TDD waves, 10 new/updated test files. (#3779 — thanks @diegosouzapw)

  • fix(anthropic): normalize sampling params under extended thinking — Claude models with extended thinking (e.g. Opus 4.8 via the Claude Code provider) returned HTTP 400 when a request carried non-default temperature/top_p (temperature may only be set to 1 …, top_p must be ≥ 0.95 or unset …). Tools like VS Code Copilot's "Ollama" BYOK send temperature: 0.7 + top_p: 0.9, so every thinking-enabled Claude request failed; the proxy now drops/normalizes these params at the chokepoint so the request succeeds. (#3780 — thanks @zhiru)

  • fix(sse): pass Claude passthrough thinking blocks through unchanged — the Anthropic-native Claude OAuth passthrough rewrote every assistant thinking block to redacted_thinking, which the Messages API rejects (submitted thinking blocks are validated against the original response), so every multi-turn request with extended thinking failed with 400 … thinking blocks … cannot be modified (very visible on long Claude Code tool-loops). The blocks are now passed through verbatim; the signature is validated server-side and stays valid on replay (including across an OAuth token switch), so the redaction was unnecessary. (#3775 — thanks @havockdev)

  • fix(mcp): resolve the bundled MCP server entry from dist/ instead of the legacy app/ path — omniroute --mcp crashed on npm installs with ERR_MODULE_NOT_FOUND: Cannot find package @/lib because bin/mcp-server.mjs looked for the compiled entry under app/ (a VPS-deploy path that never exists in the npm package) and fell back to the un-bundled .ts. (#3765 — thanks @megamen32)

  • fix(sse): preserve streamed tool-call arguments end-to-end — incremental tool-call argument deltas could be truncated/duplicated through SSE parsing, transformation and response translation, corrupting tool calls in CLI tool-use output. Dedup now only collapses unambiguous snapshots. (#3762, closes #3701 — thanks @Mffff4)

  • fix(dashboard): repair the Logs page light-mode controls — the "Clean history" button keeps readable contrast (while preserving the red destructive affordance), request-row hover uses a cool blue tint so it no longer reads like a failed-request row, and the custom auto-refresh interval persists in localStorage (clamped to 1–300s). Also refreshes the Feature Flags light-mode treatment. (#3760 — thanks @rdself)

  • fix(dashboard): make the Request Logs "Clean history" action perform a full request-history purge — clears call_logs, legacy request_detail_logs, and local JSON artifacts under DATA_DIR/call_logs (including orphaned artifact files) via a dedicated maintenance API route, instead of a retention-only cleanup. (#3751 — thanks @rdself)

  • fix(cli): detect CLI tools installed outside the GUI PATH on macOS. macOS GUI/Electron apps don't inherit the user's login-shell PATH, so Homebrew (/opt/homebrew/bin), nvm and volta-installed CLIs (Cline, Codex, OpenCode, Continue, Hermes, …) were reported "not installed" and the Cline runtime couldn't be spawned. CLI detection (omniroute doctor) and the provider-runtime lookup now enrich the lookup PATH with the login shell's PATH ($SHELL -ilc, darwin-only, cached, fail-safe). (#3321 — thanks @mikmaneggahommie)

  • fix(dashboard): repair the Playground model selector for custom OpenAI/Anthropic-compatible providers. Two bugs left it unusable: (1) when the provider's catalog prefix didn't resolve, the list was filtered by the raw connection id (which matches nothing) so the selector showed nothing; and (2) selecting a provider reset the model to empty and nothing ever picked a default, so the chat failed with "Set a model in the config pane". The selector now falls back to the full catalog instead of emptying, and auto-selects the first available model. (#3731, #3009 — thanks @tjengbudi)

  • fix(cursor): send a ModelDetails envelope (model_id + display_model_id + display_name) in the Cursor agent request, alongside the existing RequestedModel. Pinned Cursor Claude/GPT thinking variants (e.g. cursor/claude-opus-4-7-thinking-xhigh) were returning an empty turn → 502 Provider returned empty content, because Cursor needs the ModelDetails envelope (which cursor-agent's real wire format sends) to resolve them; RequestedModel alone only resolves server-routed ids (auto/composer-*). The -fast parameter path on RequestedModel is preserved. (#3714)

  • fix(docs): correct the OAuth redirect URI in the Fly.io deployment guide. It told users to register <NEXT_PUBLIC_BASE_URL>/api/oauth/<provider>/callback, but OmniRoute's browser OAuth flow uses a single <NEXT_PUBLIC_BASE_URL>/callback handler (there is no per-provider callback route). The mismatch caused GitLab Duo (and any OAuth provider) to reject the flow with "The redirect URI included is not valid". Added a regression guard test. (#3732)

  • fix(providers): give Ollama Cloud's kimi-k2.7-code its real capabilities (262K context, 262K max output, vision + thinking + tools) instead of the degraded 128000 / 8192 defaults. The model had no spec/registry entry, so importing it via "Import from /models" (whose /v1/models upstream returns no per-model metadata) left it as a bare custom model with fallback capabilities. Added a global kimi-k2.7-code model spec (parity with kimi-k2.6) plus a registry entry on ollama-cloud. (#3761 — thanks @SultanKs4)

  • fix(providers): repair qwen-web (chat.qwen.ai) connection validation, which failed with a misleading provider.validation.ssrf_blocked error. qwen-web had no specialty validator, so the generic OpenAI-compatible path probed a non-existent /api/v2/models URL that answers with a 307 redirect — the outbound guard blocked the redirect and the route mislabeled it as an SSRF security block. Added a qwen-web specialty validator that probes the real session endpoint (GET /api/v2/user, mirroring the executor's anti-bot headers + cookie-jar replay). Also hardened toValidationErrorResult so a blocked redirect is only flagged securityBlocked when its target is a private/internal host — a benign 3xx to a public host is no longer mislabeled as an SSRF attempt (this affected every web-cookie provider, not just qwen). (#3288, #3758)

  • fix(oauth): stop nulling the stored refresh_token of non-rotating providers when a proactive health-check refresh fails with invalid_grant. The destructive refreshToken: null write in tokenHealthCheck was only meant for rotating one-time-use tokens (Codex/OpenAI), but it also fired for Google-family providers (gemini-cli / antigravity / gemini) whose refresh tokens are non-rotating. Once nulled, the connection reported "No valid refresh token available" and could never recover even after re-activation. The token is now preserved (gated on isRotatingProvider) so it stays as the recovery artifact. (#3679 — thanks @3xa228148)

  • fix(dashboard): self-host the Material Symbols icon font instead of loading it from the Google Fonts CDN. On networks where fonts.googleapis.com is unreachable (e.g. mainland China), the icon ligature font never loaded, so every icon rendered as its literal text name (smart_toy, visibility, …) and the layout broke — especially after importing many models. The font is now bundled locally via the material-symbols package (@import "material-symbols/outlined.css" in globals.css), removing the runtime CDN dependency. (#3695 — thanks @lqyiwwx)

  • fix(antigravity): skip Google One AI credits retry on full_quota_exhausted verdict — antigravity executor now calls decide429() before attempting the credits retry so that a quota-exhausted account (24h cooldown) bypasses the extra upstream HTTP call instead of hanging for up to ~41s. Also persists the cooldown in the DB via setConnectionRateLimitUntil so post-restart routing skips exhausted connections without re-learning the hard way. Bonus: antigravity429Engine now recognises the real Antigravity "Individual quota reached. Contact your administrator to enable overages." error message as quota_exhausted. (#3707 — thanks @andrea-kingautomation)

  • fix(cli): ServerSupervisor.handleExit now coerces the exit code to a number before calling process.exit() — Node.js v24 throws TypeError [ERR_INVALID_ARG_TYPE] when process.exit() receives a string (e.g. 'ENOENT' from a spawn error event's err.code). The error callback also now passes -1 instead of the raw err.code, which is an OS error string rather than a meaningful exit code. (#3748)

📝 Maintenance

  • feat(intelligence): enable Arena ELO sync by default + wire it into the live startup path — the Arena AI leaderboard ELO sync (ARENA_ELO_SYNC_ENABLED) that powers the new Free Provider Rankings page (#3799) is now on by default (was opt-in). It was also only initialized from server-init.ts, which the Next standalone runtime never executes (it boots through instrumentation-node.ts), so the sync never actually ran in production — it's now initialized from the live instrumentation path. Fetches from api.wulong.dev on startup (non-blocking, never fatal) and refreshes daily; set ARENA_ELO_SYNC_ENABLED=false to opt out of the outbound sync. (thanks @diegosouzapw)
  • chore(quality): Quality Gates → 100% — completes Fase 6A (systemic hardening: a stale-allowlist helper applied across ~10 gates, docs/architecture/QUALITY_GATES.md, and ratchet engine v2 with --require-tighten + per-metric eps) and the entire Fase 7 (20 security/dead-code/mutation/tooling gates), with the Fase 8 plan documented. (#3757)
  • docs: close the remaining documentation gaps for proxy operations, skills internals, the memory engine, RTK customization, and compression extensibility (post-#3438 audit, 5 areas in one PR). (#3453 — thanks @oyi77)
  • chore(quality): re-baseline providerRegistry.ts file-size (4692→4703) after #3768's Ollama Cloud kimi-k2.7-code capability fix grew the file past the frozen baseline, turning the release's own Fast Quality Gates red. No source file is touched. (#3770)
  • fix(publish): clean the @omniroute/opencode-plugin node_modules after the tsup build — the hard links npm creates on Linux ended up in the published tarball as LINK entries, which the npm registry rejects with E415 "Hard link is not allowed". The dependencies are only needed for the build, never shipped. (thanks @diegosouzapw)
  • chore(docs): prune internal planning/spec artifacts and sync the i18n CHANGELOG to 3.8.24. (thanks @diegosouzapw)
  • test: align two unit tests left stale by this cycle's behavior changesexecutor-codex now asserts that GPT 5.4 Mini's xhigh reasoning effort passes through unchanged (the intended #3756 default; the model ships an xhigh catalog variant and carries no supportsXHighEffort:false flag) instead of the old downgrade-to-high, and plugins-route-error-sanitization now covers the new /api/plugins/marketplace route from #3656 (verified compliant with Hard Rule #12: imports + uses buildErrorBody). No production behavior change. (thanks @diegosouzapw)
  • docs: refresh root + docs/ documentation to the current architecture — a full codebase audit corrected stale counts/facts (MCP 43→87 tools / ~13→30 scopes, DB 45+→83 modules / 55→97 migrations, routing 14→15 strategies, A2A 5→6 skills, Node >=22.0.0 <23 || >=24.0.0 <27, TypeScript 6.0) across CLAUDE.md, AGENTS.md, README.md, and the docs/ index, and documented the plugin marketplace, Notion/Obsidian, and embedded-services subsystems. (thanks @diegosouzapw)

What's Changed

Full Changelog: v3.8.23...v3.8.24

Don't miss a new OmniRoute release

NewReleases is sending notifications on new releases.