github diegosouzapw/OmniRoute v3.8.29

[3.8.29] — 2026-06-19

✨ New Features

  • feat(cloud-agent): Cursor Cloud Agent via the official API-key REST API (no IDE-OAuth ban risk) — adds a cursor-cloud cloud agent that drives Cursor's Background / Cloud Agents through the official REST API (api.cursor.com) authenticated with a user or service-account API key — the safer, first-party alternative to re-using the Cursor IDE's OAuth session (the existing cursor provider, which carries a ban-risk warning). Implemented as a plain REST adapter mirroring the Devin/Jules agents (createTask/getStatus/sendMessage/listSources), so it does not pull in the @cursor/sdk package and its per-platform native binaries (Cursor's SDK is itself a thin wrapper over this REST API). Cursor's UPPERCASE status enums (CREATING/RUNNING/FINISHED/ERROR) are mapped explicitly to the shared CloudAgentStatus, and baseUrl is overridable per-credential. Credentials are stored encrypted via the existing cloud_agent_credentials table; no schema change. (#4227 — thanks @MRDGH2821)
  • feat(routing): OpenRouter-style auto/<category>:<tier> combos — auto-routing now understands suffixed combos that separate the category (what kind of route) from the tier (how to optimize): auto/coding:fast, auto/coding:cheap (alias :floor), auto/coding:free, auto/coding:pro, auto/coding:reliable, plus the new category roots auto/reasoning, auto/vision, auto/multimodal. The tier picks the scoring weights — :fast → ship-fast, :cheap/:floor → cost-saver, :reliable → a new reliability-first pack (circuit-breaker health + latency stability) — while :free/:pro filter the candidate pool by model tier (classifyTier: free-tier vs. premium models). The category filters the pool by capability (vision/multimodal → vision-capable models, reasoning → reasoning/thinking models). Any valid auto/<category>:<tier> resolves on demand; a curated set is advertised in /v1/models and the dashboard. Filtering is fail-open — if a constraint matches no connected models the full pool is used so routing never breaks. All composition lives in the new open-sse/services/autoCombo/suffixComposition.ts; the core combo scorer (combo.ts) is untouched. Second slice of #4235 (premium account-tier weighting is a later follow-up). (#4235 — thanks @MRDGH2821)
  • feat(routing): advertise the auto/cheap, auto/offline, auto/smart combos (catalog ↔ README sync) — the README lists auto/cheap (cheapest-per-token first), auto/offline (most quota/rate-limit headroom first) and auto/smart (quality-first + 10% exploration), and they already resolved at request time via parseAutoPrefix → createVirtualAutoCombo. But they were missing from AUTO_TEMPLATE_VARIANTS, so /v1/models and the dashboard combos list (which iterate that catalog) never showed them — the catalog drifted from the docs (visible in the issue's screenshots). Added the three entries so they're advertised everywhere alongside the other built-in auto/* combos. First slice of #4235 (OpenRouter-style auto/<category>:<tier> suffixes + new categories follow). (#4235 — thanks @MRDGH2821)
  • feat(cli): remote mode — drive a remote OmniRoute with scoped access tokens — a new CLI mode that connects to a remote OmniRoute instance using scoped access tokens, so a local CLI can drive a server you don't own a session on. (#4256)
  • feat(api): cost-telemetry parity — X-OmniRoute-* headers on every endpoint + a non-token cost engine — every endpoint now emits the X-OmniRoute-* cost/usage headers, backed by a cost engine that also prices non-token (media/request-based) usage. (#4247)
  • feat(api): register Kimi K2.7 Code models (kimi-k2.7-code + -highspeed) — the new Moonshot thinking-only coding models are registered (fixed sampling; temperature/top_p marked unsupported). (#4183)
  • feat(catalog): add kimi-k2.7-code to the kmca catalog + qwen-web models discovery — surfaces the new Kimi coding model in the kmca catalog and wires qwen-web into model discovery. (#4185)
  • feat(api): expand the zai provider catalog with GLM-5.2 / GLM-4.7 — adds the real GLM-5.2, GLM-4.7 and GLM-4.7-flash model ids to the Anthropic-direct zai provider. (#4201)
  • feat(api): no-thinking gateway model IDs (FCC port, Fase 8.1) — gateway model id variants that force thinking off, ported from free-claude-code. (#4145)
  • feat(sse): mid-stream continuation for truncated streams (FCC port, Task 4.4) — when a stream is cut short, OmniRoute can transparently continue it, ported from free-claude-code. (#4147)
  • feat(sse): per-provider sliding-window rate-limit fallback (FCC port, Fase 8.2) — a per-provider sliding-window rate limiter as a fallback path, ported from free-claude-code. (#4146)
  • feat(sse): transparent stream recovery (FCC port, Fase 4, opt-in) — opt-in transparent recovery of interrupted upstream streams, ported from free-claude-code. (#4131)
  • feat(search): free DuckDuckGo web search as a last-resort provider (FCC port, Fase 6) — adds a no-key DuckDuckGo web-search provider used as a last resort, ported from free-claude-code. (#4136)
  • feat(logging): credential-redaction safety net in the pino logger (FCC port, Fase 8.3) — a logger-level redaction pass that scrubs credentials from log output, ported from free-claude-code. (#4140)
  • feat(memory): opt-in Qdrant scalar int8 quantization (F4.4 Q1) — opt-in int8 scalar quantization for Qdrant-backed memory vectors. (#4187)
  • feat(memory): opt-in sqlite-vec int8 vector quantization (F4.4 Q2) — opt-in int8 quantization for the sqlite-vec memory backend. (#4190)
  • feat(deploy): keep optional deps on update (--include=optional) — the in-place update path now passes --include=optional so native/optional packages aren't dropped on update. (#4260)
  • feat(dashboard): unified visual identity — grid, primitives, tables, form controls (design phases 1-4) — a sweeping design pass aligning the dashboard with the site: grid wallpaper, button/card/input primitives, theme-aware tables and form controls. (#4122)
  • feat(dashboard): grid wallpaper on all standalone screens + fluid 4K layout — the identity grid now backs every standalone screen and the layout scales fluidly to 4K. (#4158)
  • feat(dashboard): make the identity grid visible + unify the focus ring on accent — design follow-up making the grid actually visible and standardizing focus rings on the accent color. (#4141)
  • feat(dashboard): import only free models + free-model list controls — the model-import page can import just the free models, with controls to manage the free-model list. (#4176 — thanks @felipesartori)
  • feat(dashboard): compact grid layout for no-auth provider accounts — a denser grid layout for provider accounts when auth is disabled. (#4137 — thanks @felipesartori)
  • feat(dashboard): derive media serviceKinds from the registries (surface MiniMax + the media catalog) — /media-providers/[kind] now derives its service kinds from the registries instead of a hand-maintained list, surfacing ~48 previously-invisible media providers (incl. MiniMax TTS/video/music). (#4212)
  • feat(traffic-inspector): live (in-flight) request filter (Gap 5) — the Traffic Inspector can filter to in-flight requests as they happen. (#4130)
  • feat(agent-bridge): maintenance & diagnostics dashboard controls — adds maintenance and diagnostics controls for the Agent Bridge to the dashboard. (#4127)
  • feat(mitm): TPROXY IP_TRANSPARENT native addon + conditional loader (Epic A) — a native IP_TRANSPARENT addon with a conditional loader, the foundation for TPROXY capture. (#4148)
  • feat(mitm): Fase 3 Epic A spike — TPROXY command builder — a transactional builder for the iptables/TPROXY command set. (#4139)
  • feat(mitm): TPROXY setup layer — transactional apply/revert (Epic A) — applies and reverts the TPROXY routing setup transactionally. (#4144)
  • feat(mitm): add setSocketMark to the TPROXY addon (anti-loop primitive) — exposes setSocketMark so OmniRoute's own egress can be marked and skipped (anti-loop). (#4160)
  • feat(mitm): TPROXY capture-mode listener + connectMarked (Epic A) — the capture-mode listener plus a marked-connect primitive. (#4169)
  • feat(mitm): dynamic per-SNI cert authority for TPROXY (TLS decrypt 1/N) — a per-SNI on-the-fly certificate authority, the first slice of TLS decrypt. (#4173)
  • feat(mitm): TLS-terminating capture for TPROXY (decrypt 2/N) — terminates TLS to capture decrypted traffic. (#4179)
  • feat(mitm): wire the TLS decrypt engine into TPROXY capture mode (decrypt 3/N) — connects the decrypt engine to the capture-mode pipeline. (#4200)
  • feat(mitm): TPROXY capture-mode manager (decrypt 4a/N) — a manager coordinating the TPROXY capture lifecycle. (#4208)
  • feat(mitm): local-only route + trust-store installer for TPROXY decrypt (4b/N) — a loopback-only management route plus a CA trust-store installer for the decrypt CA. (#4211)
  • feat(dashboard): TPROXY decrypt capture toggle in the Traffic Inspector (4c/N) — a UI toggle to enable/disable decrypted capture. (#4216)
  • feat(compression): replace the headroom tabular encoder with a vendored GCF — swaps the tabular encoder for a vendored GCF implementation. (#4167 — thanks @blackwell-systems)
  • feat(compression): live per-engine streaming via compression.step (F3.3) — streams per-engine compression progress through a compression.step event. (#4217)
  • feat(compression): show an engine node for single-engine runs in the studio — the Compression Studio now renders an engine node even when only one engine runs. (#4210)
  • feat(compression): expose the WaterfallInspector via a Canvas/Waterfall toggle — adds a Canvas/Waterfall view toggle that surfaces the WaterfallInspector. (#4238)
  • feat(compression): make mcpAccessibility config reachable via a settings sub-route — exposes the mcpAccessibility config under a dedicated settings sub-route. (#4237)
  • feat(compression): runnable A/B benchmark CLI (F2.4) — a CLI to run A/B compression benchmarks. (#4220)
  • feat(compression): add a transcript loader to the replay harness — the replay harness can now load real transcripts. (#4246)
  • feat(compression): wire MCP tool-cardinality reduction (F4.3, opt-in) — opt-in reduction of MCP tool-set cardinality to shrink prompts. (#4221)
  • feat(compression): wire RTK comment-stripping config + honor preserveDocstrings — RTK comment-stripping is now configurable and honors a preserveDocstrings flag. (#4242)
  • feat(compression): honor the per-filter RTK deduplicate flag — RTK filters now respect a per-filter deduplicate flag. (#4231)
  • feat(compression): honor the registry enabled flag in the stacked loop — the stacked compression loop now skips engines disabled in the registry. (#4244)
  • feat(compression): persist RTK grouping config (unlock R5 enableGrouping) — persists the RTK grouping configuration, unlocking the R5 enableGrouping rule. (#4207)
  • feat(compression): wire ultra's modelPath/slmFallbackToAggressive to the LLMLingua SLM tier — connects the ultra tier's small-language-model knobs to the LLMLingua SLM path. (#4257)
  • feat(quality): Onda 2 mutation-gate tooling — radiography classifier (T1) + mutationScore ratchet (T3) — new mutation-testing tooling: a survivor-radiography classifier and a mutationScore ratchet. (#4234)
  • feat(ci): wire the F2.4 compression budget-gate ratchet — adds a CI ratchet that gates compression budget regressions. (#4232)
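
The auto/<category>:<tier> entry above can be illustrated with a minimal sketch. It is not the code in suffixComposition.ts: every type, function name, and candidate field below is hypothetical, and only the suffix grammar, the :floor alias, the category/tier filters, and the fail-open rule are taken from the notes. The sketch also assumes that fail-open is applied per constraint (a constraint that matches nothing is skipped), which is one possible reading of "the full pool is used".

```typescript
// Hypothetical sketch: names and shapes are illustrative, not OmniRoute's API.
type Tier = "fast" | "cheap" | "free" | "pro" | "reliable";

interface ParsedAutoCombo {
  category: string;
  tier: Tier | null;
}

// Assumed candidate shape; the real pool entries are not described in the notes.
interface Candidate {
  id: string;
  supportsVision: boolean;
  supportsReasoning: boolean;
  isFreeTier: boolean;
}

// A Map avoids accidental hits on Object.prototype keys such as "constructor".
const TIER_ALIASES = new Map<string, Tier>([
  ["fast", "fast"],
  ["cheap", "cheap"],
  ["floor", "cheap"], // ":floor" is documented as an alias of ":cheap"
  ["free", "free"],
  ["pro", "pro"],
  ["reliable", "reliable"],
]);

// Splits "auto/<category>[:<tier>]"; returns null for anything else.
function parseAutoCombo(modelId: string): ParsedAutoCombo | null {
  if (!modelId.startsWith("auto/")) return null;
  const parts = modelId.slice("auto/".length).split(":");
  const category = parts[0];
  if (!category || parts.length > 2) return null;
  if (parts.length === 1) return { category, tier: null };
  const tier = TIER_ALIASES.get(parts[1] ?? "");
  return tier ? { category, tier } : null;
}

// Fail-open: if the constraint matches nothing, keep the unfiltered pool.
function filterFailOpen<T>(pool: T[], keep: (item: T) => boolean): T[] {
  const matched = pool.filter(keep);
  return matched.length > 0 ? matched : pool;
}

function selectCandidates(pool: Candidate[], combo: ParsedAutoCombo): Candidate[] {
  let result = pool;
  // Category narrows by capability.
  if (combo.category === "vision" || combo.category === "multimodal") {
    result = filterFailOpen(result, (m) => m.supportsVision);
  } else if (combo.category === "reasoning") {
    result = filterFailOpen(result, (m) => m.supportsReasoning);
  }
  // ":free" and ":pro" narrow by model tier; the other tiers only pick weights.
  if (combo.tier === "free") result = filterFailOpen(result, (m) => m.isFreeTier);
  if (combo.tier === "pro") result = filterFailOpen(result, (m) => !m.isFreeTier);
  return result;
}
```

With this shape, a request for auto/vision:free against a pool that has no vision-capable model still returns candidates, because the capability filter is skipped rather than emptying the pool.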

🐛 Fixed

  • fix(providers): qwen-web model discovery now lists the live catalog instead of nothing — the qwen-web cookie provider had no entry in PROVIDER_MODELS_CONFIG, so its model-discovery page returned an empty/stale local catalog (the OAuth fallback at the top of the route only fires for provider === "qwen", leaving qwen-web to fall through to the no-config branch). Added a qwen-web entry that fetches the public https://chat.qwen.ai/api/v2/models endpoint (no auth header) and parses the { data: { data: [{ id, name, owned_by }] } } shape (with a flatter { data: [] } fallback). This is Problem #3 of #3931 (diagnosed by @thezukiru); Problem #1 — validator bare-token false-positive — shipped earlier in #3958, and Problem #2 — empty stream from Qwen WAF bot-detection on the streaming endpoint — remains a separate upstream/stealth concern. (#3931 — thanks @thezukiru)
  • fix(providers): ZenMux model discovery now lists the live catalog (incl. the free models) instead of the stale 9-entry hardcoded list — adding a ZenMux key validated fine, but the connection then showed API unavailable — using local catalog and was missing the free models ZenMux advertises (z-ai/glm-5.2-free, moonshotai/kimi-k2.7-code-free). Root cause: zenmux carries a correct modelsUrl in the registry, but — like llm7/byteplus before #3976 — it was not classified by any live-fetch branch of the model-import route (not openai-compatible-*, not self-hosted, not in NAMED_OPENAI_STYLE_PROVIDERS), so the route never probed the upstream /models and fell through to the registry's hardcoded models[]. Added zenmux to NAMED_OPENAI_STYLE_PROVIDERS, so the route probes https://zenmux.ai/api/v1/models (the /chat/completions-stripped <baseUrl>/models candidate) and serves the live list, falling back to the local catalog only when the upstream fetch fails — import never breaks. (#4202 — thanks @mikmaneggahommie)
  • fix(providers): Vercel AI Gateway "import models" now loads the live catalog instead of nothing — adding a Vercel AI Gateway key worked, but clicking import on the models page loaded nothing usable (manually adding the same models worked). Same class as #4202 (zenmux) / #3976 (llm7/byteplus): vercel-ai-gateway carries a real baseUrl (https://ai-gateway.vercel.sh/v1/chat/completions, format openai) in the registry, but was not classified by any live-fetch branch of the model-import route (not openai-compatible-*, not self-hosted, not in NAMED_OPENAI_STYLE_PROVIDERS), so the route never probed the upstream /models and fell through to the registry's tiny 5-entry hardcoded models[]. Added vercel-ai-gateway to NAMED_OPENAI_STYLE_PROVIDERS, so the route probes https://ai-gateway.vercel.sh/v1/models (the /chat/completions-stripped <baseUrl>/models candidate) and serves the live list, falling back to the local catalog only when the upstream fetch fails — import never breaks. (#4249 — thanks @FerLuisxd)
  • fix(sse): clear error when the request queue drops a job (no more fake-upstream "This job timed out after Nms") — under concurrent load, requests that exceed the per-connection rate-limit queue budget (resilienceSettings.requestQueue.maxWaitMs) were dropped by Bottleneck with its raw This job timed out after <maxWaitMs> ms. message. That string is indistinguishable from an upstream gateway timeout, so the 502 body and call-log last_error looked like a provider outage across unrelated providers (TI:0|TO:0) — an operator spent ~3h misdiagnosing local queue saturation as upstream failures. withRateLimit now rewrites that specific Bottleneck error into a clear, OmniRoute-owned message that names the knob (requestQueue.maxWaitMs, tunable in Settings → Resilience), explicitly disclaims an upstream timeout, preserves the original as cause, and tags code: "RATE_LIMIT_QUEUE_TIMEOUT". Behavior is unchanged — the job is still dropped so combo falls back to the next target. (#4165 — thanks @KooshaPari)
  • fix(api): advertise the built-in auto/* combos in /v1/models — OmniRoute ships a zero-setup auto/* catalog (auto/best-coding, auto/pro-reasoning, …, 16 variants) that the dashboard advertises and that resolve on demand, but the /v1/models listing only emitted persisted DB combos + provider models. Clients that build their model picker from /v1/models (e.g. Hermes Agent) never saw any auto/* option. The catalog now emits every AUTO_TEMPLATE_VARIANTS id (as owned_by: "combo") at the top of the list, deduped against persisted combos. (Showing each auto/*'s dynamically-selected members is a separate enhancement.) (#4164 — thanks @MRDGH2821)
  • fix(sse): restore MCP / third-party tool names on the native Claude path (MCP dispatch broken in Claude Code) — since 3.8.27, every MCP tool call routed through OmniRoute to a native Claude OAuth provider failed client-side with Error: No such tool available: <PascalCaseName>: tool schemas arrived fine but the streamed tool_use.name reached Claude Code in its cloaked form (e.g. McpN8nMcpSearchWorkflows instead of the registered mcp__n8n-mcp__search_workflows). The native-Claude tool-name cloak stashes its per-request alias→original map as a non-enumerable _toolNameMap on the request body; the request-inspector capture added in 3.8.27 rebuilds the captured body from its serialized form (JSON.parse(JSON.stringify(...))), which drops non-enumerable properties, so finalBody._toolNameMap was empty and the response-side un-cloak silently fell back to the static built-in map — never restoring dynamic MCP / snake_case names. Built-in tools (Bash/Read/…) were unaffected (static map); cross-format paths were unaffected (they attach the map enumerably). The provider-request capture now re-attaches the per-request map (kept non-enumerable, so it still never re-serializes upstream) when the captured copy lost it, restoring MCP tool dispatch. (#4091 — thanks @pedrotecinf, @NakHalal)
  • fix(dashboard): Logs auto-refresh self-heals in embedded/proxied hosts that pin or mis-fire visibility — a follow-up to #4054: the Request Logger still froze auto-refresh on some hosts (reported on 3.8.28 Docker, works on 3.8.24). #4054 made the initial visibility fail-open, but the pause is event-driven — a host that fires a one-shot visibilitychange → hidden and then keeps reporting "hidden" (or recovers without firing the event again) left the cached visibility flag stuck false, so the interval ticked but never polled (only the manual Refresh button worked). The poll tick now also re-checks the live document.visibilityState, and a window focus listener re-arms polling (a focused window is a reliable signal the page is actively viewed). A genuinely backgrounded browser tab still pauses (it reports "hidden" and never receives focus), preserving the #3109 network-saturation optimization. (#4133 — thanks @tjengbudi)
  • fix(capabilities): unify vision model-id detection into one shared source — three code paths kept independent, drifting vision-model lists, so the same model id could get up to three different verdicts. Two concrete bugs: lite compression's gate was missing pixtral / llava / qwen-vl / glm-4v / kimi-vl / mistral-medium-3, so it stripped images for those real vision models and blinded them (same class as #4071 / #4012); and the /v1/models list was too broad, flagging text models (gemma, bare kimi like kimi-k2) as vision. All three (modelCapabilities routing fallback, /v1/models listing, lite image-strip gate) now delegate to a single conservative source src/shared/constants/visionModels.ts, which also restores glm-4v / gemini-3 coverage and keeps the #3328 MiniMax M3 carve-out. (#4072 — thanks @diego-anselmo)
  • fix(sse): surface mid-stream Gemini errors instead of returning a truncated 200 — when an upstream Gemini SSE stream emitted some partial content and then a JSON error object ({"error":{"code":503,"message":"…high demand…","status":"UNAVAILABLE"}}) instead of a candidates payload, OmniRoute silently dropped it: the gemini→openai translator's no-candidate branch only handled promptFeedback (content-filter) and returned null for anything else, so the stream simply ended and the client got HTTP 200 with a truncated body and finish_reason: "stop" — masking the failure and skipping combo fallback. geminiToOpenAIResponse now detects an error object (optionally wrapped in response), records it as state.upstreamError (preserving the real status — 503/UNAVAILABLE, or 429 for RESOURCE_EXHAUSTED), and lets stream.ts error the stream out through the existing onFailure/buildErrorBody/controller.error path — the same mechanism the openai-responses translator already uses. (#4177 — thanks @hartmark)
  • fix(capabilities): resolve models.dev-synced vision metadata for Mistral -latest aliases — root cause behind the #4071 heuristic: getResolvedModelCapabilities("mistral/pixtral-12b-latest").supportsVision resolved null (vision came only from the #4071 model-id heuristic, with attachment still null) even though models.dev exposes the model as multimodal. Confirmed against the live models.dev API: it catalogs Pixtral 12B under the short id pixtral-12b (with attachment: true, modalities.input: ["text","image"]), while requests use the Mistral API alias pixtral-12b-latest. The synced lookup tried the exact / raw / static-spec-canonical ids — all of which miss the short form — so it fell through to the heuristic. getSyncedCapabilityForResolved now adds a last-resort fallback that retries with a trailing -latest stripped, so synced metadata (attachment / image modalities) wins for these aliases; models whose -latest id is stored verbatim (e.g. pixtral-large-latest) keep resolving directly. Note: the models.dev sync is currently manual-only (Settings → models.dev) with no scheduled refresh, so a fresh instance still relies on the #4071 heuristic until that sync runs — a periodic-refresh cadence is left as a separate follow-up. (#4073 — thanks @diego-anselmo)
  • fix(sse): map Xiaomi MiMo reasoning control to its native thinking:{type} shape — MiMo (api.xiaomimimo.com) controls chain-of-thought only via top-level thinking:{type:"enabled"|"disabled"} and does not understand OpenAI's reasoning_effort/reasoning, while its request validator is strict (400 Param Incorrect). OmniRoute's OpenAI path carried reasoning intent as reasoning_effort, and the claude→openai translator can leave a Claude-shaped thinking:{type, budget_tokens} — so the client's on/off choice was silently dropped and budget_tokens/reasoning_effort rode along as extra params the validator can reject. New open-sse/services/mimoThinking.ts::normalizeMimoThinking (wired in chatCore for provider==="xiaomi-mimo") reduces any thinking object to just {type} (disabled stays; enabled/adaptive/other → enabled) and drops reasoning_effort/reasoning. It deliberately does not synthesize thinking from a bare reasoning_effort: mimo-v2-omni is non-thinking, so that could turn a silently-ignored param into a hard error. (#4224)
  • fix(capabilities): Xiaomi MiMo *-pro chat models are text-only (no vision) — only mimo-v2.5 and mimo-v2-omni accept images per Xiaomi's docs; mimo-v2.5-pro/mimo-v2-pro are text-only, but modelSpecs marked them vision-capable and models.dev mislabels them (hermes-agent#18884). Since resolveVisionCapability lets a synced attachment:true win first, an image request could be routed to a blind model (the #4071 failure mode). Corrected the specs and added a hard override in resolveVisionCapability (checked before the synced branch, anchored so mimo-v2.5-pro never matches the multimodal mimo-v2.5) that beats the wrong synced attachment. Also registered the missing native mimo-v2-pro chat model and the missing mimo-v2-tts speech model. (#4224)
  • fix(sse): Claude Opus 4.7+/Fable 5 use adaptive thinking only (no more manual-budget 400s) — Opus 4.7 and later (Opus 4.7/4.8, Fable 5) removed manual extended thinking: thinking.type:"enabled" or any thinking.budget_tokens now returns 400 ("Any request that tries to set a fixed thinking budget gets a 400" — Anthropic migration guide). Reasoning is adaptive-only, steered by output_config.effort. OmniRoute's OpenAI→Claude translator mapped reasoning_effort low/medium/high to a manual thinking:{type:"enabled", budget_tokens}, so those requests hard-400'd on the most-used provider (and a Claude-native passthrough client sending the legacy shape did too). A new adaptiveThinkingOnly model flag now drives two fixes: the translator maps reasoning_effort of every level to {type:"adaptive"} + output_config.effort (preserving the requested level, never a budget) for these models, and a normalizeClaudeAdaptiveThinking catch-all at the existing post-translation thinking-normalization chokepoint collapses any residual manual thinking (passthrough legacy shape, per-model defaults) to {type:"adaptive"}, keyed on the resolved upstream model so it covers every routing mode. Pre-4.7 models (Opus 4.6/4.5, Sonnet, Haiku) keep manual budgets unchanged. (#4230)
  • fix(providers): strip non-default temperature/top_p/top_k for Claude Opus 4.7+/Fable 5 (fixed sampling → no 400) — Opus 4.7 and later reject non-default temperature/top_p/top_k with a 400 (sampling is fixed; reasoning moved to output_config.effort). The translator forwarded client-supplied temperature/top_p unconditionally and the Claude registry models carried no unsupportedParams, so a plain OpenAI-format request with temperature: 0.7 to claude-opus-4-8 hard-400'd. Added unsupportedParams: ["temperature","top_p","top_k"] to the Opus 4.7+/Fable 5 ids in both the claude (dashed claude-opus-4-8) and anthropic (dotted claude-opus-4.7) registries, so they're stripped at the existing getUnsupportedParams dispatch chokepoint. Pre-4.7 Claude models still accept sampling params. (#4230)
  • fix(providers): conditionally strip temperature/top_p for GPT-5 reasoning on the openai Chat Completions path (no 400 when an effort is active) — GPT-5 reasoning models reject non-default temperature/top_p with a 400 whenever a reasoning effort is active, yet accept them again under reasoning_effort:"none" (the GPT-5.1+ default, i.e. non-reasoning mode). On the openai provider only o3 carried REASONING_UNSUPPORTED; gpt-5.5/gpt-5.4/gpt-5.4-mini/gpt-5.4-nano carried no sampling guard, so a temperature + active-effort request hard-400'd. A static unsupportedParams list can't express the none-mode carve-out (it would over-strip the legitimate case), so the new gpt5SamplingGuard drops temperature/top_p only when the resolved effort is active — wired at the existing getUnsupportedParams chokepoint and scoped to the openai Chat Completions surface (the codex Responses path is already covered by the CodexExecutor allowlist; other providers are untouched). (#4245)
  • fix(codex): stop silently dropping GPT-5 output verbosity (verbosity / text.verbosity) — the GPT-5 series added an output-verbosity control: verbosity (low/medium/high) on Chat Completions, nested as text.verbosity on the Responses API. The CodexExecutor gates translated requests through an allowlist that had no text entry, so for the codex provider the hint was dropped before reaching upstream (the openai Chat path already forwarded it). normalizeCodexVerbosity now folds whichever shape arrived into a single validated text:{verbosity} before the allowlist (which now permits text), and the OpenAI Chat↔Responses request translators map verbosity across formats so the hint survives a format crossing for non-codex Responses backends too. Invalid/absent verbosity collapses to no text (status quo). (#4245)
  • fix(sse): map reasoning_effort to DeepSeek V4's native {high, max} vocabulary — DeepSeek V4 only understands high/max reasoning levels, so other reasoning_effort values are mapped onto its native vocabulary instead of being rejected. (#4219)
  • fix(glm): default max_tokens and an extended timeout for GLM-5.2+ thinking — GLM-5.2+ thinking responses are slow and need headroom, so OmniRoute now sets a sensible default max_tokens and a longer timeout for them. (#4255 — thanks @dhaern)
  • fix(antigravity): default includeThoughts for modern Gemini models — modern Gemini models on the Antigravity path now default to including thoughts so reasoning isn't silently dropped. (#4180 — thanks @dhaern)
  • fix(provider-registry): add correct contextLength to theoldllm models — fills in accurate context-window sizes for theoldllm's models. (#4184 — thanks @herjarsa)
  • fix(models): expose combo model token limits — /v1/models now reports token limits for combo models. (#4189 — thanks @megamen32)
  • fix(combo): keep the passthrough quota fallback scoped — prevents the passthrough quota fallback from leaking across unrelated targets. (#4194 — thanks @Svetznaniy33)
  • fix(combo): opt proactive-fallback compression into the TV1 bail-out (no silent target drop) — proactive-fallback compression now participates in the TV1 bail-out so a target is never silently dropped. (#4228)
  • fix(compression): show engine preview output — the Compression Studio preview now renders the engine's output. (#4128 — thanks @megamen32)
  • fix(compression): harden engines against I/O failures and misconfig (F5.3) — compression engines degrade gracefully on I/O errors and bad configuration instead of throwing. (#4198)
  • fix(compression): harden RTK raw-output redaction + ReDoS guard for custom filters (F5.3) — broadens RTK raw-output redaction and adds a ReDoS guard for user-supplied filter patterns. (#4203)
  • fix(compression): bound mcpAccessibility maxTextChars on the live read path — the live read path now clamps maxTextChars so a small value can't make tools disappear. (#4206)
  • fix(dashboard): data tables paint an opaque surface so the grid doesn't bleed through — data tables now render on an opaque surface, fixing the grid wallpaper showing through. (#4233)
  • fix(dashboard): make the provider card hover visible (was ~1% opacity) — the provider-card hover state was effectively invisible; it now has a visible surface. (#4214)
  • fix(vscode): sanitize implicit editor context — redacts sensitive filenames/keywords from the implicit VS Code editor context before it's sent upstream. (#4124 — thanks @zhiru)
  • fix(build): raise the Node heap for the local next build to stop OOM/stall — bumps the build-time heap so the local production build no longer OOMs or stalls. (#4171)
  • fix(mitm): TPROXY OUTPUT-based recipe for local traffic (validated e2e on VPS) — switches the TPROXY rules to an OUTPUT-chain recipe so locally-originated traffic is captured; validated end-to-end on the VPS. (#4156)
  • fix(mitm): forward anti-loop — put the bypass-marked socket on the Agent (decrypt 4d) — places the bypass-marked socket on the HTTP Agent so OmniRoute's own forwarded traffic never re-enters the capture loop; VPS-validated. (#4229)
  • fix(free-tiers): retire dead-tier hasFree, round the headline to ~1.6B, regenerate the per-provider table — drops dead free tiers from the headline math and regenerates the per-provider free-tier table. (#4142)
  • fix(free-tiers): retire 4 re-verified-dead free tiers, flag iflytek/sparkdesk ToS, clarify monsterapi one-time — removes four confirmed-dead free tiers and annotates ToS/one-time caveats. (#4152)
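
The MiMo thinking normalization described in the list above (#4224) follows a small set of rules that can be sketched as below. This is an illustration only, not the contents of mimoThinking.ts: the function name normalizeMimoThinkingSketch and the loose RequestBody type are hypothetical, and the sketch assumes that a thinking value which is not an object is left untouched, a case the notes do not cover.

```typescript
// Hypothetical sketch of the rules stated in the notes; not the real helper.
type RequestBody = Record<string, unknown>;

function normalizeMimoThinkingSketch(body: RequestBody): RequestBody {
  // Work on a shallow copy so the caller's object is not mutated.
  const out: RequestBody = { ...body };

  // MiMo does not understand these OpenAI-style fields, so they are dropped.
  delete out["reasoning_effort"];
  delete out["reasoning"];

  const thinking = out["thinking"];
  if (thinking !== null && typeof thinking === "object") {
    // Reduce any thinking object to just { type }:
    // "disabled" stays; "enabled", "adaptive", or anything else becomes "enabled".
    const type =
      (thinking as { type?: unknown }).type === "disabled" ? "disabled" : "enabled";
    out["thinking"] = { type };
  }

  // No thinking is synthesized from a bare reasoning_effort: a non-thinking
  // model could reject a field it would otherwise never have seen.
  return out;
}
```

For example, a body carrying reasoning_effort together with a Claude-shaped thinking:{type:"adaptive", budget_tokens:2048} leaves this function with only thinking:{type:"enabled"}, while a body carrying just reasoning_effort leaves it with no thinking field at all.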

🧪 Tests

  • test(sse): guard the Antigravity _toolNameMap cloak map through the request-capture round-trip — follow-up to #4091: the generic capture fix in createPreparedRequestLogger().body() (#4153) re-attaches the non-enumerable _toolNameMap that the request-inspector drops when it rebuilds the upstream body via JSON.parse(JSON.stringify(...)), but the only regression test covered the native-Claude OAuth cloak (PascalCase aliases). The Antigravity cloak differs — cloakAntigravityToolPayload suffixes custom tools with _ide (workspace_read → workspace_read_ide), leaves native tools untouched, and returns the reverse map separately — so a refactor of providerRequestLogging.ts or the executor could silently re-break Antigravity tool dispatch without tripping the Claude test. Adds a dedicated regression test driving the real cloakAntigravityToolPayload through the capture round-trip and asserting the _ide reverse map survives, stays non-enumerable (never re-serializes upstream), and that all-native traffic produces no spurious map (verified failing with the #4153 re-attach removed). No production change. (#4181 — thanks @hertznsk)
  • test(chatcore): dedicated unit tests for 6 leaves + wire into stryker mutate (QG v2 Fase 9 T5 Fase 3) — adds focused unit tests for 6 chatCore leaf helpers and enrolls them in mutation testing. (#4218)
  • test(chatcore): telemetry / memory-skills / semantic-cache tests + wire 2 into stryker (QG v2 Fase 9 T5 Fase 3) — new tests for the telemetry, memory-skills and semantic-cache leaves, two of which are added to the mutation set. (#4222)
  • test+ci(chatcore): semanticCache HIT-path fixture (15/15 mutate) + 350min budget headroom — closes the semantic-cache HIT path to a full 15/15 mutation score and gives the nightly auth/accountFallback batches more budget headroom. (#4225)
  • test(compression): close F5.1 coverage gaps (replay reducer, live accumulator, StatusDot) — fills the remaining F5.1 compression coverage gaps. (#4192)
  • test(db,sse): de-flake db-backup + chatcore streaming timing assertions — stabilizes two timing-sensitive tests (fire-and-forget backup completion + a streaming race). (#4132)
  • test: align stale integration tests surfaced post-v3.8.28 on main — realigns integration tests that drifted after the v3.8.28 merge. (#4129)
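
The _toolNameMap round-trip guarded by the test(sse) entry above can be illustrated with a minimal sketch. It is not OmniRoute's code: cloakTools, attachToolNameMap, readToolNameMap, captureRoundTrip and the NATIVE_TOOLS set are hypothetical stand-ins, and only the behaviour described in the entry is assumed (custom tools get an _ide suffix, native tools are left alone, the reverse map travels as a non-enumerable property, and a JSON.parse(JSON.stringify(...)) rebuild drops it unless it is re-attached).

```typescript
// Hypothetical stand-ins for illustration only; names and shapes are assumptions.
type ToolNameMap = Record<string, string>;
interface Tool { name: string }
interface RequestBody { tools: Tool[] }

// Assumed set of "native" tool names that must not be renamed.
const NATIVE_TOOLS = new Set<string>(["Bash"]);

// Suffix custom tools with _ide and return the reverse map separately.
function cloakTools(body: RequestBody): { body: RequestBody; reverseMap: ToolNameMap } {
  const reverseMap: ToolNameMap = {};
  const tools = body.tools.map((tool) => {
    if (NATIVE_TOOLS.has(tool.name)) return tool;
    const cloaked = `${tool.name}_ide`;
    reverseMap[cloaked] = tool.name;
    return { ...tool, name: cloaked };
  });
  return { body: { ...body, tools }, reverseMap };
}

// Attach the map as a non-enumerable property so JSON.stringify never emits it.
function attachToolNameMap<T extends object>(target: T, map: ToolNameMap): T {
  if (Object.keys(map).length === 0) return target; // all-native: no spurious map
  Object.defineProperty(target, "_toolNameMap", {
    value: map,
    enumerable: false,
    configurable: true,
    writable: true,
  });
  return target;
}

function readToolNameMap(obj: object): ToolNameMap | undefined {
  return (obj as { _toolNameMap?: ToolNameMap })._toolNameMap;
}

// A JSON round-trip copies only enumerable properties, so the map is lost
// on the rebuilt object and has to be re-attached from the original.
function captureRoundTrip<T extends object>(original: T): T {
  const rebuilt = JSON.parse(JSON.stringify(original)) as T;
  const map = readToolNameMap(original);
  return map ? attachToolNameMap(rebuilt, map) : rebuilt;
}

const cloaked = cloakTools({ tools: [{ name: "workspace_read" }, { name: "Bash" }] });
const upstream = attachToolNameMap(cloaked.body, cloaked.reverseMap);
const captured = captureRoundTrip(upstream);
```

Here readToolNameMap(captured) still resolves workspace_read_ide back to workspace_read, while JSON.stringify(captured) contains no _toolNameMap key, which is the pair of properties the regression test is described as asserting.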

📝 Maintenance

  • refactor(sse): split chatCore.ts pure helpers into chatCore/ modules (−561 LOC) — extracts pure helpers out of the chatCore god-file into dedicated modules (Onda 3). (#4159)
  • refactor(chatcore): extract passthrough/header/telemetry helpers (QG v2 Fase 9 T5 C2-C3-C5) — further chatCore decomposition. (#4188)
  • refactor(chatcore): extract combo/proxy context cache + semaphore helpers (QG v2 Fase 9 T5 C6-C7) — continues the chatCore split. (#4193)
  • refactor(combo): god-file split pilot — types + validateQuality + predicates (QG v2 Fase 9 T5 D1-D3) — first slice of the combo.ts decomposition. (#4162)
  • refactor(combo): god-file split part 2 — shadow + sorters + structure (QG v2 Fase 9 T5 D4-D6) — continues the combo.ts split. (#4175)
  • refactor(combo): god-file split part 3 — auto strategy (QG v2 Fase 9 T5 D8) — extracts the auto strategy from combo.ts. (#4186)
  • refactor(combo): extract round-robin sticky state to combo/rrState.ts (D7a) — moves round-robin sticky state into its own module. (#4196)
  • refactor(combo): extract the reset-aware quota block to combo/quotaStrategies.ts (D7b) — moves the reset-aware quota strategies into their own module. (#4204)
  • refactor(compression): remove vestigial SLM seam + dead deprecated alias — drops dead compression code. (#4253)
  • chore(compression): remove vestigial reconstructCcr/SessionDedup round-trip helpers — removes unused round-trip helpers. (#4226)
  • chore(compression): remove dead exports + fix stale llmlingua docs — prunes dead exports and corrects stale LLMLingua docs. (#4223)
  • chore(build): build + ship the TPROXY native addon in the standalone (prebuilds 4e) — bundles the native TPROXY addon prebuilds into the standalone build. (#4236)
  • chore(ci): add quota + 6 covered chatCore leaves to stryker mutate (QG v2 Fase 9 T5 Fase 3 follow-up) — enrolls more covered leaves into mutation testing. (#4209)
  • chore(ci): re-add 8 combo split leaves to stryker mutate + expand nightly batch-matrix 3→5 (QG v2 Fase 9 T5 Fase 3) — restores mutation coverage for the split combo leaves and widens the nightly matrix. (#4205)
  • chore(quality): close v3.8.28 cycle gate drift (re-baseline + nightly-mutation scope) — reconciles quality-gate baselines after the v3.8.28 cycle. (#4135)
  • ci(mutation): split nightly into 3 parallel batches to fit the 180min budget (QG v2 Fase 9 T0) — parallelizes the nightly mutation run. (#4150)
  • ci(mutation): restore cold-seed timeout headroom (a/b lost in #4225 squash) + extend to c/d/g/h — restores and extends per-batch cold-seed timeouts. (#4258)
  • ci(docs): harden the fabricated-docs checker + enforce --strict (QG v2 Fase 9 T9) — tightens the anti-hallucination docs checker. (#4149)
  • ci: derive the oasdiff base-ref from the package version + flag the mutation-toolchain regression — fixes the OpenAPI-diff base-ref and surfaces a mutation-toolchain regression. (#4134)
  • docs(ci): correct the mutation-gate note (no regression — stryker -c is --concurrency); record Task 12 GO — corrects a misread of the stryker flag and records the spike GO. (#4138)
  • docs(api): document the /api/v1/ws chat WebSocket endpoint in openapi.yaml — adds the WebSocket chat endpoint to the OpenAPI spec. (#4215)
  • docs(readme): expand Acknowledgments into a themed, star-counted credits hall — reworks the README acknowledgments section. (#4195)
  • style(dashboard): shrink the identity grid cell 46px → 32px (~30% smaller) — tightens the identity grid density. (#4143)

🔧 Dependencies

  • deps: bump the production group with 5 updates — routine production-dependency bumps. (#4121)
  • chore(deps): bump github/codeql-action from 3 to 4 — CI action update. (#4120)
  • chore(deps): bump actions/setup-python from 5 to 6 — CI action update. (#4119)

What's Changed

Full Changelog: v3.8.28...v3.8.29
