[3.8.29] — 2026-06-19
✨ New Features
- feat(cloud-agent): Cursor Cloud Agent via the official API-key REST API (no IDE-OAuth ban risk) — adds a `cursor-cloud` cloud agent that drives Cursor's Background / Cloud Agents through the official REST API (api.cursor.com) authenticated with a user or service-account API key — the safer, first-party alternative to re-using the Cursor IDE's OAuth session (the existing `cursor` provider, which carries a ban-risk warning). Implemented as a plain REST adapter mirroring the Devin/Jules agents (createTask/getStatus/sendMessage/listSources), so it does not pull in the `@cursor/sdk` package and its per-platform native binaries (Cursor's SDK is itself a thin wrapper over this REST API). Cursor's UPPERCASE status enums (CREATING/RUNNING/FINISHED/ERROR) are mapped explicitly to the shared `CloudAgentStatus`, and `baseUrl` is overridable per-credential. Credentials are stored encrypted via the existing `cloud_agent_credentials` table; no schema change. (#4227 — thanks @MRDGH2821)
- feat(routing): OpenRouter-style `auto/<category>:<tier>` combos — auto-routing now understands suffixed combos that separate the category (what kind of route) from the tier (how to optimize): `auto/coding:fast`, `auto/coding:cheap` (alias `:floor`), `auto/coding:free`, `auto/coding:pro`, `auto/coding:reliable`, plus the new category roots `auto/reasoning`, `auto/vision`, `auto/multimodal`. The tier picks the scoring weights — `:fast` → ship-fast, `:cheap`/`:floor` → cost-saver, `:reliable` → a new reliability-first pack (circuit-breaker health + latency stability) — while `:free`/`:pro` filter the candidate pool by model tier (`classifyTier`: free-tier vs. premium models). The category filters the pool by capability (`vision`/`multimodal` → vision-capable models, `reasoning` → reasoning/thinking models). Any valid `auto/<category>:<tier>` resolves on demand; a curated set is advertised in `/v1/models` and the dashboard. Filtering is fail-open — if a constraint matches no connected models the full pool is used so routing never breaks. All composition lives in the new `open-sse/services/autoCombo/suffixComposition.ts`; the core combo scorer (`combo.ts`) is untouched. Second slice of #4235 (premium account-tier weighting is a later follow-up). (#4235 — thanks @MRDGH2821)
- feat(routing): advertise the `auto/cheap`, `auto/offline`, `auto/smart` combos (catalog ↔ README sync) — the README lists `auto/cheap` (cheapest-per-token first), `auto/offline` (most quota/rate-limit headroom first) and `auto/smart` (quality-first + 10% exploration), and they already resolved at request time via `parseAutoPrefix` → `createVirtualAutoCombo`. But they were missing from `AUTO_TEMPLATE_VARIANTS`, so `/v1/models` and the dashboard combos list (which iterate that catalog) never showed them — the catalog drifted from the docs (visible in the issue's screenshots). Added the three entries so they're advertised everywhere alongside the other built-in `auto/*` combos. First slice of #4235 (OpenRouter-style `auto/<category>:<tier>` suffixes + new categories follow). (#4235 — thanks @MRDGH2821)
- feat(cli): remote mode — drive a remote OmniRoute with scoped access tokens — a new CLI mode that connects to a remote OmniRoute instance using scoped access tokens, so a local CLI can drive a server you don't own a session on. (#4256)
- feat(api): cost-telemetry parity — `X-OmniRoute-*` headers on every endpoint + a non-token cost engine — every endpoint now emits the `X-OmniRoute-*` cost/usage headers, backed by a cost engine that also prices non-token (media/request-based) usage. (#4247)
- feat(api): register Kimi K2.7 Code models (`kimi-k2.7-code` + `-highspeed`) — the new Moonshot thinking-only coding models are registered (fixed sampling; `temperature`/`top_p` marked unsupported). (#4183)
- feat(catalog): add `kimi-k2.7-code` to the kmca catalog + qwen-web models discovery — surfaces the new Kimi coding model in the kmca catalog and wires qwen-web into model discovery. (#4185)
- feat(api): expand the `zai` provider catalog with GLM-5.2 / GLM-4.7 — adds the real GLM-5.2, GLM-4.7 and GLM-4.7-flash model ids to the Anthropic-direct `zai` provider. (#4201)
- feat(api): no-thinking gateway model IDs (FCC port, Fase 8.1) — gateway model id variants that force thinking off, ported from free-claude-code. (#4145)
- feat(sse): mid-stream continuation for truncated streams (FCC port, Task 4.4) — when a stream is cut short, OmniRoute can transparently continue it, ported from free-claude-code. (#4147)
- feat(sse): per-provider sliding-window rate-limit fallback (FCC port, Fase 8.2) — a per-provider sliding-window rate limiter as a fallback path, ported from free-claude-code. (#4146)
- feat(sse): transparent stream recovery (FCC port, Fase 4, opt-in) — opt-in transparent recovery of interrupted upstream streams, ported from free-claude-code. (#4131)
- feat(search): free DuckDuckGo web search as a last-resort provider (FCC port, Fase 6) — adds a no-key DuckDuckGo web-search provider used as a last resort, ported from free-claude-code. (#4136)
- feat(logging): credential-redaction safety net in the pino logger (FCC port, Fase 8.3) — a logger-level redaction pass that scrubs credentials from log output, ported from free-claude-code. (#4140)
- feat(memory): opt-in Qdrant scalar int8 quantization (F4.4 Q1) — opt-in int8 scalar quantization for Qdrant-backed memory vectors. (#4187)
- feat(memory): opt-in sqlite-vec int8 vector quantization (F4.4 Q2) — opt-in int8 quantization for the sqlite-vec memory backend. (#4190)
- feat(deploy): keep optional deps on `update` (`--include=optional`) — the in-place update path now passes `--include=optional` so native/optional packages aren't dropped on update. (#4260)
- feat(dashboard): unified visual identity — grid, primitives, tables, form controls (design phases 1-4) — a sweeping design pass aligning the dashboard with the site: grid wallpaper, button/card/input primitives, theme-aware tables and form controls. (#4122)
- feat(dashboard): grid wallpaper on all standalone screens + fluid 4K layout — the identity grid now backs every standalone screen and the layout scales fluidly to 4K. (#4158)
- feat(dashboard): make the identity grid visible + unify the focus ring on accent — design follow-up making the grid actually visible and standardizing focus rings on the accent color. (#4141)
- feat(dashboard): import only free models + free-model list controls — the model-import page can import just the free models, with controls to manage the free-model list. (#4176 — thanks @felipesartori)
- feat(dashboard): compact grid layout for no-auth provider accounts — a denser grid layout for provider accounts when auth is disabled. (#4137 — thanks @felipesartori)
- feat(dashboard): derive media `serviceKinds` from the registries (surface MiniMax + the media catalog) — `/media-providers/[kind]` now derives its service kinds from the registries instead of a hand-maintained list, surfacing ~48 previously-invisible media providers (incl. MiniMax TTS/video/music). (#4212)
- feat(traffic-inspector): live (in-flight) request filter (Gap 5) — the Traffic Inspector can filter to in-flight requests as they happen. (#4130)
- feat(agent-bridge): maintenance & diagnostics dashboard controls — adds maintenance and diagnostics controls for the Agent Bridge to the dashboard. (#4127)
- feat(mitm): TPROXY IP_TRANSPARENT native addon + conditional loader (Epic A) — a native `IP_TRANSPARENT` addon with a conditional loader, the foundation for TPROXY capture. (#4148)
- feat(mitm): Fase 3 Epic A spike — TPROXY command builder — a transactional builder for the iptables/TPROXY command set. (#4139)
- feat(mitm): TPROXY setup layer — transactional apply/revert (Epic A) — applies and reverts the TPROXY routing setup transactionally. (#4144)
- feat(mitm): add `setSocketMark` to the TPROXY addon (anti-loop primitive) — exposes `setSocketMark` so OmniRoute's own egress can be marked and skipped (anti-loop). (#4160)
- feat(mitm): TPROXY capture-mode listener + `connectMarked` (Epic A) — the capture-mode listener plus a marked-connect primitive. (#4169)
- feat(mitm): dynamic per-SNI cert authority for TPROXY (TLS decrypt 1/N) — a per-SNI on-the-fly certificate authority, the first slice of TLS decrypt. (#4173)
- feat(mitm): TLS-terminating capture for TPROXY (decrypt 2/N) — terminates TLS to capture decrypted traffic. (#4179)
- feat(mitm): wire the TLS decrypt engine into TPROXY capture mode (decrypt 3/N) — connects the decrypt engine to the capture-mode pipeline. (#4200)
- feat(mitm): TPROXY capture-mode manager (decrypt 4a/N) — a manager coordinating the TPROXY capture lifecycle. (#4208)
- feat(mitm): local-only route + trust-store installer for TPROXY decrypt (4b/N) — a loopback-only management route plus a CA trust-store installer for the decrypt CA. (#4211)
- feat(dashboard): TPROXY decrypt capture toggle in the Traffic Inspector (4c/N) — a UI toggle to enable/disable decrypted capture. (#4216)
- feat(compression): replace the headroom tabular encoder with a vendored GCF — swaps the tabular encoder for a vendored GCF implementation. (#4167 — thanks @blackwell-systems)
- feat(compression): live per-engine streaming via `compression.step` (F3.3) — streams per-engine compression progress through a `compression.step` event. (#4217)
- feat(compression): show an engine node for single-engine runs in the studio — the Compression Studio now renders an engine node even when only one engine runs. (#4210)
- feat(compression): expose the WaterfallInspector via a Canvas/Waterfall toggle — adds a Canvas/Waterfall view toggle that surfaces the WaterfallInspector. (#4238)
- feat(compression): make `mcpAccessibility` config reachable via a settings sub-route — exposes the `mcpAccessibility` config under a dedicated settings sub-route. (#4237)
- feat(compression): runnable A/B benchmark CLI (F2.4) — a CLI to run A/B compression benchmarks. (#4220)
- feat(compression): add a transcript loader to the replay harness — the replay harness can now load real transcripts. (#4246)
- feat(compression): wire MCP tool-cardinality reduction (F4.3, opt-in) — opt-in reduction of MCP tool-set cardinality to shrink prompts. (#4221)
- feat(compression): wire RTK comment-stripping config + honor `preserveDocstrings` — RTK comment-stripping is now configurable and honors a `preserveDocstrings` flag. (#4242)
- feat(compression): honor the per-filter RTK `deduplicate` flag — RTK filters now respect a per-filter `deduplicate` flag. (#4231)
- feat(compression): honor the registry `enabled` flag in the stacked loop — the stacked compression loop now skips engines disabled in the registry. (#4244)
- feat(compression): persist RTK grouping config (unlock R5 `enableGrouping`) — persists the RTK grouping configuration, unlocking the R5 `enableGrouping` rule. (#4207)
- feat(compression): wire ultra's `modelPath`/`slmFallbackToAggressive` to the LLMLingua SLM tier — connects the ultra tier's small-language-model knobs to the LLMLingua SLM path. (#4257)
- feat(quality): Onda 2 mutation-gate tooling — radiography classifier (T1) + `mutationScore` ratchet (T3) — new mutation-testing tooling: a survivor-radiography classifier and a `mutationScore` ratchet. (#4234)
- feat(ci): wire the F2.4 compression budget-gate ratchet — adds a CI ratchet that gates compression budget regressions. (#4232)
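
The `auto/<category>:<tier>` entry above describes three ideas: splitting an id into category and tier, treating `:floor` as an alias of `:cheap`, and fail-open filtering of the candidate pool. The following is a minimal sketch of those ideas, not the code in `suffixComposition.ts`. Every type and function name in it (`ModelInfo`, `parseAutoCombo`, `filterFailOpen`, `candidatePool`) is hypothetical, and it assumes each constraint falls back independently to the pool it received when nothing matches.

```typescript
// Illustrative sketch only: all names are hypothetical, not OmniRoute's real API.
type Tier = "fast" | "cheap" | "free" | "pro" | "reliable";

interface ModelInfo {
  id: string;
  vision: boolean;
  reasoning: boolean;
  free: boolean;
}

interface ParsedCombo {
  category: string;
  tier?: Tier;
}

// ":floor" is described in the changelog as an alias of ":cheap".
const TIERS = new Map<string, Tier>([
  ["fast", "fast"],
  ["cheap", "cheap"],
  ["floor", "cheap"],
  ["free", "free"],
  ["pro", "pro"],
  ["reliable", "reliable"],
]);

// Parse "auto/<category>" or "auto/<category>:<tier>"; return null otherwise.
function parseAutoCombo(id: string): ParsedCombo | null {
  const match = /^auto\/([a-z-]+)(?::([a-z]+))?$/.exec(id);
  if (match === null) return null;
  const category = match[1] as string;
  const rawTier: string | undefined = match[2];
  if (rawTier === undefined) return { category };
  const tier = TIERS.get(rawTier);
  return tier === undefined ? null : { category, tier };
}

// Fail-open: if the constraint matches nothing, keep the incoming pool.
function filterFailOpen<T>(pool: T[], keep: (item: T) => boolean): T[] {
  const filtered = pool.filter(keep);
  return filtered.length > 0 ? filtered : pool;
}

// Category narrows by capability; ":free" / ":pro" narrow by model tier.
function candidatePool(parsed: ParsedCombo, pool: ModelInfo[]): ModelInfo[] {
  let result = pool;
  if (parsed.category === "vision" || parsed.category === "multimodal") {
    result = filterFailOpen(result, (m) => m.vision);
  } else if (parsed.category === "reasoning") {
    result = filterFailOpen(result, (m) => m.reasoning);
  }
  if (parsed.tier === "free") result = filterFailOpen(result, (m) => m.free);
  if (parsed.tier === "pro") result = filterFailOpen(result, (m) => !m.free);
  return result;
}
```

In this sketch the remaining tiers (`fast`, `cheap`, `reliable`) leave the pool untouched, since the entry says they select scoring weights rather than filter candidates.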
🐛 Fixed
- fix(providers): qwen-web model discovery now lists the live catalog instead of nothing — the `qwen-web` cookie provider had no entry in `PROVIDER_MODELS_CONFIG`, so its model-discovery page returned an empty/stale local catalog (the OAuth fallback at the top of the route only fires for `provider === "qwen"`, leaving `qwen-web` to fall through to the no-config branch). Added a `qwen-web` entry that fetches the public `https://chat.qwen.ai/api/v2/models` endpoint (no auth header) and parses the `{ data: { data: [{ id, name, owned_by }] } }` shape (with a flatter `{ data: [] }` fallback). This is Problem #3 of #3931 (diagnosed by @thezukiru); Problem #1 — validator bare-token false-positive — shipped earlier in #3958, and Problem #2 — empty stream from Qwen WAF bot-detection on the streaming endpoint — remains a separate upstream/stealth concern. (#3931 — thanks @thezukiru)
- fix(providers): ZenMux model discovery now lists the live catalog (incl. the free models) instead of the stale 9-entry hardcoded list — adding a ZenMux key validated fine, but the connection then showed `API unavailable — using local catalog` and was missing the free models ZenMux advertises (`z-ai/glm-5.2-free`, `moonshotai/kimi-k2.7-code-free`). Root cause: `zenmux` carries a correct `modelsUrl` in the registry, but — like `llm7`/`byteplus` before #3976 — it was not classified by any live-fetch branch of the model-import route (not `openai-compatible-*`, not self-hosted, not in `NAMED_OPENAI_STYLE_PROVIDERS`), so the route never probed the upstream `/models` and fell through to the registry's hardcoded `models[]`. Added `zenmux` to `NAMED_OPENAI_STYLE_PROVIDERS`, so the route probes `https://zenmux.ai/api/v1/models` (the `/chat/completions`-stripped `<baseUrl>/models` candidate) and serves the live list, falling back to the local catalog only when the upstream fetch fails — import never breaks. (#4202 — thanks @mikmaneggahommie)
- fix(providers): Vercel AI Gateway "import models" now loads the live catalog instead of nothing — adding a Vercel AI Gateway key worked, but clicking import on the models page loaded nothing usable (manually adding the same models worked). Same class as #4202 (zenmux) / #3976 (llm7/byteplus): `vercel-ai-gateway` carries a real `baseUrl` (`https://ai-gateway.vercel.sh/v1/chat/completions`, format `openai`) in the registry, but was not classified by any live-fetch branch of the model-import route (not `openai-compatible-*`, not self-hosted, not in `NAMED_OPENAI_STYLE_PROVIDERS`), so the route never probed the upstream `/models` and fell through to the registry's tiny 5-entry hardcoded `models[]`. Added `vercel-ai-gateway` to `NAMED_OPENAI_STYLE_PROVIDERS`, so the route probes `https://ai-gateway.vercel.sh/v1/models` (the `/chat/completions`-stripped `<baseUrl>/models` candidate) and serves the live list, falling back to the local catalog only when the upstream fetch fails — import never breaks. (#4249 — thanks @FerLuisxd)
- fix(sse): clear error when the request queue drops a job (no more fake-upstream "This job timed out after Nms") — under concurrent load, requests that exceed the per-connection rate-limit queue budget (`resilienceSettings.requestQueue.maxWaitMs`) were dropped by Bottleneck with its raw `This job timed out after <maxWaitMs> ms.` message. That string is indistinguishable from an upstream gateway timeout, so the 502 body and call-log `last_error` looked like a provider outage across unrelated providers (`TI:0|TO:0`) — an operator spent ~3h misdiagnosing local queue saturation as upstream failures. `withRateLimit` now rewrites that specific Bottleneck error into a clear, OmniRoute-owned message that names the knob (`requestQueue.maxWaitMs`, tunable in Settings → Resilience), explicitly disclaims an upstream timeout, preserves the original as `cause`, and tags `code: "RATE_LIMIT_QUEUE_TIMEOUT"`. Behavior is unchanged — the job is still dropped so combo falls back to the next target. (#4165 — thanks @KooshaPari)
- fix(api): advertise the built-in `auto/*` combos in `/v1/models` — OmniRoute ships a zero-setup `auto/*` catalog (`auto/best-coding`, `auto/pro-reasoning`, …, 16 variants) that the dashboard advertises and that resolve on demand, but the `/v1/models` listing only emitted persisted DB combos + provider models. Clients that build their model picker from `/v1/models` (e.g. Hermes Agent) never saw any `auto/*` option. The catalog now emits every `AUTO_TEMPLATE_VARIANTS` id (as `owned_by: "combo"`) at the top of the list, deduped against persisted combos. (Showing each `auto/*`'s dynamically-selected members is a separate enhancement.) (#4164 — thanks @MRDGH2821)
- fix(sse): restore MCP / third-party tool names on the native Claude path (MCP dispatch broken in Claude Code) — since 3.8.27, every MCP tool call routed through OmniRoute to a native Claude OAuth provider failed client-side with `Error: No such tool available: <PascalCaseName>`: tool schemas arrived fine but the streamed `tool_use.name` reached Claude Code in its cloaked form (e.g. `McpN8nMcpSearchWorkflows` instead of the registered `mcp__n8n-mcp__search_workflows`). The native-Claude tool-name cloak stashes its per-request alias→original map as a non-enumerable `_toolNameMap` on the request body; the request-inspector capture added in 3.8.27 rebuilds the captured body from its serialized form (`JSON.parse(JSON.stringify(...))`), which drops non-enumerable properties, so `finalBody._toolNameMap` was empty and the response-side un-cloak silently fell back to the static built-in map — never restoring dynamic MCP / snake_case names. Built-in tools (Bash/Read/…) were unaffected (static map); cross-format paths were unaffected (they attach the map enumerably). The provider-request capture now re-attaches the per-request map (kept non-enumerable, so it still never re-serializes upstream) when the captured copy lost it, restoring MCP tool dispatch. (#4091 — thanks @pedrotecinf, @NakHalal)
- fix(dashboard): Logs auto-refresh self-heals in embedded/proxied hosts that pin or mis-fire visibility — a follow-up to #4054: the Request Logger still froze auto-refresh on some hosts (reported on 3.8.28 Docker, works on 3.8.24). #4054 made the initial visibility fail-open, but the pause is event-driven — a host that fires a one-shot `visibilitychange` → hidden and then keeps reporting `"hidden"` (or recovers without firing the event again) left the cached visibility flag stuck `false`, so the interval ticked but never polled (only the manual Refresh button worked). The poll tick now also re-checks the live `document.visibilityState`, and a window `focus` listener re-arms polling (a focused window is a reliable signal the page is actively viewed). A genuinely backgrounded browser tab still pauses (it reports `"hidden"` and never receives focus), preserving the #3109 network-saturation optimization. (#4133 — thanks @tjengbudi)
- fix(capabilities): unify vision model-id detection into one shared source — three code paths kept independent, drifting vision-model lists, so the same model id could get up to three different verdicts. Two concrete bugs: lite compression's gate was missing pixtral / llava / qwen-vl / glm-4v / kimi-vl / mistral-medium-3, so it stripped images for those real vision models and blinded them (same class as #4071 / #4012); and the `/v1/models` list was too broad, flagging text models (gemma, bare `kimi` like `kimi-k2`) as vision. All three (`modelCapabilities` routing fallback, `/v1/models` listing, lite image-strip gate) now delegate to a single conservative source `src/shared/constants/visionModels.ts`, which also restores `glm-4v`/`gemini-3` coverage and keeps the #3328 MiniMax M3 carve-out. (#4072 — thanks @diego-anselmo)
- fix(sse): surface mid-stream Gemini errors instead of returning a truncated 200 — when an upstream Gemini SSE stream emitted some partial content and then a JSON error object (`{"error":{"code":503,"message":"…high demand…","status":"UNAVAILABLE"}}`) instead of a `candidates` payload, OmniRoute silently dropped it: the gemini→openai translator's no-candidate branch only handled `promptFeedback` (content-filter) and returned `null` for anything else, so the stream simply ended and the client got HTTP 200 with a truncated body and `finish_reason: "stop"` — masking the failure and skipping combo fallback. `geminiToOpenAIResponse` now detects an `error` object (optionally wrapped in `response`), records it as `state.upstreamError` (preserving the real status — 503/UNAVAILABLE, or 429 for `RESOURCE_EXHAUSTED`), and lets `stream.ts` error the stream out through the existing `onFailure`/`buildErrorBody`/`controller.error` path — the same mechanism the openai-responses translator already uses. (#4177 — thanks @hartmark)
- fix(capabilities): resolve models.dev-synced vision metadata for Mistral `-latest` aliases — root cause behind the #4071 heuristic: `getResolvedModelCapabilities("mistral/pixtral-12b-latest").supportsVision` resolved `null` (vision came only from the #4071 model-id heuristic, with `attachment` still `null`) even though models.dev exposes the model as multimodal. Confirmed against the live models.dev API: it catalogs Pixtral 12B under the short id `pixtral-12b` (with `attachment: true`, `modalities.input: ["text","image"]`), while requests use the Mistral API alias `pixtral-12b-latest`. The synced lookup tried the exact / raw / static-spec-canonical ids — all of which miss the short form — so it fell through to the heuristic. `getSyncedCapabilityForResolved` now adds a last-resort fallback that retries with a trailing `-latest` stripped, so synced metadata (`attachment` / image modalities) wins for these aliases; models whose `-latest` id is stored verbatim (e.g. `pixtral-large-latest`) keep resolving directly. Note: the models.dev sync is currently manual-only (Settings → models.dev) with no scheduled refresh, so a fresh instance still relies on the #4071 heuristic until that sync runs — a periodic-refresh cadence is left as a separate follow-up. (#4073 — thanks @diego-anselmo)
- fix(sse): map Xiaomi MiMo reasoning control to its native `thinking:{type}` shape — MiMo (api.xiaomimimo.com) controls chain-of-thought only via top-level `thinking:{type:"enabled"|"disabled"}` and does not understand OpenAI's `reasoning_effort`/`reasoning`, while its request validator is strict (400 Param Incorrect). OmniRoute's OpenAI path carried reasoning intent as `reasoning_effort`, and the claude→openai translator can leave a Claude-shaped `thinking:{type, budget_tokens}` — so the client's on/off choice was silently dropped and `budget_tokens`/`reasoning_effort` rode along as extra params the validator can reject. New `open-sse/services/mimoThinking.ts::normalizeMimoThinking` (wired in `chatCore` for `provider==="xiaomi-mimo"`) reduces any thinking object to just `{type}` (`disabled` stays; `enabled`/`adaptive`/other → `enabled`) and drops `reasoning_effort`/`reasoning`. It deliberately does not synthesize thinking from a bare `reasoning_effort` — `mimo-v2-omni` is non-thinking, so that could turn a silently-ignored param into a hard error. (#4224)
- fix(capabilities): Xiaomi MiMo `*-pro` chat models are text-only (no vision) — only `mimo-v2.5` and `mimo-v2-omni` accept images per Xiaomi's docs; `mimo-v2.5-pro`/`mimo-v2-pro` are text-only, but `modelSpecs` marked them vision-capable and models.dev mislabels them (hermes-agent#18884). Since `resolveVisionCapability` lets a synced `attachment:true` win first, an image request could be routed to a blind model (the #4071 failure mode). Corrected the specs and added a hard override in `resolveVisionCapability` (checked before the synced branch, anchored so `mimo-v2.5-pro` never matches the multimodal `mimo-v2.5`) that beats the wrong synced attachment. Also registered the missing native `mimo-v2-pro` chat model and the missing `mimo-v2-tts` speech model. (#4224)
- fix(sse): Claude Opus 4.7+/Fable 5 use adaptive thinking only (no more manual-budget 400s) — Opus 4.7 and later (Opus 4.7/4.8, Fable 5) removed manual extended thinking: `thinking.type:"enabled"` or any `thinking.budget_tokens` now returns `400` ("Any request that tries to set a fixed thinking budget gets a 400" — Anthropic migration guide). Reasoning is adaptive-only, steered by `output_config.effort`. OmniRoute's OpenAI→Claude translator mapped `reasoning_effort` low/medium/high to a manual `thinking:{type:"enabled", budget_tokens}`, so those requests hard-400'd on the most-used provider (and a Claude-native passthrough client sending the legacy shape did too). A new `adaptiveThinkingOnly` model flag now drives two fixes: the translator maps `reasoning_effort` of every level to `{type:"adaptive"}` + `output_config.effort` (preserving the requested level, never a budget) for these models, and a `normalizeClaudeAdaptiveThinking` catch-all at the existing post-translation thinking-normalization chokepoint collapses any residual manual thinking (passthrough legacy shape, per-model defaults) to `{type:"adaptive"}`, keyed on the resolved upstream model so it covers every routing mode. Pre-4.7 models (Opus 4.6/4.5, Sonnet, Haiku) keep manual budgets unchanged. (#4230)
- fix(providers): strip non-default temperature/top_p/top_k for Claude Opus 4.7+/Fable 5 (fixed sampling → no 400) — Opus 4.7 and later reject non-default `temperature`/`top_p`/`top_k` with a `400` (sampling is fixed; reasoning moved to `output_config.effort`). The translator forwarded client-supplied `temperature`/`top_p` unconditionally and the Claude registry models carried no `unsupportedParams`, so a plain OpenAI-format request with `temperature: 0.7` to `claude-opus-4-8` hard-400'd. Added `unsupportedParams: ["temperature","top_p","top_k"]` to the Opus 4.7+/Fable 5 ids in both the `claude` (dashed `claude-opus-4-8`) and `anthropic` (dotted `claude-opus-4.7`) registries, so they're stripped at the existing `getUnsupportedParams` dispatch chokepoint. Pre-4.7 Claude models still accept sampling params. (#4230)
- fix(providers): conditionally strip temperature/top_p for GPT-5 reasoning on the `openai` Chat Completions path (no 400 when an effort is active) — GPT-5 reasoning models reject non-default `temperature`/`top_p` with a `400` whenever a reasoning effort is active, yet accept them again under `reasoning_effort:"none"` (the GPT-5.1+ default, i.e. non-reasoning mode). On the `openai` provider only `o3` carried `REASONING_UNSUPPORTED`; `gpt-5.5`/`gpt-5.4`/`gpt-5.4-mini`/`gpt-5.4-nano` carried no sampling guard, so a `temperature` + active-effort request hard-400'd. A static `unsupportedParams` list can't express the `none`-mode carve-out (it would over-strip the legitimate case), so the new `gpt5SamplingGuard` drops `temperature`/`top_p` only when the resolved effort is active — wired at the existing `getUnsupportedParams` chokepoint and scoped to the `openai` Chat Completions surface (the `codex` Responses path is already covered by the CodexExecutor allowlist; other providers are untouched). (#4245)
- fix(codex): stop silently dropping GPT-5 output verbosity (`verbosity`/`text.verbosity`) — the GPT-5 series added an output-verbosity control: `verbosity` (low/medium/high) on Chat Completions, nested as `text.verbosity` on the Responses API. The CodexExecutor gates translated requests through an allowlist that had no `text` entry, so for the `codex` provider the hint was dropped before reaching upstream (the `openai` Chat path already forwarded it). `normalizeCodexVerbosity` now folds whichever shape arrived into a single validated `text:{verbosity}` before the allowlist (which now permits `text`), and the OpenAI Chat↔Responses request translators map `verbosity` across formats so the hint survives a format crossing for non-codex Responses backends too. Invalid/absent verbosity collapses to no `text` (status quo). (#4245)
- fix(sse): map `reasoning_effort` to DeepSeek V4's native `{high, max}` vocabulary — DeepSeek V4 only understands `high`/`max` reasoning levels, so other `reasoning_effort` values are mapped onto its native vocabulary instead of being rejected. (#4219)
- fix(glm): default `max_tokens` and an extended timeout for GLM-5.2+ thinking — GLM-5.2+ thinking responses are slow and need headroom, so OmniRoute now sets a sensible default `max_tokens` and a longer timeout for them. (#4255 — thanks @dhaern)
- fix(antigravity): default `includeThoughts` for modern Gemini models — modern Gemini models on the Antigravity path now default to including thoughts so reasoning isn't silently dropped. (#4180 — thanks @dhaern)
- fix(provider-registry): add correct `contextLength` to theoldllm models — fills in accurate context-window sizes for theoldllm's models. (#4184 — thanks @herjarsa)
- fix(models): expose combo model token limits — `/v1/models` now reports token limits for combo models. (#4189 — thanks @megamen32)
- fix(combo): keep the passthrough quota fallback scoped — prevents the passthrough quota fallback from leaking across unrelated targets. (#4194 — thanks @Svetznaniy33)
- fix(combo): opt proactive-fallback compression into the TV1 bail-out (no silent target drop) — proactive-fallback compression now participates in the TV1 bail-out so a target is never silently dropped. (#4228)
- fix(compression): show engine preview output — the Compression Studio preview now renders the engine's output. (#4128 — thanks @megamen32)
- fix(compression): harden engines against I/O failures and misconfig (F5.3) — compression engines degrade gracefully on I/O errors and bad configuration instead of throwing. (#4198)
- fix(compression): harden RTK raw-output redaction + ReDoS guard for custom filters (F5.3) — broadens RTK raw-output redaction and adds a ReDoS guard for user-supplied filter patterns. (#4203)
- fix(compression): bound `mcpAccessibility` `maxTextChars` on the live read path — the live read path now clamps `maxTextChars` so a small value can't make tools disappear. (#4206)
- fix(dashboard): data tables paint an opaque surface so the grid doesn't bleed through — data tables now render on an opaque surface, fixing the grid wallpaper showing through. (#4233)
- fix(dashboard): make the provider card hover visible (was ~1% opacity) — the provider-card hover state was effectively invisible; it now has a visible surface. (#4214)
- fix(vscode): sanitize implicit editor context — redacts sensitive filenames/keywords from the implicit VS Code editor context before it's sent upstream. (#4124 — thanks @zhiru)
- fix(build): raise the Node heap for the local `next build` to stop OOM/stall — bumps the build-time heap so the local production build no longer OOMs or stalls. (#4171)
- fix(mitm): TPROXY OUTPUT-based recipe for local traffic (validated e2e on VPS) — switches the TPROXY rules to an OUTPUT-chain recipe so locally-originated traffic is captured; validated end-to-end on the VPS. (#4156)
- fix(mitm): forward anti-loop — put the bypass-marked socket on the Agent (decrypt 4d) — places the bypass-marked socket on the HTTP Agent so OmniRoute's own forwarded traffic never re-enters the capture loop; VPS-validated. (#4229)
- fix(free-tiers): retire dead-tier `hasFree`, round the headline to ~1.6B, regenerate the per-provider table — drops dead free tiers from the headline math and regenerates the per-provider free-tier table. (#4142)
- fix(free-tiers): retire 4 re-verified-dead free tiers, flag iflytek/sparkdesk ToS, clarify monsterapi one-time — removes four confirmed-dead free tiers and annotates ToS/one-time caveats. (#4152)
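
The Xiaomi MiMo entry in the list above spells out a small normalization rule: reduce any thinking object to `{type}`, keep `disabled`, map everything else to `enabled`, drop `reasoning_effort`/`reasoning`, and never synthesize a thinking object from a bare `reasoning_effort`. A minimal sketch of that rule follows. The function name `normalizeThinkingSketch` and the `JsonObject` type are hypothetical and this is not the shipped `normalizeMimoThinking`; in particular, leaving a non-object `thinking` value untouched is an assumption.

```typescript
// Illustrative sketch only: hypothetical names, not the shipped normalizeMimoThinking.
type JsonObject = Record<string, unknown>;

function normalizeThinkingSketch(body: JsonObject): JsonObject {
  // Work on a shallow copy so the caller's request body is not mutated.
  const out: JsonObject = { ...body };

  // The upstream validator is described as strict, so drop the OpenAI-style knobs.
  delete out["reasoning_effort"];
  delete out["reasoning"];

  // Only reshape an existing thinking object; never create one from
  // reasoning_effort alone (a non-thinking model could then hard-fail).
  const thinking = out["thinking"];
  if (typeof thinking === "object" && thinking !== null) {
    const requested = (thinking as JsonObject)["type"];
    // "disabled" stays; "enabled", "adaptive" and anything else become "enabled".
    out["thinking"] = { type: requested === "disabled" ? "disabled" : "enabled" };
  }
  return out;
}
```

For instance, a Claude-shaped `thinking: { type: "adaptive", budget_tokens: 1024 }` comes out as `thinking: { type: "enabled" }`, while a request carrying only `reasoning_effort` comes out with neither field.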
🧪 Tests
- test(sse): guard the Antigravity `_toolNameMap` cloak map through the request-capture round-trip — follow-up to #4091: the generic capture fix in `createPreparedRequestLogger().body()` (#4153) re-attaches the non-enumerable `_toolNameMap` that the request-inspector drops when it rebuilds the upstream body via `JSON.parse(JSON.stringify(...))`, but the only regression test covered the native-Claude OAuth cloak (PascalCase aliases). The Antigravity cloak differs — `cloakAntigravityToolPayload` suffixes custom tools with `_ide` (`workspace_read` → `workspace_read_ide`), leaves native tools untouched, and returns the reverse map separately — so a refactor of `providerRequestLogging.ts` or the executor could silently re-break Antigravity tool dispatch without tripping the Claude test. Adds a dedicated regression test driving the real `cloakAntigravityToolPayload` through the capture round-trip and asserting the `_ide` reverse map survives, stays non-enumerable (never re-serializes upstream), and that all-native traffic produces no spurious map (verified failing with the #4153 re-attach removed). No production change. (#4181 — thanks @hertznsk)
- test(chatcore): dedicated unit tests for 6 leaves + wire into stryker mutate (QG v2 Fase 9 T5 Fase 3) — adds focused unit tests for 6 chatCore leaf helpers and enrolls them in mutation testing. (#4218)
- test(chatcore): telemetry / memory-skills / semantic-cache tests + wire 2 into stryker (QG v2 Fase 9 T5 Fase 3) — new tests for the telemetry, memory-skills and semantic-cache leaves, two of which are added to the mutation set. (#4222)
- test+ci(chatcore): semanticCache HIT-path fixture (15/15 mutate) + 350min budget headroom — closes the semantic-cache HIT path to a full 15/15 mutation score and gives the nightly auth/accountFallback batches more budget headroom. (#4225)
- test(compression): close F5.1 coverage gaps (replay reducer, live accumulator, StatusDot) — fills the remaining F5.1 compression coverage gaps. (#4192)
- test(db,sse): de-flake db-backup + chatcore streaming timing assertions — stabilizes two timing-sensitive tests (fire-and-forget backup completion + a streaming race). (#4132)
- test: align stale integration tests surfaced post-v3.8.28 on main — realigns integration tests that drifted after the v3.8.28 merge. (#4129)
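
The Antigravity cloak entry in this list hinges on one JavaScript behavior: `JSON.parse(JSON.stringify(...))` copies only enumerable properties, so a non-enumerable `_toolNameMap` disappears unless it is re-attached afterwards. The sketch below illustrates that mechanism only. The helper names (`attachToolNameMap`, `readToolNameMap`, `jsonClone`, `captureWithReattach`) and the `UpstreamBody` shape are hypothetical and are not taken from the project's code; only `_toolNameMap`, the `_ide` suffix and the JSON round-trip come from the notes above.

```typescript
// Hypothetical shapes for illustration; not the project's real types.
type ToolNameMap = Record<string, string>;
interface UpstreamBody {
  tools: { name: string }[];
}

// Attach the reverse map as a non-enumerable property so it is never serialized.
function attachToolNameMap<T extends object>(body: T, map: ToolNameMap): T {
  Object.defineProperty(body, "_toolNameMap", {
    value: map,
    enumerable: false,
    writable: true,
    configurable: true,
  });
  return body;
}

// Read the map back, or undefined when the body carries none.
function readToolNameMap(body: object): ToolNameMap | undefined {
  return (body as { _toolNameMap?: ToolNameMap })._toolNameMap;
}

// A JSON round-trip rebuilds the value from enumerable properties only.
function jsonClone<T>(value: T): T {
  return JSON.parse(JSON.stringify(value)) as T;
}

// Capture sketch: clone the body, then re-attach the map if the original had one.
function captureWithReattach<T extends object>(original: T): T {
  const rebuilt = jsonClone(original);
  const map = readToolNameMap(original);
  return map ? attachToolNameMap(rebuilt, map) : rebuilt;
}

// A cloaked body: one custom tool suffixed with _ide, plus its reverse map.
const cloaked = attachToolNameMap<UpstreamBody>(
  { tools: [{ name: "workspace_read_ide" }] },
  { workspace_read_ide: "workspace_read" },
);

const naive = jsonClone(cloaked); // map is lost here
const captured = captureWithReattach(cloaked); // map survives, still hidden
const nativeOnly = captureWithReattach<UpstreamBody>({ tools: [{ name: "Read" }] });
```

In this sketch `naive` has no map, `captured` keeps the reverse map without exposing it to `Object.keys` or `JSON.stringify`, and `nativeOnly` gets no map at all, which mirrors the three properties the regression test is described as asserting.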
📝 Maintenance
- refactor(sse): split chatCore.ts pure helpers into chatCore/ modules (−561 LOC) — extracts pure helpers out of the chatCore god-file into dedicated modules (Onda 3). (#4159)
- refactor(chatcore): extract passthrough/header/telemetry helpers (QG v2 Fase 9 T5 C2-C3-C5) — further chatCore decomposition. (#4188)
- refactor(chatcore): extract combo/proxy context cache + semaphore helpers (QG v2 Fase 9 T5 C6-C7) — continues the chatCore split. (#4193)
- refactor(combo): god-file split pilot — types + validateQuality + predicates (QG v2 Fase 9 T5 D1-D3) — first slice of the combo.ts decomposition. (#4162)
- refactor(combo): god-file split part 2 — shadow + sorters + structure (QG v2 Fase 9 T5 D4-D6) — continues the combo.ts split. (#4175)
- refactor(combo): god-file split part 3 — auto strategy (QG v2 Fase 9 T5 D8) — extracts the auto strategy from combo.ts. (#4186)
- refactor(combo): extract round-robin sticky state to `combo/rrState.ts` (D7a) — moves round-robin sticky state into its own module. (#4196)
- refactor(combo): extract the reset-aware quota block to `combo/quotaStrategies.ts` (D7b) — moves the reset-aware quota strategies into their own module. (#4204)
- refactor(compression): remove vestigial SLM seam + dead deprecated alias — drops dead compression code. (#4253)
- chore(compression): remove vestigial reconstructCcr/SessionDedup round-trip helpers — removes unused round-trip helpers. (#4226)
- chore(compression): remove dead exports + fix stale llmlingua docs — prunes dead exports and corrects stale LLMLingua docs. (#4223)
- chore(build): build + ship the TPROXY native addon in the standalone (prebuilds 4e) — bundles the native TPROXY addon prebuilds into the standalone build. (#4236)
- chore(ci): add quota + 6 covered chatCore leaves to stryker mutate (QG v2 Fase 9 T5 Fase 3 follow-up) — enrolls more covered leaves into mutation testing. (#4209)
- chore(ci): re-add 8 combo split leaves to stryker mutate + expand nightly batch-matrix 3→5 (QG v2 Fase 9 T5 Fase 3) — restores mutation coverage for the split combo leaves and widens the nightly matrix. (#4205)
- chore(quality): close v3.8.28 cycle gate drift (re-baseline + nightly-mutation scope) — reconciles quality-gate baselines after the v3.8.28 cycle. (#4135)
- ci(mutation): split nightly into 3 parallel batches to fit the 180min budget (QG v2 Fase 9 T0) — parallelizes the nightly mutation run. (#4150)
- ci(mutation): restore cold-seed timeout headroom (a/b lost in #4225 squash) + extend to c/d/g/h — restores and extends per-batch cold-seed timeouts. (#4258)
- ci(docs): harden the fabricated-docs checker + enforce `--strict` (QG v2 Fase 9 T9) — tightens the anti-hallucination docs checker. (#4149)
- ci: derive the oasdiff base-ref from the package version + flag the mutation-toolchain regression — fixes the OpenAPI-diff base-ref and surfaces a mutation-toolchain regression. (#4134)
- docs(ci): correct the mutation-gate note (no regression — `stryker -c` is `--concurrency`); record Task 12 GO — corrects a misread of the stryker flag and records the spike GO. (#4138)
- docs(api): document the `/api/v1/ws` chat WebSocket endpoint in openapi.yaml — adds the WebSocket chat endpoint to the OpenAPI spec. (#4215)
- docs(readme): expand Acknowledgments into a themed, star-counted credits hall — reworks the README acknowledgments section. (#4195)
- style(dashboard): shrink the identity grid cell 46px → 32px (~30% smaller) — tightens the identity grid density. (#4143)
🔧 Dependencies
- deps: bump the production group with 5 updates — routine production-dependency bumps. (#4121)
- chore(deps): bump github/codeql-action from 3 to 4 — CI action update. (#4120)
- chore(deps): bump actions/setup-python from 5 to 6 — CI action update. (#4119)
What's Changed
- Release v3.8.29 by @diegosouzapw in #4126
Full Changelog: v3.8.28...v3.8.29