[3.8.43] — 2026-07-02
✨ New Features
-
usage (quota percentages + provider USD drilldown): @@om-usage and the HTTP usage endpoint now report personal API-key quotas as remaining percentages (USD amounts stay out of the command output), provider quota remaining is scaled by the configured quota cutoff so the protected reserve reads as 0% left, and the quota dashboard regains a provider USD cost drilldown (/api/usage/provider-window-costs + ProviderUsdCostModal, management-auth gated). Also honors observed provider quota resets: a same-resetAt reset (usage dropping back to the reset floor) is detected and preferred over stale recorded weekly events for provider USD windows and API-key USD quotas. New src/lib/usage/providerWindowCosts.ts. Regression guards: tests/unit/provider-window-costs.test.ts, tests/unit/internal-usage-command.test.ts, tests/unit/api-key-usage-limits.test.ts, tests/unit/lib/quota-reset-events.test.ts. Extracted from #5863 by @Witroch4. -
dashboard (live WS behind reverse proxy): the live dashboard WebSocket can now be fronted by a reverse proxy or Cloudflare Tunnel via NEXT_PUBLIC_LIVE_WS_PUBLIC_URL (e.g. wss://ws.my-ai.com/live-ws). The URL is honored both at build time (env inlined into the bundle) and at runtime for prebuilt Docker/npm images: the /api/v1/ws?handshake=1 handshake now echoes a lazily-read live.publicUrl (only ws:// and wss:// values are accepted; anything else is rejected as null), and useLiveDashboard resolves the URL from that handshake before connecting, falling back to the previous ws(s)://hostname:20129 default. Also documents LIVE_WS_ALLOWED_HOSTS and aligns the GitLab Duo OAuth scopes line in .env.example with the live config (ai_features read_user). Regression guard: tests/unit/live-ws-public-url.test.ts (5). (#5877 by @ianriizky) -
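The URL handling described in the live-WS entry can be sketched roughly as follows. This is a minimal illustration, not the project's code: the function names sanitizeLivePublicUrl and resolveLiveWsUrl are hypothetical, and only the ws:// / wss:// acceptance rule and the port-20129 fallback are taken from the entry.

```typescript
// Hypothetical helper: accept only ws:// or wss:// values, map everything else to null.
function sanitizeLivePublicUrl(raw: string | null | undefined): string | null {
  if (!raw) return null;
  try {
    const url = new URL(raw.trim());
    return url.protocol === "ws:" || url.protocol === "wss:" ? url.toString() : null;
  } catch {
    return null; // not a parseable URL
  }
}

// Hypothetical helper: prefer the handshake value, else fall back to ws(s)://hostname:20129.
function resolveLiveWsUrl(
  handshakePublicUrl: string | null,
  page: { protocol: string; hostname: string },
): string {
  if (handshakePublicUrl) return handshakePublicUrl;
  const scheme = page.protocol === "https:" ? "wss" : "ws";
  return `${scheme}://${page.hostname}:20129`;
}
```

A value such as https://ws.my-ai.com/live-ws would be rejected by the first helper, so the client would end up on the hostname-based default.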
providers (CLI profile auto-sync): opt-in toggles to auto-regenerate CLI tool profiles after a provider model sync. When enabled, a model-catalog change (re)writes that tool's profile files from the live catalog: Codex (~/.codex/*.config.toml) and now Claude Code (~/.claude/profiles/<name>/settings.json, via an extracted syncClaudeProfilesFromModels + a new claudeProfileAutoSync.ts mirroring the Codex path). Both are off by default and never touch the active/default CLI config; they are backed by the OMNIROUTE_AUTO_SYNC_CODEX_PROFILES / OMNIROUTE_AUTO_SYNC_CLAUDE_PROFILES feature flags (DB/dashboard override > env > default "false") and additionally gated behind the existing CLI_ALLOW_CONFIG_WRITES write-guard. A "CLI profile auto-sync" card on the CLI Code dashboard toggles each (moved from the providers dashboard in #5778, thanks @rdself). Regression guards: tests/unit/claude-profile-auto-sync-gate.test.ts, tests/unit/codex-profile-auto-sync-gate.test.ts, tests/unit/cli/setup-claude.test.ts (follow-up to #5737). -
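The flag precedence in the auto-sync entry (DB/dashboard override > env > default "false", plus the write-guard) could look roughly like this. The FlagSources type and both function names are hypothetical, and treating only the literal string "true" as enabled is an assumption about how the env values are parsed.

```typescript
// Hypothetical shape: a dashboard/DB override (if any) plus an env map passed in explicitly.
type FlagSources = {
  dbOverride?: boolean | null;
  env: Record<string, string | undefined>;
};

// Precedence from the entry: DB/dashboard override > env > default "false".
function resolveFlag(name: string, sources: FlagSources): boolean {
  if (typeof sources.dbOverride === "boolean") return sources.dbOverride;
  const raw = sources.env[name];
  if (raw !== undefined) return raw.trim().toLowerCase() === "true"; // assumed parsing
  return false;
}

// The feature flag alone is not enough: the existing write-guard must also allow writes.
function shouldAutoSyncClaudeProfiles(sources: FlagSources): boolean {
  const enabled = resolveFlag("OMNIROUTE_AUTO_SYNC_CLAUDE_PROFILES", sources);
  const writesAllowed =
    (sources.env["CLI_ALLOW_CONFIG_WRITES"] ?? "").trim().toLowerCase() === "true";
  return enabled && writesAllowed;
}
```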
cli (startup banner): the serve startup banner now prints the running OmniRoute version (v3.8.x) beneath the ASCII logo, so the active version is visible at a glance without a separate --version call. Regression guard: tests/unit/cli-serve-version-banner.test.ts. Thanks @chirag127 (#5752). -
analytics (subscription cost): flat-rate providers now show $0 in cost analytics instead of an inflated per-token estimate. Subscription / coding-plan providers bill a flat fee, not per token, yet still carry per-token pricing rows used for estimates, so the analytics dashboard over-reported their cost. This covers every cookie-web provider (ChatGPT Web, grok-web, …) plus the dedicated Minimax Coding, Kimi Coding, GLM Coding, Alibaba Coding Plan, and Xiaomi MiMo plans. A new flat-rate classifier (src/lib/usage/flatRateProviders.ts) is consulted by the analytics surfaces (analytics route, usage stats, usage analytics) via an opt-in flatRateAsZero cost option, so those providers read $0 while budget / quota / routing keep estimating unchanged. Deliberately NOT zeroed: codex / cx (OmniRoute actively tracks Codex token cost, including Fast-tier multipliers and GPT-5.x pricing, and Codex can be a metered account), byteplus (metered ModelArk), minimax-cn (metered China API). Regression guard: tests/unit/flat-rate-cost-5552.test.ts. (#5552) -
mcp (RTK): expose the RTK tool-output learn/discover workflow as two new MCP tools so an agent can grow the RTK filter catalog without leaving the protocol. omniroute_rtk_discover analyzes recently captured raw tool output (discoverRepeatedNoise / suggestFilter) and returns candidate noise patterns plus a suggested filter; omniroute_rtk_learn lists the captured command samples (listRtkCommandSamples) and resolves a command to its RTK filter id (commandToId). Both are read-only (scope read:compression), wrap the existing RTK discovery primitives (no new logic in the engine), and log to the MCP audit trail. Regression guard: tests/unit/compression/rtk-mcp-tools.test.ts (4). gaps v3.8.42, T07. -
compression (LLM tier): add an opt-in, default-off LLM-tier compression engine (llm) that condenses the prose of non-system messages via a pluggable chat-completion backend. It mirrors the llmlingua engine's contract but is safe by construction: the default backend is a no-op pass-through (the engine never mutates the payload until an operator both enables it and wires a real backend via setLlmCompressorBackend()), it is not part of the default stacked pipeline, enabled defaults to false, fenced code blocks and system messages are never sent to the model, and every backend error fails open (the original segment/body is kept and the error is never rethrown). A minTokens floor skips small prompts. The real production backend is intentionally a VPS-validated follow-up (Hard Rule #18), exactly as the llmlingua worker backend is gated. New open-sse/services/compression/engines/llm/index.ts. Regression guard: tests/unit/compression/llm-compressor-engine.test.ts (8). gaps v3.8.42, T05/C3. -
memory (typed decay): add opt-in typed memory decay (TV6) so the conversational memory store stops accumulating stale episodic noise. Each injected memory now tracks an access_count + last_accessed_at (always-on, non-destructive telemetry; migration 111_memory_typed_decay), and an opt-in, default-off sweep (MEMORY_TYPED_DECAY_ENABLED, default false) deletes memories that are past a per-type TTL and not immune. Only episodic decays by default (30d, env-tunable); factual / procedural / semantic are immune, and any memory accessed >= 3 times earns access immunity (mirroring "guardrail/convention/decision never decay"). The decay clock re-bases on the last access, so used memories survive. Deletions reuse deleteMemory (SQLite + sqlite-vec + Qdrant stay in sync) and fail open; an optional periodic sweep is doubly opt-in (also needs MEMORY_TYPED_DECAY_SWEEP_INTERVAL > 0). With the flag off nothing is ever deleted (Rule #20 spirit). New src/lib/memory/typedDecay.ts. Regression guard: tests/unit/memory/typed-decay.test.ts (15). gaps v3.8.42, T10/TV6. -
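The decay rules in the typed-decay entry can be read as a single predicate. The sketch below is illustrative only: the MemoryRow shape, the constant names, and shouldDecay are hypothetical, while the 30-day episodic TTL, the immunity of the other types, the >= 3 access immunity, and the re-basing on last access come from the entry.

```typescript
type MemoryType = "episodic" | "factual" | "procedural" | "semantic";

// Hypothetical row shape; timestamps are epoch milliseconds.
interface MemoryRow {
  type: MemoryType;
  createdAt: number;
  lastAccessedAt: number | null;
  accessCount: number;
}

const DAY_MS = 86_400_000;
// Only episodic has a TTL by default (30 days); types without a TTL are immune.
const DEFAULT_TTL_MS: Partial<Record<MemoryType, number>> = { episodic: 30 * DAY_MS };
const ACCESS_IMMUNITY_THRESHOLD = 3;

function shouldDecay(memory: MemoryRow, now: number, enabled: boolean): boolean {
  if (!enabled) return false; // flag off: nothing is ever deleted
  const ttl = DEFAULT_TTL_MS[memory.type];
  if (ttl === undefined) return false; // immune type
  if (memory.accessCount >= ACCESS_IMMUNITY_THRESHOLD) return false; // access immunity
  const clockStart = memory.lastAccessedAt ?? memory.createdAt; // clock re-bases on last access
  return now - clockStart > ttl;
}
```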
dashboard (combos): the named-combos editor now lets you drag to reorder the stacked-compression pipeline instead of only editing fixed-position steps. A new pure model (src/shared/components/compression/compressionPipelineModel.ts) owns add/remove/move/update with the engine→intensity invariant and a never-empty guarantee, and a @dnd-kit/sortable editor (CompressionPipelineEditor.tsx, matching the sidebar reorder pattern) replaces the inline list in CompressionCombosPageClient. Order persists through the existing combos endpoint. Regression guards: tests/unit/compression-pipeline-model.test.ts (11) + tests/unit/ui/compression-pipeline-editor.test.tsx (4). A dedicated tests/e2e/compression-studio.spec.ts (Tela A render + tab switch) closes the studios e2e gap the combo-live spec did not cover. gaps v3.8.42, T06 + T03. -
compression (pipeline): add an opt-in, default-off per-engine circuit-breaker to the stacked compression pipeline (T02). When an engine throws repeatedly across requests, its breaker opens and the stacked loops skip that engine for a cooldown (keeping the body verbatim for that step, i.e. fail-open), then probe once (lazy half-open); success closes it, a failed probe re-opens it. This is distinct from the provider circuit-breaker (src/shared/utils/circuitBreaker.ts, provider-scoped + DB-persisted): the new pipelineEngineBreaker.ts is engine-scoped, process-local, and adds zero DB/IO on the hot path. It composes with the existing per-request TV1 bail-out (which skips within a single request); the breaker adds cross-request memory. Default off (COMPRESSION_PIPELINE_BREAKER_ENABLED=false), which is byte-identical to the pre-breaker pipeline (a throwing engine still propagates unless TV1 is separately enabled). Configurable per-call, per-CompressionConfig, or via env (_THRESHOLD / _COOLDOWN_MS). Regression guard: tests/unit/compression/pipeline-circuit-breaker.test.ts (9, incl. a throwing-engine integration); existing strategySelector / bail-out suites stay green. gaps v3.8.42, T02 (2.2). -
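A lazy half-open breaker of the kind described in the pipeline entry can be sketched as below. This is a simplified, synchronous illustration, not the contents of pipelineEngineBreaker.ts: the class name, method names, and runStep are hypothetical, and catching the engine error to keep the body verbatim is an assumption about the enabled path.

```typescript
interface BreakerState {
  failures: number;
  openedAt: number | null; // null means closed
}

class PipelineEngineBreaker {
  private readonly states = new Map<string, BreakerState>();
  private readonly threshold: number;
  private readonly cooldownMs: number;

  constructor(threshold: number, cooldownMs: number) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
  }

  // Closed breakers always run; open breakers run again only once the cooldown has elapsed (probe).
  canRun(engine: string, now: number): boolean {
    const state = this.states.get(engine);
    if (!state || state.openedAt === null) return true;
    return now - state.openedAt >= this.cooldownMs;
  }

  recordSuccess(engine: string): void {
    this.states.delete(engine); // a success (including a successful probe) closes the breaker
  }

  recordFailure(engine: string, now: number): void {
    const state = this.states.get(engine) ?? { failures: 0, openedAt: null };
    state.failures += 1;
    // Open at the threshold; a failed probe re-opens and restarts the cooldown.
    if (state.openedAt !== null || state.failures >= this.threshold) state.openedAt = now;
    this.states.set(engine, state);
  }
}

// One pipeline step: skip an open engine, otherwise run it and fail open on error.
function runStep(
  breaker: PipelineEngineBreaker,
  engine: string,
  body: string,
  compress: (input: string) => string,
  now: number,
): string {
  if (!breaker.canRun(engine, now)) return body;
  try {
    const out = compress(body);
    breaker.recordSuccess(engine);
    return out;
  } catch {
    breaker.recordFailure(engine, now);
    return body;
  }
}
```

Because the state lives in a process-local Map, the sketch does no DB or IO work per step, which matches the hot-path property the entry calls out.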
compression (CCR): the CCR retrieval feedback (H8) is now graduated instead of a binary cliff. Previously a block retrieved >= 3 times was flagged do-not-compress and everything below that stayed fully compressible. Now each prior retrieval raises a block's effective minChars linearly (effectiveMinChars), so frequently-retrieved content is compressed progressively less; the >= 3 exclusion is preserved (as Infinity). The ramp is controlled by a retrievalRampFactor (default 2, per-combo config or COMPRESSION_CCR_RETRIEVAL_RAMP_FACTOR); 1 reproduces the exact legacy binary behavior. Per-(principal, hash) isolation is unchanged. Regression guard: tests/unit/compression/ccr-retrieval-ramp.test.ts (12); existing CCR suites (51) stay green. gaps v3.8.42, T08/H8. -
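One way to picture the ramp is the function below. The exact formula is not given in the entry, so the linear form here is an assumption chosen to satisfy the stated properties: it grows linearly with prior retrievals, a factor of 1 leaves minChars unchanged (the legacy binary behavior), and >= 3 retrievals yields Infinity. The signature is hypothetical and may differ from the real effectiveMinChars.

```typescript
const RETRIEVAL_EXCLUSION_THRESHOLD = 3;

// Assumed linear ramp: base * (1 + retrievals * (rampFactor - 1)).
function effectiveMinChars(baseMinChars: number, retrievals: number, rampFactor = 2): number {
  if (retrievals >= RETRIEVAL_EXCLUSION_THRESHOLD) return Infinity; // do-not-compress
  return baseMinChars * (1 + retrievals * (rampFactor - 1));
}

// A block is compressible only if it is at least as long as its effective floor.
function isCompressible(blockChars: number, baseMinChars: number, retrievals: number, rampFactor = 2): boolean {
  return blockChars >= effectiveMinChars(baseMinChars, retrievals, rampFactor);
}
```

Under this assumed formula, a 250-character block with a base floor of 100 stays compressible after one retrieval (floor 200) but not after two (floor 300).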
compression (cache-aware): add an opt-in, default-off usage-observed prefix freeze (H5). The cache-aware guard previously preserved the system prompt only for providers a static heuristic recognized as caching. It now also learns which system prompts actually recur: once a system prompt has been observed >= a threshold number of times across requests, it is treated as a stable cacheable prefix and preserved from compression even for providers the static check misses, recovering prompt-cache hits that a prefix-compressing mode would otherwise bust. Content-addressed by a hash of the system prompt (OpenAI / Claude / Gemini shapes), in-memory + bounded, zero DB/IO; a "freeze" only preserves the prefix, so it never mutates a payload. Default OFF (COMPRESSION_PREFIX_FREEZE_ENABLED, threshold _THRESHOLD); respects the never preserve-mode (never freezes). New open-sse/services/compression/prefixFreeze.ts, wired into resolveCacheAwareConfig. Regression guard: tests/unit/compression/prefix-freeze.test.ts (10); 44 existing cache-aware / preserve-mode tests stay green. gaps v3.8.42, T08/H5. -
compression (read-lifecycle): add a new opt-in, default-off read-lifecycle engine (H7) that collapses stale/superseded file-Read tool results. In agentic conversations the same file is Read repeatedly; an earlier Read becomes stale once the same path is re-read (superseded by a newer view) or modified by a later Write/Edit. The engine replaces those earlier Read results with a short stub, keeping only the current (last, un-superseded) Read intact and recovering the tokens the model no longer needs. Unlike session-dedup (identical-content) or ccr (reversible markers), this is semantic + lossy, so it is opt-in (enabled defaults to false). Conservative by construction: matches only well-known Read/Write tool names, compares exact paths, collapses a Read only when a strictly-later invocation touches the same path, and fail-opens on any unexpected shape. Supports both the Anthropic (tool_use / tool_result) and OpenAI (tool_calls + role:"tool") shapes. New open-sse/services/compression/engines/readLifecycle/index.ts. Regression guard: tests/unit/compression/read-lifecycle.test.ts (10). gaps v3.8.42, T08/H7. -
observability (correlation IDs): requests now carry a correlation id threaded through logs so a single request can be traced end-to-end across the pipeline. (#5834 — thanks @hartmark)
-
cli (startup banner, boot time): the serve ready banner now shows how long startup took, so slow-boot conditions are visible at a glance. (#5799, thanks @ishatiwari21) -
api (quota-policy bypass scope): add an opt-in API-key provider quota-policy bypass scope, so a designated key can be exempted from provider quota enforcement without disabling quotas globally. (#5731 — thanks @Witroch4)
-
providers (Ollama local): add a first-class Ollama local-provider card to the providers dashboard so the local LLM runtime can be configured like any other provider. (#5712 — thanks @diegosouzapw)
-
codex (fallback profiles): generate fallback CLI profiles for Codex-compatible models so compatible models get a usable profile automatically. (#5701 — thanks @skyzea1)
-
api (response-body validation + failover): add a configurable response-body validation step that can fail a target over to the next candidate when the upstream returns a structurally-invalid body (routing/#4985). (#5684 — thanks @diegosouzapw)
-
providers (SenseNova): complete the SenseNova free Token Plan — chat completions plus Text-to-Image (ported from 9router#2233). (#5679 — thanks @diegosouzapw)
-
db (self-correcting context windows): add self-correcting model context-window overrides so a model whose advertised context length is wrong is corrected automatically (models/#5004). (#5667 — thanks @diegosouzapw)
-
routing (latency strategy): optimize the latency routing strategy using observed per-target performance metrics for better candidate selection. (#5629 — thanks @KooshaPari)
-
compression (preserveSystemPrompt mode): add a preserveSystemPrompt mode enum (always | whenNoCache | never) with legacy back-compat, giving operators explicit control over when the system prompt is protected from compression (T05/C5). (#5653, thanks @diegosouzapw) -
commandCode (vision): add multimodal image support for Command Code vision models. (#5557 — thanks @Stazyu)
-
compression (read-lifecycle engine): T08/H7 (2.5) — an opt-in read-lifecycle engine that collapses superseded file reads so stale earlier reads of the same file are pruned from the context. (#5754 — thanks @diegosouzapw)
-
compression (usage-observed prefix freeze): T08/H5 (2.4) — opt-in prefix freeze driven by observed usage, keeping a stable cached prefix from being rewritten by downstream engines. (#5744 — thanks @diegosouzapw)
-
compression (CCR retrieval-feedback ramp): T08/H8 (2.3) — a graduated Context-Compression-Ratio retrieval-feedback ramp that tunes compression aggressiveness from retrieval signals. (#5739 — thanks @diegosouzapw)
-
compression (per-engine circuit breaker): T02 — an opt-in per-engine pipeline circuit-breaker that disables a misbehaving compression engine without failing the whole pipeline. (#5735 — thanks @diegosouzapw)
-
compression (LLM-tier engine): T05/C3 — an opt-in LLM-tier compression engine that uses a model pass for higher-ratio semantic compression. (#5702 — thanks @diegosouzapw)
-
dashboard (compression pipeline editor): T06/T03 — a drag-to-reorder compression pipeline editor plus a compression-studio e2e flow. (#5727 — thanks @diegosouzapw)
-
memory (typed decay): T10/TV6 — opt-in typed memory decay so aged, low-value memories fade on a per-type schedule. (#5723 — thanks @diegosouzapw)
-
mcp (RTK tools): T07 — expose the RTK learn/discover capabilities as first-class MCP tools. (#5691 — thanks @diegosouzapw)
-
providers (CLI profile auto-sync): opt-in CLI profile auto-sync toggles, including Claude Code auto-sync, so generated CLI profiles can track provider changes automatically. (#5755 — thanks @diegosouzapw)
🔧 Bug Fixes
-
fix(opencode): stop fabricating User-Agent: opencode/local and x-opencode-client: cli headers when the client sends none. The executor-dedup refactor (#5720) accidentally re-introduced header fabrication, violating the forward-only contract (inventing opencode-internal values risks upstream rejection). Restored to forward-only: those headers are emitted only when a real client source is present. Regression guard: tests/unit/opencode-executor.test.ts. (thanks @diegosouzapw) -
fix(executors): resolveEffectiveKey returns undefined (not "") when no API key is present. A type-coercion cleanup (#5798) changed apiKey ?? "" to satisfy the typechecker, silently mutating auth-key resolution semantics. Widened the return type to string | undefined and reverted the coercion so OAuth-only credentials resolve correctly. Regression guard: tests/unit/refactor-buildHeaders-preamble.test.ts. (thanks @diegosouzapw) -
fix(translator): restore the terminal message_delta + message_stop on Responses→Claude streams. The doubled-tool-args dedup (#5828) guarded the finish handler on the shared state.finishReason, which the openai-responses→openai leg sets first in the hub path, so the openai→claude leg dropped its terminal events and the stream ended after content_block_delta. The dedup now uses a dedicated state.claudeFinishEmitted flag. Regression guard: tests/unit/claude-code-rendering-fixes.test.ts. (thanks @diegosouzapw) -
fix(pricing): add the Kiro claude-sonnet-5 pricing row so the newly-catalogued model (#5796) no longer reports $0.00 usage. Regression guard: tests/unit/catalog-updates-v3x.test.ts. (thanks @diegosouzapw) -
fix(github): keep Copilot access-token sessions active. GitHub Copilot device-flow accounts can hold a GitHub access token plus a short-lived Copilot token without a refresh token; the proactive health check treated that as terminal no_refresh_token and marked the connection expired minutes after login. The health check now keeps those sessions active, clears stale no_refresh_token state, and refreshes the Copilot sub-token when needed. Regression guard: tests/unit/token-health-no-refresh-token-expired-5326.test.ts. Extracted from #5863 by @Witroch4. -
fix(kiro): bound the Claude model-id dash→dot normalization to a 1–2 digit minor so date-suffixed ids (e.g. claude-opus-4-20250514) are no longer corrupted. (thanks @voravitl)
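A bounded normalization of this kind might look like the sketch below. The regular expression and the function name are assumptions for illustration; only the behavior (dash→dot for a 1–2 digit minor, date-suffixed ids left intact) comes from the entry.

```typescript
// Assumed pattern: "claude-<family>-<major>-<minor>" where minor is 1-2 digits and is
// followed by the end of the id or another dash. An 8-digit date suffix never matches.
const CLAUDE_MINOR_VERSION = /^(claude-[a-z]+-\d+)-(\d{1,2})(?=$|-)/;

function normalizeClaudeModelId(id: string): string {
  return id.replace(CLAUDE_MINOR_VERSION, "$1.$2");
}
```

With this pattern, claude-sonnet-4-5 becomes claude-sonnet-4.5, while claude-opus-4-20250514 passes through unchanged because the lookahead fails after one or two digits of the date.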
-
fix(usage): preserve (bounded) tool definitions in request logs even when the request body is truncated, so the request-details view can still show available tools. (thanks @noir017)
-
fix(providers): route OpenAI responses-only models to /v1/responses instead of 404ing on /v1/chat/completions. The curated gpt-5.5-pro / gpt-5.4-pro entries never worked (OpenAI only serves *-pro reasoning models via the Responses API), and "Test all models" surfaced the same 404s. The registry entries now carry targetFormat: "openai-responses" (reusing the existing per-model translation plumbing shared with gh / codex), DefaultExecutor.buildUrl swaps the openai endpoint to /responses in lockstep (honoring custom base URLs), and a -pro suffix heuristic covers dynamically-synced ids such as o1-pro / gpt-5.2-pro (same spirit as the gh executor's /codex/i routing, 9router#102). Legacy completions-only ids (e.g. gpt-3.5-turbo-instruct) are out of scope: they are not in the catalog and OmniRoute has no legacy /v1/completions upstream. Regression guard: tests/unit/openai-responses-only-models-5842.test.ts (8). Thanks @maikokan. (#5842) -
fix(image): keep bare codex image aliases (e.g. gpt-5.5) resolving to the codex image pipeline even when a combo shares the same name. A chat combo named gpt-5.5 used to shadow the bare image alias in resolveImageRouteModel, hijacking /v1/images/* requests to a chat target (regression path adjacent to #5887); codex bare models are now reserved before bare-combo resolution, while non-codex aliases (e.g. gpt-image-2) remain user-shadowable (#3214/#3215 behavior preserved). Regression guard: tests/unit/image-routes-combo-edits-3214-3215.test.ts (9). (#5902 by @KooshaPari) -
fix(ci): re-green the release/v3.8.43 fast-gates queue: every PR→release was inheriting base-reds (#5798). Five distinct blockers cleared: (1) stale modelContextOverrides entry in the check:db-rules intentionally-internal allowlist (#5827 allowlisted it while the #5609 fix re-exported it from localDb.ts; the re-export stays, the obsolete entry goes, classification guard re-pinned to 33); (2) LIVE_WS_ALLOWED_HOSTS / NEXT_PUBLIC_LIVE_WS_PUBLIC_URL documented in docs/reference/ENVIRONMENT.md (env/docs contract, from #5877); (3) the Router Backends ADR's references to the not-yet-merged registry (#5868) marked as landing-with-PR so check:fabricated-docs --strict passes; (4) antigravity-429-quota-tdd + middleware-header-strip-5849 added to stryker tap.testFiles (check:mutation-test-coverage); (5) file-size / complexity / cognitive-complexity ratchets rebaselined with justification notes; all drift measured identical on the pristine tip and this PR (net-zero). Regression guard: tests/unit/check-db-rules-classification.test.ts. (#5798) -
providers (codex image auto-routing regression): an unprefixed gpt-5.5 request from a codex-only setup (no OpenAI connection) now correctly infers the codex provider again. The OpenAI static-catalog short-circuit in resolveModelByProviderInference was preempting the codex-preference block, so gpt-5.5 (added to the OpenAI catalog) stopped auto-routing to Codex image generation. Users with an active OpenAI connection are unaffected (OpenAI stays default). Regression guard: tests/unit/codex-gpt55-routing-5887.test.ts. (#5887) -
api (proxy header hygiene): upstream x-middleware-* control headers (emitted by providers hosted behind Next.js, e.g. synthetic.new) are now stripped from proxied responses instead of forwarded verbatim. Forwarding x-middleware-rewrite made Next 16 throw "NextResponse.rewrite() was used in a app route handler" and return 500 despite a successful upstream call. Applies to both streaming and JSON paths. Regression guard: tests/unit/middleware-header-strip-5849.test.ts. (#5849) -
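The header hygiene above amounts to a prefix filter on the upstream response headers. A minimal sketch using the standard Headers API follows; the function name stripMiddlewareHeaders is hypothetical.

```typescript
// Copy upstream headers, dropping any Next.js control header (x-middleware-*).
function stripMiddlewareHeaders(upstream: Headers): Headers {
  const clean = new Headers();
  upstream.forEach((value, name) => {
    if (!name.toLowerCase().startsWith("x-middleware-")) clean.append(name, value);
  });
  return clean;
}
```

The same filtered Headers object can then be used for both the streaming and the JSON response, so neither path forwards x-middleware-rewrite.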
docs (pnpm global install): replaced the unsupported pnpm approve-builds -g step with the install-time --allow-build flag (pnpm add -g omniroute@latest --allow-build=better-sqlite3) across README + Setup Guide (and i18n mirrors), fixing native-build approval for pnpm v11 global installs. (#5554) -
dashboard (token badge): the red "Token Expired" connection badge no longer flashes for OAuth refresh-capable providers (Antigravity/Gemini) whose access token merely lapsed but is auto-refreshed; it now shows only when the connection is terminally expired (testStatus === "expired"). Continuation of #5326. Regression guard: tests/unit/ui/connection-row-token-badge-5836.test.tsx. (#5836) -
db (auto backup toggle): full pre-write SQLite backups now honor the persisted backup.autoBackupEnabled dashboard setting. Previously only the DISABLE_SQLITE_AUTO_BACKUP env var was checked, so disabling auto-backup in the UI had no effect and ~70MB pre-write snapshots kept firing. Manual and pre-restore backups still always run. Regression guard: tests/unit/db-backup-autobackup-setting-5871.test.ts. (#5871) -
providers (auto/ routing for custom providers): custom OpenAI-/Anthropic-compatible providers (dynamic *-compatible-* connection IDs) are no longer excluded from auto/ routing. The Auto-Combo virtual factory previously skipped any connection whose provider was absent from the static registry; it now falls back to the connection's defaultModel. Regression guard: tests/unit/auto-custom-provider-5873.test.ts. (#5873) -
middleware (hook sandbox): operator-authored pre-request hook code now runs inside a hardened Node vm sandbox (minimal context, no ambient globals / process.env, execution timeout, no require) instead of new Function() in the main process, closing the Hard Rule #3 / SonarCloud S1523 exposure. Regression guard: tests/unit/middleware-hook-sandbox-5872.test.ts. (#5872) -
mcp-server (auth forwarding): the per-caller MCP identity forwarded via withMcpHttpAuthContext now wins over the static OMNIROUTE_API_KEY env fallback in the internal-fetch helpers (apiFetch, omniRouteFetch). Previously the env key was spread after the forwarded headers and clobbered the caller's Authorization. Regression guard: open-sse/mcp-server/__tests__/httpAuthContext.test.ts. (#5819) -
dashboard (Modal provider: two-field auth): the Modal provider connection form now exposes two fields (Token ID + Token Secret) instead of a single API-key input, since Modal authenticates with Authorization: Bearer <token-id>:<token-secret>. The dashboard combines the two fields into the id:secret credential before saving (combineModalCredential, trims both parts), while a value pasted in the legacy single-field format keeps working verbatim (empty secret → passthrough), so existing saved connections need no migration; the key-help link points at Modal's token settings. Regression guard: tests/unit/modal-credential-combine.test.ts (5). (#5881, closes #5446) Follow-up: the Validation Model Id field is now pre-filled for Modal with the same model the server-side validator probes (Qwen/Qwen3-4B-Thinking-2507-FP8, shared via MODAL_DEFAULT_VALIDATION_MODEL_ID in src/shared/constants/modal.ts), closing the last checklist item of #5446. Regression guard: tests/unit/modal-validation-model-prefill.test.ts. -
api (chat completions: early SSE keepalive gate): the /v1/chat/completions route wrapped the response in the early-stream keepalive whenever stream was not explicitly false, so a client that omitted stream and asked for JSON (Accept: application/json) could receive premature SSE framing. The keepalive wrapper is now gated on an explicit stream: true in the body or an Accept header that forces SSE (acceptHeaderForcesStream); the parsed body is passed to the chat handler untouched, so the actual stream/JSON framing stays decided by chatCore / resolveStreamFlag, preserving OmniRoute's legacy streaming default when stream is omitted and the per-key streamDefaultMode: "json" opt-in. Regression guard: tests/unit/chat-combo-live-test.test.ts ("returns JSON without early SSE framing when stream is omitted and Accept is application/json"). (#5866 by @rdself) -
fix(github): drop a trailing assistant prefill before dispatching to GitHub Copilot chat to avoid 400 errors. (thanks @baslr)
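The prefill fix can be illustrated with a small message filter. The ChatMessage shape and the function name are hypothetical; the sketch assumes that only a single final assistant message counts as the prefill.

```typescript
// Hypothetical minimal message shape for illustration.
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// Return a copy without a trailing assistant message (the "prefill"), leaving the input untouched.
function dropTrailingAssistantPrefill(messages: readonly ChatMessage[]): ChatMessage[] {
  const out = [...messages];
  const last = out[out.length - 1];
  if (last !== undefined && last.role === "assistant") out.pop();
  return out;
}
```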
-
fix(oauth): prevent cross-IdP account overwrites by disambiguating OAuth connections on username when present, not email alone. (thanks @KunN-21) -
fix(mitm): best-effort revert privileged /etc/hosts entries on exit when a sudo password is cached, instead of always leaving orphaned state. (thanks @manhdzzz)
-
providers (Kiro: Claude Sonnet 5): the Kiro provider's model catalog was missing claude-sonnet-5, so the model could not be selected or routed even on accounts that already had access to it ("claude-sonnet-5 is not supported"). Added the model to the Kiro registry (open-sse/config/providers/registry/kiro/index.ts) as a 1M-context / 128K-output Claude model, mirroring the existing Claude entries; the registry models[] feeds both the model selector and the live CodeWhisperer ListAvailableModels fallback, so the model is now selectable and routable. Regression guard: tests/unit/kiro-claude-sonnet-5-2267.test.ts. (thanks @openbioinfo) -
settings (model aliases: self-heal after restart): the Settings → Routing page showed "No exact-match aliases configured" after a server restart even though the aliases were persisted in the DB. Aliases are held in a module-local _customAliases map in modelDeprecation.ts that the boot path hydrates, but Next.js compiles the app-route module graph separately from the startup graph (the same webpack chunk-splitting class as #5312), so the GET /api/settings/model-aliases handler read a different, un-hydrated copy. The handler now self-heals: when its in-memory alias map is empty it reads settings.modelAliases from the DB (via the existing getSettings() db module, with no raw SQL in the route) and repopulates the map, so the UI reflects the persisted aliases on the first GET after a restart. Follow-up: the root cause is now also fixed. The _customAliases store in modelDeprecation.ts is backed by globalThis (key __omniroute_customAliases__), so the startup and app-route module graphs share one store and the route reads the boot-hydrated aliases directly (the DB self-heal remains as a harmless fallback), mirroring the same globalThis singleton pattern already applied to thinkingBudget.ts / backgroundTaskDetector.ts (#5312). Regression guards: tests/unit/model-aliases-settings-route-selfheal.test.ts + tests/unit/model-aliases-globalthis-5777.test.ts. (#5777, thanks @jleonar2) -
providers (grok-cli token auto-refresh): grok-cli OAuth tokens were never proactively refreshed before their real expiry. mapTokens hardcoded expiresIn: 21600 (6 h) regardless of the token's actual lifetime, so the persisted expiresAt was always "now + 6 h" and the proactive tokenHealthCheck sweep (refresh when expiresAt - now < 5 min) fired 6 h after import instead of shortly before the token really expired. mapTokens now computes expiresIn from the authoritative expires_at field in ~/.grok/auth.json (ISO → epoch-seconds) with a fallback to the JWT exp claim (payload-only decode, no signature trust); the hardcoded 21600 is kept only when neither is present. An already-expired token (real expires_at / exp in the past) is now clamped to a positive expiresIn via Math.max(1, …), so the import route stores a near-future expiresAt and AutoCombo refreshes the connection instead of reading a past date and excluding it outright. Regression guards: 5 cases in tests/unit/grok-cli-oauth.test.ts (JWT exp, JSON expires_at, the 21600 fallback, and the two expired-token clamps). (#5775, thanks @Chewji9875) -
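The expiry resolution order in the grok-cli entry (expires_at, then the JWT exp claim, then the 21600 fallback, with a clamp to at least 1) can be sketched as follows. The function names and the GrokAuthFile shape are hypothetical, and the access_token field name is an assumption; this is not the actual mapTokens code.

```typescript
// Hypothetical subset of ~/.grok/auth.json; access_token is an assumed field name.
interface GrokAuthFile {
  expires_at?: string; // ISO timestamp
  access_token?: string; // JWT
}

const FALLBACK_EXPIRES_IN = 21600; // 6 h, used only when no real expiry is available

// Payload-only decode: reads the exp claim without verifying the signature.
function decodeJwtExp(token: string): number | null {
  const payload = token.split(".")[1];
  if (!payload) return null;
  try {
    const b64 = payload.replace(/-/g, "+").replace(/_/g, "/");
    const padded = b64 + "=".repeat((4 - (b64.length % 4)) % 4);
    const claims: unknown = JSON.parse(atob(padded));
    if (typeof claims === "object" && claims !== null && "exp" in claims) {
      const exp = (claims as { exp: unknown }).exp;
      return typeof exp === "number" ? exp : null;
    }
    return null;
  } catch {
    return null;
  }
}

function computeExpiresIn(auth: GrokAuthFile, nowSec: number): number {
  let expSec: number | null = null;
  if (auth.expires_at) {
    const ms = Date.parse(auth.expires_at);
    if (!Number.isNaN(ms)) expSec = Math.floor(ms / 1000); // ISO -> epoch seconds
  }
  if (expSec === null && auth.access_token) expSec = decodeJwtExp(auth.access_token);
  if (expSec === null) return FALLBACK_EXPIRES_IN;
  return Math.max(1, expSec - nowSec); // already-expired tokens clamp to a positive value
}
```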
compression (CCR retrieve via MCP HTTP): the omniroute_ccr_retrieve MCP tool returned "CCR block not found" for blocks stored earlier in the same session when called over the MCP HTTP transports (SSE / Streamable HTTP), e.g. from OpenCode in a Docker deployment. Compression stores each block keyed by the API-key principal (String(apiKeyInfo.id)), but the tool resolved the caller via extra.authInfo.clientId (which the MCP SDK never populates for API-key auth), so it fell back to "anonymous" and the compound store-key never matched. The retrieve tool now resolves the caller's API-key id from the MCP HTTP auth context (httpAuthContext) using the same getApiKeyMetadata lookup used at storage time, so retrieval matches storage. Cross-tenant IDOR isolation is preserved: a different key resolves to a different id → miss; no key → the anonymous bucket only. Regression guard: tests/unit/compression/ccr-mcp-principal-5649.test.ts (extraction, distinct-principal isolation, fail-closed, end-to-end store→retrieve). (#5649) -
compression (context-editing telemetry): streaming responses now record Context Editing savings. Anthropic surfaces context_management.applied_edits[] on the final message_delta snapshot of an SSE stream, but the streaming reconstruction (buildStreamSummaryFromEvents → Claude branch) dropped context_management entirely and no telemetry hook was wired into the streaming finalizer, so the delegated server-side context-clear savings (cleared_input_tokens / cleared_tool_uses) surfaced under engine context-editing in compression analytics only for non-streaming responses. The collector now preserves context_management from the final snapshot (last-writer-wins), and onStreamComplete mirrors the non-streaming recordContextEditingTelemetryHook (best-effort, Claude-only, HTTP 200 only). Purely additive telemetry: no payload mutation, no new env flag, no behavior change when the stream carries no context_management. Regression guard: tests/unit/context-editing-streaming-telemetry.test.ts (3). gaps v3.8.42, T01 (5.1). -
proxy (relay test diagnostics): the Proxy Pool "Test" button showed a bare "failed" with nothing in the server logs when a relay (Vercel / Deno / Cloudflare) responded with a non-200 (e.g. a 401 from an auth-token mismatch after a STORAGE_ENCRYPTION_KEY rotation). The relay success-path response set success: false but carried no error field, so the dashboard had no reason to show and the server logged nothing. The test now returns an actionable error (the HTTP status, plus an auth/encryption-key hint on 401 / 403) and logs the failure server-side; the SOCKS5/HTTP proxy path now logs its failures too. Shaping extracted to buildRelayTestResult with a regression guard (tests/unit/proxy-relay-test-error-5716.test.ts). Note: this surfaces why a relay fails; it does not repair a genuinely broken/misconfigured relay. (#5716) -
fix(dashboard): add error boundaries for the Combos and MITM Proxy pages so a render error shows a recoverable fallback instead of a blank page. (thanks @wahyuzero)
-
providers (onboarding wizard — unsupported validation): adding a provider whose credentials have no live validator (LMArena, PiAPI, …) failed silently in the Add-Provider wizard. The /api/providers/validate endpoint returns HTTP 400 + { unsupported: true } for these (#5565/#5567), but the wizard's validateOnboardingApiKey ran it through expectOk, which threw on the non-200 — so the flow jumped to the error step and the connection was never created. The wizard now treats unsupported: true as a non-blocking "can't verify" and proceeds to save, mirroring AddApiKeyModal. Regression guard added to tests/unit/provider-onboarding-wizard.test.ts. (related to #5692) -
dashboard (Quick Start step 1): the Quick Start "Create API key" step told users to "Go to Endpoint → Registered Keys" and linked to /dashboard/endpoint, but API keys are created on the API Manager page (/dashboard/api-manager, sidebar "API Keys") — the Endpoint page has no "Registered Keys" section, so users followed the link and could not find where to create a key. Step 1 now reads "Go to API Keys" and links to /dashboard/api-manager. Regression guard: tests/unit/ui/quick-start-api-keys-link-5695.test.ts. (#5695) -
providers (DashScope/Alibaba setup link): the "Get API key" link for the Alibaba and Alibaba (China) providers pointed at the bare API host (dashscope-intl.aliyuncs.com / dashscope.aliyuncs.com), which returns 404 in a browser — API hostnames have no homepage. Repointed to the consoles where keys are actually issued: bailian.console.alibabacloud.com (international) and dashscope.console.aliyun.com (China). Same class as #5572/#5574/#5576; regression guard added to tests/unit/provider-setup-links-5572.test.ts. (#5665) -
thinking / runtime-config (module-graph fix): operator-configured proxy settings that are hydrated at boot but read per-request were silently ignored in production. Next.js compiles instrumentation.ts (boot hydration via applyRuntimeSettings / restore hooks) as a separate webpack module graph from the app-route / open-sse executors, so a module-local let _config singleton is duplicated — the boot copy is hydrated but the request path reads a different, un-hydrated copy. Live VPS validation proved the Thinking-Budget hydration ran to completion at boot yet base.ts still saw the passthrough default (this is why #5312 fix A stayed broken even after the boot-wiring fix). Fixed by backing the singletons with globalThis (the pattern systemPrompt.ts already uses for the Global System Prompt, #2470), so all module-graph copies share one instance: thinkingBudget.ts (the dashboard Thinking-Budget mode now reaches the executor), backgroundTaskDetector.ts (the opt-in background-model degradation now actually fires on requests), and systemTransforms.ts (operator pipeline overrides now reach the request path). payloadRules.ts was already safe (it lazily self-loads from the DB per request, #2986). Regression guards: tests/unit/thinking-budget-globalthis-5312.test.ts + tests/unit/runtime-config-globalthis-5312.test.ts (assert globalThis-backed sharing; a module-local let fails them). (#5312) -
thinking (Claude OAuth): restore the proxy-level Thinking-Budget config on startup. The dashboard mode (auto/custom/adaptive) is persisted under settings.thinkingBudget, but the boot-time hydration (hydrateThinkingBudgetConfig) was only wired into src/server-init.ts — an unused module that never runs in production — so the operator's choice silently reverted to the passthrough default on every restart (#5312 fix A was non-functional, even though its direct unit test passed). The hydration now runs in the real boot path (src/instrumentation-node.ts), alongside the Global System Prompt restore. Surfaced by live Anthropic-OAuth validation on the VPS. Regression guard: tests/unit/thinking-budget-boot-wiring-5312.test.ts (asserts the production boot module calls the hydration, not just the function in isolation). (#5312) -
translator/chatcore (hardening): re-apply two defensive review-fixes that were dropped in a branch rebuild before #5661 / #5662 landed. (1) mergeConsecutiveSameRoleContents (OpenAI→Gemini) now shallow-copies each entry and its parts array instead of pushing the input reference, so the consecutive-same-role merge never mutates the caller's objects. (2) defaultClaudeToolType (Claude tool defaults) now passes any non-object array entry (null / primitive) through unchanged instead of spreading it into a fabricated { type: "custom", … } tool. No behavior change on real payloads (Gemini contents are freshly built; Claude tools are always objects); both properties are now locked by regression tests in tests/unit/translator-gemini-consecutive-role-2191.test.ts and tests/unit/claude-tool-type-default-2195.test.ts. -
providers (grok-cli): truncate the tool list when it exceeds a provider's hard limit, so grok-cli (cli-chat-proxy.grok.com, max 200 tools) no longer rejects requests with "Maximum tools limit reached". Adds a proactive PROVIDER_TOOL_LIMITS map (grok-cli: 200, consulted before the reactive cache), a corrected limit-parsing regex that captures the stated maximum (200) instead of the supplied count (427), and removes the broken < MAX_TOOLS_LIMIT truncation gate so truncation now fires whenever tools.length exceeds the effective limit. Regression guard: tests/unit/tool-limit-detector.test.ts. (#5563 — thanks @Chewji9875) -
resilience (antigravity): record model lockout for Antigravity 429 rate_limit_exceeded errors. Antigravity's "Resource has been exhausted (e.g. check quota)." text was matched by overly broad QUOTA_PATTERNS and misclassified as QUOTA_EXHAUSTED, so the combo retry path was skipped (providerExhausted) and the model was never cooled down. Classification now prefers the structured error code — classifyErrorText(structuredError?.code || errorText) — so a rate_limit_exceeded code is treated as a transient rate-limit (not quota), and the two broad patterns (/resource.*exhaust/i, /check.*quota/i) were replaced with Antigravity-specific ones (individual quota reached, enable overages). (#5579 — thanks @Chewji9875) -
providers (OpenAI-compatible): Codex MCP / tool_search deferred discovery (and apply_patch) now works through a Custom OpenAI-compatible provider. When such a provider received a Responses-API-shaped request that carried MCP / tool_search tools, OmniRoute downgraded it to /chat/completions, which drops the deferred tool-discovery mechanism — so the MCP namespaces never surfaced to the model and apply_patch was mis-handled as a JSON tool. The executor now detects a Responses-shaped request (input/previous_response_id/max_output_tokens/reasoning) that carries namespace/tool_search* tools and routes it to the upstream /responses endpoint natively instead of downgrading (it can also be forced via providerSpecificData._omnirouteForceResponsesUpstream). This is a distinct code path from the official Codex OAuth backend (#3033 / #4539, which the earlier fix never touched). Regression guard: tests/unit/executor-default-base.test.ts. Thanks to @KooshaPari for the fix. (#5483) -
dashboard (routing): selecting the fusion strategy on the Global Routing defaults tab now reveals fusion-specific config instead of only the generic resilience fields. Fusion's engine knobs — judgeModel (the model that synthesizes the panel answers) and fusionTuning (minPanel/stragglerGraceMs/panelHardTimeoutMs) — already existed in the schema and the per-combo editor, but the Global Routing tab never surfaced them, so picking "fusion" there was effectively a no-op. The fields are now shown (extracted into a new FusionDefaultsFields component). Voting / aggregation-mode / per-provider-weight are intentionally not shown — those don't exist in the fusion engine. Regression guard: tests/unit/ui/combo-defaults-fusion-5598.test.tsx. (#5598) -
dashboard (free proxy pool): the free proxy pool "Sync All" no longer fails silently with Total: 0. Three fixes: (1) the IPLocate source fetched …/protocols/<proto>.json and parsed it as JSON, but the upstream list is plain text (<proto>.txt, one ip:port per line) — every protocol 404'd / failed to parse; it now fetches .txt and parses the line list. (2) The sync route isolates each source in its own try/catch, so one provider throwing (e.g. a TLS handshake failure) no longer aborts the whole sync — the working sources still populate the pool. (3) The UI now surfaces the per-source errors the route already returns, instead of discarding the response, so a partial/empty sync explains itself. Regression guards: tests/unit/free-proxy-providers.test.ts, tests/unit/proxy-pool-sync-4878.test.ts, tests/unit/free-pool-tab.test.tsx. (#5595) -
dashboard (memory engine): the memory engine status page no longer mixes English and Portuguese. The embedding / vector-store / rerank status detail strings were hardcoded in Portuguese in the backend (resolveEmbeddingSource, engineStatus), e.g. "auto: nenhuma fonte de embedding disponível" and "sqlite-vec ativo, dim=…", while the surrounding UI labels render from the English i18n bundle — so an English user saw a half-translated page. The backend detail strings are now English ("auto: no embedding source available", "sqlite-vec active, dim=…", etc.), matching the rest of the page. Regression guard: tests/unit/memory-engine-status.test.ts. (#5596) -
providers (cline): stop falsely mapping valid Cline (OAuth) responses to 502 empty_choices + account cooldown. detectMalformedNonStream only recognized choices[].message.content as a string, but some OpenAI-compatible upstreams — Cline via OAuth among them — return content as an array of Anthropic-style text blocks inside an OpenAI envelope. A non-empty response (recvBytes > 0) was therefore classified as empty_choices and turned into a 502 that also cooled the account down. The malformed-response detector now also treats a content array carrying at least one non-empty text block as real output. Regression guard: tests/unit/diagnostics.test.ts. (#5559) -
embedded services (Windows): fix CLIProxyAPI install failing instantly with spawn unzip ENOENT on Windows. The binary extractor spawned unzip, which is not a Windows system command — it only ships inside Git for Windows' usr/bin, a directory Node's spawn PATH never sees, so even users with Git installed hit the error. On Windows the extractor now uses PowerShell's built-in Expand-Archive (via execFileAsync, no shell — paths pass as a single non-interpreted arg, with ''-escaping + -LiteralPath as defense in depth); other platforms keep using unzip. This is distinct from #5379 (that was npm.cmd needing shell: true). Regression guard: tests/unit/binary-manager-extract-zip-5590.test.ts. (#5590) -
storage (daemon): fix a Node.js out-of-memory crash on startup when storage.sqlite grows large (~170 MB+). The boot-time call-log cleanup (cleanupExpiredLogs → rotateCallLogs) ran two unbounded SELECT … FROM call_logs … .all() queries — listReferencedArtifacts (every artifact path) and deleteCallLogsBefore (every id before the retention cutoff). node:sqlite's StatementSync.all() materializes the entire result set as JS objects at once, so on a large table the V8 heap blew up and the process crashed before binding (FATAL ERROR: … heap out of memory, native frame node::sqlite::StatementSync::All). Both queries now page through call_logs in bounded 5 000-row chunks (new src/lib/usage/callLogsBoundedQueries.ts), keeping peak memory flat regardless of table size — no more manual --max-old-space-size bump required. Regression guard: tests/unit/call-log-oom-unbounded-5618.test.ts. (#5618) -
dashboard (provider setup): fix three provider setup links that pointed at 404 pages. Ollama Cloud / ollama-search linked to ollama.com/settings/api-keys → corrected to ollama.com/settings/keys (the page moved; Ollama Cloud is a real keyed service, so the field stays). SearchAPI linked to the bare searchapi.io/docs (404) → searchapi.io/docs/google. You.com linked to you.com/docs/search/overview (404) → you.com/business/api/ (the developer portal). All three replacements were verified live. Regression guard: tests/unit/provider-setup-links-5572.test.ts. (#5572, #5574, #5576) -
providers (AI/ML API): the model-import step now loads the live AI/ML API catalog (400+ models) instead of falling back to a stale 6-model seed. The registry had no modelsUrl, so the route silently used the bundled catalog with an "API unavailable — using local catalog" warning even when the key was valid. AI/ML API exposes its full catalog at the public, auth-free https://api.aimlapi.com/models endpoint (a bare array of { id, type, info }, distinct from the OpenAI-compat /v1/models); it's now wired into the models route's discovery config, with the bundled catalog kept as the offline fallback. Regression guard: tests/unit/provider-models-route.test.ts. (#5570) -
providers (CablyAI): mark CablyAI deprecated — cablyai.com no longer resolves (DNS NXDOMAIN, verified 2026-06-30); the domain is gone. The provider is removed from the models-route discovery config so the import step returns a clean error instead of an unhandled 500 crash (the dead-domain fetch threw with no local-catalog fallback), and the registry entry now carries deprecated: true / riskNoticeVariant: "deprecated" so the dashboard flags existing connections (same treatment as the shut-down glhf / kluster.ai gateways). Regression guard: tests/unit/provider-models-route.test.ts. (#5568) -
dashboard (provider add): non-LLM search/agent providers no longer fail the model-import step with a red "Provider <id> does not support models listing". Jules (Google Labs coding agent), linkup-search (Linkup web search), ollama-search (Ollama Cloud web search — distinct from the local Ollama LLM), and searchapi-search (SearchAPI SERP) have no /v1/models endpoint, so the import surfaced a failure for expected behavior. Each now ships a small static catalog of its selectable capability ids — Linkup's fast/standard/deep search depths, SearchAPI's google/bing/youtube/… engines, a single Jules/Ollama-web-search entry — so the import step returns a usable list (source: local_catalog) instead of an error. Regression guard: tests/unit/provider-models-route.test.ts. (#5569, #5571, #5573, #5575) -
dashboard (provider add): providers without a live key/cookie validator (e.g. LMArena (Free), PiAPI) can now be saved. The Add-connection modal treated the backend's "Provider validation not supported" response as a hard Invalid state and blocked Save entirely, leaving those providers impossible to add. The validate route now returns unsupported: true alongside the message, and the modal treats that as a non-blocking warning — the "Check" badge still shows "validation not supported" (informational), but Save persists the credential as-is. Regression guards: tests/unit/ui/add-api-key-modal-unsupported-save-5565.test.tsx (Save proceeds) and tests/unit/providers-validate-route.test.ts (wire-format). (#5565, #5567) -
providers (codex): fix the Codex Responses WebSocket path (/v1/responses), which regressed in v3.8.40 with a client-visible "Invalid JSON body" and bypassed the configured proxy. (1) #5591 — PR #5237 bumped the impersonation TLS profile to chrome_149, but wreq-js@2.3.1 only supports up to chrome_147; the unknown profile produced a degenerate fingerprint and ChatGPT rejected the upstream upgrade. The Codex WS path is reverted to the proven chrome_142 (the v3.8.39 value), and the over-bumped grok-web / claude-web profiles (masked by their circuit-breaker but silently dropping TLS impersonation) are restored to chrome_146. A new regression guard asserts every configured chrome_* profile exists in the installed wreq-js typings (tests/unit/tls-profiles-valid-5591.test.mjs). (2) #5611 — the upstream wreq-js.websocket() connect ignored the Proxy Registry, so a no-direct-egress Docker container failed with a DNS error; the prepare route now resolves the Global/provider proxy and threads it through to the WS connect. Regression guard in tests/unit/responses-ws-proxy.test.mjs. (#5591, #5611) -
providers (GLM): GLM 5.1 / 5.2 now keep the system role instead of having the system prompt folded into the first user turn. roleNormalizer.ts matched every glm* id with a blanket startsWith("glm") / startsWith("glm-") prefix, so the next-generation models — which z.ai documents as supporting the system role (GLM > 5.0) — were normalized as if they rejected it, degrading instruction-following. The matcher is now version-aware: it strips the system role only for bare glm, the 4.x family, and the 5.0 generation, and preserves it for glm-5.1 / glm-5.2 (and the Fireworks glm-5p1 point alias). The ZenMux vendor-prefixed z-ai/glm-* compressed-history rule and the ERNIE rule are unchanged. Regression guards in tests/unit/role-normalizer.test.ts. (#5610) -
Security hardening follow-ups (v3.8.15): the auth_token cookie now sets an explicit 30-day maxAge so sessions persist as intended (Seg3); the management bootstrap warns at boot when INITIAL_PASSWORD is left at the insecure CHANGEME default (Seg2); VS Code path-token endpoints (/api/v1/vscode/raw/[token]) emit a once-per-process security warning since the API key travels in the URL and can leak via logs/proxies (Seg4); the system version route resolves the real global install path via npm root -g instead of a hardcoded /app (Bug3); and auto-update mode detection segment-matches node_modules instead of substring-matching, eliminating false "global install" positives (Bug1). -
fix(cli): rename the Node process title to omniroute so it shows correctly in ps/htop. (thanks @waguriagentic) -
dashboard (model picker): guard against null model-alias values so opening Create Combo for a custom provider node no longer crashes. ModelSelectModal's custom-provider branch filtered modelAliases entries with a raw fullModel.startsWith(...), which threw a TypeError whenever an alias value was null / undefined (a stale/partial entry persisted to settings). The filter/map logic is extracted into a new buildNodeAliasModels helper (mirroring the sibling passthrough-alias guard, #485) that requires typeof fullModel === "string" before calling .startsWith. Regression guard: tests/unit/model-select-null-alias-guard-2247.test.ts. (thanks @wahyuzero) -
fix(translator): strip orphaned tool results (results with no matching tool call) across request formats to avoid upstream 400s. (thanks @warelik)
-
fix(kiro): stop injecting a placeholder user turn on trailing tool-result turns so agentic loops aren't disrupted. (thanks @jetmiky)
-
fix(translator): prevent doubled tool arguments in OpenAI-to-Claude responses (duplicate finish_reason guard + string tool-input passthrough). (thanks @vishalrajv)
-
codex (agent goal streams): protect long-running agent goal streams so extended agent runs are no longer cut off prematurely. (#5772 — thanks @nguyenxvotanminh3)
-
sse (zero-width markers): strip zero-width markers from streamed responses, matching the non-streaming path so streamed output is byte-clean too. (#5857 — thanks @DKotsyuba)
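A minimal TypeScript sketch of what stripping zero-width markers from a streamed chunk can look like. The helper name `stripZeroWidth` and the set of code points are assumptions made for illustration; the changelog does not state which markers OmniRoute removes or where the filter runs.

```typescript
// Hypothetical helper: the name and the code-point set are assumptions,
// not OmniRoute's actual implementation.
// U+200B zero-width space, U+200C zero-width non-joiner,
// U+200D zero-width joiner, U+2060 word joiner, U+FEFF byte-order mark.
const ZERO_WIDTH = /[\u200B\u200C\u200D\u2060\uFEFF]/g;

// Remove every zero-width code point from one text chunk.
function stripZeroWidth(chunk: string): string {
  return chunk.replace(ZERO_WIDTH, "");
}
```

Applying the same function to each streamed delta and to the full non-streaming body is one way to keep both paths identical. Note that removing U+200D also splits emoji joiner sequences, so a real filter may want a narrower set.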
-
usage (om-usage endpoint): restore the om-usage HTTP endpoint. (#5859 — thanks @Witroch4) -
sse (stream readiness): tune adaptive stream-readiness timeouts so slow-first-token upstreams are handled more reliably. (#5767 — thanks @nguyenxvotanminh3)
-
security (provider node URL): harden provider node URL validation. (#5760 — thanks @nguyenxvotanminh3)
-
cli (Windows doctor): correct rootDir resolution in doctor.mjs on Windows. (#5845 — thanks @arssnndr) -
providers (Antigravity): fix a 429 hang on credit exhaustion and apply a precise reset-time model lockout instead of stalling — cleaned re-implementation of #5823. (#5846 — thanks @Chewji9875 / @diegosouzapw)
-
providers (qwen-web): unblock the validator and chat completion — the retired endpoint is replaced and the missing SPA version header is now sent. (#5855 — thanks @janeza2)
-
providers (kimi-web): migrate to the www.kimi.com Connect-RPC API after kimi.moonshot.cn was retired. (#5858 — thanks @janeza2) -
dashboard (CSRF): unify the dashboard CSRF origin fallback so dynamic/public origins validate correctly. (#5856 — thanks @rdself)
-
db (health check interval): preserve healthCheckInterval=0 across connection create/update instead of coercing it to a default. (#5822 — thanks @atomlong) -
sse (claude→codex streaming): stop the reasoning-summary drop and duplicated deltas on claude→codex streaming — reasoning snapshots are now synthesized in TRANSLATE mode and the sequence-number watermark is tracked per-stream (#5786). (#5832 — thanks @diegosouzapw)
-
deps (runtime): add the missing runtime dependencies @toon-format/toon and safe-regex so the published package resolves them at runtime. (#5771 — thanks @chirag127) -
system (Windows auto-update): route in-app auto-update npm calls through the win32 shell helper so updates run correctly on Windows (#5542). (#5797 — thanks @diegosouzapw) -
dashboard (validation badge): show a neutral badge for unsupported validation and make OAuth error messages clickable links (#5442, #5486). (#5795 — thanks @diegosouzapw)
-
providers (metadata): correct stale/broken provider metadata (#5487, #5461, #5534, #5470). (#5790 — thanks @diegosouzapw)
-
providers (local-catalog imports): import intentional local-catalog-only providers instead of surfacing a 502 (#5460, #5465). (#5787 — thanks @diegosouzapw)
-
proxyfetch (failover): skip the failover retry for non-replayable request bodies so a consumed stream isn't re-sent empty. (#5770 — thanks @Ardem2025)
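The replayability check behind this fix can be sketched as follows. This is an illustrative sketch only: `isReplayableBody` and `shouldRetryOnFailover` are hypothetical names, and the rule that strings and byte buffers are replayable while everything else is treated as one-shot is an assumption, not the project's actual logic.

```typescript
// Hypothetical sketch: function names and the classification rule are
// assumptions for illustration, not OmniRoute's API.

// A body can be re-sent only if reading it once does not consume it.
function isReplayableBody(body: unknown): boolean {
  if (body === undefined || body === null) return true; // nothing to replay
  if (typeof body === "string") return true; // immutable text
  if (body instanceof ArrayBuffer || ArrayBuffer.isView(body)) return true; // in-memory bytes
  // Anything else (for example a stream exposing getReader) is treated as
  // one-shot: after the first attempt it may already be drained.
  return false;
}

// Retry on another route only when attempts remain and the body can be re-sent.
function shouldRetryOnFailover(body: unknown, attemptsLeft: number): boolean {
  return attemptsLeft > 0 && isReplayableBody(body);
}
```

Treating unknown body types as non-replayable errs on the side of surfacing the first failure instead of silently re-sending an empty request.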
-
batch (recovery): persist batch item checkpoints during recovery so an interrupted batch resumes from where it left off. (#5753 — thanks @ag-linden)
-
memory (Qdrant): enabling Qdrant now activates it as the retrieval engine (the auto default never selected it) and adds inline guidance (#5597). (#5741 — thanks @diegosouzapw) -
chat (non-streaming aggregation): harden non-streaming SSE aggregation against malformed upstream event sequences. (#5746 — thanks @rdself)
-
sse (cooldown parsing): the anti-thundering-herd guard now tolerates numeric-epoch cooldown values. (#5747 — thanks @diegosouzapw)
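One way to tolerate both date strings and numeric epochs when reading a cooldown value is sketched below. The function names and the seconds-versus-milliseconds threshold are assumptions for illustration; the changelog only states that numeric-epoch values are now accepted.

```typescript
// Hypothetical parser: names and the 1e12 threshold are assumptions,
// not OmniRoute's actual implementation.

// Assumption: values below 1e12 are epoch seconds, larger values are epoch milliseconds.
function toMillis(epoch: number): number {
  return epoch < 1e12 ? Math.round(epoch * 1000) : Math.round(epoch);
}

// Returns the cooldown deadline in epoch milliseconds, or null if unparseable.
function parseCooldownUntil(value: unknown): number | null {
  if (typeof value === "number") {
    return Number.isFinite(value) && value > 0 ? toMillis(value) : null;
  }
  if (typeof value === "string") {
    const trimmed = value.trim();
    // A purely numeric string is handled like a number.
    if (/^\d+(\.\d+)?$/.test(trimmed)) return parseCooldownUntil(Number(trimmed));
    const parsed = Date.parse(trimmed);
    return Number.isNaN(parsed) ? null : parsed;
  }
  return null;
}
```

Returning null for anything unparseable lets the caller fall back to "no cooldown known" instead of throwing inside the request path.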
-
api (body size): raise the LLM API payload limit for the responses routes so larger requests aren't rejected. (#5652 — thanks @JxnLexn)
-
providers (HuggingChat): fix HuggingChat web-session routing (#5592). (#5592 — thanks @backryun)
-
sse (heap pressure): bound the chat hot-path heap — pressure-aware admission, response cap, and clone reductions — to avoid OOM under load (#5152). (#5425 — thanks @josevictorferreira)
-
providers (M365 Copilot): validate M365 Copilot web credentials. (#5432 — thanks @skyzea1)
-
providers (chatgpt-web): restore the dot-form Pro model ids. (#5549 — thanks @Thinkscape)
-
security (error stacks): avoid rendering error stacks in responses. (#5624 — thanks @KooshaPari)
-
security (linkify): restrict linkifyText hrefs to an explicit http(s) scheme allowlist. (#948d2d7 — thanks @diegosouzapw) -
translator (doubled tool args): prevent doubled tool-call arguments in the OpenAI→Claude translation path. (#5828 — thanks @diegosouzapw)
-
translator (orphaned tool results): strip orphaned tool-result turns across request formats so an upstream doesn't reject a tool result with no matching call. (#5805 — thanks @diegosouzapw)
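The idea of dropping tool results that have no matching call can be sketched on a simplified OpenAI-style message list. The message shapes and the name `stripOrphanedToolResults` are hypothetical; the real translator works across several request formats that are not shown here.

```typescript
// Hypothetical, simplified OpenAI-style message shapes for illustration only.
type ToolCall = { id: string };
type Message =
  | { role: "system" | "user"; content: string }
  | { role: "assistant"; content: string | null; tool_calls?: ToolCall[] }
  | { role: "tool"; tool_call_id: string; content: string };

// Keep a tool result only if an earlier assistant turn issued a call with the same id.
function stripOrphanedToolResults(messages: Message[]): Message[] {
  const seenCallIds = new Set<string>();
  const kept: Message[] = [];
  for (const msg of messages) {
    if (msg.role === "assistant") {
      for (const call of msg.tool_calls ?? []) seenCallIds.add(call.id);
    }
    // A tool result whose id was never issued is an orphan: skip it.
    if (msg.role === "tool" && !seenCallIds.has(msg.tool_call_id)) continue;
    kept.push(msg);
  }
  return kept;
}
```

The function returns a new array and leaves the input untouched, so the caller's conversation history is not mutated.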
-
translator (Gemini/Claude hardening): re-apply lost defensive hardening for the Gemini merge path and Claude tool defaults. (#5706 — thanks @diegosouzapw)
-
kiro (tool-result turns): stop injecting a placeholder user turn on tool-result turns, which corrupted otherwise-valid Kiro conversations. (#5807 — thanks @diegosouzapw)
-
providers (Kiro catalog): add claude-sonnet-5 to the Kiro model catalog. (#5796 — thanks @diegosouzapw) -
oauth (connection disambiguation): disambiguate OAuth connections on username so two different identity providers no longer overwrite each other. (#5803 — thanks @diegosouzapw)
-
github (Copilot prefill): drop the trailing assistant prefill for Copilot chat, which some Copilot models rejected. (#5802 — thanks @diegosouzapw)
-
mitm (hosts cleanup): clean up privileged /etc/hosts entries on exit when possible so a crashed/interrupted run doesn't leave stale redirects behind. (#5808 — thanks @diegosouzapw) -
dashboard (model picker): guard null modelAliases values in the model picker so a connection with no aliases no longer throws. (#5792 — thanks @diegosouzapw) -
dashboard (error boundaries): add error boundaries for the Combos and MITM Proxy pages so a render error no longer blanks the whole dashboard. (#5788 — thanks @diegosouzapw)
-
cli (process title): rename the running process title to omniroute. (#5791 — thanks @diegosouzapw) -
compression (context-editing telemetry): record Context Editing telemetry on the streaming path, not just the non-streaming path. (#5761 — thanks @diegosouzapw)
-
security (v3.8.15 hardening follow-ups): land the Seg2/Seg3/Seg4/Bug3 hardening follow-ups from the v3.8.15 security review. (#5512 — thanks @diegosouzapw)
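For readers unfamiliar with the globalThis-backed singleton pattern used by the module-graph fix in this list (#5312), here is a minimal TypeScript sketch. The config shape, the storage key, and the function names are hypothetical; the real thinkingBudget.ts, backgroundTaskDetector.ts, and systemTransforms.ts modules are not reproduced in this changelog.

```typescript
// Hypothetical config shape and storage key, chosen for illustration only.
type ThinkingBudgetConfig = { mode: string };

const STORE_KEY = "__example_thinking_budget_config__";
const DEFAULT_CONFIG: ThinkingBudgetConfig = { mode: "passthrough" };

// View globalThis as a key/value bag. Every bundled copy of this module sees
// the same globalThis, so they all read and write the same slot.
function sharedStore(): Record<string, unknown> {
  return globalThis as unknown as Record<string, unknown>;
}

// Called once at boot (hydration).
function setThinkingBudgetConfig(config: ThinkingBudgetConfig): void {
  sharedStore()[STORE_KEY] = config;
}

// Called per request; falls back to the default when nothing was hydrated.
function getThinkingBudgetConfig(): ThinkingBudgetConfig {
  const current = sharedStore()[STORE_KEY] as ThinkingBudgetConfig | undefined;
  return current ?? DEFAULT_CONFIG;
}
```

A module-local `let` would give each bundled copy of the module its own variable, which is the duplication the entry describes; keying the value on globalThis removes that dependence on how the bundler splits module graphs.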
📝 Maintenance
-
docs (architecture): add docs/architecture/ROUTER_BACKENDS.md — an ADR pinning down how the routing engines (tsnative, bifrost, cliproxy, 9router, VibeProxy-compatible) relate to each other along two orthogonal axes (lifecycle: in-process / supervised / external vs. relay selection backend), answering the architecture questions raised in #5603 (backend interface model, why CLIProxy spawns a process, feature-flag swapping, actionable route-contract errors). The typed router-backend registry the ADR describes lands separately via #5868. (#5891) -
tests (autoCombo): stabilize the "getTaskFitnessWithSource identifies fitness_table as source for known models" unit test, which flaked whenever the models.dev capabilities DB was populated in CI: the fixture model gpt-4o is a real models.dev catalog id, so the fitness resolution chain returned models_dev_tier instead of the expected static fitness_table source. The fixture now uses claude-sonnet (a shortened alias absent from the models.dev catalog, matching the sibling resolution-chain test), which deterministically falls through to the static table — the exact source and score assertions are preserved (0.95 = FITNESS_TABLE.coding["claude-sonnet"]). (#5890) — thanks @KooshaPari -
oauth (dead-code removal): delete the superseded legacy OAuth service-class hierarchy under src/lib/oauth/services/. The live OAuth flow runs through src/lib/oauth/providers.ts + src/lib/oauth/providers/ (wired into the generic oauth/[provider]/[action] route); the old per-provider class *Service extends OAuthService implementations plus their barrel had zero production or test references. Removed oauth.ts (base class), openai.ts, github.ts, claude.ts, codex.ts, antigravity.ts, qwen.ts, qoder.ts, and the index.ts barrel (−1559 LOC). Kept the three still-live files that routes import directly by path: kiro.ts (Kiro import/exchange routes), cursor.ts (Cursor import route), and codexImport.ts (utility fns for the Codex bulk-import route). Proven safe by typecheck:core staying green (any live reference would fail the build) + a filesystem guard tests/unit/oauth-legacy-services-removed.test.ts pinning the removal against re-introduction. Salvage of the closed PR #5039. gaps v3.8.42 — T10 (5.7). -
refactor (god-file decomposition): extracted pure leaf modules across db, sse, usage, api, memory, evals, models, resilience, and dashboard god-files (types/mappers/helpers/pure-transform leaves; behavior-preserving, test-guarded): db/providers, db/proxies, db/models, db/settings, usageAnalytics, migrationRunner (#5714, #5717, #5705, #5709, #5722, #5721); sse openai-to-gemini / cursor-protobuf / rate-limit-headers / reasoning-tag (#5824, #5794, #5736, #5734); usage families / callLogs / usageHistory / providerLimits (#5782, #5725, #5728, #5730); api provider-models discovery / unified-catalog (#5758, #5699); memory retrieval scoring (#5733); evals golden-set suites (#5740); modelsDevSync transform layer (#5743); resilience settings split (#5745); dashboard sidebarVisibility split (#5683); executor shared-utility dedup + tests (#5720 — thanks @pizzav-xyz). — thanks @diegosouzapw
-
chore (Bun script runner): adopt Bun 1.3.10 as a locked, allow-listed build/dev script runner for a small set of validated TS gate/generator scripts (Node stays the published runtime): locked runtime dependency, CI script-checks + validated-scripts run under Bun, and a bun-safe pack validator. (#5615, #5617, #5612, #5643 — thanks @KooshaPari; docs #5703 — thanks @diegosouzapw) -
docs (sync & housekeeping): i18n CHANGELOG mirror sync for the [3.8.43] section (#5789); MCP tool count synced to 95 + routing-strategy count (#5732); README faster/leaner install notes, refreshed metrics/badges, 17-strategy + Quota-Share listing, provider counts, and grammar fixes (#5713, #5738 — thanks @chirag127); security docs for banned-keyword/account-ban detection (#5756) and the full LOCAL_ONLY route set + GHSA advisory + audit path (#5748); relay backend-routing contract clarification (#5621 — thanks @KooshaPari); release-freeze scoped to /generate-release only (#5839); .editorconfig repository standards (#5879 — thanks @shiva24082). — thanks @diegosouzapw -
test/ci (stabilization & ratchets): guard the tsx/esm→esbuild boot transform (#5773); align t3-web web-session metadata (#5835); repoint the sidebar quota-share placement scan (#5711); lightweight health probe for batch e2e (#5651 — thanks @KooshaPari); make release-green pre-flight gates visible + bounded (#5644); stabilize nightly-mutation (tap.testFiles drift guard + anti-flake eps) (#5682); close the QG v2 tail (#5681); normalize check route paths on Windows (#5613 — thanks @KooshaPari); pass sonar.projectVersion to the SonarQube scan (#5880); plus stryker tap.testFiles registration, compression-studio smoke re-anchoring, rtk_discover de-flake, and v3.8.43-cycle ratchet rebaselines (deadExports 225→227, complexity 1981→1982, cognitive-complexity 842→845, eslintWarnings 4121→4158→4199). — thanks @diegosouzapw -
refactor (oauth): remove dead legacy OAuth service classes. (#5838 — thanks @diegosouzapw)
🙌 Contributors
Thanks to everyone whose work landed in v3.8.43:
| Contributor | PRs / Issues |
|---|---|
| @ag-linden | #5753 |
| @Ardem2025 | #5770 |
| @arssnndr | #5845 |
| @atomlong | #5822 |
| @backryun | #5592 |
| @baslr | direct commit / report |
| @Chewji9875 | #5563, #5579, #5846 |
| @chirag127 | #5738, #5771 |
| @DKotsyuba | #5857 |
| @hartmark | #5834 |
| @ishatiwari21 | #5799 |
| @janeza2 | #5855, #5858 |
| @jetmiky | direct commit / report |
| @josevictorferreira | #5425 |
| @JxnLexn | #5652 |
| @KooshaPari | #5613, #5621, #5624, #5629, #5643, #5651, #5890 |
| @KunN-21 | direct commit / report |
| @manhdzzz | direct commit / report |
| @nguyenxvotanminh3 | #5760, #5767, #5772 |
| @noir017 | direct commit / report |
| @pizzav-xyz | #5720 |
| @rdself | #5746, #5856 |
| @shiva24082 | #5879 |
| @skyzea1 | #5432, #5701 |
| @Stazyu | #5557 |
| @Thinkscape | #5549 |
| @vishalrajv | direct commit / report |
| @voravitl | direct commit / report |
| @waguriagentic | direct commit / report |
| @wahyuzero | direct commit / report |
| @warelik | direct commit / report |
| @Witroch4 | #5731, #5859, #5863 |
| @diegosouzapw | maintainer — cycle reconciliation, release-close base-red fixes, god-file decomposition, compression/memory features |
What's Changed
- Correct grammatical errors in the README file. by @chirag127 in #5738
- Release v3.8.43 by @diegosouzapw in #5609
Full Changelog: v3.8.42...v3.8.43