github diegosouzapw/OmniRoute v3.8.43

[3.8.43] — 2026-07-02

✨ New Features

  • usage (quota percentages + provider USD drilldown): @@om-usage and the HTTP usage endpoint now report personal API-key quotas as remaining percentages (USD amounts stay out of the command output), provider quota remaining is scaled by the configured quota cutoff so the protected reserve reads as 0% left, and the quota dashboard regains a provider USD cost drilldown (/api/usage/provider-window-costs + ProviderUsdCostModal, management-auth gated). Also honors observed provider quota resets: a same-resetAt reset (usage dropping back to the reset floor) is detected and preferred over stale recorded weekly events for provider USD windows and API-key USD quotas. New src/lib/usage/providerWindowCosts.ts. Regression guards: tests/unit/provider-window-costs.test.ts, tests/unit/internal-usage-command.test.ts, tests/unit/api-key-usage-limits.test.ts, tests/unit/lib/quota-reset-events.test.ts. Extracted from #5863 by @Witroch4.

  • dashboard (live WS behind reverse proxy): the live dashboard WebSocket can now be fronted by a reverse proxy or Cloudflare Tunnel via NEXT_PUBLIC_LIVE_WS_PUBLIC_URL (e.g. wss://ws.my-ai.com/live-ws). The URL is honored both at build time (env inlined into the bundle) and at runtime for prebuilt Docker/npm images: the /api/v1/ws?handshake=1 handshake now echoes a lazily-read live.publicUrl (only ws:// or wss:// values are accepted; anything else resolves to null), and useLiveDashboard resolves the URL from that handshake before connecting, falling back to the previous ws(s)://hostname:20129 default. Also documents LIVE_WS_ALLOWED_HOSTS and aligns the GitLab Duo OAuth scopes line in .env.example with the live config (ai_features read_user). Regression guard: tests/unit/live-ws-public-url.test.ts (5). (#5877 by @ianriizky)

  • providers (CLI profile auto-sync): opt-in toggles to auto-regenerate CLI tool profiles after a provider model sync. When enabled, a model-catalog change (re)writes that tool's profile files from the live catalog — Codex (~/.codex/*.config.toml) and now Claude Code (~/.claude/profiles/<name>/settings.json, via an extracted syncClaudeProfilesFromModels + a new claudeProfileAutoSync.ts mirroring the Codex path). Both are off by default and never touch the active/default CLI config; they are backed by the OMNIROUTE_AUTO_SYNC_CODEX_PROFILES / OMNIROUTE_AUTO_SYNC_CLAUDE_PROFILES feature flags (DB/dashboard override > env > default "false") and additionally gated behind the existing CLI_ALLOW_CONFIG_WRITES write-guard. A "CLI profile auto-sync" card on the CLI Code dashboard toggles each (moved from the providers dashboard in #5778 — thanks @rdself). Regression guards: tests/unit/claude-profile-auto-sync-gate.test.ts, tests/unit/codex-profile-auto-sync-gate.test.ts, tests/unit/cli/setup-claude.test.ts (follow-up to #5737).

  • cli (startup banner): the serve startup banner now prints the running OmniRoute version (v3.8.x) beneath the ASCII logo, so the active version is visible at a glance without a separate --version call. Regression guard: tests/unit/cli-serve-version-banner.test.ts. Thanks @chirag127 (#5752).

  • analytics (subscription cost): flat-rate providers now show $0 in cost analytics instead of an inflated per-token estimate. Subscription / coding-plan providers (every cookie-web provider — ChatGPT Web, grok-web, … — plus the dedicated Minimax Coding, Kimi Coding, GLM Coding, Alibaba Coding Plan, and Xiaomi MiMo plans) bill a flat fee, not per token, yet still carry per-token pricing rows used for estimates — so the analytics dashboard over-reported their cost. A new flat-rate classifier (src/lib/usage/flatRateProviders.ts) is consulted by the analytics surfaces (analytics route, usage stats, usage analytics) via an opt-in flatRateAsZero cost option, so those providers read $0 while budget / quota / routing keep estimating unchanged. Deliberately NOT zeroed: codex/cx (OmniRoute actively tracks Codex token cost — Fast-tier multipliers, GPT-5.x pricing — and Codex can be a metered account), byteplus (metered ModelArk), minimax-cn (metered China API). Regression guard: tests/unit/flat-rate-cost-5552.test.ts. (#5552)

  • mcp (RTK): expose the RTK tool-output learn/discover workflow as two new MCP tools so an agent can grow the RTK filter catalog without leaving the protocol. omniroute_rtk_discover analyzes recently captured raw tool output (discoverRepeatedNoise / suggestFilter) and returns candidate noise patterns plus a suggested filter; omniroute_rtk_learn lists the captured command samples (listRtkCommandSamples) and resolves a command to its RTK filter id (commandToId). Both are read-only (scope read:compression), wrap the existing RTK discovery primitives (no new logic in the engine), and log to the MCP audit trail. Regression guard: tests/unit/compression/rtk-mcp-tools.test.ts (4). gaps v3.8.42 — T07.

  • compression (LLM tier): add an opt-in, default-off LLM-tier compression engine (llm) that condenses the prose of non-system messages via a pluggable chat-completion backend. It mirrors the llmlingua engine's contract but is safe by construction: the default backend is a no-op pass-through (the engine never mutates the payload until an operator both enables it and wires a real backend via setLlmCompressorBackend()), it is not part of the default stacked pipeline, enabled defaults to false, fenced code blocks and system messages are never sent to the model, and every backend error fails open (the original segment/body is kept, never thrown). A minTokens floor skips small prompts. The real production backend is intentionally a VPS-validated follow-up (Hard Rule #18), exactly as the llmlingua worker backend is gated. New open-sse/services/compression/engines/llm/index.ts. Regression guard: tests/unit/compression/llm-compressor-engine.test.ts (8). gaps v3.8.42 — T05/C3.

  • memory (typed decay): add opt-in typed memory decay (TV6) so the conversational memory store stops accumulating stale episodic noise. Each injected memory now tracks an access_count + last_accessed_at (always-on, non-destructive telemetry; migration 111_memory_typed_decay), and an opt-in, default-off sweep (MEMORY_TYPED_DECAY_ENABLED, default false) deletes memories that are past a per-type TTL and not immune. Only episodic decays by default (30d, env-tunable); factual/procedural/semantic are immune, and any memory accessed >= 3 times earns access immunity (mirroring "guardrail/convention/decision never decay"). The decay clock re-bases on the last access, so used memories survive. Deletions reuse deleteMemory (SQLite + sqlite-vec + Qdrant stay in sync) and fail open; an optional periodic sweep is doubly opt-in (also needs MEMORY_TYPED_DECAY_SWEEP_INTERVAL>0). With the flag off nothing is ever deleted (Rule #20 spirit). New src/lib/memory/typedDecay.ts. Regression guard: tests/unit/memory/typed-decay.test.ts (15). gaps v3.8.42 — T10/TV6.

  • dashboard (combos): the named-combos editor now lets you drag to reorder the stacked-compression pipeline instead of only editing fixed-position steps. A new pure model (src/shared/components/compression/compressionPipelineModel.ts) owns add/remove/move/update with the engine→intensity invariant and a never-empty guarantee, and a @dnd-kit/sortable editor (CompressionPipelineEditor.tsx, matching the sidebar reorder pattern) replaces the inline list in CompressionCombosPageClient. Order persists through the existing combos endpoint. Regression guards: tests/unit/compression-pipeline-model.test.ts (11) + tests/unit/ui/compression-pipeline-editor.test.tsx (4). A dedicated tests/e2e/compression-studio.spec.ts (Tela A render + tab switch) closes the studios e2e gap the combo-live spec did not cover. gaps v3.8.42 — T06 + T03.

  • compression (pipeline): add an opt-in, default-off per-engine circuit-breaker to the stacked compression pipeline (T02). When an engine throws repeatedly across requests, its breaker opens and the stacked loops skip that engine (keeping the body verbatim for that step — fail-open) for a cooldown, then probe once (lazy half-open); success closes it, a failed probe re-opens it. This is distinct from the provider circuit-breaker (src/shared/utils/circuitBreaker.ts, provider-scoped + DB-persisted) — the new pipelineEngineBreaker.ts is engine-scoped, process-local, and adds zero DB/IO on the hot path. It composes with the existing per-request TV1 bail-out (which skips within a single request); the breaker adds cross-request memory. Default off (COMPRESSION_PIPELINE_BREAKER_ENABLED=false) → byte-identical to the pre-breaker pipeline (a throwing engine still propagates unless TV1 is separately enabled). Configurable per-call, per-CompressionConfig, or via env (_THRESHOLD/_COOLDOWN_MS). Regression guard: tests/unit/compression/pipeline-circuit-breaker.test.ts (9, incl. a throwing-engine integration); existing strategySelector/bail-out suites stay green. gaps v3.8.42 — T02 (2.2).

  • compression (CCR): the CCR retrieval-feedback (H8) is now graduated instead of a binary cliff. Previously a block retrieved >= 3 times was flagged do-not-compress and everything below that stayed fully compressible. Now each prior retrieval raises a block's effective minChars linearly (effectiveMinChars), so frequently-retrieved content is compressed progressively less; the >= 3 exclusion is preserved (as Infinity). The ramp is controlled by a retrievalRampFactor (default 2, per-combo config or COMPRESSION_CCR_RETRIEVAL_RAMP_FACTOR); 1 reproduces the exact legacy binary behavior. Per-(principal, hash) isolation is unchanged. Regression guard: tests/unit/compression/ccr-retrieval-ramp.test.ts (12); existing CCR suites (51) stay green. gaps v3.8.42 — T08/H8.

  • compression (cache-aware): add an opt-in, default-off usage-observed prefix freeze (H5). The cache-aware guard previously preserved the system prompt only for providers a static heuristic recognized as caching. It now also learns which system prompts actually recur: once a system prompt has been observed >= a threshold across requests, it is treated as a stable cacheable prefix and preserved from compression even for providers the static check misses — recovering prompt-cache hits that a prefix-compressing mode would otherwise bust. Content-addressed by a hash of the system prompt (OpenAI / Claude / Gemini shapes), in-memory + bounded, zero DB/IO; a "freeze" only preserves the prefix, so it never mutates a payload. Default OFF (COMPRESSION_PREFIX_FREEZE_ENABLED, threshold _THRESHOLD); respects the never preserve-mode (never freezes). New open-sse/services/compression/prefixFreeze.ts, wired into resolveCacheAwareConfig. Regression guard: tests/unit/compression/prefix-freeze.test.ts (10); 44 existing cache-aware / preserve-mode tests stay green. gaps v3.8.42 — T08/H5.

  • compression (read-lifecycle): add a new opt-in, default-off read-lifecycle engine (H7) that collapses stale/superseded file-Read tool results. In agentic conversations the same file is Read repeatedly; an earlier Read becomes stale once the same path is re-read (superseded by a newer view) or modified by a later Write/Edit. The engine replaces those earlier Read results with a short stub — keeping only the current (last, un-superseded) Read intact — recovering the tokens the model no longer needs. Unlike session-dedup (identical-content) or ccr (reversible markers), this is semantic + lossy, so it is opt-in (enabled defaults false). Conservative by construction: matches only well-known Read/Write tool names, compares exact paths, collapses a Read only when a strictly-later invocation touches the same path, and fail-opens on any unexpected shape. Supports both the Anthropic (tool_use/tool_result) and OpenAI (tool_calls + role:"tool") shapes. New open-sse/services/compression/engines/readLifecycle/index.ts. Regression guard: tests/unit/compression/read-lifecycle.test.ts (10). gaps v3.8.42 — T08/H7.

  • observability (correlation IDs): requests now carry a correlation id threaded through logs so a single request can be traced end-to-end across the pipeline. (#5834 — thanks @hartmark)

  • cli (startup banner — boot time): the serve ready banner now shows how long startup took, so slow-boot conditions are visible at a glance. (#5799 — thanks @ishatiwari21)

  • api (quota-policy bypass scope): add an opt-in API-key provider quota-policy bypass scope, so a designated key can be exempted from provider quota enforcement without disabling quotas globally. (#5731 — thanks @Witroch4)

  • providers (Ollama local): add a first-class Ollama local-provider card to the providers dashboard so the local LLM runtime can be configured like any other provider. (#5712 — thanks @diegosouzapw)

  • codex (fallback profiles): generate fallback CLI profiles for Codex-compatible models so compatible models get a usable profile automatically. (#5701 — thanks @skyzea1)

  • api (response-body validation + failover): add a configurable response-body validation step that can fail a target over to the next candidate when the upstream returns a structurally-invalid body (routing/#4985). (#5684 — thanks @diegosouzapw)

  • providers (SenseNova): complete the SenseNova free Token Plan — chat completions plus Text-to-Image (ported from 9router#2233). (#5679 — thanks @diegosouzapw)

  • db (self-correcting context windows): add self-correcting model context-window overrides so a model whose advertised context length is wrong is corrected automatically (models/#5004). (#5667 — thanks @diegosouzapw)

  • routing (latency strategy): optimize the latency routing strategy using observed per-target performance metrics for better candidate selection. (#5629 — thanks @KooshaPari)

  • compression (preserveSystemPrompt mode): add a preserveSystemPrompt mode enum (always | whenNoCache | never) with legacy back-compat, giving operators explicit control over when the system prompt is protected from compression (T05/C5). (#5653 — thanks @diegosouzapw)

  • commandCode (vision): add multimodal image support for Command Code vision models. (#5557 — thanks @Stazyu)

  • compression (read-lifecycle engine): T08/H7 (2.5) — an opt-in read-lifecycle engine that collapses superseded file reads so stale earlier reads of the same file are pruned from the context. (#5754 — thanks @diegosouzapw)

  • compression (usage-observed prefix freeze): T08/H5 (2.4) — opt-in prefix freeze driven by observed usage, keeping a stable cached prefix from being rewritten by downstream engines. (#5744 — thanks @diegosouzapw)

  • compression (CCR retrieval-feedback ramp): T08/H8 (2.3) — a graduated Context-Compression-Ratio retrieval-feedback ramp that tunes compression aggressiveness from retrieval signals. (#5739 — thanks @diegosouzapw)

  • compression (per-engine circuit breaker): T02 — an opt-in per-engine pipeline circuit-breaker that disables a misbehaving compression engine without failing the whole pipeline. (#5735 — thanks @diegosouzapw)

  • compression (LLM-tier engine): T05/C3 — an opt-in LLM-tier compression engine that uses a model pass for higher-ratio semantic compression. (#5702 — thanks @diegosouzapw)

  • dashboard (compression pipeline editor): T06/T03 — a drag-to-reorder compression pipeline editor plus a compression-studio e2e flow. (#5727 — thanks @diegosouzapw)

  • memory (typed decay): T10/TV6 — opt-in typed memory decay so aged, low-value memories fade on a per-type schedule. (#5723 — thanks @diegosouzapw)

  • mcp (RTK tools): T07 — expose the RTK learn/discover capabilities as first-class MCP tools. (#5691 — thanks @diegosouzapw)

  • providers (CLI profile auto-sync): opt-in CLI profile auto-sync toggles, including Claude Code auto-sync, so generated CLI profiles can track provider changes automatically. (#5755 — thanks @diegosouzapw)
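The usage entry above says provider quota remaining is scaled by the configured quota cutoff so the protected reserve reads as 0% left. A minimal sketch of that arithmetic, assuming the cutoff is expressed as the fraction of total quota held in reserve; the function name and the fraction-based inputs are hypothetical, not OmniRoute's code.

```typescript
// Hypothetical helper: OmniRoute's real function name and cutoff representation may differ.
// remainingFraction and cutoffFraction are both in [0, 1]; cutoffFraction is the reserve kept back.
function scaledRemainingPercent(remainingFraction: number, cutoffFraction: number): number {
  if (cutoffFraction >= 1) return 0; // everything is reserved, nothing usable remains
  // Rescale so the usable band (cutoff..1) maps onto 0..1.
  const usable = (remainingFraction - cutoffFraction) / (1 - cutoffFraction);
  // Clamp so that being inside the protected reserve reads as 0% rather than a negative value.
  const clamped = Math.min(1, Math.max(0, usable));
  return Math.round(clamped * 100);
}
```

With a 10% reserve, 55% raw remaining reports as 50%, and anything at or below the reserve reports as 0%.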
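For the live-dashboard WebSocket entry, the runtime resolution order (handshake value first, then the ws(s)://hostname:20129 default) can be sketched as follows. sanitizePublicWsUrl and resolveLiveWsUrl are hypothetical names; only the accepted schemes and the fallback port come from the entry, and the build-time NEXT_PUBLIC_LIVE_WS_PUBLIC_URL path is left out.

```typescript
// Accept only ws:// or wss:// URLs from the handshake; anything else resolves to null.
function sanitizePublicWsUrl(value: unknown): string | null {
  if (typeof value !== "string") return null;
  try {
    const url = new URL(value);
    return url.protocol === "ws:" || url.protocol === "wss:" ? value : null;
  } catch {
    return null; // not a parseable URL
  }
}

function resolveLiveWsUrl(handshakePublicUrl: unknown, pageProtocol: string, hostname: string): string {
  const fromHandshake = sanitizePublicWsUrl(handshakePublicUrl);
  if (fromHandshake) return fromHandshake;
  // Fallback: the previous ws(s)://<hostname>:20129 default, matching the page's TLS state.
  const scheme = pageProtocol === "https:" ? "wss" : "ws";
  return `${scheme}://${hostname}:20129`;
}
```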
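The typed-decay rules in the memory entry (per-type TTL, access immunity at >= 3 accesses, decay clock re-based on the last access, nothing deleted with the flag off) reduce to a small predicate. This is a sketch under those stated rules; the fields mirror access_count / last_accessed_at, but the types and the isDecayed name are assumptions.

```typescript
type MemoryType = "episodic" | "factual" | "procedural" | "semantic";

interface MemoryRecord {
  type: MemoryType;
  createdAt: number;             // epoch ms
  lastAccessedAt: number | null; // epoch ms, null if never injected
  accessCount: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;
// Only episodic memories carry a TTL by default; types without an entry are immune.
const DEFAULT_TTL_MS: Partial<Record<MemoryType, number>> = { episodic: 30 * DAY_MS };
const ACCESS_IMMUNITY_THRESHOLD = 3;

function isDecayed(m: MemoryRecord, now: number, enabled: boolean, ttls = DEFAULT_TTL_MS): boolean {
  if (!enabled) return false; // flag off: nothing is ever deleted
  const ttl = ttls[m.type];
  if (ttl === undefined) return false; // immune type
  if (m.accessCount >= ACCESS_IMMUNITY_THRESHOLD) return false; // access immunity
  // The decay clock re-bases on the last access, so recently used memories survive.
  const clockStart = m.lastAccessedAt ?? m.createdAt;
  return now - clockStart > ttl;
}
```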
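The per-engine breaker from the stacked-pipeline entry follows the usual closed / open / half-open pattern. A minimal process-local sketch, assuming a consecutive-failure threshold and a cooldown in milliseconds (the entry configures these through its _THRESHOLD / _COOLDOWN_MS settings); the class and method names are hypothetical.

```typescript
interface BreakerState {
  failures: number;
  openedAt: number | null; // epoch ms when the breaker opened, null while closed
  probing: boolean;        // a half-open probe is in flight
}

class EngineBreaker {
  private readonly states = new Map<string, BreakerState>(); // engine-scoped, in memory only
  private readonly threshold: number;
  private readonly cooldownMs: number;

  constructor(threshold = 3, cooldownMs = 30_000) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
  }

  private state(engine: string): BreakerState {
    let s = this.states.get(engine);
    if (!s) {
      s = { failures: 0, openedAt: null, probing: false };
      this.states.set(engine, s);
    }
    return s;
  }

  // True if the engine may run now; after the cooldown, exactly one probe is let through (lazy half-open).
  allow(engine: string, now: number): boolean {
    const s = this.state(engine);
    if (s.openedAt === null) return true; // closed
    if (s.probing || now - s.openedAt < this.cooldownMs) return false; // open, or probe already in flight
    s.probing = true;
    return true;
  }

  recordSuccess(engine: string): void {
    const s = this.state(engine);
    s.failures = 0;
    s.openedAt = null; // a successful run (or probe) closes the breaker
    s.probing = false;
  }

  recordFailure(engine: string, now: number): void {
    const s = this.state(engine);
    s.failures += 1;
    if (s.probing || s.failures >= this.threshold) {
      s.openedAt = now; // open, or re-open after a failed probe
      s.probing = false;
    }
  }
}
```

A caller would check allow() before running an engine and, when it returns false, keep the body verbatim for that step.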
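The CCR retrieval-feedback entry pins down only the shape of the ramp: linear in prior retrievals, still excluded at >= 3, default factor 2, and a factor of 1 that reproduces the old binary cliff. One formula that satisfies all of those is below; it is an illustrative guess, not the published effectiveMinChars.

```typescript
const DO_NOT_COMPRESS_RETRIEVALS = 3;

// Hypothetical formula consistent with the changelog's constraints.
function effectiveMinChars(minChars: number, priorRetrievals: number, retrievalRampFactor = 2): number {
  if (priorRetrievals >= DO_NOT_COMPRESS_RETRIEVALS) return Infinity; // never compress
  // Each retrieval adds (factor - 1) * minChars to the threshold; factor 1 adds nothing (legacy cliff).
  return minChars * (1 + (retrievalRampFactor - 1) * priorRetrievals);
}

function shouldCompress(blockChars: number, minChars: number, priorRetrievals: number, factor = 2): boolean {
  return blockChars >= effectiveMinChars(minChars, priorRetrievals, factor);
}
```

With minChars 500 and the default factor, the threshold climbs 500, 1000, 1500 and then becomes Infinity at the third retrieval.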
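The usage-observed prefix freeze in the cache-aware entry amounts to counting content hashes of system prompts in a bounded in-memory map. A sketch, assuming SHA-256 as the hash and simple oldest-entry eviction; the class name, default threshold, and capacity are hypothetical.

```typescript
import { createHash } from "node:crypto";

class PrefixFreezeTracker {
  private readonly counts = new Map<string, number>(); // hash -> observation count
  private readonly threshold: number;
  private readonly maxEntries: number;

  constructor(threshold = 3, maxEntries = 1000) {
    this.threshold = threshold;
    this.maxEntries = maxEntries;
  }

  private key(systemPrompt: string): string {
    return createHash("sha256").update(systemPrompt).digest("hex");
  }

  // Records one observation and reports whether the prompt should now be preserved.
  // Only a boolean is returned; the payload itself is never touched here.
  observe(systemPrompt: string): boolean {
    const k = this.key(systemPrompt);
    const n = (this.counts.get(k) ?? 0) + 1;
    this.counts.delete(k); // re-insert so Map order tracks recency
    this.counts.set(k, n);
    if (this.counts.size > this.maxEntries) {
      const oldest = this.counts.keys().next().value;
      if (oldest !== undefined) this.counts.delete(oldest); // keep memory bounded
    }
    return n >= this.threshold;
  }
}
```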
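The read-lifecycle rule (a Read is collapsed only when a strictly later Read or Write/Edit touches the exact same path) can be shown on a flattened event list. The ToolEvent shape and stub text are hypothetical simplifications; the real engine walks Anthropic tool_use/tool_result and OpenAI tool_calls / role: "tool" messages.

```typescript
type ToolEvent =
  | { kind: "read"; path: string; result: string }
  | { kind: "write"; path: string };

const STALE_STUB = "[stale read collapsed: file was re-read or modified later]";

function collapseStaleReads(events: ToolEvent[]): ToolEvent[] {
  // For each exact path, the index of the last event (read or write) that touched it.
  const lastTouch = new Map<string, number>();
  events.forEach((e, i) => lastTouch.set(e.path, i));
  return events.map((e, i) => {
    const last = lastTouch.get(e.path) ?? i;
    // A read is stale only when a strictly later event touched the same path.
    if (e.kind === "read" && last > i) return { ...e, result: STALE_STUB };
    return e;
  });
}
```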

🔧 Bug Fixes

  • fix(opencode): stop fabricating User-Agent: opencode/local and x-opencode-client: cli headers when the client sends none — the executor-dedup refactor (#5720) accidentally re-introduced header fabrication, violating the forward-only contract (inventing opencode-internal values risks upstream rejection). Restored to forward-only: those headers are emitted only when a real client source is present. Regression guard: tests/unit/opencode-executor.test.ts. (thanks @diegosouzapw)

  • fix(executors): resolveEffectiveKey returns undefined (not "") when no API key is present — a type-coercion cleanup (#5798) changed apiKey ?? "" to satisfy the typechecker, silently mutating auth-key resolution semantics. Widened the return type to string | undefined and reverted the coercion so OAuth-only credentials resolve correctly. Regression guard: tests/unit/refactor-buildHeaders-preamble.test.ts. (thanks @diegosouzapw)

  • fix(translator): restore the terminal message_delta + message_stop on Responses→Claude streams — the doubled-tool-args dedup (#5828) guarded the finish handler on the shared state.finishReason, which the openai-responses→openai leg sets first in the hub path, so the openai→claude leg dropped its terminal events and the stream ended after content_block_delta. The dedup now uses a dedicated state.claudeFinishEmitted flag. Regression guard: tests/unit/claude-code-rendering-fixes.test.ts. (thanks @diegosouzapw)

  • fix(pricing): add the Kiro claude-sonnet-5 pricing row so the newly-catalogued model (#5796) no longer reports $0.00 usage. Regression guard: tests/unit/catalog-updates-v3x.test.ts. (thanks @diegosouzapw)

  • fix(github): keep Copilot access-token sessions active. GitHub Copilot device-flow accounts can hold a GitHub access token plus a short-lived Copilot token without a refresh token; the proactive health check treated that as terminal no_refresh_token and marked the connection expired minutes after login. The health check now keeps those sessions active, clears stale no_refresh_token state, and refreshes the Copilot sub-token when needed. Regression guard: tests/unit/token-health-no-refresh-token-expired-5326.test.ts. Extracted from #5863 by @Witroch4.

  • fix(kiro): bound the Claude model-id dash→dot normalization to a 1–2 digit minor so date-suffixed ids (e.g. claude-opus-4-20250514) are no longer corrupted. (thanks @voravitl)

  • fix(usage): preserve (bounded) tool definitions in request logs even when the request body is truncated, so the request-details view can still show available tools. (thanks @noir017)

  • fix(providers): route OpenAI responses-only models to /v1/responses instead of 404ing on /v1/chat/completions. The curated gpt-5.5-pro / gpt-5.4-pro entries never worked (OpenAI only serves *-pro reasoning models via the Responses API), and "Test all models" surfaced the same 404s. The registry entries now carry targetFormat: "openai-responses" (reusing the existing per-model translation plumbing shared with gh/codex), DefaultExecutor.buildUrl swaps the openai endpoint to /responses in lockstep (honoring custom base URLs), and a -pro suffix heuristic covers dynamically-synced ids such as o1-pro / gpt-5.2-pro (same spirit as the gh executor's /codex/i routing, 9router#102). Legacy completions-only ids (e.g. gpt-3.5-turbo-instruct) are out of scope — they are not in the catalog and OmniRoute has no legacy /v1/completions upstream. Regression guard: tests/unit/openai-responses-only-models-5842.test.ts (8). Thanks @maikokan. (#5842)

  • fix(image): keep bare codex image aliases (e.g. gpt-5.5) resolving to the codex image pipeline even when a combo shares the same name. A chat combo named gpt-5.5 used to shadow the bare image alias in resolveImageRouteModel, hijacking /v1/images/* requests to a chat target (regression path adjacent to #5887); codex bare models are now reserved before bare-combo resolution, while non-codex aliases (e.g. gpt-image-2) remain user-shadowable (#3214/#3215 behavior preserved). Regression guard: tests/unit/image-routes-combo-edits-3214-3215.test.ts (9). (#5902 by @KooshaPari)

  • fix(ci): re-green the release/v3.8.43 fast-gates queue — every PR→release was inheriting base-reds (#5798). Five distinct blockers cleared: (1) stale modelContextOverrides entry in the check:db-rules intentionally-internal allowlist (#5827 allowlisted it while the #5609 fix re-exported it from localDb.ts; the re-export stays, the obsolete entry goes, classification guard re-pinned to 33); (2) LIVE_WS_ALLOWED_HOSTS / NEXT_PUBLIC_LIVE_WS_PUBLIC_URL documented in docs/reference/ENVIRONMENT.md (env/docs contract, from #5877); (3) the Router Backends ADR's references to the not-yet-merged registry (#5868) marked as landing-with-PR so check:fabricated-docs --strict passes; (4) antigravity-429-quota-tdd + middleware-header-strip-5849 added to stryker tap.testFiles (check:mutation-test-coverage); (5) file-size / complexity / cognitive-complexity ratchets rebaselined with justification notes — all drift measured identical on the pristine tip and this PR (net-zero). Regression guard: tests/unit/check-db-rules-classification.test.ts. (#5798)

  • providers (codex image auto-routing regression): an unprefixed gpt-5.5 request from a codex-only setup (no OpenAI connection) now correctly infers the codex provider again — the OpenAI static-catalog short-circuit in resolveModelByProviderInference was preempting the codex-preference block, so gpt-5.5 (added to the OpenAI catalog) stopped auto-routing to Codex image generation. Users with an active OpenAI connection are unaffected (OpenAI stays default). Regression guard: tests/unit/codex-gpt55-routing-5887.test.ts. (#5887)

  • api (proxy header hygiene): upstream x-middleware-* control headers (emitted by providers hosted behind Next.js, e.g. synthetic.new) are now stripped from proxied responses instead of being forwarded verbatim. Forwarding x-middleware-rewrite made Next 16 throw "NextResponse.rewrite() was used in a app route handler" and return a 500 despite a successful upstream call. Applies to both streaming and JSON paths. Regression guard: tests/unit/middleware-header-strip-5849.test.ts. (#5849)

  • docs (pnpm global install): replaced the unsupported pnpm approve-builds -g step with the install-time pnpm add -g omniroute@latest --allow-build=better-sqlite3 flag across README + Setup Guide (and i18n mirrors), fixing native-build approval for pnpm v11 global installs. (#5554)

  • dashboard (token badge): the red "Token Expired" connection badge no longer flashes for OAuth refresh-capable providers (Antigravity/Gemini) whose access token merely lapsed but is auto-refreshed — it now shows only when the connection is terminally expired (testStatus === "expired"). Continuation of #5326. Regression guard: tests/unit/ui/connection-row-token-badge-5836.test.tsx. (#5836)

  • db (auto backup toggle): full pre-write SQLite backups now honor the persisted backup.autoBackupEnabled dashboard setting — previously only the DISABLE_SQLITE_AUTO_BACKUP env var was checked, so disabling auto-backup in the UI had no effect and ~70MB pre-write snapshots kept firing. Manual and pre-restore backups still always run. Regression guard: tests/unit/db-backup-autobackup-setting-5871.test.ts. (#5871)

  • providers (auto/ routing for custom providers): custom OpenAI-/Anthropic-compatible providers (dynamic *-compatible-* connection IDs) are no longer excluded from auto/ routing — the Auto-Combo virtual factory previously skipped any connection whose provider was absent from the static registry. It now falls back to the connection's defaultModel. Regression guard: tests/unit/auto-custom-provider-5873.test.ts. (#5873)

  • middleware (hook sandbox): operator-authored pre-request hook code now runs inside a hardened Node vm sandbox (minimal context, no ambient globals/process.env, execution timeout, no require) instead of new Function() in the main process — closing the Hard Rule #3 / SonarCloud S1523 exposure. Regression guard: tests/unit/middleware-hook-sandbox-5872.test.ts. (#5872)

  • mcp-server (auth forwarding): the per-caller MCP identity forwarded via withMcpHttpAuthContext now wins over the static OMNIROUTE_API_KEY env fallback in the internal-fetch helpers (apiFetch, omniRouteFetch) — previously the env key was spread after the forwarded headers and clobbered the caller's Authorization. Regression guard: open-sse/mcp-server/__tests__/httpAuthContext.test.ts. (#5819)

  • dashboard (Modal provider, two-field auth): the Modal provider connection form now exposes two fields, Token ID and Token Secret, instead of a single API-key input, since Modal authenticates with Authorization: Bearer <token-id>:<token-secret>. The dashboard combines the two fields into the id:secret credential before saving (combineModalCredential, trims both parts), while a value pasted in the legacy single-field format keeps working verbatim (empty secret → passthrough), so existing saved connections need no migration; the key-help link points at Modal's token settings. Regression guard: tests/unit/modal-credential-combine.test.ts (5). (#5881, closes #5446). Follow-up: the Validation Model Id field is now pre-filled for Modal with the same model the server-side validator probes (Qwen/Qwen3-4B-Thinking-2507-FP8, shared via MODAL_DEFAULT_VALIDATION_MODEL_ID in src/shared/constants/modal.ts), closing the last checklist item of #5446. Regression guard: tests/unit/modal-validation-model-prefill.test.ts.

  • api (chat completions — early SSE keepalive gate): the /v1/chat/completions route wrapped the response in the early-stream keepalive whenever stream was not explicitly false, so a client that omitted stream and asked for JSON (Accept: application/json) could receive premature SSE framing. The keepalive wrapper is now gated on an explicit stream: true in the body or an Accept header that forces SSE (acceptHeaderForcesStream); the parsed body is passed to the chat handler untouched, so the actual stream/JSON framing stays decided by chatCore/resolveStreamFlag — preserving OmniRoute's legacy streaming default when stream is omitted and the per-key streamDefaultMode: "json" opt-in. Regression guard: tests/unit/chat-combo-live-test.test.ts ("returns JSON without early SSE framing when stream is omitted and Accept is application/json"). (#5866 by @rdself)

  • fix(github): drop a trailing assistant prefill before dispatching to GitHub Copilot chat to avoid 400 errors. (thanks @baslr)

  • fix(oauth): prevent cross-IdP account overwrites by disambiguating OAuth connections on username when present, not email alone. (thanks @KunN-21)

  • fix(mitm): best-effort revert privileged /etc/hosts entries on exit when a sudo password is cached, instead of always leaving orphaned state. (thanks @manhdzzz)

  • providers (Kiro — Claude Sonnet 5): the Kiro provider's model catalog was missing claude-sonnet-5, so the model could not be selected or routed even on accounts that already had access to it ("claude-sonnet-5 is not supported"). Added the model to the Kiro registry (open-sse/config/providers/registry/kiro/index.ts) as a 1M-context / 128K-output Claude model, mirroring the existing Claude entries; the registry models[] feeds both the model selector and the live CodeWhisperer ListAvailableModels fallback, so the model is now selectable and routable. Regression guard: tests/unit/kiro-claude-sonnet-5-2267.test.ts. (thanks @openbioinfo)

  • settings (model aliases — self-heal after restart): the Settings → Routing page showed "No exact-match aliases configured" after a server restart even though the aliases were persisted in the DB. Aliases are held in a module-local _customAliases map in modelDeprecation.ts that the boot path hydrates, but Next.js compiles the app-route module graph separately from the startup graph (the same webpack chunk-splitting class as #5312), so the GET /api/settings/model-aliases handler read a different, un-hydrated copy. The handler now self-heals: when its in-memory alias map is empty it reads settings.modelAliases from the DB (via the existing getSettings() db module — no raw SQL in the route) and repopulates the map, so the UI reflects the persisted aliases on the first GET after a restart. Follow-up: the root cause is now also fixed — the _customAliases store in modelDeprecation.ts is backed by globalThis (key __omniroute_customAliases__), so the startup and app-route module graphs share one store and the route reads the boot-hydrated aliases directly (the DB self-heal remains as a harmless fallback), mirroring the same globalThis singleton pattern already applied to thinkingBudget.ts/backgroundTaskDetector.ts (#5312). Regression guards: tests/unit/model-aliases-settings-route-selfheal.test.ts + tests/unit/model-aliases-globalthis-5777.test.ts. (#5777 — thanks @jleonar2)

  • providers (grok-cli token auto-refresh): grok-cli OAuth tokens were never proactively refreshed before their real expiry. mapTokens hardcoded expiresIn: 21600 (6 h) regardless of the token's actual lifetime, so the persisted expiresAt was always "now + 6 h" and the proactive tokenHealthCheck sweep (refresh when expiresAt - now < 5 min) fired 6 h after import instead of shortly before the token really expired. mapTokens now computes expiresIn from the authoritative expires_at field in ~/.grok/auth.json (ISO → epoch-seconds) with a fallback to the JWT exp claim (payload-only decode, no signature trust); the hardcoded 21600 is kept only when neither is present. An already-expired token (real expires_at/exp in the past) is now clamped to a positive expiresIn via Math.max(1, …), so the import route stores a near-future expiresAt and AutoCombo refreshes the connection instead of reading a past date and excluding it outright. Regression guards: 5 cases in tests/unit/grok-cli-oauth.test.ts (JWT exp, JSON expires_at, the 21600 fallback, and the two expired-token clamps). (#5775 — thanks @Chewji9875)

  • compression (CCR retrieve via MCP HTTP): the omniroute_ccr_retrieve MCP tool returned "CCR block not found" for blocks stored earlier in the same session when called over the MCP HTTP transports (SSE / Streamable HTTP), e.g. from OpenCode in a Docker deployment. Compression stores each block keyed by the API-key principal (String(apiKeyInfo.id)), but the tool resolved the caller via extra.authInfo.clientId — which the MCP SDK never populates for API-key auth — so it fell back to "anonymous" and the compound store-key never matched. The retrieve tool now resolves the caller's API-key id from the MCP HTTP auth context (httpAuthContext) using the same getApiKeyMetadata lookup used at storage time, so retrieval matches storage. Cross-tenant IDOR isolation is preserved: a different key resolves to a different id → miss; no key → the anonymous bucket only. Regression guard: tests/unit/compression/ccr-mcp-principal-5649.test.ts (extraction, distinct-principal isolation, fail-closed, end-to-end store→retrieve). (#5649)

  • compression (context-editing telemetry): streaming responses now record Context Editing savings. Anthropic surfaces context_management.applied_edits[] on the final message_delta snapshot of an SSE stream, but the streaming reconstruction (buildStreamSummaryFromEvents → Claude branch) dropped context_management entirely and no telemetry hook was wired into the streaming finalizer — so the delegated server-side context-clear savings (cleared_input_tokens / cleared_tool_uses) surfaced under engine context-editing in compression analytics only for non-streaming responses. The collector now preserves context_management from the final snapshot (last-writer-wins), and onStreamComplete mirrors the non-streaming recordContextEditingTelemetryHook (best-effort, Claude-only, HTTP 200 only). Purely additive telemetry — no payload mutation, no new env flag, no behavior change when the stream carries no context_management. Regression guard: tests/unit/context-editing-streaming-telemetry.test.ts (3). gaps v3.8.42 — T01 (5.1).

  • proxy (relay test diagnostics): the Proxy Pool "Test" button showed a bare "failed", with nothing in the server logs, when a relay (Vercel / Deno / Cloudflare) responded with a non-200 status, e.g. a 401 from an auth-token mismatch after a STORAGE_ENCRYPTION_KEY rotation. The relay success-path response set success: false but carried no error field, so the dashboard had no failure reason to display and the server logged nothing. The test now returns an actionable error (the HTTP status, plus an auth/encryption-key hint on 401/403) and logs the failure server-side; the SOCKS5/HTTP proxy path now logs its failures too. Shaping is extracted to buildRelayTestResult with a regression guard (tests/unit/proxy-relay-test-error-5716.test.ts). Note: this surfaces why a relay fails; it does not repair a genuinely broken or misconfigured relay. (#5716)

  • fix(dashboard): add error boundaries for the Combos and MITM Proxy pages so a render error shows a recoverable fallback instead of a blank page. (thanks @wahyuzero)

  • providers (onboarding wizard — unsupported validation): adding a provider whose credentials have no live validator (LMArena, PiAPI, …) failed silently in the Add-Provider wizard. The /api/providers/validate endpoint returns HTTP 400 + { unsupported: true } for these (#5565/#5567), but the wizard's validateOnboardingApiKey ran it through expectOk, which threw on the non-200 — so the flow jumped to the error step and the connection was never created. The wizard now treats unsupported: true as a non-blocking "can't verify" and proceeds to save, mirroring AddApiKeyModal. Regression guard added to tests/unit/provider-onboarding-wizard.test.ts. (related to #5692)
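
The non-blocking handling can be sketched as a classifier that replaces a throw-on-non-200 helper such as expectOk. The ValidateBody shape and both function names are hypothetical; only the HTTP 400 + { unsupported: true } contract comes from the entry.

```typescript
// Hypothetical JSON body of /api/providers/validate.
interface ValidateBody {
  valid?: boolean;
  unsupported?: boolean;
  error?: string;
}

type ValidationOutcome =
  | { kind: "valid" }
  | { kind: "unverified"; note: string } // no live validator: warn, then save
  | { kind: "invalid"; message: string };

// Classify instead of throwing on every non-200, so "can't verify" is not a failure.
function classifyValidation(status: number, body: ValidateBody): ValidationOutcome {
  if (status === 200 && body.valid !== false) return { kind: "valid" };
  if (status === 400 && body.unsupported === true) {
    return { kind: "unverified", note: body.error ?? "Provider validation not supported" };
  }
  return { kind: "invalid", message: body.error ?? `Validation failed (HTTP ${status})` };
}

// Only a definite "invalid" blocks creating the connection.
function shouldSave(outcome: ValidationOutcome): boolean {
  return outcome.kind !== "invalid";
}
```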

  • dashboard (Quick Start step 1): the Quick Start "Create API key" step told users to "Go to Endpoint → Registered Keys" and linked to /dashboard/endpoint, but API keys are created on the API Manager page (/dashboard/api-manager, sidebar "API Keys") — the Endpoint page has no "Registered Keys" section, so users followed the link and could not find where to create a key. Step 1 now reads "Go to API Keys" and links to /dashboard/api-manager. Regression guard: tests/unit/ui/quick-start-api-keys-link-5695.test.ts. (#5695)

  • providers (DashScope/Alibaba setup link): the "Get API key" link for the Alibaba and Alibaba (China) providers pointed at the bare API host (dashscope-intl.aliyuncs.com / dashscope.aliyuncs.com), which returns 404 in a browser — API hostnames have no homepage. Repointed to the consoles where keys are actually issued: bailian.console.alibabacloud.com (international) and dashscope.console.aliyun.com (China). Same class as #5572/#5574/#5576; regression guard added to tests/unit/provider-setup-links-5572.test.ts. (#5665)

  • thinking / runtime-config (module-graph fix): operator-configured proxy settings that are hydrated at boot but read per-request were silently ignored in production. Next.js compiles instrumentation.ts (boot hydration via applyRuntimeSettings / restore hooks) as a separate webpack module graph from the app-route / open-sse executors, so a module-local let _config singleton is duplicated — the boot copy is hydrated but the request path reads a different, un-hydrated copy. Live VPS validation proved the Thinking-Budget hydration ran to completion at boot yet base.ts still saw the passthrough default (this is why #5312 fix A stayed broken even after the boot-wiring fix). Fixed by backing the singletons with globalThis (the pattern systemPrompt.ts already uses for the Global System Prompt, #2470), so all module-graph copies share one instance: thinkingBudget.ts (the dashboard Thinking-Budget mode now reaches the executor), backgroundTaskDetector.ts (the opt-in background-model degradation now actually fires on requests), and systemTransforms.ts (operator pipeline overrides now reach the request path). payloadRules.ts was already safe (it lazily self-loads from the DB per request, #2986). Regression guards: tests/unit/thinking-budget-globalthis-5312.test.ts + tests/unit/runtime-config-globalthis-5312.test.ts (assert globalThis-backed sharing; a module-local let fails them). (#5312)
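
The module-graph duplication and the globalThis fix can be reproduced in a few lines. Here createModuleCopy stands in for one webpack module graph evaluating the same module; the key name, config shape, and helpers are hypothetical, not the code in thinkingBudget.ts.

```typescript
interface ThinkingBudgetConfig {
  mode: "passthrough" | "auto" | "custom" | "adaptive";
}

// Hypothetical key; any process-unique string works.
const GLOBAL_KEY = "__omnirouteSketchThinkingBudget";

interface Holder {
  config: ThinkingBudgetConfig;
}

// globalThis is shared by every module-graph copy in the process, so the
// first caller creates the holder and every later caller reuses it.
function getHolder(): Holder {
  const g = globalThis as unknown as Record<string, Holder | undefined>;
  return (g[GLOBAL_KEY] ??= { config: { mode: "passthrough" } });
}

// Simulates one module graph evaluating the module: each call gets fresh locals.
function createModuleCopy() {
  // Buggy pattern: a module-local `let` is private to this copy.
  let localConfig: ThinkingBudgetConfig = { mode: "passthrough" };
  return {
    setLocal: (c: ThinkingBudgetConfig) => { localConfig = c; },
    getLocal: () => localConfig,
    // Fixed pattern: read and write through globalThis.
    setShared: (c: ThinkingBudgetConfig) => { getHolder().config = c; },
    getShared: () => getHolder().config,
  };
}

const bootCopy = createModuleCopy();    // instrumentation.ts graph (boot hydration)
const requestCopy = createModuleCopy(); // app-route / executor graph
bootCopy.setLocal({ mode: "auto" });
bootCopy.setShared({ mode: "auto" });
console.log(requestCopy.getLocal().mode);  // passthrough: boot hydration is invisible
console.log(requestCopy.getShared().mode); // auto: boot hydration is visible
```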

  • thinking (Claude OAuth): restore the proxy-level Thinking-Budget config on startup. The dashboard mode (auto/custom/adaptive) is persisted under settings.thinkingBudget, but the boot-time hydration (hydrateThinkingBudgetConfig) was only wired into src/server-init.ts — an unused module that never runs in production — so the operator's choice silently reverted to the passthrough default on every restart (#5312 fix A was non-functional, even though its direct unit test passed). The hydration now runs in the real boot path (src/instrumentation-node.ts), alongside the Global System Prompt restore. Surfaced by live Anthropic-OAuth validation on the VPS. Regression guard: tests/unit/thinking-budget-boot-wiring-5312.test.ts (asserts the production boot module calls the hydration, not just the function in isolation). (#5312)

  • translator/chatcore (hardening): re-apply two defensive review-fixes that were dropped in a branch rebuild before #5661 / #5662 landed. (1) mergeConsecutiveSameRoleContents (OpenAI→Gemini) now shallow-copies each entry and its parts array instead of pushing the input reference, so the consecutive-same-role merge never mutates the caller's objects. (2) defaultClaudeToolType (Claude tool defaults) now passes any non-object array entry (null / primitive) through unchanged instead of spreading it into a fabricated { type: "custom", … } tool. No behavior change on real payloads (Gemini contents are freshly built; Claude tools are always objects); both properties are now locked by regression tests in tests/unit/translator-gemini-consecutive-role-2191.test.ts and tests/unit/claude-tool-type-default-2195.test.ts.

  • providers (grok-cli): truncate the tool list when it exceeds a provider's hard limit, so grok-cli (cli-chat-proxy.grok.com, max 200 tools) no longer rejects requests with Maximum tools limit reached. Adds a proactive PROVIDER_TOOL_LIMITS map (grok-cli: 200, consulted before the reactive cache), a corrected limit-parsing regex that captures the stated maximum (200) instead of the supplied count (427), and removes the broken < MAX_TOOLS_LIMIT truncation gate so truncation now fires whenever tools.length exceeds the effective limit. Regression guard: tests/unit/tool-limit-detector.test.ts. (#5563 — thanks @Chewji9875)
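
A sketch of the truncation rule, assuming tools are a plain array and learned limits live in an in-memory map. PROVIDER_TOOL_LIMITS and the grok-cli value of 200 come from the entry; learnedToolLimits, effectiveToolLimit, and truncateTools are hypothetical.

```typescript
// Proactive hard limits, consulted before anything learned at runtime.
const PROVIDER_TOOL_LIMITS: Record<string, number> = { "grok-cli": 200 };

// Stand-in for the reactive cache of limits parsed from upstream rejections.
const learnedToolLimits = new Map<string, number>();

function effectiveToolLimit(provider: string): number | undefined {
  return PROVIDER_TOOL_LIMITS[provider] ?? learnedToolLimits.get(provider);
}

// Truncate whenever the list exceeds the effective limit; there is no extra gate.
function truncateTools<T>(provider: string, tools: T[]): T[] {
  const limit = effectiveToolLimit(provider);
  return limit !== undefined && tools.length > limit ? tools.slice(0, limit) : tools;
}
```

With 427 tools, truncateTools("grok-cli", tools) keeps the first 200; a provider with no known limit passes through untouched.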

  • resilience (antigravity): record model lockout for Antigravity 429 rate_limit_exceeded errors. Antigravity's "Resource has been exhausted (e.g. check quota)." text was matched by overly broad QUOTA_PATTERNS and misclassified as QUOTA_EXHAUSTED, so the combo retry path was skipped (providerExhausted) and the model was never cooled down. Classification now prefers the structured error code — classifyErrorText(structuredError?.code || errorText) — so a rate_limit_exceeded code is treated as a transient rate-limit (not quota), and the two broad patterns (/resource.*exhaust/i, /check.*quota/i) were replaced with Antigravity-specific ones (individual quota reached, enable overages). (#5579 — thanks @Chewji9875)
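
The classification order can be sketched as follows. classifyErrorText, QUOTA_PATTERNS, QUOTA_EXHAUSTED, and the two Antigravity-specific phrases come from the entry; the rate-limit patterns, the other class names, and the classify wrapper are assumptions, and the real pattern lists are longer.

```typescript
type ErrorClass = "RATE_LIMITED" | "QUOTA_EXHAUSTED" | "UNKNOWN";

// Narrow, Antigravity-specific phrases instead of /resource.*exhaust/i and /check.*quota/i.
const QUOTA_PATTERNS = [/individual quota reached/i, /enable overages/i];
// Hypothetical rate-limit patterns for this sketch.
const RATE_LIMIT_PATTERNS = [/rate_limit_exceeded/i, /too many requests/i];

function classifyErrorText(text: string): ErrorClass {
  if (RATE_LIMIT_PATTERNS.some((p) => p.test(text))) return "RATE_LIMITED";
  if (QUOTA_PATTERNS.some((p) => p.test(text))) return "QUOTA_EXHAUSTED";
  return "UNKNOWN";
}

interface StructuredError {
  code?: string;
}

// Prefer the structured code; fall back to free text only when no code is present.
function classify(structuredError: StructuredError | undefined, errorText: string): ErrorClass {
  return classifyErrorText(structuredError?.code || errorText);
}
```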

  • providers (OpenAI-compatible): Codex MCP / tool_search deferred discovery (and apply_patch) now works through a Custom OpenAI-compatible provider. When such a provider received a Responses-API-shaped request that carried MCP / tool_search tools, OmniRoute downgraded it to /chat/completions, which drops the deferred tool-discovery mechanism — so the MCP namespaces never surfaced to the model and apply_patch was mis-handled as a JSON tool. The executor now detects a Responses-shaped request (input / previous_response_id / max_output_tokens / reasoning) that carries namespace / tool_search* tools and routes it to the upstream /responses endpoint natively instead of downgrading (it can also be forced via providerSpecificData._omnirouteForceResponsesUpstream). This is a distinct code path from the official Codex OAuth backend (#3033 / #4539, which the earlier fix never touched). Regression guard: tests/unit/executor-default-base.test.ts. Thanks to @KooshaPari for the fix. (#5483)
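
The routing decision can be sketched as a pure function. The four Responses-shape fields and the namespace / tool_search* tool kinds come from the entry; reading the tool kind from a type field, the function names, and the forceResponses flag (standing in for providerSpecificData._omnirouteForceResponsesUpstream) are assumptions.

```typescript
// Hypothetical minimal request shape.
interface IncomingRequest {
  input?: unknown;
  previous_response_id?: string;
  max_output_tokens?: number;
  reasoning?: unknown;
  tools?: Array<{ type?: string }>;
}

function isResponsesShaped(body: IncomingRequest): boolean {
  return (
    body.input !== undefined ||
    body.previous_response_id !== undefined ||
    body.max_output_tokens !== undefined ||
    body.reasoning !== undefined
  );
}

// Tools that rely on deferred discovery, which /chat/completions cannot carry.
function hasDeferredDiscoveryTools(body: IncomingRequest): boolean {
  return (body.tools ?? []).some(
    (t) => t.type === "namespace" || (t.type ?? "").startsWith("tool_search"),
  );
}

function pickUpstreamPath(body: IncomingRequest, forceResponses = false): "/responses" | "/chat/completions" {
  if (forceResponses || (isResponsesShaped(body) && hasDeferredDiscoveryTools(body))) {
    return "/responses";
  }
  return "/chat/completions";
}
```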

  • dashboard (routing): selecting the fusion strategy on the Global Routing defaults tab now reveals fusion-specific config instead of only the generic resilience fields. Fusion's engine knobs — judgeModel (the model that synthesizes the panel answers) and fusionTuning (minPanel / stragglerGraceMs / panelHardTimeoutMs) — already existed in the schema and the per-combo editor, but the Global Routing tab never surfaced them, so picking "fusion" there was effectively a no-op. The fields are now shown (extracted into a new FusionDefaultsFields component). Voting / aggregation-mode / per-provider-weight are intentionally not shown — those don't exist in the fusion engine. Regression guard: tests/unit/ui/combo-defaults-fusion-5598.test.tsx. (#5598)

  • dashboard (free proxy pool): the free proxy pool "Sync All" no longer fails silently with Total: 0. Three fixes: (1) the IPLocate source fetched …/protocols/<proto>.json and parsed it as JSON, but the upstream list is plain text (<proto>.txt, one ip:port per line) — every protocol 404'd / failed to parse; it now fetches .txt and parses the line list. (2) The sync route isolates each source in its own try/catch, so one provider throwing (e.g. a TLS handshake failure) no longer aborts the whole sync — the working sources still populate the pool. (3) The UI now surfaces the per-source errors the route already returns, instead of discarding the response, so a partial/empty sync explains itself. Regression guards: tests/unit/free-proxy-providers.test.ts, tests/unit/proxy-pool-sync-4878.test.ts, tests/unit/free-pool-tab.test.tsx. (#5595)
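
Fixes (1) and (2) can be sketched together. The entry describes a per-source try/catch; this sketch swaps in Promise.allSettled, which gives the same isolation. The IPv4-only line filter, the SourceResult shape, and the function names are assumptions.

```typescript
// One ip:port per line; blank lines and anything else are skipped (IPv4 only here).
function parseProxyList(text: string): string[] {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => /^\d{1,3}(?:\.\d{1,3}){3}:\d{1,5}$/.test(line));
}

interface SourceResult {
  source: string;
  proxies: string[];
  error?: string;
}

// Each source settles independently, so a TLS failure in one cannot abort the
// others, and every per-source error is returned for the UI to show.
async function syncAll(sources: Record<string, () => Promise<string>>): Promise<SourceResult[]> {
  const names = Object.keys(sources);
  const settled = await Promise.allSettled(names.map((name) => sources[name]()));
  return settled.map((outcome, i): SourceResult =>
    outcome.status === "fulfilled"
      ? { source: names[i], proxies: parseProxyList(outcome.value) }
      : { source: names[i], proxies: [], error: String(outcome.reason) },
  );
}
```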

  • dashboard (memory engine): the memory engine status page no longer mixes English and Portuguese. The embedding / vector-store / rerank status detail strings were hardcoded in Portuguese in the backend (resolveEmbeddingSource, engineStatus), e.g. auto: nenhuma fonte de embedding disponível and sqlite-vec ativo, dim=…, while the surrounding UI labels render from the English i18n bundle — so an English user saw a half-translated page. The backend detail strings are now English (auto: no embedding source available, sqlite-vec active, dim=…, etc.), matching the rest of the page. Regression guard: tests/unit/memory-engine-status.test.ts. (#5596)

  • providers (cline): stop falsely mapping valid Cline (OAuth) responses to 502 empty_choices + account cooldown. detectMalformedNonStream only recognized choices[].message.content as a string, but some OpenAI-compatible upstreams — Cline via OAuth among them — return content as an array of Anthropic-style text blocks inside an OpenAI envelope. A non-empty response (recvBytes > 0) was therefore classified as empty_choices and turned into a 502 that also cooled the account down. The malformed-response detector now also treats a content array carrying at least one non-empty text block as real output. Regression guard: tests/unit/diagnostics.test.ts. (#5559)
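
The widened check can be sketched as below, assuming Anthropic-style blocks of the form { type: "text", text }. hasRealContent is a hypothetical name, and the real detector presumably also accounts for cases this sketch ignores, such as tool-call-only responses.

```typescript
interface ContentBlock {
  type: string;
  text?: unknown;
}

type MessageContent = string | ContentBlock[] | null | undefined;

// Real output: a non-empty string, or an array (Anthropic-style blocks inside
// an OpenAI envelope) with at least one non-empty text block.
function hasRealContent(content: MessageContent): boolean {
  if (typeof content === "string") return content.length > 0;
  if (Array.isArray(content)) {
    return content.some(
      (block) => block.type === "text" && typeof block.text === "string" && block.text.length > 0,
    );
  }
  return false;
}
```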

  • embedded services (Windows): fix CLIProxyAPI install failing instantly with spawn unzip ENOENT on Windows. The binary extractor spawned unzip, which is not a Windows system command — it only ships inside Git for Windows' usr/bin, a directory Node's spawn PATH never sees, so even users with Git installed hit the error. On Windows the extractor now uses PowerShell's built-in Expand-Archive (via execFileAsync, no shell — paths pass as a single non-interpreted arg, with ''-escaping + -LiteralPath as defense in depth); other platforms keep using unzip. This is distinct from #5379 (that was npm.cmd needing shell: true). Regression guard: tests/unit/binary-manager-extract-zip-5590.test.ts. (#5590)
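
A sketch of the platform split, assuming PowerShell is invoked with -Command and the paths are embedded as single-quoted literals. Inside a PowerShell single-quoted string the only special characters are the single-quote characters themselves (ASCII ' plus the typographic variants PowerShell also accepts), each escaped by doubling. The helper names, the -NoProfile/-NonInteractive flags, and the unzip -o flag are assumptions.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// PowerShell single-quoted literal: double every single-quote character.
function psQuote(value: string): string {
  return `'${value.replace(/['\u2018\u2019\u201A\u201B]/g, "$&$&")}'`;
}

// The script is ONE argv entry for powershell.exe; execFile spawns no cmd.exe shell.
function expandArchiveArgs(zipPath: string, destDir: string): string[] {
  const script = `Expand-Archive -LiteralPath ${psQuote(zipPath)} -DestinationPath ${psQuote(destDir)} -Force`;
  return ["-NoProfile", "-NonInteractive", "-Command", script];
}

async function extractZip(zipPath: string, destDir: string): Promise<void> {
  if (process.platform === "win32") {
    await execFileAsync("powershell.exe", expandArchiveArgs(zipPath, destDir));
  } else {
    await execFileAsync("unzip", ["-o", zipPath, "-d", destDir]);
  }
}
```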

  • storage (daemon): fix a Node.js out-of-memory crash on startup when storage.sqlite grows large (~170 MB+). The boot-time call-log cleanup (cleanupExpiredLogs → rotateCallLogs) ran two unbounded SELECT … FROM call_logs … .all() queries: listReferencedArtifacts (every artifact path) and deleteCallLogsBefore (every id before the retention cutoff). node:sqlite's StatementSync.all() materializes the entire result set as JS objects at once, so on a large table the V8 heap blew up and the process crashed before binding (FATAL ERROR: … heap out of memory, native frame node::sqlite::StatementSync::All). Both queries now page through call_logs in bounded 5,000-row chunks (new src/lib/usage/callLogsBoundedQueries.ts), keeping peak memory flat regardless of table size, so a manual --max-old-space-size bump is no longer required. Regression guard: tests/unit/call-log-oom-unbounded-5618.test.ts. (#5618)
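
Bounded paging can be sketched with keyset pagination (WHERE id > ? ORDER BY id LIMIT ?), so each query materializes at most one page. The FetchPage callback stands in for a prepared node:sqlite statement; the column and helper names are hypothetical, and the entry does not say whether OmniRoute pages by id or by offset.

```typescript
interface Row {
  id: number;
  artifactPath: string | null;
}

// Stand-in for: SELECT id, artifact_path FROM call_logs WHERE id > ? ORDER BY id LIMIT ?
type FetchPage = (afterId: number, limit: number) => Row[];

const PAGE_SIZE = 5000;

// Yields every row while holding at most one page in memory.
// Assumes positive integer ids (SQLite rowids), so afterId = 0 starts at the beginning.
function* pageThrough(fetchPage: FetchPage, pageSize = PAGE_SIZE): Generator<Row> {
  let afterId = 0;
  for (;;) {
    const page = fetchPage(afterId, pageSize);
    yield* page;
    if (page.length < pageSize) return;
    afterId = page[page.length - 1].id;
  }
}

// Only the small set of distinct paths accumulates, not every row object.
function collectArtifactPaths(fetchPage: FetchPage): Set<string> {
  const paths = new Set<string>();
  for (const row of pageThrough(fetchPage)) {
    if (row.artifactPath) paths.add(row.artifactPath);
  }
  return paths;
}
```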

  • dashboard (provider setup): fix three provider setup links that pointed at 404 pages. Ollama Cloud / ollama-search linked to ollama.com/settings/api-keys → corrected to ollama.com/settings/keys (the page moved; Ollama Cloud is a real keyed service, so the field stays). SearchAPI linked to the bare searchapi.io/docs (404) → searchapi.io/docs/google. You.com linked to you.com/docs/search/overview (404) → you.com/business/api/ (the developer portal). All three replacements were verified live. Regression guard: tests/unit/provider-setup-links-5572.test.ts. (#5572, #5574, #5576)

  • providers (AI/ML API): the model-import step now loads the live AI/ML API catalog (400+ models) instead of falling back to a stale 6-model seed. The registry had no modelsUrl, so the route silently used the bundled catalog with an "API unavailable — using local catalog" warning even when the key was valid. AI/ML API exposes its full catalog at the public, auth-free https://api.aimlapi.com/models endpoint (a bare array of { id, type, info }, distinct from the OpenAI-compat /v1/models); it's now wired into the models route's discovery config, with the bundled catalog kept as the offline fallback. Regression guard: tests/unit/provider-models-route.test.ts. (#5570)
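
Normalizing the bare-array shape alongside the OpenAI-compatible { data: [...] } envelope might look like this. The CatalogModel type, both function names, and the injected fetchJson (which in practice would GET https://api.aimlapi.com/models) are hypothetical; the local_catalog label mirrors the one used elsewhere in these notes, while "live" is made up.

```typescript
interface CatalogModel {
  id: string;
  type?: string;
}

// Accept a bare array of { id, type, info } or an OpenAI-style { data: [...] };
// entries without a string id are dropped.
function normalizeModelList(payload: unknown): CatalogModel[] {
  const data = (payload as { data?: unknown } | null)?.data;
  const items: unknown[] = Array.isArray(payload) ? payload : Array.isArray(data) ? data : [];
  return items.flatMap((item): CatalogModel[] => {
    const id = (item as { id?: unknown } | null)?.id;
    if (typeof id !== "string") return [];
    const type = (item as { type?: unknown }).type;
    return [{ id, type: typeof type === "string" ? type : undefined }];
  });
}

// Live catalog first; the bundled catalog remains the offline fallback.
async function loadCatalog(
  fetchJson: () => Promise<unknown>,
  fallback: CatalogModel[],
): Promise<{ models: CatalogModel[]; source: "live" | "local_catalog" }> {
  try {
    const models = normalizeModelList(await fetchJson());
    if (models.length > 0) return { models, source: "live" };
  } catch {
    // Network or parse failure: fall through to the bundled catalog.
  }
  return { models: fallback, source: "local_catalog" };
}
```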

  • providers (CablyAI): mark CablyAI deprecated — cablyai.com no longer resolves (DNS NXDOMAIN, verified 2026-06-30); the domain is gone. The provider is removed from the models-route discovery config so the import step returns a clean error instead of an unhandled 500 crash (the dead-domain fetch threw with no local-catalog fallback), and the registry entry now carries deprecated: true / riskNoticeVariant: "deprecated" so the dashboard flags existing connections (same treatment as the shut-down glhf/kluster.ai gateways). Regression guard: tests/unit/provider-models-route.test.ts. (#5568)

  • dashboard (provider add): non-LLM search/agent providers no longer fail the model-import step with a red Provider <id> does not support models listing. Jules (Google Labs coding agent), linkup-search (Linkup web search), ollama-search (Ollama Cloud web search — distinct from the local Ollama LLM), and searchapi-search (SearchAPI SERP) have no /v1/models endpoint, so the import surfaced a failure for expected behavior. Each now ships a small static catalog of its selectable capability ids — Linkup's fast/standard/deep search depths, SearchAPI's google/bing/youtube/… engines, a single Jules/Ollama-web-search entry — so the import step returns a usable list (source: local_catalog) instead of an error. Regression guard: tests/unit/provider-models-route.test.ts. (#5569, #5571, #5573, #5575)

  • dashboard (provider add): providers without a live key/cookie validator (e.g. LMArena (Free), PiAPI) can now be saved. The Add-connection modal treated the backend's "Provider validation not supported" response as a hard Invalid state and blocked Save entirely, leaving those providers impossible to add. The validate route now returns unsupported: true alongside the message, and the modal treats that as a non-blocking warning — the "Check" badge still shows "validation not supported" (informational), but Save persists the credential as-is. Regression guards: tests/unit/ui/add-api-key-modal-unsupported-save-5565.test.tsx (Save proceeds) and tests/unit/providers-validate-route.test.ts (wire-format). (#5565, #5567)

  • providers (codex): fix the Codex Responses WebSocket path (/v1/responses), which regressed in v3.8.40 with a client-visible Invalid JSON body and bypassed the configured proxy. (1) #5591 — PR #5237 bumped the impersonation TLS profile to chrome_149, but wreq-js@2.3.1 only supports up to chrome_147; the unknown profile produced a degenerate fingerprint and ChatGPT rejected the upstream upgrade. The Codex WS path is reverted to the proven chrome_142 (the v3.8.39 value), and the over-bumped grok-web/claude-web profiles (masked by their circuit-breaker but silently dropping TLS impersonation) are restored to chrome_146. A new regression guard asserts every configured chrome_* profile exists in the installed wreq-js typings (tests/unit/tls-profiles-valid-5591.test.mjs). (2) #5611 — the upstream wreq-js.websocket() connect ignored the Proxy Registry, so a no-direct-egress Docker container failed with a DNS error; the prepare route now resolves the Global/provider proxy and threads it through to the WS connect. Regression guard in tests/unit/responses-ws-proxy.test.mjs. (#5591, #5611)

  • providers (GLM): GLM 5.1 / 5.2 now keep the system role instead of having the system prompt folded into the first user turn. roleNormalizer.ts matched every glm* id with a blanket startsWith("glm") / startsWith("glm-") prefix, so the next-generation models — which z.ai documents as supporting the system role (GLM > 5.0) — were normalized as if they rejected it, degrading instruction-following. The matcher is now version-aware: it strips the system role only for bare glm, the 4.x family, and the 5.0 generation, and preserves it for glm-5.1/glm-5.2 (and the Fireworks glm-5p1 point alias). The ZenMux vendor-prefixed z-ai/glm-* compressed-history rule and the ERNIE rule are unchanged. Regression guards in tests/unit/role-normalizer.test.ts. (#5610)
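
The entry describes the version-aware rule but not its code; below is a minimal sketch of such a matcher. The function name glmRejectsSystemRole and the regex are hypothetical, and unversioned variants are assumed to keep the old (stripping) behavior.

```typescript
// True only for GLM ids believed to reject the system role.
// Vendor-prefixed ids such as z-ai/glm-* are handled by a separate rule and return false here.
function glmRejectsSystemRole(modelId: string): boolean {
  const id = modelId.toLowerCase();
  if (!id.startsWith("glm")) return false;
  // "glm-4.5" -> (4, 5), "glm-5p1" -> (5, 1), "glm-5" -> (5, none)
  const m = /^glm-(\d+)(?:[.p](\d+))?/.exec(id);
  if (!m) return true; // bare "glm" or an unversioned variant: keep stripping
  const major = Number(m[1]);
  const minor = m[2] === undefined ? 0 : Number(m[2]);
  if (major < 5) return true; // 4.x family
  return major === 5 && minor === 0; // 5.0 strips; 5.1, 5.2 and later keep it
}
```

For example, glm-4.5-air and glm-5 are normalized, while glm-5.1 and glm-5p1 keep the system role.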

  • Security hardening follow-ups (v3.8.15): the auth_token cookie now sets an explicit 30-day maxAge so sessions persist as intended (Seg3); the management bootstrap warns at boot when INITIAL_PASSWORD is left at the insecure CHANGEME default (Seg2); VS Code path-token endpoints (/api/v1/vscode/raw/[token]) emit a once-per-process security warning since the API key travels in the URL and can leak via logs/proxies (Seg4); the system version route resolves the real global install path via npm root -g instead of a hardcoded /app (Bug3); and auto-update mode detection segment-matches node_modules instead of substring-matching, eliminating false "global install" positives (Bug1).
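
The Bug1 change from substring to segment matching is easy to see side by side. Both function names are hypothetical; splitting on both / and \ is an assumption so the check also works for Windows paths.

```typescript
// Old style: any path merely containing the text matches.
function isUnderNodeModulesSubstring(p: string): boolean {
  return p.includes("node_modules");
}

// Segment matching: "node_modules" must be a whole path component.
function isUnderNodeModules(p: string): boolean {
  return p.split(/[\\/]+/).includes("node_modules");
}

const backupPath = "/home/me/node_modules_backup/omniroute";
console.log(isUnderNodeModulesSubstring(backupPath)); // true (false positive)
console.log(isUnderNodeModules(backupPath));          // false
```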

  • fix(cli): rename the Node process title to omniroute so it shows correctly in ps/htop. (thanks @waguriagentic)

  • dashboard (model picker): guard against null model-alias values so opening Create Combo for a custom provider node no longer crashes. ModelSelectModal's custom-provider branch filtered modelAliases entries with a raw fullModel.startsWith(...), which threw a TypeError whenever an alias value was null/undefined (a stale/partial entry persisted to settings). The filter/map logic is extracted into a new buildNodeAliasModels helper (mirroring the sibling passthrough-alias guard, #485) that requires typeof fullModel === "string" before calling .startsWith. Regression guard: tests/unit/model-select-null-alias-guard-2247.test.ts. (thanks @wahyuzero)

  • fix(translator): strip orphaned tool results (results with no matching tool call) across request formats to avoid upstream 400s. (thanks @warelik)

  • fix(kiro): stop injecting a placeholder user turn on trailing tool-result turns so agentic loops aren't disrupted. (thanks @jetmiky)

  • fix(translator): prevent doubled tool arguments in OpenAI-to-Claude responses (duplicate finish_reason guard + string tool-input passthrough). (thanks @vishalrajv)

  • codex (agent goal streams): protect long-running agent goal streams so extended agent runs are no longer cut off prematurely. (#5772 — thanks @nguyenxvotanminh3)

  • sse (zero-width markers): strip zero-width markers from streamed responses, as the non-streaming path already does, so streamed output is byte-clean and matches non-streamed output. (#5857 — thanks @DKotsyuba)
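
A sketch of per-delta stripping, assuming the markers are the common zero-width code points in the regex below (the entry does not name the exact set). Each is a single UTF-16 code unit, so a marker cannot be split across two deltas, and stripping each chunk gives the same result as stripping the joined text. Removing U+200D (ZWJ) would also break joined emoji sequences.

```typescript
// Assumed marker set: ZWSP, ZWNJ, ZWJ, word joiner, BOM / zero-width no-break space.
const ZERO_WIDTH = /[\u200B\u200C\u200D\u2060\uFEFF]/g;

function stripZeroWidth(text: string): string {
  return text.replace(ZERO_WIDTH, "");
}

// Streaming path: apply the same stripping to every delta before forwarding it.
function* stripStream(deltas: Iterable<string>): Generator<string> {
  for (const delta of deltas) {
    yield stripZeroWidth(delta);
  }
}

const streamed = [...stripStream(["Hel\u200Blo", " wor\uFEFFld"])].join("");
const nonStreamed = stripZeroWidth("Hel\u200Blo wor\uFEFFld");
console.log(streamed === nonStreamed); // true
```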

  • usage (om-usage endpoint): restore the om-usage HTTP endpoint. (#5859 — thanks @Witroch4)

  • sse (stream readiness): tune adaptive stream-readiness timeouts so slow-first-token upstreams are handled more reliably. (#5767 — thanks @nguyenxvotanminh3)

  • security (provider node URL): harden provider node URL validation. (#5760 — thanks @nguyenxvotanminh3)

  • cli (Windows doctor): correct rootDir resolution in doctor.mjs on Windows. (#5845 — thanks @arssnndr)

  • providers (Antigravity): fix a 429 hang on credit exhaustion and apply a precise reset-time model lockout instead of stalling — cleaned re-implementation of #5823. (#5846 — thanks @Chewji9875 / @diegosouzapw)

  • providers (qwen-web): unblock the validator and chat completion — the retired endpoint is replaced and the missing SPA version header is now sent. (#5855 — thanks @janeza2)

  • providers (kimi-web): migrate to the www.kimi.com Connect-RPC API after kimi.moonshot.cn was retired. (#5858 — thanks @janeza2)

  • dashboard (CSRF): unify the dashboard CSRF origin fallback so dynamic/public origins validate correctly. (#5856 — thanks @rdself)

  • db (health check interval): preserve healthCheckInterval=0 across connection create/update instead of coercing it to a default. (#5822 — thanks @atomlong)
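
The entry does not say how the value was being coerced; a common cause of this class of bug is a || default, which treats 0 as missing. A minimal illustration, with a hypothetical default of 60:

```typescript
const DEFAULT_HEALTH_CHECK_INTERVAL = 60; // hypothetical default

// Buggy: `||` falls back on every falsy value, so 0 is silently replaced.
function intervalWithOr(input?: number | null): number {
  return input || DEFAULT_HEALTH_CHECK_INTERVAL;
}

// Fixed: `??` falls back only on null/undefined, so an explicit 0 survives.
function intervalWithNullish(input?: number | null): number {
  return input ?? DEFAULT_HEALTH_CHECK_INTERVAL;
}
```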

  • sse (claude→codex streaming): stop the reasoning-summary drop and duplicated deltas on claude→codex streaming — reasoning snapshots are now synthesized in TRANSLATE mode and the sequence-number watermark is tracked per-stream (#5786). (#5832 — thanks @diegosouzapw)

  • deps (runtime): add the missing runtime dependencies @toon-format/toon and safe-regex so the published package resolves them at runtime. (#5771 — thanks @chirag127)

  • system (Windows auto-update): route in-app auto-update npm calls through the win32 shell helper so updates run correctly on Windows (#5542). (#5797 — thanks @diegosouzapw)

  • dashboard (validation badge): show a neutral badge for unsupported validation and make OAuth error messages clickable links (#5442, #5486). (#5795 — thanks @diegosouzapw)

  • providers (metadata): correct stale/broken provider metadata (#5487, #5461, #5534, #5470). (#5790 — thanks @diegosouzapw)

  • providers (local-catalog imports): import intentional local-catalog-only providers instead of surfacing a 502 (#5460, #5465). (#5787 — thanks @diegosouzapw)

  • proxyfetch (failover): skip the failover retry for non-replayable request bodies so a consumed stream isn't re-sent empty. (#5770 — thanks @Ardem2025)
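
A sketch of a replayability check and a failover loop that respects it. Which body types count as replayable, and the Send callback, are assumptions for illustration; a stream or async-iterable body is consumed by the first attempt.

```typescript
// Bodies that can be sent again unchanged because they live in memory.
function isReplayableBody(body: unknown): boolean {
  if (body === null || body === undefined) return true;
  if (typeof body === "string") return true;
  if (body instanceof ArrayBuffer || ArrayBuffer.isView(body)) return true;
  if (body instanceof URLSearchParams) return true;
  return false; // streams, async iterables, etc.
}

type Send = (url: string, body: unknown) => Promise<string>;

async function sendWithFailover(urls: string[], body: unknown, send: Send): Promise<string> {
  const canRetry = isReplayableBody(body);
  let lastError: unknown = new Error("no upstream URLs configured");
  for (const url of urls) {
    try {
      return await send(url, body);
    } catch (err) {
      lastError = err;
      // Retrying a consumed stream would send an empty body; surface the error instead.
      if (!canRetry) break;
    }
  }
  throw lastError;
}
```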

  • batch (recovery): persist batch item checkpoints during recovery so an interrupted batch resumes from where it left off. (#5753 — thanks @ag-linden)

  • memory (Qdrant): enabling Qdrant now activates it as the retrieval engine (the auto default never selected it) and adds inline guidance (#5597). (#5741 — thanks @diegosouzapw)

  • chat (non-streaming aggregation): harden non-streaming SSE aggregation against malformed upstream event sequences. (#5746 — thanks @rdself)

  • sse (cooldown parsing): the anti-thundering-herd guard now tolerates numeric-epoch cooldown values. (#5747 — thanks @diegosouzapw)
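
Tolerating several cooldown encodings might look like the sketch below. The accepted encodings, the seconds-versus-milliseconds cutoff at 1e12 (which is September 2001 in milliseconds), and both function names are assumptions; the entry says only that numeric-epoch values are now tolerated.

```typescript
// Returns the cooldown end as epoch ms, or undefined when the value is unusable.
// Accepts epoch seconds or ms (number or numeric string) and ISO date strings.
function parseCooldownUntil(value: unknown): number | undefined {
  let n: number;
  if (typeof value === "number") {
    n = value;
  } else if (typeof value === "string" && value.trim() !== "") {
    n = /^\d+(\.\d+)?$/.test(value.trim()) ? Number(value) : Date.parse(value);
  } else {
    return undefined;
  }
  if (!Number.isFinite(n) || n <= 0) return undefined;
  // Heuristic: anything below 1e12 is treated as seconds.
  return n < 1e12 ? n * 1000 : n;
}

function isCoolingDown(value: unknown, now = Date.now()): boolean {
  const until = parseCooldownUntil(value);
  return until !== undefined && until > now;
}
```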

  • api (body size): raise the LLM API payload limit for the responses routes so larger requests aren't rejected. (#5652 — thanks @JxnLexn)

  • providers (HuggingChat): fix HuggingChat web-session routing. (#5592 — thanks @backryun)

  • sse (heap pressure): bound the chat hot-path heap — pressure-aware admission, response cap, and clone reductions — to avoid OOM under load (#5152). (#5425 — thanks @josevictorferreira)

  • providers (M365 Copilot): validate M365 Copilot web credentials. (#5432 — thanks @skyzea1)

  • providers (chatgpt-web): restore the dot-form Pro model ids. (#5549 — thanks @Thinkscape)

  • security (error stacks): avoid rendering error stacks in responses. (#5624 — thanks @KooshaPari)

  • security (linkify): restrict linkifyText hrefs to an explicit http(s) scheme allowlist. (#948d2d7 — thanks @diegosouzapw)
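
A minimal allowlist check using the WHATWG URL parser. safeHref is a hypothetical name, not linkifyText's actual internals, and rejecting relative links is also an assumption.

```typescript
const ALLOWED_SCHEMES = new Set(["http:", "https:"]);

// Returns a normalized href, or null when the text should render unlinked.
function safeHref(raw: string): string | null {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return null; // relative or malformed: do not linkify
  }
  // url.protocol is lowercased by the parser, so "JAVASCRIPT:" cannot slip through.
  return ALLOWED_SCHEMES.has(url.protocol) ? url.href : null;
}
```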

  • translator (doubled tool args): prevent doubled tool-call arguments in the OpenAI→Claude translation path. (#5828 — thanks @diegosouzapw)

  • translator (orphaned tool results): strip orphaned tool-result turns across request formats so an upstream doesn't reject a tool result with no matching call. (#5805 — thanks @diegosouzapw)

  • translator (Gemini/Claude hardening): re-apply lost defensive hardening for the Gemini merge path and Claude tool defaults. (#5706 — thanks @diegosouzapw)

  • kiro (tool-result turns): stop injecting a placeholder user turn on tool-result turns, which corrupted otherwise-valid Kiro conversations. (#5807 — thanks @diegosouzapw)

  • providers (Kiro catalog): add claude-sonnet-5 to the Kiro model catalog. (#5796 — thanks @diegosouzapw)

  • oauth (connection disambiguation): disambiguate OAuth connections on username so two different identity providers no longer overwrite each other. (#5803 — thanks @diegosouzapw)

  • github (Copilot prefill): drop the trailing assistant prefill for Copilot chat, which some Copilot models rejected. (#5802 — thanks @diegosouzapw)

  • mitm (hosts cleanup): clean up privileged /etc/hosts entries on exit when possible so a crashed/interrupted run doesn't leave stale redirects behind. (#5808 — thanks @diegosouzapw)

  • dashboard (model picker): guard null modelAliases values in the model picker so a connection with no aliases no longer throws. (#5792 — thanks @diegosouzapw)

  • dashboard (error boundaries): add error boundaries for the Combos and MITM Proxy pages so a render error no longer blanks the whole dashboard. (#5788 — thanks @diegosouzapw)

  • cli (process title): rename the running process title to omniroute. (#5791 — thanks @diegosouzapw)

  • compression (context-editing telemetry): record Context Editing telemetry on the streaming path, not just the non-streaming path. (#5761 — thanks @diegosouzapw)

  • security (v3.8.15 hardening follow-ups): land the Seg2/Seg3/Seg4/Bug3 hardening follow-ups from the v3.8.15 security review. (#5512 — thanks @diegosouzapw)

📝 Maintenance

  • docs (architecture): add docs/architecture/ROUTER_BACKENDS.md — an ADR pinning down how the routing engines (ts native, bifrost, cliproxy, 9router, VibeProxy-compatible) relate to each other along two orthogonal axes (lifecycle: in-process / supervised / external vs. relay selection backend), answering the architecture questions raised in #5603 (backend interface model, why CLIProxy spawns a process, feature-flag swapping, actionable route-contract errors). The typed router-backend registry the ADR describes lands separately via #5868. (#5891)

  • tests (autoCombo): stabilize the "getTaskFitnessWithSource identifies fitness_table as source for known models" unit test, which flaked whenever the models.dev capabilities DB was populated in CI: the fixture model gpt-4o is a real models.dev catalog id, so the fitness resolution chain returned models_dev_tier instead of the expected static fitness_table source. The fixture now uses claude-sonnet (a shortened alias absent from the models.dev catalog, matching the sibling resolution-chain test), which deterministically falls through to the static table; the exact source and score assertions are preserved (0.95 = FITNESS_TABLE.coding["claude-sonnet"]). (#5890 — thanks @KooshaPari)

  • oauth (dead-code removal): delete the superseded legacy OAuth service-class hierarchy under src/lib/oauth/services/. The live OAuth flow runs through src/lib/oauth/providers.ts + src/lib/oauth/providers/ (wired into the generic oauth/[provider]/[action] route); the old per-provider class *Service extends OAuthService implementations plus their barrel had zero production or test references. Removed oauth.ts (base class), openai.ts, github.ts, claude.ts, codex.ts, antigravity.ts, qwen.ts, qoder.ts, and the index.ts barrel (−1559 LOC). Kept the three still-live files that routes import directly by path: kiro.ts (Kiro import/exchange routes), cursor.ts (Cursor import route), and codexImport.ts (utility fns for the Codex bulk-import route). Proven safe by typecheck:core staying green (any live reference would fail the build) + a filesystem guard tests/unit/oauth-legacy-services-removed.test.ts pinning the removal against re-introduction. Salvage of the closed PR #5039. gaps v3.8.42 — T10 (5.7).

  • refactor (god-file decomposition): extracted pure leaf modules across db, sse, usage, api, memory, evals, models, resilience, and dashboard god-files (types/mappers/helpers/pure-transform leaves; behavior-preserving, test-guarded): db/providers, db/proxies, db/models, db/settings, usageAnalytics, migrationRunner (#5714, #5717, #5705, #5709, #5722, #5721); sse openai-to-gemini / cursor-protobuf / rate-limit-headers / reasoning-tag (#5824, #5794, #5736, #5734); usage families / callLogs / usageHistory / providerLimits (#5782, #5725, #5728, #5730); api provider-models discovery / unified-catalog (#5758, #5699); memory retrieval scoring (#5733); evals golden-set suites (#5740); modelsDevSync transform layer (#5743); resilience settings split (#5745); dashboard sidebarVisibility split (#5683); executor shared-utility dedup + tests (#5720 — thanks @pizzav-xyz). — thanks @diegosouzapw

  • chore (Bun script runner): adopt Bun 1.3.10 as a locked, allow-listed build/dev script runner for a small set of validated TS gate/generator scripts (Node stays the published runtime): locked runtime dependency, CI script-checks + validated-scripts run under Bun, and a bun-safe pack validator. (#5615, #5617, #5612, #5643 — thanks @KooshaPari; docs #5703 — thanks @diegosouzapw)

  • docs (sync & housekeeping): i18n CHANGELOG mirror sync for the [3.8.43] section (#5789); MCP tool count synced to 95 + routing-strategy count (#5732); README faster/leaner install notes, refreshed metrics/badges, 17-strategy + Quota-Share listing, provider counts, and grammar fixes (#5713, #5738 — thanks @chirag127); security docs for banned-keyword/account-ban detection (#5756) and the full LOCAL_ONLY route set + GHSA advisory + audit path (#5748); relay backend-routing contract clarification (#5621 — thanks @KooshaPari); release-freeze scoped to /generate-release only (#5839); .editorconfig repository standards (#5879 — thanks @shiva24082). — thanks @diegosouzapw

  • test/ci (stabilization & ratchets): guard the tsx/esm→esbuild boot transform (#5773); align t3-web web-session metadata (#5835); repoint the sidebar quota-share placement scan (#5711); lightweight health probe for batch e2e (#5651 — thanks @KooshaPari); make release-green pre-flight gates visible + bounded (#5644); stabilize nightly-mutation (tap.testFiles drift guard + anti-flake eps) (#5682); close the QG v2 tail (#5681); normalize check route paths on Windows (#5613 — thanks @KooshaPari); pass sonar.projectVersion to the SonarQube scan (#5880); plus stryker tap.testFiles registration, compression-studio smoke re-anchoring, rtk_discover de-flake, and v3.8.43-cycle ratchet rebaselines (deadExports 225→227, complexity 1981→1982, cognitive-complexity 842→845, eslintWarnings 4121→4158→4199). — thanks @diegosouzapw

  • refactor (oauth): remove dead legacy OAuth service classes. (#5838 — thanks @diegosouzapw)

🙌 Contributors

Thanks to everyone whose work landed in v3.8.43:

Contributor PRs / Issues
@ag-linden #5753
@Ardem2025 #5770
@arssnndr #5845
@atomlong #5822
@backryun #5592
@baslr direct commit / report
@Chewji9875 #5563, #5579, #5846
@chirag127 #5738, #5771
@DKotsyuba #5857
@hartmark #5834
@ishatiwari21 #5799
@janeza2 #5855, #5858
@jetmiky direct commit / report
@josevictorferreira #5425
@JxnLexn #5652
@KooshaPari #5613, #5621, #5624, #5629, #5643, #5651, #5890
@KunN-21 direct commit / report
@manhdzzz direct commit / report
@nguyenxvotanminh3 #5760, #5767, #5772
@noir017 direct commit / report
@pizzav-xyz #5720
@rdself #5746, #5856
@shiva24082 #5879
@skyzea1 #5432, #5701
@Stazyu #5557
@Thinkscape #5549
@vishalrajv direct commit / report
@voravitl direct commit / report
@waguriagentic direct commit / report
@wahyuzero direct commit / report
@warelik direct commit / report
@Witroch4 #5731, #5859, #5863
@diegosouzapw maintainer — cycle reconciliation, release-close base-red fixes, god-file decomposition, compression/memory features

Full Changelog: v3.8.42...v3.8.43
