diegosouzapw/OmniRoute v3.8.32 on GitHub

✨ New Features

feat(oauth): import accounts from CLIProxyAPI — Settings → CLIProxyAPI now has an "Import accounts" button that reads the OAuth accounts CLIProxyAPI already saved in ~/.cli-proxy-api/ and imports them as OmniRoute connections, so you don't have to log into every account individually. CLIProxyAPI's unified auth-file format is parsed by type discriminator and the supported account types (Gemini, Codex, Claude/Anthropic, Antigravity, Qwen, Kimi) are upserted; unknown types are skipped. The preview never exposes tokens to the client. (thanks @powellnorma)
feat(routing): opt-in setting to echo the requested alias/combo name in the response model field — Settings → Routing now has an "Echo requested model name in responses" toggle (default off). When enabled, the response model field (non-streaming and every streamed SSE chunk) reports the alias or combo name the client requested instead of the upstream model name, so strict clients such as Claude Desktop, which reject with a 401 any response whose model does not match the request, work with aliases and combos. (thanks @thaiphuong1202)
feat(providers): expand the openai and gemini direct registries with first-class variants already known elsewhere — the openai provider entry now exposes gpt-4.1-mini, gpt-4.1-nano, o3-mini, and o4-mini (the latter two carry REASONING_UNSUPPORTED like o3), and the gemini entry now exposes gemini-2.0-flash-lite and gemini-3-flash-lite-preview. These models were already first-class throughout sibling subsystems (cost estimator, task fitness, free-model catalog, multiple aggregator registries) but happened to be missing from the direct openai/gemini namespaces. Embedding/TTS/image-gen models stay in their dedicated registries (embeddingRegistry.ts, audioRegistry.ts, imageRegistry.ts); legacy ids OmniRoute curated out (o1, gpt-4-turbo, …) are not restored. (thanks @East-rayyy)
feat(translator): OpenAI SSE → Gemini SSE conversion for /v1beta/models/{model}:streamGenerateContent — the @google/genai SDK (Gemini CLI) always calls :streamGenerateContent?alt=sse for chat and expects Gemini SSE chunks (no [DONE] sentinel — the stream just closes). The v1beta route was forwarding OpenAI SSE from handleChat unchanged, so the SDK crashed on the OpenAI [DONE] line with SyntaxError: Unexpected token 'D', "[DONE]" is not valid JSON. A new transformOpenAISSEToGeminiSSE() (in open-sse/translator/response/openai-to-gemini-sse.ts) rewrites each OpenAI delta into candidates[].content.parts[], maps finish_reason → finishReason (STOP / MAX_TOKENS / SAFETY), attaches usageMetadata + modelVersion on the final chunk, and surfaces reasoning_content as { thought: true } parts for thinking models. The non-streaming :generateContent action gets a sibling convertOpenAIResponseToGemini() for the JSON path. Streaming intent is now keyed off the URL action suffix (canonical Gemini convention) rather than the non-standard generationConfig.stream body field. (thanks @SteelMorgan)
feat(compression): unified compression configuration panel (Phase 1) — /dashboard/context/settings is now the single source of truth for compression: a master toggle plus per-engine on/off and level controls, with the dispatch pipeline derived from a stored engines map on CompressionConfig. A gate (enginesExplicit) ensures the new map only drives dispatch when an engines row was actually saved from the panel, so legacy/backfilled installs (the seeded default combo from migrations 042/043) keep their existing defaultMode behavior unchanged. The default-combo and per-engine routes are shimmed (410). (#4432 — thanks @diegosouzapw)
feat(mcp): register the web-session pool observability tools — the poolTools MCP tool set (web-session pool stats/health) was defined but never wired into createMcpServer(), so it was dead. It is now registered in server.ts with withScopeEnforcement against the typed read:health / write:resilience scopes (no enum inflation), giving MCP clients visibility into the pooled web-session lifecycle. (#4399, #3368 — thanks @diegosouzapw)
feat(providers): stronger no-auth and web-cookie provider validation (AUTH_007) — provider connection validation now handles no-auth and web-cookie providers explicitly: instead of returning a generic "Provider validation not supported", these providers report a precise AUTH_007 status so the dashboard surfaces actionable validation feedback for cookie/no-auth flows. (#4023 — thanks @oyi77)
feat(combo): per-combo stickyRoundRobinLimit override on the combos page — the round-robin sticky-affinity limit can now be set per combo from the combos page UI, overriding the global default, so a combo can pin (or loosen) how many consecutive requests stick to the same round-robin member independently of the others. (#4472 — thanks @adivekar-utexas)
feat(usage): quota fetch for kimi-coding-apikey — usage/quota tracking now supports the kimi-coding-apikey provider, so its remaining quota is fetched and surfaced like the other quota-aware providers. (#4435 — thanks @janeza2)
feat(cluster): opt-in memory + Bifrost cluster profiles — adds opt-in cluster profiles that wire the memory subsystem and the Bifrost Go sidecar into a clustered deployment (follow-up to #3932). (#4433 — thanks @KooshaPari)
feat(models): opt-in low-noise /v1/models catalog mode — a new opt-in mode trims the /v1/models response to a quieter, lower-noise catalog for clients that choke on or don't need the full provider/model list. (#4427 — thanks @Rahulsharma0810)
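
To make the feat(translator) entry above (OpenAI SSE → Gemini SSE) more concrete, the per-chunk rewrite might look roughly like the sketch below. This is not the code in open-sse/translator/response/openai-to-gemini-sse.ts: the function name convertSseLine, the trimmed interfaces, and the exact finish_reason pairing (stop → STOP, length → MAX_TOKENS, content_filter → SAFETY) are assumptions made for illustration.

```typescript
// Hypothetical, trimmed shapes: only the fields named in the translator entry.
interface OpenAIChunk {
  model?: string;
  choices: Array<{
    delta: { content?: string; reasoning_content?: string };
    finish_reason?: string | null;
  }>;
  usage?: { prompt_tokens: number; completion_tokens: number; total_tokens: number };
}

interface GeminiPart {
  text: string;
  thought?: true;
}

interface GeminiChunk {
  candidates: Array<{
    content: { role: "model"; parts: GeminiPart[] };
    finishReason?: string;
  }>;
  usageMetadata?: {
    promptTokenCount: number;
    candidatesTokenCount: number;
    totalTokenCount: number;
  };
  modelVersion?: string;
}

// Assumed pairing of OpenAI finish_reason values to the Gemini names in the entry.
const FINISH_REASON_MAP: Record<string, string> = {
  stop: "STOP",
  length: "MAX_TOKENS",
  content_filter: "SAFETY",
};

// Converts one OpenAI SSE line into a Gemini SSE event, or returns null when
// nothing should be emitted (non-data lines, blank payloads, the [DONE] sentinel).
function convertSseLine(line: string): string | null {
  if (!line.startsWith("data:")) return null;
  const payload = line.slice("data:".length).trim();
  if (payload === "" || payload === "[DONE]") return null;

  const chunk = JSON.parse(payload) as OpenAIChunk;
  const choice = chunk.choices[0];
  const delta = choice?.delta;

  const parts: GeminiPart[] = [];
  const reasoning = delta?.reasoning_content;
  if (reasoning) parts.push({ text: reasoning, thought: true });
  const content = delta?.content;
  if (content) parts.push({ text: content });

  const candidate: GeminiChunk["candidates"][number] = {
    content: { role: "model", parts },
  };
  const out: GeminiChunk = { candidates: [candidate] };

  const finish = choice?.finish_reason;
  if (finish) {
    // Final chunk: unmapped reasons fall back to STOP in this sketch.
    candidate.finishReason = FINISH_REASON_MAP[finish] ?? "STOP";
    if (chunk.model) out.modelVersion = chunk.model;
    if (chunk.usage) {
      out.usageMetadata = {
        promptTokenCount: chunk.usage.prompt_tokens,
        candidatesTokenCount: chunk.usage.completion_tokens,
        totalTokenCount: chunk.usage.total_tokens,
      };
    }
  }
  return `data: ${JSON.stringify(out)}\n\n`;
}
```

Returning null for the [DONE] line is what keeps the sentinel away from a Gemini client: the caller simply closes the stream once the upstream ends.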

🐛 Fixed

fix(embeddings): forward output dimensions to Gemini for consistent embedding dims. (thanks @nguyenha935)
fix(translator): sanitize Read tool args from non-Anthropic models to prevent retry loops. (thanks @GodrezJr2)
fix(usage): reuse Gemini CLI project ID for quota checks (avoid re-discovery). (thanks @Delcado19)
fix(dashboard): surface manual config CTA when Claude CLI detection fails (remote deployments). (thanks @anuragg-saxenaa)
fix(executors): granular reasoning_effort handling for Claude models on GitHub Copilot. (thanks @baslr)
fix(translator): strip Claude output_config before MiniMax (rejected upstream). (thanks @hiepau1231)
fix(translator): OpenAI audio input now reaches Gemini/Antigravity instead of being silently dropped — input_audio/audio content parts on the OpenAI→Gemini path matched no handler in convertOpenAIContentToParts and were discarded with no error. They are now mapped to a Gemini inlineData part with an audio/<format> mime type (wav, mp3, …). (thanks @mugnimaestra)
fix(combo): model lockout now honors a long upstream quota reset instead of retrying within minutes — when a combo target returned a quota error carrying an explicit long reset (e.g. an Antigravity "Resets in 160h27m24s" message or a Retry-After header), the per-model lockout was capped at the short base cooldown (~minutes) and discarded the parsed reset, so the exhausted model kept being retried far too early. The lockout now applies the parsed reset when it exceeds the base cooldown, and the Antigravity error-message parser also matches the plural "Resets in …" phrasing. (thanks @Ansh7473)
fix(antigravity): Claude models no longer 400 with Unknown name "output_config" — Anthropic/Claude-Code-only fields (output_config, legacy output_format) leaked into the Google Cloud Code request envelope via its top-level field passthrough, and Google rejects unknown envelope fields with 400 Invalid JSON payload received. Unknown name "output_config" — breaking every Claude model served through Antigravity in IDEs. Those fields are now dropped before the envelope is built. (thanks @Duongkhanhtool)
fix(combo): round-robin members fail over faster under concurrency saturation via a configurable queue depth — when a round-robin combo member was saturated, requests sat in the per-model semaphore's unbounded queue and only failed over to the next member after the full queueTimeoutMs (default 30s) elapsed — so a burst of agentic requests deep-queued one hot member instead of spilling to healthy ones. The per-model semaphore now accepts a bounded queue depth and emits SEMAPHORE_QUEUE_FULL once it is full (the round-robin loop already cascades on that code), so a configured low depth fails over immediately. A new queueDepth combo-config knob (global default / provider override / per-combo, default 20 for backward compatibility; 0 = never queue → fail over now) is exposed in Settings → Combo Defaults. (#3872 — thanks @KooshaPari)
fix(pricing): align Claude Code (cc) pricing with current Anthropic per-MTok rates — the cc provider block in the default pricing table had stale numbers across every Claude 4.x family entry — most visibly, claude-opus-4-5-20251101 was billed at the deprecated Opus 4.1 rate (input $15 / output $75), and claude-haiku-4-5-20251001 was at half the current Haiku 4.5 rate. The cached (cache hit) and cache_creation (5-minute cache write) multipliers were also off across Opus 4.6/4.7/4.8, Sonnet 4.5/4.6, Haiku 4.5, and Fable 5. All eight entries now match the rates Anthropic publishes (input, 5m cache write at 1.25x input, cache hit at 0.1x input, output; reasoning billed at the output rate), so cost accounting on the dashboard and per-request usage events stop under- or over-reporting Claude Code spend. (thanks @chulanpro5)
fix(executors): sanitize Anthropic-shape content parts before GitHub Copilot /chat/completions — Claude models on GitHub Copilot driven from clients like Cursor IDE (e.g. gh/claude-sonnet-4.6) failed with Provider returned error: type has to be either 'image_url' or 'text' (reset after 30s) because the client passed through Anthropic-shape content parts (tool_use, tool_result, thinking) untouched, and the Copilot chat-completions endpoint only accepts text/image_url. GithubExecutor.transformRequest now serializes any unsupported part type as text (preserving the model's context), drops empty parts, and collapses to null when an assistant message's only content was tool_calls — tool_calls ride alongside untouched. Codex-family models still route through /responses unchanged. (thanks @cngznNN)
fix(sse): refactor stall detection to reduce false positives on slow but progressing streams. (thanks @zakirkun)
fix(executors): synthesize x-opencode-request for custom-named OpenCode providers — the OpenCode CLI only emits the x-opencode-* header set when the provider id starts with opencode; a custom-named provider (e.g. omniroute) instead sends x-session-affinity / x-session-id (mapped to x-opencode-session since #4022) but no request-correlation id, so x-opencode-request was silently dropped. OpencodeExecutor now synthesizes a fresh x-opencode-request on that session-affinity fallback path so custom-named providers are not disadvantaged on the opencode.ai upstream. x-opencode-client / x-opencode-project are intentionally not fabricated (no valid client source — an invented value risks upstream rejection) and remain forward-only; DefaultExecutor is untouched. (#4465 — thanks @pizzav-xyz)
fix(compression): RTK now compresses Anthropic-shape tool_result blocks — applyRtkCompression only compressed OpenAI-shape tool results (role:"tool"); Anthropic-shape tool results (tool_result content blocks inside a role:"user" message) were skipped, so coding agents speaking the Anthropic Messages format got zero RTK savings even though RTK's command-aware filters (e.g. git-status) would have compressed the output. RTK now treats a message containing a tool_result block as eligible (gated by applyToToolResults), captures Anthropic tool_use blocks for command resolution, and compresses each block's inner text (string or nested text-block array) while preserving type + tool_use_id exactly — matching what caveman/aggressive already did. (#4468 — thanks @diegosouzapw)
fix(dashboard): request-log auto-refresh no longer dies from a "ghost" load-more on first page load — the request-log viewer's infinite-scroll IntersectionObserver uses a 200px rootMargin, so its sentinel was already intersecting on mount whenever the first page didn't fill the scroll container. That fired a loadMore() with no user interaction, growing the window past PAGE_SIZE — and auto-refresh only polls while on the first page (limit <= pageSize), so it stayed permanently paused (only a manual filter change re-armed it). The observer now grows the window only after a genuine user scroll (new pure shouldTriggerInfiniteScroll guard), and a filter change re-arms the guard, so the default first-page view resumes its ~10s auto-refresh. (#4269 — thanks @tjengbudi)
fix(sse): large /v1/chat/completions requests no longer crash the server with a Node heap OOM — the chat request body was parsed multiple times along the route (route guard, injection guard, handler), buffering very large payloads several times and pushing concurrent agentic traffic into an out-of-memory crash. The body is now parsed once at the route guard and threaded through, so each request is buffered a single time. (#4380 — thanks @NakHalal)
fix(guardrails): tighten the system_prompt_leak heuristic to stop false positives on agent traffic — the leak detector flagged normal agent/tool conversations as prompt-leak attempts; it now requires an additional qualifier before flagging, so legitimate agent traffic is no longer blocked. (#4041 — thanks @KooshaPari)
fix(translator): drop orphan tool results on the Claude→OpenAI request path — a tool_result with no preceding matching tool_use (orphan) produced upstream 500/502 errors for Command Code / Custom OpenAI clients on ≥3.8.26. Orphan tool results are now filtered before the request is sent. (#4385 — thanks @adityapnusantara)
fix(providers): register API-key validators for Firecrawl and Jina Reader — both providers returned "Provider validation not supported" when validating their API key; they now have proper validators registered in SEARCH_VALIDATOR_CONFIGS. (#4401 — thanks @ponkcore)
fix(providers): generic web-cookie validator must not shadow per-provider validators — a follow-up to the AUTH_007 validation work (#4023): the generic web-cookie validator was matching before more specific per-provider validators, so provider-specific validation was skipped. Validator resolution now prefers the per-provider validator. (#4467 — thanks @diegosouzapw)
fix(translator): inject a placeholder message when the Responses API input[] is empty — a POST /v1/responses with input: [] translated to messages: [], which every upstream Chat-Completions provider rejects (surfaced as a confusing 406); a single placeholder user message is now injected, mirroring the existing empty-string handling. (#4393 — thanks @diegosouzapw)
fix(providers): serve the api.airforce live /models catalog instead of the stale seed — the api.airforce provider listed a stale hard-coded seed; it now serves the upstream live /models catalog. (#4395 — thanks @diegosouzapw)
fix(cli): non-interactive-safe prompts + context alias — the CLI's confirm()/prompt helpers no longer hang in non-interactive (piped/CI) contexts, and a singular context alias is accepted alongside contexts; the contexts workflow is documented. (#4439, #4397 — thanks @diegosouzapw)
fix(cli): omniroute update no longer reports a stale "latest" version from npm's cache — getLatestVersion() ran npm view omniroute version without --prefer-online, so npm could serve a cached value from its HTTP cache and tell users on an older build (e.g. 3.8.30) they were already "running the latest version" even after a newer one (3.8.31) was published. The version check now passes --prefer-online to force npm to revalidate against the registry. (#4376 — thanks @akbardwi)
fix(sse): web_search_20250305 no longer 400s on MiniMax's Anthropic-compatible endpoint — PR #2960 added a Claude→Claude bypass that forwards Anthropic's typed server tool web_search_20250305 untouched, assuming the Claude-format upstream implements Anthropic server tools. MiniMax's /anthropic endpoint does not, so claude → minimax requests carrying that tool got HTTP 400 "invalid params, function name or parameters is empty (2013)". supportsNativeWebSearchFallbackBypass now consults the (already-plumbed) provider and excludes providers known not to implement server tools (currently minimax) from the bypass, so the built-in web-search tool is converted to the omniroute_web_search function fallback — which MiniMax accepts as a normal function tool. (#4481 — thanks @shafqatevo)
fix(command-code): pass reasoning / thinking fields through to upstream params — Command Code requests carrying reasoning/thinking controls had those fields dropped before the upstream call, so reasoning-effort and extended-thinking settings were silently ignored; they are now forwarded to the upstream params. (#4473 — thanks @adivekar-utexas)
fix(usage): keep Kiro overage-enabled accounts routable after base quota hits zero — a Kiro account with overage enabled was excluded from routing once its base quota reached zero, even though overage billing should keep it serving; such accounts now stay routable past base-quota exhaustion. (#4469 — thanks @heaven321357 / @CleanDev-Fix)
fix(providers): model-aware supportsRedactedThinking for mixed-format providers — the redacted-thinking capability was resolved per provider rather than per model, so a mixed-format provider (some models support redacted thinking, others don't) got the wrong answer for some models; the check is now model-aware. (#4479 — thanks @TF0rd)
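
The lockout rule from the fix(combo) quota-reset entry above can be sketched as follows. This is a minimal illustration and not OmniRoute's parser: parseResetMs, lockoutMs, and the regular expression are hypothetical, and only the h/m/s duration form quoted in that entry is handled (a Retry-After header is not).

```typescript
// Hypothetical parser for messages such as "Resets in 160h27m24s".
// Accepts both "Reset in" and "Resets in"; returns milliseconds, or null when
// the message carries no duration.
function parseResetMs(message: string): number | null {
  const m = /Resets? in\s+(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?/i.exec(message);
  if (!m) return null;
  const [, h, min, s] = m;
  if (h === undefined && min === undefined && s === undefined) return null;
  const seconds = Number(h ?? 0) * 3600 + Number(min ?? 0) * 60 + Number(s ?? 0);
  return seconds * 1000;
}

// The parsed reset is applied only when it exceeds the base cooldown;
// otherwise the base cooldown stands.
function lockoutMs(baseCooldownMs: number, parsedResetMs: number | null): number {
  return parsedResetMs !== null && parsedResetMs > baseCooldownMs
    ? parsedResetMs
    : baseCooldownMs;
}
```

With a base cooldown of a few minutes, lockoutMs(baseCooldownMs, parseResetMs(errorMessage)) keeps an exhausted model locked for the whole upstream reset window instead of retrying it after the short cooldown.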

🔒 Security

fix(sse): use a crypto-secure RNG for combo/deck load-balancing selection — random combo/deck member selection used a non-cryptographic PRNG, flagged by CodeQL (#665); it now uses a crypto-secure RNG. (#4457 — thanks @diegosouzapw)
fix(sse): unbiased crypto.randomInt for combo selection (follow-up to #4457) — the initial crypto-secure conversion used modulo reduction over the secure bytes, which introduces a small modulo bias; selection now uses crypto.randomInt (rejection sampling) for a uniform, unbiased distribution across combo/deck members. (#4462 — thanks @diegosouzapw)
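
The difference between the two Security entries above can be illustrated with Node's built-in crypto module. The helper names pickWithModulo and pickUniform and the members array are hypothetical and do not come from OmniRoute's code; only randomBytes and randomInt are real Node APIs.

```typescript
import { randomBytes, randomInt } from "node:crypto";

// Modulo reduction over secure bytes: the bytes are unpredictable, but when
// 2^32 is not a multiple of members.length, the low indices are chosen
// slightly more often than the high ones (modulo bias).
function pickWithModulo<T>(members: readonly T[]): T {
  if (members.length === 0) throw new RangeError("no members to pick from");
  const n = randomBytes(4).readUInt32BE(0);
  return members[n % members.length] as T;
}

// crypto.randomInt(max) returns an integer in [0, max) and is documented to
// avoid modulo bias, so every member is equally likely.
function pickUniform<T>(members: readonly T[]): T {
  if (members.length === 0) throw new RangeError("no members to pick from");
  return members[randomInt(members.length)] as T;
}
```

For small combos the bias in the modulo version is tiny, which is why the entry describes it as small; randomInt removes it without adding any extra code at the call site.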

📝 Maintenance

refactor(chatCore): extract resolveChatCoreRequestSetup (first setup-phase slice) toward modularizing the chatCore god-file. (#4392 — thanks @diegosouzapw)
refactor(chatCore): extract the Codex service-tier resolvers into a pure chatCore/serviceTier.ts leaf (continues the god-file split). (#4477, #3501 — thanks @diegosouzapw)
perf(dashboard): lazy-load the usage analytics charts so the dashboard's initial bundle/paint is lighter (charts hydrate on demand). (#4466 — thanks @KooshaPari)
perf(kiro): cut request-completion hot-path CPU and cap the DB-lock event-loop block so Kiro request completion does not stall the event loop under load. (#4459 — thanks @artickc)
fix(catalog): restore-green — add OpenAI gpt-4.1-mini/gpt-4.1-nano + o3-mini/o4-mini pricing rows to keep the static-parity gate green after the registry expansion (#4394), plus the web-cookie validator shadowing fix. (#4447 — thanks @diegosouzapw)
chore(quality): reconcile file-size + complexity baselines after the /review-prs round, and the server.ts file-size baseline after the pool-tools registration (#3368). (#4461, #4423 — thanks @diegosouzapw)
docs(remote-mode): add a copy-paste end-to-end verification example. (#4430 — thanks @diegosouzapw)
docs: add operational documentation (usage/quota, database, open-sse architecture, monitoring). (#3455 — thanks @oyi77)

What's Changed

Release v3.8.32 by @diegosouzapw in #4418

Full Changelog: v3.8.31...v3.8.32