diegosouzapw/OmniRoute v3.8.21 on GitHub

✨ Added

feat(cli): omniroute autostart now accepts the shorthand the headless / omniroute serve path was missing — omniroute autostart on / ... true (aliases of enable), ... off / ... false (aliases of disable), a new ... toggle, and a default ... status (bare omniroute autostart is a safe read-only). Previously autostart could only be toggled from the tray (serve --tray) or the Electron Appearance tab, so a plain omniroute serve user had no way to enable it. (The cross-platform launchd/systemd/registry logic is unchanged — this only wires the ergonomic CLI surface.) (#3331 — thanks @uniQta)
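A minimal sketch of how the shorthand in #3331 could map onto actions. The names `resolveAutostartAction` and `AutostartAction` are hypothetical and not taken from the OmniRoute source; only the alias table follows the entry above.

```typescript
// Hypothetical sketch; names are illustrative, not OmniRoute's actual code.
type AutostartAction = "enable" | "disable" | "toggle" | "status";

// Alias table as described in the release note.
const AUTOSTART_ALIASES = new Map<string, AutostartAction>([
  ["enable", "enable"],
  ["on", "enable"],
  ["true", "enable"],
  ["disable", "disable"],
  ["off", "disable"],
  ["false", "disable"],
  ["toggle", "toggle"],
  ["status", "status"],
]);

function resolveAutostartAction(arg?: string): AutostartAction {
  // A bare `omniroute autostart` stays read-only and only reports status.
  if (arg === undefined) return "status";
  const action = AUTOSTART_ALIASES.get(arg);
  if (action === undefined) {
    throw new Error(`Unknown autostart argument: ${arg}`);
  }
  return action;
}
```

Under these assumptions, `resolveAutostartAction("on")` and `resolveAutostartAction("true")` both resolve to `"enable"`, and calling it with no argument resolves to `"status"`.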

♻️ Code Quality

refactor(chatCore): extract the chatCore request phases — idempotency check, semantic cache check, common request sanitization, and memory/skills injection — into dedicated open-sse/handlers/chatCore/ modules (idempotency.ts, semanticCache.ts, sanitization.ts, memorySkillsInjection.ts), slimming the monolithic handler with no behavior change. (Maintainer follow-up: re-derive idempotencyKey at the Phase 9.2 save site after the check moved into the module, fixing a ReferenceError on successful non-cached responses.) (#3598 — thanks @oyi77)
docs(opencode-provider): soft-deprecate @omniroute/opencode-provider in favour of @omniroute/opencode-plugin. The provider package writes a static model list to opencode.json that drifts behind the live OmniRoute catalog, whereas the plugin fetches /v1/models at OpenCode startup. The package keeps working (no code/behavior change), but its npm description and README now carry a deprecation banner with the one-line migration, and a guard test pins the notice. (#3419 — thanks @herjarsa)
chore(review): pre-release hardening from a multi-reviewer /review-reviews battery over the v3.8.21 diff (7 Opus reviewers; zero blocker/high). Resolved findings: npm tarball no longer ships co-located test files (files[] negations + reconciled .npmignore; the #3578 closure gate now asserts the real npm pack output in both directions); getSanitizedCachedProviderLimitsMap scopes its connection scan to antigravity/agy instead of decrypting every active connection on each dashboard poll; the Antigravity quota-tier remap (toClientAntigravityQuotaModelId) is centralized in antigravityModelAliases.ts (was an inline if-ladder in usage.ts); the chatCore idempotency check returns its resolved key so the save site reuses a single derivation; and new tests pin the chatCore extracted modules, the Antigravity usage_history fallback contract, the reasoning-wrapper prefix-preservation heuristic, the Antigravity SSE markdown branch, and the upstream-ca/test no-persist guarantee. (Live-verified that agy consumer tokens are accepted by the non-daily cloudcode-pa host used by retrieveUserQuota, so #3604 is not agy-host-limited.)

🔧 Bug Fixes

fix(routing): reasoning models (deepseek-v4-flash, nemotron, etc.) no longer return empty content in combo routing when they spend all of max_tokens on reasoning — validateResponseQuality now rejects an empty-content-but-reasoning_content response when reasoning consumed ≥90% of completion tokens (so the combo loop retries/falls back), and reasoning models receive a max_tokens buffer (+50%, +1000 floor) so reasoning and content both fit. (Maintainer follow-up: the round-robin buffer is applied to a per-attempt copy so it does not compound across models/retries — 4096 → 6144 → 9216 → ….) (#3588 — thanks @herjarsa)
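The buffer arithmetic in #3588 can be sketched as follows. This assumes "+50%, +1000 floor" means the buffer is the larger of 50% of max_tokens and 1000 tokens, which matches the 4096 → 6144 step quoted above. Apart from max_tokens, every name here (`withReasoningBuffer`, `buildAttemptRequest`, `reasoningStarvedContent`) is hypothetical.

```typescript
// Hypothetical sketch of the behavior described in the note, not the project's code.
interface ChatRequest {
  model: string;
  max_tokens?: number;
}

// Assumed reading of "+50%, +1000 floor": add max(50% of max_tokens, 1000).
function withReasoningBuffer(maxTokens: number): number {
  const buffer = Math.max(Math.ceil(maxTokens * 0.5), 1000);
  return maxTokens + buffer;
}

// Build a per-attempt copy so the buffer never compounds across retries:
// the caller's request object is left untouched.
function buildAttemptRequest(
  base: ChatRequest,
  model: string,
  isReasoningModel: boolean,
): ChatRequest {
  const attempt: ChatRequest = { ...base, model };
  if (isReasoningModel && typeof base.max_tokens === "number") {
    attempt.max_tokens = withReasoningBuffer(base.max_tokens);
  }
  return attempt;
}

// Quality check: empty content while reasoning used >= 90% of completion tokens.
function reasoningStarvedContent(
  content: string,
  reasoningTokens: number,
  completionTokens: number,
): boolean {
  if (content.trim() !== "" || completionTokens <= 0) return false;
  return reasoningTokens / completionTokens >= 0.9;
}
```

Because each attempt is derived from the original request, two consecutive attempts starting from max_tokens 4096 both receive 6144, rather than the second growing to 9216.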
fix(routing): a valid max_tokens-truncated upstream response is no longer misclassified as empty content and rewritten into a fake 502 — isEmptyContentResponse() flagged any Claude content:[] / OpenAI empty-choice payload regardless of stop_reason/finish_reason, so a Claude Code max_tokens: 1 connectivity ping (HTTP 200, stop_reason:"max_tokens", empty content) became a synthetic 502 "Provider returned empty content" and triggered a needless family fallback. The guard now treats a terminal truncation/tool signal (Claude stop_reason max_tokens/tool_use, OpenAI finish_reason length/tool_calls) as a legitimate completion; genuinely empty responses (no terminal reason, or stop/end_turn with empty content) are still caught. (#3572)
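The revised guard from #3572 can be approximated like this. It is a simplified sketch under assumed response shapes; `looksEmptyClaude` and `looksEmptyOpenAI` are hypothetical names standing in for the logic inside isEmptyContentResponse(), whose real signature is not shown in these notes.

```typescript
// Hypothetical, simplified response shapes for illustration only.
interface ClaudeLikeResponse {
  content?: unknown[];
  stop_reason?: string | null;
}
interface OpenAILikeResponse {
  choices?: {
    finish_reason?: string | null;
    message?: { content?: string | null };
  }[];
}

// Terminal truncation/tool signals listed in the release note.
const CLAUDE_TERMINAL_REASONS = new Set(["max_tokens", "tool_use"]);
const OPENAI_TERMINAL_REASONS = new Set(["length", "tool_calls"]);

function looksEmptyClaude(res: ClaudeLikeResponse): boolean {
  const hasContent = Array.isArray(res.content) && res.content.length > 0;
  if (hasContent) return false;
  // Empty content is legitimate when the model was truncated or called a tool.
  const reason = res.stop_reason ?? "";
  return !CLAUDE_TERMINAL_REASONS.has(reason);
}

function looksEmptyOpenAI(res: OpenAILikeResponse): boolean {
  const choice = res.choices?.[0];
  if (choice === undefined) return true;
  const text = choice.message?.content ?? "";
  if (text !== "") return false;
  const reason = choice.finish_reason ?? "";
  return !OPENAI_TERMINAL_REASONS.has(reason);
}
```

With this shape, a max_tokens: 1 connectivity ping that returns content:[] with stop_reason "max_tokens" is passed through, while content:[] with stop_reason "end_turn" is still flagged as empty.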
fix(api): /v1/completions now returns the legacy OpenAI Completions shape (object:"text_completion", choices[].text) instead of chat payloads (choices[].message|delta.content) — the endpoint routes internally through the chat pipeline, so legacy Completion clients like TabbyML's openai/completion backend crashed with missing field "text". The response (both non-streaming JSON and the SSE stream) is now translated back to the text-completion shape; [DONE] and error bodies pass through unchanged. (#3571)
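As a rough illustration of the translation in #3571, the sketch below maps a chat payload to the legacy shape and rewrites a single SSE data line. The helper names (`toTextCompletion`, `translateSseDataLine`) and the trimmed-down types are assumptions made for this example; the actual OmniRoute translator may handle more fields (for instance logprobs).

```typescript
// Hypothetical, reduced types; only the fields named in the note are modelled.
interface ChatChoice {
  index: number;
  finish_reason: string | null;
  message?: { content?: string | null };
  delta?: { content?: string | null };
}
interface ChatPayload {
  id: string;
  created: number;
  model: string;
  choices: ChatChoice[];
  usage?: Record<string, number>;
}
interface TextCompletion {
  id: string;
  object: "text_completion";
  created: number;
  model: string;
  choices: { index: number; text: string; finish_reason: string | null }[];
  usage?: Record<string, number>;
}

// choices[].message.content (or delta.content when streaming) becomes choices[].text.
function toTextCompletion(chat: ChatPayload): TextCompletion {
  const out: TextCompletion = {
    id: chat.id,
    object: "text_completion",
    created: chat.created,
    model: chat.model,
    choices: chat.choices.map((c) => ({
      index: c.index,
      text: c.message?.content ?? c.delta?.content ?? "",
      finish_reason: c.finish_reason,
    })),
  };
  if (chat.usage !== undefined) out.usage = chat.usage;
  return out;
}

// Rewrites one SSE line; [DONE], error bodies, and non-data lines pass through.
function translateSseDataLine(line: string): string {
  const prefix = "data: ";
  if (!line.startsWith(prefix)) return line;
  const payload = line.slice(prefix.length).trim();
  if (payload === "[DONE]") return line;
  let parsed: unknown;
  try {
    parsed = JSON.parse(payload);
  } catch (err) {
    return line; // not JSON: leave untouched
  }
  if (typeof parsed !== "object" || parsed === null) return line;
  if ("error" in parsed) return line;
  if (!Array.isArray((parsed as ChatPayload).choices)) return line;
  return prefix + JSON.stringify(toTextCompletion(parsed as ChatPayload));
}
```

A legacy client that reads choices[0].text, such as the TabbyML backend mentioned above, would then find the field it expects in both the JSON body and each streamed chunk.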
fix(usage): the z.ai/GLM coding-plan quota card no longer shows "Monthly 0%" — coding plans have no monthly cap (only 5-hour windows), so the quota API reports the TIME_LIMIT ("Monthly") entry with total=0, and the total>0 ? … : 0 fallback rendered a misleading 0% remaining (which can skew downstream model-choice). With no absolute cap the remaining percentage now falls back to the percentage-derived value (full/100% when 0% used). (#3580)
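The fallback in #3580 amounts to a small change in how the remaining percentage is derived. The sketch below is only an approximation: the field names (`total`, `remaining`, `percentageUsed`) and the helper `remainingPercent` are hypothetical, and it assumes the quota API reports usage as a 0-100 percentage.

```typescript
// Hypothetical quota entry; field names are assumptions for illustration.
interface QuotaEntry {
  total: number;          // absolute cap; 0 when the plan has no such cap
  remaining: number;      // absolute remaining units
  percentageUsed: number; // percent used, 0-100
}

function remainingPercent(entry: QuotaEntry): number {
  if (entry.total > 0) {
    return (entry.remaining / entry.total) * 100;
  }
  // No absolute cap: derive the value from the reported percentage instead of
  // falling back to 0, so 0% used renders as 100% remaining.
  return Math.min(100, Math.max(0, 100 - entry.percentageUsed));
}
```

For a coding-plan "Monthly" entry with total=0 and 0% used, this yields 100 rather than the misleading 0.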
docs(discovery): mark DISCOVERY_TOOL_DESIGN.md's API Endpoints table with an explicit "⚠️ Not yet implemented — Phase 2" banner — the discovery routes are a design proposal (Phase-1 stub only), and the banner makes clear the KNOWN_STALE_DOC_REFS gate suppression is intentional, not stale drift. (#3498)
fix(agent-bridge): add the missing POST /api/tools/agent-bridge/upstream-ca/test route — the UpstreamCaField "Test" button POSTed to it but it didn't exist (404). The new validate-only route checks the CA file exists and is a parseable PEM certificate (returns the subject/expiry) without persisting the path or activating it; it inherits the /api/tools/agent-bridge/ LOCAL_ONLY classification. (#3488)
fix(gamification): the dashboard Profile page no longer hits three 404s — added the missing GET /api/gamification/{level,badges,badges/earned} routes (management-scoped). The page is operator-wide (no apiKeyId), so level/badges/earned aggregate across all keys (with an optional ?apiKeyId for a single key), and badges seeds the built-in catalog first (idempotent) so the grid is populated even on installs that never seeded it (see #3472). (#3484)
security(oauth): migrate the five public OAuth client_ids (Claude, Codex, Qwen, Kimi, GitHub Copilot — 9 server-side call-sites in providerRegistry.ts + oauth.ts) from string literals to resolvePublicCred() (Hard Rule #11), matching the existing Gemini/Antigravity pattern. The values decode byte-for-byte to the same public client_ids (env overrides still win), so OAuth flows are unchanged; the check-public-creds allowlist is now empty. The browser-bundled codexDeviceFlow.ts copy stays a literal by necessity (it cannot import open-sse). (#3493)
fix(mcp): omniroute --mcp no longer crashes on npm installs with ERR_MODULE_NOT_FOUND (e.g. src/lib/combos/steps.ts) — the MCP server runs from raw TypeScript and imports across src/ + open-sse/, but the published files allowlist only shipped a handful of cherry-picked paths, so the transitive closure (~400 files) was absent from the tarball. files now ships the backend source the MCP server needs (open-sse/ + src/{domain,lib,mitm,server,shared,sse,types}/, excluding the src/app UI), and a new regression test computes the MCP import closure and fails if any reachable source file is not covered by files. (#3578)
fix(api): API_REFERENCE.md no longer documents a non-existent /api/guardrails* / /api/shadow* surface (doc-fiction flagged by check-docs-symbols, frozen in KNOWN_STALE_DOC_REFS). The guardrail pipeline is real (src/lib/guardrails), so the two routes that map to actual behavior are now implemented — GET /api/guardrails (list the registered guardrails + status) and POST /api/guardrails/test (dry-run the pre-call pipeline over a sample input), both management-scoped — while the fictional enable/disable/logs rows and the entire /api/shadow* table (shadow A-B comparison is combo-config + /api/combos/metrics) were removed from the doc and dropped from the allowlist. (#3496)
fix(agent-bridge): the MITM "Start" button no longer reports a misleading "port 443 may be in use" for every failure cause — startMitm() only matched the EADDRINUSE stderr line and always threw the port-443 message, so a missing ROUTER_API_KEY or an EACCES permission error sent users debugging the wrong thing. The startup watcher now buffers the MITM child's stderr and interpretMitmStartupError() maps the real server.cjs ❌ cause (port-in-use / permission-denied / missing API key / any other diagnostic line) into the surfaced error; with no captured output it stays generic instead of guessing port 443. (#3606)
fix(oauth): Kiro "Import Token" no longer reports a bare Internal server error that hides the real cause — the import validates/refreshes the pasted refresh token against AWS, and the catch returned a generic 500 string, so an invalid_grant, an expired token, or a region mismatch all surfaced identically in the dashboard. The import error now carries the sanitized upstream cause via sanitizeErrorMessage() (Hard Rule #12 — no stack, no secrets), keeping the same { error: <string> } response shape, and still falls back to the generic message when there is nothing to report. (#3589)
fix(antigravity): the Antigravity/agy Gemini 3.5 Flash catalog now exposes clean public tier IDs (gemini-3.5-flash-low/-medium/-high, matching Antigravity 2.0.4's Low/Medium/High selector) and maps them to the live upstream IDs at the executor boundary, instead of the old confusing -preview/-agent names. Antigravity model-id normalization moved out of the global model resolver into the executor so client-visible IDs are no longer rewritten before account/credential routing and logging. (Maintainer follow-up: kept gemini-3.5-flash-preview as a hidden backward-compat alias routing to the High tier so saved combos/configs keep working; live-validated the tier set via the agy CLI catalog.) (#3603 — thanks @dhaern)
fix(usage): Antigravity/agy Provider Limits now report accurate consumption — retrieveUserQuota (live usage) is preferred over the fetchAvailableModels catalog view (which keeps reporting full buckets after real usage), with a local usage_history fallback for buckets that are only catalog-visible; cached entries are sanitized so retired upstream IDs are not re-exposed, and a deduplicated post-usage refresh keeps the dashboard fresh after each request. (Maintainer follow-up: the post-usage refresh is decoupled through a lightweight usageEvents bus so usageHistory no longer imports providerLimits/the executors graph, keeping the typecheck:core surface stable.) (#3604 — thanks @dhaern)
fix(gemini): textual reasoning wrappers emitted as assistant prose (<think>/<thinking>/<thought>/<internal_thought>, including malformed/open tags like <thought\n… before a tool call) are now routed to reasoning_content instead of leaking into visible content, in both the non-streaming sanitizer and the Gemini streaming translator (with split-chunk buffering so a tag fragmented across SSE chunks stays hidden). Structured tool calls and the existing textual tool-call conversion are preserved. (#3605 — thanks @dhaern)
fix(gemini): a signed native functionCall arriving while a textual reasoning wrapper opened in an earlier streaming chunk is still buffered now flushes that buffered reasoning to reasoning_content before the tool call, instead of silently discarding it. (Pre-release /review-reviews finding.)
fix(api): /v1/completions now drops a stale upstream content-length on the SSE branch too (the JSON branch already did) — re-serialization changes the byte length, so a buffered SSE body could otherwise advertise the pre-rewrite length and truncate/hang the client. (Pre-release /review-reviews finding.)
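The content-length fix above can be pictured with a small sketch. It assumes response headers are held in a plain string map; `dropContentLength` is a hypothetical helper, not the project's function.

```typescript
// Hypothetical helper: copy headers, omitting any content-length entry.
// After a body is re-serialized its byte length changes, so the upstream
// value would no longer describe what is actually sent.
function dropContentLength(
  headers: Record<string, string>,
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    // Header names are case-insensitive, so compare in lower case.
    if (name.toLowerCase() === "content-length") continue;
    out[name] = value;
  }
  return out;
}
```

Applying the same helper on both the JSON and SSE branches keeps the two paths consistent, leaving the runtime to compute the length or use chunked transfer for the rewritten body.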

🙌 Contributors

Thanks to all who contributed to this release:
@uniQta · @herjarsa · @oyi77 · @dhaern · @diegosouzapw

What's Changed

fix: add reasoning token buffer for combo routing (fixes #3587) by @herjarsa in #3588
Release v3.8.21 by @diegosouzapw in #3593
fix: pass through valid max_tokens-truncated responses instead of fake 502 (#3572) by @diegosouzapw in #3595
fix: /v1/completions returns legacy text-completion format, not chat (#3571) by @diegosouzapw in #3596
fix: z.ai/GLM coding plan no longer shows Monthly 0% when no monthly cap (#3580) by @diegosouzapw in #3597
docs: mark DISCOVERY_TOOL_DESIGN endpoints as Phase-2 not-yet-implemented (#3498) by @diegosouzapw in #3599
fix(agent-bridge): add validate-only upstream-ca/test route (#3488) by @diegosouzapw in #3600
Refactor: Extract chatCore phases into modular files by @oyi77 in #3598
fix(api): implement GET /api/guardrails + POST /api/guardrails/test, drop shadow/guardrails doc-fiction (#3496) by @diegosouzapw in #3602
fix(gemini): isolate textual reasoning wrappers by @dhaern in #3605
fix(antigravity): normalize Gemini 3.5 Flash tier IDs by @dhaern in #3603
fix(agent-bridge): surface real MITM startup-failure cause, not always port 443 (#3606) by @diegosouzapw in #3608
fix(oauth): surface real Kiro import-token failure cause, not a bare 500 (#3589) by @diegosouzapw in #3609
docs(opencode-provider): soft-deprecate in favor of @omniroute/opencode-plugin (#3419) by @diegosouzapw in #3613
fix(usage): normalize Antigravity and agy provider quotas by @dhaern in #3604
feat(cli): add autostart on/off/toggle shorthand for headless serve mode (#3331) by @diegosouzapw in #3614
fix(review): resolve findings from /review-reviews battery (v3.8.21 hardening) by @diegosouzapw in #3618

Full Changelog: v3.8.20...v3.8.21