✨ Added
- feat(cli):
omniroute autostartnow accepts the shorthand the headless /omniroute servepath was missing —omniroute autostart on/... true(aliases ofenable),... off/... false(aliases ofdisable), a new... toggle, and a default... status(bareomniroute autostartis a safe read-only). Previously autostart could only be toggled from the tray (serve --tray) or the Electron Appearance tab, so a plainomniroute serveuser had no way to enable it. (The cross-platform launchd/systemd/registry logic is unchanged — this only wires the ergonomic CLI surface.) (#3331 — thanks @uniQta)
♻️ Code Quality
- refactor(chatCore): extract the chatCore request phases — idempotency check, semantic cache check, common request sanitization, and memory/skills injection — into dedicated
open-sse/handlers/chatCore/modules (idempotency.ts,semanticCache.ts,sanitization.ts,memorySkillsInjection.ts), slimming the monolithic handler with no behavior change. (Maintainer follow-up: re-deriveidempotencyKeyat the Phase 9.2 save site after the check moved into the module, fixing aReferenceErroron successful non-cached responses.) (#3598 — thanks @oyi77) - docs(opencode-provider): soft-deprecate
@omniroute/opencode-providerin favour of@omniroute/opencode-plugin. The provider package writes a static model list toopencode.jsonthat drifts behind the live OmniRoute catalog, whereas the plugin fetches/v1/modelsat OpenCode startup. The package keeps working (no code/behavior change), but its npm description and README now carry a deprecation banner with the one-line migration, and a guard test pins the notice. (#3419 — thanks @herjarsa) - chore(review): pre-release hardening from a multi-reviewer
/review-reviewsbattery over the v3.8.21 diff (7 Opus reviewers; zero blocker/high). Resolved findings: npm tarball no longer ships co-located test files (files[]negations + reconciled.npmignore; the #3578 closure gate now asserts the realnpm packoutput in both directions);getSanitizedCachedProviderLimitsMapscopes its connection scan to antigravity/agy instead of decrypting every active connection on each dashboard poll; the Antigravity quota-tier remap (toClientAntigravityQuotaModelId) is centralized inantigravityModelAliases.ts(was an inline if-ladder inusage.ts); the chatCore idempotency check returns its resolved key so the save site reuses a single derivation; and new tests pin the chatCore extracted modules, the Antigravityusage_historyfallback contract, the reasoning-wrapper prefix-preservation heuristic, the Antigravity SSEmarkdownbranch, and the upstream-ca/test no-persist guarantee. (Live-verified that agy consumer tokens are accepted by the non-dailycloudcode-pahost used byretrieveUserQuota, so #3604 is not agy-host-limited.)
🔧 Bug Fixes
- fix(routing): reasoning models (deepseek-v4-flash, nemotron, etc.) no longer return empty content in combo routing when they spend all of
max_tokenson reasoning —validateResponseQualitynow rejects an empty-content-but-reasoning_contentresponse when reasoning consumed ≥90% of completion tokens (so the combo loop retries/falls back), and reasoning models receive amax_tokensbuffer (+50%, +1000 floor) so reasoning and content both fit. (Maintainer follow-up: the round-robin buffer is applied to a per-attempt copy so it does not compound across models/retries —4096 → 6144 → 9216 → ….) (#3588 — thanks @herjarsa) - fix(routing): a valid
max_tokens-truncated upstream response is no longer misclassified as empty content and rewritten into a fake 502 —isEmptyContentResponse()flagged any Claudecontent:[]/ OpenAI empty-choice payload regardless ofstop_reason/finish_reason, so a Claude Codemax_tokens: 1connectivity ping (HTTP 200,stop_reason:"max_tokens", empty content) became a synthetic502 "Provider returned empty content"and triggered a needless family fallback. The guard now treats a terminal truncation/tool signal (Claudestop_reasonmax_tokens/tool_use, OpenAIfinish_reasonlength/tool_calls) as a legitimate completion; genuinely empty responses (no terminal reason, orstop/end_turnwith empty content) are still caught. (#3572) - fix(api):
/v1/completionsnow returns the legacy OpenAI Completions shape (object:"text_completion",choices[].text) instead of chat payloads (choices[].message|delta.content) — the endpoint routes internally through the chat pipeline, so legacy Completion clients like TabbyML'sopenai/completionbackend crashed withmissing field "text". The response (both non-streaming JSON and the SSE stream) is now translated back to the text-completion shape;[DONE]and error bodies pass through unchanged. (#3571) - fix(usage): the z.ai/GLM coding-plan quota card no longer shows "Monthly 0%" — coding plans have no monthly cap (only 5-hour windows), so the quota API reports the
TIME_LIMIT("Monthly") entry withtotal=0, and thetotal>0 ? … : 0fallback rendered a misleading 0% remaining (which can skew downstream model-choice). With no absolute cap the remaining percentage now falls back to the percentage-derived value (full/100% when 0% used). (#3580) - docs(discovery): mark
DISCOVERY_TOOL_DESIGN.md's API Endpoints table with an explicit "⚠️ Not yet implemented — Phase 2" banner — the discovery routes are a design proposal (Phase-1 stub only), and the banner makes clear theKNOWN_STALE_DOC_REFSgate suppression is intentional, not stale drift. (#3498) - fix(agent-bridge): add the missing
POST /api/tools/agent-bridge/upstream-ca/testroute — the UpstreamCaField "Test" button POSTed to it but it didn't exist (404). The new validate-only route checks the CA file exists and is a parseable PEM certificate (returns the subject/expiry) without persisting the path or activating it; it inherits the/api/tools/agent-bridge/LOCAL_ONLY classification. (#3488) - fix(gamification): the dashboard Profile page no longer hits three 404s — added the missing
GET /api/gamification/{level,badges,badges/earned}routes (management-scoped). The page is operator-wide (noapiKeyId), solevel/badges/earnedaggregate across all keys (with an optional?apiKeyIdfor a single key), andbadgesseeds the built-in catalog first (idempotent) so the grid is populated even on installs that never seeded it (see #3472). (#3484) - security(oauth): migrate the five public OAuth client_ids (Claude, Codex, Qwen, Kimi, GitHub Copilot — 9 server-side call-sites in
providerRegistry.ts+oauth.ts) from string literals toresolvePublicCred()(Hard Rule #11), matching the existing Gemini/Antigravity pattern. The values decode byte-for-byte to the same public client_ids (env overrides still win), so OAuth flows are unchanged; thecheck-public-credsallowlist is now empty. The browser-bundledcodexDeviceFlow.tscopy stays a literal by necessity (it cannot importopen-sse). (#3493) - fix(mcp):
omniroute --mcpno longer crashes on npm installs withERR_MODULE_NOT_FOUND(e.g.src/lib/combos/steps.ts) — the MCP server runs from raw TypeScript and imports acrosssrc/+open-sse/, but the publishedfilesallowlist only shipped a handful of cherry-picked paths, so the transitive closure (~400 files) was absent from the tarball.filesnow ships the backend source the MCP server needs (open-sse/+src/{domain,lib,mitm,server,shared,sse,types}/, excluding thesrc/appUI), and a new regression test computes the MCP import closure and fails if any reachable source file is not covered byfiles. (#3578) - fix(api):
API_REFERENCE.mdno longer documents a non-existent/api/guardrails*//api/shadow*surface (doc-fiction flagged bycheck-docs-symbols, frozen inKNOWN_STALE_DOC_REFS). The guardrail pipeline is real (src/lib/guardrails), so the two routes that map to actual behavior are now implemented —GET /api/guardrails(list the registered guardrails + status) andPOST /api/guardrails/test(dry-run the pre-call pipeline over a sample input), both management-scoped — while the fictionalenable/disable/logsrows and the entire/api/shadow*table (shadow A-B comparison is combo-config +/api/combos/metrics) were removed from the doc and dropped from the allowlist. (#3496) - fix(agent-bridge): the MITM "Start" button no longer reports a misleading "port 443 may be in use" for every failure cause —
startMitm()only matched the EADDRINUSE stderr line and always threw the port-443 message, so a missingROUTER_API_KEYor anEACCESpermission error sent users debugging the wrong thing. The startup watcher now buffers the MITM child's stderr andinterpretMitmStartupError()maps the realserver.cjs❌cause (port-in-use / permission-denied / missing API key / any other diagnostic line) into the surfaced error; with no captured output it stays generic instead of guessing port 443. (#3606) - fix(oauth): Kiro "Import Token" no longer reports a bare
Internal server errorthat hides the real cause — the import validates/refreshes the pasted refresh token against AWS, and the catch returned a generic 500 string, so aninvalid_grant, an expired token, or a region mismatch all surfaced identically in the dashboard. The import error now carries the sanitized upstream cause viasanitizeErrorMessage()(Hard Rule #12 — no stack, no secrets), keeping the same{ error: <string> }response shape, and still falls back to the generic message when there is nothing to report. (#3589) - fix(antigravity): the Antigravity/agy Gemini 3.5 Flash catalog now exposes clean public tier IDs (
gemini-3.5-flash-low/-medium/-high, matching Antigravity 2.0.4's Low/Medium/High selector) and maps them to the live upstream IDs at the executor boundary, instead of the old confusing-preview/-agentnames. Antigravity model-id normalization moved out of the global model resolver into the executor so client-visible IDs are no longer rewritten before account/credential routing and logging. (Maintainer follow-up: keptgemini-3.5-flash-previewas a hidden backward-compat alias routing to the High tier so saved combos/configs keep working; live-validated the tier set via theagyCLI catalog.) (#3603 — thanks @dhaern) - fix(usage): Antigravity/agy Provider Limits now report accurate consumption —
retrieveUserQuota(live usage) is preferred over thefetchAvailableModelscatalog view (which keeps reporting full buckets after real usage), with a localusage_historyfallback for buckets that are only catalog-visible; cached entries are sanitized so retired upstream IDs are not re-exposed, and a deduplicated post-usage refresh keeps the dashboard fresh after each request. (Maintainer follow-up: the post-usage refresh is decoupled through a lightweightusageEventsbus sousageHistoryno longer importsproviderLimits/the executors graph, keeping thetypecheck:coresurface stable.) (#3604 — thanks @dhaern) - fix(gemini): textual reasoning wrappers emitted as assistant prose (
<think>/<thinking>/<thought>/<internal_thought>, including malformed/open tags like<thought\n…before a tool call) are now routed toreasoning_contentinstead of leaking into visiblecontent, in both the non-streaming sanitizer and the Gemini streaming translator (with split-chunk buffering so a tag fragmented across SSE chunks stays hidden). Structured tool calls and the existing textual tool-call conversion are preserved. (#3605 — thanks @dhaern) - fix(gemini): a signed native
functionCallarriving while a textual reasoning wrapper opened in an earlier streaming chunk is still buffered now flushes that buffered reasoning toreasoning_contentbefore the tool call, instead of silently discarding it. (Pre-release/review-reviewsfinding.) - fix(api):
/v1/completionsnow drops a stale upstreamcontent-lengthon the SSE branch too (the JSON branch already did) — re-serialization changes the byte length, so a buffered SSE body could otherwise advertise the pre-rewrite length and truncate/hang the client. (Pre-release/review-reviewsfinding.)
🙌 Contributors
Thanks to all who contributed to this release:
@uniQta · @herjarsa · @oyi77 · @dhaern · @diegosouzapw
What's Changed
- fix: add reasoning token buffer for combo routing (fixes #3587) by @herjarsa in #3588
- Release v3.8.21 by @diegosouzapw in #3593
Full Changelog: v3.8.20...v3.8.21
What's Changed
- fix: pass through valid max_tokens-truncated responses instead of fake 502 (#3572) by @diegosouzapw in #3595
- fix: /v1/completions returns legacy text-completion format, not chat (#3571) by @diegosouzapw in #3596
- fix: z.ai/GLM coding plan no longer shows Monthly 0% when no monthly cap (#3580) by @diegosouzapw in #3597
- docs: mark DISCOVERY_TOOL_DESIGN endpoints as Phase-2 not-yet-implemented (#3498) by @diegosouzapw in #3599
- fix(agent-bridge): add validate-only upstream-ca/test route (#3488) by @diegosouzapw in #3600
- Refactor: Extract chatCore phases into modular files by @oyi77 in #3598
- fix(api): implement GET /api/guardrails + POST /api/guardrails/test, drop shadow/guardrails doc-fiction (#3496) by @diegosouzapw in #3602
- fix(gemini): isolate textual reasoning wrappers by @dhaern in #3605
- fix(antigravity): normalize Gemini 3.5 Flash tier IDs by @dhaern in #3603
- fix(agent-bridge): surface real MITM startup-failure cause, not always port 443 (#3606) by @diegosouzapw in #3608
- fix(oauth): surface real Kiro import-token failure cause, not a bare 500 (#3589) by @diegosouzapw in #3609
- docs(opencode-provider): soft-deprecate in favor of @omniroute/opencode-plugin (#3419) by @diegosouzapw in #3613
- fix(usage): normalize Antigravity and agy provider quotas by @dhaern in #3604
- feat(cli): add autostart on/off/toggle shorthand for headless serve mode (#3331) by @diegosouzapw in #3614
- fix(review): resolve findings from /review-reviews battery (v3.8.21 hardening) by @diegosouzapw in #3618
Full Changelog: v3.8.20...v3.8.21