diegosouzapw/OmniRoute v3.8.26 on GitHub

✨ New Features

feat(media): Vertex AI (Google) speech, transcription, music & video generation — Vertex AI's Google media models are now routable through dynamic discovery: speech synthesis, audio transcription, music generation, and video generation. (#3929 — thanks @artickc)
feat(glm): add GLM-5.2 with effort-tier routing (high/max) — GLM-5.2 is registered with high/max effort-tier routing. (#3885 — thanks @dhaern)
feat(combo): add a sticky round-robin target limit — round-robin combos can cap how many targets stay "sticky" within a session (stickyRoundRobinLimit), balancing stickiness against spread. (#3846 — thanks @adivekar-utexas)
feat(openrouter): connection presets — OpenRouter connections support reusable presets (provider routing / sort / quantization preferences), selectable when adding a connection. (#3878 — thanks @rdself)
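
One way to read the stickyRoundRobinLimit option above is sketched below in TypeScript. This is an illustration under stated assumptions, not OmniRoute's implementation: the class name, the per-session map, and the pinning rule (a new session pins the next stickyRoundRobinLimit targets from a shared cursor, then rotates only among those) are hypothetical. Only the option name comes from the release notes.

```typescript
// Hypothetical sketch of a sticky round-robin with a per-session target cap.
// Everything except the option name `stickyRoundRobinLimit` is illustrative.
interface SessionState {
  pinned: string[]; // targets this session is allowed to stick to
  cursor: number;   // position of the next pick inside `pinned`
}

class StickyRoundRobin {
  private readonly targets: string[];
  private readonly stickyRoundRobinLimit: number;
  private nextGlobal = 0; // shared cursor that spreads sessions across targets
  private readonly sessions = new Map<string, SessionState>();

  constructor(targets: string[], stickyRoundRobinLimit: number) {
    if (targets.length === 0) throw new Error("no targets configured");
    this.targets = targets;
    this.stickyRoundRobinLimit = stickyRoundRobinLimit;
  }

  pick(sessionId: string): string {
    let state = this.sessions.get(sessionId);
    if (!state) {
      // Clamp the limit to [1, targets.length] so a session always has a target.
      const limit = Math.max(1, Math.min(this.stickyRoundRobinLimit, this.targets.length));
      const pinned: string[] = [];
      for (let i = 0; i < limit; i++) {
        pinned.push(this.targets[(this.nextGlobal + i) % this.targets.length]);
      }
      // Advance the shared cursor so the next session pins different targets.
      this.nextGlobal = (this.nextGlobal + limit) % this.targets.length;
      state = { pinned, cursor: 0 };
      this.sessions.set(sessionId, state);
    }
    // Rotate only among the pinned targets for this session.
    const target = state.pinned[state.cursor];
    state.cursor = (state.cursor + 1) % state.pinned.length;
    return target;
  }
}
```

Under these assumptions, a limit of 1 pins each session to a single target (maximum stickiness), while a limit equal to the number of targets degenerates into plain round-robin (maximum spread).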

🐛 Fixed

fix(compression/memory): stop memory + compression from poisoning the upstream prompt cache — with compression and/or memory enabled, requests to caching providers (Anthropic-family) missed the prompt cache on every turn, multiplying cost. Two root causes: (1) memory injection prepended the retrieved memories — which vary per user query — at index 0 of the message array, shifting the entire cacheable prefix every turn; memory is now inserted just before the last user message when the request carries cache_control breakpoints, keeping the cacheable prefix (system prompt + prior turns) byte-stable. (2) the cache-aware skipSystemPrompt flag computed by getCacheAwareStrategy() was dropped by selectCompressionStrategy() (which can only return a mode), so the system prompt could still be compressed under caching; a new resolveCacheAwareConfig() now forces preserveSystemPrompt on for caching requests. (#3936, closes #3890 — thanks @xenstar / @diegosouzapw)
fix(providers): register BytePlus ModelArk so its API key can be added — adding a BytePlus (ark-…) key reported "invalid". byteplus was present in the provider catalog (APIKEY_PROVIDERS) but never registered in the routing registry, so key validation fell through to { unsupported: true } → HTTP 400 → the UI rendered every key as invalid (and the provider was unusable for inference). Added a registry entry modeled on the existing Volcengine Ark provider: OpenAI-compatible format, base https://ark.ap-southeast.bytepluses.com/api/v3 (region ap-southeast-1), Authorization: Bearer auth, seeded with the catalog's advertised models (Seed 2.0, Kimi K2 Thinking, GLM 4.7, GPT-OSS-120B). (#3935, closes #3877 — thanks @nikohd12 / @diegosouzapw)
fix(providers): Nous Research key validation no longer fails on a stale probe model — adding a valid Nous Research API key reported "invalid" even though the same key worked via the portal's copy-shell curl. The validation probe sent model: "nousresearch/hermes-4-70b", which Nous does not serve, so the API returned 400 and the validator (which only treated 200/429 as success) reported the key invalid. The probe now uses the real Hermes-4-70B slug, and any non-auth 4xx (400/404/422) is treated as a valid key (the request shape was wrong, not the credentials) — mirroring the longcat/nvidia validators so a future model rename can't re-break key validation. (#3934, closes #3881 — thanks @FerLuisxd / @diegosouzapw)
fix(stream): persist mid-stream upstream failures — when an upstream stream fails partway through, the partial response and incremental usage are now finalized and persisted instead of being lost. The change extracts a shared streamFailureFinalization path and merges incremental Claude usage (follow-up to #3879). (#3937 — thanks @rdself)
fix(perplexity-web): update the request payload to schema v2.18 (HTTP 400) — Perplexity web requests started returning HTTP 400; the request payload was updated to Perplexity's v2.18 schema. (#3938 — thanks @artickc)
fix(stream): keep the in-flight request payload in sync — the pending-by-id request record is now updated in place (Object.assign) so the in-flight payload stays consistent with what was dispatched (coexists with #3937). (#3940 — thanks @rdself)
fix: stabilize reasoning streams and request logs — reasoning-token streaming and the request-log capture path were stabilized to avoid dropped/duplicated reasoning frames and inconsistent log entries. (#3879 — thanks @rdself)
fix(opencode-plugin): include nested combo-refs in the LCD context window — the OpenCode plugin now follows nested combo references when computing the least-common-denominator context window, so a combo nested inside another no longer reports an inflated window. (#3910 — thanks @herjarsa)
fix(models): correct the failed-model auto-hide defaults — the defaults governing when a failed model is auto-hidden were corrected, and auto-hide is now opt-in so models are no longer dropped unexpectedly. (#3930 — thanks @rdself)
fix(openrouter): show the preset field when editing a connection — the connection-preset field appeared only when creating a connection, not when editing one; it now appears in both (follow-up to #3878). (#3921 — thanks @rdself)
fix(sse): announce the assistant role on the first delta (Responses→Chat) — the first SSE delta of a Responses-API→Chat-Completions stream now carries role: "assistant", which strict OpenAI-compatible clients expect before content deltas. (#3911 — thanks @diego-anselmo)
fix(vertex): add the generative-language scope so SA-JSON model discovery works — Vertex service-account (SA-JSON) model discovery failed without the generative-language OAuth scope; the scope is now requested. (#3922 — thanks @artickc)
fix(proxy): direct-connection fallback for control-plane ops when a pinned proxy is unreachable — control-plane operations (validation, discovery) now fall back to a direct connection when a connection's pinned proxy is unreachable, instead of failing outright. (#3906 — thanks @zhiru)
fix(providers): prevent zombie-socket hangs for zai/glm and tighten the default keepAlive — zai/glm could hang on dead keep-alive sockets; the default keepAlive was tightened to evict zombie sockets. (#3907 — thanks @insoln)
fix(setup): remove the stale CJS bundle check from setup-open-code — the OpenCode setup helper no longer checks for a CJS bundle, which the now ESM-only plugin does not ship. (#3908 — thanks @herjarsa)
fix(opencode-plugin): drop the CJS bundle to fix the OpenCode plugin loader — the plugin is now ESM-only, which fixes the OpenCode loader failure on the dual CJS/ESM build. (#3883 — thanks @herjarsa)
fix(mcp): fall back to node:sqlite when the better-sqlite3 binding is missing — the MCP server now falls back to Node's built-in node:sqlite when the native better-sqlite3 binding is unavailable, instead of crashing. (#3887 — thanks @megamen32)
fix(models): correct the generate-models alias lookup — alias resolution during model generation was corrected so aliased model ids resolve to their canonical entry. (#3870 — thanks @YunyunZhai)
fix(combo): guard the candidate pool against an empty array — combo candidate-pool selection no longer throws when the pool resolves to an empty array. (#3871 — thanks @YunyunZhai)
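
The memory-placement change described in the compression/memory fix above can be sketched roughly as follows. This is a minimal TypeScript illustration, not the project's code: the Message and ContentBlock shapes, the function names, and the choice of a "system" role for the injected memory are assumptions. The sketch also assumes that requests without cache_control breakpoints keep the old prepend-at-index-0 behaviour, which the notes imply but do not state.

```typescript
// Hypothetical message shapes; OmniRoute's real types may differ.
type ContentBlock = {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
};

type Message = {
  role: "system" | "user" | "assistant";
  content: string | ContentBlock[];
};

// True when any content block carries a cache_control breakpoint.
function hasCacheBreakpoints(messages: Message[]): boolean {
  return messages.some(
    (m) => Array.isArray(m.content) && m.content.some((b) => b.cache_control !== undefined)
  );
}

// Returns a new array with the retrieved memory inserted.
// The "system" role for the memory message is an assumption for illustration.
function injectMemory(messages: Message[], memoryText: string): Message[] {
  const memory: Message = { role: "system", content: memoryText };

  if (!hasCacheBreakpoints(messages)) {
    // Assumed unchanged path: no caching in play, so prepend at index 0.
    return [memory, ...messages];
  }

  // Find the last user message, scanning from the end.
  let lastUser = -1;
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].role === "user") {
      lastUser = i;
      break;
    }
  }

  // No user message at all: append, which still leaves the prefix untouched.
  if (lastUser === -1) return [...messages, memory];

  // Insert just before the last user message so every earlier element
  // (system prompt + prior turns) keeps its position and bytes.
  return [...messages.slice(0, lastUser), memory, ...messages.slice(lastUser)];
}
```

Because the per-query memory now sits after the stable prefix, the system prompt and prior turns are identical from one turn to the next, which is what lets a prefix-based prompt cache hit.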

🔒 Security & Hardening

fix(security): bump form-data + vite (2 HIGH), harden workflow template-injection & allowlist guarded workflow_run — form-data and vite were upgraded to resolve two HIGH Dependabot advisories; GitHub Actions workflows were hardened against ${{ }} template-injection (untrusted values are now passed via env:); and the guarded workflow_run trigger was allowlisted. (#3949 — thanks @diegosouzapw)

🧹 Internal / Quality / Docs

fix(ci): grant contents: write to the npm publish job for SBOM attach — the v3.8.25 TokenPermissions hardening set the npm-publish publish job to contents: read, but its "Attach SBOM to GitHub Release" step (gh release upload) needs contents: write, so it failed with HTTP 403 on the v3.8.25 release. npm, GitHub Packages, opencode-plugin, Docker, and Electron all published fine; only the SBOM attach broke, and the v3.8.25 SBOM was attached manually. (#3874 — thanks @diegosouzapw)
fix(providers): keep the /v1/models catalog alias-only (release-time follow-up to #3870) — #3870 made generateModels() also key the registry by each provider's raw id, which surfaced phantom opencode/* entries in /v1/models that collide with the opencode/ → opencode-zen route (a regression vs v3.8.25, caught by the #2798 catalog regression test). getProviderModels() now resolves a raw provider id to its alias at lookup time instead of mirroring raw-id keys into the model namespace, preserving #3870's intent (getProviderModels("github") returns the same models as the gh alias) without polluting the public catalog. (#3870 — thanks @diegosouzapw / @YunyunZhai)
ci(quality): make zizmor / gitleaks / osv scanners functional + freeze advisory baselines — the supply-chain scanners are now actually executed (correct install + invocation) with frozen advisory baselines so new findings surface as diffs. (#3947 — thanks @diegosouzapw)
ci(quality): fix scanner install + size-limit preset, promote codeqlAlerts to blocking — corrected the scanner install and the size-limit preset, and promoted the codeqlAlerts ratchet from advisory to blocking. (#3945 — thanks @diegosouzapw)
ci(quality): add an OpenAPI breaking-change gate (oasdiff, advisory) + fix dangling $refs — a CI gate diffs the OpenAPI spec against the base branch (BASE_REF) with oasdiff to surface breaking API changes, and the spec's dangling $refs were repaired. (#3951 — thanks @diegosouzapw)
ci(quality): add a schemathesis API-fuzz nightly (advisory) — a nightly schemathesis property/fuzz pass now runs against the OpenAPI spec (Quality Gates Phase 8 · Block B.4, advisory). (#3956 — thanks @diegosouzapw)
ci(quality): flip the secret / workflow / bundle-size scanners to ratchet-blocking — the secret-scan, workflow-lint and bundle-size gates moved from advisory to ratchet-blocking, with their baselines frozen and unit coverage for each scanner (Stage 2). (#3961 — thanks @diegosouzapw)
chore(quality): re-baseline the ESLint-warning ratchet (3760 → 3769) — absorbs the v3.8.26-cycle warning drift into quality-baseline.json (manual re-baseline, never an automatic upward ratchet). (#3962 — thanks @diegosouzapw)
ci(quality): wire Stryker mutation testing as an advisory nightly — Stryker mutation testing now runs nightly in advisory mode (Quality Gates Phase 7 · Task 11). (#3898 — thanks @diegosouzapw)
ci(quality): freeze per-module coverage floors + wire require-tighten (advisory) — per-module coverage floors are frozen with an advisory "require-tighten" check that flags modules drifting below their floor. (#3901 — thanks @diegosouzapw)
ci(quality): enforce the stale-allowlist check on check-known-symbols — stale allowlist entries (suppressing a symbol that no longer exists) now fail the gate (Phase 6A.3 follow-up). (#3899 — thanks @diegosouzapw)
test(ci): de-flake pipeline-payloads via per-test re-seed + honest reset — the pipeline-payloads suite now re-seeds per test and performs an honest cache reset, eliminating a cross-test ordering flake. (#3893 — thanks @diegosouzapw)
fix(ci): drop the secrets-in-job-if from nightly-llm-security — referencing secrets in a job-level if caused a startup_failure on push; the gating was moved so the workflow starts cleanly. (#3892 — thanks @diegosouzapw)
test: reconcile the runtime-timeouts keepAlive baseline to 4000 after the #3907 source revert — the keepAlive assertion was realigned to the source value (4000) after #3907's source-side revert. (#3933 — thanks @diegosouzapw)
chore(repo): nest quality-gate state under config/quality, declutter the repo root — baselines / allowlists / metrics moved under config/quality/, trimming the tracked root file count. (#3896 — thanks @diegosouzapw)
docs: refresh the provider count to 226 + regenerate PROVIDER_REFERENCE.md — the README advertised a stale 177 providers; the canonical generator (scripts/docs/gen-provider-reference.ts) now reports 226 unique provider IDs, so the README badges/anchors and the generated provider reference were brought in sync. Also adds a documentation audit/sync report. (thanks @diegosouzapw)
docs: sync all documentation to v3.8.24 + count-guard & wiki/prose CI — a full documentation sync with a strict provider/locale count-guard plus Vale / markdownlint prose CI. (#3804 — thanks @diegosouzapw)
docs: regenerate stale counts to canonical values — 226 providers / 87 MCP tools / 15 strategies / 42 locales. (#3904 — thanks @diegosouzapw)
docs(quality): correct the stale gate count + add an opt-in agent-lsp scaffold. (#3902 — thanks @diegosouzapw)
docs(mcp): correct the MCP tool-inventory diagram source + text to 87 tools. (#3909 — thanks @diegosouzapw)
docs: update the compression section to the 9-engine multi-layer stack. (#3894 — thanks @diegosouzapw)
ci(docs): automate GitHub wiki sync (add missing pages + cover counts). (#3900 — thanks @diegosouzapw)
docs: require a dedicated git worktree + branch per development task (Hard Rule #19) — codifies the worktree-isolation rule after the shared-checkout incidents. (#3939 — thanks @diegosouzapw)
fix(docs): add MDX frontmatter to DOCUMENTATION_AUDIT_REPORT so the fumadocs build passes — the audit report lacked the title: frontmatter MDX pages require. (thanks @diegosouzapw)

What's Changed

fix(ci): grant contents:write to npm publish job for SBOM attach by @diegosouzapw in #3874
deps: bump electron-builder from 26.15.2 to 26.15.3 in /electron by @dependabot[bot] in #3913
deps: bump electron from 42.3.3 to 42.4.0 in /electron by @dependabot[bot] in #3914
Release v3.8.26 by @diegosouzapw in #3875
fix(release): bring post-merge quality gates (#3961, #3962) to main before tagging v3.8.26 by @diegosouzapw in #3964

Full Changelog: v3.8.25...v3.8.26
