OAuth resilience & observability release: spaced/sequential quota sync for OAuth accounts, a per-provider proactive-refresh skip list to keep short-TTL providers (Kimi) alive without re-exposing the Codex Auth0 cascade, token-expiry visibility on the provider cards, a new provider-stats dashboard, plus a wide batch of provider fixes (DeepSeek-web tool calls, Antigravity, Qoder, MiniMax, GitHub Copilot, Fireworks, llama.cpp, t3.chat-web, Kiro, Kilocode) and Podman deployment support.
✨ New Features
- dashboard: new Provider Stats page + `/api/provider-stats` endpoint — per-provider and per-model aggregates from `call_logs` plus live combo/telemetry/tool-latency overlays. (#3175 — thanks @pizzav-xyz / @diegosouzapw)
- metrics: cross-request TTFT and gap-after-tool-call latency tracking, aggregated per provider. (#3173 — thanks @pizzav-xyz / @diegosouzapw)
- quota: show the OAuth token expiry on provider cards (small, blue, informative — "Token expires in …" / "Token expired"). (#3178 — thanks @diegosouzapw)
- responses: strip `previous_response_id` for stateless Responses upstreams, with an auto/strip/preserve setting + UI so stateless clients (e.g. VS Code Custom Endpoint) keep context. (#3143 — thanks @JxnLexn)
- deploy: Podman/rootless deployment support (contrib units + `CONTAINER_HOST` hint) and larger upload body-size limits for `/v1/files`. (#3128 — thanks @hartmark)
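
The auto/strip/preserve setting for `previous_response_id` can be pictured with a minimal sketch. Only the field name and the three mode names come from the notes above; the function name, the types, and the reading of "auto" (strip only when the upstream is stateless) are assumptions made for illustration, not the project's actual implementation.

```typescript
// Hypothetical sketch: only `previous_response_id` and the mode names
// (auto / strip / preserve) are taken from the release notes.
type PreviousResponseIdMode = "auto" | "strip" | "preserve";

interface ResponsesRequestBody {
  previous_response_id?: string;
  [key: string]: unknown;
}

function applyPreviousResponseIdMode(
  body: ResponsesRequestBody,
  mode: PreviousResponseIdMode,
  upstreamIsStateless: boolean,
): ResponsesRequestBody {
  // Assumption: "auto" strips the field only for stateless upstreams.
  const shouldStrip =
    mode === "strip" || (mode === "auto" && upstreamIsStateless);
  if (!shouldStrip) return body;

  // Copy before deleting so the caller's object is never mutated.
  const copy: ResponsesRequestBody = { ...body };
  delete copy.previous_response_id;
  return copy;
}
```

With this shape, a request carrying `previous_response_id` would reach a stateless upstream without the field under "auto" or "strip", and unchanged under "preserve".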
🔧 Bug Fixes
- usage: sequential + spaced OAuth quota sync (`PROVIDER_LIMITS_SYNC_SPACING_MS`) so a host no longer bursts simultaneous usage/refresh requests; reactive forced re-mint after a 401 on the per-card refresh (recovers imported accounts); a genuine 401 now surfaces a re-authenticate hint. (#3156 — thanks @diegosouzapw)
- healthcheck: per-provider proactive-refresh skip list (`OMNIROUTE_HEALTHCHECK_SKIP_PROVIDERS`) — keep rotating-cascade providers (Codex/OpenAI) reactive-only while short-TTL providers (Kimi-coding) keep refreshing proactively. (#3159 — thanks @diegosouzapw)
- providers: on `?refresh=true` with no remote models, don't resurface the just-cleared synced cache into the local-catalog fallback. (#3181 — thanks @diegosouzapw)
- providers: use synced models as the authoritative local catalog across all providers (even on connections that didn't run the sync). (#3148 — thanks @herjarsa)
- web-tools: parse bare-JSON tool calls for DeepSeek-web with fuzzy tool-name matching scoped to the requested tools. (#3157 — thanks @wilsonicdev)
- responses: normalize `image_url` parts across every Responses input path (message content, replayed output items, `function_call_output`) to avoid upstream 400s. (#3150 — thanks @wilsonicdev)
- antigravity: dynamic upstream model resolution via the MITM alias table (server-only executor), with a guard against corrupted alias values. (#3144 — thanks @herjarsa)
- qoder: bifurcate validation by token type — PAT (`pt-`) → Cosy, regular API key → dashscope — matching the executor's routing. (#3149 — thanks @herjarsa)
- api-manager: preserve API key expiration in local time (the `datetime-local` input no longer silently shifts to UTC) + a clear button. (#3146 — thanks @xz-dev)
- opencode-plugin: map `caps.thinking → ModelV2.capabilities.interleaved` for single models and combos. (#3138 — thanks @mrmm)
- kiro: optional `targetProvider` on the social-OAuth exchange so Kiro-based providers can reuse the social login flow. (#3176 — thanks @pizzav-xyz)
- misc: broaden the DeepSeek reasoning-replay regex (`-free/zen/deepseek-v4`), export `ProviderProfile`, and guard a non-string directory entry in the binary manager. (#3177 — thanks @pizzav-xyz)
- providerRegistry: point kilocode at the OpenAI format + default executor (matching its sibling `kilo-gateway`). (#3166 — thanks @androw)
- fireworks: preserve fully-qualified router/model IDs so Fire Pass router IDs (`accounts/fireworks/routers/...`) are no longer double-prefixed into an upstream 404. (#3133 — thanks @KooshaPari)
- llama-cpp: route requests to the configured local baseUrl instead of OpenAI's API (which returned an OpenAI-worded 401). (#3136 — thanks @tjengbudi)
- t3-chat-web: parse cookies + convexSessionId from the single stored credential so t3.chat web connections work (the executor previously read fields the credential pipeline never produced). (#3007 — thanks @minhtran162)
- minimax: stop capping MiniMax-M3 / MiniMax-M2.7 `max_tokens` at the 8192 default — add the M3 model spec (512K output) and make model-spec lookups case-insensitive. (#3141 — thanks @totaltube)
- github-copilot: discover the model catalog live from `api.githubcopilot.com/models` so Import Models refreshes and only entitled models are listed (with fallback to the static catalog). (#3120, #3121 — thanks @gabrielmoreira)
- combo: invalidate the nested-combo cache on combo edits so removed targets/models stop being served within the 10s window; log the resolved DATA_DIR at startup to diagnose multi-replica volume mismatches. (#3147 — thanks @ViFigueiredo)
- providers: resolve web-provider alias collisions. (thanks @diegosouzapw)
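
As a rough illustration of the proactive-refresh skip list from the healthcheck fix above, the sketch below parses a skip list and decides whether a provider should be refreshed proactively. The environment variable name `OMNIROUTE_HEALTHCHECK_SKIP_PROVIDERS` comes from the notes; the comma-separated format, the case-insensitive matching, the function names, and the provider ids used in the usage note are assumptions, not the project's documented behaviour.

```typescript
// Hypothetical sketch. Assumption: the variable holds a comma-separated
// list of provider ids, matched case-insensitively.
function parseSkipProviders(raw: string | undefined): Set<string> {
  return new Set(
    (raw ?? "")
      .split(",")
      .map((entry) => entry.trim().toLowerCase())
      .filter((entry) => entry.length > 0), // ignore empty entries
  );
}

// Providers on the skip list stay reactive-only (refreshed after a failure);
// every other provider keeps its proactive refresh.
function shouldRefreshProactively(
  providerId: string,
  skip: ReadonlySet<string>,
): boolean {
  return !skip.has(providerId.trim().toLowerCase());
}
```

Under these assumptions, a value such as `codex,openai` would leave those two providers reactive-only, while a short-TTL provider id like `kimi-coding` would still be refreshed ahead of expiry.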
📝 Maintenance
- deps: bump hono from 4.12.18 to 4.12.23. (#3179 — thanks @dependabot)
- ci(electron): make the macOS-arm64 smoke step best-effort (headless GPU crash). (#3137 — thanks @diegosouzapw)
- chore(release): open the v3.8.10 development cycle. (thanks @diegosouzapw)
What's Changed
- ci(electron): make macos-arm64 smoke best-effort (headless GPU crash) by @diegosouzapw in #3137
- Release v3.8.10 by @diegosouzapw in #3140
- fix(security/quality): clear CodeQL high + SonarCloud reliability gate (v3.8.10 hardening) by @diegosouzapw in #3186
Full Changelog: v3.8.9...v3.8.10