github can1357/oh-my-pi v16.1.22

6 hours ago

@oh-my-pi/pi-ai

Fixed

  • Fixed llama.cpp, LM Studio, vLLM, and any other local OpenAI-compatible server (one with a loopback or RFC1918 baseUrl) re-processing the full prompt on every assistant continuation when the prior turn produced reasoning_content. For compat profiles without requiresReasoningContentForToolCalls or thinkingFormat: "zai", the openai-completions encoder dropped the preserved thinking block on re-serialization. The chat template therefore re-rendered the assistant turn without <think>…</think>, and the rendered tokens diverged from the slot's KV cache state. The auto-learn capture-at-stop nudge made this reproduce on every turn. When the new compat.replayReasoningContent flag is set, the encoder now replays preserved thinking as reasoning_content. It honors the streamed signature when that signature names a recognized wire field (reasoning_content, reasoning, or reasoning_text) and falls back to the configured reasoningContentField for opaque signatures. The cross-API transformMessages predicate (openAICompletionsReplaysUnsignedThinking) checks the same flag before the model.reasoning gate. As a result, switching into a discovered local target, whose spec carries reasoning: false because the upstream /models endpoints don't advertise the capability, still preserves the prior turn's thinking block as signature-stripped reasoning instead of demoting it to conversation text. The chat-template-rendered prefix now stays byte-stable across turns, and llama.cpp's prefix KV cache survives. (#3528)
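The field-selection rule described above can be sketched as follows. This is a minimal illustration, not the pi-ai encoder: the message shapes, the encodeAssistant name, the assumption that a recognized signature is literally the wire field name, and the "reasoning_content" default when reasoningContentField is unset are all hypothetical. Only compat.replayReasoningContent, reasoningContentField, and the three recognized field names come from the notes.

```typescript
// Hypothetical, simplified message shapes; the real pi-ai types differ.
interface ThinkingBlock {
  type: "thinking";
  thinking: string;
  // Assumption: for local servers the streamed signature is the wire field name.
  thinkingSignature?: string;
}
interface TextBlock {
  type: "text";
  text: string;
}
type AssistantBlock = ThinkingBlock | TextBlock;

interface Compat {
  replayReasoningContent?: boolean;
  reasoningContentField?: string;
}

// Wire fields the notes list as recognized signatures.
const RECOGNIZED_FIELDS = new Set(["reasoning_content", "reasoning", "reasoning_text"]);

// Hypothetical encoder step: serialize one assistant turn for /chat/completions.
function encodeAssistant(blocks: AssistantBlock[], compat: Compat): Record<string, unknown> {
  const text = blocks
    .filter((b): b is TextBlock => b.type === "text")
    .map((b) => b.text)
    .join("");
  const msg: Record<string, unknown> = { role: "assistant", content: text };

  if (!compat.replayReasoningContent) return msg; // flag off: thinking is dropped

  const thinking = blocks.filter((b): b is ThinkingBlock => b.type === "thinking");
  if (thinking.length === 0) return msg; // no-op on pure-text turns

  // Honor a recognized signature; otherwise fall back to the configured field
  // (the "reasoning_content" default here is an assumption).
  const sig = thinking[0].thinkingSignature;
  const field =
    sig !== undefined && RECOGNIZED_FIELDS.has(sig)
      ? sig
      : compat.reasoningContentField ?? "reasoning_content";

  msg[field] = thinking.map((b) => b.thinking).join("");
  return msg;
}
```

Replaying the exact thinking text on the same field it streamed in is what keeps the template-rendered prefix identical to what the server cached on the previous turn.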

@oh-my-pi/pi-catalog

Added

  • Added OpenAICompat.replayReasoningContent, which signals to the openai-completions encoder that preserved thinking blocks must be re-emitted as reasoning_content on every assistant turn. Chat templates that reconstruct <think>…</think> from that field (Qwen3, DeepSeek-R1, GLM-5.x) then keep the prior turn's tokens byte-stable, and llama.cpp's prefix KV cache survives. The flag is auto-enabled for the built-in local OpenAI-compatible providers (llama.cpp, lm-studio, vllm, and ollama on openai-completions) and for any provider pointed at a loopback, RFC1918, or *.local baseUrl. It is NOT gated on spec.reasoning: the runtime discovery paths for llama.cpp, lm-studio, and openai-models-list hardcode reasoning: false because the upstream /models endpoints don't advertise the capability, while the stream parser still records incoming reasoning_content deltas as thinking blocks. Gating on the spec flag would therefore leave every discovered local Qwen or DeepSeek model re-triggering #3528. The encoder writes reasoning_content only when a thinking block actually exists on the turn, so the flag is a no-op on pure-text histories. Built-in proxy providers (currently litellm) are excluded from both checks because they forward to an unrelated upstream that gains no KV-cache benefit and may reject the extra field with a 400. Users running a custom proxy in front of a llama.cpp-style backend can opt in with the sparse override compat.replayReasoningContent: true. (#3528)
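The enablement rules above can be approximated with a small resolver. This is a sketch under stated assumptions, not pi-catalog code: isLocalBaseUrl and resolveReplayReasoningContent are hypothetical names, the provider IDs are assumed to match the names in the notes, and the "ollama only on openai-completions" condition is omitted for brevity. It deliberately takes no reasoning argument, mirroring the note that the flag is not gated on spec.reasoning.

```typescript
// Hypothetical helper: loopback, RFC1918, or *.local host in baseUrl.
function isLocalBaseUrl(baseUrl: string): boolean {
  let host: string;
  try {
    host = new URL(baseUrl).hostname.toLowerCase();
  } catch {
    return false; // unparseable baseUrl: treat as remote
  }
  // WHATWG URL keeps IPv6 hosts bracketed, e.g. "[::1]".
  if (host === "localhost" || host === "[::1]" || host.endsWith(".local")) return true;

  // WHATWG URL has already normalized IPv4 hosts to dotted-quad form.
  const m = /^(\d+)\.(\d+)\.\d+\.\d+$/.exec(host);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  if (a === 127) return true; // loopback 127.0.0.0/8
  if (a === 10) return true; // RFC1918 10.0.0.0/8
  if (a === 172 && b >= 16 && b <= 31) return true; // RFC1918 172.16.0.0/12
  if (a === 192 && b === 168) return true; // RFC1918 192.168.0.0/16
  return false;
}

// Assumed provider IDs, taken from the names listed in the notes.
const LOCAL_PROVIDERS = new Set(["llama.cpp", "lm-studio", "vllm", "ollama"]);
const PROXY_PROVIDERS = new Set(["litellm"]);

// Hypothetical resolver: explicit override wins, proxies are excluded, and
// otherwise either a built-in local provider or a local baseUrl enables replay.
function resolveReplayReasoningContent(provider: string, baseUrl: string, override?: boolean): boolean {
  if (override !== undefined) return override;
  if (PROXY_PROVIDERS.has(provider)) return false;
  return LOCAL_PROVIDERS.has(provider) || isLocalBaseUrl(baseUrl);
}
```

Checking the override first is what lets a custom proxy in front of a llama.cpp-style backend opt in even though it would otherwise fall under the proxy exclusion.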

@oh-my-pi/pi-coding-agent

Fixed

  • Fixed stdio MCP servers launched directly as executables on Windows opening a separate cmd.exe console window; MCP subprocesses now set windowsHide on every Windows spawn path. (#3535)
  • Fixed MCP OAuth against Plane's https://mcp.plane.so/http/mcp endpoint still failing with "Authorization failed: An unexpected error occurred" after #3502. The protected-resource metadata advertised the path-scoped issuer https://mcp.plane.so/http, but discoverOAuthEndpoints (packages/coding-agent/src/mcp/oauth-discovery.ts) probed /.well-known/oauth-authorization-server at the origin root first and accepted the metadata served there, which describes a different issuer (https://mcp.plane.so/) whose /authorize endpoint rejects every grant. Discovery now honors RFC 8414 §3.3 and skips authorization-server and OpenID Connect well-known documents whose issuer field doesn't match the queried base URL (ignoring a trailing slash). Servers that omit issuer keep today's permissive behavior, so legacy flows are unaffected. (#3537)
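The issuer check described above can be sketched as a filter over the fetched well-known documents, tried in probe order. This is an illustration only: AuthServerMetadata, acceptMetadata, and pickMetadata are hypothetical names, not the oauth-discovery.ts API, and the sketch assumes the documents have already been fetched and parsed.

```typescript
// Hypothetical subset of an RFC 8414 / OpenID Connect discovery document.
interface AuthServerMetadata {
  issuer?: string;
  authorization_endpoint?: string;
  token_endpoint?: string;
}

// Trailing-slash-insensitive comparison, as described in the notes.
function stripTrailingSlash(s: string): string {
  return s.replace(/\/+$/, "");
}

// Hypothetical check: accept a document only if its issuer matches the base URL
// being discovered; documents without an issuer keep the permissive behavior.
function acceptMetadata(meta: AuthServerMetadata, queriedBase: string): boolean {
  if (meta.issuer === undefined) return true;
  return stripTrailingSlash(meta.issuer) === stripTrailingSlash(queriedBase);
}

// Return the first acceptable document in probe order, if any.
function pickMetadata(docs: AuthServerMetadata[], queriedBase: string): AuthServerMetadata | undefined {
  return docs.find((d) => acceptMetadata(d, queriedBase));
}
```

In the Plane case, the origin-root document (issuer https://mcp.plane.so/) fails the check against the path-scoped base https://mcp.plane.so/http, so discovery moves on instead of using the wrong /authorize endpoint.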

What's Changed

  • fix(ai): replay reasoning_content on assistant turns for local llama.cpp servers by @roboomp in #3532
  • fix(mcp): hide Windows stdio server consoles by @roboomp in #3536
  • fix(mcp/oauth): skip discovery metadata with mismatched issuer by @roboomp in #3539

Full Changelog: v16.1.21...v16.1.22
