github can1357/oh-my-pi v15.13.0

6 hours ago

@oh-my-pi/pi-ai

Fixed

  • Fixed OpenAI Responses/Realtime SSE stream handler crashing with "Error Code undefined: undefined" when parsing error events with nested error details by falling back to the nested error object fields.

  • Fixed OpenAI-compatible providers that reject forced tool_choice on thinking-required models by downgrading unsupported forced choices to auto while keeping tools available (#2546).

  • Fixed GitHub Copilot Anthropic transport (api.githubcopilot.com/v1/messages) returning 400 tools.0.custom.eager_input_streaming: Extra inputs are not permitted on every tool-bearing turn by stopping the emission of the per-tool eager_input_streaming flag and the fine-grained-tool-streaming-2025-05-14 beta header on the Copilot transport — the proxy whitelists neither (#2558).

  • Disabled Bun's native ~300s pre-response fetch timeout in every streaming provider (OpenAI completions/responses, Azure responses, Anthropic, Codex SSE, Bedrock, Gemini CLI, Ollama). The configurable first-event/idle/SDK watchdogs (PI_STREAM_FIRST_EVENT_TIMEOUT_MS, PI_OPENAI_STREAM_IDLE_TIMEOUT_MS, compat.streamIdleTimeoutMs) were silently capped by Bun's hidden ceiling, so cold large-context streams (e.g. self-hosted vLLM at multi-hundred-K prompts) died at exactly 300s with TimeoutError: The operation timed out. Direct callers of ./providers/{amazon-bedrock,google-gemini-cli,ollama,openai-codex-responses} (which bypass register-builtins' iterator-level watchdog) now install a pre-response AbortSignal.timeout(firstEventTimeoutMs) alongside the disable, so a stalled upstream still fails within the configured budget instead of hanging forever (#2422)

  • Fixed Gemini / Antigravity streams (Google Cloud Code Assist API) creating a trailing empty text block and emitting redundant text_start/text_delta/text_end events at the end of the turn when the final SSE chunk contains an empty text part (text: ""). The parser now ignores empty text parts, preserving the active transcript block state and ensuring proper nesting and rendering of subsequent background jobs or new turns.

  • Preserved terminal Google thoughtSignatures by still extracting and applying the signature on the active block even when the text part is empty or undefined.

  • Stopped Gemini Antigravity sessions (gemini-3* / Claude under Cloud Code Assist) from leaking system rule reminders and personality preambles into the final response, by appending an explicit 'do not output rule checks' instruction to the injected system parts.

  • Fixed Gemini / Antigravity streams (Google Cloud Code Assist API) letting a functionCall part's own thoughtSignature clobber the preceding text or thinking block's signature on think → tool and text → tool turns. A signed function-call part has text: undefined, so it fell into the terminal-signature branch while the prior block was still active; that branch now skips function-call parts, leaving the tool call's signature on the tool call where it belongs and preventing corrupted signatures on same-model replay.

  • Fixed MiniMax-M3 OpenAI-compatible streams rendering reasoning twice when the same chunk carried both <think>…</think> content and structured reasoning_content; structured reasoning now wins and cumulative MiniMax reasoning snapshots are collapsed to deltas using a per-signature snapshot tracker that survives the </think>-to-text block transition (so post-answer cumulative snapshots don't reinstate a duplicate thinking block). (#2433)
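
As an illustration of the snapshot-to-delta collapse described in the MiniMax entry above, here is a minimal sketch. The `SnapshotTracker` class and its `toDelta` method are hypothetical names, and keying by a signature string is an assumption; the real tracker may differ.

```typescript
// Hypothetical helper: turns cumulative reasoning snapshots into incremental deltas.
class SnapshotTracker {
  // Full text seen so far, per signature (the key choice is an assumption).
  private seen = new Map<string, string>();

  toDelta(signature: string, chunk: string): string {
    const prev = this.seen.get(signature) ?? "";
    if (chunk.startsWith(prev)) {
      // Cumulative snapshot: emit only the suffix that has not been seen yet.
      this.seen.set(signature, chunk);
      return chunk.slice(prev.length);
    }
    // Not a continuation of the stored snapshot: treat the chunk as a plain delta.
    this.seen.set(signature, prev + chunk);
    return chunk;
  }
}

const tracker = new SnapshotTracker();
const deltas = ["Let", "Let me", "Let me think"].map((s) => tracker.toDelta("sig", s));
const repeated = tracker.toDelta("sig", "Let me think"); // a re-sent snapshot adds nothing
```

Because the tracker outlives any single block, a snapshot re-sent after the answer starts yields an empty delta rather than a second thinking block.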

@oh-my-pi/pi-coding-agent

Breaking Changes

  • Replaced the omp setup stt command with omp setup speech. The old stt setup component is gone (no alias); omp setup speech now provisions the full speech stack — audio recorder, speech-to-text model, and text-to-speech model.
  • Renamed the tts.enabled setting to speechgen.enabled (same boolean, default off; no alias). It still gates the on-demand tts speech-generation tool, now labelled "Speech Generation" in the settings panel.

Added

  • Added paste.largeMenuThreshold setting (0/100/250/500/1000, default 100) to control when large pasted content triggers the large-paste menu or stays as a normal [Paste] marker

  • Added a large-paste editor menu for pasted text over the threshold that lets users choose to wrap the paste in a fenced code block, wrap it in <pasted_text> XML tags, or save it as local://attachment-N for on-demand reading

  • Added snapcompact-savings.jsonl journaling for snapcompact tool-result compaction, recording session, provider, model, tool call, and estimated token savings whenever tool output is rendered as image frames

  • Added subagent:<id> loop-phase breadcrumbs around in-process subagent event dispatch and finalization so the TUI event-loop watchdog can attribute a main-thread stall to subagent execution (#2485)

  • highlightMagicKeywords(text, resetTo?, phase?) now accepts an optional phase ∈ [0, 1) that rotates the gradient cyclically; sent bubbles omit it (static palette unchanged). hasMagicKeyword(text) exported from modes/magic-keywords is the cheap shimmer-gate the editor uses on every render.

  • Added a fastModeScope setting (both | openai | claude, default both) controlling which providers /fast on (and the fast-mode toggle) target. both keeps the prior unscoped priority behavior; openai/claude scope fast mode to one family. /fast status now reports the active scope.

  • Added the mnemopi.embeddingVariant setting (en | multilingual) selecting a stronger SOTA local embedding model: en → BAAI/bge-base-en-v1.5 (768d), multilingual → intfloat/multilingual-e5-large (1024d). Resolution precedence is mnemopi.embeddingModel setting > MNEMOPI_EMBEDDING_MODEL env > variant default, so the documented env override is still honored. Changing the active model wipes and rebuilds stored embeddings on the next writable start (#2476).
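
The resolution precedence in the embedding entry above can be sketched as follows. `resolveEmbeddingModel` is a hypothetical function written for illustration; only the model names and the ordering (setting, then env, then variant default) come from the note.

```typescript
type EmbeddingVariant = "en" | "multilingual";

// Variant defaults as listed in the release note.
const VARIANT_DEFAULTS: Record<EmbeddingVariant, string> = {
  en: "BAAI/bge-base-en-v1.5",
  multilingual: "intfloat/multilingual-e5-large",
};

// Hypothetical resolver: explicit setting > environment override > variant default.
function resolveEmbeddingModel(
  setting: string | undefined,
  env: string | undefined,
  variant: EmbeddingVariant,
): string {
  // Empty strings are treated as "not set" (an assumption of this sketch).
  return setting || env || VARIANT_DEFAULTS[variant];
}

const fromVariant = resolveEmbeddingModel(undefined, undefined, "multilingual");
const fromEnv = resolveEmbeddingModel(undefined, "my-org/custom-model", "en");
```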

  • Added a /guided-goal slash command that interviews you to refine an objective before enabling goal mode, then seeds goal mode with the agreed objective. The bounded interview (up to six turns) runs on the plan or slow model and falls back with a hint when the goal is still too vague (#2502).

  • Added a large-paste menu: when a paste reaches paste.largeMenuThreshold lines (default 100; 0 disables), the editor offers to wrap it in a code block, wrap it in <pasted_text> XML tags (both collapse to a [Paste] marker that expands on submit), or save it to the session's local:// store and insert a clean local://attachment-N reference the agent can read on demand. Esc keeps the previous inline-paste behavior, so the content is never lost.

  • Added 8on22-bw (leading) and 11on16-bw (tracking) options to the snapcompact.shape setting, the spacing-tuned cells that are now the per-provider defaults (Anthropic → tracking, OpenAI/Google → leading)

  • Added a local on-device neural TTS backend for the tts tool and a providers.tts switch (auto | local | xai, default auto). local synthesizes speech with Kokoro-82M — SoTA on-device TTS quality — via kokoro-js on the shared ONNX runtime (@huggingface/transformers + onnxruntime-node) in a subprocess worker (mirroring the tiny-model worker), keeping the model warm across calls and emitting 24 kHz WAV/PCM16 with no network call; xai keeps the existing Grok Voice cloud path; auto prefers local but routes .mp3 requests to xAI when credentials exist (no local MP3 encoder is bundled, so a local .mp3 request is written as a sibling .wav). kokoro-js is never a hard dependency: it is lazily bun installed into a version-keyed runtime dir on first use (with onnxruntime-node force-pinned to a Bun-safe version), so its transformers@3.x graph never pollutes the main tree. New tts.localModel (default kokoro) and tts.localVoice (default af_heart; American/British, female/male voices) settings select the on-device voice.

  • Added a unified, interactive omp setup speech command that walks one reusable flow across all three speech dependencies: it lets you pick and persist the speech-to-text (stt.modelName) and text-to-speech (tts.localModel) models from a TUI list, then downloads both models plus an audio recorder with live progress. The recorder is now auto-provisioned cross-platform (a static ffmpeg binary is fetched via the shared tools-manager when no SoX/FFmpeg/arecord is present, with the PowerShell fallback on Windows) instead of dead-ending with "install sox manually". --check and --json report recorder + STT-model + TTS-model readiness without installing.

  • Added omp say <text>, which synthesizes text with the local on-device TTS engine and plays it through the speakers (cross-platform: afplay on macOS, paplay/aplay/bundled ffmpeg on Linux, PowerShell Media.SoundPlayer on Windows). --out <file> writes a WAV instead of playing, --voice/--model override the tts.localVoice/tts.localModel settings, and an uninstalled model prints an actionable omp setup speech hint.

  • Added streaming speech vocalization: with speech.enabled on, the assistant speaks its reply through the speakers as it streams. Assistant text deltas are fed directly into the engine's incremental text input (Kokoro's TextSplitterStream via the worker) as they arrive, rather than pre-chunked in JS and synthesized one batch call per sentence — the engine owns sentence segmentation and emits one audio chunk per sentence. A single persistent player (StreamingAudioPlayer) drains those chunks gaplessly (raw 32-bit-float PCM piped to one ffmpeg→PulseAudio/ALSA process on Linux; interruptible per-file afplay/PowerShell SoundPlayer on macOS/Windows), replacing the spawn-a-player-per-sentence path that added latency and audible gaps. Overspeech is handled end to end: a new turn, a sent message, or an Esc/Ctrl+C interrupt stops playback instantly (the player process is killed rather than letting the current sentence finish); holding the push-to-talk key ducks the volume while you speak and restores it when you stop; and sequential utterances queue and drain in order instead of overlapping. speech.mode (all | assistant | yield, default assistant) picks what is spoken — all adds thinking, yield speaks only the final message at turn end — and speech.voice selects the Kokoro voice. ask-tool questions are spoken in every mode. Synthesis reuses the local Kokoro engine (tts.localModel) through a new streaming synthesis path (TtsClient.synthesizeStream) that pushes text in and streams audio chunks back over the worker protocol.

  • Added live (streaming) speech-to-text: with stt.enabled on, transcription now appears in the composer as you speak instead of all at once after you stop. The recorder streams raw 16 kHz mono PCM from sox/ffmpeg/arecord stdout to the warm STT worker, where an energy-based endpointer (no extra model) splits speech into segments at natural pauses; each finalized segment is committed into the editor while the in-progress segment shows a live volatile preview that refreshes in place and is kept out of the undo history. Works with both the default Parakeet (sherpa-onnx) and the Whisper (transformers.js) tiers. Recorders that cannot stream to a pipe (the Windows PowerShell mci fallback) transparently fall back to single-shot transcription.

  • Added skills.enableAgentsUser and skills.enableAgentsProject settings (default on) so the canonical OMP-native ~/.agent[s]/skills and project-walkup .agent[s]/skills are configurable independently from the third-party Claude/Codex/Pi toggles.

  • Added a read-only ctx.models facade for extensions: list() (authenticated models), current() (live session model), resolve(spec) (a model string or role alias → Model, using the same settings-backed aliases and match preferences as core selection), and family(model) (opaque canonical-identity lineage token for cross-family comparisons). Lets extension tools select models the way core does without reaching into the mutable registry (#2406)

  • Added RPC prompt lifecycle hints so hosts can distinguish scheduled agent turns from local-only slash commands via data.agentInvoked and prompt_result.

  • Added extension lifecycle events for tool approval prompts: tool_approval_requested before the approval wait and tool_approval_resolved after approve, deny, or approval prompt failure.

Changed

  • Changed handoff custom messages (customType: "handoff") to render in the transcript as a compaction-style expandable divider in both the main session and Agent Hub views, and expanded handoff details now show the handoff context body without <handoff-context> tags

  • Changed the double-tap-← gesture (empty editor, main session) to stay inert when there are no subagents to show, instead of opening an empty Agent Hub roster. The explicit Agent Hub / observe keybindings still open the empty roster. The gating reuses the hub's own row count (after its persisted-subagent scan), so it matches exactly what the hub would display.

  • Changed the job tool's async.pollWaitDuration setting (relabeled Max Poll Time) to add a smart value, now the default. A fixed value (5s–5m) still blocks for exactly that long; smart adapts: a blocking poll starts at a 5s floor and climbs a ladder (5s → 10s → 30s → 1m → 5m) with each back-to-back poll, so a tight poll loop backs off and stops spending turns on "still running" frames, then resets to the 5s floor after ~1 minute without polling (i.e. when the agent steps away to do real work). Escalation is tracked per agent (owner-scoped on AsyncJobManager).
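
A rough sketch of the smart ladder from the entry above, assuming the caller supplies how long the owner has been idle since its previous poll finished. `SmartPollBackoff` and the exact 60s reset value are assumptions; the note only says "~1 minute".

```typescript
// Ladder from the release note: 5s → 10s → 30s → 1m → 5m.
const LADDER_MS = [5_000, 10_000, 30_000, 60_000, 300_000];
const RESET_AFTER_MS = 60_000; // "~1 minute" in the note; the exact value is assumed

// Hypothetical per-owner escalation tracker.
class SmartPollBackoff {
  private steps = new Map<string, number>();

  // idleMs: time since this owner's previous poll finished.
  next(ownerId: string, idleMs: number): number {
    const prev = this.steps.get(ownerId);
    const step =
      prev === undefined || idleMs >= RESET_AFTER_MS
        ? 0 // first poll, or the agent stepped away: back to the floor
        : Math.min(prev + 1, LADDER_MS.length - 1); // back-to-back poll: climb one rung
    this.steps.set(ownerId, step);
    return LADDER_MS[step];
  }
}

const backoff = new SmartPollBackoff();
const waits = [0, 100, 100, 100, 100, 100].map((idle) => backoff.next("agent-a", idle));
const afterBreak = backoff.next("agent-a", 61_000);
```

Keying the state by owner keeps one agent's tight poll loop from escalating the wait of another.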

  • Added the compat.supportsForcedToolChoice custom-model flag for OpenAI-compatible models whose endpoints accept tools but reject forced tool_choice values (#2546).

  • Changed speech-to-text to run fully local on-device with a tiered, multi-engine model picker. Transcription runs in a subprocess worker (mirroring the tiny-model worker; the native ONNX addons are hard-killed on shutdown to dodge the Bun NAPI-finalizer segfault) instead of shelling out to Python openai-whisper, keeps the model warm across recordings, and decodes WAV to 16 kHz mono float32 in-process. stt.modelName now selects on-device tiers across two engines: parakeet (default) — NVIDIA Parakeet TDT 0.6B v3 (25 languages) via the native sherpa-onnx-node, the Open ASR Leaderboard accuracy + throughput leader (lower WER than, and ~20× faster decoding than, Whisper large-v3) — plus fast/balanced/turbo mapping to Whisper base/small/large-v3-turbo (multilingual, up to 99 languages) via @huggingface/transformers. omp setup speech no longer mentions pip/python-whisper and reports recorder + model-cache readiness.

  • Changed the speech-to-text trigger from the Alt+H keybinding to a hold-Space push-to-talk gesture. Holding the space bar emits an OS auto-repeat burst; once more than 5 spaces land in the editor it recognizes the hold, deletes (tracks back) those inserted spaces, and starts recording, then stops and transcribes when the repeats stop (the space bar is released). app.stt.toggle is now unbound by default but can be rebound to a chord for press-to-toggle; the gesture is gated on stt.enabled, and Shift+Space still inserts a literal space.

  • task.eager ("Prefer Task Delegation") and todo.eager ("Create Todos Automatically") are now three-level enums (default / preferred / always) instead of booleans. For todo.eager, preferred renders a soft first-message reminder while always forces the todo tool (the previous "on" behavior); for task.eager, preferred adds a soft (SHOULD) delegation nudge to the system prompt while always uses hard (MUST/ONLY) wording plus a first-turn delegation reminder. Existing boolean configs migrate automatically (true → always, false → default). On models that cannot be forced to call todo, todo.eager: "always" now emits the first-turn reminder without forcing the call (previously such models received nothing) (#2539, #2540 by @metaphorics).
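
The boolean-to-enum migration in the entry above amounts to a small mapping. This is a sketch only: `migrateEager` is a hypothetical name, and the fallback for unrecognized values is an assumption.

```typescript
type EagerLevel = "default" | "preferred" | "always";

// Hypothetical migration for task.eager / todo.eager values read from an older config.
function migrateEager(value: unknown): EagerLevel {
  if (value === true) return "always"; // legacy "on"
  if (value === false) return "default"; // legacy "off"
  if (value === "default" || value === "preferred" || value === "always") return value;
  return "default"; // assumption: anything unrecognized falls back to the default level
}

const migrated = [true, false, "preferred"].map(migrateEager);
```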

  • Fixed switching via /model to a non-default OpenRouter model returning 404 No route: POST /chat/completions when the provider is routed through the auth-gateway broker. The background catalog refresh re-ran mergeDiscoveredModel on every openrouter entry; for models whose bundled record already existed, the merge re-applied baseUrl, headers, and compat but dropped transport: pi-native because the raw /v1/models payload carries no transport hint. The next /model switch then picked the now-transport-less entry and routed through the default openai-completions client to ${baseUrl}/chat/completions, a path the auth-gateway never serves. mergeDiscoveredModel now propagates the override/existing transport on the rediscovery branch (#2555).
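
To make the transport fix above concrete, here is a simplified sketch of a rediscovery merge that keeps the transport. `ModelEntry` and `mergeRediscovered` are hypothetical stand-ins, not the actual mergeDiscoveredModel signature.

```typescript
// Hypothetical, trimmed-down model record.
interface ModelEntry {
  id: string;
  baseUrl: string;
  transport?: string;
}

// Merge a freshly discovered record over an existing one without losing the transport.
function mergeRediscovered(
  existing: ModelEntry,
  discovered: ModelEntry,
  override?: Partial<ModelEntry>,
): ModelEntry {
  return {
    ...existing,
    ...discovered,
    // The discovery payload carries no transport hint, so take it from the
    // override if present, otherwise from the existing record.
    transport: override?.transport ?? existing.transport,
  };
}

const merged = mergeRediscovered(
  { id: "some/model", baseUrl: "https://old.example", transport: "pi-native" },
  { id: "some/model", baseUrl: "https://new.example" },
);
```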

Fixed

  • Fixed npm plugin installs to reject packages whose declared extension entry points cannot load because imports or nested dependencies are unresolved (#2312).

  • Fixed the deferred MCP discovery banner (Connecting to MCP servers: …) overdrawing the chat input bar. onMCPConnecting wrote the banner straight to process.stderr while the TUI owned the terminal; it now emits on an mcp:connecting event channel that InteractiveMode renders through showStatus (the status container), mirroring the existing LSP-startup pattern so the banner can never paint over the input box border (#2483)

  • Fixed Win+Shift+S screenshot paste on Windows dead-ending on Image not found: Windows Terminal forwards a bracketed paste of the Snipping Tool's transient …\MicrosoftWindows.Client.Core_*\TempState\… file path, which is already gone (or never materialized) by the time omp reads it, so handleImagePathPaste failed with ENOENT even though the screenshot bitmap was still on the clipboard. The handler now falls back to the clipboard image across every local read failure (missing file, undecodable file, or generic error) before degrading, and the fallback is skipped over SSH where the clipboard lives on the remote host rather than the terminal holding the screenshot.

  • Fixed npm prebuilt extension compatibility shims deriving their own package root through bare @oh-my-pi/pi-coding-agent resolution, which could select an older Bun cache copy in global installs and reintroduce mixed-runtime plugin loading stack overflows.

  • Honor the context_length reported by OpenAI-compatible /v1/models discovery (discovery: { type: "proxy" } and discovery: { type: "openai-models-list" }) when present, so aggregator-reported windows override the stale bundled reference; the value is validated through the positive-number guard so a 0/negative/stringly-typed upstream value cleanly falls back to the bundled reference (then 128000) instead of pinning a broken window (#2466 by @androw).
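
The fallback chain in the context_length entry above might look roughly like this. `isPositiveNumber` and `resolveContextWindow` are illustrative names; only the ordering (discovered value, bundled reference, then 128000) comes from the note.

```typescript
const DEFAULT_CONTEXT_WINDOW = 128_000;

// Guard that rejects 0, negatives, NaN/Infinity, and stringly-typed values.
function isPositiveNumber(value: unknown): value is number {
  return typeof value === "number" && Number.isFinite(value) && value > 0;
}

// Hypothetical resolver: discovered context_length > bundled reference > 128000.
function resolveContextWindow(discovered: unknown, bundled: unknown): number {
  if (isPositiveNumber(discovered)) return discovered;
  if (isPositiveNumber(bundled)) return bundled;
  return DEFAULT_CONTEXT_WINDOW;
}

const windows = [
  resolveContextWindow(200_000, 64_000), // valid upstream value wins
  resolveContextWindow(0, 64_000), // broken upstream value falls back to bundled
  resolveContextWindow("32000", undefined), // stringly-typed, no bundled value
];
```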

  • Fixed web_search SearXNG fallback when HTTP 200 responses contain no usable results plus unresponsive_engines; SearXNG now raises a transient provider error, and the fallback loop rejects any provider response with no renderable content before formatting an invisible success (#2571).

  • Fixed read on a GitHub commit URL (github.com/<owner>/<repo>/commit/<sha>) returning the raw commit HTML page instead of structured content. parseGitHubUrl had no commit case, so commit URLs fell through to generic HTML rendering; they now resolve via the commits API and render as markdown (subject, author, stats, parents, full commit message, and a per-file unified diff), matching the existing blob/tree/issue/PR handling.

  • Fixed release runs being silently cancelled by a later main push, which left tagged versions (v15.12.6 in the wild) without a GitHub Release or npm publish. The CI workflow's concurrency group was ${{ github.workflow }}-${{ github.ref }}, and since the release-script commit + v* tag are pushed atomically to refs/heads/main, the release run shared the CI-refs/heads/main group with every subsequent push; cancel-in-progress: true then killed it before release_binary / release_github / release_npm could run, and no future run carried the release tag at HEAD. The group now resolves to a per-sha release-<sha> slot with cancel-in-progress: false whenever the push subject matches chore: bump version to (the release-script convention) or github.ref is a v* tag (workflow_dispatch recovery), so release runs are isolated from PR/main churn (#2564).

  • Allowed compat.streamIdleTimeoutMs: 0 in models.yml. The schema was .positive(), so the documented "set to 0 to disable" escape hatch was only reachable via the global env var (#2422)

  • Fixed LaTeX math delimiters ($/$$) and commands (such as \text, \times) rendering raw in the terminal by instructing the model to write equations as plain text / Unicode in its replies. The instruction is scoped to conversational output, so it does not constrain LaTeX or Markdown/KaTeX content the agent is asked to write to files (#2550 by @usr-bin-roygbiv).

  • Fixed the MCP stdio transport close() potentially hanging while awaiting a read loop that can block indefinitely; teardown now detaches the read loop instead of awaiting it (#2550 by @usr-bin-roygbiv).

  • Fixed per-project memory isolation pulling other projects' memories into recall. Legacy (pre-#2412) mnemopi banks were rescued into recall when any working-memory row tagged the active cwd, but recall reads a bank wholesale and cannot filter rows by cwd, so a mixed-cwd bank leaked unrelated projects' memories. A legacy bank is now rescued only when every row tags the active cwd (#2412).

  • Fixed a prompt template whose name collides with a builtin slash-command alias (e.g. a models template beside the builtin /model, which owns the models alias) appearing as a duplicate entry in the slash-command autocomplete picker. The picker's reserved-name filter now also excludes command aliases, matching runtime resolution (slash commands expand before prompt templates, so the template was already unreachable).

  • Fixed the tool-result renderer re-shaping on every invalidate() (spinner tick, stream chunk, resize, keystroke), which made large grep/find/read results block the main thread for seconds and made typing sluggish. ToolExecutionComponent.#updateDisplay() now memoizes on a dirty key (result version, expand state, partial flag, spinner frame, image visibility, theme epoch, background-task freeze state, the resolved terminal image protocol, and a display-input version that covers streamed call args, the async edit-diff preview, and Kitty image conversions) and a #displayBuilt guard that also fast-paths the #contentText fallback, so the O(result-size) shaping runs once per change instead of every frame without freezing streamed args, previews, converted images, a backgrounded task settling to its static form, or images that arrive before the async image-protocol probe resolves. Image-bearing results also re-shape on terminal resize (keyed on the resolved image dimensions only when images are present) so inline images rescale, while image-free results never re-shape on resize (#2484)

  • Fixed setTheme() not bumping the theme epoch on its invalid-theme fallback path: a failed theme load swaps the active theme to the dark fallback, so memoized renderers (the tool-result renderer above) must re-shape — previously they kept the failed theme's stale colors until some other state changed (#2484)

  • Fixed tool-call spinners animating out of phase across parallel tool calls — each live tool block advanced its glyph from its own per-instance start time, so concurrent spinners showed different frames. Glyphs now derive from a single shared monotonic clock (sharedSpinnerFrame), keeping every live block in lockstep.
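
A minimal sketch of deriving the glyph from one shared clock, as the spinner entry above describes. The frame set, the 80ms interval, and the `spinnerFrameAt` name are assumptions made for illustration.

```typescript
// Frame glyphs and interval are assumptions for this sketch.
const FRAMES = ["⠋", "⠙", "⠹", "⠸", "⠼", "⠴", "⠦", "⠧", "⠇", "⠏"];
const FRAME_INTERVAL_MS = 80;

// Every caller passes the same clock reading, so every live block shows the
// same frame regardless of when its own tool call started.
function spinnerFrameAt(nowMs: number): string {
  const index = Math.floor(nowMs / FRAME_INTERVAL_MS) % FRAMES.length;
  return FRAMES[index];
}

const now = 1_234;
const blockA = spinnerFrameAt(now); // tool call that started long ago
const blockB = spinnerFrameAt(now); // tool call that just started
```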

  • Fixed the per-turn token-usage row (display.showTokenUsage) churning and duplicating in scrollback, most visibly with parallel tool calls. The row was rendered inside the assistant block above the turn's tool blocks, so finalizing the block was deferred and late appends recommitted the already-committed tool rows. The assistant block now always finalizes as soon as a tool-call appears, and the usage row is emitted as a standalone finalized block below the turn's tool blocks across all three render paths (live, transcript rebuild, agent-hub).

  • Fixed the editor input box claiming a disproportionate share of small terminals (<=18 rows): the editor max-height floor (6 rows) ignored the available space. The height now yields to terminal size (EDITOR_MIN_CHROME_ROWS), reserving rows for the transcript and status line whenever the terminal can host both; on terminals too small for both, the editor collapses to its real bordered minimum instead of overshooting a fictitious cap.

  • Fixed ultrathink / orchestrate / workflowz keywords not glowing while typing — the editor appended a CURSOR_MARKER (ESC-prefixed) after each render, so the magic-keyword regex's right-boundary (?!\S) tripped on ESC and dropped the gradient until a trailing character was typed. The marker-aware editor decorate (packages/tui) plus a phase-aware highlightMagicKeywords overload restore the live glow and add a Claude-Code-style shimmer while the prompt is focused, gated on magicKeywords.enabled (#2475).

  • Fixed the compaction flow (/compact and plan-mode "Approve and compact context") leaving UI artifacts. executeCompaction added a Spacer(1) to the transcript that the sibling handoff path never adds and that leaked as an orphan blank line whenever compaction was cancelled or failed; that spacer is removed. On success the compaction loader is now stopped and the status container cleared before the transcript is rebuilt, so the live loader row no longer flickers over the reconciled transcript near the status seam (the idempotent finally still covers the cancel/fail paths) (#2486)

  • Fixed plan approval's "Approve and compact context" running the compaction summarizer on the pre-plan model instead of the plan model, cold-missing the plan model's prompt cache. Compaction now runs on the plan model (warm cache); the switch to the execution/pre-plan model happens only after a successful compaction and before any input queued during compaction is dispatched, so the queued turn runs on the post-compaction model. A cancelled compaction now also restores the pre-plan model (it previously stranded the session on the plan model), while a failed compaction stays on the plan model with its context intact.

  • Fixed Alt+Up (dequeue) reporting "No queued messages to restore" for messages — including skills — typed while the session was compacting. restoreQueuedMessagesToEditor now drains compactionQueuedMessages alongside the agent queue, so the Alt+Up to edit hint restores every pending message it advertises.

  • Fixed restoreQueuedMessagesToEditor (Alt+Up dequeue and Esc-abort) producing colliding [Image #N] markers when the editor draft already held pending image(s): queued text was prepended but queued images were appended, so positional marker → image lookup at submit time resolved to the wrong image. Each queued message's image markers are now renumbered by the running pending-image count before merge so the combined text stays aligned with the merged pendingImages order (#2531).
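
The renumbering step in the entry above can be sketched as follows, assuming queued text is prepended while queued images are appended, as the note states. `Pending`, `shiftImageMarkers`, `mergeQueuedIntoDraft`, and the blank-line separator are hypothetical.

```typescript
interface Pending {
  text: string;
  images: string[];
}

// Shift every [Image #N] marker in the text by a fixed offset.
function shiftImageMarkers(text: string, offset: number): string {
  return text.replace(/\[Image #(\d+)\]/g, (_match, n: string) => `[Image #${Number(n) + offset}]`);
}

// Hypothetical merge: queued text goes before the draft, queued images after.
function mergeQueuedIntoDraft(draft: Pending, queued: Pending[]): Pending {
  const images = [...draft.images];
  const texts: string[] = [];
  for (const message of queued) {
    // Renumber by the running pending-image count before appending the images.
    texts.push(shiftImageMarkers(message.text, images.length));
    images.push(...message.images);
  }
  texts.push(draft.text);
  return { text: texts.join("\n\n"), images };
}

const restored = mergeQueuedIntoDraft(
  { text: "look at [Image #1]", images: ["draft.png"] },
  [{ text: "see [Image #1]", images: ["queued.png"] }],
);
```

After the merge, the queued message's marker reads [Image #2], which matches the position of its image in the combined list.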

  • Fixed AgentBusyError ("Agent is already processing. Use steer() or followUp()...") surfacing on mode transitions — as Failed to finalize approved plan: ... when a plan was approved while the agent was still streaming the post-resolve continuation (or a turn started by the approve-time compaction/clear), and as an error toast when a loop auto-submit or goal continuation fired during a streaming/compaction race. Plan approval now aborts any in-flight turn before dispatching the executor's first prompt, and submitInteractiveInput routes streaming-time loop, goal-continuation, and manual submissions through the follow-up queue (streamingBehavior: "followUp") instead of throwing (synthetic continue-shortcuts stay developer-attributed and keep their prior behavior). Extends the manual-/goal fix in #2454 to the continuation and plan-approval paths.

  • Fixed /plan cycling between plan and plan_paused with no path back to mode none. handlePlanModeCommand had branches for entering and pausing but fell through to #enterPlanMode() when invoked from the paused state, so once a session entered plan mode the only operator-visible toggle re-entered it. The handler now matches planModePaused and fully exits — clearing planModeHasEntered and appending a mode_change to "none" — so /goal (and any other mode gated on planModeEnabled || planModePaused) can run again after a third /plan (#2510).

  • Fixed HTML session export rendering empty text tokens (text, userMessageText, customMessageText, toolTitle) as the dark-theme grey #e5e5e7 on every theme not literally named light, making transcripts illegible on custom light themes like sandstone, limestone, and porcelain. getResolvedThemeColors and the standalone isLightTheme helper now classify against the resolved statusLineBg luminance (the same surface Theme.isLight uses), so the HTML defaultText falls back to #000000 on light themes and the standalone helper stays in lockstep with Theme.isLight (#2516).

  • Fixed the Agent Hub opening on its own from a stray mouse click. The double-tap-← gesture (empty editor) fired on any two left keys within 500ms, but terminals with "click to move cursor" / pointer features (iTerm2 option-click, WezTerm, kitty, tmux) synthesize a burst of arrow keys on click — delivered sub-millisecond apart in one stdin read — so a single click could pop the hub with no key ever pressed. The gesture now requires the second tap to land a human-plausible interval after the first (≥40ms, <500ms) and ignores any third-or-later rapid tap, so synthesized bursts are rejected while a deliberate double-tap still works. The same hardening applies to the focused-subagent ←← "return to main" gesture.
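
One way to express the interval gating from the entry above is sketched here. `DoubleTapDetector` is a hypothetical name, and the handling of third-or-later taps is simplified: after a recognized double-tap the detector starts over.

```typescript
const MIN_GAP_MS = 40; // faster than this looks like a synthesized burst
const MAX_GAP_MS = 500; // slower than this is two separate taps

// Hypothetical detector fed with the timestamp of each left-arrow key.
class DoubleTapDetector {
  private lastTapAt: number | undefined;

  // Returns true when this tap completes a deliberate double-tap.
  tap(nowMs: number): boolean {
    const prev = this.lastTapAt;
    if (prev !== undefined) {
      const gap = nowMs - prev;
      if (gap >= MIN_GAP_MS && gap < MAX_GAP_MS) {
        this.lastTapAt = undefined; // consume the pair; a third tap starts over
        return true;
      }
    }
    this.lastTapAt = nowMs;
    return false;
  }
}

// A click-synthesized burst: three arrows delivered sub-millisecond apart.
const burst = new DoubleTapDetector();
const burstFired = [0, 0.2, 0.4].map((t) => burst.tap(t));

// A human double-tap 200ms apart, followed by a stray third tap.
const human = new DoubleTapDetector();
const humanFired = [1_000, 1_200, 1_300].map((t) => human.tap(t));
```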

  • Fixed the ctrl+p model-role cycle indicator (the default / gpt / fable / … chip track) stacking duplicate copies in the scrollback when other chat activity landed between two cycles. The track was emitted through showStatus, whose back-to-back coalescing only merges when the previous status is still the last transcript child; any interleaved append broke that identity check and appended a second track. It now renders into a dedicated anchored container above the editor (cleared and rebuilt in place each cycle, like the Todos HUD) and auto-clears after a short linger, so rapid presses or concurrent activity can never duplicate it.

  • Fixed the Working… loader vanishing for the rest of a turn after an auto-compaction (context-overflow recovery) or auto-retry. Those overlays took over the shared status container with a bare statusContainer.clear(), which detached the working loader but left loadingAnimation set; the resumed turn's agent_start → ensureLoadingAnimation() is guarded by if (!this.loadingAnimation), so it skipped re-attaching the loader and the spinner stayed gone while the agent kept streaming. The overlay handlers now fully tear the working loader down (stop + dereference) via #stopWorkingLoader(), so the next agent_start recreates and re-attaches it.

  • Fixed JS eval helper optional arguments rejecting Python-style positional calls. read(path, offset, limit) now works alongside read(path, { offset, limit }), null/undefined skip optional positional slots, and non-local URI reads such as artifact://... delegate through the read tool so line slicing works on spilled artifacts.

  • Fixed legacy extension compatibility remapping for @*-pi-ai/utils/oauth imports so background workers load the relocated @oh-my-pi/pi-ai/oauth exports instead of resolving missing src/utils/oauth/* files (#2566).

  • Fixed eager todo initialization prompting GPT-5.5 to emit unsupported task metadata fields, which could leave fresh sessions stuck on the forced first todo call (#2561).

  • Fixed Windows plan-mode task fan-out crashing the TUI when nested async task progress formed a cycle; task rendering now cuts recursive snapshots and long Windows local:// roots are shortened under temp storage (#2551).

  • Fixed a band of streaming assistant output being lost from native scrollback — committed nowhere, repainted nowhere — once a reply grew taller than the viewport. Markdown whose layout keeps changing above the streaming tail (most visibly a table whose columns re-align as rows arrive) never earns a byte-stable commit-safe end, so as its head scrolled above the window the rows fell into the gap between the commit boundary and the window top and vanished. TranscriptContainer now reports a getNativeScrollbackSnapshotSafeEnd() for commit-stable live blocks (their whole body is durable content), and the renderer commits those scrolled-off rows audit-exempt — a later layout change of an already-committed row freezes a slightly-stale row in scrollback (duplication never loss) instead of dropping it. Provisional blocks (collapsing tool/edit previews) are unaffected.

  • Fixed unknown flags prefixed with -- being silently consumed as prompt text, which let a stale or typoed flag start a real agent session (connecting to MCP servers, waiting on the model) instead of failing fast. parseArgs now tracks unrecognized flag-shaped tokens and runRootCommand calls reportUnrecognizedFlags immediately after the post-extension reparse, exiting 2 with Error: unknown flag: --… before any session, MCP, or initial-message work runs. Extension-registered flags still pass cleanly since the validation runs after the extension-aware reparse, and -- is honored as a POSIX positional separator so flag-shaped prompts (omp -p -- --explain-this) survive the new guard (#2459).

  • Fixed /goal <objective> and /goal set <objective> during streaming so goal context is steered immediately but objective submission waits for the active turn to finish instead of spamming AgentBusyError. The interactive goal-continuation timer is now streaming-aware too: if a turn starts inside the 800 ms idle window the timer was scheduled in, it drops the tick instead of submitting a stale goal-continuation that would resurface the same AgentBusyError; the next agent_end reschedules (#2454).

  • Fixed ~/.agent[s]/skills not appearing as /skill:<name> commands when every named source toggle (skills.enableCodexUser, skills.enableClaudeUser, skills.enableClaudeProject, skills.enablePiUser, skills.enablePiProject) was off: loadSkills gated the agents provider on anyBuiltInSkillSourceEnabled, so a user who turned off the Claude/Codex/Pi sources to clean noise also lost their own canonical OMP-native skills. The agents provider now reads the dedicated enableAgentsUser/enableAgentsProject toggles, and the unknown-third-party fall-through gate is restricted to the named third-party toggles so keeping the default agents toggles on no longer silently re-enables opencode/github/claude-plugins/gemini skill sources (#2401).

  • Fixed Claude Code marketplace plugin skills installed under skills/<name>/SKILL.md to also appear as bare slash commands such as /understand, matching Claude-native plugin docs. The slash command name is taken from the skill directory basename so display-style frontmatter names like name: Understand Anything still resolve to /understand (#2415).

  • Fixed ACP /move builtin test expectations to compare the resolved destination path so the test is portable on Windows and Unix (#2381 by @oldschoola).

  • Fixed vLLM discovery so providers.vllm.baseUrl drives the built-in endpoint, additional OpenAI-compatible vLLM provider IDs work through openai-models-list, and discovered max_model_len or fallback context_length values set context windows instead of falling back to 128k.

  • Fixed extension discovery ignoring package directories symlinked into an extensions/ directory.
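
The unknown-flag guard described above (#2459) can be sketched roughly as follows. This is a minimal illustration, not the project's parseArgs or reportUnrecognizedFlags code: the function name findUnrecognizedFlags and the knownFlags set are hypothetical, and the sketch only looks at long flags of the form --name or --name=value.

```typescript
// Illustrative sketch only: `findUnrecognizedFlags` and `knownFlags` are
// hypothetical names, not the project's parseArgs/reportUnrecognizedFlags.
function findUnrecognizedFlags(argv: string[], knownFlags: Set<string>): string[] {
  const unknown: string[] = [];
  for (const token of argv) {
    // A bare "--" is the POSIX positional separator: stop scanning for flags.
    if (token === "--") break;
    // Only tokens shaped like "--name" or "--name=value" are treated as flags.
    if (!token.startsWith("--")) continue;
    const eq = token.indexOf("=");
    const name = eq === -1 ? token : token.slice(0, eq);
    if (!knownFlags.has(name)) unknown.push(name);
  }
  return unknown;
}

// Assumed flag names, for demonstration only.
const known = new Set(["--model", "--print"]);
// A typoed flag is reported instead of being swallowed as prompt text.
const typoed = findUnrecognizedFlags(["--modle=x", "hello"], known);
// A flag-shaped prompt after "--" is left alone.
const separated = findUnrecognizedFlags(["--print", "--", "--explain-this"], known);
```

A caller would presumably run a check like this after all extension flags are registered and exit with a non-zero status when the returned list is non-empty, which matches the ordering the fix describes.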

Removed

  • Removed the Python openai-whisper dependency and pip install path from speech-to-text — the bundled transcribe.py and all Python/whisper probes in omp setup speech are gone; the recorder (SoX/FFmpeg/arecord) remains the only external tool.
  • Changed the speech-to-text trigger from the Alt+H keybinding to a hold-Space push-to-talk gesture. Holding the space bar emits an OS auto-repeat burst; once more than 10 spaces land in the editor it recognizes the hold, deletes (tracks back) those inserted spaces, and starts recording, then stops and transcribes when the repeats stop (the space bar is released). app.stt.toggle is now unbound by default but can be rebound to a chord for press-to-toggle; the gesture is gated on stt.enabled, and Shift+Space still inserts a literal space.
  • Added an experimental, opt-in auto-learn loop (default off, autolearn.enabled). When enabled, after the agent stops it is nudged to capture reusable lessons: durable facts go to long-term memory and repeatable procedures become managed skills, which are SKILL.md files written to an isolated ~/.omp/agent/managed-skills directory that is discovered and surfaced like authored skills but never overwrites user-authored skills (authored names always win). Two tools back this: manage_skill (create/update/delete managed skills) and learn (record a lesson, optionally minting a managed skill in the same call). learn works with the hindsight, mnemopi, or file-based local memory backend; under local, lessons append to a learned.md in the project's memory root (kept separate from the consolidation artifacts so they survive a consolidation pass) and are injected into future sessions. The nudge is passive by default (a hidden reminder rides the next turn); autolearn.autoContinue instead auto-runs one capture turn at stop, and autolearn.minToolCalls (default 5) gates trivial turns. Plan/goal-mode turns and subagents are never nudged.
  • Fixed /plan cycling between plan and plan_paused with no path back to mode none, while preserving prompted paused-mode requests. The no-arg third toggle now fully exits — clearing planModeHasEntered and appending a mode_change to "none" — and /plan <prompt> from plan_paused re-enters plan mode and submits the prompt as the first turn (#2510).
  • Fixed HTML session export rendering empty text tokens (text, userMessageText, customMessageText, toolTitle) as the dark-theme grey #e5e5e7 on every theme not literally named light, making transcripts illegible on custom light themes like sandstone, limestone, and porcelain. isLightTheme now classifies against the resolved statusLineBg luminance (the same surface Theme.isLight uses), while HTML defaultText contrasts the actual export surface (export.cardBg / export.pageBg / derived userMessageBg) so light-status themes with dark export cards keep light transcript text (#2516).
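
To illustrate the luminance-based classification in the HTML export fix above (#2516), here is a minimal sketch of deciding whether a background colour is light from its value rather than from the theme name. The helper names, the WCAG relative-luminance formula, and the 0.5 threshold are all assumptions made for illustration; the release notes do not say which formula or cutoff isLightTheme uses.

```typescript
// Illustrative sketch: `linearize`, `relativeLuminance`, `isLightSurface`, and
// the 0.5 threshold are assumptions, not the project's isLightTheme code.

// Convert one 8-bit sRGB channel to linear light (WCAG 2.x definition).
function linearize(channel: number): number {
  const s = channel / 255;
  return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
}

// Relative luminance of a 6-digit hex colour such as "#e5e5e7".
function relativeLuminance(hex: string): number {
  const value = parseInt(hex.replace("#", ""), 16);
  const r = linearize((value >> 16) & 0xff);
  const g = linearize((value >> 8) & 0xff);
  const b = linearize(value & 0xff);
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Classify a surface as light when its luminance exceeds the assumed threshold.
function isLightSurface(hex: string, threshold = 0.5): boolean {
  return relativeLuminance(hex) > threshold;
}
```

With a check of this kind, a custom theme with a pale background would be treated as light regardless of its name, and the default text colour could then be chosen to contrast with the surface it is actually drawn on.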

What's Changed

  • fix(coding-agent): restore compaction-queued messages on Alt+Up dequeue by @metaphorics in #2530
  • fix(coding-agent): renumber image markers when restoring queued messages by @roboomp in #2533
  • feat(task): inline role specialization for spawned subagents by @metaphorics in #2478
  • feat(task): advertise role and tailored delegation in task prompts by @metaphorics in #2479
  • feat(task): nudge generic spawns toward specialization by @metaphorics in #2480
  • feat(irc): work-aware peer roster (role + current activity) by @metaphorics in #2481
  • feat(tui): add event-loop watchdog and phase breadcrumbs for stall diagnosis by @metaphorics in #2488
  • fix(coding-agent): memoize tool-result rendering to stop per-frame re-shape stalls by @metaphorics in #2489
  • fix(coding-agent): clean up compaction UI lifecycle by @metaphorics in #2490
  • feat(coding-agent): add guided-goal setup interview by @metaphorics in #2502
  • feat(tui): support ctrl+j as an additional newline key by @metaphorics in #2504
  • fix(magic-keywords): glow keywords through the cursor seam and animate a focused shimmer by @roboomp in #2506
  • feat(mnemopi): add embedding variant setting (English / multilingual) by @metaphorics in #2508
  • fix(native): handle Windows worker pressure and OpenCode MiMo tool choice by @roboomp in #2511
  • fix(plan): /plan exits paused state back to mode 'none' by @roboomp in #2513
  • fix(theme): classify HTML export defaults by status-line luminance by @roboomp in #2519
  • fix(coding-agent): run plan-approval compaction on the plan model by @metaphorics in #2520
  • fix(tui): clamp editor height and overlays on small terminals by @metaphorics in #2521
  • feat(coding-agent): make /fast default scope configurable by @metaphorics in #2522
  • fix: stop AgentBusyError on plan approval and loop/goal continuations by @metaphorics in #2524
  • fix: phase-lock tool-call spinners across parallel calls by @metaphorics in #2526
  • fix: emit per-turn token-usage row as a standalone block by @metaphorics in #2528
  • fix(tui): detect tmux/screen via TERM prefix when TMUX env is stripped by @roboomp in #2545
  • Fix resolve gate for OpenCode Go Kimi K2.7 Code by @roboomp in #2547
  • fix(ai): disable Bun native stream fetch timeout by @roboomp in #2428
  • fix(coding-agent): guard Windows task fan-out render cycles by @roboomp in #2552
  • fix(anthropic): drop eager_input_streaming on github-copilot transport by @roboomp in #2559
  • fix(natives): clean stale native cache versions by @roboomp in #2562
  • fix(agent): correct eager todo init prompt by @roboomp in #2563
  • fix(ci): scope release runs to per-sha concurrency group by @roboomp in #2565
  • fix(coding-agent): remap legacy OAuth imports by @roboomp in #2568
  • fix(cli): expose Claude plugin skills as slash commands by @roboomp in #2417
  • fix(plan-mode): allow hashline plan edits by @roboomp in #2477
  • fix(providers/google): avoid creating empty text blocks on stream ending by @usr-bin-roygbiv in #2538
  • feat(coding-agent): three-level eagerness enum for task.eager and todo.eager by @metaphorics in #2540
  • feat(coding-agent): opt-in experimental auto-learn (memory + isolated managed skills) by @metaphorics in #2542
  • fix(providers/google): avoid outputting system rules and preambles in Antigravity by @usr-bin-roygbiv in #2543
  • fix(agent-loop): detect repetition loops during assistant stream and abort gracefully by @usr-bin-roygbiv in #2549
  • fix(prompts): instruct models to avoid LaTeX math delimiters and commands by @usr-bin-roygbiv in #2550
  • fix(coding-agent): keep transport: pi-native on openrouter after catalog refresh (#2555) by @roboomp in #2556
  • fix: handle nested error object in openai responses stream by @usr-bin-roygbiv in #2570
  • fix(web-search): fall back on empty SearXNG results by @roboomp in #2572
  • fix(natives): avoid linux addon load hang by @roboomp in #2554
  • fix(mnemopi): rank exact memories above weak facts by @DarkPhilosophy in #2452
  • fix(cli): reject unloadable npm plugin installs by @roboomp in #2313
  • feat(coding-agent): emit tool approval lifecycle events by @381181295 in #2444
  • fix(catalog): pin MiniMax-M3 context window by @roboomp in #2578
  • fix(coding-agent): avoid placeholder crash before theme init by @camopy in #2581

New Contributors

Full Changelog: v15.12.6...v15.13.0
