Changed
- Extract
/octo:councilbenchmark routing helpers intoscripts/lib/benchmark-routing.shand load them through the orchestrator and direct council library usage. - Score council role fit from
agents/config.yamlcapability and expertise tags before falling back to persona-family heuristics. - Document the v1 MCP/OpenClaw decision as local adapter passthrough rather than a hosted council service.
Fixed
- Surface provider-diversity and chair-fallback council warnings in CLI output, with regression coverage.
- Keep fixture-mode critique dispatch consistent with
OCTOPUS_COUNCIL_FAIL_PERSONAS, with regression coverage.
[9.40.2] - 2026-05-23
Fixed
- Generate
/octo:councilsynthesis through chair dispatch using response, critique, and revision artifacts instead of writing a static placeholder synthesis. - Re-check
/octo:councilbudget caps before critique, revision, synthesis, and implementation planning so a run stops before the next phase would exceed--max-cost. - Normalize the current BullshitBench v2 upstream CSV schema in
scripts/refresh-benchmarks.shand refresh the checked-in snapshot to 158 model/reasoning rows. - Tighten council veto scanning so incidental
critical-vetotext does not trigger a critical veto. - Add regression coverage for directory-based skill entries in
/octo:doctor.
[9.40.1] - 2026-05-23
Fixed
- Fix
/octo:doctorskill existence checks for directory-based skills so v9.39+ installs no longer report false missing-skill failures (#414, #415).
[9.40.0] - 2026-05-22
Added
- Add
/octo:councilas a configurable multi-LLM council command with command/skill registration, dry-run preflight artifacts, provider status, benchmark metadata, persona-aware roster selection, provider diversity, budget validation, quorum tracking, critical veto handling, and gated implementation handoff metadata. - Add checked-in BullshitBench v2 snapshot data and a refresh script for benchmark-aware council routing.
[9.39.1] - 2026-05-22
Fixed
- Honor
--timeoutfor synthesis stages instead of hardcoding 180 seconds, so dense synthesis runs respect the caller's configured timeout (#408, #409). - Let
OCTOPUS_AGENT_TIMEOUToverride dispatch timeouts unconditionally and treat oversize provider rejections as skipped providers instead of aborting the whole dispatch (#410, #411).
[9.39.0] - 2026-05-21
Added
- Add Codex marketplace icon metadata and package the SVG asset for marketplace browsers (#385).
- Add session-scoped provider availability controls to
/octo:model-configso users can disable exhausted providers such as Codex without uninstalling them (#386).
Fixed
- Surface the first provider stderr line in orchestrator logs when a provider command fails, while still preserving the full transcript in the result file (#404).
- Align OpenCode model catalog metadata with the current
opencode/...namespace (#404). - Replace low-risk
ls/readshellcheck findings inorchestrate.shwith safer equivalents (#404).
[9.38.1] - 2026-05-21
Patch release covering the issue/PR triage queue after v9.38.0.
Added
- Add Mistral Vibe as a first-class provider, including setup/doctor detection, dispatch support, circuit-breaker visibility, and prompt validation (#402).
Fixed
- flow-develop: E2E verification agent now receives the original task description verbatim at prompt-construction time instead of a static generic reference (#398, closes #389)
- probe: compact synthesis fallback — bounded context and sanitized failure markers in synthesis (#396)
- ink: compact delivery context — bounded delivery bundle, sanitized upstream failure markers (#394)
- tangle: fall back to direct execution when decomposition produces no parseable subtasks (#391)
- tangle: preserve original task context in subtasks, require explicit disjoint write scopes, and accept root-level files such as
Makefile(#390). - tangle: validate explicit file coverage with exact file-token matching and require worktree evidence for implementation tasks (#393).
- embrace: stop on missing phase outputs, enforce requested debate gates, and reuse centralized cleanup for YAML runtime completion (#392).
- codex: document current non-interactive
codex execusage and include recovered stderr transcripts in result files (#387). - skills: support directory-format Claude skills across marketplace sync, smoke tests, OpenClaw, Codex generation, release validation, and agent skill loading (#397, closes #395).
- review publishing: respect explicit PR targets before branch fallback so review comments land on the intended PR (#406, closes #405).
- provider defaults: cover OpenCode namespace defaults in regression tests (#403).