nyldn/claude-octopus v9.40.3 on GitHub

Extract /octo:council benchmark routing helpers into scripts/lib/benchmark-routing.sh and load them through the orchestrator and direct council library usage.
Score council role fit from agents/config.yaml capability and expertise tags before falling back to persona-family heuristics.
Document the v1 MCP/OpenClaw decision as local adapter passthrough rather than a hosted council service.

Surface provider-diversity and chair-fallback council warnings in CLI output, with regression coverage.
Keep fixture-mode critique dispatch consistent with OCTOPUS_COUNCIL_FAIL_PERSONAS, with regression coverage.

[9.40.2] - 2026-05-23

Generate /octo:council synthesis through chair dispatch using response, critique, and revision artifacts instead of writing a static placeholder synthesis.
Re-check /octo:council budget caps before critique, revision, synthesis, and implementation planning so a run stops before the next phase would exceed --max-cost.
Normalize the current BullshitBench v2 upstream CSV schema in scripts/refresh-benchmarks.sh and refresh the checked-in snapshot to 158 model/reasoning rows.
Tighten council veto scanning so incidental critical-veto text does not trigger a critical veto.
Add regression coverage for directory-based skill entries in /octo:doctor.

Fix /octo:doctor skill existence checks for directory-based skills so v9.39+ installs no longer report false missing-skill failures (#414, #415).

Add /octo:council as a configurable multi-LLM council command with command/skill registration, dry-run preflight artifacts, provider status, benchmark metadata, persona-aware roster selection, provider diversity, budget validation, quorum tracking, critical veto handling, and gated implementation handoff metadata.
Add checked-in BullshitBench v2 snapshot data and a refresh script for benchmark-aware council routing.

Honor --timeout for synthesis stages instead of hardcoding 180 seconds, so dense synthesis runs respect the caller's configured timeout (#408, #409).
Let OCTOPUS_AGENT_TIMEOUT override dispatch timeouts unconditionally and treat oversize provider rejections as skipped providers instead of aborting the whole dispatch (#410, #411).

Add Codex marketplace icon metadata and package the SVG asset for marketplace browsers (#385).
Add session-scoped provider availability controls to /octo:model-config so users can disable exhausted providers such as Codex without uninstalling them (#386).

Surface the first provider stderr line in orchestrator logs when a provider command fails, while still preserving the full transcript in the result file (#404).
Align OpenCode model catalog metadata with the current opencode/... namespace (#404).
Replace low-risk ls/read shellcheck findings in orchestrate.sh with safer equivalents (#404).

Patch release covering the issue/PR triage queue after v9.38.0.

Add Mistral Vibe as a first-class provider, including setup/doctor detection, dispatch support, circuit-breaker visibility, and prompt validation (#402).

flow-develop: E2E verification agent now receives the original task description verbatim at prompt-construction time instead of a static generic reference (#398, closes #389)
probe: compact synthesis fallback — bounded context and sanitized failure markers in synthesis (#396)
ink: compact delivery context — bounded delivery bundle, sanitized upstream failure markers (#394)
tangle: fall back to direct execution when decomposition produces no parseable subtasks (#391)
tangle: preserve original task context in subtasks, require explicit disjoint write scopes, and accept root-level files such as Makefile (#390).
tangle: validate explicit file coverage with exact file-token matching and require worktree evidence for implementation tasks (#393).
embrace: stop on missing phase outputs, enforce requested debate gates, and reuse centralized cleanup for YAML runtime completion (#392).
codex: document current non-interactive codex exec usage and include recovered stderr transcripts in result files (#387).
skills: support directory-format Claude skills across marketplace sync, smoke tests, OpenClaw, Codex generation, release validation, and agent skill loading (#397, closes #395).
review publishing: respect explicit PR targets before branch fallback so review comments land on the intended PR (#406, closes #405).
provider defaults: cover OpenCode namespace defaults in regression tests (#403).