nyldn/claude-octopus v8.25.0 on GitHub

Dark Factory Mode (closes #37)

Spec-in, software-out autonomous pipeline. Feed it an NLSpec, get working software back — with blind holdout testing to verify quality.

New Command

/octo:factory --spec spec.md

Pipeline

Parse Spec — Extract behaviors, constraints, satisfaction target from NLSpec
Generate Scenarios — Multi-provider test scenario generation (Codex + Gemini)
Split Holdout — 80/20 split with behavior-diverse holdout selection
Embrace Workflow — Full 4-phase implementation (discover → define → develop → deliver) in autonomous mode
Holdout Tests — Blind cross-model evaluation against withheld scenarios
Satisfaction Scoring — Weighted 4-dimension assessment
Report — Markdown report + JSON session at .octo/factory/<run-id>/

Scoring

Dimension	Weight
Behavior Coverage	40%
Constraint Adherence	20%
Holdout Pass Rate	25%
Quality	15%

Verdicts: PASS (≥ target), WARN (≥ target − 0.05), FAIL (< target − 0.05)

On FAIL, automatically retries phases 3–4 with remediation context from failing scenarios.

What's New

7 new functions in orchestrate.sh: parse_factory_spec, generate_factory_scenarios, split_holdout_scenarios, run_holdout_tests, score_satisfaction, generate_factory_report, factory_run
/octo:factory command file + skill-factory.md enforced skill with 8-step execution contract
CLI flags: --spec, --holdout-ratio, --max-retries, --ci
49 new tests across 7 suites
OpenClaw registry updated (96 entries)

Architecture

Factory wraps embrace_full_workflow() — zero modifications to embrace. Sets AUTONOMY_MODE=autonomous + OCTOPUS_SKIP_PHASE_COST_PROMPT=true + OCTOPUS_FACTORY_MODE=true and injects visible scenarios into the prompt.

Cost

~$0.50–2.00 per run (~20–30 agent calls). Displayed upfront with approval gate unless --ci.

Full Changelog: v8.24.0...v8.25.0

nyldn/claude-octopus v8.25.0 v8.25.0 — Dark Factory Mode on GitHub