github nyldn/claude-octopus v8.25.0
v8.25.0 — Dark Factory Mode

one month ago

Dark Factory Mode (closes #37)

Spec-in, software-out autonomous pipeline. Feed it an NLSpec, get working software back — with blind holdout testing to verify quality.

New Command

/octo:factory --spec spec.md

Pipeline

  1. Parse Spec — Extract behaviors, constraints, satisfaction target from NLSpec
  2. Generate Scenarios — Multi-provider test scenario generation (Codex + Gemini)
  3. Split Holdout — 80/20 split with behavior-diverse holdout selection
  4. Embrace Workflow — Full 4-phase implementation (discover → define → develop → deliver) in autonomous mode
  5. Holdout Tests — Blind cross-model evaluation against withheld scenarios
  6. Satisfaction Scoring — Weighted 4-dimension assessment
  7. Report — Markdown report + JSON session at .octo/factory/<run-id>/
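
The holdout split in step 3 can be sketched as follows. This is a minimal illustration, not the logic in split_holdout_scenarios: it assumes each scenario is a dict with a "behavior" key, and it uses a round-robin pick across behaviors as one possible reading of "behavior-diverse". The function name and the 0.2 default are hypothetical stand-ins for the 80/20 split.

```python
from collections import defaultdict


def split_holdout(scenarios, holdout_ratio=0.2):
    """Return (visible, holdout), spreading the holdout set across behaviors.

    Assumption: every scenario is a dict with a "behavior" key. The
    round-robin strategy is illustrative only.
    """
    # Withhold roughly holdout_ratio of the scenarios, at least one,
    # but never all of them.
    n_holdout = max(1, round(len(scenarios) * holdout_ratio))
    n_holdout = min(n_holdout, max(len(scenarios) - 1, 0))

    # Group scenario indices by the behavior they exercise.
    by_behavior = defaultdict(list)
    for index, scenario in enumerate(scenarios):
        by_behavior[scenario["behavior"]].append(index)

    # Take one scenario per behavior in turn until the quota is met.
    queues = [list(indices) for indices in by_behavior.values()]
    chosen = set()
    while len(chosen) < n_holdout:
        for queue in queues:
            if queue and len(chosen) < n_holdout:
                chosen.add(queue.pop())

    visible = [s for i, s in enumerate(scenarios) if i not in chosen]
    holdout = [s for i, s in enumerate(scenarios) if i in chosen]
    return visible, holdout
```

With ten scenarios covering three behaviors, this withholds two scenarios from two different behaviors and leaves eight visible to the implementation phases.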

Scoring

Dimension             Weight
Behavior Coverage     40%
Constraint Adherence  20%
Holdout Pass Rate     25%
Quality               15%

Verdicts: PASS (≥ target), WARN (≥ target − 0.05), FAIL (< target − 0.05)

On FAIL, automatically retries phases 3–4 with remediation context from failing scenarios.
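
As a rough sketch of how the weights and verdict thresholds above combine: the code assumes each dimension is already scored on a 0 to 1 scale and that the satisfaction target is a fraction such as 0.90. Both are assumptions, as are the function and key names; score_satisfaction in orchestrate.sh may differ.

```python
# Weights from the Scoring table; the dictionary keys are hypothetical names.
WEIGHTS = {
    "behavior_coverage": 0.40,
    "constraint_adherence": 0.20,
    "holdout_pass_rate": 0.25,
    "quality": 0.15,
}
WARN_MARGIN = 0.05  # WARN band below the target, per the verdict rule


def satisfaction_score(dimensions):
    """Weighted sum of the four dimension scores (each assumed in [0, 1])."""
    return sum(WEIGHTS[name] * dimensions[name] for name in WEIGHTS)


def verdict(score, target):
    """Map a score to PASS, WARN, or FAIL relative to the target."""
    if score >= target:
        return "PASS"
    if score >= target - WARN_MARGIN:
        return "WARN"
    return "FAIL"


example = {
    "behavior_coverage": 0.9,
    "constraint_adherence": 1.0,
    "holdout_pass_rate": 0.8,
    "quality": 0.7,
}
score = satisfaction_score(example)
print(round(score, 3), verdict(score, target=0.90))
```

The example works out to 0.36 + 0.20 + 0.20 + 0.105 = 0.865, which falls inside the 0.05 band below a 0.90 target and so yields WARN.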

What's New

  • 7 new functions in orchestrate.sh: parse_factory_spec, generate_factory_scenarios, split_holdout_scenarios, run_holdout_tests, score_satisfaction, generate_factory_report, factory_run
  • /octo:factory command file + skill-factory.md enforced skill with 8-step execution contract
  • CLI flags: --spec, --holdout-ratio, --max-retries, --ci
  • 49 new tests across 7 suites
  • OpenClaw registry updated (96 entries)

Architecture

Factory wraps embrace_full_workflow() — zero modifications to embrace. Sets AUTONOMY_MODE=autonomous + OCTOPUS_SKIP_PHASE_COST_PROMPT=true + OCTOPUS_FACTORY_MODE=true and injects visible scenarios into the prompt.
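
The wrapping pattern can be illustrated with a short sketch. The real implementation lives in orchestrate.sh (shell), so this Python version only mirrors the idea: set the three variables, call the unmodified workflow, and pass the visible scenarios in through the prompt. The embrace_full_workflow body here is a hypothetical stand-in, and restoring the previous environment afterwards is a choice made for this sketch, not something the release notes state.

```python
import os
from contextlib import contextmanager

# Variable names and values are taken from the release notes.
FACTORY_ENV = {
    "AUTONOMY_MODE": "autonomous",
    "OCTOPUS_SKIP_PHASE_COST_PROMPT": "true",
    "OCTOPUS_FACTORY_MODE": "true",
}


@contextmanager
def factory_environment():
    """Set the factory variables, then restore the previous values on exit."""
    previous = {name: os.environ.get(name) for name in FACTORY_ENV}
    os.environ.update(FACTORY_ENV)
    try:
        yield
    finally:
        for name, value in previous.items():
            if value is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = value


def embrace_full_workflow(prompt):
    # Hypothetical stand-in for the existing workflow, which stays unmodified.
    return {"mode": os.environ.get("AUTONOMY_MODE"), "prompt": prompt}


def factory_run(spec_text, visible_scenarios):
    """Wrap the workflow: inject visible scenarios, run under factory settings."""
    scenario_lines = "\n".join(f"- {s}" for s in visible_scenarios)
    prompt = f"{spec_text}\n\nScenarios:\n{scenario_lines}"
    with factory_environment():
        return embrace_full_workflow(prompt)
```

Because the wrapper only changes environment variables and the prompt, the wrapped workflow needs no factory-specific code paths, which matches the "zero modifications" claim above.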

Cost

~$0.50–2.00 per run (~20–30 agent calls). The estimate is displayed upfront with an approval gate unless --ci is passed.


Full Changelog: v8.24.0...v8.25.0
