v2.1.0 — Complete Rebuild: Modular Architecture
Everything in this release has been rewritten from scratch: the codebase, all documentation and guides, and the distribution system.
Why a Complete Rewrite?
The v2.0.x monolithic SKILL.md (813 lines, ~100K tokens) loaded on every single invocation — even for a quick /autoresearch:plan. This was unsustainable:
- Token waste: ~100K tokens per invocation regardless of which command you used
- Slow cold starts: LLMs had to parse 813 lines before doing anything useful
- Tangled protocols: 13 reference files with overlapping, hard-to-maintain instructions
- Fragile distribution: Separate sync-opencode.sh and sync-codex.sh scripts that frequently drifted
v2.1.0 solves all of this with a ground-up architectural rebuild.
Architecture: Monolith → Modular
Before (v2.0.x)
SKILL.md (813 lines, ~100K tokens) ← loaded EVERY invocation
├── 13 reference files (overlapping workflows)
├── autoresearch-command-spec.json (JSON command registry)
├── autoresearch_cli.py (Python wrapper)
├── sync-opencode.sh (manual sync)
└── sync-codex.sh (manual sync)
After (v2.1.0)
SKILL.md (41 lines, ~2K tokens) ← thin routing table only
├── 12 self-contained command files (~100 lines each, ~5-8K tokens)
├── 3 focused reference files (loaded only when needed)
└── scripts/transform.sh (single multi-platform transform)
Result: ~95% token reduction per invocation. Only the command you invoke gets loaded. A /autoresearch:plan call loads ~5K tokens instead of ~100K.
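As a rough check on the ~95% figure, the arithmetic can be sketched in Python from the approximate token counts quoted above. The function name is illustrative, and the second calculation assumes the ~2K-token SKILL.md routing table is loaded alongside the command file.

```python
def token_reduction(before: int, after: int) -> float:
    """Return the fractional reduction in tokens loaded per invocation."""
    return 1 - after / before

# Approximate figures quoted in these notes.
MONOLITH = 100_000     # v2.0.x SKILL.md
PLAN_COMMAND = 5_000   # /autoresearch:plan command file in v2.1.0
ROUTER = 2_000         # v2.1.0 SKILL.md routing table

# Reduction when counting the command file alone.
command_only = token_reduction(MONOLITH, PLAN_COMMAND)
# Reduction when the routing table is counted as well (an assumption).
with_router = token_reduction(MONOLITH, PLAN_COMMAND + ROUTER)
print(f"{command_only:.0%} {with_router:.0%}")
```

Either way the reduction stays above 90% for a single command.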
12 Commands (was 11 — evals is new)
Every command has been rewritten from scratch with bounded defaults, universal flags, and chain handoff support.
| Command | Type | Default | Purpose |
|---|---|---|---|
| /autoresearch | Loop | 25 iterations | Core metric-driven improvement loop |
| /autoresearch:plan | One-shot | n/a | Goal → config wizard (generates loop config) |
| /autoresearch:debug | Loop | 15 iterations | Hypothesis-driven bug investigation |
| /autoresearch:fix | Loop | 20 iterations | Error-count reduction (error → zero) |
| /autoresearch:security | Loop | 15 iterations | STRIDE + OWASP security audit with auto-fix |
| /autoresearch:ship | Linear | 8 phases | Pre-merge quality pipeline |
| /autoresearch:scenario | Loop | 20 iterations | 12-dimension edge case exploration |
| /autoresearch:predict | One-shot | n/a | 5-persona swarm analysis + debate |
| /autoresearch:learn | Loop | 10 iterations | Autonomous documentation engine |
| /autoresearch:reason | Loop | 8 iterations | Adversarial refinement (judge + critic) |
| /autoresearch:probe | Loop | 15 rounds | Requirement interrogation (8 personas) |
| /autoresearch:evals | One-shot (NEW) | n/a | TSV analysis: trends, plateaus, anomalies |
New: /autoresearch:evals
One-shot analysis of any *-results.tsv file. Dynamically detects TSV columns, identifies trends, plateaus, velocity changes, regressions, and diminishing returns. Adaptive checkpoints at floor(max_iterations/3) provide mid-loop feedback during long runs. Backward compatible with v2.0.x TSV format.
Can also be invoked inline on any looping command via --evals or --evals-interval N flags.
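A minimal sketch of the two ideas above, assuming a checkpoint fires every floor(max_iterations/3) iterations and that a plateau means consecutive rows with no improvement on the best value so far. The function names, the column names `iteration` and `metric`, the sample rows, and the higher-is-better convention are all hypothetical; the real TSV schema and detection logic are not specified in these notes.

```python
import csv
import io

def checkpoint_iterations(max_iterations: int) -> list[int]:
    """Iterations at which a checkpoint would fire, one every floor(max_iterations/3)."""
    interval = max(1, max_iterations // 3)
    return list(range(interval, max_iterations + 1, interval))

def trailing_plateau(tsv_text: str, metric: str) -> int:
    """Count the most recent consecutive rows that did not beat the best value so far."""
    # DictReader picks up whatever columns the header row declares.
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    best = float("-inf")
    stalled = 0
    for row in rows:
        value = float(row[metric])
        if value > best:      # assumes higher is better
            best = value
            stalled = 0
        else:
            stalled += 1
    return stalled

# Hypothetical results file content.
SAMPLE = "iteration\tmetric\n1\t0.50\n2\t0.62\n3\t0.62\n4\t0.61\n"

print(checkpoint_iterations(25))
print(trailing_plateau(SAMPLE, "metric"))
```

Reading the header through `csv.DictReader` keeps the sketch independent of column order, which is one simple way to approximate dynamic column detection.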
Bounded Defaults
Every looping command now has a sensible default iteration count. No more "runs forever unless you stop it":
- Iterations: N (explicit cap)
- Iterations: unlimited (opt-in infinite mode; you must ask for it)
- Default values are tuned per command: core loop (25) needs more iterations than reason (8)
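The resolution rule described above could look roughly like this. The default counts are copied from the command table; the function name `resolve_cap` and the choice to represent unlimited mode as `None` are assumptions made for illustration, not the plugin's actual implementation.

```python
from typing import Optional

# Defaults copied from the command table in these notes (probe counts rounds).
DEFAULT_ITERATIONS = {
    "autoresearch": 25, "debug": 15, "fix": 20, "security": 15,
    "scenario": 20, "learn": 10, "reason": 8, "probe": 15,
}

def resolve_cap(command: str, iterations: Optional[str] = None) -> Optional[int]:
    """Return the iteration cap, or None when unlimited was requested explicitly."""
    if iterations is None:
        # Nothing specified: fall back to the bounded per-command default.
        return DEFAULT_ITERATIONS[command]
    if iterations.strip().lower() == "unlimited":
        # Infinite mode is opt-in only.
        return None
    cap = int(iterations)
    if cap < 1:
        raise ValueError("iteration cap must be a positive integer")
    return cap

print(resolve_cap("reason"))
print(resolve_cap("fix", "40"))
print(resolve_cap("autoresearch", "unlimited"))
```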
Chain Handoff via handoff.json
Commands produce structured handoff.json files that downstream commands consume automatically:
/autoresearch:predict --chain debug,fix,ship
This runs predict → passes findings to debug → passes root causes to fix → passes changes to ship. Zero manual context transfer between commands.
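To make the flow concrete, here is a hedged sketch of how a --chain value expands into a run order and what a handoff record might contain. The handoff.json schema is not documented in these notes, so the field names `from`, `to`, and `findings` are purely hypothetical, as are the helper names.

```python
import json

def parse_chain(first: str, chain: str) -> list[str]:
    """Expand a --chain value such as 'debug,fix,ship' into the full run order."""
    return [first] + [name.strip() for name in chain.split(",") if name.strip()]

def make_handoff(source: str, target: str, findings: list[str]) -> str:
    """Serialize a handoff record. All field names here are hypothetical."""
    return json.dumps({"from": source, "to": target, "findings": findings}, indent=2)

order = parse_chain("predict", "debug,fix,ship")
# Each adjacent pair is one handoff: predict->debug, debug->fix, fix->ship.
pairs = list(zip(order, order[1:]))

for source, target in pairs:
    print(make_handoff(source, target, findings=[]))
```

A chain of three targets therefore produces three handoffs, one per adjacent pair of commands.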
Universal Flags (All Commands)
| Flag | Purpose |
|---|---|
| Iterations: N | Override default iteration count |
| Iterations: unlimited | Remove iteration cap |
| --evals | Run evals checkpoint at floor(N/3) |
| --evals-interval N | Custom evals checkpoint interval |
| --chain <targets> | Chain to downstream command(s) on completion |
Multi-Platform Support
All three platforms supported from a single canonical source:
| Platform | Syntax | Distribution |
|---|---|---|
| Claude Code | /autoresearch:debug | .claude/commands/ + .claude/skills/ |
| OpenCode | /autoresearch_debug | .opencode/commands/ + .opencode/skills/ |
| Codex | $autoresearch debug | plugins/autoresearch/ + .agents/skills/ |
Single transform script: scripts/transform.sh replaces the old sync-opencode.sh + sync-codex.sh pair. One command generates all platform distributions.
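The per-platform syntax in the table follows a simple renaming pattern, sketched below in Python. This is an illustration of the naming convention only, not how scripts/transform.sh works internally; the function name and the handling of the bare /autoresearch command on OpenCode and Codex are assumptions.

```python
def platform_syntax(command: str) -> dict[str, str]:
    """Map a canonical command like '/autoresearch:debug' to each platform's form."""
    name, _, sub = command.lstrip("/").partition(":")
    if not sub:
        # Bare command: the OpenCode and Codex forms here are assumed.
        return {"claude-code": f"/{name}", "opencode": f"/{name}", "codex": f"${name}"}
    return {
        "claude-code": f"/{name}:{sub}",   # colon separator
        "opencode": f"/{name}_{sub}",      # underscore separator
        "codex": f"${name} {sub}",         # dollar prefix, space separator
    }

for platform, syntax in platform_syntax("/autoresearch:debug").items():
    print(platform, syntax)
```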
Documentation — Completely Rewritten
Every documentation file has been rewritten from scratch for v2.1.0:
README.md (rewritten)
- v2.1.0 badge, 12-command table with defaults
- Updated architecture tree, FAQ, install instructions
- Evals section, bounded defaults explanation
docs/ (7 files — all rewritten)
- system-architecture.md: Mermaid component + data flow diagrams for v2.1.0 modular architecture
- codebase-summary.md: updated file inventory, key decisions table
- code-standards.md: self-contained command file pattern, naming conventions
- project-overview-pdr.md: product requirements for v2.1.0
- development-roadmap.md: historical milestones through v2.1.0
- changelog.md: full version history
- project-changelog.md: detailed change log for v2.1.0
guide/ (18 files — all rewritten or new)
- README.md: v2.1.0 guide index with 12 commands
- getting-started.md: all 3 platforms, bounded defaults, chain handoff
- Individual command guides (12 files): each rewritten with flags tables, examples, chain patterns
- autoresearch-evals.md: brand new guide for the evals command
- chains-and-combinations.md: handoff.json protocol, all 12 commands
- examples-by-domain.md: 13 domains with v2.1.0 syntax
- advanced-patterns.md: guards, CI/CD, MCP, transform.sh
- autoresearch-codex.md: v2.1.0 Codex distribution (no more JSON spec)
Other docs (rewritten)
- CONTRIBUTING.md: v2.1.0 repo structure, new command pattern, transform.sh
- COMPARISON.md: 12 commands, evals feature, updated architecture comparison
Files Removed
| File | Reason |
|---|---|
| plugins/autoresearch/resources/autoresearch-command-spec.json | Command contracts now live in individual command files |
| plugins/autoresearch/scripts/autoresearch_cli.py | Python wrapper CLI no longer needed |
| plugins/autoresearch/scripts/install_local_plugin.py | Replaced by scripts/install.sh |
| scripts/sync-opencode.sh | Replaced by scripts/transform.sh |
| scripts/sync-codex.sh | Replaced by scripts/transform.sh |
| 13 old reference files in claude-plugin/ | Replaced by 3 focused reference files |
Reference Files: 13 → 3
The old 13 workflow reference files (autonomous-loop-protocol, core-principles, debug-workflow, fix-workflow, learn-workflow, plan-workflow, predict-workflow, probe-workflow, reason-workflow, results-logging, scenario-workflow, security-workflow, ship-workflow) have been replaced by 3 focused files that are only loaded when their specific command needs them:
| File | Used by |
|---|---|
| references/security-checklist.md | /autoresearch:security |
| references/predict-personas.md | /autoresearch:predict |
| references/reason-judge-protocol.md | /autoresearch:reason |
All other protocol is embedded directly in the self-contained command files — no external loading needed.
Install
npx skills add uditgoenka/autoresearch
Or install manually via scripts/install.sh.
Migration from v2.0.x
No action needed — the plugin system handles the update automatically. Your existing TSV result files are backward compatible with the new evals command.
If you have custom scripts referencing old files (autoresearch-command-spec.json, sync-opencode.sh, sync-codex.sh), update them to use scripts/transform.sh and the individual command files.