v2.1.0 — Complete Rebuild: Modular Architecture

Everything in this release has been rewritten from scratch for v2.1.0: the codebase, the documentation, the guides, and the distribution system.

Why a Complete Rewrite?

The v2.0.x monolithic SKILL.md (813 lines, ~100K tokens) was loaded on every single invocation, even for a quick /autoresearch:plan. This was unsustainable:

Token waste: ~100K tokens per invocation regardless of which command you used
Slow cold starts: LLMs had to parse 813 lines before doing anything useful
Tangled protocols: 13 reference files with overlapping, hard-to-maintain instructions
Fragile distribution: Separate sync-opencode.sh and sync-codex.sh scripts that frequently drifted out of sync with each other

v2.1.0 solves all of this with a ground-up architectural rebuild.

Architecture: Monolith → Modular

Before (v2.0.x)

SKILL.md (813 lines, ~100K tokens) ← loaded EVERY invocation
├── 13 reference files (overlapping workflows)
├── autoresearch-command-spec.json (JSON command registry)
├── autoresearch_cli.py (Python wrapper)
├── sync-opencode.sh (manual sync)
└── sync-codex.sh (manual sync)

After (v2.1.0)

SKILL.md (41 lines, ~2K tokens) ← thin routing table only
├── 12 self-contained command files (~100 lines each, ~5-8K tokens)
├── 3 focused reference files (loaded only when needed)
└── scripts/transform.sh (single multi-platform transform)

Result: ~95% token reduction per invocation. Only the command you invoke gets loaded. A /autoresearch:plan call loads ~5K tokens instead of ~100K.
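
The ~95% figure follows directly from the approximate token counts quoted above. The sketch below reproduces the arithmetic. The second calculation assumes the ~2K-token routing table is also counted against each invocation, which these notes do not state explicitly.

```python
# Approximate token counts quoted in these release notes.
OLD_TOKENS_PER_CALL = 100_000  # v2.0.x monolithic SKILL.md
NEW_COMMAND_TOKENS = 5_000     # e.g. /autoresearch:plan in v2.1.0
NEW_ROUTER_TOKENS = 2_000      # v2.1.0 SKILL.md routing table


def reduction_percent(old: int, new: int) -> float:
    """Return the percentage saved when going from `old` to `new` tokens."""
    return (old - new) / old * 100


# Command file only, as in the "~5K instead of ~100K" comparison.
command_only = reduction_percent(OLD_TOKENS_PER_CALL, NEW_COMMAND_TOKENS)

# Assumption: the routing table is loaded alongside the command file.
with_router = reduction_percent(
    OLD_TOKENS_PER_CALL, NEW_COMMAND_TOKENS + NEW_ROUTER_TOKENS
)

print(f"command file only: {command_only:.0f}%")
print(f"command file + routing table: {with_router:.0f}%")
```

Under either reading, the saving stays above 90% for a command at the low end of the 5-8K range.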

12 Commands (was 11 — evals is new)

Every command has been rewritten from scratch with bounded defaults, universal flags, and chain handoff support.

Command	Type	Default	Purpose
`/autoresearch`	Loop	25 iterations	Core metric-driven improvement loop
`/autoresearch:plan`	One-shot	—	Goal → config wizard (generates loop config)
`/autoresearch:debug`	Loop	15 iterations	Hypothesis-driven bug investigation
`/autoresearch:fix`	Loop	20 iterations	Error-count reduction (error → zero)
`/autoresearch:security`	Loop	15 iterations	STRIDE + OWASP security audit with auto-fix
`/autoresearch:ship`	Linear	8 phases	Pre-merge quality pipeline
`/autoresearch:scenario`	Loop	20 iterations	12-dimension edge case exploration
`/autoresearch:predict`	One-shot	—	5-persona swarm analysis + debate
`/autoresearch:learn`	Loop	10 iterations	Autonomous documentation engine
`/autoresearch:reason`	Loop	8 iterations	Adversarial refinement (judge + critic)
`/autoresearch:probe`	Loop	15 rounds	Requirement interrogation (8 personas)
`/autoresearch:evals`	One-shot (NEW)	—	TSV analysis: trends, plateaus, anomalies

New: `/autoresearch:evals`

One-shot analysis of any *-results.tsv file. Dynamically detects TSV columns, identifies trends, plateaus, velocity changes, regressions, and diminishing returns. Adaptive checkpoints at floor(max_iterations/3) provide mid-loop feedback during long runs. Backward compatible with v2.0.x TSV format.

Can also be invoked inline on any looping command via --evals or --evals-interval N flags.
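
As a rough illustration of the checkpoint arithmetic, the sketch below computes where checkpoints would land. It assumes floor(N/3) is used as a recurring interval, in the same way as `--evals-interval N`. The notes could also be read as a single checkpoint, so treat this as one interpretation. The function name `evals_checkpoints` is hypothetical.

```python
from typing import Optional


def evals_checkpoints(max_iterations: int, interval: Optional[int] = None) -> list[int]:
    """Return the iteration numbers at which an evals checkpoint would run.

    With no explicit interval, the step defaults to floor(max_iterations / 3).
    Assumption: the step repeats until the iteration cap is reached.
    """
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    step = interval if interval is not None else max_iterations // 3
    step = max(step, 1)  # guard against a zero step for very short runs
    return list(range(step, max_iterations + 1, step))


print(evals_checkpoints(25))              # core loop default of 25 iterations
print(evals_checkpoints(20, interval=5))  # custom interval
```

For the core loop's default of 25 iterations, floor(25/3) is 8, so under this reading checkpoints fall at iterations 8, 16, and 24.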

Bounded Defaults

Every looping command now has a sensible default iteration count. No more "runs forever unless you stop it":

Iterations: N — explicit cap
Iterations: unlimited — opt-in infinite mode (you must ask for it)
Default values are tuned per command: core loop (25) needs more iterations than reason (8)
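
The rules above amount to a small lookup with an override. The following sketch shows one way to express them. The defaults are copied from the command table. The function name, the regular expression, and the choice of `None` to mean "unlimited" are assumptions made for illustration.

```python
import re
from typing import Optional

# Per-command defaults, taken from the command table in these notes.
DEFAULT_ITERATIONS = {
    "autoresearch": 25,
    "debug": 15,
    "fix": 20,
    "security": 15,
    "scenario": 20,
    "learn": 10,
    "reason": 8,
    "probe": 15,  # listed as "15 rounds"
}

# Matches "Iterations: 12" or "Iterations: unlimited" anywhere in a prompt.
_DIRECTIVE = re.compile(r"Iterations:\s*(unlimited|\d+)", re.IGNORECASE)


def resolve_iterations(command: str, prompt: str) -> Optional[int]:
    """Return the iteration cap for a looping command.

    Returns None for the opt-in unlimited mode, and falls back to the
    command's default when the prompt carries no Iterations directive.
    """
    match = _DIRECTIVE.search(prompt)
    if match is None:
        return DEFAULT_ITERATIONS[command]
    value = match.group(1).lower()
    return None if value == "unlimited" else int(value)


print(resolve_iterations("reason", "tighten this argument"))
print(resolve_iterations("fix", "Iterations: 5"))
print(resolve_iterations("autoresearch", "Iterations: unlimited"))
```

Returning `None` only on an explicit request mirrors the opt-in nature of infinite mode: a missing directive can never produce an unbounded run.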

Chain Handoff via `handoff.json`

Commands produce structured handoff.json files that downstream commands consume automatically:

/autoresearch:predict --chain debug,fix,ship

This runs predict → passes findings to debug → passes root causes to fix → passes changes to ship. Zero manual context transfer between commands.
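
The handoff.json schema is not spelled out in these notes, so the sketch below is purely illustrative. It parses the `--chain` targets into an ordered pipeline and writes a handoff file whose fields (`from`, `to`, `findings`) are hypothetical, not the real format.

```python
import json
import tempfile
from pathlib import Path


def parse_chain(invocation: str) -> list[str]:
    """Return the ordered stages for an invocation such as
    '/autoresearch:predict --chain debug,fix,ship'."""
    tokens = invocation.split()
    # '/autoresearch:predict' -> 'predict'; bare '/autoresearch' -> 'autoresearch'
    first = tokens[0].removeprefix("/autoresearch").lstrip(":") or "autoresearch"
    stages = [first]
    if "--chain" in tokens:
        position = tokens.index("--chain") + 1
        if position < len(tokens):
            stages += [t for t in tokens[position].split(",") if t]
    return stages


def write_handoff(directory: Path, source: str, target: str, findings: list[str]) -> Path:
    """Write a handoff.json with hypothetical fields and return its path."""
    payload = {"from": source, "to": target, "findings": findings}
    path = directory / "handoff.json"
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path


stages = parse_chain("/autoresearch:predict --chain debug,fix,ship")
with tempfile.TemporaryDirectory() as tmp:
    written = write_handoff(Path(tmp), stages[0], stages[1], ["example finding"])
    handoff = json.loads(written.read_text(encoding="utf-8"))

print(stages)
print(handoff["to"])
```

Each stage would read the file left by its predecessor, which is what removes the manual context transfer between commands.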

Universal Flags (All Commands)

Flag	Purpose
`Iterations: N`	Override default iteration count
`Iterations: unlimited`	Remove iteration cap
`--evals`	Run evals checkpoint at `floor(N/3)`
`--evals-interval N`	Custom evals checkpoint interval
`--chain <targets>`	Chain to downstream command(s) on completion

Multi-Platform Support

All three platforms are supported from a single canonical source:

Platform	Syntax	Distribution
Claude Code	`/autoresearch:debug`	`.claude/commands/` + `.claude/skills/`
OpenCode	`/autoresearch_debug`	`.opencode/commands/` + `.opencode/skills/`
Codex	`$autoresearch debug`	`plugins/autoresearch/` + `.agents/skills/`

Single transform script: scripts/transform.sh replaces the old sync-opencode.sh + sync-codex.sh pair. One command generates all platform distributions.
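
The syntax column in the table follows a simple pattern, which the Python sketch below illustrates. This is not scripts/transform.sh itself, only a model of the naming rule. How the bare `/autoresearch` command is spelled on OpenCode and Codex is an assumption, since the table only shows the `debug` subcommand.

```python
def platform_syntax(command: str) -> dict[str, str]:
    """Map a canonical Claude Code command to each platform's syntax."""
    if not command.startswith("/autoresearch"):
        raise ValueError(f"not an autoresearch command: {command!r}")
    _, _, sub = command.partition(":")
    if not sub:
        # Assumption: the core loop keeps its bare name on every platform.
        return {
            "claude-code": command,
            "opencode": "/autoresearch",
            "codex": "$autoresearch",
        }
    return {
        "claude-code": command,
        "opencode": f"/autoresearch_{sub}",  # colon becomes an underscore
        "codex": f"$autoresearch {sub}",     # dollar prefix, space-separated
    }


for platform, syntax in platform_syntax("/autoresearch:debug").items():
    print(f"{platform}: {syntax}")
```

Deriving every platform's spelling from one canonical name is what keeps the three distributions from drifting apart.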

Documentation — Completely Rewritten

Every documentation file has been rewritten from scratch for v2.1.0:

README.md (rewritten)

v2.1.0 badge, 12-command table with defaults
Updated architecture tree, FAQ, install instructions
Evals section, bounded defaults explanation

docs/ (7 files — all rewritten)

system-architecture.md — Mermaid component + data flow diagrams for v2.1.0 modular architecture
codebase-summary.md — updated file inventory, key decisions table
code-standards.md — self-contained command file pattern, naming conventions
project-overview-pdr.md — product requirements for v2.1.0
development-roadmap.md — historical milestones through v2.1.0
changelog.md — full version history
project-changelog.md — detailed change log for v2.1.0

guide/ (18 files — all rewritten or new)

README.md — v2.1.0 guide index with 12 commands
getting-started.md — all 3 platforms, bounded defaults, chain handoff
Individual command guides (12 files) — each rewritten with flags tables, examples, chain patterns
autoresearch-evals.md — brand new guide for the evals command
chains-and-combinations.md — handoff.json protocol, all 12 commands
examples-by-domain.md — 13 domains with v2.1.0 syntax
advanced-patterns.md — guards, CI/CD, MCP, transform.sh
autoresearch-codex.md — v2.1.0 Codex distribution (no more JSON spec)

Other docs (rewritten)

CONTRIBUTING.md — v2.1.0 repo structure, new command pattern, transform.sh
COMPARISON.md — 12 commands, evals feature, updated architecture comparison

Files Removed

File	Reason
`plugins/autoresearch/resources/autoresearch-command-spec.json`	Command contracts now live in individual command files
`plugins/autoresearch/scripts/autoresearch_cli.py`	Python wrapper CLI no longer needed
`plugins/autoresearch/scripts/install_local_plugin.py`	Replaced by `scripts/install.sh`
`scripts/sync-opencode.sh`	Replaced by `scripts/transform.sh`
`scripts/sync-codex.sh`	Replaced by `scripts/transform.sh`
13 old reference files in `claude-plugin/`	Replaced by 3 focused reference files

Reference Files: 13 → 3

The old 13 workflow reference files (autonomous-loop-protocol, core-principles, debug-workflow, fix-workflow, learn-workflow, plan-workflow, predict-workflow, probe-workflow, reason-workflow, results-logging, scenario-workflow, security-workflow, ship-workflow) have been replaced by 3 focused files that are only loaded when their specific command needs them:

File	Used by
`references/security-checklist.md`	`/autoresearch:security`
`references/predict-personas.md`	`/autoresearch:predict`
`references/reason-judge-protocol.md`	`/autoresearch:reason`

All other protocols are embedded directly in the self-contained command files, so no external loading is needed.
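
The load-on-demand behavior can be pictured as a lookup from command to files. In this sketch the reference mapping comes from the table above, while the `commands/<name>.md` path layout and the function name are hypothetical.

```python
# Reference files and the commands that use them, from the table above.
REFERENCES = {
    "security": "references/security-checklist.md",
    "predict": "references/predict-personas.md",
    "reason": "references/reason-judge-protocol.md",
}


def files_to_load(command: str) -> list[str]:
    """Return the files a command would need under the modular layout.

    Assumption: each command lives at commands/<name>.md.
    """
    files = [f"commands/{command}.md"]
    reference = REFERENCES.get(command)
    if reference is not None:
        files.append(reference)
    return files


print(files_to_load("security"))
print(files_to_load("plan"))
```

Nine of the twelve commands have no entry in the mapping and therefore load only their own file.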

Install

npx skills add uditgoenka/autoresearch

Or install manually via scripts/install.sh.

Migration from v2.0.x

No action needed — the plugin system handles the update automatically. Your existing TSV result files are backward compatible with the new evals command.

If you have custom scripts referencing old files (autoresearch-command-spec.json, sync-opencode.sh, sync-codex.sh), update them to use scripts/transform.sh and the individual command files.
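
One way to find such references is a quick scan of your own scripts directory. The sketch below is a hypothetical helper, not part of the release. It looks for the three file names mentioned above and reports the file, line number, and name for each hit.

```python
import tempfile
from pathlib import Path

# File names called out in the migration note above.
REMOVED = ("autoresearch-command-spec.json", "sync-opencode.sh", "sync-codex.sh")


def find_stale_references(root: Path) -> list[tuple[str, int, str]]:
    """Return (file name, line number, removed name) for every match under root."""
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(text.splitlines(), start=1):
            for name in REMOVED:
                if name in line:
                    hits.append((path.name, number, name))
    return hits


# Demonstration against a throwaway directory with one stale script.
with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "ci.sh"
    script.write_text("#!/bin/sh\nbash scripts/sync-codex.sh\n", encoding="utf-8")
    stale = find_stale_references(Path(tmp))

print(stale)
```

Point `find_stale_references` at the directory holding your custom scripts. An empty list means nothing needs updating.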

Full Changelog

v2.0.04...v2.1.0
