github aiming-lab/AutoResearchClaw v0.2.0
v0.2.0 — Multi-Agent Pipeline, Docker Sandbox & Quality Hardening

3 months ago

Highlights

This release introduces three multi-agent subsystems, a hardened Docker sandbox, and four rounds of paper quality auditing, all aimed at improving the end-to-end quality of generated research papers.

New Multi-Agent Subsystems

CodeAgent (4-phase architecture)

  • LLM generates multi-file experiment code (main.py + setup.py + requirements.txt)
  • Static analysis & deep validation (AST-based class/method checks)
  • LLM-guided code review with structured JSON feedback
  • Iterative repair loop (up to 3 rounds) with automatic UnboundLocalError fix

BenchmarkAgent (4 sub-agents: Surveyor → Selector → Acquirer → Validator)

  • Domain-aware dataset and baseline selection from 13-domain knowledge base
  • Automatic benchmark acquisition with Docker compatibility validation
  • Integrated at Stage 9 (experiment_design), output injected into Stage 10

FigureAgent (5 sub-agents: Planner → CodeGen → Renderer → Critic → Integrator)

  • Academic-quality chart generation with SciencePlots, 300 DPI, colorblind-safe palette
  • 6 built-in chart templates + LLM fallback for custom visualizations
  • Tri-modal critic review (data accuracy, aesthetics, academic convention)

Docker Sandbox Enhancements

  • Network-policy-aware code generation: none | setup_only | pip_only | full
  • Dynamic dependency installation via requirements.txt
  • Pre-cached datasets: CIFAR-10/100, MNIST, FashionMNIST, STL-10, SVHN
  • Extended ML stack: torch, torchvision, timm, einops, transformers, etc.
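To make the four network policies concrete, here is a minimal sketch of how a policy might translate into `docker run` arguments per execution phase. The phase names (`pip`, `setup`, `run`), the mapping of policies to phases, and the function `docker_network_args` are all assumptions for illustration; the release notes name the policies but do not define their semantics. The only real Docker feature used is the `--network none` flag.

```python
from enum import Enum


class NetworkPolicy(str, Enum):
    NONE = "none"
    SETUP_ONLY = "setup_only"
    PIP_ONLY = "pip_only"
    FULL = "full"


PHASES = ("pip", "setup", "run")

# ASSUMED semantics (not confirmed by the release notes):
# which phases are allowed to reach the network under each policy.
_ALLOWED_PHASES = {
    NetworkPolicy.NONE: set(),
    NetworkPolicy.PIP_ONLY: {"pip"},
    NetworkPolicy.SETUP_ONLY: {"pip", "setup"},
    NetworkPolicy.FULL: {"pip", "setup", "run"},
}


def docker_network_args(policy: str, phase: str) -> list[str]:
    """Return extra `docker run` arguments for the given policy and phase."""
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase!r}")
    allowed = _ALLOWED_PHASES[NetworkPolicy(policy)]
    # An empty list keeps Docker's default networking; otherwise disable it.
    return [] if phase in allowed else ["--network", "none"]
```

Under these assumptions, `docker_network_args("pip_only", "run")` disables networking for the experiment itself, which is why generated code would need to rely on the pre-cached datasets rather than downloading them at run time.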

Paper Quality Hardening (4-round audit)

  • Post-compilation quality checks, weasel/duplicate word lint
  • 7-dimension AI-Scientist-style review scoring
  • AI-slop detection (50+ phrases), statistical rigor validator
  • Cross-discipline support for 7 research domains (ML/physics/chem/econ/math/eng/bio)
  • NeurIPS checklist integration
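As an illustration of the weasel/duplicate word lint, a minimal sketch might scan each line for repeated adjacent words and for entries from a word list. The word list below is a small illustrative sample and `lint_prose` is a hypothetical name; neither is taken from the project.

```python
import re

# Illustrative sample only; the project's actual word list is not shown in these notes.
WEASEL_WORDS = {"very", "fairly", "quite", "significantly", "clearly", "various"}


def lint_prose(text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, kind, word) findings for duplicate and weasel words."""
    findings: list[tuple[int, str, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        words = re.findall(r"[A-Za-z']+", line)
        # Adjacent repeated words such as "the the" (case-insensitive).
        for prev, cur in zip(words, words[1:]):
            if prev.lower() == cur.lower():
                findings.append((lineno, "duplicate", cur.lower()))
        # Words from the weasel list.
        for word in words:
            if word.lower() in WEASEL_WORDS:
                findings.append((lineno, "weasel", word.lower()))
    return findings
```

This sketch works line by line, so a duplicate that straddles a line break is not caught; a fuller lint would likely tokenize the whole paragraph.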

Bug Fixes (15+)

  • Fix baselines dict-to-list crash in BenchmarkAgent
  • Fix Gymnasium environment versions (v4 → v5)
  • Fix experiment condition drift in iterative refinement (anchor to exp_plan.yaml)
  • Fix compute budget constraint for experiment design
  • Fix metric direction mismatch, citation verification batching
  • Fix LaTeX output sanitization, figure plan format handling
  • Add RL stability guidance (gradient clipping, NaN guard)
  • And more — see full commit message for details
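The metric direction fix above concerns comparing results when some metrics should be maximized (accuracy) and others minimized (loss). A minimal sketch of a direction-aware comparison follows; the names `is_better` and `best_run` and the `"maximize"`/`"minimize"` labels are hypothetical and not taken from the project.

```python
def is_better(candidate: float, incumbent: float, direction: str) -> bool:
    """Compare two metric values according to an explicit direction."""
    if direction == "maximize":
        return candidate > incumbent
    if direction == "minimize":
        return candidate < incumbent
    raise ValueError(f"unknown direction: {direction!r}")


def best_run(results: dict[str, float], direction: str) -> str:
    """Return the name of the best run; `results` maps run name to metric value."""
    if not results:
        raise ValueError("results is empty")
    names = iter(results)
    best = next(names)
    for name in names:
        if is_better(results[name], results[best], direction):
            best = name
    return best
```

Passing the direction explicitly, instead of inferring it from the metric name, avoids silently ranking a lower-is-better metric as if higher were better.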

Compatibility

All changes are backward-compatible with v0.1.0 configuration files.

Full Changelog: v0.1.0...v0.2.0
