AutoResearchClaw v0.5.0
Highlights
- Multi-Domain Architecture: Expanded beyond ML to support high-energy physics (HEP), biology, quantum computing, and statistics domains with profile-driven deployment
- ARC-Bench Evaluation Framework: Standardized benchmark suite with 50+ topics across 5 domains (ML01-ML25, P01-P10, Q01-Q10, B01-B07, S01-S03), rubric-based judging, and baseline adapters for AIDE, Agent Laboratory, and AI-Scientist-v2
- ColliderAgent Integration: Full HEP simulation pipeline support (MadGraph → Pythia → Delphes) with incremental experiment mode and Stage-12 re-entry
- Biology-Agent Integration: Metabolic modeling with COBRApy/Biopython skills, flux balance analysis (FBA) simulation, and genome-scale metabolic model (GSMM) validation
- Quantum-Qiskit Skill: Qiskit-based experiment support for quantum computing topics
- Statistics Domain Agent: Statistical method design, experiment evaluation, and theory analysis
- Requirements Gate: LLM capability validation before pipeline execution
- Profile-Driven Deployment: Interactive CLI for domain profile creation and management
- Incremental Experiment Mode: Resume experiments at Stage-12 with delta-prompt assembly
- Expanded Test Suite: New tests for HEP prompt hygiene, incremental experiments, and domain integrations
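
For reference, the ARC-Bench ID ranges listed above can be expanded into a flat list of topic IDs. This is a minimal sketch, not part of the ARC-Bench tooling: `TOPIC_RANGES` and `expand_topic_ids` are hypothetical names, and it assumes each range is contiguous, starts at 01, and uses two-digit zero padding as written in the notes.

```python
# Hypothetical helper, not part of ARC-Bench: expands the ID ranges
# quoted in the release notes into individual topic IDs.
# Assumption: every range is contiguous and starts at 01.
TOPIC_RANGES = {"ML": 25, "P": 10, "Q": 10, "B": 7, "S": 3}


def expand_topic_ids(ranges):
    """Return IDs such as 'ML01' ... 'S03' in the order the prefixes are given."""
    return [
        f"{prefix}{number:02d}"
        for prefix, last in ranges.items()
        for number in range(1, last + 1)
    ]


topic_ids = expand_topic_ids(TOPIC_RANGES)
```

Under those assumptions the ranges add up to 25 + 10 + 10 + 7 + 3 = 55 topics, which is consistent with the "50+" figure.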
Breaking Changes
- Topic IDs renamed: T01-T25 → ML01-ML25 in ARC-Bench
- researchclaw/prompts.py refactored into researchclaw/prompts/ package (domain-aware prompt banks)
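
Configs, scripts, or result files that still use the old topic IDs need updating after the rename. The sketch below shows one way to rewrite them. `migrate_topic_id` is a hypothetical helper, not something shipped with AutoResearchClaw, and it assumes legacy IDs always appear as `T` followed by exactly two digits.

```python
import re

# Hypothetical migration helper, not part of AutoResearchClaw.
# Assumption: a legacy ID is "T" plus exactly two digits, not preceded by
# a letter or digit and not followed by another digit.
_LEGACY_ID = re.compile(r"(?<![A-Za-z0-9])T(\d{2})(?![0-9])")


def migrate_topic_id(text: str) -> str:
    """Rewrite legacy T01-T25 IDs to ML01-ML25 and leave all other text as-is."""

    def _swap(match):
        number = int(match.group(1))
        if 1 <= number <= 25:
            return f"ML{number:02d}"
        # Outside the renamed range (e.g. T00, T26): keep the original text.
        return match.group(0)

    return _LEGACY_ID.sub(_swap, text)
```

For example, `migrate_topic_id("results/T07_run")` yields `"results/ML07_run"`, while strings such as `"T26"` or `"ML01"` come back unchanged. Review the output before committing it, since unrelated tokens that happen to match the pattern would also be rewritten.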
Documentation
- Domain Integration Guide for adding new scientific domains
- Tester guides in English, Chinese, and Japanese
- ARC-Bench experiment design docs and run guides
- Showcase papers demonstrating pipeline outputs
Full Changelog: v0.4.0...v0.5.0