What new
- Reproducible benchmark system: benchmarks/run.py call Claude API, measure real output token counts normal vs caveman, auto-update README table. No more fake numbers.
- Real benchmark data: 10 coding prompts, actual API measurements. Average 65% token savings (range 22%–87%).
- Codex plugin support — caveman now work in OpenAI Codex too.
- Contributing guide + issue templates for bug reports and feature requests.
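How savings number can be computed, as minimal sketch. Assumption: savings for one prompt is the drop in output tokens relative to normal mode, and the average is the mean of per-prompt percentages. The release notes do not say which formula run.py uses, so the function name `token_savings` and the sample counts are hypothetical, for illustration only.

```python
from statistics import mean


def token_savings(normal_tokens: int, caveman_tokens: int) -> float:
    """Percent of output tokens saved by caveman mode versus normal mode."""
    if normal_tokens <= 0:
        raise ValueError("normal_tokens must be positive")
    return 100.0 * (normal_tokens - caveman_tokens) / normal_tokens


# Made-up (normal, caveman) output token counts. NOT real benchmark results.
samples = [(200, 100), (400, 100), (1000, 250)]

# One savings percentage per prompt, then average and range across prompts.
per_prompt = [token_savings(n, c) for n, c in samples]
summary = {
    "average": mean(per_prompt),
    "min": min(per_prompt),
    "max": max(per_prompt),
}
print(summary)
```

Mean of per-prompt percentages weights every prompt the same. Dividing total saved tokens by total normal tokens would weight long answers more, and can give a different number.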
Run benchmarks yourself
cd benchmarks
pip install -r requirements.txt
python run.py --dry-run # preview, no API calls
python run.py --update-readme # run + update README table
Full Changelog: v1.0.0...v1.1.0