This release improves file editing reliability, adds session exit keywords, and fixes several issues with sub-sessions and evaluation handling.
What's New
- Adds support for "exit", "quit", and ":q" keywords to quit sessions immediately
- Adds per-eval Docker image override via evals.image property in evaluation configurations
- Adds run instructions to creator agent prompt for proper agent execution guidance
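
As a rough illustration of the exit keywords above, the check can be sketched in Go as follows. The function name isExitKeyword and the whitespace trimming are assumptions made for this example, not the project's actual implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// isExitKeyword reports whether the input is one of the exit keywords
// listed in the release notes. The name and the trimming of surrounding
// whitespace are assumptions of this sketch.
func isExitKeyword(input string) bool {
	switch strings.TrimSpace(input) {
	case "exit", "quit", ":q":
		return true
	default:
		return false
	}
}

func main() {
	for _, in := range []string{"exit", "quit", ":q", "exit now"} {
		fmt.Printf("%q -> %v\n", in, isExitKeyword(in))
	}
}
```

Only an exact keyword match ends the session in this sketch, so a message that merely starts with "exit" would still be treated as a normal prompt.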
Bug Fixes
- Fixes handling of a double-serialized edits argument in the edit_file tool, which occurs when an LLM sends a JSON string instead of an array
- Fixes sub-session thinking state being incorrectly derived from parent session instead of child agent
- Fixes --sandbox flag when running in CLI plugin mode
- Fixes cross-model Gemini function calls by using a dummy thought_signature
- Fixes SessionFromEvents to use event timestamps for user messages, preventing incorrect duration calculations
Improvements
- Displays breakdown of failure types in evaluation summary for better debugging
- Declines elicitations in run --exec --json mode
- Validates path field consistently in edit file operations
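
To give a sense of what a failure breakdown in an evaluation summary might look like, here is a minimal Go sketch that counts failures per type and formats them in a stable order. The failure type names and both function names are invented for the example and do not reflect the project's real categories or output format.

```go
package main

import (
	"fmt"
	"sort"
)

// failureBreakdown counts how many failures fall under each type.
// The type labels passed in are hypothetical.
func failureBreakdown(failures []string) map[string]int {
	counts := make(map[string]int)
	for _, kind := range failures {
		counts[kind]++
	}
	return counts
}

// formatBreakdown renders one "type: count" line per failure type,
// sorted by type name so the summary is deterministic.
func formatBreakdown(counts map[string]int) []string {
	kinds := make([]string, 0, len(counts))
	for kind := range counts {
		kinds = append(kinds, kind)
	}
	sort.Strings(kinds)

	lines := make([]string, 0, len(kinds))
	for _, kind := range kinds {
		lines = append(lines, fmt.Sprintf("%s: %d", kind, counts[kind]))
	}
	return lines
}

func main() {
	failures := []string{"timeout", "wrong_output", "timeout"}
	for _, line := range formatBreakdown(failureBreakdown(failures)) {
		fmt.Println(line)
	}
}
```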
Technical Changes
- Removes unused fileWriteTracker from creator package
- Simplifies UnmarshalJSON implementation for better path validation
- Updates evaluation image build cache to handle different images per working directory
What's Changed
- docs: update CHANGELOG.md for v1.32.5 by @docker-read-write[bot] in #2147
- Better rendering in tmux and ghostty by @dgageot in #2146
- Fix --sandbox when running cli plugin mode by @gtardif in #2151
- Display breakdown of types of failures in eval summary by @gtardif in #2150
- feat: support "exit" as a keyword to quit the session by @trungutt in #2152
- Add per-eval Docker image override via evals.image property by @dgageot in #2153
- Add run instructions to creator agent prompt by @dgageot in #2154
- Decline elicitations in run --exec --json mode by @dgageot in #2156
- Remove unused fileWriteTracker from creator package by @dgageot in #2157
- fix: use dummy thought_signature for cross-model Gemini function calls by @dgageot in #2155
- fix: sub-session thinking state derived from child agent, not parent session by @dgageot in #2149
- fix: handle double-serialized edits argument in edit_file tool by @trungutt in #2144
- fix: use event timestamps for user messages in SessionFromEvents by @dgageot in #2158
Full Changelog: v1.32.5...v1.33.0