GPT-5.5 Agents: Manual QA Gate, Stronger Investigation, Apply-Patch Fix
A focused tune-up to the GPT-5.5 prompts driving Sisyphus, Sisyphus-Junior, and Hephaestus. If you run any of those agents on a GPT-5.5 model, this release tightens their behavior in five ways you'll notice in practice.
Manual QA Gate: agents have to actually USE what they build
End-to-end delegations on GPT-5.5 (e.g., ulw, "implement and finish", "ship it") now route through a non-negotiable surface-to-tool mapping:
| Surface | Required tool |
|---|---|
| TUI / CLI | interactive_bash (tmux) |
| Web / browser | playwright |
| HTTP service | curl against the running service |
| Library / SDK | minimal driver script |
"Tests pass + lsp clean + build green" is no longer enough. Agents must drive the deliverable through the matching tool before declaring done. This closes the failure mode where a GPT-5.5 agent reports "implementation complete" without ever launching the binary or loading the page.
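As a rough illustration of the gate, the rule can be read as "static checks AND a run through the matching tool". The sketch below is hypothetical: the names `Surface`, `REQUIRED_TOOL`, `CompletionEvidence`, and `canDeclareDone` are invented for this example and do not come from the actual prompts, which express the rule in natural language rather than code.

```typescript
// Hypothetical sketch; none of these identifiers exist in the shipped prompts.
type Surface = "tui-cli" | "web" | "http-service" | "library";

// Surface-to-tool mapping, mirroring the table above.
const REQUIRED_TOOL: Record<Surface, string> = {
  "tui-cli": "interactive_bash",
  "web": "playwright",
  "http-service": "curl",
  "library": "driver-script",
};

interface CompletionEvidence {
  testsPass: boolean;
  lspClean: boolean;
  buildGreen: boolean;
  // Tools the agent actually ran against the finished deliverable.
  toolsUsed: string[];
}

// "Done" requires the static checks AND a manual run through the matching tool.
function canDeclareDone(surface: Surface, evidence: CompletionEvidence): boolean {
  const staticChecks =
    evidence.testsPass && evidence.lspClean && evidence.buildGreen;
  const manualQa = evidence.toolsUsed.includes(REQUIRED_TOOL[surface]);
  return staticChecks && manualQa;
}
```

Under this reading, a web deliverable with green tests, clean diagnostics, and a passing build still fails the gate until `playwright` appears in `toolsUsed`.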
Investigate-before-acting is no longer a soft phrase
Bumped from a one-liner to a dedicated block: never speculate about unread code, re-read on every task hand-off, and treat the worktree as potentially mutated by parallel agents. If you've seen GPT-5.5 reason about a file it didn't open, this is the fix.
Parallelize-aggressively is now a first-class behavior
Reads, searches, diagnostics, and background sub-agents are expected to be batched into a single response by default. Sequential tool calls for independent work now count as a violation rather than the norm.
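One way to picture "batch what is independent" is to group planned tool calls into waves, where every call in a wave depends only on calls from earlier waves. This is a minimal sketch under that assumption: `PlannedCall` and `planBatches` are hypothetical names, and the agents make this decision from prompt guidance, not from a scheduler like this.

```typescript
// Hypothetical sketch; the agents do not run this code.
interface PlannedCall {
  id: string;
  // Ids of calls whose results this call needs before it can start.
  dependsOn: string[];
}

// Groups calls into batches: each batch holds every call whose
// dependencies were all satisfied by earlier batches.
function planBatches(calls: PlannedCall[]): string[][] {
  const done = new Set<string>();
  const batches: string[][] = [];
  let remaining = [...calls];

  while (remaining.length > 0) {
    const ready = remaining.filter((call) =>
      call.dependsOn.every((dep) => done.has(dep))
    );
    if (ready.length === 0) {
      throw new Error("cyclic or unresolved dependency");
    }
    batches.push(ready.map((call) => call.id));
    ready.forEach((call) => done.add(call.id));
    remaining = remaining.filter((call) => !done.has(call.id));
  }
  return batches;
}
```

Two reads and a search with no dependencies land in the first batch together; an edit that needs all three results waits for the second batch.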
apply_patch ↔ permission contradiction resolved
Earlier prompts told GPT-5.5 to use apply_patch even though the platform-level permission denies that tool on GPT models. GPT_APPLY_PATCH_GUIDANCE now directs the agent to edit / write instead, removing the contradiction that was triggering tool-denial loops.
Hard invariants & dig-deeper trio restored
- Sisyphus now carries explicit hard-invariant blocks: no as any / @ts-ignore, no destructive git without confirmation, never deliver before Oracle returns.
- Sisyphus-Junior gains a review-tasks block plus a sensible default-behavior fallback when category context is missing or sparse.
- Hephaestus regains optional category delegation while keeping direct execution as the default.
- The dig-deeper trio (tool persistence / dig deeper / dependency checks) is split back into orthogonal paragraphs so each one carries its own cognitive trigger instead of being fused into a single sentence.
No config changes, no migration steps. Update and the new behavior takes effect on the next GPT-5.5 agent run.