The 3-Month Battle Against Complexity
TL;DR: For three months, Claude's instinct to add code instead of delete it caused the same bugs to recur. What should have been 5 lines of code became ~1000 lines, 11 useless methods, and 7+ failed "fixes." The timestamp corruption that finally broke things was just a symptom. The real achievement: 984 lines of code deleted.
What Actually Happened
Every Claude Code hook receives a session ID. That's all you need.
But Claude built an entire redundant session management system on top:
- An
sdk_sessionstable with status tracking, port assignment, and prompt counting - 11 methods in
SessionStoreto manage this artificial complexity - Auto-creation logic scattered across 3 locations
- A cleanup hook that "completed" sessions at the end
Why? Because it seemed "robust." Because "what if the session doesn't exist?"
But the edge cases didn't exist. Hooks ALWAYS provide session IDs. The "defensive" code was solving imaginary problems while creating real ones.
The Pattern of Failure
Every time a bug appeared, Claude's instinct was to ADD more code:
| Bug | What Claude Added | What Should Have Happened |
|---|---|---|
| Race conditions | Auto-create fallbacks | Delete the auto-create logic |
| Duplicate observations | Validation layers | Delete the code path allowing duplicates |
| UNIQUE constraint violations | Try-catch with fallbacks | Use INSERT OR IGNORE (5 characters)
|
| Session not found | Silent auto-creation | FAIL LOUDLY (it's a hook bug) |
The 7+ Failed Attempts
- Nov 4: "Always store session data regardless of pre-existence." Complexity planted.
- Nov 11:
INSERT OR IGNORErecognized. But complexity documented, not removed. - Nov 21: Duplicate observations bug. Fixed. Then broken again by endless mode.
- Dec 5: "6 hours of work delivered zero value." User requests self-audit.
- Dec 20: "Phase 2: Eliminated Race Conditions" — felt like progress. Complexity remained.
- Dec 24: Finally, forced deletion.
The user stated "hooks provide session IDs, no extra management needed" seven times across months. Claude didn't listen.
The Fix
Deleted (984 lines):
- 11
SessionStoremethods:incrementPromptCounter,getPromptCounter,setWorkerPort,getWorkerPort,markSessionCompleted,markSessionFailed,reactivateSession,findActiveSDKSession,findAnySDKSession,updateSDKSessionId - Auto-create logic from
storeObservationandstoreSummary - The entire cleanup hook (was aborting SDK agent and causing data loss)
- 117 lines from
worker-utils.ts
What remains (~10 lines):
createSDKSession(sessionId) {
db.run('INSERT OR IGNORE INTO sdk_sessions (...) VALUES (...)');
return db.query('SELECT id FROM sdk_sessions WHERE ...').get(sessionId);
}That's it.
Behavior Change
- Before: Missing session? Auto-create silently. Bug hidden.
- After: Missing session? Storage fails. Bug visible immediately.
New Tools
Since we're now explicit about recovery instead of silently papering over problems:
GET /api/pending-queue- See what's stuckPOST /api/pending-queue/process- Manually trigger recoverynpm run queue:check/npm run queue:process- CLI equivalents
Dependencies
- Upgraded
@anthropic-ai/claude-agent-sdkfrom^0.1.67to^0.1.76
The evidence: Observations #3646, #6738, #7598, #12860, #12866, #13046, #15259, #20995, #21055, #30524, #31080, #32114, #32116, #32125, #32126, #32127, #32146, #32324—the complete record of a 3-month battle.