github thedotmack/claude-mem v10.0.3

4 hours ago

Fix: Prevent chroma-mcp spawn storm (PR #1065)

Fixes a critical bug where killing the worker daemon during active sessions caused 641 chroma-mcp Python processes to spawn in ~5 minutes, consuming 75%+ CPU and ~64GB virtual memory.

Root Cause

ChromaSync.ensureConnection() had no connection mutex. Concurrent fire-and-forget syncObservation() calls from multiple sessions raced through the check-then-act guard, each spawning a chroma-mcp subprocess via StdioClientTransport. Error-driven reconnection created a positive feedback loop.
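
The race and the promise-memoization fix can be illustrated with a minimal sketch. This is not the project's actual code: `spawnClient`, `RacyConnection`, `MemoizedConnection`, and `countSpawns` are hypothetical names, and the subprocess spawn is simulated with a timer so the example runs on its own.

```typescript
type Client = { id: number };

// Hypothetical stand-in for spawning a chroma-mcp subprocess.
// It only counts how many "spawns" were started.
let spawnCount = 0;
async function spawnClient(): Promise<Client> {
  spawnCount += 1;
  const id = spawnCount;
  await new Promise<void>((resolve) => setTimeout(resolve, 10)); // simulated slow startup
  return { id };
}

// Check-then-act without a mutex: every caller that arrives before the
// first spawn resolves sees `client === null` and starts its own spawn.
class RacyConnection {
  private client: Client | null = null;

  async ensureConnection(): Promise<Client> {
    if (this.client) return this.client; // check
    this.client = await spawnClient(); // act, but only after an await
    return this.client;
  }
}

// Promise memoization: the in-flight promise is stored synchronously,
// so concurrent callers share one spawn attempt.
class MemoizedConnection {
  private client: Client | null = null;
  private connecting: Promise<Client> | null = null;

  async ensureConnection(): Promise<Client> {
    if (this.client) return this.client;
    if (!this.connecting) {
      this.connecting = spawnClient()
        .then((client) => {
          this.client = client;
          return client;
        })
        .finally(() => {
          // Clear the memo on success or failure so a later call can retry.
          this.connecting = null;
        });
    }
    return this.connecting;
  }
}

// Fires `callers` concurrent calls and reports how many spawns they caused.
async function countSpawns(
  conn: { ensureConnection(): Promise<Client> },
  callers: number,
): Promise<number> {
  const before = spawnCount;
  await Promise.all(Array.from({ length: callers }, () => conn.ensureConnection()));
  return spawnCount - before;
}
```

In this sketch, five concurrent calls against `RacyConnection` start five spawns, while the same five calls against `MemoizedConnection` start one.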

5-Layer Defense

| Layer | Mechanism | Purpose |
|-------|-----------|---------|
| 0 | Connection mutex via promise memoization | Coalesces concurrent callers onto a single spawn attempt |
| 1 | Pre-spawn process count guard (execFileSync('ps')) | Kills excess chroma-mcp processes before spawning new ones |
| 2 | Hardened close() with try-finally + Unix pkill -P fallback | Guarantees state reset even on error, kills orphaned children |
| 3 | Count-based orphan reaper in ProcessManager | Kills by count (not age), catches spawn storms where all processes are young |
| 4 | Circuit breaker (3 failures → 60s cooldown) | Stops the error-driven reconnection feedback loop |
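
As a rough illustration of Layer 4, the sketch below shows one way a circuit breaker with a 3-failure threshold and a 60-second cooldown could be written. The class name, method names, and the injectable clock are assumptions made for this example, not the names used in claude-mem.

```typescript
// Hypothetical circuit breaker: opens after `maxFailures` failures in a row
// and refuses new attempts until `cooldownMs` has elapsed.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly maxFailures: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(maxFailures = 3, cooldownMs = 60000, now: () => number = () => Date.now()) {
    this.maxFailures = maxFailures;
    this.cooldownMs = cooldownMs;
    this.now = now; // injectable clock, used here only to make the sketch testable
  }

  // Returns true when a connection attempt is allowed.
  canAttempt(): boolean {
    if (this.openedAt === null) return true;
    if (this.now() - this.openedAt >= this.cooldownMs) {
      // Cooldown elapsed: close the circuit and allow a fresh attempt.
      this.openedAt = null;
      this.failures = 0;
      return true;
    }
    return false;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure(): void {
    this.failures += 1;
    if (this.failures >= this.maxFailures) {
      this.openedAt = this.now();
    }
  }
}
```

A caller would check `canAttempt()` before reconnecting and report the outcome with `recordSuccess()` or `recordFailure()`, so repeated failures stop triggering new spawns until the cooldown ends.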

Additional Fix

  • Process guards now use etime-based sorting instead of PID ordering for reliable age determination (PIDs wrap and don't guarantee ordering)
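
To show why elapsed time is a more reliable age signal than PID, here is a small sketch that parses the `etime` column of `ps`, which POSIX specifies as `[[dd-]hh:]mm:ss`, and sorts processes oldest first. The `ProcInfo` shape and both function names are hypothetical and are not taken from the claude-mem source.

```typescript
interface ProcInfo {
  pid: number;
  etime: string; // elapsed time as printed by `ps -o etime`, e.g. "1-02:03:04"
}

// Converts a ps etime string ([[dd-]hh:]mm:ss) into seconds.
function parseEtime(etime: string): number {
  const match = /^(?:(?:(\d+)-)?(\d+):)?(\d+):(\d+)$/.exec(etime.trim());
  if (!match) throw new Error(`Unrecognized etime: ${etime}`);
  const [, days = "0", hours = "0", minutes = "0", seconds = "0"] = match;
  return ((Number(days) * 24 + Number(hours)) * 60 + Number(minutes)) * 60 + Number(seconds);
}

// Returns a new array ordered from the longest-running process to the newest.
function sortOldestFirst(procs: ProcInfo[]): ProcInfo[] {
  return [...procs].sort((a, b) => parseEtime(b.etime) - parseEtime(a.etime));
}
```

For example, a process with PID 40000 and etime `1-00:00:00` sorts ahead of PID 12 with etime `10:00`, even though its PID is higher.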

Testing

  • 16 new tests for mutex, circuit breaker, close() hardening, and count guard
  • All tests pass (947 pass, 3 skip)

Closes #1063, closes #695. Relates to #1010, #707.

Contributors: @rodboev
