Telemetry Reliability Signals (Plan 14)
claude-mem instrumented success well — failure was invisible. This release adds the five highest-value missing reliability signals (#2874). Everything is closed-enum/count-only, whitelisted in the scrubber, and disclosed in both the public docs and claude-mem telemetry.
Search retrieval quality (search_performed)
result_count,search_strategy(chroma|fts|filter_only),chroma_available,fallback_reason(none|chroma_connection|chroma_error|chroma_not_initialized)- Zero-result rate is now computable, and Chroma's silent degradation to FTS is visible.
Compression trust (session_compressed)
fabrication_detected/fabricated_count— commit-hash fabrication by the observer model, on every emit path- Respawn-gated invalid-output events:
invalid_output_class(xml|idle|prose|poisoned),consecutive_invalid_outputs,respawn_triggered outcome: abortedwithabort_reason(idle|shutdown|overflow|restart_guard|quota|poisoned|none), emitted where all abort flows converge
Worker lifecycle
- New
worker_stoppedevent:uptime_seconds,shutdown_reason(stop|restart|signal) - Crash detection via clean-shutdown sentinel:
worker_startednow reportsprevious_shutdown(crash|clean|unknown) andprevious_uptime_seconds - Memory health: integer
process_rss_mb/heap_used_mbon lifecycle events and the heartbeat
Hook failures
- New
hook_failedevent over the direct CLI transport (the worker being unreachable IS the failure being reported), threshold-gated on the fail-loud counter and awaited before process exit so events survive short-lived hook processes
Fixes
- CI: the PostHog
disableGeoipregression test was order-dependent and failed full-suite runs (CI on main had been red since v13.5.4).posthog-nodeis now mocked globally via a bun test preload — which also guarantees test runs can never construct a real PostHog client and flush fabricated events into production analytics. - Windows-managed shutdown IPC now forwards the restart reason for
shutdown_reasonfidelity.