Release v8.47.1
Critical Bug Fixes
This release fixes three critical issues discovered in the ONNX quality evaluation system:
🐛 Fixed Issues
- Self-Match Bug: ONNX quality scores were artificially inflated to ~1.0
  - Root cause: Bulk evaluation used each memory's content as its own query
  - Fix: Generate queries from tags/metadata that describe what the memory is about
  - Result: Realistic score distribution (average 0.468; 42.9% high, 3.2% medium, 53.9% low)
- Association Pollution: System-generated memories were being scored
  - Root cause: No filtering on memory_type before evaluation
  - Fix: Filter out 'association' and 'compressed_cluster' memory types
  - Result: 948 system-generated memories excluded from evaluation
- Sync Queue Overflow: 278 Cloudflare sync failures during bulk operations
  - Root cause: Queue capacity (1,000) was overwhelmed by 4,478 updates
  - Fix: Increased queue size to 2,000 and batch size to 100; added sync monitoring
  - Result: 0 sync failures, with the queue draining in real time
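The first two fixes both apply to the bulk evaluation step. The following is a minimal Python sketch of the corrected selection and query logic. It assumes a simple `Memory` record with `content`, `tags`, `memory_type` and `metadata` fields. That record, the `topic` metadata key, and the helper names are hypothetical and are not taken from scripts/quality/bulk_evaluate_onnx.py.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical memory record; the real service's schema may differ.
@dataclass
class Memory:
    content: str
    tags: list = field(default_factory=list)
    memory_type: Optional[str] = None
    metadata: dict = field(default_factory=dict)

# System-generated memory types that should never be quality-scored.
EXCLUDED_TYPES = {"association", "compressed_cluster"}

def is_evaluable(memory: Memory) -> bool:
    """Return False for system-generated memories."""
    return memory.memory_type not in EXCLUDED_TYPES

def build_query(memory: Memory) -> Optional[str]:
    """Describe what the memory is about from tags/metadata, never from its content.

    Pairing content with itself lets a cross-encoder see a near-perfect
    match, which is how scores drifted toward ~1.0.
    """
    parts = [str(t) for t in memory.tags]
    topic = memory.metadata.get("topic")  # hypothetical metadata key
    if topic:
        parts.append(str(topic))
    return " ".join(parts) if parts else None

def evaluation_pairs(memories: Iterable[Memory]) -> Iterator[Tuple[str, str]]:
    """Yield (query, content) pairs to pass to the cross-encoder."""
    for memory in memories:
        if not is_evaluable(memory):
            continue
        query = build_query(memory)
        if query is None:
            continue  # no descriptive query available; skip instead of self-matching
        yield query, memory.content
```

This sketch skips memories that have nothing to build a query from. That is a design choice for the example only. It avoids falling back to the content-as-query pattern that caused the inflated scores, and the actual script may handle this case differently.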
Changes
New Files:
- scripts/quality/reset_onnx_scores.py: Resets bad ONNX scores to the implicit baseline
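The following is a rough sketch of how a reset to the implicit baseline could work. It assumes each memory's quality score lives in a metadata dict under hypothetical `quality_score` and `quality_provider` keys. In this version the ONNX-derived value is deleted rather than overwritten, so readers fall back to the 0.5 baseline. The actual script's storage layout and approach are not described in these notes.

```python
# Baseline that unscored memories fall back to (0.5 per these notes).
IMPLICIT_BASELINE = 0.5

def reset_onnx_score(metadata: dict) -> bool:
    """Drop an ONNX-derived score so the memory reverts to the baseline.

    The "quality_score" and "quality_provider" keys are hypothetical.
    Returns True if the record was changed.
    """
    if metadata.get("quality_provider") != "onnx":
        return False
    metadata.pop("quality_score", None)
    metadata.pop("quality_provider", None)
    return True

def effective_quality(metadata: dict) -> float:
    """Score a reader would see: the stored value, else the implicit baseline."""
    return float(metadata.get("quality_score", IMPLICIT_BASELINE))

def reset_all(records: list) -> int:
    """Reset every ONNX-scored record in place; return how many changed."""
    return sum(reset_onnx_score(r) for r in records)
```

Running `reset_all` a second time changes nothing, so the reset is safe to repeat.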
Modified Files:
- src/mcp_memory_service/config.py: Configurable HYBRID_QUEUE_SIZE (2000) and HYBRID_BATCH_SIZE (100)
- src/mcp_memory_service/storage/hybrid.py: Added wait_for_sync_completion() monitoring method
- scripts/quality/bulk_evaluate_onnx.py: Fixed query generation and association filtering
- CLAUDE.md: Documented ONNX model limitations
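The queue settings and wait_for_sync_completion() can be pictured as a bounded asyncio queue drained in batches. This is an illustrative sketch, not the code in hybrid.py. The `SyncQueue` class, the `push_batch` callback, and reading HYBRID_QUEUE_SIZE / HYBRID_BATCH_SIZE from environment variables are all assumptions. Only the default values (2000 and 100) and the method name come from these notes.

```python
import asyncio
import os
from typing import Optional

# Defaults from this release; reading them from the environment is an assumption.
HYBRID_QUEUE_SIZE = int(os.environ.get("HYBRID_QUEUE_SIZE", "2000"))
HYBRID_BATCH_SIZE = int(os.environ.get("HYBRID_BATCH_SIZE", "100"))

class SyncQueue:
    """Hypothetical bounded sync queue drained in batches."""

    def __init__(self, push_batch, maxsize: int = HYBRID_QUEUE_SIZE,
                 batch_size: int = HYBRID_BATCH_SIZE):
        self.queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
        self.batch_size = batch_size
        self.push_batch = push_batch  # coroutine sending a list of ops to the remote store
        self.failed = 0  # ops whose batch push raised

    async def enqueue(self, op) -> None:
        # put() waits while the queue is full instead of dropping the update.
        await self.queue.put(op)

    async def drain_forever(self) -> None:
        while True:
            batch = [await self.queue.get()]
            # Take whatever else is already waiting, up to batch_size.
            while len(batch) < self.batch_size and not self.queue.empty():
                batch.append(self.queue.get_nowait())
            try:
                await self.push_batch(batch)
            except Exception:
                self.failed += len(batch)  # count failures; keep draining
            finally:
                for _ in batch:
                    self.queue.task_done()

    async def wait_for_sync_completion(self, timeout: Optional[float] = None) -> bool:
        """Return True once every enqueued op is processed, False on timeout."""
        try:
            await asyncio.wait_for(self.queue.join(), timeout)
            return True
        except asyncio.TimeoutError:
            return False
```

Here `enqueue()` waits when the queue is full, which applies backpressure instead of failing the update. These notes do not say whether hybrid.py behaves the same way. They say only that the larger queue and batch size eliminated the 278 failures.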
Testing
- ✅ Reset 4,518 bad ONNX scores successfully
- ✅ Corrected bulk evaluation produced a realistic score distribution
- ✅ Zero sync failures with the improved queue configuration
- ✅ Association filtering excluded 948 system-generated memories
Documentation
Updated CLAUDE.md with warnings about ONNX cross-encoder limitations and proper usage guidelines.
Version: 8.47.1
Type: Patch (Critical Bug Fixes)
Migration Required: No (automatic; existing bad scores remain at the implicit 0.5 baseline)