v8.7.0: Cosine Similarity & Maintenance Tools
🔧 Fixed
Cosine Similarity Migration - 0% → 70-79% Search Accuracy
Problem: Search results consistently showed 0% similarity scores due to L2 distance metric limitations.
- L2 distance formula max(0, 1 - distance) returned 0 for all distances > 1.0
- Semantic search completely broken - no meaningful similarity ranking
- Exact match accuracy only 61% despite having correct content
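To see why the old conversion collapsed to 0%, here is a minimal sketch of the formula quoted above. The function name and the sample distances are illustrative and not taken from the codebase.

```python
def l2_score(distance: float) -> float:
    """Old conversion: 1 - distance, clamped at zero."""
    return max(0.0, 1.0 - distance)


# Hypothetical L2 distances between a query and stored embeddings.
for d in (0.4, 1.0, 1.3, 1.9):
    # Every distance at or above 1.0 is clamped to a score of zero,
    # so results beyond that point cannot be ranked against each other.
    print(f"distance={d:.1f} -> score={l2_score(d):.0%}")
```

Any two results with distances above 1.0 receive the same score, which is why ranking carried no information.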
Solution: Complete migration to cosine distance metric with intelligent retry logic.
- Updated Vector Index - Added distance_metric=cosine to vec0 virtual table
- Fixed Score Calculation - Changed to 1.0 - (distance/2.0) for proper 0-2 cosine range
- Migration Strategy - Drop-and-recreate embeddings table (vec0 doesn't support ALTER)
- Retry Logic - Exponential backoff (1s → 2s → 4s) for database locking
- Regeneration Script - scripts/maintenance/regenerate_embeddings.py for manual recovery
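The score formula and the retry behaviour listed above can be sketched roughly as follows. This is not the code in sqlite_vec.py: the names cosine_score and run_with_retry are hypothetical, the check for the word "locked" in the error message is an assumption, and the default of four attempts is a guess chosen so that the waits match the 1s → 2s → 4s sequence.

```python
import sqlite3
import time


def cosine_score(distance: float) -> float:
    """Map a cosine distance in [0, 2] to a similarity score in [0, 1]."""
    return 1.0 - (distance / 2.0)


def run_with_retry(operation, max_attempts=4, initial_delay=1.0, sleep=time.sleep):
    """Call operation(), retrying on lock errors with doubling delays.

    With the defaults, the waits between attempts are 1s, 2s and 4s.
    """
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except sqlite3.OperationalError as exc:
            # Assumption: only lock contention is worth retrying;
            # any other error, or the final failure, is re-raised.
            if "locked" not in str(exc).lower() or attempt == max_attempts:
                raise
            sleep(delay)
            delay *= 2
```

Passing sleep as a parameter is only a convenience for testing the backoff schedule without actually waiting.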
Impact:
- ✅ 2605 embeddings successfully regenerated
- ✅ Search similarity scores now 70-79% (was 0%)
- ✅ Exact match accuracy improved to 79.2% (was 61%)
- ✅ Automatic migration on first startup - zero user intervention required
File: src/mcp_memory_service/storage/sqlite_vec.py:187
Dashboard Search Improvements
Fixed: Search threshold parameter was always sent even when not explicitly set by user.
Solution: Only include similarity_threshold in API payload when user explicitly configures it.
File: src/mcp_memory_service/web/static/app.js:283
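The dashboard fix itself lives in JavaScript; the same idea is sketched here in Python for consistency with the maintenance scripts. The function name build_search_payload is hypothetical, and the field names are taken from the search request shown in the verification steps of these notes.

```python
from typing import Any, Optional


def build_search_payload(
    query: str,
    n_results: int = 5,
    similarity_threshold: Optional[float] = None,
) -> dict[str, Any]:
    """Build a search request body, omitting the threshold unless it was set."""
    payload: dict[str, Any] = {"query": query, "n_results": n_results}
    # Compare against None rather than truthiness so that an explicit
    # threshold of 0.0 is still sent.
    if similarity_threshold is not None:
        payload["similarity_threshold"] = similarity_threshold
    return payload
```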
🚀 Added
Maintenance Scripts Suite - 1800x Performance Boost
Suite of database maintenance tools. Direct SQL makes bulk duplicate removal up to 1800x faster than going through the API.
regenerate_embeddings.py
- Purpose: Regenerate all embeddings after migrations or corruption
- Performance: ~5 minutes for 2600 memories using all-MiniLM-L6-v2
- Use Cases: Post-migration recovery, embedding corruption repair
- Features: Progress tracking, configurable batch sizes, safe to rerun
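Batch processing with progress tracking might look roughly like the sketch below. It does not reproduce the actual script: regenerate, batched and the embed callable are hypothetical names, and the default batch size is arbitrary.

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def regenerate(
    contents: Sequence[str],
    embed: Callable[[Sequence[str]], Sequence],
    batch_size: int = 32,
) -> List:
    """Embed every memory in batches, printing progress after each batch."""
    embeddings: List = []
    for batch in batched(contents, batch_size):
        embeddings.extend(embed(batch))
        print(f"regenerated {len(embeddings)}/{len(contents)}")
    return embeddings
```

With the sentence-transformers package installed, SentenceTransformer("all-MiniLM-L6-v2").encode could plausibly be passed as embed. Because the function only reads contents and returns a fresh list, calling it again is harmless, which matches the "safe to rerun" property in spirit.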
fast_cleanup_duplicates.sh
- Purpose: Bulk duplicate removal using direct SQL
- Performance: <5 seconds for 100+ duplicates (vs 2.5 hours via API)
- Speedup: 1800x faster than API-based deletion
- Strategy: Stops HTTP server, runs direct SQL DELETE, restarts service
- Safety: Keeps newest copy, normalizes timestamps before comparison
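A keep-the-newest DELETE can be sketched against an in-memory database as shown below. The table and column names (memories, content_hash, created_at) are assumptions made for illustration, not the service's real schema, and the sketch skips the server stop/restart and the timestamp normalization that the script performs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per memory, duplicates share a content_hash.
conn.execute(
    "CREATE TABLE memories ("
    "id INTEGER PRIMARY KEY, content_hash TEXT, created_at REAL)"
)
conn.executemany(
    "INSERT INTO memories (content_hash, created_at) VALUES (?, ?)",
    [("aaa", 100.0), ("aaa", 200.0), ("bbb", 150.0), ("aaa", 300.0)],
)

# Delete every row for which a newer row with the same hash exists.
# Ties on created_at are broken by id so exactly one copy survives.
deleted = conn.execute(
    """
    DELETE FROM memories
    WHERE EXISTS (
        SELECT 1 FROM memories AS newer
        WHERE newer.content_hash = memories.content_hash
          AND (newer.created_at > memories.created_at
               OR (newer.created_at = memories.created_at
                   AND newer.id > memories.id))
    )
    """
).rowcount
conn.commit()

remaining = conn.execute(
    "SELECT content_hash, created_at FROM memories ORDER BY content_hash"
).fetchall()
print(f"deleted {deleted} duplicates, {len(remaining)} memories remain")
```

A single statement like this avoids one API round trip per duplicate, which is the kind of saving behind the speedup figures above.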
find_all_duplicates.py
- Purpose: Fast duplicate detection with content normalization
- Performance: <2 seconds for 2000 memories (vs 30s via API)
- Speedup: 15x faster than API-based detection
- Detection: MD5 hash of normalized content (removes timestamps)
- Output: Detailed duplicate groups with memory counts
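Hash-based detection of this kind can be sketched as follows. The timestamp pattern and the whitespace collapsing are assumptions (the notes only say that timestamps are removed), and all function names are hypothetical.

```python
import hashlib
import re
from collections import defaultdict

# Assumed timestamp shape: 2024-01-15T10:30:00 or 2024-01-15 10:30:00.
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}")


def normalize(content: str) -> str:
    """Strip timestamps and collapse whitespace before hashing."""
    without_timestamps = TIMESTAMP_RE.sub("", content)
    return " ".join(without_timestamps.split())


def content_key(content: str) -> str:
    """MD5 hex digest of the normalized content."""
    return hashlib.md5(normalize(content).encode("utf-8")).hexdigest()


def find_duplicate_groups(memories: dict[str, str]) -> list[list[str]]:
    """Group memory ids by content key and keep groups with more than one id."""
    groups: dict[str, list[str]] = defaultdict(list)
    for memory_id, content in memories.items():
        groups[content_key(content)].append(memory_id)
    return [ids for ids in groups.values() if len(ids) > 1]
```

MD5 serves here only as a fast fingerprint for grouping, not as a security measure.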
Comprehensive README.md
- Quick reference table with performance benchmarks
- Detailed usage for each script with examples
- Best practices for before/after maintenance
- Troubleshooting guide for common issues
- Performance comparison table (API vs Direct SQL)
Location: scripts/maintenance/
📦 Installation & Upgrade
New Installations
pip install mcp-memory-service==8.7.0
# Cosine migration happens automatically on first startup
# No manual intervention required!
Upgrades from v8.6.0 or Earlier
pip install --upgrade mcp-memory-service==8.7.0
# Migration runs automatically on next startup
# Watch logs for "Cosine migration complete" message
# Optional: Manual embedding regeneration if needed
python scripts/maintenance/regenerate_embeddings.py
Post-Upgrade Verification
# Check migration status
curl -s "http://127.0.0.1:8000/api/health" | jq
# Run duplicate detection (should find 0 duplicates)
python scripts/maintenance/find_all_duplicates.py
# Search should now show 70-79% similarity scores
curl -s -X POST "http://127.0.0.1:8000/api/search" \
-H "Content-Type: application/json" \
-d '{"query": "test search", "n_results": 5}'
📊 Impact Summary
- Search Accuracy: 0% → 70-79% similarity scores
- Exact Matches: 61% → 79.2% accuracy
- Embeddings Regenerated: 2605 memories
- Maintenance Performance: 1800x speedup for bulk operations
- Migration Strategy: Automatic with zero downtime
📖 Documentation
- Full CHANGELOG: CHANGELOG.md
- Maintenance Scripts: scripts/maintenance/README.md
- Wiki: Troubleshooting Guide
🔗 Links
Full Changelog: v8.6.0...v8.7.0