v8.7.0: Cosine Similarity & Maintenance Tools

🔧 Fixed

Cosine Similarity Migration - 0% → 70-79% Similarity Scores

Problem: Search results consistently showed 0% similarity scores because of how L2 distances were converted into scores.

The L2 score formula max(0, 1 - distance) returned 0 for every distance of 1.0 or more
Semantic search was effectively broken, with no meaningful similarity ranking
Exact-match accuracy was only 61% despite the correct content being stored
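A minimal Python sketch of why the old conversion collapsed to 0%. It assumes unit-length embeddings, for which L2 distance and cosine similarity are related by d = sqrt(2 - 2·cos θ): any pair with cosine similarity of 0.5 or less sits at d ≥ 1.0 and is clamped to zero. The function names are illustrative, not the service's code.

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def old_score(distance):
    """Pre-8.7.0 style conversion: any distance >= 1.0 becomes 0."""
    return max(0.0, 1.0 - distance)

# Two unit vectors with cosine similarity 0.4.
a = [1.0, 0.0]
b = [0.4, math.sqrt(1 - 0.4 ** 2)]

d = l2_distance(a, b)  # sqrt(2 - 2 * 0.4) = sqrt(1.2), about 1.095
print(f"L2 distance {d:.3f} -> score {old_score(d):.0%}")
# prints: L2 distance 1.095 -> score 0%
```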

Solution: Complete migration to the cosine distance metric, with retry logic to handle database locking during the migration.

Updated Vector Index - Added distance_metric=cosine to the vec0 virtual table definition
Fixed Score Calculation - Changed to 1.0 - (distance/2.0), which maps the 0-2 cosine distance range onto scores from 1.0 to 0.0
Migration Strategy - Drop and recreate the embeddings table (vec0 doesn't support ALTER TABLE)
Retry Logic - Exponential backoff (1s → 2s → 4s) when the database is locked
Regeneration Script - scripts/maintenance/regenerate_embeddings.py for manual recovery
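The new score conversion can be sketched in pure Python. Cosine distance is 1 - cos θ, so it runs from 0 (same direction) to 2 (opposite directions), and 1.0 - distance/2.0 maps that range linearly onto scores from 1.0 down to 0.0. This is an illustrative version only; in the service the distance itself comes from sqlite-vec once the vec0 column is declared with distance_metric=cosine.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity: 0 for identical direction, 2 for opposite direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms

def cosine_score(distance):
    """v8.7.0-style conversion: map distance in [0, 2] onto a score in [1.0, 0.0]."""
    return 1.0 - (distance / 2.0)

# Two unit vectors with cosine similarity 0.4.
a = [1.0, 0.0]
b = [0.4, math.sqrt(1 - 0.4 ** 2)]

d = cosine_distance(a, b)  # about 0.6
print(f"cosine distance {d:.2f} -> score {cosine_score(d):.0%}")
# prints: cosine distance 0.60 -> score 70%
```

Note that the mapping simplifies to (1 + cos θ) / 2, so orthogonal (unrelated) vectors score 50% rather than 0%; scores are best read relative to each other.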

Impact:

✅ 2605 embeddings successfully regenerated
✅ Search similarity scores now 70-79% (was 0%)
✅ Exact match accuracy improved to 79.2% (was 61%)
✅ Automatic migration on first startup - zero user intervention required

File: src/mcp_memory_service/storage/sqlite_vec.py:187
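The drop-and-recreate migration and its lock handling might look roughly like this. It is a sketch, not the code in sqlite_vec.py: the table and column names are hypothetical, the 384-dimension column assumes all-MiniLM-L6-v2 output, executing the DDL requires the sqlite-vec extension to be loaded, and recognizing lock contention by the "locked" text of sqlite3.OperationalError is an assumption.

```python
import sqlite3
import time

# Hypothetical names; vec0 has no ALTER TABLE, so the table is dropped and recreated.
MIGRATION_SQL = [
    "DROP TABLE IF EXISTS memory_embeddings",
    "CREATE VIRTUAL TABLE memory_embeddings USING vec0("
    "embedding float[384] distance_metric=cosine)",
]

def with_retry(operation, delays=(1.0, 2.0, 4.0), sleep=time.sleep):
    """Run operation(); on 'database is locked', wait 1s, 2s, then 4s between attempts."""
    for delay in delays:
        try:
            return operation()
        except sqlite3.OperationalError as exc:
            if "locked" not in str(exc):
                raise  # only lock contention is retried
            sleep(delay)
    return operation()  # final attempt; its error, if any, propagates

def migrate_to_cosine(conn):
    """Drop and recreate the vec0 table inside one transaction, retrying on locks."""
    def run():
        with conn:  # commits on success, rolls back on error
            for stmt in MIGRATION_SQL:
                conn.execute(stmt)
    with_retry(run)
```

Passing `sleep` as a parameter keeps the backoff schedule testable without real waiting.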

Dashboard Search Improvements

Fixed: The similarity_threshold parameter was always sent, even when the user had not explicitly set it.

Solution: Only include similarity_threshold in the API payload when the user explicitly configures it.

File: src/mcp_memory_service/web/static/app.js:283
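The dashboard fix lives in app.js; the same conditional-payload rule is sketched here in Python for illustration. The query and n_results fields come from the /api/search example in the verification steps, and build_search_payload is a hypothetical helper. Checking `is not None` rather than truthiness keeps an explicit threshold of 0.0 in the payload.

```python
import json

def build_search_payload(query, n_results=5, similarity_threshold=None):
    """Include similarity_threshold only when the caller explicitly set it."""
    payload = {"query": query, "n_results": n_results}
    if similarity_threshold is not None:
        payload["similarity_threshold"] = similarity_threshold
    return payload

print(json.dumps(build_search_payload("test search")))
# prints: {"query": "test search", "n_results": 5}
```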

🚀 Added

Maintenance Scripts Suite - 1800x Performance Boost

A suite of database maintenance tools. The duplicate-handling scripts bypass the HTTP API and work on the database directly, cutting bulk cleanup from hours to seconds.

`regenerate_embeddings.py`

Purpose: Regenerate all embeddings after migrations or corruption
Performance: ~5 minutes for 2600 memories using all-MiniLM-L6-v2
Use Cases: Post-migration recovery, embedding corruption repair
Features: Progress tracking, configurable batch sizes, safe to rerun
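A batched, rerunnable regeneration loop in the spirit of regenerate_embeddings.py might look like this. The memories/embeddings schema, the embed_batch callable (for example a wrapper around sentence-transformers' `SentenceTransformer.encode` with all-MiniLM-L6-v2), and the little-endian float32 blob encoding are all assumptions; a plain table stands in for the vec0 table so the sketch runs on stock sqlite3.

```python
import sqlite3
import struct

def regenerate_embeddings(conn, embed_batch, batch_size=50, progress=print):
    """Re-embed every memory in batches; INSERT OR REPLACE makes reruns safe."""
    rows = conn.execute("SELECT id, content FROM memories ORDER BY id").fetchall()
    total = len(rows)
    for start in range(0, total, batch_size):
        batch = rows[start:start + batch_size]
        vectors = embed_batch([content for _, content in batch])
        with conn:  # one transaction per batch
            conn.executemany(
                "INSERT OR REPLACE INTO embeddings (memory_id, vector) VALUES (?, ?)",
                [
                    (mem_id, struct.pack(f"<{len(vec)}f", *vec))  # float32 blob
                    for (mem_id, _), vec in zip(batch, vectors)
                ],
            )
        progress(f"{min(start + batch_size, total)}/{total} memories re-embedded")
    return total
```

Keying the upsert on memory_id means a run interrupted halfway can simply be started again; already-written rows are overwritten rather than duplicated.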

`fast_cleanup_duplicates.sh`

Purpose: Bulk duplicate removal using direct SQL
Performance: <5 seconds for 100+ duplicates (vs 2.5 hours via API)
Speedup: 1800x faster than API-based deletion
Strategy: Stops HTTP server, runs direct SQL DELETE, restarts service
Safety: Keeps newest copy, normalizes timestamps before comparison
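The keep-newest deletion can be expressed as a single SQL statement. This sketch uses Python's sqlite3 rather than the shell script's own commands, and assumes a hypothetical memories table with a content_hash column (computed from timestamp-normalized content) and a sortable created_at column. ROW_NUMBER() requires SQLite 3.25 or later. Following the script's strategy, it should only run while the HTTP server is stopped.

```python
import sqlite3

# Rank copies within each content_hash group, newest first; delete all but rank 1.
DELETE_OLDER_DUPLICATES = """
DELETE FROM memories
WHERE id IN (
    SELECT id FROM (
        SELECT id,
               ROW_NUMBER() OVER (
                   PARTITION BY content_hash
                   ORDER BY created_at DESC, id DESC
               ) AS rn
        FROM memories
    )
    WHERE rn > 1
)
"""

def delete_duplicates_keep_newest(db_path):
    """Run one bulk DELETE and return the number of rows removed."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:
            deleted = conn.execute(DELETE_OLDER_DUPLICATES).rowcount
    finally:
        conn.close()
    return deleted
```

One set-based statement replaces one API round trip per duplicate, which is where the bulk of the speedup over API-based deletion comes from.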

`find_all_duplicates.py`

Purpose: Fast duplicate detection with content normalization
Performance: <2 seconds for 2000 memories (vs 30s via API)
Speedup: 15x faster than API-based detection
Detection: MD5 hash of normalized content (removes timestamps)
Output: Detailed duplicate groups with memory counts
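Normalization plus MD5 grouping, as described for find_all_duplicates.py, can be sketched as follows. The timestamp regex (ISO-8601-style dates with optional time and offset) and the whitespace collapsing are assumptions about what "normalized" means; the real script's rules may differ. MD5 is acceptable here because it serves only as a fast content fingerprint, not for security.

```python
import hashlib
import re
from collections import defaultdict

# Assumed timestamp shapes: 2024-05-01, 2024-05-01 10:00, 2024-05-01T10:00:00.123Z, ...
TIMESTAMP_RE = re.compile(
    r"\d{4}-\d{2}-\d{2}(?:[T ]\d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?)?(?:Z|[+-]\d{2}:?\d{2})?"
)

def normalize(content):
    """Strip timestamps and collapse whitespace so re-saved copies hash identically."""
    return " ".join(TIMESTAMP_RE.sub("", content).split())

def find_duplicate_groups(memories):
    """Group (id, content) pairs by MD5 of normalized content; keep groups of 2+."""
    groups = defaultdict(list)
    for mem_id, content in memories:
        digest = hashlib.md5(normalize(content).encode("utf-8")).hexdigest()
        groups[digest].append(mem_id)
    return {digest: ids for digest, ids in groups.items() if len(ids) > 1}

rows = [
    (1, "Deployed v2 at 2024-05-01T10:00:00Z"),
    (2, "Deployed v2 at  2024-06-02 11:30"),
    (3, "Something else"),
]
for digest, ids in find_duplicate_groups(rows).items():
    print(f"{digest[:8]}: {len(ids)} copies -> {ids}")
```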

Comprehensive README.md

Quick reference table with performance benchmarks
Detailed usage for each script with examples
Best practices for before/after maintenance
Troubleshooting guide for common issues
Performance comparison table (API vs Direct SQL)

Location: scripts/maintenance/

📦 Installation & Upgrade

New Installations

```bash
pip install mcp-memory-service==8.7.0

# Cosine migration happens automatically on first startup
# No manual intervention required!
```

Upgrades from v8.6.0 or Earlier

```bash
pip install --upgrade mcp-memory-service==8.7.0

# Migration runs automatically on next startup
# Watch logs for "Cosine migration complete" message

# Optional: Manual embedding regeneration if needed
python scripts/maintenance/regenerate_embeddings.py
```

Post-Upgrade Verification

```bash
# Check migration status
curl -s "http://127.0.0.1:8000/api/health" | jq

# Run duplicate detection (should find 0 duplicates)
python scripts/maintenance/find_all_duplicates.py

# Search should now show 70-79% similarity scores
curl -s -X POST "http://127.0.0.1:8000/api/search" \
  -H "Content-Type: application/json" \
  -d '{"query": "test search", "n_results": 5}'
```

📊 Impact Summary

Similarity Scores: 0% → 70-79%
Exact Matches: 61% → 79.2% accuracy
Embeddings Regenerated: 2605 memories
Maintenance Performance: 1800x speedup for bulk duplicate cleanup
Migration Strategy: Automatic with zero downtime
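The headline speedups follow directly from the benchmark figures quoted for the two duplicate scripts. Because both new timings are upper bounds (<5 s and <2 s), the ratios are lower bounds:

$$
\text{speedup}_{\text{cleanup}} = \frac{2.5\ \text{h} \times 3600\ \text{s/h}}{5\ \text{s}} = \frac{9000\ \text{s}}{5\ \text{s}} = 1800,
\qquad
\text{speedup}_{\text{detection}} = \frac{30\ \text{s}}{2\ \text{s}} = 15
$$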

📖 Documentation

Full CHANGELOG: CHANGELOG.md
Maintenance Scripts: scripts/maintenance/README.md
Wiki: Troubleshooting Guide

🔗 Links

Full Changelog: v8.6.0...v8.7.0

Repository: doobidoo/mcp-memory-service